@OmerFarukOruc
Created February 19, 2026 15:14
QMD Query Expansion: Fine-tune LiquidAI LFM2-1.2B on free Google Colab T4 (~2.5h)
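The notebook below trains the model to emit one `lex:`, `vec:`, or `hyde:` entry per line, as in the dataset preview in step 4. A minimal Python sketch of how a consumer might group such output by prefix follows; the helper name `parse_expansion` and the choice to silently skip unrecognized lines are assumptions for illustration, not part of qmd or the notebook.

```python
# Hypothetical helper: groups "prefix: value" lines by prefix.
# The three prefixes come from the notebook's description; everything else is illustrative.
KNOWN_PREFIXES = ("lex", "vec", "hyde")


def parse_expansion(text: str) -> dict[str, list[str]]:
    """Return {"lex": [...], "vec": [...], "hyde": [...]} for the given model output."""
    groups: dict[str, list[str]] = {prefix: [] for prefix in KNOWN_PREFIXES}
    for line in text.splitlines():
        # Split on the first colon only, so colons inside the value are preserved.
        prefix, sep, value = line.partition(":")
        prefix = prefix.strip().lower()
        value = value.strip()
        if sep and prefix in groups and value:
            groups[prefix].append(value)
    return groups


# The first four lines are taken from the dataset preview; the last is a malformed line.
sample = (
    "lex: where to find\n"
    "lex: purchase options for\n"
    "vec: where to find refurbished laptops for sale?\n"
    "vec: purchase options for refurbished laptops\n"
    "this line has no recognized prefix\n"
)

result = parse_expansion(sample)
print(result["lex"])
print(result["vec"])
```

Skipping unknown lines keeps the parser tolerant of stray model output; a stricter consumer could raise an error instead.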
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# QMD Query Expansion: Fine-tune LiquidAI LFM2-1.2B\n",
"\n",
"Fine-tunes LFM2-1.2B on qmd's query expansion dataset to produce structured `lex:/vec:/hyde:` output,\n",
"then converts the merged model to GGUF (Q8_0) for local inference.\n",
"\n",
"**Runtime**: Set to **T4 GPU** (Runtime → Change runtime type → T4)\n",
"\n",
"**Time**: ~2.5-3 h on T4 for the 5-epoch configuration below (the recorded run took 2:44:40); an A100 should be considerably faster"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ⚙️ Config: set your HuggingFace username here"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"SFT adapter → OrcsRise/qmd-query-expansion-lfm2-sft\n",
"GGUF repo → OrcsRise/qmd-query-expansion-lfm2-gguf\n"
]
}
],
"source": [
"# ============================================================\n",
"# 🔧 CHANGE THIS to your HuggingFace username\n",
"# ============================================================\n",
"HF_USERNAME = \"OrcsRise\" # <-- change this!\n",
"# ============================================================\n",
"\n",
"BASE_MODEL = \"LiquidAI/LFM2-1.2B\"\n",
"DATASET = \"tobil/qmd-query-expansion-train\"\n",
"OUTPUT_SFT = f\"{HF_USERNAME}/qmd-query-expansion-lfm2-sft\"\n",
"OUTPUT_GGUF_REPO = f\"{HF_USERNAME}/qmd-query-expansion-lfm2-gguf\"\n",
"MODEL_NAME = \"qmd-query-expansion-lfm2\"\n",
"\n",
"assert len(HF_USERNAME) > 0 and ' ' not in HF_USERNAME, \"👆 Set HF_USERNAME to your HuggingFace username!\"\n",
"print(f\"SFT adapter → {OUTPUT_SFT}\")\n",
"print(f\"GGUF repo → {OUTPUT_GGUF_REPO}\")"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n"
]
}
],
"source": [
"from google.colab import drive\n",
"drive.mount('/content/drive')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Install dependencies"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"# Quote every version specifier: an unquoted `>=` is parsed by the shell as an output redirect\n",
"!pip install -q torch \"trl>=0.12.0\" \"peft>=0.7.0\" \"transformers>=4.55.0\" \\\n",
"    \"accelerate>=0.24.0\" \"huggingface_hub>=0.20.0\" datasets bitsandbytes \\\n",
"    sentencepiece protobuf numpy gguf"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Login to HuggingFace"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"from huggingface_hub import login, notebook_login\n",
"import os\n",
"\n",
"# Option A: Use Colab secrets (Settings → Secrets → add HF_TOKEN)\n",
"try:\n",
"    from google.colab import userdata\n",
"    hf_token = userdata.get('HF_TOKEN')\n",
"    login(token=hf_token)\n",
"    print(\"✅ Logged in via Colab secret\")\n",
"except Exception:\n",
"    # Option B: Interactive login\n",
"    notebook_login()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Check GPU"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"CUDA available: True\n",
"GPU: Tesla T4\n",
"VRAM: 14.6 GB\n"
]
}
],
"source": [
"import torch\n",
"print(f\"CUDA available: {torch.cuda.is_available()}\")\n",
"if torch.cuda.is_available():\n",
"    print(f\"GPU: {torch.cuda.get_device_name(0)}\")\n",
"    print(f\"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB\")\n",
"else:\n",
"    raise RuntimeError(\"No GPU! Go to Runtime → Change runtime type → T4 GPU\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Load dataset"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loading dataset: tobil/qmd-query-expansion-train...\n",
"Dataset loaded: 5157 examples\n",
" Train: 4641, Eval: 516\n",
"\n",
"--- Example ---\n",
"User: Expand this search query:\n",
"\n",
"buy refurbished laptops\n",
"Assistant: lex: where to find\n",
"lex: purchase options for\n",
"vec: where to find refurbished laptops for sale?\n",
"vec: purchase options for refurbished laptops\n",
"hyde: The topic of buy refurbished laptops covers where to find refurbished laptops for sale?. Proper implementation follows established patterns and best pract\n"
]
}
],
"source": [
"from datasets import load_dataset\n",
"\n",
"print(f\"Loading dataset: {DATASET}...\")\n",
"dataset = load_dataset(DATASET, split=\"train\")\n",
"print(f\"Dataset loaded: {len(dataset)} examples\")\n",
"\n",
"split = dataset.train_test_split(test_size=0.1, seed=42)\n",
"train_dataset = split[\"train\"]\n",
"eval_dataset = split[\"test\"]\n",
"print(f\" Train: {len(train_dataset)}, Eval: {len(eval_dataset)}\")\n",
"\n",
"# Preview an example\n",
"print(\"\\n--- Example ---\")\n",
"ex = train_dataset[0][\"messages\"]\n",
"print(f\"User: {ex[0]['content'][:200]}\")\n",
"print(f\"Assistant: {ex[1]['content'][:300]}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Configure & train (SFT with LoRA)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Initializing SFT trainer...\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "d90cd3d13cf743f7a59eb55001e70d15",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Loading weights: 0%| | 0/148 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Trainable params: 11,108,352\n",
"Total params: 1,181,448,960\n",
"\n",
"🚀 Starting SFT training...\n"
]
},
{
"data": {
"text/html": [
"\n",
" <div>\n",
" \n",
" <progress value='1455' max='1455' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
" [1455/1455 2:44:40, Epoch 5/5]\n",
" </div>\n",
" <table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: left;\">\n",
" <th>Step</th>\n",
" <th>Training Loss</th>\n",
" <th>Validation Loss</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <td>200</td>\n",
" <td>0.528098</td>\n",
" <td>0.544523</td>\n",
" </tr>\n",
" <tr>\n",
" <td>400</td>\n",
" <td>0.483926</td>\n",
" <td>0.520171</td>\n",
" </tr>\n",
" <tr>\n",
" <td>600</td>\n",
" <td>0.383667</td>\n",
" <td>0.520883</td>\n",
" </tr>\n",
" <tr>\n",
" <td>800</td>\n",
" <td>0.386784</td>\n",
" <td>0.522161</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1000</td>\n",
" <td>0.306549</td>\n",
" <td>0.561082</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1200</td>\n",
" <td>0.250284</td>\n",
" <td>0.598272</td>\n",
" </tr>\n",
" <tr>\n",
" <td>1400</td>\n",
" <td>0.243569</td>\n",
" <td>0.605305</td>\n",
" </tr>\n",
" </tbody>\n",
"</table><p>"
],
"text/plain": [
"<IPython.core.display.HTML object>"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Pushing LoRA adapter to Hub...\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "923a43a9b5cb40e396a81ef04ef4e4ad",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Processing Files (0 / 0) : | | 0.00B / 0.00B "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "59f6753f776d44c28d78e00f01739f47",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"New Data Upload : | | 0.00B / 0.00B "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "70b34b2ad06643d49abca66912aa9b5b",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" ...on-lfm2/training_args.bin: 100%|##########| 5.65kB / 5.65kB "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "dd56256481804bd3a32d5592002355ce",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" ...adapter_model.safetensors: 75%|#######5 | 33.5MB / 44.5MB "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"No files have been modified since last commit. Skipping to prevent empty commit.\n",
"WARNING:huggingface_hub.hf_api:No files have been modified since last commit. Skipping to prevent empty commit.\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"✅ SFT adapter: https://huggingface.co/OrcsRise/qmd-query-expansion-lfm2-sft\n"
]
}
],
"source": [
"from peft import LoraConfig\n",
"from trl import SFTTrainer, SFTConfig\n",
"\n",
"# SFT training config\n",
"config = SFTConfig(\n",
"    output_dir=MODEL_NAME,\n",
"    push_to_hub=True,\n",
"    hub_model_id=OUTPUT_SFT,\n",
"    hub_strategy=\"every_save\",\n",
"\n",
"    # Training hyperparams\n",
"    # In the recorded run, eval loss was lowest (~0.52) at steps 400-800 and rose to ~0.61\n",
"    # by step 1400, so fewer epochs may generalize better.\n",
"    num_train_epochs=5,\n",
"    per_device_train_batch_size=4,\n",
"    gradient_accumulation_steps=4,  # effective batch = 16\n",
"    learning_rate=2e-4,\n",
"    max_length=512,\n",
"\n",
"    # Logging & saving\n",
"    logging_steps=10,\n",
"    save_strategy=\"steps\",\n",
"    save_steps=200,\n",
"    save_total_limit=2,\n",
"    eval_strategy=\"steps\",\n",
"    eval_steps=200,\n",
"\n",
"    # Schedule\n",
"    warmup_ratio=0.03,\n",
"    lr_scheduler_type=\"cosine\",\n",
"    bf16=True,  # worked in the recorded T4 run, although the T4 has no native bf16 hardware\n",
"    report_to=\"none\",\n",
")\n",
"\n",
"# LoRA config: LFM2 hybrid architecture targets\n",
"# Module names differ from standard transformers:\n",
"#   Attention: q_proj, k_proj, v_proj, out_proj\n",
"#   Input projection: in_proj\n",
"#   FFN/MLP (SwiGLU): w1, w2, w3\n",
"peft_config = LoraConfig(\n",
"    r=16,\n",
"    lora_alpha=32,\n",
"    lora_dropout=0.0,\n",
"    bias=\"none\",\n",
"    task_type=\"CAUSAL_LM\",\n",
"    target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"out_proj\", \"in_proj\", \"w1\", \"w2\", \"w3\"],\n",
")\n",
"\n",
"print(\"Initializing SFT trainer...\")\n",
"trainer = SFTTrainer(\n",
"    model=BASE_MODEL,\n",
"    train_dataset=train_dataset,\n",
"    eval_dataset=eval_dataset,\n",
"    args=config,\n",
"    peft_config=peft_config,\n",
")\n",
"\n",
"print(f\"Trainable params: {sum(p.numel() for p in trainer.model.parameters() if p.requires_grad):,}\")\n",
"print(f\"Total params: {sum(p.numel() for p in trainer.model.parameters()):,}\")\n",
"print(\"\\n🚀 Starting SFT training...\")\n",
"trainer.train()\n",
"\n",
"print(\"\\nPushing LoRA adapter to Hub...\")\n",
"trainer.push_to_hub()\n",
"print(f\"✅ SFT adapter: https://huggingface.co/{OUTPUT_SFT}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Merge LoRA adapter with base model"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"`torch_dtype` is deprecated! Use `dtype` instead!\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loading base model: LiquidAI/LFM2-1.2B...\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "833363fff1304808ba2d230da79fb87f",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Loading weights: 0%| | 0/148 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Merging SFT adapter from: qmd-query-expansion-lfm2...\n",
"Saving merged model to /tmp/merged_model...\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "ee9e3a4369cc4a158b5d2c7845c541d1",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Writing model shards: 0%| | 0/1 [00:00<?, ?it/s]"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"✅ Merged model saved\n"
]
}
],
"source": [
"import torch\n",
"from peft import PeftModel\n",
"from transformers import AutoModelForCausalLM, AutoTokenizer\n",
"\n",
"print(f\"Loading base model: {BASE_MODEL}...\")\n",
"base_model = AutoModelForCausalLM.from_pretrained(\n",
"    BASE_MODEL, torch_dtype=torch.bfloat16, device_map=\"auto\", trust_remote_code=True,\n",
")\n",
"\n",
"print(f\"Merging SFT adapter from: {MODEL_NAME}...\")\n",
"base_model.config.tie_word_embeddings = False\n",
"model = PeftModel.from_pretrained(base_model, MODEL_NAME, local_files_only=True)\n",
"model = model.merge_and_unload()\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)\n",
"\n",
"# Save merged model\n",
"merged_dir = \"/tmp/merged_model\"\n",
"print(f\"Saving merged model to {merged_dir}...\")\n",
"model.save_pretrained(merged_dir, safe_serialization=True)\n",
"tokenizer.save_pretrained(merged_dir)\n",
"print(\"✅ Merged model saved\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Convert to GGUF"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Setting up llama.cpp...\n",
"W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list entry misspelt?)\n",
"Cloning into '/tmp/llama.cpp'...\n",
"remote: Enumerating objects: 2546, done.\n",
"remote: Counting objects: 100% (2546/2546), done.\n",
"remote: Compressing objects: 100% (2033/2033), done.\n",
"remote: Total 2546 (delta 514), reused 1659 (delta 442), pack-reused 0 (from 0)\n",
"Receiving objects: 100% (2546/2546), 27.54 MiB | 18.97 MiB/s, done.\n",
"Resolving deltas: 100% (514/514), done.\n",
"Preparing metadata (setup.py) ... done\n",
"Building wheel for wget (setup.py) ... done\n",
"\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
"google-colab 1.0.0 requires pandas==2.2.2, but you have pandas 2.2.3 which is incompatible.\n",
"opencv-python-headless 4.13.0.92 requires numpy>=2; python_version >= \"3.9\", but you have numpy 1.26.4 which is incompatible.\n",
"jaxlib 0.7.2 requires numpy>=2.0, but you have numpy 1.26.4 which is incompatible.\n",
"typer-slim 0.23.1 requires typer>=0.23.1, but you have typer 0.15.4 which is incompatible.\n",
"shap 0.50.0 requires numpy>=2, but you have numpy 1.26.4 which is incompatible.\n",
"tobler 0.13.0 requires numpy>=2.0, but you have numpy 1.26.4 which is incompatible.\n",
"pytensor 2.37.0 requires numpy>=2.0, but you have numpy 1.26.4 which is incompatible.\n",
"grpcio-status 1.71.2 requires protobuf<6.0dev,>=5.26.1, but you have protobuf 4.25.8 which is incompatible.\n",
"opentelemetry-proto 1.38.0 requires protobuf<7.0,>=5.0, but you have protobuf 4.25.8 which is incompatible.\n",
"jax 0.7.2 requires numpy>=2.0, but you have numpy 1.26.4 which is incompatible.\n",
"ydf 0.15.0 requires protobuf<7.0.0,>=5.29.1, but you have protobuf 4.25.8 which is incompatible.\n",
"opencv-contrib-python 4.13.0.92 requires numpy>=2; python_version >= \"3.9\", but you have numpy 1.26.4 which is incompatible.\n",
"torchaudio 2.9.0+cu128 requires torch==2.9.0, but you have torch 2.6.0+cpu which is incompatible.\n",
"torchvision 0.24.0+cu128 requires torch==2.9.0, but you have torch 2.6.0+cpu which is incompatible.\n",
"opencv-python 4.13.0.92 requires numpy>=2; python_version >= \"3.9\", but you have numpy 1.26.4 which is incompatible.\n",
"rasterio 1.5.0 requires numpy>=2, but you have numpy 1.26.4 which is incompatible.\n",
"grain 0.2.15 requires protobuf>=5.28.3, but you have protobuf 4.25.8 which is incompatible.\u001b[0m\u001b[31m\n",
"\u001b[0m\n",
"Building llama-quantize...\n",
"\n",
"Converting to FP16 GGUF...\n",
"INFO:hf-to-gguf:Loading model: merged_model\n",
"INFO:hf-to-gguf:Model architecture: Lfm2ForCausalLM\n",
"INFO:hf-to-gguf:gguf: indexing model part 'model.safetensors'\n",
"INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only\n",
"INFO:hf-to-gguf:Exporting model...\n",
"INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {2048, 65536}\n",
"INFO:hf-to-gguf:token_embd_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.0.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.0.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.0.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.1.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.1.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.1.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.10.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.10.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.10.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.10.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.10.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.10.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.10.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.10.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.10.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.11.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.11.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.11.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.11.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.11.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.11.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.12.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.12.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.12.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.12.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.12.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.12.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.12.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.12.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.12.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.13.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.13.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.13.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.13.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.13.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.13.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.14.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.14.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.14.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.14.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.14.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.14.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.14.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.14.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.14.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.15.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.15.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.15.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.15.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.15.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.15.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.15.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.2.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.2.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.3.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.3.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.3.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.4.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.4.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.4.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.5.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.5.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.5.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.5.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.5.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.6.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.6.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.6.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.7.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.7.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.7.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.8.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.8.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.8.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.8.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.8.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.8.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.8.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.8.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.8.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.8.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.8.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.9.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.9.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.9.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.9.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.9.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.9.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.9.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.9.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:Set meta model\n",
"INFO:hf-to-gguf:Set model parameters\n",
"INFO:hf-to-gguf:gguf: context length = 128000\n",
"INFO:hf-to-gguf:gguf: embedding length = 2048\n",
"INFO:hf-to-gguf:gguf: feed forward length = 12288\n",
"INFO:hf-to-gguf:gguf: head count = 32\n",
"INFO:hf-to-gguf:gguf: key-value head count = [0, 0, 8, 0, 0, 8, 0, 0, 8, 0, 8, 0, 8, 0, 8, 0]\n",
"WARNING:hf-to-gguf:Unknown RoPE type: default\n",
"INFO:hf-to-gguf:gguf: rope scaling type = NONE\n",
"INFO:hf-to-gguf:gguf: rope theta = 1000000.0\n",
"INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05\n",
"INFO:hf-to-gguf:gguf: file type = 1\n",
"WARNING:gguf.gguf_writer:Duplicated key name 'lfm2.attention.layer_norm_rms_epsilon', overwriting it with new value 1e-05 of type FLOAT32\n",
"WARNING:gguf.gguf_writer:Duplicated key name 'lfm2.feed_forward_length', overwriting it with new value 8192 of type UINT32\n",
"INFO:hf-to-gguf:Set model quantization version\n",
"INFO:hf-to-gguf:Set model tokenizer\n",
"INFO:numexpr.utils:NumExpr defaulting to 2 threads.\n",
"Traceback (most recent call last):\n",
" File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 12012, in <module>\n",
" main()\n",
" File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 12006, in main\n",
" model_instance.write()\n",
" File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 689, in write\n",
" self.prepare_metadata(vocab_only=False)\n",
" File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 830, in prepare_metadata\n",
" self.set_vocab()\n",
" File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 802, in set_vocab\n",
" self._set_vocab_gpt2()\n",
" File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 1303, in _set_vocab_gpt2\n",
" tokens, toktypes, tokpre = self.get_vocab_base()\n",
" ^^^^^^^^^^^^^^^^^^^^^\n",
" File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 978, in get_vocab_base\n",
" tokenizer = AutoTokenizer.from_pretrained(self.dir_model)\n",
" ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
" File \"/usr/local/lib/python3.12/dist-packages/transformers/models/auto/tokenization_auto.py\", line 1153, in from_pretrained\n",
" raise ValueError(\n",
"ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported.\n"
]
},
{
"ename": "FileNotFoundError",
"evalue": "[Errno 2] No such file or directory: '/tmp/gguf_output/qmd-query-expansion-lfm2-f16.gguf'",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m/tmp/ipython-input-3038749745.py\u001b[0m in \u001b[0;36m<cell line: 0>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 22\u001b[0m \u001b[0mget_ipython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msystem\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'python /tmp/llama.cpp/convert_hf_to_gguf.py /tmp/merged_model --outfile {fp16_file} --outtype f16'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 23\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 24\u001b[0;31m \u001b[0msize_mb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgetsize\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfp16_file\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;36m1024\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0;36m1024\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 25\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf\" FP16: {size_mb:.1f} MB\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 26\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;32m/usr/lib/python3.12/genericpath.py\u001b[0m in \u001b[0;36mgetsize\u001b[0;34m(filename)\u001b[0m\n",
"\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: '/tmp/gguf_output/qmd-query-expansion-lfm2-f16.gguf'"
]
}
],
"source": [
"import subprocess, sys, os\n",
"\n",
"# Setup llama.cpp\n",
"print(\"Setting up llama.cpp...\")\n",
"!apt-get update -qq && apt-get install -y -qq build-essential cmake git > /dev/null 2>&1\n",
"\n",
"if not os.path.exists(\"/tmp/llama.cpp\"):\n",
" !git clone --depth 1 https://github.com/ggerganov/llama.cpp.git /tmp/llama.cpp\n",
"!pip install -q -r /tmp/llama.cpp/requirements.txt\n",
"\n",
"# Build quantize tool\n",
"print(\"\\nBuilding llama-quantize...\")\n",
"!cmake -B /tmp/llama.cpp/build -S /tmp/llama.cpp -DGGML_CUDA=OFF > /dev/null 2>&1\n",
"!cmake --build /tmp/llama.cpp/build --target llama-quantize -j 4 > /dev/null 2>&1\n",
"\n",
"# Convert to FP16 GGUF\n",
"gguf_dir = \"/tmp/gguf_output\"\n",
"os.makedirs(gguf_dir, exist_ok=True)\n",
"fp16_file = f\"{gguf_dir}/{MODEL_NAME}-f16.gguf\"\n",
"\n",
"print(\"\\nConverting to FP16 GGUF...\")\n",
"!python /tmp/llama.cpp/convert_hf_to_gguf.py /tmp/merged_model --outfile {fp16_file} --outtype f16\n",
"\n",
"size_mb = os.path.getsize(fp16_file) / (1024 * 1024)\n",
"print(f\" FP16: {size_mb:.1f} MB\")\n",
"\n",
"# Quantize to Q4_K_M, Q5_K_M, Q8_0\n",
"quantize_bin = \"/tmp/llama.cpp/build/bin/llama-quantize\"\n",
"print(\"\\nQuantizing...\")\n",
"quantized = []\n",
"for qtype in [\"Q4_K_M\", \"Q5_K_M\", \"Q8_0\"]:\n",
" out = f\"{gguf_dir}/{MODEL_NAME}-{qtype.lower()}.gguf\"\n",
" result = subprocess.run([quantize_bin, fp16_file, out, qtype], capture_output=True, text=True)\n",
" if os.path.exists(out):\n",
" qsize = os.path.getsize(out) / (1024 * 1024)\n",
" print(f\" βœ… {qtype}: {qsize:.1f} MB\")\n",
" quantized.append((out, qtype))\n",
" else:\n",
" print(f\" ❌ {qtype} failed\")\n",
"\n",
"# Cleanup FP16 to save disk\n",
"os.remove(fp16_file)\n",
"print(f\"\\nπŸŽ‰ GGUF files ready in {gguf_dir}\")"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m10.4/10.4 MB\u001b[0m \u001b[31m87.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m:00:01\u001b[0m0:01\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m553.3/553.3 kB\u001b[0m \u001b[31m48.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m56.4/56.4 kB\u001b[0m \u001b[31m6.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[2K \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m108.3/108.3 kB\u001b[0m \u001b[31m13.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
"\u001b[?25h\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
"rasterio 1.5.0 requires numpy>=2, but you have numpy 1.26.4 which is incompatible.\u001b[0m\u001b[31m\n",
"\u001b[0mConverting to FP16 GGUF...\n",
"INFO:hf-to-gguf:Loading model: merged_model\n",
"INFO:hf-to-gguf:Model architecture: Lfm2ForCausalLM\n",
"INFO:hf-to-gguf:gguf: indexing model part 'model.safetensors'\n",
"INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only\n",
"INFO:hf-to-gguf:Exporting model...\n",
"INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {2048, 65536}\n",
"INFO:hf-to-gguf:token_embd_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.0.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.0.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.0.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.1.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.1.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.1.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.10.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.10.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.10.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.10.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.10.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.10.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.10.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.10.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.10.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.11.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.11.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.11.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.11.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.11.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.11.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.12.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.12.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.12.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.12.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.12.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.12.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.12.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.12.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.12.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.13.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.13.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.13.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.13.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.13.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.13.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.14.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.14.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.14.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.14.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.14.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.14.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.14.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.14.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.14.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.15.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.15.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.15.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.15.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.15.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.15.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.15.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.2.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.2.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.3.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.3.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.3.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.4.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.4.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.4.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.5.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.5.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.5.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.5.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.5.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.6.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.6.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.6.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.7.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.7.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.7.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.8.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.8.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.8.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.8.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.8.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.8.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.8.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.8.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.8.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n",
"INFO:hf-to-gguf:blk.8.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.8.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n",
"INFO:hf-to-gguf:blk.9.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n",
"INFO:hf-to-gguf:blk.9.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n",
"INFO:hf-to-gguf:blk.9.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
"INFO:hf-to-gguf:blk.9.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.9.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n",
"INFO:hf-to-gguf:blk.9.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n",
"INFO:hf-to-gguf:blk.9.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:blk.9.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n",
"INFO:hf-to-gguf:Set meta model\n",
"INFO:hf-to-gguf:Set model parameters\n",
"INFO:hf-to-gguf:gguf: context length = 128000\n",
"INFO:hf-to-gguf:gguf: embedding length = 2048\n",
"INFO:hf-to-gguf:gguf: feed forward length = 12288\n",
"INFO:hf-to-gguf:gguf: head count = 32\n",
"INFO:hf-to-gguf:gguf: key-value head count = [0, 0, 8, 0, 0, 8, 0, 0, 8, 0, 8, 0, 8, 0, 8, 0]\n",
"WARNING:hf-to-gguf:Unknown RoPE type: default\n",
"INFO:hf-to-gguf:gguf: rope scaling type = NONE\n",
"INFO:hf-to-gguf:gguf: rope theta = 1000000.0\n",
"INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05\n",
"INFO:hf-to-gguf:gguf: file type = 1\n",
"WARNING:gguf.gguf_writer:Duplicated key name 'lfm2.attention.layer_norm_rms_epsilon', overwriting it with new value 1e-05 of type FLOAT32\n",
"WARNING:gguf.gguf_writer:Duplicated key name 'lfm2.feed_forward_length', overwriting it with new value 8192 of type UINT32\n",
"INFO:hf-to-gguf:Set model quantization version\n",
"INFO:hf-to-gguf:Set model tokenizer\n",
"INFO:numexpr.utils:NumExpr defaulting to 2 threads.\n",
"WARNING:gguf.vocab:Unknown separator token '<|startoftext|>' in TemplateProcessing<pair>\n",
"INFO:gguf.vocab:Adding 63683 merge(s).\n",
"INFO:gguf.vocab:Setting special token type bos to 1\n",
"INFO:gguf.vocab:Setting special token type eos to 7\n",
"INFO:gguf.vocab:Setting special token type pad to 0\n",
"INFO:gguf.vocab:Setting add_bos_token to True\n",
"INFO:gguf.vocab:Setting add_sep_token to False\n",
"INFO:gguf.vocab:Setting chat_template to {{- bos_token -}}\n",
"{%- set system_prompt = \"\" -%}\n",
"{%- set ns = namespace(system_prompt=\"\") -%}\n",
"{%- if messages[0][\"role\"] == \"system\" -%}\n",
"\t{%- set ns.system_prompt = messages[0][\"content\"] -%}\n",
"\t{%- set messages = messages[1:] -%}\n",
"{%- endif -%}\n",
"{%- if tools -%}\n",
"\t{%- set ns.system_prompt = ns.system_prompt + (\"\\n\" if ns.system_prompt else \"\") + \"List of tools: <|tool_list_start|>[\" -%}\n",
"\t{%- for tool in tools -%}\n",
"\t\t{%- if tool is not string -%}\n",
" {%- set tool = tool | tojson -%}\n",
"\t\t{%- endif -%}\n",
"\t\t{%- set ns.system_prompt = ns.system_prompt + tool -%}\n",
" {%- if not loop.last -%}\n",
" {%- set ns.system_prompt = ns.system_prompt + \", \" -%}\n",
" {%- endif -%}\n",
"\t{%- endfor -%}\n",
"\t{%- set ns.system_prompt = ns.system_prompt + \"]<|tool_list_end|>\" -%}\n",
"{%- endif -%}\n",
"{%- if ns.system_prompt -%}\n",
"\t{{- \"<|im_start|>system\\n\" + ns.system_prompt + \"<|im_end|>\\n\" -}}\n",
"{%- endif -%}\n",
"{%- for message in messages -%}\n",
"\t{{- \"<|im_start|>\" + message[\"role\"] + \"\\n\" -}}\n",
"\t{%- set content = message[\"content\"] -%}\n",
"\t{%- if content is not string -%}\n",
"\t\t{%- set content = content | tojson -%}\n",
"\t{%- endif -%}\n",
"\t{%- if message[\"role\"] == \"tool\" -%}\n",
"\t\t{%- set content = \"<|tool_response_start|>\" + content + \"<|tool_response_end|>\" -%}\n",
"\t{%- endif -%}\n",
"\t{{- content + \"<|im_end|>\\n\" -}}\n",
"{%- endfor -%}\n",
"{%- if add_generation_prompt -%}\n",
"\t{{- \"<|im_start|>assistant\\n\" -}}\n",
"{%- endif -%}\n",
"INFO:gguf.gguf_writer:Writing the following files:\n",
"INFO:gguf.gguf_writer:/tmp/gguf_output/qmd-query-expansion-lfm2-f16.gguf: n_tensors = 148, total_size = 2.3G\n",
"Writing: 100% 2.34G/2.34G [00:25<00:00, 92.3Mbyte/s]\n",
"INFO:hf-to-gguf:Model successfully exported to /tmp/gguf_output/qmd-query-expansion-lfm2-f16.gguf\n",
" FP16: 2234.8 MB\n",
"Quantizing to Q8_0...\n",
"main: build = 1 (abb9f3c)\n",
"main: built with GNU 11.4.0 for Linux x86_64\n",
"main: quantizing '/tmp/gguf_output/qmd-query-expansion-lfm2-f16.gguf' to '/tmp/gguf_output/qmd-query-expansion-lfm2-q8_0.gguf' as Q8_0\n",
"llama_model_loader: loaded meta data with 27 key-value pairs and 148 tensors from /tmp/gguf_output/qmd-query-expansion-lfm2-f16.gguf (version GGUF V3 (latest))\n",
"llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.\n",
"llama_model_loader: - kv 0: general.architecture str = lfm2\n",
"llama_model_loader: - kv 1: general.type str = model\n",
"llama_model_loader: - kv 2: general.name str = Merged_Model\n",
"llama_model_loader: - kv 3: general.size_label str = 1.2B\n",
"llama_model_loader: - kv 4: lfm2.block_count u32 = 16\n",
"llama_model_loader: - kv 5: lfm2.context_length u32 = 128000\n",
"llama_model_loader: - kv 6: lfm2.embedding_length u32 = 2048\n",
"llama_model_loader: - kv 7: lfm2.feed_forward_length u32 = 8192\n",
"llama_model_loader: - kv 8: lfm2.attention.head_count u32 = 32\n",
"llama_model_loader: - kv 9: lfm2.attention.head_count_kv arr[i32,16] = [0, 0, 8, 0, 0, 8, 0, 0, 8, 0, 8, 0, ...\n",
"llama_model_loader: - kv 10: lfm2.rope.freq_base f32 = 1000000.000000\n",
"llama_model_loader: - kv 11: lfm2.attention.layer_norm_rms_epsilon f32 = 0.000010\n",
"llama_model_loader: - kv 12: general.file_type u32 = 1\n",
"llama_model_loader: - kv 13: lfm2.vocab_size u32 = 65536\n",
"llama_model_loader: - kv 14: lfm2.shortconv.l_cache u32 = 3\n",
"llama_model_loader: - kv 15: general.quantization_version u32 = 2\n",
"llama_model_loader: - kv 16: tokenizer.ggml.model str = gpt2\n",
"llama_model_loader: - kv 17: tokenizer.ggml.pre str = lfm2\n",
"llama_model_loader: - kv 18: tokenizer.ggml.tokens arr[str,65536] = [\"<|pad|>\", \"<|startoftext|>\", \"<|end...\n",
"llama_model_loader: - kv 19: tokenizer.ggml.token_type arr[i32,65536] = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...\n",
"llama_model_loader: - kv 20: tokenizer.ggml.merges arr[str,63683] = [\"Ċ Ċ\", \"Ċ ĊĊ\", \"ĊĊ Ċ\", \"Ċ �...\n",
"llama_model_loader: - kv 21: tokenizer.ggml.bos_token_id u32 = 1\n",
"llama_model_loader: - kv 22: tokenizer.ggml.eos_token_id u32 = 7\n",
"llama_model_loader: - kv 23: tokenizer.ggml.padding_token_id u32 = 0\n",
"llama_model_loader: - kv 24: tokenizer.ggml.add_bos_token bool = true\n",
"llama_model_loader: - kv 25: tokenizer.ggml.add_sep_token bool = false\n",
"llama_model_loader: - kv 26: tokenizer.chat_template str = {{- bos_token -}}\\n{%- set system_prom...\n",
"llama_model_loader: - type f32: 55 tensors\n",
"llama_model_loader: - type f16: 93 tensors\n",
"[ 1/ 148] token_embd.weight - [ 2048, 65536, 1, 1], type = f16, converting to q8_0 .. size = 256.00 MiB -> 136.00 MiB\n",
"[ 2/ 148] token_embd_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 3/ 148] blk.0.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 4/ 148] blk.0.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 5/ 148] blk.0.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 6/ 148] blk.0.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 7/ 148] blk.0.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 8/ 148] blk.0.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n",
"[ 9/ 148] blk.0.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n",
"[ 10/ 148] blk.0.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 11/ 148] blk.1.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 12/ 148] blk.1.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 13/ 148] blk.1.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 14/ 148] blk.1.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 15/ 148] blk.1.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 16/ 148] blk.1.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n",
"[ 17/ 148] blk.1.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n",
"[ 18/ 148] blk.1.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 19/ 148] blk.2.attn_k.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n",
"[ 20/ 148] blk.2.attn_k_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n",
"[ 21/ 148] blk.2.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 22/ 148] blk.2.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 23/ 148] blk.2.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 24/ 148] blk.2.attn_q_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n",
"[ 25/ 148] blk.2.attn_v.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n",
"[ 26/ 148] blk.2.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 27/ 148] blk.2.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 28/ 148] blk.2.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 29/ 148] blk.2.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 30/ 148] blk.3.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 31/ 148] blk.3.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 32/ 148] blk.3.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 33/ 148] blk.3.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 34/ 148] blk.3.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 35/ 148] blk.3.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n",
"[ 36/ 148] blk.3.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n",
"[ 37/ 148] blk.3.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 38/ 148] blk.4.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 39/ 148] blk.4.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 40/ 148] blk.4.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 41/ 148] blk.4.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 42/ 148] blk.4.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 43/ 148] blk.4.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n",
"[ 44/ 148] blk.4.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n",
"[ 45/ 148] blk.4.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 46/ 148] blk.5.attn_k.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n",
"[ 47/ 148] blk.5.attn_k_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n",
"[ 48/ 148] blk.5.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 49/ 148] blk.5.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 50/ 148] blk.5.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 51/ 148] blk.5.attn_q_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n",
"[ 52/ 148] blk.5.attn_v.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n",
"[ 53/ 148] blk.5.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 54/ 148] blk.5.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 55/ 148] blk.5.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 56/ 148] blk.5.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 57/ 148] blk.6.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 58/ 148] blk.6.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 59/ 148] blk.6.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 60/ 148] blk.6.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 61/ 148] blk.6.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 62/ 148] blk.6.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n",
"[ 63/ 148] blk.6.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n",
"[ 64/ 148] blk.6.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 65/ 148] blk.7.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 66/ 148] blk.7.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 67/ 148] blk.7.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 68/ 148] blk.7.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 69/ 148] blk.7.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 70/ 148] blk.7.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n",
"[ 71/ 148] blk.7.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n",
"[ 72/ 148] blk.7.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 73/ 148] blk.8.attn_k.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n",
"[ 74/ 148] blk.8.attn_k_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n",
"[ 75/ 148] blk.8.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 76/ 148] blk.8.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 77/ 148] blk.8.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 78/ 148] blk.8.attn_q_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n",
"[ 79/ 148] blk.8.attn_v.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n",
"[ 80/ 148] blk.8.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 81/ 148] blk.8.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 82/ 148] blk.8.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 83/ 148] blk.8.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 84/ 148] blk.9.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 85/ 148] blk.9.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 86/ 148] blk.9.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 87/ 148] blk.9.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 88/ 148] blk.9.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 89/ 148] blk.9.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n",
"[ 90/ 148] blk.9.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n",
"[ 91/ 148] blk.9.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 92/ 148] blk.10.attn_k.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n",
"[ 93/ 148] blk.10.attn_k_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n",
"[ 94/ 148] blk.10.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 95/ 148] blk.10.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 96/ 148] blk.10.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 97/ 148] blk.10.attn_q_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n",
"[ 98/ 148] blk.10.attn_v.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n",
"[ 99/ 148] blk.10.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 100/ 148] blk.10.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 101/ 148] blk.10.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 102/ 148] blk.10.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 103/ 148] blk.11.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 104/ 148] blk.11.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 105/ 148] blk.11.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 106/ 148] blk.11.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 107/ 148] blk.11.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 108/ 148] blk.11.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n",
"[ 109/ 148] blk.11.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n",
"[ 110/ 148] blk.11.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 111/ 148] blk.12.attn_k.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n",
"[ 112/ 148] blk.12.attn_k_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n",
"[ 113/ 148] blk.12.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 114/ 148] blk.12.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 115/ 148] blk.12.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 116/ 148] blk.12.attn_q_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n",
"[ 117/ 148] blk.12.attn_v.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n",
"[ 118/ 148] blk.12.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 119/ 148] blk.12.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 120/ 148] blk.12.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 121/ 148] blk.12.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 122/ 148] blk.13.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 123/ 148] blk.13.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 124/ 148] blk.13.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 125/ 148] blk.13.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 126/ 148] blk.13.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 127/ 148] blk.13.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n",
"[ 128/ 148] blk.13.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n",
"[ 129/ 148] blk.13.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 130/ 148] blk.14.attn_k.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n",
"[ 131/ 148] blk.14.attn_k_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n",
"[ 132/ 148] blk.14.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 133/ 148] blk.14.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 134/ 148] blk.14.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"[ 135/ 148] blk.14.attn_q_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n",
"[ 136/ 148] blk.14.attn_v.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n",
"[ 137/ 148] blk.14.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 138/ 148] blk.14.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 139/ 148] blk.14.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 140/ 148] blk.14.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 141/ 148] blk.15.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 142/ 148] blk.15.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 143/ 148] blk.15.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 144/ 148] blk.15.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n",
"[ 145/ 148] blk.15.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n",
"[ 146/ 148] blk.15.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n",
"[ 147/ 148] blk.15.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n",
"[ 148/ 148] blk.15.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n",
"llama_model_quantize_impl: model size = 2232.50 MiB\n",
"llama_model_quantize_impl: quant size = 1186.25 MiB\n",
"\n",
"main: quantize time = 20068.41 ms\n",
"main: total time = 20068.42 ms\n",
" Q8_0: 1188.5 MB\n",
"βœ… GGUF conversion complete!\n"
]
}
],
"source": [
"!pip install -q --upgrade tokenizers transformers\n",
"import os, subprocess\n",
"gguf_dir = \"/tmp/gguf_output\"\n",
"merged_dir = \"/tmp/merged_model\"\n",
"fp16_file = f\"{gguf_dir}/{MODEL_NAME}-f16.gguf\"\n",
"# Retry FP16 conversion\n",
"print(\"Converting to FP16 GGUF...\")\n",
"!python /tmp/llama.cpp/convert_hf_to_gguf.py {merged_dir} --outfile {fp16_file} --outtype f16\n",
"size_mb = os.path.getsize(fp16_file) / (1024 * 1024)\n",
"print(f\" FP16: {size_mb:.1f} MB\")\n",
"# Quantize to Q8_0\n",
"q8_file = f\"{gguf_dir}/{MODEL_NAME}-q8_0.gguf\"\n",
"print(\"Quantizing to Q8_0...\")\n",
"!chmod +x /tmp/llama.cpp/build/bin/llama-quantize\n",
"!/tmp/llama.cpp/build/bin/llama-quantize {fp16_file} {q8_file} q8_0\n",
"size_mb = os.path.getsize(q8_file) / (1024 * 1024)\n",
"print(f\" Q8_0: {size_mb:.1f} MB\")\n",
"print(\"βœ… GGUF conversion complete!\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8. Upload GGUFs to HuggingFace"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Uploading to OrcsRise/qmd-query-expansion-lfm2-gguf...\n",
" Uploading qmd-query-expansion-lfm2-q8_0.gguf (1189 MB)...\n"
]
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "c5b7c5b973454cccae5027074ba75274",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Processing Files (0 / 0) : | | 0.00B / 0.00B "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "406bd64d6a574d59a365fb6893e4fbb0",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"New Data Upload : | | 0.00B / 0.00B "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "18c004aa0a9d4db3ba14e7c4b3e0a7d3",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
" ...-expansion-lfm2-q8_0.gguf: 4%|4 | 50.2MB / 1.25GB "
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"βœ… Uploaded to: https://huggingface.co/OrcsRise/qmd-query-expansion-lfm2-gguf\n",
"\n",
"πŸ“‹ Add to ~/.zshrc:\n",
"export QMD_GEN_MODEL=\"hf:OrcsRise/qmd-query-expansion-lfm2-gguf/qmd-query-expansion-lfm2-q8_0.gguf\"\n"
]
}
],
"source": [
"from huggingface_hub import HfApi\n",
"import os\n",
"api = HfApi()\n",
"# Create repo if needed\n",
"api.create_repo(OUTPUT_GGUF_REPO, exist_ok=True)\n",
"print(f\"Uploading to {OUTPUT_GGUF_REPO}...\")\n",
"# Upload Q8_0\n",
"q8_file = f\"/tmp/gguf_output/{MODEL_NAME}-q8_0.gguf\"\n",
"filename = os.path.basename(q8_file)\n",
"print(f\" Uploading {filename} ({os.path.getsize(q8_file) / 1024**2:.0f} MB)...\")\n",
"api.upload_file(\n",
" path_or_fileobj=q8_file,\n",
" path_in_repo=filename,\n",
" repo_id=OUTPUT_GGUF_REPO,\n",
")\n",
"print(f\"\\nβœ… Uploaded to: https://huggingface.co/{OUTPUT_GGUF_REPO}\")\n",
"print(f\"\\nπŸ“‹ Add to ~/.zshrc:\")\n",
"print(f'export QMD_GEN_MODEL=\"hf:{OUTPUT_GGUF_REPO}/{filename}\"')"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Uploading to OrcsRise/qmd-query-expansion-lfm2-gguf...\n"
]
},
{
"ename": "NameError",
"evalue": "name 'quantized' is not defined",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
"\u001b[0;32m/tmp/ipython-input-4233011047.py\u001b[0m in \u001b[0;36m<cell line: 0>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf\"Uploading to {OUTPUT_GGUF_REPO}...\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0;32mfor\u001b[0m \u001b[0mqfile\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mqtype\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mquantized\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 8\u001b[0m \u001b[0mfilename\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbasename\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mqfile\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf\" Uploading {filename}...\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
"\u001b[0;31mNameError\u001b[0m: name 'quantized' is not defined"
]
}
],
"source": [
"from huggingface_hub import HfApi\n",
"\n",
"api = HfApi()\n",
"api.create_repo(repo_id=OUTPUT_GGUF_REPO, repo_type=\"model\", exist_ok=True)\n",
"\n",
"print(f\"Uploading to {OUTPUT_GGUF_REPO}...\")\n",
"for qfile, qtype in quantized:\n",
" filename = os.path.basename(qfile)\n",
" print(f\" Uploading {filename}...\")\n",
" api.upload_file(\n",
" path_or_fileobj=qfile,\n",
" path_in_repo=filename,\n",
" repo_id=OUTPUT_GGUF_REPO,\n",
" )\n",
"\n",
"# Upload README\n",
"readme = f\"\"\"---\n",
"base_model: {BASE_MODEL}\n",
"tags: [gguf, llama.cpp, quantized, query-expansion, qmd, lfm2]\n",
"---\n",
"# {MODEL_NAME} (GGUF)\n",
"\n",
"Fine-tuned LiquidAI LFM2-1.2B for QMD query expansion.\n",
"\n",
"## Details\n",
"- **Base:** {BASE_MODEL}\n",
"- **Training:** SFT with LoRA (rank 16) on {DATASET}\n",
"- **Task:** Query expansion producing lex/vec/hyde format\n",
"\n",
"## Usage with qmd\n",
"```bash\n",
"export QMD_GEN_MODEL=\"hf:{OUTPUT_GGUF_REPO}/{MODEL_NAME}-q8_0.gguf\"\n",
"qmd query \"your search\"\n",
"```\n",
"\"\"\"\n",
"api.upload_file(\n",
" path_or_fileobj=readme.encode(),\n",
" path_in_repo=\"README.md\",\n",
" repo_id=OUTPUT_GGUF_REPO,\n",
")\n",
"\n",
"print(f\"\\nπŸŽ‰ Done! https://huggingface.co/{OUTPUT_GGUF_REPO}\")\n",
"print(f\"\\nπŸ“‹ Add this to your ~/.zshrc:\")\n",
"print(f'export QMD_GEN_MODEL=\"hf:{OUTPUT_GGUF_REPO}/{MODEL_NAME}-q8_0.gguf\"')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## βœ… Done!\n",
"\n",
"Copy the export line above and add it to your `~/.zshrc`, then:\n",
"\n",
"```bash\n",
"source ~/.zshrc\n",
"qmd query \"test\"\n",
"```\n",
"\n",
"The fine-tuned LFM2 will produce clean, diverse `lex:/vec:/hyde:` expansions β€” 2x faster than Qwen3."
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.12"
}
},
"nbformat": 4,
"nbformat_minor": 0
}