OmerFarukOruc/qmd_finetune_lfm2.ipynb

## qmd_finetune_lfm2.ipynb
{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# QMD Query Expansion: Fine-tune LiquidAI LFM2-1.2B\n",
        "\n",
        "Fine-tunes LFM2-1.2B on qmd's query expansion dataset to produce structured `lex:/vec:/hyde:` output,\n",
        "then converts the result to GGUF (Q8_0) for local inference.\n",
        "\n",
        "**Runtime**: Set to **T4 GPU** (Runtime → Change runtime type → T4)\n",
        "\n",
        "**Time**: the run recorded below took about 2 h 45 min on a T4 (5 epochs, 1455 steps); an A100 should be considerably faster"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## ⚙️ Config — Set your HuggingFace username here"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "SFT adapter → OrcsRise/qmd-query-expansion-lfm2-sft\n",
            "GGUF repo   → OrcsRise/qmd-query-expansion-lfm2-gguf\n"
          ]
        }
      ],
      "source": [
        "# ============================================================\n",
        "# 🔧 CHANGE THIS to your HuggingFace username\n",
        "# ============================================================\n",
        "HF_USERNAME = \"OrcsRise\"  # <-- change this!\n",
        "# ============================================================\n",
        "\n",
        "BASE_MODEL = \"LiquidAI/LFM2-1.2B\"\n",
        "DATASET = \"tobil/qmd-query-expansion-train\"\n",
        "OUTPUT_SFT = f\"{HF_USERNAME}/qmd-query-expansion-lfm2-sft\"\n",
        "OUTPUT_GGUF_REPO = f\"{HF_USERNAME}/qmd-query-expansion-lfm2-gguf\"\n",
        "MODEL_NAME = \"qmd-query-expansion-lfm2\"\n",
        "\n",
        "assert len(HF_USERNAME) > 0 and ' ' not in HF_USERNAME, \"👆 Set HF_USERNAME to your HuggingFace username!\"\n",
        "print(f\"SFT adapter → {OUTPUT_SFT}\")\n",
        "print(f\"GGUF repo   → {OUTPUT_GGUF_REPO}\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n"
          ]
        }
      ],
      "source": [
        "from google.colab import drive\n",
        "drive.mount('/content/drive')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 1. Install dependencies"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {},
      "outputs": [],
      "source": [
        "# Quote every version specifier: an unquoted `>` is treated by the shell as output redirection\n",
        "!pip install -q torch \"trl>=0.12.0\" \"peft>=0.7.0\" \"transformers>=4.55.0\" \\\n",
        "    \"accelerate>=0.24.0\" \"huggingface_hub>=0.20.0\" datasets bitsandbytes \\\n",
        "    sentencepiece protobuf numpy gguf"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 2. Login to HuggingFace"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {},
      "outputs": [],
      "source": [
        "from huggingface_hub import login, notebook_login\n",
        "import os\n",
        "\n",
        "# Option A: Use Colab secrets (Settings → Secrets → add HF_TOKEN)\n",
        "try:\n",
        "    from google.colab import userdata\n",
        "    hf_token = userdata.get('HF_TOKEN')\n",
        "    login(token=hf_token)\n",
        "    print(\"✅ Logged in via Colab secret\")\n",
        "except Exception:\n",
        "    # Option B: Interactive login\n",
        "    notebook_login()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 3. Check GPU"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "CUDA available: True\n",
            "GPU: Tesla T4\n",
            "VRAM: 14.6 GB\n"
          ]
        }
      ],
      "source": [
        "import torch\n",
        "print(f\"CUDA available: {torch.cuda.is_available()}\")\n",
        "if torch.cuda.is_available():\n",
        "    print(f\"GPU: {torch.cuda.get_device_name(0)}\")\n",
        "    print(f\"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB\")\n",
        "else:\n",
        "    raise RuntimeError(\"No GPU! Go to Runtime → Change runtime type → T4 GPU\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 4. Load dataset"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Loading dataset: tobil/qmd-query-expansion-train...\n",
            "Dataset loaded: 5157 examples\n",
            "  Train: 4641, Eval: 516\n",
            "\n",
            "--- Example ---\n",
            "User: Expand this search query:\n",
            "\n",
            "buy refurbished laptops\n",
            "Assistant: lex: where to find\n",
            "lex: purchase options for\n",
            "vec: where to find refurbished laptops for sale?\n",
            "vec: purchase options for refurbished laptops\n",
            "hyde: The topic of buy refurbished laptops covers where to find refurbished laptops for sale?. Proper implementation follows established patterns and best pract\n"
          ]
        }
      ],
      "source": [
        "from datasets import load_dataset\n",
        "\n",
        "print(f\"Loading dataset: {DATASET}...\")\n",
        "dataset = load_dataset(DATASET, split=\"train\")\n",
        "print(f\"Dataset loaded: {len(dataset)} examples\")\n",
        "\n",
        "split = dataset.train_test_split(test_size=0.1, seed=42)\n",
        "train_dataset = split[\"train\"]\n",
        "eval_dataset = split[\"test\"]\n",
        "print(f\"  Train: {len(train_dataset)}, Eval: {len(eval_dataset)}\")\n",
        "\n",
        "# Preview an example\n",
        "print(\"\\n--- Example ---\")\n",
        "ex = train_dataset[0][\"messages\"]\n",
        "print(f\"User: {ex[0]['content'][:200]}\")\n",
        "print(f\"Assistant: {ex[1]['content'][:300]}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 5. Configure & train (SFT with LoRA)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {},
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.\n"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Initializing SFT trainer...\n"
          ]
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "d90cd3d13cf743f7a59eb55001e70d15",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Loading weights:   0%|          | 0/148 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Trainable params: 11,108,352\n",
            "Total params: 1,181,448,960\n",
            "\n",
            "🚀 Starting SFT training...\n"
          ]
        },
        {
          "data": {
            "text/html": [
              "\n",
              "    <div>\n",
              "      \n",
              "      <progress value='1455' max='1455' style='width:300px; height:20px; vertical-align: middle;'></progress>\n",
              "      [1455/1455 2:44:40, Epoch 5/5]\n",
              "    </div>\n",
              "    <table border=\"1\" class=\"dataframe\">\n",
              "  <thead>\n",
              " <tr style=\"text-align: left;\">\n",
              "      <th>Step</th>\n",
              "      <th>Training Loss</th>\n",
              "      <th>Validation Loss</th>\n",
              "    </tr>\n",
              "  </thead>\n",
              "  <tbody>\n",
              "    <tr>\n",
              "      <td>200</td>\n",
              "      <td>0.528098</td>\n",
              "      <td>0.544523</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>400</td>\n",
              "      <td>0.483926</td>\n",
              "      <td>0.520171</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>600</td>\n",
              "      <td>0.383667</td>\n",
              "      <td>0.520883</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>800</td>\n",
              "      <td>0.386784</td>\n",
              "      <td>0.522161</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>1000</td>\n",
              "      <td>0.306549</td>\n",
              "      <td>0.561082</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>1200</td>\n",
              "      <td>0.250284</td>\n",
              "      <td>0.598272</td>\n",
              "    </tr>\n",
              "    <tr>\n",
              "      <td>1400</td>\n",
              "      <td>0.243569</td>\n",
              "      <td>0.605305</td>\n",
              "    </tr>\n",
              "  </tbody>\n",
              "</table><p>"
            ],
            "text/plain": [
              "<IPython.core.display.HTML object>"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "\n",
            "Pushing LoRA adapter to Hub...\n"
          ]
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "70b34b2ad06643d49abca66912aa9b5b",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "  ...on-lfm2/training_args.bin: 100%|##########| 5.65kB / 5.65kB            "
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "dd56256481804bd3a32d5592002355ce",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "  ...adapter_model.safetensors:  75%|#######5  | 33.5MB / 44.5MB            "
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "No files have been modified since last commit. Skipping to prevent empty commit.\n",
            "WARNING:huggingface_hub.hf_api:No files have been modified since last commit. Skipping to prevent empty commit.\n"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "✅ SFT adapter: https://huggingface.co/OrcsRise/qmd-query-expansion-lfm2-sft\n"
          ]
        }
      ],
      "source": [
        "from peft import LoraConfig\n",
        "from transformers import AutoTokenizer\n",
        "from trl import SFTTrainer, SFTConfig\n",
        "\n",
        "# SFT training config\n",
        "config = SFTConfig(\n",
        "    output_dir=MODEL_NAME,\n",
        "    push_to_hub=True,\n",
        "    hub_model_id=OUTPUT_SFT,\n",
        "    hub_strategy=\"every_save\",\n",
        "\n",
        "    # Training hyperparams\n",
        "    num_train_epochs=5,\n",
        "    per_device_train_batch_size=4,\n",
        "    gradient_accumulation_steps=4,  # effective batch = 16\n",
        "    learning_rate=2e-4,\n",
        "    max_length=512,\n",
        "\n",
        "    # Logging & saving\n",
        "    logging_steps=10,\n",
        "    save_strategy=\"steps\",\n",
        "    save_steps=200,\n",
        "    save_total_limit=2,\n",
        "    eval_strategy=\"steps\",\n",
        "    eval_steps=200,\n",
        "\n",
        "    # Schedule\n",
        "    warmup_steps=44,  # ~3% of the 1455 total steps (replaces the deprecated warmup_ratio=0.03)\n",
        "    lr_scheduler_type=\"cosine\",\n",
        "    bf16=True,  # note: the T4 (compute capability 7.5) has no native bf16 support\n",
        "    report_to=\"none\",\n",
        ")\n",
        "\n",
        "# LoRA config: target modules for the LFM2 hybrid architecture\n",
        "# Module names differ from standard transformers:\n",
        "#   Attention: q_proj, k_proj, v_proj, out_proj\n",
        "#   Short-convolution blocks: in_proj, out_proj\n",
        "#   FFN/MLP (SwiGLU): w1, w2, w3\n",
        "peft_config = LoraConfig(\n",
        "    r=16,\n",
        "    lora_alpha=32,\n",
        "    lora_dropout=0.0,\n",
        "    bias=\"none\",\n",
        "    task_type=\"CAUSAL_LM\",\n",
        "    target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"out_proj\", \"in_proj\", \"w1\", \"w2\", \"w3\"],\n",
        ")\n",
        "\n",
        "print(\"Initializing SFT trainer...\")\n",
        "trainer = SFTTrainer(\n",
        "    model=BASE_MODEL,\n",
        "    train_dataset=train_dataset,\n",
        "    eval_dataset=eval_dataset,\n",
        "    args=config,\n",
        "    peft_config=peft_config,\n",
        ")\n",
        "\n",
        "print(f\"Trainable params: {sum(p.numel() for p in trainer.model.parameters() if p.requires_grad):,}\")\n",
        "print(f\"Total params: {sum(p.numel() for p in trainer.model.parameters()):,}\")\n",
        "print(\"\\n🚀 Starting SFT training...\")\n",
        "trainer.train()\n",
        "\n",
        "print(\"\\nPushing LoRA adapter to Hub...\")\n",
        "trainer.push_to_hub()\n",
        "print(f\"✅ SFT adapter: https://huggingface.co/{OUTPUT_SFT}\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 6. Merge LoRA adapter with base model"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {},
      "outputs": [
        {
          "name": "stderr",
          "output_type": "stream",
          "text": [
            "`torch_dtype` is deprecated! Use `dtype` instead!\n"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Loading base model: LiquidAI/LFM2-1.2B...\n"
          ]
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "833363fff1304808ba2d230da79fb87f",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Loading weights:   0%|          | 0/148 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Merging SFT adapter from: qmd-query-expansion-lfm2...\n",
            "Saving merged model to /tmp/merged_model...\n"
          ]
        },
        {
          "data": {
            "application/vnd.jupyter.widget-view+json": {
              "model_id": "ee9e3a4369cc4a158b5d2c7845c541d1",
              "version_major": 2,
              "version_minor": 0
            },
            "text/plain": [
              "Writing model shards:   0%|          | 0/1 [00:00<?, ?it/s]"
            ]
          },
          "metadata": {},
          "output_type": "display_data"
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "✅ Merged model saved\n"
          ]
        }
      ],
      "source": [
        "import torch\n",
        "from peft import PeftModel\n",
        "from transformers import AutoModelForCausalLM, AutoTokenizer\n",
        "\n",
        "print(f\"Loading base model: {BASE_MODEL}...\")\n",
        "base_model = AutoModelForCausalLM.from_pretrained(\n",
        "    BASE_MODEL, torch_dtype=torch.bfloat16, device_map=\"auto\", trust_remote_code=True,\n",
        ")\n",
        "\n",
        "print(f\"Merging SFT adapter from: {MODEL_NAME}...\")\n",
        "base_model.config.tie_word_embeddings = False\n",
        "model = PeftModel.from_pretrained(base_model, MODEL_NAME, local_files_only=True)\n",
        "model = model.merge_and_unload()\n",
        "\n",
        "tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)\n",
        "\n",
        "# Save merged model\n",
        "merged_dir = \"/tmp/merged_model\"\n",
        "print(f\"Saving merged model to {merged_dir}...\")\n",
        "model.save_pretrained(merged_dir, safe_serialization=True)\n",
        "tokenizer.save_pretrained(merged_dir)\n",
        "print(\"✅ Merged model saved\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 7. Convert to GGUF"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Setting up llama.cpp...\n",
            "W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list entry misspelt?)\n",
            "Cloning into '/tmp/llama.cpp'...\n",
            "remote: Enumerating objects: 2546, done.\n",
            "remote: Counting objects: 100% (2546/2546), done.\n",
            "remote: Compressing objects: 100% (2033/2033), done.\n",
            "remote: Total 2546 (delta 514), reused 1659 (delta 442), pack-reused 0 (from 0)\n",
            "Receiving objects: 100% (2546/2546), 27.54 MiB | 18.97 MiB/s, done.\n",
            "Resolving deltas: 100% (514/514), done.\n",
            "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
            "google-colab 1.0.0 requires pandas==2.2.2, but you have pandas 2.2.3 which is incompatible.\n",
            "opencv-python-headless 4.13.0.92 requires numpy>=2; python_version >= \"3.9\", but you have numpy 1.26.4 which is incompatible.\n",
            "jaxlib 0.7.2 requires numpy>=2.0, but you have numpy 1.26.4 which is incompatible.\n",
            "typer-slim 0.23.1 requires typer>=0.23.1, but you have typer 0.15.4 which is incompatible.\n",
            "shap 0.50.0 requires numpy>=2, but you have numpy 1.26.4 which is incompatible.\n",
            "tobler 0.13.0 requires numpy>=2.0, but you have numpy 1.26.4 which is incompatible.\n",
            "pytensor 2.37.0 requires numpy>=2.0, but you have numpy 1.26.4 which is incompatible.\n",
            "grpcio-status 1.71.2 requires protobuf<6.0dev,>=5.26.1, but you have protobuf 4.25.8 which is incompatible.\n",
            "opentelemetry-proto 1.38.0 requires protobuf<7.0,>=5.0, but you have protobuf 4.25.8 which is incompatible.\n",
            "jax 0.7.2 requires numpy>=2.0, but you have numpy 1.26.4 which is incompatible.\n",
            "ydf 0.15.0 requires protobuf<7.0.0,>=5.29.1, but you have protobuf 4.25.8 which is incompatible.\n",
            "opencv-contrib-python 4.13.0.92 requires numpy>=2; python_version >= \"3.9\", but you have numpy 1.26.4 which is incompatible.\n",
            "torchaudio 2.9.0+cu128 requires torch==2.9.0, but you have torch 2.6.0+cpu which is incompatible.\n",
            "torchvision 0.24.0+cu128 requires torch==2.9.0, but you have torch 2.6.0+cpu which is incompatible.\n",
            "opencv-python 4.13.0.92 requires numpy>=2; python_version >= \"3.9\", but you have numpy 1.26.4 which is incompatible.\n",
            "rasterio 1.5.0 requires numpy>=2, but you have numpy 1.26.4 which is incompatible.\n",
            "grain 0.2.15 requires protobuf>=5.28.3, but you have protobuf 4.25.8 which is incompatible.\n",
            "\n",
            "Building llama-quantize...\n",
            "\n",
            "Converting to FP16 GGUF...\n",
            "INFO:hf-to-gguf:Loading model: merged_model\n",
            "INFO:hf-to-gguf:Model architecture: Lfm2ForCausalLM\n",
            "INFO:hf-to-gguf:gguf: indexing model part 'model.safetensors'\n",
            "INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only\n",
            "INFO:hf-to-gguf:Exporting model...\n",
            "INFO:hf-to-gguf:token_embd.weight,                torch.bfloat16 --> F16, shape = {2048, 65536}\n",
            "INFO:hf-to-gguf:token_embd_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.0.shortconv.conv.weight,      torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.0.shortconv.in_proj.weight,   torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.0.shortconv.out_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.0.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.0.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.0.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.0.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.0.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.1.shortconv.conv.weight,      torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.1.shortconv.in_proj.weight,   torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.1.shortconv.out_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.1.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.1.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.1.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.1.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.1.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.10.ffn_gate.weight,           torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.10.ffn_down.weight,           torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.10.ffn_up.weight,             torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.10.ffn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.10.attn_norm.weight,          torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.10.attn_k_norm.weight,        torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.10.attn_k.weight,             torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.10.attn_output.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.10.attn_q_norm.weight,        torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.10.attn_q.weight,             torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.10.attn_v.weight,             torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.11.shortconv.conv.weight,     torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.11.shortconv.in_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.11.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.11.ffn_gate.weight,           torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.11.ffn_down.weight,           torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.11.ffn_up.weight,             torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.11.ffn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.11.attn_norm.weight,          torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.12.ffn_gate.weight,           torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.12.ffn_down.weight,           torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.12.ffn_up.weight,             torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.12.ffn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.12.attn_norm.weight,          torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.12.attn_k_norm.weight,        torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.12.attn_k.weight,             torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.12.attn_output.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.12.attn_q_norm.weight,        torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.12.attn_q.weight,             torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.12.attn_v.weight,             torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.13.shortconv.conv.weight,     torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.13.shortconv.in_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.13.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.13.ffn_gate.weight,           torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.13.ffn_down.weight,           torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.13.ffn_up.weight,             torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.13.ffn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.13.attn_norm.weight,          torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.14.ffn_gate.weight,           torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.14.ffn_down.weight,           torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.14.ffn_up.weight,             torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.14.ffn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.14.attn_norm.weight,          torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.14.attn_k_norm.weight,        torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.14.attn_k.weight,             torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.14.attn_output.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.14.attn_q_norm.weight,        torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.14.attn_q.weight,             torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.14.attn_v.weight,             torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.15.shortconv.conv.weight,     torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.15.shortconv.in_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.15.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.15.ffn_gate.weight,           torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.15.ffn_down.weight,           torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.15.ffn_up.weight,             torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.15.ffn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.15.attn_norm.weight,          torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.2.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.2.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.2.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.2.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.2.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.2.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.2.attn_k.weight,              torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.2.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.2.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.2.attn_q.weight,              torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.2.attn_v.weight,              torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.3.shortconv.conv.weight,      torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.3.shortconv.in_proj.weight,   torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.3.shortconv.out_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.3.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.3.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.3.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.3.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.3.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.4.shortconv.conv.weight,      torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.4.shortconv.in_proj.weight,   torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.4.shortconv.out_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.4.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.4.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.4.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.4.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.4.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.5.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.5.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.5.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.5.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.5.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.5.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.5.attn_k.weight,              torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.5.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.5.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.5.attn_q.weight,              torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.5.attn_v.weight,              torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.6.shortconv.conv.weight,      torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.6.shortconv.in_proj.weight,   torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.6.shortconv.out_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.6.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.6.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.6.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.6.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.6.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.7.shortconv.conv.weight,      torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.7.shortconv.in_proj.weight,   torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.7.shortconv.out_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.7.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.7.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.7.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.7.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.7.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.8.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.8.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.8.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.8.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.8.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.8.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.8.attn_k.weight,              torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.8.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.8.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.8.attn_q.weight,              torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.8.attn_v.weight,              torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.9.shortconv.conv.weight,      torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.9.shortconv.in_proj.weight,   torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.9.shortconv.out_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.9.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.9.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.9.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.9.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.9.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:Set meta model\n",
            "INFO:hf-to-gguf:Set model parameters\n",
            "INFO:hf-to-gguf:gguf: context length = 128000\n",
            "INFO:hf-to-gguf:gguf: embedding length = 2048\n",
            "INFO:hf-to-gguf:gguf: feed forward length = 12288\n",
            "INFO:hf-to-gguf:gguf: head count = 32\n",
            "INFO:hf-to-gguf:gguf: key-value head count = [0, 0, 8, 0, 0, 8, 0, 0, 8, 0, 8, 0, 8, 0, 8, 0]\n",
            "WARNING:hf-to-gguf:Unknown RoPE type: default\n",
            "INFO:hf-to-gguf:gguf: rope scaling type = NONE\n",
            "INFO:hf-to-gguf:gguf: rope theta = 1000000.0\n",
            "INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05\n",
            "INFO:hf-to-gguf:gguf: file type = 1\n",
            "WARNING:gguf.gguf_writer:Duplicated key name 'lfm2.attention.layer_norm_rms_epsilon', overwriting it with new value 1e-05 of type FLOAT32\n",
            "WARNING:gguf.gguf_writer:Duplicated key name 'lfm2.feed_forward_length', overwriting it with new value 8192 of type UINT32\n",
            "INFO:hf-to-gguf:Set model quantization version\n",
            "INFO:hf-to-gguf:Set model tokenizer\n",
            "INFO:numexpr.utils:NumExpr defaulting to 2 threads.\n",
            "Traceback (most recent call last):\n",
            "  File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 12012, in <module>\n",
            "    main()\n",
            "  File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 12006, in main\n",
            "    model_instance.write()\n",
            "  File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 689, in write\n",
            "    self.prepare_metadata(vocab_only=False)\n",
            "  File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 830, in prepare_metadata\n",
            "    self.set_vocab()\n",
            "  File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 802, in set_vocab\n",
            "    self._set_vocab_gpt2()\n",
            "  File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 1303, in _set_vocab_gpt2\n",
            "    tokens, toktypes, tokpre = self.get_vocab_base()\n",
            "                               ^^^^^^^^^^^^^^^^^^^^^\n",
            "  File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 978, in get_vocab_base\n",
            "    tokenizer = AutoTokenizer.from_pretrained(self.dir_model)\n",
            "                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
            "  File \"/usr/local/lib/python3.12/dist-packages/transformers/models/auto/tokenization_auto.py\", line 1153, in from_pretrained\n",
            "    raise ValueError(\n",
            "ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported.\n"
          ]
        },
        {
          "ename": "FileNotFoundError",
          "evalue": "[Errno 2] No such file or directory: '/tmp/gguf_output/qmd-query-expansion-lfm2-f16.gguf'",
          "output_type": "error",
          "traceback": [
            "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
            "\u001b[0;31mFileNotFoundError\u001b[0m                         Traceback (most recent call last)",
            "\u001b[0;32m/tmp/ipython-input-3038749745.py\u001b[0m in \u001b[0;36m<cell line: 0>\u001b[0;34m()\u001b[0m\n\u001b[1;32m     22\u001b[0m \u001b[0mget_ipython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msystem\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'python /tmp/llama.cpp/convert_hf_to_gguf.py /tmp/merged_model --outfile {fp16_file} --outtype f16'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     23\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 24\u001b[0;31m \u001b[0msize_mb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgetsize\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfp16_file\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;36m1024\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0;36m1024\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m     25\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf\"  FP16: {size_mb:.1f} MB\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m     26\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
            "\u001b[0;32m/usr/lib/python3.12/genericpath.py\u001b[0m in \u001b[0;36mgetsize\u001b[0;34m(filename)\u001b[0m\n",
            "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: '/tmp/gguf_output/qmd-query-expansion-lfm2-f16.gguf'"
          ]
        }
      ],
      "source": [
        "import subprocess, sys, os\n",
        "\n",
        "# Setup llama.cpp\n",
        "print(\"Setting up llama.cpp...\")\n",
        "!apt-get update -qq && apt-get install -y -qq build-essential cmake git > /dev/null 2>&1\n",
        "\n",
        "if not os.path.exists(\"/tmp/llama.cpp\"):\n",
        "    !git clone --depth 1 https://github.com/ggerganov/llama.cpp.git /tmp/llama.cpp\n",
        "!pip install -q -r /tmp/llama.cpp/requirements.txt\n",
        "\n",
        "# Build quantize tool\n",
        "print(\"\\nBuilding llama-quantize...\")\n",
        "!cmake -B /tmp/llama.cpp/build -S /tmp/llama.cpp -DGGML_CUDA=OFF > /dev/null 2>&1\n",
        "!cmake --build /tmp/llama.cpp/build --target llama-quantize -j 4 > /dev/null 2>&1\n",
        "\n",
        "# Convert to FP16 GGUF\n",
        "gguf_dir = \"/tmp/gguf_output\"\n",
        "os.makedirs(gguf_dir, exist_ok=True)\n",
        "fp16_file = f\"{gguf_dir}/{MODEL_NAME}-f16.gguf\"\n",
        "\n",
        "print(\"\\nConverting to FP16 GGUF...\")\n",
        "!python /tmp/llama.cpp/convert_hf_to_gguf.py /tmp/merged_model --outfile {fp16_file} --outtype f16\n",
        "\n",
        "size_mb = os.path.getsize(fp16_file) / (1024 * 1024)\n",
        "print(f\"  FP16: {size_mb:.1f} MB\")\n",
        "\n",
        "# Quantize to Q4_K_M, Q5_K_M, Q8_0\n",
        "quantize_bin = \"/tmp/llama.cpp/build/bin/llama-quantize\"\n",
        "print(\"\\nQuantizing...\")\n",
        "quantized = []\n",
        "for qtype in [\"Q4_K_M\", \"Q5_K_M\", \"Q8_0\"]:\n",
        "    out = f\"{gguf_dir}/{MODEL_NAME}-{qtype.lower()}.gguf\"\n",
        "    result = subprocess.run([quantize_bin, fp16_file, out, qtype], capture_output=True, text=True)\n",
        "    if os.path.exists(out):\n",
        "        qsize = os.path.getsize(out) / (1024 * 1024)\n",
        "        print(f\"  ✅ {qtype}: {qsize:.1f} MB\")\n",
        "        quantized.append((out, qtype))\n",
        "    else:\n",
        "        print(f\"  ❌ {qtype} failed\")\n",
        "\n",
        "# Cleanup FP16 to save disk\n",
        "os.remove(fp16_file)\n",
        "print(f\"\\n🎉 GGUF files ready in {gguf_dir}\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 19,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m10.4/10.4 MB\u001b[0m \u001b[31m87.1 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m:00:01\u001b[0m0:01\u001b[0m\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m553.3/553.3 kB\u001b[0m \u001b[31m48.6 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m56.4/56.4 kB\u001b[0m \u001b[31m6.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[2K   \u001b[90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━\u001b[0m \u001b[32m108.3/108.3 kB\u001b[0m \u001b[31m13.8 MB/s\u001b[0m eta \u001b[36m0:00:00\u001b[0m\n",
            "\u001b[?25h\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n",
            "rasterio 1.5.0 requires numpy>=2, but you have numpy 1.26.4 which is incompatible.\u001b[0m\u001b[31m\n",
            "\u001b[0mConverting to FP16 GGUF...\n",
            "INFO:hf-to-gguf:Loading model: merged_model\n",
            "INFO:hf-to-gguf:Model architecture: Lfm2ForCausalLM\n",
            "INFO:hf-to-gguf:gguf: indexing model part 'model.safetensors'\n",
            "INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only\n",
            "INFO:hf-to-gguf:Exporting model...\n",
            "INFO:hf-to-gguf:token_embd.weight,                torch.bfloat16 --> F16, shape = {2048, 65536}\n",
            "INFO:hf-to-gguf:token_embd_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.0.shortconv.conv.weight,      torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.0.shortconv.in_proj.weight,   torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.0.shortconv.out_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.0.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.0.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.0.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.0.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.0.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.1.shortconv.conv.weight,      torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.1.shortconv.in_proj.weight,   torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.1.shortconv.out_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.1.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.1.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.1.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.1.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.1.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.10.ffn_gate.weight,           torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.10.ffn_down.weight,           torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.10.ffn_up.weight,             torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.10.ffn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.10.attn_norm.weight,          torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.10.attn_k_norm.weight,        torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.10.attn_k.weight,             torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.10.attn_output.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.10.attn_q_norm.weight,        torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.10.attn_q.weight,             torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.10.attn_v.weight,             torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.11.shortconv.conv.weight,     torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.11.shortconv.in_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.11.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.11.ffn_gate.weight,           torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.11.ffn_down.weight,           torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.11.ffn_up.weight,             torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.11.ffn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.11.attn_norm.weight,          torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.12.ffn_gate.weight,           torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.12.ffn_down.weight,           torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.12.ffn_up.weight,             torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.12.ffn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.12.attn_norm.weight,          torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.12.attn_k_norm.weight,        torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.12.attn_k.weight,             torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.12.attn_output.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.12.attn_q_norm.weight,        torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.12.attn_q.weight,             torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.12.attn_v.weight,             torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.13.shortconv.conv.weight,     torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.13.shortconv.in_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.13.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.13.ffn_gate.weight,           torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.13.ffn_down.weight,           torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.13.ffn_up.weight,             torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.13.ffn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.13.attn_norm.weight,          torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.14.ffn_gate.weight,           torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.14.ffn_down.weight,           torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.14.ffn_up.weight,             torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.14.ffn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.14.attn_norm.weight,          torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.14.attn_k_norm.weight,        torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.14.attn_k.weight,             torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.14.attn_output.weight,        torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.14.attn_q_norm.weight,        torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.14.attn_q.weight,             torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.14.attn_v.weight,             torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.15.shortconv.conv.weight,     torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.15.shortconv.in_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.15.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.15.ffn_gate.weight,           torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.15.ffn_down.weight,           torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.15.ffn_up.weight,             torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.15.ffn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.15.attn_norm.weight,          torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.2.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.2.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.2.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.2.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.2.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.2.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.2.attn_k.weight,              torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.2.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.2.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.2.attn_q.weight,              torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.2.attn_v.weight,              torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.3.shortconv.conv.weight,      torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.3.shortconv.in_proj.weight,   torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.3.shortconv.out_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.3.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.3.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.3.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.3.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.3.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.4.shortconv.conv.weight,      torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.4.shortconv.in_proj.weight,   torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.4.shortconv.out_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.4.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.4.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.4.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.4.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.4.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.5.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.5.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.5.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.5.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.5.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.5.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.5.attn_k.weight,              torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.5.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.5.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.5.attn_q.weight,              torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.5.attn_v.weight,              torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.6.shortconv.conv.weight,      torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.6.shortconv.in_proj.weight,   torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.6.shortconv.out_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.6.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.6.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.6.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.6.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.6.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.7.shortconv.conv.weight,      torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.7.shortconv.in_proj.weight,   torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.7.shortconv.out_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.7.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.7.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.7.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.7.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.7.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.8.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.8.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.8.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.8.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.8.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.8.attn_k_norm.weight,         torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.8.attn_k.weight,              torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.8.attn_output.weight,         torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.8.attn_q_norm.weight,         torch.bfloat16 --> F32, shape = {64}\n",
            "INFO:hf-to-gguf:blk.8.attn_q.weight,              torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.8.attn_v.weight,              torch.bfloat16 --> F16, shape = {2048, 512}\n",
            "INFO:hf-to-gguf:blk.9.shortconv.conv.weight,      torch.bfloat16 --> F32, shape = {3, 2048}\n",
            "INFO:hf-to-gguf:blk.9.shortconv.in_proj.weight,   torch.bfloat16 --> F16, shape = {2048, 6144}\n",
            "INFO:hf-to-gguf:blk.9.shortconv.out_proj.weight,  torch.bfloat16 --> F16, shape = {2048, 2048}\n",
            "INFO:hf-to-gguf:blk.9.ffn_gate.weight,            torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.9.ffn_down.weight,            torch.bfloat16 --> F16, shape = {8192, 2048}\n",
            "INFO:hf-to-gguf:blk.9.ffn_up.weight,              torch.bfloat16 --> F16, shape = {2048, 8192}\n",
            "INFO:hf-to-gguf:blk.9.ffn_norm.weight,            torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:blk.9.attn_norm.weight,           torch.bfloat16 --> F32, shape = {2048}\n",
            "INFO:hf-to-gguf:Set meta model\n",
            "INFO:hf-to-gguf:Set model parameters\n",
            "INFO:hf-to-gguf:gguf: context length = 128000\n",
            "INFO:hf-to-gguf:gguf: embedding length = 2048\n",
            "INFO:hf-to-gguf:gguf: feed forward length = 12288\n",
            "INFO:hf-to-gguf:gguf: head count = 32\n",
            "INFO:hf-to-gguf:gguf: key-value head count = [0, 0, 8, 0, 0, 8, 0, 0, 8, 0, 8, 0, 8, 0, 8, 0]\n",
            "WARNING:hf-to-gguf:Unknown RoPE type: default\n",
            "INFO:hf-to-gguf:gguf: rope scaling type = NONE\n",
            "INFO:hf-to-gguf:gguf: rope theta = 1000000.0\n",
            "INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05\n",
            "INFO:hf-to-gguf:gguf: file type = 1\n",
            "WARNING:gguf.gguf_writer:Duplicated key name 'lfm2.attention.layer_norm_rms_epsilon', overwriting it with new value 1e-05 of type FLOAT32\n",
            "WARNING:gguf.gguf_writer:Duplicated key name 'lfm2.feed_forward_length', overwriting it with new value 8192 of type UINT32\n",
            "INFO:hf-to-gguf:Set model quantization version\n",
            "INFO:hf-to-gguf:Set model tokenizer\n",
            "INFO:numexpr.utils:NumExpr defaulting to 2 threads.\n",
            "WARNING:gguf.vocab:Unknown separator token '<|startoftext|>' in TemplateProcessing<pair>\n",
            "INFO:gguf.vocab:Adding 63683 merge(s).\n",
            "INFO:gguf.vocab:Setting special token type bos to 1\n",
            "INFO:gguf.vocab:Setting special token type eos to 7\n",
            "INFO:gguf.vocab:Setting special token type pad to 0\n",
            "INFO:gguf.vocab:Setting add_bos_token to True\n",
            "INFO:gguf.vocab:Setting add_sep_token to False\n",
            "INFO:gguf.vocab:Setting chat_template to {{- bos_token -}}\n",
            "{%- set system_prompt = \"\" -%}\n",
            "{%- set ns = namespace(system_prompt=\"\") -%}\n",
            "{%- if messages[0][\"role\"] == \"system\" -%}\n",
            "\t{%- set ns.system_prompt = messages[0][\"content\"] -%}\n",
            "\t{%- set messages = messages[1:] -%}\n",
            "{%- endif -%}\n",
            "{%- if tools -%}\n",
            "\t{%- set ns.system_prompt = ns.system_prompt + (\"\\n\" if ns.system_prompt else \"\") + \"List of tools: <|tool_list_start|>[\" -%}\n",
            "\t{%- for tool in tools -%}\n",
            "\t\t{%- if tool is not string -%}\n",
            "            {%- set tool = tool | tojson -%}\n",
            "\t\t{%- endif -%}\n",
            "\t\t{%- set ns.system_prompt = ns.system_prompt + tool -%}\n",
            "        {%- if not loop.last -%}\n",
            "            {%- set ns.system_prompt = ns.system_prompt + \", \" -%}\n",
            "        {%- endif -%}\n",
            "\t{%- endfor -%}\n",
            "\t{%- set ns.system_prompt = ns.system_prompt + \"]<|tool_list_end|>\" -%}\n",
            "{%- endif -%}\n",
            "{%- if ns.system_prompt -%}\n",
            "\t{{- \"<|im_start|>system\\n\" + ns.system_prompt + \"<|im_end|>\\n\" -}}\n",
            "{%- endif -%}\n",
            "{%- for message in messages -%}\n",
            "\t{{- \"<|im_start|>\" + message[\"role\"] + \"\\n\" -}}\n",
            "\t{%- set content = message[\"content\"] -%}\n",
            "\t{%- if content is not string -%}\n",
            "\t\t{%- set content = content | tojson -%}\n",
            "\t{%- endif -%}\n",
            "\t{%- if message[\"role\"] == \"tool\" -%}\n",
            "\t\t{%- set content = \"<|tool_response_start|>\" + content + \"<|tool_response_end|>\" -%}\n",
            "\t{%- endif -%}\n",
            "\t{{- content + \"<|im_end|>\\n\" -}}\n",
            "{%- endfor -%}\n",
            "{%- if add_generation_prompt -%}\n",
            "\t{{- \"<|im_start|>assistant\\n\" -}}\n",
            "{%- endif -%}\n",
            "INFO:gguf.gguf_writer:Writing the following files:\n",
            "INFO:gguf.gguf_writer:/tmp/gguf_output/qmd-query-expansion-lfm2-f16.gguf: n_tensors = 148, total_size = 2.3G\n",
            "Writing: 100% 2.34G/2.34G [00:25<00:00, 92.3Mbyte/s]\n",
            "INFO:hf-to-gguf:Model successfully exported to /tmp/gguf_output/qmd-query-expansion-lfm2-f16.gguf\n",
            "  FP16: 2234.8 MB\n",
            "Quantizing to Q8_0...\n",
            "main: build = 1 (abb9f3c)\n",
            "main: built with GNU 11.4.0 for Linux x86_64\n",
            "main: quantizing '/tmp/gguf_output/qmd-query-expansion-lfm2-f16.gguf' to '/tmp/gguf_output/qmd-query-expansion-lfm2-q8_0.gguf' as Q8_0\n",
            "llama_model_loader: loaded meta data with 27 key-value pairs and 148 tensors from /tmp/gguf_output/qmd-query-expansion-lfm2-f16.gguf (version GGUF V3 (latest))\n",
            "llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.\n",
            "llama_model_loader: - kv   0:                       general.architecture str              = lfm2\n",
            "llama_model_loader: - kv   1:                               general.type str              = model\n",
            "llama_model_loader: - kv   2:                               general.name str              = Merged_Model\n",
            "llama_model_loader: - kv   3:                         general.size_label str              = 1.2B\n",
            "llama_model_loader: - kv   4:                           lfm2.block_count u32              = 16\n",
            "llama_model_loader: - kv   5:                        lfm2.context_length u32              = 128000\n",
            "llama_model_loader: - kv   6:                      lfm2.embedding_length u32              = 2048\n",
            "llama_model_loader: - kv   7:                   lfm2.feed_forward_length u32              = 8192\n",
            "llama_model_loader: - kv   8:                  lfm2.attention.head_count u32              = 32\n",
            "llama_model_loader: - kv   9:               lfm2.attention.head_count_kv arr[i32,16]      = [0, 0, 8, 0, 0, 8, 0, 0, 8, 0, 8, 0, ...\n",
            "llama_model_loader: - kv  10:                        lfm2.rope.freq_base f32              = 1000000.000000\n",
            "llama_model_loader: - kv  11:      lfm2.attention.layer_norm_rms_epsilon f32              = 0.000010\n",
            "llama_model_loader: - kv  12:                          general.file_type u32              = 1\n",
            "llama_model_loader: - kv  13:                            lfm2.vocab_size u32              = 65536\n",
            "llama_model_loader: - kv  14:                     lfm2.shortconv.l_cache u32              = 3\n",
            "llama_model_loader: - kv  15:               general.quantization_version u32              = 2\n",
            "llama_model_loader: - kv  16:                       tokenizer.ggml.model str              = gpt2\n",
            "llama_model_loader: - kv  17:                         tokenizer.ggml.pre str              = lfm2\n",
            "llama_model_loader: - kv  18:                      tokenizer.ggml.tokens arr[str,65536]   = [\"<|pad|>\", \"<|startoftext|>\", \"<|end...\n",
            "llama_model_loader: - kv  19:                  tokenizer.ggml.token_type arr[i32,65536]   = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...\n",
            "llama_model_loader: - kv  20:                      tokenizer.ggml.merges arr[str,63683]   = [\"Ċ Ċ\", \"Ċ ĊĊ\", \"ĊĊ Ċ\", \"Ċ �...\n",
            "llama_model_loader: - kv  21:                tokenizer.ggml.bos_token_id u32              = 1\n",
            "llama_model_loader: - kv  22:                tokenizer.ggml.eos_token_id u32              = 7\n",
            "llama_model_loader: - kv  23:            tokenizer.ggml.padding_token_id u32              = 0\n",
            "llama_model_loader: - kv  24:               tokenizer.ggml.add_bos_token bool             = true\n",
            "llama_model_loader: - kv  25:               tokenizer.ggml.add_sep_token bool             = false\n",
            "llama_model_loader: - kv  26:                    tokenizer.chat_template str              = {{- bos_token -}}\\n{%- set system_prom...\n",
            "llama_model_loader: - type  f32:   55 tensors\n",
            "llama_model_loader: - type  f16:   93 tensors\n",
            "[   1/ 148]                    token_embd.weight - [ 2048, 65536,     1,     1], type =    f16, converting to q8_0 .. size =   256.00 MiB ->   136.00 MiB\n",
            "[   2/ 148]               token_embd_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[   3/ 148]               blk.0.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[   4/ 148]                blk.0.ffn_down.weight - [ 8192,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[   5/ 148]                blk.0.ffn_gate.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[   6/ 148]                blk.0.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[   7/ 148]                  blk.0.ffn_up.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[   8/ 148]          blk.0.shortconv.conv.weight - [    3,  2048,     1,     1], type =    f32, size =    0.023 MiB\n",
            "[   9/ 148]       blk.0.shortconv.in_proj.weight - [ 2048,  6144,     1,     1], type =    f16, converting to q8_0 .. size =    24.00 MiB ->    12.75 MiB\n",
            "[  10/ 148]      blk.0.shortconv.out_proj.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[  11/ 148]               blk.1.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[  12/ 148]                blk.1.ffn_down.weight - [ 8192,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  13/ 148]                blk.1.ffn_gate.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  14/ 148]                blk.1.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[  15/ 148]                  blk.1.ffn_up.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  16/ 148]          blk.1.shortconv.conv.weight - [    3,  2048,     1,     1], type =    f32, size =    0.023 MiB\n",
            "[  17/ 148]       blk.1.shortconv.in_proj.weight - [ 2048,  6144,     1,     1], type =    f16, converting to q8_0 .. size =    24.00 MiB ->    12.75 MiB\n",
            "[  18/ 148]      blk.1.shortconv.out_proj.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[  19/ 148]                  blk.2.attn_k.weight - [ 2048,   512,     1,     1], type =    f16, converting to q8_0 .. size =     2.00 MiB ->     1.06 MiB\n",
            "[  20/ 148]             blk.2.attn_k_norm.weight - [   64,     1,     1,     1], type =    f32, size =    0.000 MiB\n",
            "[  21/ 148]               blk.2.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[  22/ 148]             blk.2.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[  23/ 148]                  blk.2.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[  24/ 148]             blk.2.attn_q_norm.weight - [   64,     1,     1,     1], type =    f32, size =    0.000 MiB\n",
            "[  25/ 148]                  blk.2.attn_v.weight - [ 2048,   512,     1,     1], type =    f16, converting to q8_0 .. size =     2.00 MiB ->     1.06 MiB\n",
            "[  26/ 148]                blk.2.ffn_down.weight - [ 8192,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  27/ 148]                blk.2.ffn_gate.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  28/ 148]                blk.2.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[  29/ 148]                  blk.2.ffn_up.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  30/ 148]               blk.3.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[  31/ 148]                blk.3.ffn_down.weight - [ 8192,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  32/ 148]                blk.3.ffn_gate.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  33/ 148]                blk.3.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[  34/ 148]                  blk.3.ffn_up.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  35/ 148]          blk.3.shortconv.conv.weight - [    3,  2048,     1,     1], type =    f32, size =    0.023 MiB\n",
            "[  36/ 148]       blk.3.shortconv.in_proj.weight - [ 2048,  6144,     1,     1], type =    f16, converting to q8_0 .. size =    24.00 MiB ->    12.75 MiB\n",
            "[  37/ 148]      blk.3.shortconv.out_proj.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[  38/ 148]               blk.4.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[  39/ 148]                blk.4.ffn_down.weight - [ 8192,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  40/ 148]                blk.4.ffn_gate.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  41/ 148]                blk.4.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[  42/ 148]                  blk.4.ffn_up.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  43/ 148]          blk.4.shortconv.conv.weight - [    3,  2048,     1,     1], type =    f32, size =    0.023 MiB\n",
            "[  44/ 148]       blk.4.shortconv.in_proj.weight - [ 2048,  6144,     1,     1], type =    f16, converting to q8_0 .. size =    24.00 MiB ->    12.75 MiB\n",
            "[  45/ 148]      blk.4.shortconv.out_proj.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[  46/ 148]                  blk.5.attn_k.weight - [ 2048,   512,     1,     1], type =    f16, converting to q8_0 .. size =     2.00 MiB ->     1.06 MiB\n",
            "[  47/ 148]             blk.5.attn_k_norm.weight - [   64,     1,     1,     1], type =    f32, size =    0.000 MiB\n",
            "[  48/ 148]               blk.5.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[  49/ 148]             blk.5.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[  50/ 148]                  blk.5.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[  51/ 148]             blk.5.attn_q_norm.weight - [   64,     1,     1,     1], type =    f32, size =    0.000 MiB\n",
            "[  52/ 148]                  blk.5.attn_v.weight - [ 2048,   512,     1,     1], type =    f16, converting to q8_0 .. size =     2.00 MiB ->     1.06 MiB\n",
            "[  53/ 148]                blk.5.ffn_down.weight - [ 8192,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  54/ 148]                blk.5.ffn_gate.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  55/ 148]                blk.5.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[  56/ 148]                  blk.5.ffn_up.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  57/ 148]               blk.6.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[  58/ 148]                blk.6.ffn_down.weight - [ 8192,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  59/ 148]                blk.6.ffn_gate.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  60/ 148]                blk.6.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[  61/ 148]                  blk.6.ffn_up.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  62/ 148]          blk.6.shortconv.conv.weight - [    3,  2048,     1,     1], type =    f32, size =    0.023 MiB\n",
            "[  63/ 148]       blk.6.shortconv.in_proj.weight - [ 2048,  6144,     1,     1], type =    f16, converting to q8_0 .. size =    24.00 MiB ->    12.75 MiB\n",
            "[  64/ 148]      blk.6.shortconv.out_proj.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[  65/ 148]               blk.7.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[  66/ 148]                blk.7.ffn_down.weight - [ 8192,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  67/ 148]                blk.7.ffn_gate.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  68/ 148]                blk.7.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[  69/ 148]                  blk.7.ffn_up.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  70/ 148]          blk.7.shortconv.conv.weight - [    3,  2048,     1,     1], type =    f32, size =    0.023 MiB\n",
            "[  71/ 148]       blk.7.shortconv.in_proj.weight - [ 2048,  6144,     1,     1], type =    f16, converting to q8_0 .. size =    24.00 MiB ->    12.75 MiB\n",
            "[  72/ 148]      blk.7.shortconv.out_proj.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[  73/ 148]                  blk.8.attn_k.weight - [ 2048,   512,     1,     1], type =    f16, converting to q8_0 .. size =     2.00 MiB ->     1.06 MiB\n",
            "[  74/ 148]             blk.8.attn_k_norm.weight - [   64,     1,     1,     1], type =    f32, size =    0.000 MiB\n",
            "[  75/ 148]               blk.8.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[  76/ 148]             blk.8.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[  77/ 148]                  blk.8.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[  78/ 148]             blk.8.attn_q_norm.weight - [   64,     1,     1,     1], type =    f32, size =    0.000 MiB\n",
            "[  79/ 148]                  blk.8.attn_v.weight - [ 2048,   512,     1,     1], type =    f16, converting to q8_0 .. size =     2.00 MiB ->     1.06 MiB\n",
            "[  80/ 148]                blk.8.ffn_down.weight - [ 8192,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  81/ 148]                blk.8.ffn_gate.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  82/ 148]                blk.8.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[  83/ 148]                  blk.8.ffn_up.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  84/ 148]               blk.9.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[  85/ 148]                blk.9.ffn_down.weight - [ 8192,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  86/ 148]                blk.9.ffn_gate.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  87/ 148]                blk.9.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[  88/ 148]                  blk.9.ffn_up.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[  89/ 148]          blk.9.shortconv.conv.weight - [    3,  2048,     1,     1], type =    f32, size =    0.023 MiB\n",
            "[  90/ 148]       blk.9.shortconv.in_proj.weight - [ 2048,  6144,     1,     1], type =    f16, converting to q8_0 .. size =    24.00 MiB ->    12.75 MiB\n",
            "[  91/ 148]      blk.9.shortconv.out_proj.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[  92/ 148]                 blk.10.attn_k.weight - [ 2048,   512,     1,     1], type =    f16, converting to q8_0 .. size =     2.00 MiB ->     1.06 MiB\n",
            "[  93/ 148]            blk.10.attn_k_norm.weight - [   64,     1,     1,     1], type =    f32, size =    0.000 MiB\n",
            "[  94/ 148]              blk.10.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[  95/ 148]            blk.10.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[  96/ 148]                 blk.10.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[  97/ 148]            blk.10.attn_q_norm.weight - [   64,     1,     1,     1], type =    f32, size =    0.000 MiB\n",
            "[  98/ 148]                 blk.10.attn_v.weight - [ 2048,   512,     1,     1], type =    f16, converting to q8_0 .. size =     2.00 MiB ->     1.06 MiB\n",
            "[  99/ 148]               blk.10.ffn_down.weight - [ 8192,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[ 100/ 148]               blk.10.ffn_gate.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[ 101/ 148]               blk.10.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[ 102/ 148]                 blk.10.ffn_up.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[ 103/ 148]              blk.11.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[ 104/ 148]               blk.11.ffn_down.weight - [ 8192,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[ 105/ 148]               blk.11.ffn_gate.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[ 106/ 148]               blk.11.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[ 107/ 148]                 blk.11.ffn_up.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[ 108/ 148]         blk.11.shortconv.conv.weight - [    3,  2048,     1,     1], type =    f32, size =    0.023 MiB\n",
            "[ 109/ 148]      blk.11.shortconv.in_proj.weight - [ 2048,  6144,     1,     1], type =    f16, converting to q8_0 .. size =    24.00 MiB ->    12.75 MiB\n",
            "[ 110/ 148]     blk.11.shortconv.out_proj.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[ 111/ 148]                 blk.12.attn_k.weight - [ 2048,   512,     1,     1], type =    f16, converting to q8_0 .. size =     2.00 MiB ->     1.06 MiB\n",
            "[ 112/ 148]            blk.12.attn_k_norm.weight - [   64,     1,     1,     1], type =    f32, size =    0.000 MiB\n",
            "[ 113/ 148]              blk.12.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[ 114/ 148]            blk.12.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[ 115/ 148]                 blk.12.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[ 116/ 148]            blk.12.attn_q_norm.weight - [   64,     1,     1,     1], type =    f32, size =    0.000 MiB\n",
            "[ 117/ 148]                 blk.12.attn_v.weight - [ 2048,   512,     1,     1], type =    f16, converting to q8_0 .. size =     2.00 MiB ->     1.06 MiB\n",
            "[ 118/ 148]               blk.12.ffn_down.weight - [ 8192,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[ 119/ 148]               blk.12.ffn_gate.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[ 120/ 148]               blk.12.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[ 121/ 148]                 blk.12.ffn_up.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[ 122/ 148]              blk.13.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[ 123/ 148]               blk.13.ffn_down.weight - [ 8192,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[ 124/ 148]               blk.13.ffn_gate.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[ 125/ 148]               blk.13.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[ 126/ 148]                 blk.13.ffn_up.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[ 127/ 148]         blk.13.shortconv.conv.weight - [    3,  2048,     1,     1], type =    f32, size =    0.023 MiB\n",
            "[ 128/ 148]      blk.13.shortconv.in_proj.weight - [ 2048,  6144,     1,     1], type =    f16, converting to q8_0 .. size =    24.00 MiB ->    12.75 MiB\n",
            "[ 129/ 148]     blk.13.shortconv.out_proj.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[ 130/ 148]                 blk.14.attn_k.weight - [ 2048,   512,     1,     1], type =    f16, converting to q8_0 .. size =     2.00 MiB ->     1.06 MiB\n",
            "[ 131/ 148]            blk.14.attn_k_norm.weight - [   64,     1,     1,     1], type =    f32, size =    0.000 MiB\n",
            "[ 132/ 148]              blk.14.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[ 133/ 148]            blk.14.attn_output.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[ 134/ 148]                 blk.14.attn_q.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "[ 135/ 148]            blk.14.attn_q_norm.weight - [   64,     1,     1,     1], type =    f32, size =    0.000 MiB\n",
            "[ 136/ 148]                 blk.14.attn_v.weight - [ 2048,   512,     1,     1], type =    f16, converting to q8_0 .. size =     2.00 MiB ->     1.06 MiB\n",
            "[ 137/ 148]               blk.14.ffn_down.weight - [ 8192,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[ 138/ 148]               blk.14.ffn_gate.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[ 139/ 148]               blk.14.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[ 140/ 148]                 blk.14.ffn_up.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[ 141/ 148]              blk.15.attn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[ 142/ 148]               blk.15.ffn_down.weight - [ 8192,  2048,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[ 143/ 148]               blk.15.ffn_gate.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[ 144/ 148]               blk.15.ffn_norm.weight - [ 2048,     1,     1,     1], type =    f32, size =    0.008 MiB\n",
            "[ 145/ 148]                 blk.15.ffn_up.weight - [ 2048,  8192,     1,     1], type =    f16, converting to q8_0 .. size =    32.00 MiB ->    17.00 MiB\n",
            "[ 146/ 148]         blk.15.shortconv.conv.weight - [    3,  2048,     1,     1], type =    f32, size =    0.023 MiB\n",
            "[ 147/ 148]      blk.15.shortconv.in_proj.weight - [ 2048,  6144,     1,     1], type =    f16, converting to q8_0 .. size =    24.00 MiB ->    12.75 MiB\n",
            "[ 148/ 148]     blk.15.shortconv.out_proj.weight - [ 2048,  2048,     1,     1], type =    f16, converting to q8_0 .. size =     8.00 MiB ->     4.25 MiB\n",
            "llama_model_quantize_impl: model size  =  2232.50 MiB\n",
            "llama_model_quantize_impl: quant size  =  1186.25 MiB\n",
            "\n",
            "main: quantize time = 20068.41 ms\n",
            "main:    total time = 20068.42 ms\n",
            "  Q8_0: 1188.5 MB\n",
            "✅ GGUF conversion complete!\n"
          ]
        }
      ],
      "source": [
        "!pip install -q --upgrade tokenizers transformers\n",
        "import os\n",
        "gguf_dir = \"/tmp/gguf_output\"\n",
        "merged_dir = \"/tmp/merged_model\"\n",
        "fp16_file = f\"{gguf_dir}/{MODEL_NAME}-f16.gguf\"\n",
        "# Retry FP16 conversion\n",
        "print(\"Converting to FP16 GGUF...\")\n",
        "!python /tmp/llama.cpp/convert_hf_to_gguf.py {merged_dir} --outfile {fp16_file} --outtype f16\n",
        "size_mb = os.path.getsize(fp16_file) / (1024 * 1024)\n",
        "print(f\"  FP16: {size_mb:.1f} MB\")\n",
        "# Quantize to Q8_0\n",
        "q8_file = f\"{gguf_dir}/{MODEL_NAME}-q8_0.gguf\"\n",
        "print(\"Quantizing to Q8_0...\")\n",
        "!chmod +x /tmp/llama.cpp/build/bin/llama-quantize\n",
        "!/tmp/llama.cpp/build/bin/llama-quantize {fp16_file} {q8_file} q8_0\n",
        "size_mb = os.path.getsize(q8_file) / (1024 * 1024)\n",
        "print(f\"  Q8_0: {size_mb:.1f} MB\")\n",
        "print(\"✅ GGUF conversion complete!\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 8. Upload GGUFs to HuggingFace"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 24,
      "metadata": {},
      "outputs": [
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "Uploading to OrcsRise/qmd-query-expansion-lfm2-gguf...\n",
            "  Uploading qmd-query-expansion-lfm2-q8_0.gguf (1189 MB)...\n"
          ]
        },
        {
          "name": "stdout",
          "output_type": "stream",
          "text": [
            "\n",
            "✅ Uploaded to: https://huggingface.co/OrcsRise/qmd-query-expansion-lfm2-gguf\n",
            "\n",
            "📋 Add to ~/.zshrc:\n",
            "export QMD_GEN_MODEL=\"hf:OrcsRise/qmd-query-expansion-lfm2-gguf/qmd-query-expansion-lfm2-q8_0.gguf\"\n"
          ]
        }
      ],
      "source": [
        "from huggingface_hub import HfApi\n",
        "import os\n",
        "api = HfApi()\n",
        "# Create repo if needed\n",
        "api.create_repo(OUTPUT_GGUF_REPO, exist_ok=True)\n",
        "print(f\"Uploading to {OUTPUT_GGUF_REPO}...\")\n",
        "# Upload Q8_0\n",
        "q8_file = f\"/tmp/gguf_output/{MODEL_NAME}-q8_0.gguf\"\n",
        "filename = os.path.basename(q8_file)\n",
        "print(f\"  Uploading {filename} ({os.path.getsize(q8_file) / 1024**2:.0f} MB)...\")\n",
        "api.upload_file(\n",
        "    path_or_fileobj=q8_file,\n",
        "    path_in_repo=filename,\n",
        "    repo_id=OUTPUT_GGUF_REPO,\n",
        ")\n",
        "print(f\"\\n✅ Uploaded to: https://huggingface.co/{OUTPUT_GGUF_REPO}\")\n",
        "print(f\"\\n📋 Add to ~/.zshrc:\")\n",
        "print(f'export QMD_GEN_MODEL=\"hf:{OUTPUT_GGUF_REPO}/{filename}\"')"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import os\n",
        "from huggingface_hub import HfApi\n",
        "\n",
        "api = HfApi()\n",
        "api.create_repo(repo_id=OUTPUT_GGUF_REPO, repo_type=\"model\", exist_ok=True)\n",
        "\n",
        "# Quantized GGUFs to upload as (path, quant type) pairs; only Q8_0 is built above\n",
        "quantized = [(f\"/tmp/gguf_output/{MODEL_NAME}-q8_0.gguf\", \"Q8_0\")]\n",
        "\n",
        "print(f\"Uploading to {OUTPUT_GGUF_REPO}...\")\n",
        "for qfile, qtype in quantized:\n",
        "    filename = os.path.basename(qfile)\n",
        "    print(f\"  Uploading {filename}...\")\n",
        "    api.upload_file(\n",
        "        path_or_fileobj=qfile,\n",
        "        path_in_repo=filename,\n",
        "        repo_id=OUTPUT_GGUF_REPO,\n",
        "    )\n",
        "\n",
        "# Upload README\n",
        "readme = f\"\"\"---\n",
        "base_model: {BASE_MODEL}\n",
        "tags: [gguf, llama.cpp, quantized, query-expansion, qmd, lfm2]\n",
        "---\n",
        "# {MODEL_NAME} (GGUF)\n",
        "\n",
        "Fine-tuned LiquidAI LFM2-1.2B for QMD query expansion.\n",
        "\n",
        "## Details\n",
        "- **Base:** {BASE_MODEL}\n",
        "- **Training:** SFT with LoRA (rank 16) on {DATASET}\n",
        "- **Task:** Query expansion producing lex/vec/hyde format\n",
        "\n",
        "## Usage with qmd\n",
        "```bash\n",
        "export QMD_GEN_MODEL=\"hf:{OUTPUT_GGUF_REPO}/{MODEL_NAME}-q8_0.gguf\"\n",
        "qmd query \"your search\"\n",
        "```\n",
        "\"\"\"\n",
        "api.upload_file(\n",
        "    path_or_fileobj=readme.encode(),\n",
        "    path_in_repo=\"README.md\",\n",
        "    repo_id=OUTPUT_GGUF_REPO,\n",
        ")\n",
        "\n",
        "print(f\"\\n🎉 Done! https://huggingface.co/{OUTPUT_GGUF_REPO}\")\n",
        "print(f\"\\n📋 Add this to your ~/.zshrc:\")\n",
        "print(f'export QMD_GEN_MODEL=\"hf:{OUTPUT_GGUF_REPO}/{MODEL_NAME}-q8_0.gguf\"')"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## ✅ Done!\n",
        "\n",
        "Copy the export line above and add it to your `~/.zshrc`, then:\n",
        "\n",
        "```bash\n",
        "source ~/.zshrc\n",
        "qmd query \"test\"\n",
        "```\n",
        "\n",
        "The fine-tuned LFM2 should produce clean, diverse `lex:/vec:/hyde:` expansions, at roughly 2x the speed of Qwen3."
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "gpuType": "T4",
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3 (ipykernel)",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.12.12"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}