Created
February 19, 2026 15:14
QMD Query Expansion: Fine-tune LiquidAI LFM2-1.2B on free Google Colab T4 (~2.5h)
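The notebook trains the model to answer with one `lex:`, `vec:`, or `hyde:` entry per line, as in the dataset example it prints. A minimal sketch of how a consumer might split such output into its three groups follows; the function name `parse_expansion` and the one-entry-per-line assumption are illustrative and are not taken from qmd itself.

```python
PREFIXES = ("lex", "vec", "hyde")


def parse_expansion(text: str) -> dict[str, list[str]]:
    """Group `lex:` / `vec:` / `hyde:` lines by prefix and ignore other lines.

    Hypothetical helper: assumes each entry sits on its own line and that the
    prefix is everything before the first colon.
    """
    parsed = {prefix: [] for prefix in PREFIXES}
    for line in text.splitlines():
        prefix, sep, value = line.partition(":")
        prefix = prefix.strip().lower()
        value = value.strip()
        # Keep the line only if it has a colon, a known prefix, and a value.
        if sep and prefix in parsed and value:
            parsed[prefix].append(value)
    return parsed


# Sample text copied from the dataset preview printed by the notebook.
sample = """lex: where to find
lex: purchase options for
vec: where to find refurbished laptops for sale?
vec: purchase options for refurbished laptops
hyde: The topic of buy refurbished laptops covers where to find refurbished laptops for sale?."""

result = parse_expansion(sample)
print(result["lex"])
print(len(result["vec"]), len(result["hyde"]))
```

Splitting on the first colon only keeps any later colons inside a `hyde:` passage intact, and lines with an unknown prefix are silently skipped rather than raising.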
| { | |
| "cells": [ | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "# QMD Query Expansion: Fine-tune LiquidAI LFM2-1.2B\n", | |
| "\n", | |
| "Fine-tunes LFM2-1.2B on qmd's query expansion dataset to produce structured `lex:/vec:/hyde:` output.\n", | |
| "Then converts to GGUF (Q8_0) for local inference.\n", | |
| "\n", | |
| "**Runtime**: Set to **T4 GPU** (Runtime → Change runtime type → T4)\n", | |
| "\n", | |
| "**Time**: ~2.5-3 hours for 5 epochs on a T4 (the run recorded below took 2:44:40); faster on an A100" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## ⚙️ Config: set your HuggingFace username here" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 9, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "SFT adapter → OrcsRise/qmd-query-expansion-lfm2-sft\n", | |
| "GGUF repo → OrcsRise/qmd-query-expansion-lfm2-gguf\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "# ============================================================\n", | |
| "# 🔧 CHANGE THIS to your HuggingFace username\n", | |
| "# ============================================================\n", | |
| "HF_USERNAME = \"OrcsRise\" # <-- change this!\n", | |
| "# ============================================================\n", | |
| "\n", | |
| "BASE_MODEL = \"LiquidAI/LFM2-1.2B\"\n", | |
| "DATASET = \"tobil/qmd-query-expansion-train\"\n", | |
| "OUTPUT_SFT = f\"{HF_USERNAME}/qmd-query-expansion-lfm2-sft\"\n", | |
| "OUTPUT_GGUF_REPO = f\"{HF_USERNAME}/qmd-query-expansion-lfm2-gguf\"\n", | |
| "MODEL_NAME = \"qmd-query-expansion-lfm2\"\n", | |
| "\n", | |
| "assert len(HF_USERNAME) > 0 and ' ' not in HF_USERNAME, \"Set HF_USERNAME to your HuggingFace username!\"\n", | |
| "print(f\"SFT adapter → {OUTPUT_SFT}\")\n", | |
| "print(f\"GGUF repo → {OUTPUT_GGUF_REPO}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 10, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount(\"/content/drive\", force_remount=True).\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "from google.colab import drive\n", | |
| "drive.mount('/content/drive')" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 1. Install dependencies" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 11, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "!pip install -q torch \"trl>=0.12.0\" \"peft>=0.7.0\" \"transformers>=4.55.0\" \\\n", | |
| "    \"accelerate>=0.24.0\" \"huggingface_hub>=0.20.0\" datasets bitsandbytes \\\n", | |
| "    sentencepiece protobuf numpy gguf" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 2. Login to HuggingFace" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 12, | |
| "metadata": {}, | |
| "outputs": [], | |
| "source": [ | |
| "from huggingface_hub import login, notebook_login\n", | |
| "import os\n", | |
| "\n", | |
| "# Option A: Use Colab secrets (Settings → Secrets → add HF_TOKEN)\n", | |
| "try:\n", | |
| "    from google.colab import userdata\n", | |
| "    hf_token = userdata.get('HF_TOKEN')\n", | |
| "    login(token=hf_token)\n", | |
| "    print(\"✓ Logged in via Colab secret\")\n", | |
| "except Exception:\n", | |
| "    # Option B: Interactive login\n", | |
| "    notebook_login()" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 3. Check GPU" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 13, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "CUDA available: True\n", | |
| "GPU: Tesla T4\n", | |
| "VRAM: 14.6 GB\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "import torch\n", | |
| "print(f\"CUDA available: {torch.cuda.is_available()}\")\n", | |
| "if torch.cuda.is_available():\n", | |
| "    print(f\"GPU: {torch.cuda.get_device_name(0)}\")\n", | |
| "    print(f\"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB\")\n", | |
| "else:\n", | |
| "    raise RuntimeError(\"No GPU! Go to Runtime → Change runtime type → T4 GPU\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 4. Load dataset" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 15, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Loading dataset: tobil/qmd-query-expansion-train...\n", | |
| "Dataset loaded: 5157 examples\n", | |
| " Train: 4641, Eval: 516\n", | |
| "\n", | |
| "--- Example ---\n", | |
| "User: Expand this search query:\n", | |
| "\n", | |
| "buy refurbished laptops\n", | |
| "Assistant: lex: where to find\n", | |
| "lex: purchase options for\n", | |
| "vec: where to find refurbished laptops for sale?\n", | |
| "vec: purchase options for refurbished laptops\n", | |
| "hyde: The topic of buy refurbished laptops covers where to find refurbished laptops for sale?. Proper implementation follows established patterns and best pract\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "from datasets import load_dataset\n", | |
| "\n", | |
| "print(f\"Loading dataset: {DATASET}...\")\n", | |
| "dataset = load_dataset(DATASET, split=\"train\")\n", | |
| "print(f\"Dataset loaded: {len(dataset)} examples\")\n", | |
| "\n", | |
| "split = dataset.train_test_split(test_size=0.1, seed=42)\n", | |
| "train_dataset = split[\"train\"]\n", | |
| "eval_dataset = split[\"test\"]\n", | |
| "print(f\" Train: {len(train_dataset)}, Eval: {len(eval_dataset)}\")\n", | |
| "\n", | |
| "# Preview an example\n", | |
| "print(\"\\n--- Example ---\")\n", | |
| "ex = train_dataset[0][\"messages\"]\n", | |
| "print(f\"User: {ex[0]['content'][:200]}\")\n", | |
| "print(f\"Assistant: {ex[1]['content'][:300]}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 5. Configure & train (SFT with LoRA)" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 16, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stderr", | |
| "output_type": "stream", | |
| "text": [ | |
| "warmup_ratio is deprecated and will be removed in v5.2. Use `warmup_steps` instead.\n" | |
| ] | |
| }, | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Initializing SFT trainer...\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "application/vnd.jupyter.widget-view+json": { | |
| "model_id": "d90cd3d13cf743f7a59eb55001e70d15", | |
| "version_major": 2, | |
| "version_minor": 0 | |
| }, | |
| "text/plain": [ | |
| "Loading weights: 0%| | 0/148 [00:00<?, ?it/s]" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Trainable params: 11,108,352\n", | |
| "Total params: 1,181,448,960\n", | |
| "\n", | |
| "Starting SFT training...\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "text/html": [ | |
| "\n", | |
| " <div>\n", | |
| " \n", | |
| " <progress value='1455' max='1455' style='width:300px; height:20px; vertical-align: middle;'></progress>\n", | |
| " [1455/1455 2:44:40, Epoch 5/5]\n", | |
| " </div>\n", | |
| " <table border=\"1\" class=\"dataframe\">\n", | |
| " <thead>\n", | |
| " <tr style=\"text-align: left;\">\n", | |
| " <th>Step</th>\n", | |
| " <th>Training Loss</th>\n", | |
| " <th>Validation Loss</th>\n", | |
| " </tr>\n", | |
| " </thead>\n", | |
| " <tbody>\n", | |
| " <tr>\n", | |
| " <td>200</td>\n", | |
| " <td>0.528098</td>\n", | |
| " <td>0.544523</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <td>400</td>\n", | |
| " <td>0.483926</td>\n", | |
| " <td>0.520171</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <td>600</td>\n", | |
| " <td>0.383667</td>\n", | |
| " <td>0.520883</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <td>800</td>\n", | |
| " <td>0.386784</td>\n", | |
| " <td>0.522161</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <td>1000</td>\n", | |
| " <td>0.306549</td>\n", | |
| " <td>0.561082</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <td>1200</td>\n", | |
| " <td>0.250284</td>\n", | |
| " <td>0.598272</td>\n", | |
| " </tr>\n", | |
| " <tr>\n", | |
| " <td>1400</td>\n", | |
| " <td>0.243569</td>\n", | |
| " <td>0.605305</td>\n", | |
| " </tr>\n", | |
| " </tbody>\n", | |
| "</table><p>" | |
| ], | |
| "text/plain": [ | |
| "<IPython.core.display.HTML object>" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "\n", | |
| "Pushing LoRA adapter to Hub...\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "application/vnd.jupyter.widget-view+json": { | |
| "model_id": "923a43a9b5cb40e396a81ef04ef4e4ad", | |
| "version_major": 2, | |
| "version_minor": 0 | |
| }, | |
| "text/plain": [ | |
| "Processing Files (0 / 0) : | | 0.00B / 0.00B " | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "data": { | |
| "application/vnd.jupyter.widget-view+json": { | |
| "model_id": "59f6753f776d44c28d78e00f01739f47", | |
| "version_major": 2, | |
| "version_minor": 0 | |
| }, | |
| "text/plain": [ | |
| "New Data Upload : | | 0.00B / 0.00B " | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "data": { | |
| "application/vnd.jupyter.widget-view+json": { | |
| "model_id": "70b34b2ad06643d49abca66912aa9b5b", | |
| "version_major": 2, | |
| "version_minor": 0 | |
| }, | |
| "text/plain": [ | |
| " ...on-lfm2/training_args.bin: 100%|##########| 5.65kB / 5.65kB " | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "data": { | |
| "application/vnd.jupyter.widget-view+json": { | |
| "model_id": "dd56256481804bd3a32d5592002355ce", | |
| "version_major": 2, | |
| "version_minor": 0 | |
| }, | |
| "text/plain": [ | |
| " ...adapter_model.safetensors: 75%|#######5 | 33.5MB / 44.5MB " | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "name": "stderr", | |
| "output_type": "stream", | |
| "text": [ | |
| "No files have been modified since last commit. Skipping to prevent empty commit.\n", | |
| "WARNING:huggingface_hub.hf_api:No files have been modified since last commit. Skipping to prevent empty commit.\n" | |
| ] | |
| }, | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "✓ SFT adapter: https://huggingface.co/OrcsRise/qmd-query-expansion-lfm2-sft\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "from peft import LoraConfig\n", | |
| "from transformers import AutoTokenizer\n", | |
| "from trl import SFTTrainer, SFTConfig\n", | |
| "\n", | |
| "# SFT training config\n", | |
| "config = SFTConfig(\n", | |
| "    output_dir=MODEL_NAME,\n", | |
| "    push_to_hub=True,\n", | |
| "    hub_model_id=OUTPUT_SFT,\n", | |
| "    hub_strategy=\"every_save\",\n", | |
| "\n", | |
| "    # Training hyperparams\n", | |
| "    num_train_epochs=5,\n", | |
| "    per_device_train_batch_size=4,\n", | |
| "    gradient_accumulation_steps=4,  # effective batch = 16\n", | |
| "    learning_rate=2e-4,\n", | |
| "    max_length=512,\n", | |
| "\n", | |
| "    # Logging & saving\n", | |
| "    logging_steps=10,\n", | |
| "    save_strategy=\"steps\",\n", | |
| "    save_steps=200,\n", | |
| "    save_total_limit=2,\n", | |
| "    eval_strategy=\"steps\",\n", | |
| "    eval_steps=200,\n", | |
| "\n", | |
| "    # Schedule\n", | |
| "    warmup_ratio=0.03,\n", | |
| "    lr_scheduler_type=\"cosine\",\n", | |
| "    bf16=True,\n", | |
| "    report_to=\"none\",\n", | |
| ")\n", | |
| "\n", | |
| "# LoRA config: LFM2 hybrid architecture targets\n", | |
| "# Module names differ from standard transformers:\n", | |
| "#   Attention: q_proj, k_proj, v_proj, out_proj\n", | |
| "#   Short-convolution blocks: in_proj (plus their own out_proj)\n", | |
| "#   FFN/MLP (SwiGLU): w1, w2, w3\n", | |
| "peft_config = LoraConfig(\n", | |
| "    r=16,\n", | |
| "    lora_alpha=32,\n", | |
| "    lora_dropout=0.0,\n", | |
| "    bias=\"none\",\n", | |
| "    task_type=\"CAUSAL_LM\",\n", | |
| "    target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"out_proj\", \"in_proj\", \"w1\", \"w2\", \"w3\"],\n", | |
| ")\n", | |
| "\n", | |
| "print(\"Initializing SFT trainer...\")\n", | |
| "trainer = SFTTrainer(\n", | |
| "    model=BASE_MODEL,\n", | |
| "    train_dataset=train_dataset,\n", | |
| "    eval_dataset=eval_dataset,\n", | |
| "    args=config,\n", | |
| "    peft_config=peft_config,\n", | |
| ")\n", | |
| "\n", | |
| "print(f\"Trainable params: {sum(p.numel() for p in trainer.model.parameters() if p.requires_grad):,}\")\n", | |
| "print(f\"Total params: {sum(p.numel() for p in trainer.model.parameters()):,}\")\n", | |
| "print(\"\\nStarting SFT training...\")\n", | |
| "trainer.train()\n", | |
| "\n", | |
| "print(\"\\nPushing LoRA adapter to Hub...\")\n", | |
| "trainer.push_to_hub()\n", | |
| "print(f\"✓ SFT adapter: https://huggingface.co/{OUTPUT_SFT}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 6. Merge LoRA adapter with base model" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 17, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stderr", | |
| "output_type": "stream", | |
| "text": [ | |
| "`torch_dtype` is deprecated! Use `dtype` instead!\n" | |
| ] | |
| }, | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Loading base model: LiquidAI/LFM2-1.2B...\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "application/vnd.jupyter.widget-view+json": { | |
| "model_id": "833363fff1304808ba2d230da79fb87f", | |
| "version_major": 2, | |
| "version_minor": 0 | |
| }, | |
| "text/plain": [ | |
| "Loading weights: 0%| | 0/148 [00:00<?, ?it/s]" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Merging SFT adapter from: qmd-query-expansion-lfm2...\n", | |
| "Saving merged model to /tmp/merged_model...\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "application/vnd.jupyter.widget-view+json": { | |
| "model_id": "ee9e3a4369cc4a158b5d2c7845c541d1", | |
| "version_major": 2, | |
| "version_minor": 0 | |
| }, | |
| "text/plain": [ | |
| "Writing model shards: 0%| | 0/1 [00:00<?, ?it/s]" | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "✓ Merged model saved\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "import torch\n", | |
| "from peft import PeftModel\n", | |
| "from transformers import AutoModelForCausalLM, AutoTokenizer\n", | |
| "\n", | |
| "print(f\"Loading base model: {BASE_MODEL}...\")\n", | |
| "base_model = AutoModelForCausalLM.from_pretrained(\n", | |
| "    BASE_MODEL, torch_dtype=torch.bfloat16, device_map=\"auto\", trust_remote_code=True,\n", | |
| ")\n", | |
| "\n", | |
| "print(f\"Merging SFT adapter from: {MODEL_NAME}...\")\n", | |
| "base_model.config.tie_word_embeddings = False\n", | |
| "model = PeftModel.from_pretrained(base_model, MODEL_NAME, local_files_only=True)\n", | |
| "model = model.merge_and_unload()\n", | |
| "\n", | |
| "tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)\n", | |
| "\n", | |
| "# Save merged model\n", | |
| "merged_dir = \"/tmp/merged_model\"\n", | |
| "print(f\"Saving merged model to {merged_dir}...\")\n", | |
| "model.save_pretrained(merged_dir, safe_serialization=True)\n", | |
| "tokenizer.save_pretrained(merged_dir)\n", | |
| "print(\"✓ Merged model saved\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 7. Convert to GGUF" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 18, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Setting up llama.cpp...\n", | |
| "W: Skipping acquire of configured file 'main/source/Sources' as repository 'https://r2u.stat.illinois.edu/ubuntu jammy InRelease' does not seem to provide it (sources.list entry misspelt?)\n", | |
| "Cloning into '/tmp/llama.cpp'...\n", | |
| "remote: Enumerating objects: 2546, done.\n", | |
| "remote: Counting objects: 100% (2546/2546), done.\n", | |
| "remote: Compressing objects: 100% (2033/2033), done.\n", | |
| "remote: Total 2546 (delta 514), reused 1659 (delta 442), pack-reused 0 (from 0)\n", | |
| "Receiving objects: 100% (2546/2546), 27.54 MiB | 18.97 MiB/s, done.\n", | |
| "Resolving deltas: 100% (514/514), done.\n", | |
| " Preparing metadata (setup.py) ... done\n", | |
| " Building wheel for wget (setup.py) ... done\n", | |
| "ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", | |
| "google-colab 1.0.0 requires pandas==2.2.2, but you have pandas 2.2.3 which is incompatible.\n", | |
| "opencv-python-headless 4.13.0.92 requires numpy>=2; python_version >= \"3.9\", but you have numpy 1.26.4 which is incompatible.\n", | |
| "jaxlib 0.7.2 requires numpy>=2.0, but you have numpy 1.26.4 which is incompatible.\n", | |
| "typer-slim 0.23.1 requires typer>=0.23.1, but you have typer 0.15.4 which is incompatible.\n", | |
| "shap 0.50.0 requires numpy>=2, but you have numpy 1.26.4 which is incompatible.\n", | |
| "tobler 0.13.0 requires numpy>=2.0, but you have numpy 1.26.4 which is incompatible.\n", | |
| "pytensor 2.37.0 requires numpy>=2.0, but you have numpy 1.26.4 which is incompatible.\n", | |
| "grpcio-status 1.71.2 requires protobuf<6.0dev,>=5.26.1, but you have protobuf 4.25.8 which is incompatible.\n", | |
| "opentelemetry-proto 1.38.0 requires protobuf<7.0,>=5.0, but you have protobuf 4.25.8 which is incompatible.\n", | |
| "jax 0.7.2 requires numpy>=2.0, but you have numpy 1.26.4 which is incompatible.\n", | |
| "ydf 0.15.0 requires protobuf<7.0.0,>=5.29.1, but you have protobuf 4.25.8 which is incompatible.\n", | |
| "opencv-contrib-python 4.13.0.92 requires numpy>=2; python_version >= \"3.9\", but you have numpy 1.26.4 which is incompatible.\n", | |
| "torchaudio 2.9.0+cu128 requires torch==2.9.0, but you have torch 2.6.0+cpu which is incompatible.\n", | |
| "torchvision 0.24.0+cu128 requires torch==2.9.0, but you have torch 2.6.0+cpu which is incompatible.\n", | |
| "opencv-python 4.13.0.92 requires numpy>=2; python_version >= \"3.9\", but you have numpy 1.26.4 which is incompatible.\n", | |
| "rasterio 1.5.0 requires numpy>=2, but you have numpy 1.26.4 which is incompatible.\n", | |
| "grain 0.2.15 requires protobuf>=5.28.3, but you have protobuf 4.25.8 which is incompatible.\n", | |
| "\n", | |
| "Building llama-quantize...\n", | |
| "\n", | |
| "Converting to FP16 GGUF...\n", | |
| "INFO:hf-to-gguf:Loading model: merged_model\n", | |
| "INFO:hf-to-gguf:Model architecture: Lfm2ForCausalLM\n", | |
| "INFO:hf-to-gguf:gguf: indexing model part 'model.safetensors'\n", | |
| "INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only\n", | |
| "INFO:hf-to-gguf:Exporting model...\n", | |
| "INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {2048, 65536}\n", | |
| "INFO:hf-to-gguf:token_embd_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.0.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.0.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.0.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.1.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.1.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.1.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.10.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.10.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.10.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.10.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.10.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.10.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.10.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.10.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.10.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.11.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.11.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.11.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.11.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.11.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.11.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.12.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.12.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.12.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.12.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.12.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.12.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.12.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.12.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.12.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.13.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.13.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.13.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.13.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.13.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.13.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.14.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.14.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.14.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.14.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.14.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.14.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.14.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.14.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.14.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.15.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.15.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.15.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.15.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.15.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.15.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.15.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.2.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.2.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.3.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.3.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.3.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.4.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.4.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.4.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.5.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.5.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.5.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.5.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.5.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.6.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.6.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.6.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.7.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.7.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.7.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.8.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.8.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.8.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.8.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.8.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.8.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.8.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.8.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.8.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.8.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.8.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.9.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.9.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.9.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.9.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.9.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.9.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.9.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.9.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:Set meta model\n", | |
| "INFO:hf-to-gguf:Set model parameters\n", | |
| "INFO:hf-to-gguf:gguf: context length = 128000\n", | |
| "INFO:hf-to-gguf:gguf: embedding length = 2048\n", | |
| "INFO:hf-to-gguf:gguf: feed forward length = 12288\n", | |
| "INFO:hf-to-gguf:gguf: head count = 32\n", | |
| "INFO:hf-to-gguf:gguf: key-value head count = [0, 0, 8, 0, 0, 8, 0, 0, 8, 0, 8, 0, 8, 0, 8, 0]\n", | |
| "WARNING:hf-to-gguf:Unknown RoPE type: default\n", | |
| "INFO:hf-to-gguf:gguf: rope scaling type = NONE\n", | |
| "INFO:hf-to-gguf:gguf: rope theta = 1000000.0\n", | |
| "INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05\n", | |
| "INFO:hf-to-gguf:gguf: file type = 1\n", | |
| "WARNING:gguf.gguf_writer:Duplicated key name 'lfm2.attention.layer_norm_rms_epsilon', overwriting it with new value 1e-05 of type FLOAT32\n", | |
| "WARNING:gguf.gguf_writer:Duplicated key name 'lfm2.feed_forward_length', overwriting it with new value 8192 of type UINT32\n", | |
| "INFO:hf-to-gguf:Set model quantization version\n", | |
| "INFO:hf-to-gguf:Set model tokenizer\n", | |
| "INFO:numexpr.utils:NumExpr defaulting to 2 threads.\n", | |
| "Traceback (most recent call last):\n", | |
| " File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 12012, in <module>\n", | |
| " main()\n", | |
| " File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 12006, in main\n", | |
| " model_instance.write()\n", | |
| " File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 689, in write\n", | |
| " self.prepare_metadata(vocab_only=False)\n", | |
| " File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 830, in prepare_metadata\n", | |
| " self.set_vocab()\n", | |
| " File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 802, in set_vocab\n", | |
| " self._set_vocab_gpt2()\n", | |
| " File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 1303, in _set_vocab_gpt2\n", | |
| " tokens, toktypes, tokpre = self.get_vocab_base()\n", | |
| " ^^^^^^^^^^^^^^^^^^^^^\n", | |
| " File \"/tmp/llama.cpp/convert_hf_to_gguf.py\", line 978, in get_vocab_base\n", | |
| " tokenizer = AutoTokenizer.from_pretrained(self.dir_model)\n", | |
| " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n", | |
| " File \"/usr/local/lib/python3.12/dist-packages/transformers/models/auto/tokenization_auto.py\", line 1153, in from_pretrained\n", | |
| " raise ValueError(\n", | |
| "ValueError: Tokenizer class TokenizersBackend does not exist or is not currently imported.\n" | |
| ] | |
| }, | |
| { | |
| "ename": "FileNotFoundError", | |
| "evalue": "[Errno 2] No such file or directory: '/tmp/gguf_output/qmd-query-expansion-lfm2-f16.gguf'", | |
| "output_type": "error", | |
| "traceback": [ | |
| "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", | |
| "\u001b[0;31mFileNotFoundError\u001b[0m Traceback (most recent call last)", | |
| "\u001b[0;32m/tmp/ipython-input-3038749745.py\u001b[0m in \u001b[0;36m<cell line: 0>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 22\u001b[0m \u001b[0mget_ipython\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0msystem\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m'python /tmp/llama.cpp/convert_hf_to_gguf.py /tmp/merged_model --outfile {fp16_file} --outtype f16'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 23\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 24\u001b[0;31m \u001b[0msize_mb\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mgetsize\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mfp16_file\u001b[0m\u001b[0;34m)\u001b[0m \u001b[0;34m/\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0;36m1024\u001b[0m \u001b[0;34m*\u001b[0m \u001b[0;36m1024\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 25\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf\" FP16: {size_mb:.1f} MB\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 26\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n", | |
| "\u001b[0;32m/usr/lib/python3.12/genericpath.py\u001b[0m in \u001b[0;36mgetsize\u001b[0;34m(filename)\u001b[0m\n", | |
| "\u001b[0;31mFileNotFoundError\u001b[0m: [Errno 2] No such file or directory: '/tmp/gguf_output/qmd-query-expansion-lfm2-f16.gguf'" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "import subprocess, sys, os\n", | |
| "\n", | |
| "# Setup llama.cpp\n", | |
| "print(\"Setting up llama.cpp...\")\n", | |
| "!apt-get update -qq && apt-get install -y -qq build-essential cmake git > /dev/null 2>&1\n", | |
| "\n", | |
| "if not os.path.exists(\"/tmp/llama.cpp\"):\n", | |
| "    !git clone --depth 1 https://github.com/ggerganov/llama.cpp.git /tmp/llama.cpp\n", | |
| "!pip install -q -r /tmp/llama.cpp/requirements.txt\n", | |
| "\n", | |
| "# Build quantize tool\n", | |
| "print(\"\\nBuilding llama-quantize...\")\n", | |
| "!cmake -B /tmp/llama.cpp/build -S /tmp/llama.cpp -DGGML_CUDA=OFF > /dev/null 2>&1\n", | |
| "!cmake --build /tmp/llama.cpp/build --target llama-quantize -j 4 > /dev/null 2>&1\n", | |
| "\n", | |
| "# Convert to FP16 GGUF\n", | |
| "gguf_dir = \"/tmp/gguf_output\"\n", | |
| "os.makedirs(gguf_dir, exist_ok=True)\n", | |
| "fp16_file = f\"{gguf_dir}/{MODEL_NAME}-f16.gguf\"\n", | |
| "\n", | |
| "print(\"\\nConverting to FP16 GGUF...\")\n", | |
| "!python /tmp/llama.cpp/convert_hf_to_gguf.py /tmp/merged_model --outfile {fp16_file} --outtype f16\n", | |
| "\n", | |
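| "# `!` does not raise when the converter fails, so confirm the file was written before measuring it\n", | |
| "if not os.path.exists(fp16_file):\n", | |
| "    raise RuntimeError(f\"GGUF conversion failed; {fp16_file} was not written (see the converter log above)\")\n", | |
| "size_mb = os.path.getsize(fp16_file) / (1024 * 1024)\n", | |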
| "print(f\" FP16: {size_mb:.1f} MB\")\n", | |
| "\n", | |
| "# Quantize to Q4_K_M, Q5_K_M, Q8_0\n", | |
| "quantize_bin = \"/tmp/llama.cpp/build/bin/llama-quantize\"\n", | |
| "print(\"\\nQuantizing...\")\n", | |
| "quantized = []\n", | |
| "for qtype in [\"Q4_K_M\", \"Q5_K_M\", \"Q8_0\"]:\n", | |
| "    out = f\"{gguf_dir}/{MODEL_NAME}-{qtype.lower()}.gguf\"\n", | |
| "    result = subprocess.run([quantize_bin, fp16_file, out, qtype], capture_output=True, text=True)\n", | |
| "    if result.returncode == 0 and os.path.exists(out):\n", | |
| "        qsize = os.path.getsize(out) / (1024 * 1024)\n", | |
| "        print(f\"  ✓ {qtype}: {qsize:.1f} MB\")\n", | |
| "        quantized.append((out, qtype))\n", | |
| "    else:\n", | |
| "        # Surface the tail of llama-quantize's stderr instead of discarding it\n", | |
| "        print(f\"  ✗ {qtype} failed: {result.stderr.strip()[-500:]}\")\n", | |
| "\n", | |
| "# Cleanup FP16 to save disk, but keep it if no quantized file was produced\n", | |
| "if quantized:\n", | |
| "    os.remove(fp16_file)\n", | |
| "print(f\"\\nGGUF files ready in {gguf_dir}\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 19, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "\u001b[?25h\u001b[31mERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.\n", | |
| "rasterio 1.5.0 requires numpy>=2, but you have numpy 1.26.4 which is incompatible.\u001b[0m\u001b[31m\n", | |
| "\u001b[0mConverting to FP16 GGUF...\n", | |
| "INFO:hf-to-gguf:Loading model: merged_model\n", | |
| "INFO:hf-to-gguf:Model architecture: Lfm2ForCausalLM\n", | |
| "INFO:hf-to-gguf:gguf: indexing model part 'model.safetensors'\n", | |
| "INFO:gguf.gguf_writer:gguf: This GGUF file is for Little Endian only\n", | |
| "INFO:hf-to-gguf:Exporting model...\n", | |
| "INFO:hf-to-gguf:token_embd.weight, torch.bfloat16 --> F16, shape = {2048, 65536}\n", | |
| "INFO:hf-to-gguf:token_embd_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.0.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.0.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.0.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.0.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.0.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.0.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.0.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.0.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.1.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.1.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.1.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.1.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.1.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.1.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.1.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.1.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.10.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.10.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.10.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.10.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.10.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.10.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.10.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.10.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.10.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.10.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.10.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.11.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.11.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.11.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.11.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.11.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.11.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.11.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.11.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.12.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.12.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.12.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.12.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.12.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.12.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.12.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.12.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.12.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.12.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.12.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.13.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.13.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.13.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.13.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.13.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.13.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.13.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.13.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.14.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.14.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.14.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.14.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.14.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.14.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.14.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.14.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.14.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.14.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.14.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.15.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.15.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.15.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.15.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.15.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.15.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.15.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.15.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.2.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.2.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.2.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.2.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.2.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.2.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.2.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.2.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.2.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.2.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.2.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.3.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.3.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.3.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.3.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.3.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.3.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.3.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.3.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.4.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.4.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.4.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.4.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.4.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.4.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.4.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.4.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.5.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.5.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.5.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.5.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.5.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.5.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.5.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.5.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.5.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.5.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.5.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.6.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.6.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.6.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.6.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.6.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.6.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.6.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.6.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.7.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.7.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.7.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.7.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.7.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.7.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.7.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.7.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.8.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.8.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.8.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.8.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.8.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.8.attn_k_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.8.attn_k.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.8.attn_output.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.8.attn_q_norm.weight, torch.bfloat16 --> F32, shape = {64}\n", | |
| "INFO:hf-to-gguf:blk.8.attn_q.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.8.attn_v.weight, torch.bfloat16 --> F16, shape = {2048, 512}\n", | |
| "INFO:hf-to-gguf:blk.9.shortconv.conv.weight, torch.bfloat16 --> F32, shape = {3, 2048}\n", | |
| "INFO:hf-to-gguf:blk.9.shortconv.in_proj.weight, torch.bfloat16 --> F16, shape = {2048, 6144}\n", | |
| "INFO:hf-to-gguf:blk.9.shortconv.out_proj.weight, torch.bfloat16 --> F16, shape = {2048, 2048}\n", | |
| "INFO:hf-to-gguf:blk.9.ffn_gate.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.9.ffn_down.weight, torch.bfloat16 --> F16, shape = {8192, 2048}\n", | |
| "INFO:hf-to-gguf:blk.9.ffn_up.weight, torch.bfloat16 --> F16, shape = {2048, 8192}\n", | |
| "INFO:hf-to-gguf:blk.9.ffn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:blk.9.attn_norm.weight, torch.bfloat16 --> F32, shape = {2048}\n", | |
| "INFO:hf-to-gguf:Set meta model\n", | |
| "INFO:hf-to-gguf:Set model parameters\n", | |
| "INFO:hf-to-gguf:gguf: context length = 128000\n", | |
| "INFO:hf-to-gguf:gguf: embedding length = 2048\n", | |
| "INFO:hf-to-gguf:gguf: feed forward length = 12288\n", | |
| "INFO:hf-to-gguf:gguf: head count = 32\n", | |
| "INFO:hf-to-gguf:gguf: key-value head count = [0, 0, 8, 0, 0, 8, 0, 0, 8, 0, 8, 0, 8, 0, 8, 0]\n", | |
| "WARNING:hf-to-gguf:Unknown RoPE type: default\n", | |
| "INFO:hf-to-gguf:gguf: rope scaling type = NONE\n", | |
| "INFO:hf-to-gguf:gguf: rope theta = 1000000.0\n", | |
| "INFO:hf-to-gguf:gguf: rms norm epsilon = 1e-05\n", | |
| "INFO:hf-to-gguf:gguf: file type = 1\n", | |
| "WARNING:gguf.gguf_writer:Duplicated key name 'lfm2.attention.layer_norm_rms_epsilon', overwriting it with new value 1e-05 of type FLOAT32\n", | |
| "WARNING:gguf.gguf_writer:Duplicated key name 'lfm2.feed_forward_length', overwriting it with new value 8192 of type UINT32\n", | |
| "INFO:hf-to-gguf:Set model quantization version\n", | |
| "INFO:hf-to-gguf:Set model tokenizer\n", | |
| "INFO:numexpr.utils:NumExpr defaulting to 2 threads.\n", | |
| "WARNING:gguf.vocab:Unknown separator token '<|startoftext|>' in TemplateProcessing<pair>\n", | |
| "INFO:gguf.vocab:Adding 63683 merge(s).\n", | |
| "INFO:gguf.vocab:Setting special token type bos to 1\n", | |
| "INFO:gguf.vocab:Setting special token type eos to 7\n", | |
| "INFO:gguf.vocab:Setting special token type pad to 0\n", | |
| "INFO:gguf.vocab:Setting add_bos_token to True\n", | |
| "INFO:gguf.vocab:Setting add_sep_token to False\n", | |
| "INFO:gguf.vocab:Setting chat_template to {{- bos_token -}}\n", | |
| "{%- set system_prompt = \"\" -%}\n", | |
| "{%- set ns = namespace(system_prompt=\"\") -%}\n", | |
| "{%- if messages[0][\"role\"] == \"system\" -%}\n", | |
| "\t{%- set ns.system_prompt = messages[0][\"content\"] -%}\n", | |
| "\t{%- set messages = messages[1:] -%}\n", | |
| "{%- endif -%}\n", | |
| "{%- if tools -%}\n", | |
| "\t{%- set ns.system_prompt = ns.system_prompt + (\"\\n\" if ns.system_prompt else \"\") + \"List of tools: <|tool_list_start|>[\" -%}\n", | |
| "\t{%- for tool in tools -%}\n", | |
| "\t\t{%- if tool is not string -%}\n", | |
| " {%- set tool = tool | tojson -%}\n", | |
| "\t\t{%- endif -%}\n", | |
| "\t\t{%- set ns.system_prompt = ns.system_prompt + tool -%}\n", | |
| " {%- if not loop.last -%}\n", | |
| " {%- set ns.system_prompt = ns.system_prompt + \", \" -%}\n", | |
| " {%- endif -%}\n", | |
| "\t{%- endfor -%}\n", | |
| "\t{%- set ns.system_prompt = ns.system_prompt + \"]<|tool_list_end|>\" -%}\n", | |
| "{%- endif -%}\n", | |
| "{%- if ns.system_prompt -%}\n", | |
| "\t{{- \"<|im_start|>system\\n\" + ns.system_prompt + \"<|im_end|>\\n\" -}}\n", | |
| "{%- endif -%}\n", | |
| "{%- for message in messages -%}\n", | |
| "\t{{- \"<|im_start|>\" + message[\"role\"] + \"\\n\" -}}\n", | |
| "\t{%- set content = message[\"content\"] -%}\n", | |
| "\t{%- if content is not string -%}\n", | |
| "\t\t{%- set content = content | tojson -%}\n", | |
| "\t{%- endif -%}\n", | |
| "\t{%- if message[\"role\"] == \"tool\" -%}\n", | |
| "\t\t{%- set content = \"<|tool_response_start|>\" + content + \"<|tool_response_end|>\" -%}\n", | |
| "\t{%- endif -%}\n", | |
| "\t{{- content + \"<|im_end|>\\n\" -}}\n", | |
| "{%- endfor -%}\n", | |
| "{%- if add_generation_prompt -%}\n", | |
| "\t{{- \"<|im_start|>assistant\\n\" -}}\n", | |
| "{%- endif -%}\n", | |
| "INFO:gguf.gguf_writer:Writing the following files:\n", | |
| "INFO:gguf.gguf_writer:/tmp/gguf_output/qmd-query-expansion-lfm2-f16.gguf: n_tensors = 148, total_size = 2.3G\n", | |
| "Writing: 100% 2.34G/2.34G [00:25<00:00, 92.3Mbyte/s]\n", | |
| "INFO:hf-to-gguf:Model successfully exported to /tmp/gguf_output/qmd-query-expansion-lfm2-f16.gguf\n", | |
| " FP16: 2234.8 MB\n", | |
| "Quantizing to Q8_0...\n", | |
| "main: build = 1 (abb9f3c)\n", | |
| "main: built with GNU 11.4.0 for Linux x86_64\n", | |
| "main: quantizing '/tmp/gguf_output/qmd-query-expansion-lfm2-f16.gguf' to '/tmp/gguf_output/qmd-query-expansion-lfm2-q8_0.gguf' as Q8_0\n", | |
| "llama_model_loader: loaded meta data with 27 key-value pairs and 148 tensors from /tmp/gguf_output/qmd-query-expansion-lfm2-f16.gguf (version GGUF V3 (latest))\n", | |
| "llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.\n", | |
| "llama_model_loader: - kv 0: general.architecture str = lfm2\n", | |
| "llama_model_loader: - kv 1: general.type str = model\n", | |
| "llama_model_loader: - kv 2: general.name str = Merged_Model\n", | |
| "llama_model_loader: - kv 3: general.size_label str = 1.2B\n", | |
| "llama_model_loader: - kv 4: lfm2.block_count u32 = 16\n", | |
| "llama_model_loader: - kv 5: lfm2.context_length u32 = 128000\n", | |
| "llama_model_loader: - kv 6: lfm2.embedding_length u32 = 2048\n", | |
| "llama_model_loader: - kv 7: lfm2.feed_forward_length u32 = 8192\n", | |
| "llama_model_loader: - kv 8: lfm2.attention.head_count u32 = 32\n", | |
| "llama_model_loader: - kv 9: lfm2.attention.head_count_kv arr[i32,16] = [0, 0, 8, 0, 0, 8, 0, 0, 8, 0, 8, 0, ...\n", | |
| "llama_model_loader: - kv 10: lfm2.rope.freq_base f32 = 1000000.000000\n", | |
| "llama_model_loader: - kv 11: lfm2.attention.layer_norm_rms_epsilon f32 = 0.000010\n", | |
| "llama_model_loader: - kv 12: general.file_type u32 = 1\n", | |
| "llama_model_loader: - kv 13: lfm2.vocab_size u32 = 65536\n", | |
| "llama_model_loader: - kv 14: lfm2.shortconv.l_cache u32 = 3\n", | |
| "llama_model_loader: - kv 15: general.quantization_version u32 = 2\n", | |
| "llama_model_loader: - kv 16: tokenizer.ggml.model str = gpt2\n", | |
| "llama_model_loader: - kv 17: tokenizer.ggml.pre str = lfm2\n", | |
| "llama_model_loader: - kv 18: tokenizer.ggml.tokens arr[str,65536] = [\"<|pad|>\", \"<|startoftext|>\", \"<|end...\n", | |
| "llama_model_loader: - kv 19: tokenizer.ggml.token_type arr[i32,65536] = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...\n", | |
| "llama_model_loader: - kv 20: tokenizer.ggml.merges arr[str,63683] = [\"Ġ Ġ\", \"Ġ ĠĠ\", \"ĠĠ Ġ\", \"Ġ ...\n", | |
| "llama_model_loader: - kv 21: tokenizer.ggml.bos_token_id u32 = 1\n", | |
| "llama_model_loader: - kv 22: tokenizer.ggml.eos_token_id u32 = 7\n", | |
| "llama_model_loader: - kv 23: tokenizer.ggml.padding_token_id u32 = 0\n", | |
| "llama_model_loader: - kv 24: tokenizer.ggml.add_bos_token bool = true\n", | |
| "llama_model_loader: - kv 25: tokenizer.ggml.add_sep_token bool = false\n", | |
| "llama_model_loader: - kv 26: tokenizer.chat_template str = {{- bos_token -}}\\n{%- set system_prom...\n", | |
| "llama_model_loader: - type f32: 55 tensors\n", | |
| "llama_model_loader: - type f16: 93 tensors\n", | |
| "[ 1/ 148] token_embd.weight - [ 2048, 65536, 1, 1], type = f16, converting to q8_0 .. size = 256.00 MiB -> 136.00 MiB\n", | |
| "[ 2/ 148] token_embd_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 3/ 148] blk.0.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 4/ 148] blk.0.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 5/ 148] blk.0.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 6/ 148] blk.0.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 7/ 148] blk.0.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 8/ 148] blk.0.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n", | |
| "[ 9/ 148] blk.0.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n", | |
| "[ 10/ 148] blk.0.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 11/ 148] blk.1.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 12/ 148] blk.1.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 13/ 148] blk.1.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 14/ 148] blk.1.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 15/ 148] blk.1.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 16/ 148] blk.1.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n", | |
| "[ 17/ 148] blk.1.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n", | |
| "[ 18/ 148] blk.1.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 19/ 148] blk.2.attn_k.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n", | |
| "[ 20/ 148] blk.2.attn_k_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n", | |
| "[ 21/ 148] blk.2.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 22/ 148] blk.2.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 23/ 148] blk.2.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 24/ 148] blk.2.attn_q_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n", | |
| "[ 25/ 148] blk.2.attn_v.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n", | |
| "[ 26/ 148] blk.2.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 27/ 148] blk.2.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 28/ 148] blk.2.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 29/ 148] blk.2.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 30/ 148] blk.3.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 31/ 148] blk.3.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 32/ 148] blk.3.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 33/ 148] blk.3.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 34/ 148] blk.3.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 35/ 148] blk.3.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n", | |
| "[ 36/ 148] blk.3.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n", | |
| "[ 37/ 148] blk.3.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 38/ 148] blk.4.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 39/ 148] blk.4.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 40/ 148] blk.4.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 41/ 148] blk.4.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 42/ 148] blk.4.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 43/ 148] blk.4.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n", | |
| "[ 44/ 148] blk.4.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n", | |
| "[ 45/ 148] blk.4.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 46/ 148] blk.5.attn_k.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n", | |
| "[ 47/ 148] blk.5.attn_k_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n", | |
| "[ 48/ 148] blk.5.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 49/ 148] blk.5.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 50/ 148] blk.5.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 51/ 148] blk.5.attn_q_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n", | |
| "[ 52/ 148] blk.5.attn_v.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n", | |
| "[ 53/ 148] blk.5.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 54/ 148] blk.5.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 55/ 148] blk.5.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 56/ 148] blk.5.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 57/ 148] blk.6.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 58/ 148] blk.6.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 59/ 148] blk.6.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 60/ 148] blk.6.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 61/ 148] blk.6.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 62/ 148] blk.6.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n", | |
| "[ 63/ 148] blk.6.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n", | |
| "[ 64/ 148] blk.6.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 65/ 148] blk.7.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 66/ 148] blk.7.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 67/ 148] blk.7.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 68/ 148] blk.7.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 69/ 148] blk.7.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 70/ 148] blk.7.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n", | |
| "[ 71/ 148] blk.7.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n", | |
| "[ 72/ 148] blk.7.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 73/ 148] blk.8.attn_k.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n", | |
| "[ 74/ 148] blk.8.attn_k_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n", | |
| "[ 75/ 148] blk.8.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 76/ 148] blk.8.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 77/ 148] blk.8.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 78/ 148] blk.8.attn_q_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n", | |
| "[ 79/ 148] blk.8.attn_v.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n", | |
| "[ 80/ 148] blk.8.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 81/ 148] blk.8.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 82/ 148] blk.8.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 83/ 148] blk.8.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 84/ 148] blk.9.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 85/ 148] blk.9.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 86/ 148] blk.9.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 87/ 148] blk.9.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 88/ 148] blk.9.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 89/ 148] blk.9.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n", | |
| "[ 90/ 148] blk.9.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n", | |
| "[ 91/ 148] blk.9.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 92/ 148] blk.10.attn_k.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n", | |
| "[ 93/ 148] blk.10.attn_k_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n", | |
| "[ 94/ 148] blk.10.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 95/ 148] blk.10.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 96/ 148] blk.10.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 97/ 148] blk.10.attn_q_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n", | |
| "[ 98/ 148] blk.10.attn_v.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n", | |
| "[ 99/ 148] blk.10.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 100/ 148] blk.10.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 101/ 148] blk.10.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 102/ 148] blk.10.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 103/ 148] blk.11.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 104/ 148] blk.11.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 105/ 148] blk.11.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 106/ 148] blk.11.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 107/ 148] blk.11.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 108/ 148] blk.11.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n", | |
| "[ 109/ 148] blk.11.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n", | |
| "[ 110/ 148] blk.11.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 111/ 148] blk.12.attn_k.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n", | |
| "[ 112/ 148] blk.12.attn_k_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n", | |
| "[ 113/ 148] blk.12.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 114/ 148] blk.12.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 115/ 148] blk.12.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 116/ 148] blk.12.attn_q_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n", | |
| "[ 117/ 148] blk.12.attn_v.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n", | |
| "[ 118/ 148] blk.12.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 119/ 148] blk.12.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 120/ 148] blk.12.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 121/ 148] blk.12.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 122/ 148] blk.13.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 123/ 148] blk.13.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 124/ 148] blk.13.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 125/ 148] blk.13.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 126/ 148] blk.13.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 127/ 148] blk.13.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n", | |
| "[ 128/ 148] blk.13.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n", | |
| "[ 129/ 148] blk.13.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 130/ 148] blk.14.attn_k.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n", | |
| "[ 131/ 148] blk.14.attn_k_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n", | |
| "[ 132/ 148] blk.14.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 133/ 148] blk.14.attn_output.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 134/ 148] blk.14.attn_q.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "[ 135/ 148] blk.14.attn_q_norm.weight - [ 64, 1, 1, 1], type = f32, size = 0.000 MiB\n", | |
| "[ 136/ 148] blk.14.attn_v.weight - [ 2048, 512, 1, 1], type = f16, converting to q8_0 .. size = 2.00 MiB -> 1.06 MiB\n", | |
| "[ 137/ 148] blk.14.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 138/ 148] blk.14.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 139/ 148] blk.14.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 140/ 148] blk.14.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 141/ 148] blk.15.attn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 142/ 148] blk.15.ffn_down.weight - [ 8192, 2048, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 143/ 148] blk.15.ffn_gate.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 144/ 148] blk.15.ffn_norm.weight - [ 2048, 1, 1, 1], type = f32, size = 0.008 MiB\n", | |
| "[ 145/ 148] blk.15.ffn_up.weight - [ 2048, 8192, 1, 1], type = f16, converting to q8_0 .. size = 32.00 MiB -> 17.00 MiB\n", | |
| "[ 146/ 148] blk.15.shortconv.conv.weight - [ 3, 2048, 1, 1], type = f32, size = 0.023 MiB\n", | |
| "[ 147/ 148] blk.15.shortconv.in_proj.weight - [ 2048, 6144, 1, 1], type = f16, converting to q8_0 .. size = 24.00 MiB -> 12.75 MiB\n", | |
| "[ 148/ 148] blk.15.shortconv.out_proj.weight - [ 2048, 2048, 1, 1], type = f16, converting to q8_0 .. size = 8.00 MiB -> 4.25 MiB\n", | |
| "llama_model_quantize_impl: model size = 2232.50 MiB\n", | |
| "llama_model_quantize_impl: quant size = 1186.25 MiB\n", | |
| "\n", | |
| "main: quantize time = 20068.41 ms\n", | |
| "main: total time = 20068.42 ms\n", | |
| " Q8_0: 1188.5 MB\n", | |
| "GGUF conversion complete!\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "!pip install -q --upgrade tokenizers transformers\n", | |
| "import os, subprocess\n", | |
| "gguf_dir = \"/tmp/gguf_output\"\n", | |
| "merged_dir = \"/tmp/merged_model\"\n", | |
| "fp16_file = f\"{gguf_dir}/{MODEL_NAME}-f16.gguf\"\n", | |
| "# Retry FP16 conversion\n", | |
| "print(\"Converting to FP16 GGUF...\")\n", | |
| "!python /tmp/llama.cpp/convert_hf_to_gguf.py {merged_dir} --outfile {fp16_file} --outtype f16\n", | |
| "size_mb = os.path.getsize(fp16_file) / (1024 * 1024)\n", | |
| "print(f\" FP16: {size_mb:.1f} MB\")\n", | |
| "# Quantize to Q8_0\n", | |
| "q8_file = f\"{gguf_dir}/{MODEL_NAME}-q8_0.gguf\"\n", | |
| "print(\"Quantizing to Q8_0...\")\n", | |
| "!chmod +x /tmp/llama.cpp/build/bin/llama-quantize\n", | |
| "!/tmp/llama.cpp/build/bin/llama-quantize {fp16_file} {q8_file} q8_0\n", | |
| "size_mb = os.path.getsize(q8_file) / (1024 * 1024)\n", | |
| "print(f\" Q8_0: {size_mb:.1f} MB\")\n", | |
| "print(\"GGUF conversion complete!\")" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## 8. Upload GGUFs to HuggingFace" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 24, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Uploading to OrcsRise/qmd-query-expansion-lfm2-gguf...\n", | |
| " Uploading qmd-query-expansion-lfm2-q8_0.gguf (1189 MB)...\n" | |
| ] | |
| }, | |
| { | |
| "data": { | |
| "application/vnd.jupyter.widget-view+json": { | |
| "model_id": "c5b7c5b973454cccae5027074ba75274", | |
| "version_major": 2, | |
| "version_minor": 0 | |
| }, | |
| "text/plain": [ | |
| "Processing Files (0 / 0) : | | 0.00B / 0.00B " | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "data": { | |
| "application/vnd.jupyter.widget-view+json": { | |
| "model_id": "406bd64d6a574d59a365fb6893e4fbb0", | |
| "version_major": 2, | |
| "version_minor": 0 | |
| }, | |
| "text/plain": [ | |
| "New Data Upload : | | 0.00B / 0.00B " | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "data": { | |
| "application/vnd.jupyter.widget-view+json": { | |
| "model_id": "18c004aa0a9d4db3ba14e7c4b3e0a7d3", | |
| "version_major": 2, | |
| "version_minor": 0 | |
| }, | |
| "text/plain": [ | |
| " ...-expansion-lfm2-q8_0.gguf: 4%|4 | 50.2MB / 1.25GB " | |
| ] | |
| }, | |
| "metadata": {}, | |
| "output_type": "display_data" | |
| }, | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "\n", | |
| "Uploaded to: https://huggingface.co/OrcsRise/qmd-query-expansion-lfm2-gguf\n", | |
| "\n", | |
| "Add to ~/.zshrc:\n", | |
| "export QMD_GEN_MODEL=\"hf:OrcsRise/qmd-query-expansion-lfm2-gguf/qmd-query-expansion-lfm2-q8_0.gguf\"\n" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "from huggingface_hub import HfApi\n", | |
| "import os\n", | |
| "api = HfApi()\n", | |
| "# Create repo if needed\n", | |
| "api.create_repo(OUTPUT_GGUF_REPO, exist_ok=True)\n", | |
| "print(f\"Uploading to {OUTPUT_GGUF_REPO}...\")\n", | |
| "# Upload Q8_0\n", | |
| "q8_file = f\"/tmp/gguf_output/{MODEL_NAME}-q8_0.gguf\"\n", | |
| "filename = os.path.basename(q8_file)\n", | |
| "print(f\" Uploading {filename} ({os.path.getsize(q8_file) / 1024**2:.0f} MB)...\")\n", | |
| "api.upload_file(\n", | |
| " path_or_fileobj=q8_file,\n", | |
| " path_in_repo=filename,\n", | |
| " repo_id=OUTPUT_GGUF_REPO,\n", | |
| ")\n", | |
| "print(f\"\\nUploaded to: https://huggingface.co/{OUTPUT_GGUF_REPO}\")\n", | |
| "print(f\"\\nAdd to ~/.zshrc:\")\n", | |
| "print(f'export QMD_GEN_MODEL=\"hf:{OUTPUT_GGUF_REPO}/{filename}\"')" | |
| ] | |
| }, | |
| { | |
| "cell_type": "code", | |
| "execution_count": 23, | |
| "metadata": {}, | |
| "outputs": [ | |
| { | |
| "name": "stdout", | |
| "output_type": "stream", | |
| "text": [ | |
| "Uploading to OrcsRise/qmd-query-expansion-lfm2-gguf...\n" | |
| ] | |
| }, | |
| { | |
| "ename": "NameError", | |
| "evalue": "name 'quantized' is not defined", | |
| "output_type": "error", | |
| "traceback": [ | |
| "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", | |
| "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", | |
| "\u001b[0;32m/tmp/ipython-input-4233011047.py\u001b[0m in \u001b[0;36m<cell line: 0>\u001b[0;34m()\u001b[0m\n\u001b[1;32m 5\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 6\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf\"Uploading to {OUTPUT_GGUF_REPO}...\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 7\u001b[0;31m \u001b[0;32mfor\u001b[0m \u001b[0mqfile\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mqtype\u001b[0m \u001b[0;32min\u001b[0m \u001b[0mquantized\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 8\u001b[0m \u001b[0mfilename\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mos\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mpath\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mbasename\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mqfile\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 9\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34mf\" Uploading {filename}...\"\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", | |
| "\u001b[0;31mNameError\u001b[0m: name 'quantized' is not defined" | |
| ] | |
| } | |
| ], | |
| "source": [ | |
| "import os\n", | |
| "\n", | |
| "from huggingface_hub import HfApi\n", | |
| "\n", | |
| "api = HfApi()\n", | |
| "api.create_repo(repo_id=OUTPUT_GGUF_REPO, repo_type=\"model\", exist_ok=True)\n", | |
| "\n", | |
| "# (path, quant type) pairs to upload; only the Q8_0 file is built in this notebook run.\n", | |
| "quantized = [(f\"/tmp/gguf_output/{MODEL_NAME}-q8_0.gguf\", \"Q8_0\")]\n", | |
| "\n", | |
| "print(f\"Uploading to {OUTPUT_GGUF_REPO}...\")\n", | |
| "for qfile, qtype in quantized:\n", | |
| "    filename = os.path.basename(qfile)\n", | |
| "    print(f\" Uploading {filename}...\")\n", | |
| "    api.upload_file(\n", | |
| "        path_or_fileobj=qfile,\n", | |
| "        path_in_repo=filename,\n", | |
| "        repo_id=OUTPUT_GGUF_REPO,\n", | |
| "    )\n", | |
| "\n", | |
| "# Upload README\n", | |
| "readme = f\"\"\"---\n", | |
| "base_model: {BASE_MODEL}\n", | |
| "tags: [gguf, llama.cpp, quantized, query-expansion, qmd, lfm2]\n", | |
| "---\n", | |
| "# {MODEL_NAME} (GGUF)\n", | |
| "\n", | |
| "Fine-tuned LiquidAI LFM2-1.2B for QMD query expansion.\n", | |
| "\n", | |
| "## Details\n", | |
| "- **Base:** {BASE_MODEL}\n", | |
| "- **Training:** SFT with LoRA (rank 16) on {DATASET}\n", | |
| "- **Task:** Query expansion producing lex/vec/hyde format\n", | |
| "\n", | |
| "## Usage with qmd\n", | |
| "```bash\n", | |
| "export QMD_GEN_MODEL=\"hf:{OUTPUT_GGUF_REPO}/{MODEL_NAME}-q8_0.gguf\"\n", | |
| "qmd query \"your search\"\n", | |
| "```\n", | |
| "\"\"\"\n", | |
| "api.upload_file(\n", | |
| " path_or_fileobj=readme.encode(),\n", | |
| " path_in_repo=\"README.md\",\n", | |
| " repo_id=OUTPUT_GGUF_REPO,\n", | |
| ")\n", | |
| "\n", | |
| "print(f\"\\nDone! https://huggingface.co/{OUTPUT_GGUF_REPO}\")\n", | |
| "print(f\"\\nAdd this to your ~/.zshrc:\")\n", | |
| "print(f'export QMD_GEN_MODEL=\"hf:{OUTPUT_GGUF_REPO}/{MODEL_NAME}-q8_0.gguf\"')" | |
| ] | |
| }, | |
| { | |
| "cell_type": "markdown", | |
| "metadata": {}, | |
| "source": [ | |
| "## Done!\n", | |
| "\n", | |
| "Copy the export line above and add it to your `~/.zshrc`, then:\n", | |
| "\n", | |
| "```bash\n", | |
| "source ~/.zshrc\n", | |
| "qmd query \"test\"\n", | |
| "```\n", | |
| "\n", | |
| "The fine-tuned LFM2 will produce clean, diverse `lex:/vec:/hyde:` expansions, 2x faster than Qwen3." | |
| ] | |
| } | |
| ], | |
| "metadata": { | |
| "accelerator": "GPU", | |
| "colab": { | |
| "gpuType": "T4", | |
| "provenance": [] | |
| }, | |
| "kernelspec": { | |
| "display_name": "Python 3 (ipykernel)", | |
| "language": "python", | |
| "name": "python3" | |
| }, | |
| "language_info": { | |
| "codemirror_mode": { | |
| "name": "ipython", | |
| "version": 3 | |
| }, | |
| "file_extension": ".py", | |
| "mimetype": "text/x-python", | |
| "name": "python", | |
| "nbconvert_exporter": "python", | |
| "pygments_lexer": "ipython3", | |
| "version": "3.12.12" | |
| } | |
| }, | |
| "nbformat": 4, | |
| "nbformat_minor": 0 | |
| } |
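
The per-tensor sizes in the `llama-quantize` log (for example `256.00 MiB -> 136.00 MiB` for `token_embd.weight`) can be checked by hand. The sketch below is not part of the notebook; it assumes the standard ggml Q8_0 layout of 32 int8 weights plus one fp16 scale per block (34 bytes per 32 weights), and the helper names `f16_mib` and `q8_0_mib` are hypothetical, chosen only for illustration.

```python
import math

MIB = 1024 * 1024
Q8_0_BLOCK_WEIGHTS = 32      # weights per Q8_0 block
Q8_0_BLOCK_BYTES = 32 + 2    # 32 int8 values + one fp16 scale


def f16_mib(shape):
    """Size in MiB of a tensor stored as fp16 (2 bytes per weight)."""
    return math.prod(shape) * 2 / MIB


def q8_0_mib(shape):
    """Size in MiB of the same tensor stored as Q8_0 blocks.

    Assumes the element count is a multiple of 32, which holds for
    every quantized tensor listed in the log.
    """
    blocks = math.prod(shape) // Q8_0_BLOCK_WEIGHTS
    return blocks * Q8_0_BLOCK_BYTES / MIB


# Shapes copied from the quantization log.
print(f16_mib((2048, 65536)), q8_0_mib((2048, 65536)))  # 256.0 136.0
print(f16_mib((8192, 2048)), q8_0_mib((8192, 2048)))    # 32.0 17.0
```

The ratio is 34/64, roughly 0.53, which is consistent with the whole model shrinking from 2232.50 MiB to 1186.25 MiB: the f32 norm and conv tensors are left unquantized, so the overall ratio sits slightly above that of the quantized tensors alone.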