Created
February 19, 2026 18:54
QMD Query Expansion: Fine-tune LiquidAI LFM2-1.2B (v2 - fixed overfitting)
# QMD Query Expansion: Fine-tune LiquidAI LFM2-1.2B

Fine-tunes LFM2-1.2B on qmd's query expansion dataset to produce structured `lex:/vec:/hyde:` output.
Then converts to GGUF (Q8_0) for local inference.

**Runtime**: Set to **T4 GPU** (Runtime > Change runtime type > T4)

**Time**: ~1 hour on T4, ~15 min on A100
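The `lex:/vec:/hyde:` prefixes are the whole point of the fine-tune, so it helps to see what consuming that output looks like. A minimal parser sketch, assuming one prefixed expansion per line (the field semantics, i.e. lexical variants for `lex:`, embedding-oriented rephrasings for `vec:`, and a HyDE-style hypothetical document for `hyde:`, are assumptions and not confirmed by this notebook):

```python
def parse_expansion(text: str) -> dict:
    """Group model output lines by their lex:/vec:/hyde: prefix (assumed format)."""
    out = {"lex": [], "vec": [], "hyde": []}
    for line in text.splitlines():
        prefix, _, rest = line.partition(":")  # split at the first colon only
        if prefix in out and rest:
            out[prefix].append(rest.strip())
    return out

sample = (
    "lex: query expansion\n"
    "vec: rewriting search queries\n"
    "hyde: A short passage about query expansion."
)
print(parse_expansion(sample))
```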
## Config

```python
# ============================================================
# CHANGE THIS to your HuggingFace username
# ============================================================
HF_USERNAME = "OrcsRise"  # <-- change this!
# ============================================================

BASE_MODEL = "LiquidAI/LFM2-1.2B"
DATASET = "tobil/qmd-query-expansion-train"
OUTPUT_SFT = f"{HF_USERNAME}/qmd-query-expansion-lfm2-sft"
OUTPUT_GGUF_REPO = f"{HF_USERNAME}/qmd-query-expansion-lfm2-gguf"
MODEL_NAME = "qmd-query-expansion-lfm2"

assert len(HF_USERNAME) > 0 and ' ' not in HF_USERNAME, "Set HF_USERNAME above!"
print(f"SFT adapter -> {OUTPUT_SFT}")
print(f"GGUF repo   -> {OUTPUT_GGUF_REPO}")
```
## 1. Install dependencies

```python
# Version specifiers are quoted so the shell doesn't parse ">=" as a redirect
!pip install -q torch "trl>=0.12.0" "peft>=0.7.0" "transformers>=4.55.0" \
    "accelerate>=0.24.0" "huggingface_hub>=0.20.0" datasets bitsandbytes \
    sentencepiece protobuf numpy gguf tokenizers
```
## 2. Log in to HuggingFace

```python
from huggingface_hub import login, notebook_login

# Option A: Use Colab secrets (Settings > Secrets > add HF_TOKEN)
try:
    from google.colab import userdata
    hf_token = userdata.get('HF_TOKEN')
    login(token=hf_token)
    print("Logged in via Colab secret")
except Exception:
    # Option B: Interactive login
    notebook_login()
```
## 3. Check GPU

```python
import torch

print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB")
else:
    raise RuntimeError("No GPU! Go to Runtime > Change runtime type > T4 GPU")
```
## 4. Load dataset

```python
from datasets import load_dataset

print(f"Loading dataset: {DATASET}...")
dataset = load_dataset(DATASET, split="train")
print(f"Dataset loaded: {len(dataset)} examples")

split = dataset.train_test_split(test_size=0.1, seed=42)
train_dataset = split["train"]
eval_dataset = split["test"]
print(f"  Train: {len(train_dataset)}, Eval: {len(eval_dataset)}")

# Preview
ex = train_dataset[0]["messages"]
print("\n--- Example ---")
print(f"User: {ex[0]['content'][:200]}")
print(f"Assistant: {ex[1]['content'][:300]}")
```
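Before handing the split to the trainer, it is worth sanity-checking that every example matches the two-turn chat schema previewed above. A small validator, sketched under the assumption that each example is exactly one user turn followed by one assistant turn (the toy example below is hypothetical, not taken from the dataset):

```python
def check_messages(messages) -> bool:
    """Return True iff messages is a non-empty user turn followed by an assistant turn."""
    roles = [m.get("role") for m in messages]
    return roles == ["user", "assistant"] and all(m.get("content") for m in messages)

example = [
    {"role": "user", "content": "expand: rust async runtime"},
    {"role": "assistant", "content": "lex: tokio\nvec: asynchronous executor in Rust"},
]
print(check_messages(example))  # True
```

In the notebook this could be applied as `all(check_messages(r["messages"]) for r in train_dataset)` before step 5.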
## 5. Configure & train (SFT with LoRA)

```python
from peft import LoraConfig
from trl import SFTTrainer, SFTConfig

config = SFTConfig(
    output_dir=MODEL_NAME,
    push_to_hub=True,
    hub_model_id=OUTPUT_SFT,
    hub_strategy="every_save",

    # Training hyperparams — 2 epochs to avoid overfitting on 4.6K examples
    num_train_epochs=2,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,  # effective batch = 16
    learning_rate=2e-4,
    max_length=512,

    # Logging & saving
    logging_steps=10,
    save_strategy="steps",
    save_steps=100,
    save_total_limit=3,
    eval_strategy="steps",
    eval_steps=100,

    # Load best model at end (by eval_loss)
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,

    # Schedule
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
    bf16=True,
    report_to="none",
)

# LoRA config — LFM2 hybrid architecture targets
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "out_proj", "in_proj", "w1", "w2", "w3"],
)

print("Initializing SFT trainer...")
trainer = SFTTrainer(
    model=BASE_MODEL,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    args=config,
    peft_config=peft_config,
)

print(f"Trainable params: {sum(p.numel() for p in trainer.model.parameters() if p.requires_grad):,}")
print(f"Total params: {sum(p.numel() for p in trainer.model.parameters()):,}")
print("\nStarting SFT training...")
trainer.train()

print("\nPushing best LoRA adapter to Hub...")
trainer.push_to_hub()
print(f"SFT adapter: https://huggingface.co/{OUTPUT_SFT}")
```
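The hyperparameters above work out to roughly the following step counts. This is a sketch: the 4.6K figure comes from the comment in the config, so the exact numbers depend on the real dataset size and split:

```python
import math

n_train = int(4600 * 0.9)           # ~4.6K examples, 10% held out for eval
effective_batch = 4 * 4             # per_device_train_batch_size * gradient_accumulation_steps
steps_per_epoch = math.ceil(n_train / effective_batch)
total_steps = steps_per_epoch * 2   # num_train_epochs = 2
warmup_steps = int(total_steps * 0.03)  # warmup_ratio

print(steps_per_epoch, total_steps, warmup_steps)  # 259 518 15
```

With `eval_steps=100`, that gives about five evaluations over the run, which is why `save_steps` matches it: the best-checkpoint selection can only pick among evaluated checkpoints.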
## 6. Merge LoRA adapter with base model

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

print(f"Loading base model: {BASE_MODEL}...")
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, dtype=torch.bfloat16, device_map="auto", trust_remote_code=True,
)

print("Merging best LoRA adapter...")
base_model.config.tie_word_embeddings = False
model = PeftModel.from_pretrained(base_model, MODEL_NAME, local_files_only=True)
model = model.merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)

merged_dir = "/tmp/merged_model"
print(f"Saving merged model to {merged_dir}...")
model.save_pretrained(merged_dir, safe_serialization=True)
tokenizer.save_pretrained(merged_dir)
print("Merged model saved")
```
## 7. Convert to GGUF

```python
import os

# Ensure latest tokenizers (LFM2 needs TokenizersBackend)
!pip install -q --upgrade tokenizers transformers

# Setup llama.cpp
print("Setting up llama.cpp...")
!apt-get update -qq && apt-get install -y -qq build-essential cmake git > /dev/null 2>&1
if not os.path.exists("/tmp/llama.cpp"):
    !git clone --depth 1 https://github.com/ggerganov/llama.cpp.git /tmp/llama.cpp
!pip install -q -r /tmp/llama.cpp/requirements.txt

# Build quantize tool
print("\nBuilding llama-quantize...")
!cmake -B /tmp/llama.cpp/build -S /tmp/llama.cpp -DGGML_CUDA=OFF > /dev/null 2>&1
!cmake --build /tmp/llama.cpp/build --target llama-quantize -j 4 > /dev/null 2>&1

# Convert to FP16 GGUF
gguf_dir = "/tmp/gguf_output"
os.makedirs(gguf_dir, exist_ok=True)
fp16_file = f"{gguf_dir}/{MODEL_NAME}-f16.gguf"

print("\nConverting to FP16 GGUF...")
!python /tmp/llama.cpp/convert_hf_to_gguf.py /tmp/merged_model --outfile {fp16_file} --outtype f16

size_mb = os.path.getsize(fp16_file) / (1024 * 1024)
print(f"  FP16: {size_mb:.1f} MB")

# Quantize to Q8_0
quantize_bin = "/tmp/llama.cpp/build/bin/llama-quantize"
q8_file = f"{gguf_dir}/{MODEL_NAME}-q8_0.gguf"

print("\nQuantizing to Q8_0...")
!chmod +x {quantize_bin}
!{quantize_bin} {fp16_file} {q8_file} q8_0

q8_size = os.path.getsize(q8_file) / (1024 * 1024)
print(f"  Q8_0: {q8_size:.1f} MB")

# Cleanup FP16 to save disk
os.remove(fp16_file)
print("\nGGUF conversion complete!")
```
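As a rough cross-check on the sizes printed above: Q8_0 stores weights in 32-element blocks of int8 values plus one fp16 scale, i.e. about 8.5 bits per weight, so for a ~1.2B-parameter model one would expect roughly the following. Treat these as ballpark figures; actual GGUF files also carry metadata and keep some tensors at higher precision:

```python
params = 1.2e9
f16_mb = params * 2 / 2**20          # 2 bytes per weight -> ~2.3 GB
q8_mb = params * (8.5 / 8) / 2**20   # ~8.5 bits per weight in Q8_0 blocks -> ~1.2 GB
print(f"f16 ~{f16_mb:.0f} MB, q8_0 ~{q8_mb:.0f} MB")
```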
## 8. Upload GGUF to HuggingFace

````python
from huggingface_hub import HfApi
import os

api = HfApi()
api.create_repo(OUTPUT_GGUF_REPO, exist_ok=True)

# Upload Q8_0 GGUF
q8_file = f"/tmp/gguf_output/{MODEL_NAME}-q8_0.gguf"
filename = os.path.basename(q8_file)
print(f"Uploading {filename} ({os.path.getsize(q8_file) / 1024**2:.0f} MB)...")
api.upload_file(
    path_or_fileobj=q8_file,
    path_in_repo=filename,
    repo_id=OUTPUT_GGUF_REPO,
)

# Upload README
readme = f"""---
base_model: {BASE_MODEL}
tags: [gguf, llama.cpp, quantized, query-expansion, qmd, lfm2]
---
# {MODEL_NAME} (GGUF)

Fine-tuned LiquidAI LFM2-1.2B for QMD query expansion.

## Details
- **Base:** {BASE_MODEL}
- **Training:** SFT with LoRA (rank 16) on {DATASET}
- **Epochs:** 2 (best checkpoint by eval_loss)
- **Task:** Query expansion producing lex/vec/hyde format

## Usage with qmd
```bash
export QMD_GEN_MODEL="hf:{OUTPUT_GGUF_REPO}/{MODEL_NAME}-q8_0.gguf"
qmd query "your search"
```
"""
api.upload_file(
    path_or_fileobj=readme.encode(),
    path_in_repo="README.md",
    repo_id=OUTPUT_GGUF_REPO,
)

print(f"\nUploaded to: https://huggingface.co/{OUTPUT_GGUF_REPO}")
print("\nAdd to ~/.zshrc:")
print(f'export QMD_GEN_MODEL="hf:{OUTPUT_GGUF_REPO}/{MODEL_NAME}-q8_0.gguf"')
````
## Done!

Copy the export line above and add it to your `~/.zshrc`, then:

```bash
source ~/.zshrc
qmd query "test"
```