{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# QMD Query Expansion: Fine-tune LiquidAI LFM2-1.2B\n",
"\n",
"Fine-tunes LFM2-1.2B on qmd's query expansion dataset to produce structured `lex:/vec:/hyde:` output.\n",
"Then converts to GGUF (Q8_0) for local inference.\n",
"\n",
"**Runtime**: Set to **T4 GPU** (Runtime > Change runtime type > T4)\n",
"\n",
"**Time**: ~1 hour on T4, ~15 min on A100"
]
},
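{
"cell_type": "markdown",
"metadata": {},
"source": [
"For orientation, the target output is a set of prefixed lines per query. The example below is hypothetical (not taken from the dataset) and only illustrates the shape:\n",
"\n",
"```\n",
"lex: keyword-style rewrite of the query\n",
"vec: natural-language rephrasing for embedding search\n",
"hyde: a short hypothetical document that would answer the query\n",
"```"
]
},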
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Config"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ============================================================\n",
"# CHANGE THIS to your HuggingFace username\n",
"# ============================================================\n",
"HF_USERNAME = \"OrcsRise\" # <-- change this!\n",
"# ============================================================\n",
"\n",
"BASE_MODEL = \"LiquidAI/LFM2-1.2B\"\n",
"DATASET = \"tobil/qmd-query-expansion-train\"\n",
"OUTPUT_SFT = f\"{HF_USERNAME}/qmd-query-expansion-lfm2-sft\"\n",
"OUTPUT_GGUF_REPO = f\"{HF_USERNAME}/qmd-query-expansion-lfm2-gguf\"\n",
"MODEL_NAME = \"qmd-query-expansion-lfm2\"\n",
"\n",
"assert len(HF_USERNAME) > 0 and ' ' not in HF_USERNAME, \"Set HF_USERNAME above!\"\n",
"print(f\"SFT adapter -> {OUTPUT_SFT}\")\n",
"print(f\"GGUF repo -> {OUTPUT_GGUF_REPO}\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. Install dependencies"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install -q torch trl>=0.12.0 peft>=0.7.0 \"transformers>=4.55.0\" \\\n",
" accelerate>=0.24.0 huggingface_hub>=0.20.0 datasets bitsandbytes \\\n",
" sentencepiece protobuf numpy gguf tokenizers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. Login to HuggingFace"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from huggingface_hub import login, notebook_login\n",
"\n",
"# Option A: Use Colab secrets (Settings > Secrets > add HF_TOKEN)\n",
"try:\n",
" from google.colab import userdata\n",
" hf_token = userdata.get('HF_TOKEN')\n",
" login(token=hf_token)\n",
" print(\"Logged in via Colab secret\")\n",
"except Exception:\n",
" # Option B: Interactive login\n",
" notebook_login()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. Check GPU"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"print(f\"CUDA available: {torch.cuda.is_available()}\")\n",
"if torch.cuda.is_available():\n",
" print(f\"GPU: {torch.cuda.get_device_name(0)}\")\n",
" print(f\"VRAM: {torch.cuda.get_device_properties(0).total_memory / 1024**3:.1f} GB\")\n",
"else:\n",
" raise RuntimeError(\"No GPU! Go to Runtime > Change runtime type > T4 GPU\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4. Load dataset"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from datasets import load_dataset\n",
"\n",
"print(f\"Loading dataset: {DATASET}...\")\n",
"dataset = load_dataset(DATASET, split=\"train\")\n",
"print(f\"Dataset loaded: {len(dataset)} examples\")\n",
"\n",
"split = dataset.train_test_split(test_size=0.1, seed=42)\n",
"train_dataset = split[\"train\"]\n",
"eval_dataset = split[\"test\"]\n",
"print(f\" Train: {len(train_dataset)}, Eval: {len(eval_dataset)}\")\n",
"\n",
"# Preview\n",
"ex = train_dataset[0][\"messages\"]\n",
"print(f\"\\n--- Example ---\")\n",
"print(f\"User: {ex[0]['content'][:200]}\")\n",
"print(f\"Assistant: {ex[1]['content'][:300]}\")"
]
},
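{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optional sanity check on the label format. A rough sketch that assumes the assistant message is the second entry in `messages` (as in the preview above) and that outputs use the `lex:`/`vec:`/`hyde:` prefixes; adjust if your dataset differs."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"# Count how often each expected prefix starts a line in a sample of\n",
"# assistant outputs. A prefix stuck at zero suggests a format mismatch.\n",
"prefix_counts = Counter()\n",
"sample = train_dataset.select(range(min(500, len(train_dataset))))\n",
"for row in sample:\n",
"    for line in row[\"messages\"][1][\"content\"].splitlines():\n",
"        for prefix in (\"lex:\", \"vec:\", \"hyde:\"):\n",
"            if line.strip().startswith(prefix):\n",
"                prefix_counts[prefix] += 1\n",
"print(prefix_counts)"
]
},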
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5. Configure & train (SFT with LoRA)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from peft import LoraConfig\n",
"from transformers import AutoTokenizer\n",
"from trl import SFTTrainer, SFTConfig\n",
"\n",
"config = SFTConfig(\n",
" output_dir=MODEL_NAME,\n",
" push_to_hub=True,\n",
" hub_model_id=OUTPUT_SFT,\n",
" hub_strategy=\"every_save\",\n",
"\n",
" # Training hyperparams — 2 epochs to avoid overfitting on 4.6K examples\n",
" num_train_epochs=2,\n",
" per_device_train_batch_size=4,\n",
" gradient_accumulation_steps=4, # effective batch = 16\n",
" learning_rate=2e-4,\n",
" max_length=512,\n",
"\n",
" # Logging & saving\n",
" logging_steps=10,\n",
" save_strategy=\"steps\",\n",
" save_steps=100,\n",
" save_total_limit=3,\n",
" eval_strategy=\"steps\",\n",
" eval_steps=100,\n",
"\n",
" # Load best model at end (by eval_loss)\n",
" load_best_model_at_end=True,\n",
" metric_for_best_model=\"eval_loss\",\n",
" greater_is_better=False,\n",
"\n",
" # Schedule\n",
" warmup_ratio=0.03,\n",
" lr_scheduler_type=\"cosine\",\n",
" bf16=True,\n",
" report_to=\"none\",\n",
")\n",
"\n",
"# LoRA config — LFM2 hybrid architecture targets\n",
"peft_config = LoraConfig(\n",
" r=16,\n",
" lora_alpha=32,\n",
" lora_dropout=0.0,\n",
" bias=\"none\",\n",
" task_type=\"CAUSAL_LM\",\n",
" target_modules=[\"q_proj\", \"k_proj\", \"v_proj\", \"out_proj\", \"in_proj\", \"w1\", \"w2\", \"w3\"],\n",
")\n",
"\n",
"print(\"Initializing SFT trainer...\")\n",
"trainer = SFTTrainer(\n",
" model=BASE_MODEL,\n",
" train_dataset=train_dataset,\n",
" eval_dataset=eval_dataset,\n",
" args=config,\n",
" peft_config=peft_config,\n",
")\n",
"\n",
"print(f\"Trainable params: {sum(p.numel() for p in trainer.model.parameters() if p.requires_grad):,}\")\n",
"print(f\"Total params: {sum(p.numel() for p in trainer.model.parameters()):,}\")\n",
"print(f\"\\nStarting SFT training...\")\n",
"trainer.train()\n",
"\n",
"print(f\"\\nPushing best LoRA adapter to Hub...\")\n",
"trainer.push_to_hub()\n",
"print(f\"SFT adapter: https://huggingface.co/{OUTPUT_SFT}\")"
]
},
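{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, dump the recorded eval-loss curve to confirm the best checkpoint landed before any overfitting set in. This only reads `trainer.state.log_history`, which the trainer populates at each eval step."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print eval loss per eval step; it should flatten (or start rising)\n",
"# toward the end if 2 epochs was the right call.\n",
"for log in trainer.state.log_history:\n",
"    if \"eval_loss\" in log:\n",
"        print(f\"step {log['step']:>5}: eval_loss={log['eval_loss']:.4f}\")"
]
},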
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6. Merge LoRA adapter with base model"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import torch\n",
"from peft import PeftModel\n",
"from transformers import AutoModelForCausalLM, AutoTokenizer\n",
"\n",
"print(f\"Loading base model: {BASE_MODEL}...\")\n",
"base_model = AutoModelForCausalLM.from_pretrained(\n",
" BASE_MODEL, dtype=torch.bfloat16, device_map=\"auto\", trust_remote_code=True,\n",
")\n",
"\n",
"print(f\"Merging best LoRA adapter...\")\n",
"base_model.config.tie_word_embeddings = False\n",
"model = PeftModel.from_pretrained(base_model, MODEL_NAME, local_files_only=True)\n",
"model = model.merge_and_unload()\n",
"\n",
"tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL, trust_remote_code=True)\n",
"\n",
"merged_dir = \"/tmp/merged_model\"\n",
"print(f\"Saving merged model to {merged_dir}...\")\n",
"model.save_pretrained(merged_dir, safe_serialization=True)\n",
"tokenizer.save_pretrained(merged_dir)\n",
"print(\"Merged model saved\")"
]
},
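{
"cell_type": "markdown",
"metadata": {},
"source": [
"Quick smoke test of the merged model before converting. A minimal sketch: the sample query is made up, and the real training inputs may wrap the query in extra instructions, so treat the output as a rough signal only."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Greedy-decode one expansion and eyeball it for lex:/vec:/hyde: lines.\n",
"inputs = tokenizer.apply_chat_template(\n",
"    [{\"role\": \"user\", \"content\": \"rust error handling\"}],\n",
"    add_generation_prompt=True,\n",
"    return_tensors=\"pt\",\n",
").to(model.device)\n",
"with torch.no_grad():\n",
"    out = model.generate(inputs, max_new_tokens=128, do_sample=False)\n",
"print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))"
]
},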
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 7. Convert to GGUF"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import subprocess, os\n",
"\n",
"# Ensure latest tokenizers (LFM2 needs TokenizersBackend)\n",
"!pip install -q --upgrade tokenizers transformers\n",
"\n",
"# Setup llama.cpp\n",
"print(\"Setting up llama.cpp...\")\n",
"!apt-get update -qq && apt-get install -y -qq build-essential cmake git > /dev/null 2>&1\n",
"if not os.path.exists(\"/tmp/llama.cpp\"):\n",
" !git clone --depth 1 https://github.com/ggerganov/llama.cpp.git /tmp/llama.cpp\n",
"!pip install -q -r /tmp/llama.cpp/requirements.txt\n",
"\n",
"# Build quantize tool\n",
"print(\"\\nBuilding llama-quantize...\")\n",
"!cmake -B /tmp/llama.cpp/build -S /tmp/llama.cpp -DGGML_CUDA=OFF > /dev/null 2>&1\n",
"!cmake --build /tmp/llama.cpp/build --target llama-quantize -j 4 > /dev/null 2>&1\n",
"\n",
"# Convert to FP16 GGUF\n",
"gguf_dir = \"/tmp/gguf_output\"\n",
"os.makedirs(gguf_dir, exist_ok=True)\n",
"fp16_file = f\"{gguf_dir}/{MODEL_NAME}-f16.gguf\"\n",
"\n",
"print(\"\\nConverting to FP16 GGUF...\")\n",
"!python /tmp/llama.cpp/convert_hf_to_gguf.py /tmp/merged_model --outfile {fp16_file} --outtype f16\n",
"\n",
"size_mb = os.path.getsize(fp16_file) / (1024 * 1024)\n",
"print(f\" FP16: {size_mb:.1f} MB\")\n",
"\n",
"# Quantize to Q8_0\n",
"quantize_bin = \"/tmp/llama.cpp/build/bin/llama-quantize\"\n",
"q8_file = f\"{gguf_dir}/{MODEL_NAME}-q8_0.gguf\"\n",
"\n",
"print(\"\\nQuantizing to Q8_0...\")\n",
"!chmod +x {quantize_bin}\n",
"!{quantize_bin} {fp16_file} {q8_file} q8_0\n",
"\n",
"q8_size = os.path.getsize(q8_file) / (1024 * 1024)\n",
"print(f\" Q8_0: {q8_size:.1f} MB\")\n",
"\n",
"# Cleanup FP16 to save disk\n",
"os.remove(fp16_file)\n",
"print(f\"\\nGGUF conversion complete!\")"
]
},
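{
"cell_type": "markdown",
"metadata": {},
"source": [
"A light check that the quantized file parses as valid GGUF, using the `gguf` package installed in step 1. This only reads the file's headers and tensor index; it does not run inference."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from gguf import GGUFReader\n",
"\n",
"# Parsing the file end-to-end catches truncated or corrupt output early.\n",
"reader = GGUFReader(q8_file)\n",
"print(f\"tensors: {len(reader.tensors)}\")\n",
"print(f\"metadata fields: {len(reader.fields)}\")"
]
},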
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 8. Upload GGUF to HuggingFace"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from huggingface_hub import HfApi\n",
"import os\n",
"\n",
"api = HfApi()\n",
"api.create_repo(OUTPUT_GGUF_REPO, exist_ok=True)\n",
"\n",
"# Upload Q8_0 GGUF\n",
"q8_file = f\"/tmp/gguf_output/{MODEL_NAME}-q8_0.gguf\"\n",
"filename = os.path.basename(q8_file)\n",
"print(f\"Uploading {filename} ({os.path.getsize(q8_file) / 1024**2:.0f} MB)...\")\n",
"api.upload_file(\n",
" path_or_fileobj=q8_file,\n",
" path_in_repo=filename,\n",
" repo_id=OUTPUT_GGUF_REPO,\n",
")\n",
"\n",
"# Upload README\n",
"readme = f\"\"\"---\n",
"base_model: {BASE_MODEL}\n",
"tags: [gguf, llama.cpp, quantized, query-expansion, qmd, lfm2]\n",
"---\n",
"# {MODEL_NAME} (GGUF)\n",
"\n",
"Fine-tuned LiquidAI LFM2-1.2B for QMD query expansion.\n",
"\n",
"## Details\n",
"- **Base:** {BASE_MODEL}\n",
"- **Training:** SFT with LoRA (rank 16) on {DATASET}\n",
"- **Epochs:** 2 (best checkpoint by eval_loss)\n",
"- **Task:** Query expansion producing lex/vec/hyde format\n",
"\n",
"## Usage with qmd\n",
"```bash\n",
"export QMD_GEN_MODEL=\"hf:{OUTPUT_GGUF_REPO}/{MODEL_NAME}-q8_0.gguf\"\n",
"qmd query \"your search\"\n",
"```\n",
"\"\"\"\n",
"api.upload_file(\n",
" path_or_fileobj=readme.encode(),\n",
" path_in_repo=\"README.md\",\n",
" repo_id=OUTPUT_GGUF_REPO,\n",
")\n",
"\n",
"print(f\"\\nUploaded to: https://huggingface.co/{OUTPUT_GGUF_REPO}\")\n",
"print(f\"\\nAdd to ~/.zshrc:\")\n",
"print(f'export QMD_GEN_MODEL=\"hf:{OUTPUT_GGUF_REPO}/{MODEL_NAME}-q8_0.gguf\"')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Done!\n",
"\n",
"Copy the export line above and add it to your `~/.zshrc`, then:\n",
"\n",
"```bash\n",
"source ~/.zshrc\n",
"qmd query \"test\"\n",
"```"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.12.0"
}
},
"nbformat": 4,
"nbformat_minor": 0
}