LLM Resources

What

Tools

  • continue - code completion / AI agent extension (preferably VS Code)
  • ollama - simple tool to run LLMs locally
  • llama.cpp - alternative to ollama (ollama is a wrapper around llama.cpp); more extensible / configurable. A minimal server sketch follows this list.
    • More here.
    • NOTE: I have not tested this method properly.
  • roo code - AI agent extension (preferably VS Code)
    • useful for interactions with the GitHub Copilot VS Code LLM API
    • be sure to configure terminal integration, as it is quite useful in giving the LLM context to help you debug
  • github copilot - AI agent extension (preferably VS Code)
    • the education tier gets you Pro for free!
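For the llama.cpp route, a minimal sketch (untested here, per the note above; the model path is a placeholder for whatever GGUF file you have downloaded):

```bash
# Install llama.cpp; the formula ships the llama-server binary
brew install llama.cpp

# Serve a local GGUF model over an OpenAI-compatible HTTP API on port 8080
llama-server -m ~/models/your-model.gguf --port 8080
```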

Local Models (As of 8/11/25)

NOTE: These are all running (not concurrently) on my M3 MacBook Air w/ 24 GB unified memory, hence the smaller model params.

Remote / Deployed Models (as of 8/17/25)

  • Claude 4 Sonnet (via GitHub Copilot) - for architect, code, and debug modes
  • GPT-4.1 (via GitHub Copilot) - really just the code mode, but it is free on the education / pro plan
  • Gemini Pro / Flash (free tier) - for orchestration and architecting
  • Qwen3 Coder (free tier via OpenRouter) - for architect, code, and debug modes
    • Note: While a very effective model, the free tier is of course rate limited, so your experience may vary.
  • GPT-5 / GPT-5 Mini (paid; rarely used, but cost-effective) - general purpose

Why

AI! LLM! BIG DATA! BUZZWORDS!

Look, I'm a certified AI skeptic. It's a tech bubble; it's happened before, and it's gonna happen again. And like with each bubble, there is some pretty interesting and exciting technology hidden throughout the buckets of slop. I've tried to comb through the acrid goop to find some of these tools, and this gist is the brief summary of my findings.

To me, code / tab completion is just the logical next step in LSP tech: coding languages are inherently rhythmic and structured, and LLMs are text-pattern-matching goblins. As for the more agentic / chat approach, I am somewhat more hesitant about full adoption at the moment. Rife with confirmation and training bias, these bots can't help but stroke your ego, and it really, really irks me. The hardcoded subservience is a bit antithetical to my worldview (a topic for another time; see Berger's laws of chatbotics), so for the most part, I'm going to try and find the one that, well, does it the least; I would describe their collective pathos as obsequious and asinine. Ultimately, they're a handy little tool to have at the ready if you're stumped and need to rubber-duck a problem, or to perform some menial task while you turn your attention elsewhere (the reason we made robots).

As for this gist itself and the motivation behind local LLMs: it's not exactly a shocker that the big tech companies use and abuse your data, overcharge for their services, overpromise and underdeliver, and all around fail to treat you with dignity and respect. BUT, you can fight back by running these models on your own machine, on your own terms, as all software should be. I also provide a deployed / cloud-based model solution, as that is useful on machines that cannot physically run the models meaningfully otherwise.

How

Local Setup (macOS)

  • Install Ollama
```bash
brew install ollama
```
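One step the list leaves implicit: the Ollama server has to be running before any client can reach it. On macOS, either of these works:

```bash
# Run Ollama as a background service managed by Homebrew / launchd
brew services start ollama

# ...or run the server in the foreground in a spare terminal
ollama serve
```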
  • (Optional) Copy the simple completion script to your bash completions folder.

  • Pull your model of choice for:

    • Coding agent, e.g.

```bash
ollama pull hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q4_K_XL
```

    • Tab completion, e.g.

```bash
ollama pull hf.co/mradermacher/Codestral-22B-v0.1-GGUF:IQ3_XS
```

    • Codebase context provider

```bash
ollama pull nomic-embed-text
```
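Not part of the original steps, but a quick sanity check that the pulls worked (model name as pulled above):

```bash
# List the locally installed models
ollama list

# Smoke-test the coding model with a throwaway prompt
ollama run hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q4_K_XL "Say hello in one line."
```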
  • Install continue on your platform of choice, being sure to disable telemetry.

  • In the continue config.yaml

    • (Optional) You can set continue up with a remote Codestral instance for free to save on local resources. Be sure to read the terms and conditions of use.
```yaml
name: Local Assistant
version: 1.0.0
schema: v1
models:
  - name: Codestral
    provider: mistral
    model: codestral-latest
    apiKey: { FILL IN }
    roles:
      - autocomplete
    # ^ If you don't want to use the remote Codestral API, comment out this section.
  - name: Codestral Local
    provider: ollama
    model:
      {
        FILL WITH NAME OF AUTOCOMPLETE MODEL INSTALLED ABOVE,
        E.G. hf.co/mradermacher/Codestral-22B-v0.1-GGUF:IQ3_XS,
      }
    roles:
      - autocomplete
  - name: Autodetect
    provider: ollama
    model: AUTODETECT
    # ^ This will allow you to select between models in the continue UI.
  - name: Nomic Embed Text
    provider: ollama
    model: nomic-embed-text
    roles:
      - embed
    # ^ This enables indexing of / provides context for the local project / file system (also local).
context:
  - provider: code
  - provider: docs
  - provider: diff
  - provider: terminal
  - provider: problems
  - provider: folder
  - provider: codebase
  # ^ The codebase context provider needs some extra config to work properly: see the embed model above.
  - provider: url
  # ^ Be sure to add this context provider as it's quite useful. See all available here: https://docs.continue.dev/customize/custom-providers
```
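Once the config is in place, context providers are invoked from the continue chat input with `@`, e.g. `@codebase` for semantic search over the indexed project, or `@url` followed by a link; see the continue docs linked above for the full list.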

Current Preferred Setup

  • Roo Code extension.
  • Using the VS Code LLM API
  • Using GitHub Copilot Pro (free via the Education Plan)
  • Be sure to set up local codebase indexing with Roo
    • Use nomic-embed-text:latest as described above
      • Model dimension is 768
    • Allows semantic search on top of syntactic.
    • I recommend the docker-compose approach to persist your indexing computations. qdrant_config.yaml:
```yaml
log_level: INFO

service:
  # Reduce max request size to prevent memory issues
  max_request_size_mb: 16
  max_workers: 1

cluster:
  enabled: false

# Disable telemetry
telemetry_disabled: true
```

docker-compose.yaml:

```yaml
services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - ./qdrant_config.yaml:/qdrant/config/production.yaml
    user: root
    environment:
      - QDRANT__STORAGE__ON_DISK_PAYLOAD=true
      - QDRANT__STORAGE__PERFORMANCE__MAX_OPTIMIZATION_THREADS=1
      - QDRANT__STORAGE__HNSW_INDEX__ON_DISK=true
      - QDRANT__STORAGE__OPTIMIZERS__MAX_OPTIMIZATION_THREADS=1
      - QDRANT__SERVICE__MAX_WORKERS=1
    command: ["./qdrant", "--config-path", "config/production.yaml"]
```
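Not in the compose file above, but since the whole point is persisting the index, the vector store itself should also live on a volume; a sketch, where the volume name `qdrant_storage` is my own placeholder:

```yaml
services:
  qdrant:
    # ... same settings as above, plus a named volume for the data itself:
    volumes:
      - ./qdrant_config.yaml:/qdrant/config/production.yaml
      - qdrant_storage:/qdrant/storage  # collections survive container recreation

# Declare the named volume at the top level of the compose file
volumes:
  qdrant_storage:
```

Then `docker compose up -d` brings it up, and `curl http://localhost:6333` should return Qdrant's version banner if it is healthy.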

Additional Tooling Configuration

  • /save

```markdown
## Description
"Save findings into a `steps` file in the `aiplans` folder."

## Goal
* during the previous conversation we produced important outcomes:
    * initial prompt and further user input/corrections
    * findings
    * plans
    * insights
    * observations
    * decisions
* save them as facts (in great detail) into the new `steps` file

## Command Variants
* the `/save new` command is used to create a new `steps` file in the `aiplans` folder
    * if the `aiplans` folder does not exist, create the folder in the current project's root directory
    * file name format: `<YYMMDD>-<ID>-step-<Task_Code>-<Task_name>.md`
    * create the new `steps` file if we don't have any during the current conversation yet
    * the initial user prompt must be set at the beginning of the new file with the caption `# Step 0. Initial Prompt`; preserve the original text
* the `/save` command is used to append outcomes to the same `steps` file we are working on
    * use the `insert_content` tool to add the latest findings to the end of the investigation file

## Content
1. Structure:
    * outcomes must be put into a new chapter called `# Step {NUMBER_OF_THE_STEP}`
    * you must fit all outcomes into ONE chapter; do not split it into several chapters
    * feel free to use multiple sub-sections inside the chapter
2. Summary: describe the current step's summary and the general flow of the investigation
3. Facts: your main goal is describing outcomes as facts (facts, facts!, FACTS!) in great detail
4. User Input: note the user's input and in which direction the user wants to go
5. Avoids: NO conclusions, NO hypotheses, NO proposals, NO assumptions, NO speculations, NO generalizations
```
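For illustration only (hypothetical values, not from the command spec): with the format above, a file created on 2025-08-17 for a task coded `AUTH` might be named `250817-01-step-AUTH-Fix_login_redirect.md`.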
  • /load

```markdown
## Description
"Load previous findings from the `steps` file in the `aiplans` folder."

## Goal
1. you MUST re-read the current `steps` file first
2. then create a new to-do list
    * do not focus only on the last step
    * assess the whole context
    * think about the user's previous guidance
    * re-think what to do
    * create the new to-do list

## Rules
1. Golden rule: be concise in answers
2. Use simple, lightweight language
3. Do NOT do what you are not asked for
4. Your work must be grounded exclusively in the specific codebase, not on assumptions
5. Actively use the `codebase_search` tool
6. Follow your rules for the current project in `@/.roo/rules/rules.md` if they exist
```
  • Fine-tune all aspects of Roo
  • configure models based on the tasks they excel at, e.g. the per-mode assignments in the Remote / Deployed Models list above
  • Set up MCP servers to provide even more context (a config sketch follows this list), e.g.
    • context7 - framework documentation
    • playwright - browser interaction
    • brave search - search the web
    • git - provide full git context i.e. branches, tree, commits, etc.
  • Possibly pay a one-time credit of $10 to OpenRouter to unlock 1,000 free requests per day.
    • Confirmed that this does work.
  • Unsurprisingly, it is useful to develop a toolchain / workflow as described here
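A minimal sketch of what the MCP setup might look like. Roo reads MCP servers from a JSON settings file (e.g. a project-level `.roo/mcp.json`); the package names below come from the upstream projects, but treat the exact keys and commands as assumptions and verify against Roo's docs:

```json
{
  "mcpServers": {
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"]
    },
    "git": {
      "command": "uvx",
      "args": ["mcp-server-git"]
    }
  }
}
```

Since JSON allows no comments: both entries above are illustrative, not copied from this gist's original config.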

Further Reading

This is just my preferred setup at the moment, and there are tons and tons of additional resources out there. As always, the recommendation is to explore for yourself, try new things, and see what works for you.

AI Impact Education (ChatGPT is used as a synecdoche for general LLMs / AI chat tooling):
