LLM Resources

What

Tools

  • continue - code completion / AI agent extension (preferably VS Code)
  • ollama - simple tool to run LLMs locally
  • llama.cpp - alternative to ollama (ollama is a wrapper around llama.cpp); more extensible / configurable. A minimal server sketch follows this list.
    • More here.
    • NOTE: I have not tested this method properly.
  • roo code - AI agent extension (preferably VS Code)
    • useful for interactions with the GitHub Copilot VS Code LLM API
    • be sure to configure terminal integration, as it is quite useful in giving the LLM context to help you debug
  • github copilot - AI agent extension (preferably VS Code)
    • the education tier gets you Pro for free!
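For the llama.cpp route, a minimal sketch (untested here, per the note above; the model path is a placeholder for whatever GGUF file you have downloaded):

```bash
# Install llama.cpp; the formula ships the llama-server binary
brew install llama.cpp

# Serve a local GGUF model over an OpenAI-compatible HTTP API on port 8080
llama-server -m ~/models/your-model.gguf --port 8080
```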

Local Models (As of 8/11/25)

NOTE: These are all running (not concurrently) on my M3 MacBook Air w/ 24 GB unified memory, hence the smaller model params.

Remote / Deployed Models (as of 8/17/25)

  • Claude 4 Sonnet (via GitHub Copilot) - for architect, code, and debug modes
  • GPT-4.1 (via GitHub Copilot) - really just the code mode, but it is free on the education / pro plan
  • Gemini Pro / Flash (free tier) - for orchestration and architecting
  • Qwen3 Coder (free tier via OpenRouter) - for architect, code, and debug modes
    • Note: While a very effective model, the free tier is of course rate limited, so your experience may vary.
  • GPT-5 / GPT-5 Mini (paid; rarely used, but cost-effective) - general purpose

Why

AI! LLM! BIG DATA! BUZZWORDS!

Look, I'm a certified AI skeptic. It's a tech bubble; it's happened before, and it's gonna happen again. And like with each bubble, there is some pretty interesting and exciting technology hidden throughout the buckets of slop. I've tried to comb through the acrid goop to find some of these tools, and this gist is the brief summary of my findings.

To me, code / tab completion is just the logical next step in LSP tech: coding languages are inherently rhythmic and structured, and LLMs are text-pattern-matching goblins. As for the more agentic / chat approach, I am somewhat more hesitant about full adoption at the moment. Rife with confirmation and training bias, these bots can't help but stroke your ego, and it really, really irks me. The hardcoded subservience is a bit antithetical to my worldview (a topic for another time; see Berger's laws of chatbotics), so for the most part, I'm going to try and find the one that, well, does it the least; I would describe their collective pathos as obsequious and asinine. Ultimately, they're a handy little tool to have at the ready if you're stumped and need to rubber-duck a problem, or to perform some menial task while you turn your attention elsewhere (the reason we made robots).

As for this gist itself and the motivation behind local LLMs: it's not exactly a shocker that the big tech companies use and abuse your data, overcharge for their services, overpromise and underdeliver, and all around fail to treat you with dignity and respect. BUT, you can fight back by running these models on your own machine, on your own terms, as all software should be. I also provide a deployed / cloud-based model solution, as that is useful on machines that cannot physically run the models meaningfully otherwise.

How

Local Setup (macOS)

  • Install Ollama
```bash
brew install ollama
```
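One step the list leaves implicit: the Ollama server has to be running before any client can reach it. On macOS, either of these works:

```bash
# Run Ollama as a background service managed by Homebrew / launchd
brew services start ollama

# ...or run the server in the foreground in a spare terminal
ollama serve
```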
  • (Optional) Copy the simple completion script to your bash completions folder.

  • Pull your model of choice for:

    • Coding agent, e.g.

```bash
ollama pull hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q4_K_XL
```

    • Tab completion, e.g.

```bash
ollama pull hf.co/mradermacher/Codestral-22B-v0.1-GGUF:IQ3_XS
```

    • Codebase context provider

```bash
ollama pull nomic-embed-text
```
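Not part of the original steps, but a quick sanity check that the pulls worked (model name as pulled above):

```bash
# List the locally installed models
ollama list

# Smoke-test the coding model with a throwaway prompt
ollama run hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q4_K_XL "Say hello in one line."
```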
  • Install continue on your platform of choice, being sure to disable telemetry.

  • In the continue config.yaml

    • (Optional) You can set continue up with a remote Codestral instance for free to save on local resources. Be sure to read the terms and conditions of use.
```yaml
name: Local Assistant
version: 1.0.0
schema: v1
models:
  - name: Codestral
    provider: mistral
    model: codestral-latest
    apiKey: { FILL IN }
    roles:
      - autocomplete
    # ^ If you don't want to use the remote Codestral API, comment out this section.
  - name: Codestral Local
    provider: ollama
    model:
      {
        FILL WITH NAME OF AUTOCOMPLETE MODEL INSTALLED ABOVE,
        E.G. hf.co/mradermacher/Codestral-22B-v0.1-GGUF:IQ3_XS,
      }
    roles:
      - autocomplete
  - name: Autodetect
    provider: ollama
    model: AUTODETECT
    # ^ This will allow you to select between models in the continue UI.
  - name: Nomic Embed Text
    provider: ollama
    model: nomic-embed-text
    roles:
      - embed
    # ^ This enables indexing of / provides context for the local project / file system (also local).
context:
  - provider: code
  - provider: docs
  - provider: diff
  - provider: terminal
  - provider: problems
  - provider: folder
  - provider: codebase
  # ^ The codebase context provider needs some extra config to work properly: see the embed model above.
  - provider: url
  # ^ Be sure to add this context provider as it's quite useful. See all available here: https://docs.continue.dev/customize/custom-providers
```
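Once the config is in place, context providers are invoked from the continue chat input with `@`, e.g. `@codebase` for semantic search over the indexed project, or `@url` followed by a link; see the continue docs linked above for the full list.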

Current Preferred Setup

  • Roo Code extension.
  • Using the VS Code LLM API
  • Using GitHub Copilot Pro (free via the Education Plan)
  • Be sure to set up local codebase indexing with Roo
    • Use nomic-embed-text:latest as described above
      • Model dimension is 768
    • Allows semantic search on top of syntactic.
    • I recommend the docker-compose approach to persist your indexing computations. qdrant_config.yaml:
```yaml
log_level: INFO

service:
  # Reduce max request size to prevent memory issues
  max_request_size_mb: 16
  max_workers: 1

cluster:
  enabled: false

# Disable telemetry
telemetry_disabled: true
```

docker-compose.yaml:

```yaml
services:
  qdrant:
    image: qdrant/qdrant:latest
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - ./qdrant_config.yaml:/qdrant/config/production.yaml
    user: root
    environment:
      - QDRANT__STORAGE__ON_DISK_PAYLOAD=true
      - QDRANT__STORAGE__PERFORMANCE__MAX_OPTIMIZATION_THREADS=1
      - QDRANT__STORAGE__HNSW_INDEX__ON_DISK=true
      - QDRANT__STORAGE__OPTIMIZERS__MAX_OPTIMIZATION_THREADS=1
      - QDRANT__SERVICE__MAX_WORKERS=1
    command: ["./qdrant", "--config-path", "config/production.yaml"]
```
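Not in the compose file above, but since the whole point is persisting the index, the vector store itself should also live on a volume; a sketch, where the volume name `qdrant_storage` is my own placeholder:

```yaml
services:
  qdrant:
    # ... same settings as above, plus a named volume for the data itself:
    volumes:
      - ./qdrant_config.yaml:/qdrant/config/production.yaml
      - qdrant_storage:/qdrant/storage  # collections survive container recreation

# Declare the named volume at the top level of the compose file
volumes:
  qdrant_storage:
```

Then `docker compose up -d` brings it up, and `curl http://localhost:6333` should return Qdrant's version banner if it is healthy.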

Additional Tooling Configuration

  • /save

```markdown
## Description
"Save findings into a `steps` file in the `aiplans` folder."

## Goal
* during the previous conversation we produced important outcomes:
    * initial prompt and further user input/corrections
    * findings
    * plans
    * insights
    * observations
    * decisions
* save them as facts (in great detail) into the new `steps` file

## Command Variants
* the `/save new` command is used to create a new `steps` file in the `aiplans` folder
    * if the `aiplans` folder does not exist, create the folder in the current project's root directory
    * file name format: `<YYMMDD>-<ID>-step-<Task_Code>-<Task_name>.md`
    * create the new `steps` file if we don't have any during the current conversation yet
    * the initial user prompt must be set at the beginning of the new file with the caption `# Step 0. Initial Prompt`; preserve the original text
* the `/save` command is used to append outcomes to the same `steps` file we are working on
    * use the `insert_content` tool to add the latest findings to the end of the investigation file

## Content
1. Structure:
    * outcomes must be put into a new chapter called `# Step {NUMBER_OF_THE_STEP}`
    * you must fit all outcomes into ONE chapter; do not split it into several chapters
    * feel free to use multiple sub-sections inside the chapter
2. Summary: describe the current step's summary and the general flow of the investigation
3. Facts: your main goal is describing outcomes as facts (facts, facts!, FACTS!) in great detail
4. User Input: note the user's input and in which direction the user wants to go
5. Avoids: NO conclusions, NO hypotheses, NO proposals, NO assumptions, NO speculations, NO generalizations
```
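For illustration only (hypothetical values, not from the command spec): with the format above, a file created on 2025-08-17 for a task coded `AUTH` might be named `250817-01-step-AUTH-Fix_login_redirect.md`.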
  • /load

```markdown
## Description
"Load previous findings from the `steps` file in the `aiplans` folder."

## Goal
1. you MUST re-read the current `steps` file first
2. then create a new to-do list
    * do not focus only on the last step
    * assess the whole context
    * think about the user's previous guidance
    * re-think what to do
    * create the new to-do list

## Rules
1. Golden rule: be concise in answers
2. Use simple, lightweight language
3. Do NOT do what you are not asked for
4. Your work must be grounded exclusively in the specific codebase, not on assumptions
5. Actively use the `codebase_search` tool
6. Follow your rules for the current project in `@/.roo/rules/rules.md` if they exist
```
  • Fine-tune all aspects of Roo
  • configure models based on the tasks they excel at, e.g. the per-mode assignments in the Remote / Deployed Models list above
  • Set up MCP servers to provide even more context (a config sketch follows this list), e.g.
    • context7 - framework documentation
    • playwright - browser interaction
    • brave search - search the web
    • git - provide full git context i.e. branches, tree, commits, etc.
  • Possibly pay a one-time credit of $10 to OpenRouter to unlock 1,000 free requests per day.
    • Confirmed that this does work.
  • Unsurprisingly, it is useful to develop a toolchain / workflow as described here
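A minimal sketch of what the MCP setup might look like. Roo reads MCP servers from a JSON settings file (e.g. a project-level `.roo/mcp.json`); the package names below come from the upstream projects, but treat the exact keys and commands as assumptions and verify against Roo's docs:

```json
{
  "mcpServers": {
    "context7": {
      "command": "npx",
      "args": ["-y", "@upstash/context7-mcp"]
    },
    "git": {
      "command": "uvx",
      "args": ["mcp-server-git"]
    }
  }
}
```

Since JSON allows no comments: both entries above are illustrative, not copied from this gist's original config.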

Further Reading

This is just my preferred setup at the moment, and there are tons and tons of additional resources out there. As always, the recommendation is to explore for yourself, try new things, and see what works for you.

AI Impact Education (ChatGPT is used as a synecdoche for general LLMs / AI chat tooling):
