- Transformers
- Ollama
- llama.cpp
- ExLlamaV2
- AutoGPTQ
- AutoAWQ
- TensorRT-LLM
docs about inference backends: https://www.bentoml.com/blog/benchmarking-llm-inference-backends
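For a quick taste, a minimal generation sketch with Transformers (the model ID is just an example; any causal LM from the Hub should work):

```python
# Minimal Transformers inference sketch; the model ID is only an example.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-Instruct-v0.2"  # placeholder model
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

inputs = tok("What is a GGUF file?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```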
- oobabooga
- Stable Diffusion web UI
- SillyTavern
- LM Studio
- Axolotl
- GPT4All
- Open WebUI
  - I've used this one
- Enchanted
  - Mac native
- LangChain (TS & Python)
- LlamaIndex (TS & Python)
- ModelFusion (TS)
- Haystack (Python)
  - Used by AWS, Nvidia, IBM, Intel
- CrewAI (Python)
- Transformers (Python)
  - Made by HuggingFace
- PyTorch
- TensorFlow
- JAX
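For a feel of these frameworks, a minimal LangChain (Python) sketch; it assumes the langchain-openai package and an OPENAI_API_KEY in the environment, and the model name is only an example:

```python
# Minimal LangChain pipeline: prompt template -> chat model -> plain string.
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI  # needs OPENAI_API_KEY set

prompt = ChatPromptTemplate.from_template("Explain {topic} in one sentence.")
llm = ChatOpenAI(model="gpt-4o-mini")  # model name is a placeholder
chain = prompt | llm | StrOutputParser()  # LCEL pipe syntax

print(chain.invoke({"topic": "quantization"}))
```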
- vokturz/can-it-run-llm
- nyxkrage/gguf-vram-calculator
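Those calculators boil down to something like the rule of thumb below. This is a rough sketch, not their exact formula; the 20% overhead factor is an assumption, and it ignores the KV cache:

```python
# Back-of-envelope VRAM estimate for model weights alone.
def est_vram_gb(params_billions: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """bits_per_weight: ~16 for fp16, ~8 for Q8_0, ~4-5 for Q4 quants."""
    return params_billions * bits_per_weight / 8 * overhead

print(f"7B fp16  ~ {est_vram_gb(7, 16):.1f} GB")   # ~16.8 GB
print(f"7B 4-bit ~ {est_vram_gb(7, 4.5):.1f} GB")  # ~4.7 GB
```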
- QLoRA
  - For fine-tuning models
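A rough QLoRA setup sketch with Transformers + PEFT + bitsandbytes; the model ID and LoRA hyperparameters are just placeholders:

```python
# QLoRA sketch: load the base model in 4-bit, then attach trainable LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4 quantization, as in the QLoRA paper
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-v0.1",            # placeholder model ID
    quantization_config=bnb,
    device_map="auto",
)
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
                  task_type="CAUSAL_LM")    # placeholder hyperparameters
model = get_peft_model(base, lora)
model.print_trainable_parameters()          # only the small adapter weights train
```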
- bycloud
- HuggingFace
- Fireship
  - Not exclusively about LLMs/AI
- David Ondrej
Models are usually saved in one of these formats:
- GGUF
  - It's the successor to GGML
  - Tech doc about GGUF (from HuggingFace)
- GGML
- Safetensors
- EXL2 (used by ExLlamaV2)
- AWQ
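For example, a GGUF file can be loaded directly with llama-cpp-python (the model path is a placeholder):

```python
# Loading a GGUF model with llama-cpp-python (llama.cpp bindings).
from llama_cpp import Llama

llm = Llama(model_path="models/mistral-7b-instruct.Q4_K_M.gguf", n_ctx=4096)  # placeholder path
out = llm("Q: What does GGUF stand for?\nA:", max_tokens=64, stop=["\n"])
print(out["choices"][0]["text"])
```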
These files contain the context used by the LLMs
1 token ~= 0.75 words
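That ratio can be sanity-checked with any tokenizer; a quick sketch with the GPT-2 tokenizer from Transformers (the exact ratio varies by tokenizer and text):

```python
# Rough words-per-token check; the ~0.75 figure is only an average for English.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
text = "Large language models process text as tokens rather than whole words."
n_tokens = len(tok(text)["input_ids"])
n_words = len(text.split())
print(f"{n_words} words / {n_tokens} tokens = {n_words / n_tokens:.2f} words per token")
```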
Common llama.cpp quantization types:
- Q4_0, Q4_1, Q5_0, Q5_1, Q8_0 (legacy quants)
- Q3_K_S, Q3_K_M, Q3_K_L, Q4_K_S, Q4_K_M, Q5_K_S, Q5_K_M, Q6_K (k-quants)