- ==Current: Flash: 5/10 RPM, 20 RPD, 250K TPM | Gemma: 30 RPM + 14.4K RPD + 15K TPM==
| Model | Category | RPM | TPM | RPD |
|---|---|---|---|---|
| Gemini 2.5 Flash | Text-out models | 5 | 250K | 20 |
| Gemini 2.5 Flash Lite | Text-out models | 10 | 250K | 20 |
| Gemini 2.5 Flash Native Audio Dialog | Live API | Unlimited | 1M | Unlimited |
| Gemini 2.5 Flash TTS | Multi-modal models | 3 | 10K | 10 |
| Gemini 3 Flash | Text-out models | 5 | 250K | 20 |
| Gemini 3.1 Flash Lite | Text-out models | 15 | 250K | 500 |
| Gemini Embedding 1 | Other models | 100 | 30K | 1K |
| Gemini Embedding 2 | Other models | 100 | 30K | 1K |
| Gemini Robotics ER 1.5 Preview | Other models | 10 | 250K | 20 |
| Gemma 3 12B | Other models | 30 | 15K | 14.4K |
| Gemma 3 1B | Other models | 30 | 15K | 14.4K |
| Gemma 3 27B | Other models | 30 | 15K | 14.4K |
| Gemma 3 2B | Other models | 30 | 15K | 14.4K |
| Gemma 3 4B | Other models | 30 | 15K | 14.4K |
- requests per minute (RPM)
- tokens per minute (TPM)
- requests per day (RPD)
Options:
- Web: https://gemini.google.com/app
- Web: https://aistudio.google.com
- API: https://aistudio.google.com/api-keys
- API Docs: https://ai.google.dev/gemini-api/docs
- API Support: LiteLLM, Opencode, Kilocode.
- CLI: https://geminicli.com/
- Android: https://play.google.com/store/apps/details?id=com.google.android.apps.bard
- iOS: https://apps.apple.com/us/app/google-gemini/id6477489729
- ==Current: Cloud: 5-hour session usage limit + weekly usage limit (resets in 5 days)==
Ollama Cloud imposes tiered rate limits to manage capacity and prevent abuse. Limits vary by plan, and no exact numerical quotas are published beyond general descriptions.
The free tier is designed for light usage such as chat and quick model tests. It includes hourly and daily caps, plus per-minute restrictions on rapid API calls.
Free tier (Ollama Cloud):
- Session usage: resets every 5 hours
- Weekly usage: resets every 5 days
Options:
- API: https://ollama.com/settings/keys
- API Docs: https://docs.ollama.com/integrations (Integrations)
- API Support: LiteLLM, Opencode, Kilocode.
- ==Current: 20 RPM + 1K RPD, `:free` models==
OpenRouter uses global, credit-based limits plus a few hard caps. openrouter
- Requests to `:free` model variants are capped at 20 requests per minute. openrouter
- If you have purchased fewer than 10 credits, you can send 50 free-model requests per day; with at least 10 credits purchased, this rises to 1,000 per day. openrouter
- Each API key has a credit limit (or unlimited), a reset schedule, and a remaining-credit count, all retrievable via `GET https://openrouter.ai/api/v1/key`. openrouter
- The key metadata also tracks total, daily, weekly, and monthly usage for both regular and BYOK (bring-your-own-key) traffic, and flags whether the account is on the free tier. openrouter
- Creating multiple accounts or API keys does not bypass rate limits, which are enforced globally per user. openrouter
- Cloudflare DDoS protection can temporarily block IPs that send traffic far above normal patterns, and a negative credit balance can trigger 402 errors even on free models until credits are added. openrouter
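The key-metadata endpoint above can be folded into a small usage summary. A minimal sketch, assuming the response body carries `data.limit`, `data.usage`, and `data.is_free_tier` fields (verify the exact shape against the OpenRouter docs):

```python
def summarize_key(meta: dict) -> dict:
    """Condense GET /api/v1/key metadata into the fields relevant to limits.

    Field names (limit, usage, is_free_tier) are assumptions based on the
    notes above -- check them against the live API response.
    """
    data = meta.get("data", {})
    limit = data.get("limit")       # None is treated as an unlimited credit limit
    usage = data.get("usage", 0)
    return {
        "is_free_tier": data.get("is_free_tier", True),
        "credit_limit": "unlimited" if limit is None else limit,
        "credits_remaining": None if limit is None else max(limit - usage, 0),
    }

# Illustrative body, not real account data:
sample = {"data": {"limit": 10.0, "usage": 2.5, "is_free_tier": False}}
summary = summarize_key(sample)
```

Fetching the real body is one authenticated GET to `https://openrouter.ai/api/v1/key` with a `Bearer` key header.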
Options:
- Web: https://openrouter.ai/chat
- API: https://openrouter.ai/settings/keys
- API Docs: https://openrouter.ai/docs/api/reference/overview
- API Support: LiteLLM, Opencode, Kilocode.
- ==Current: Super-Fast, 30-60 RPM, 250-1K RPD, 100-500k TPD==
| MODEL ID | RPM | RPD | TPM | TPD | ASH | ASD |
|---|---|---|---|---|---|---|
| allam-2-7b | 30 | 7K | 6K | 500K | - | - |
| canopylabs/orpheus-arabic-saudi | 10 | 100 | 1.2K | 3.6K | - | - |
| canopylabs/orpheus-v1-english | 10 | 100 | 1.2K | 3.6K | - | - |
| groq/compound | 30 | 250 | 70K | - | - | - |
| groq/compound-mini | 30 | 250 | 70K | - | - | - |
| llama-3.1-8b-instant | 30 | 14.4K | 6K | 500K | - | - |
| llama-3.3-70b-versatile | 30 | 1K | 12K | 100K | - | - |
| meta-llama/llama-4-scout-17b-16e-instruct | 30 | 1K | 30K | 500K | - | - |
| meta-llama/llama-prompt-guard-2-22m | 30 | 14.4K | 15K | 500K | - | - |
| meta-llama/llama-prompt-guard-2-86m | 30 | 14.4K | 15K | 500K | - | - |
| moonshotai/kimi-k2-instruct | 60 | 1K | 10K | 300K | - | - |
| moonshotai/kimi-k2-instruct-0905 | 60 | 1K | 10K | 300K | - | - |
| openai/gpt-oss-120b | 30 | 1K | 8K | 200K | - | - |
| openai/gpt-oss-20b | 30 | 1K | 8K | 200K | - | - |
| openai/gpt-oss-safeguard-20b | 30 | 1K | 8K | 200K | - | - |
| qwen/qwen3-32b | 60 | 1K | 6K | 500K | - | - |
| whisper-large-v3 | 20 | 2K | - | - | 7.2K | 28.8K |
| whisper-large-v3-turbo | 20 | 2K | - | - | 7.2K | 28.8K |
Groq enforces per-organization limits on requests, tokens, and audio duration, plus response headers to help you throttle when you hit caps. console.groq
- Metrics: requests per minute/day (RPM, RPD), tokens per minute/day (TPM, TPD), and audio seconds per hour/day (ASH, ASD) for speech models. console.groq
- Cached tokens from prompt caching do not count against token limits. console.groq
- Limits apply at the organization level, so all API keys and users in that org share the same quotas. console.groq
| Header | Value | Notes |
|---|---|---|
| retry-after | 2 | In seconds |
| x-ratelimit-limit-requests | 14400 | Always refers to Requests Per Day (RPD) |
| x-ratelimit-limit-tokens | 18000 | Always refers to Tokens Per Minute (TPM) |
| x-ratelimit-remaining-requests | 14370 | Always refers to Requests Per Day (RPD) |
| x-ratelimit-remaining-tokens | 17997 | Always refers to Tokens Per Minute (TPM) |
| x-ratelimit-reset-requests | 2m59.56s | Always refers to Requests Per Day (RPD) |
| x-ratelimit-reset-tokens | 7.66s | Always refers to Tokens Per Minute (TPM) |
- If you exceed limits, the API returns HTTP 429 with a `retry-after` header indicating how many seconds to wait before retrying; this header appears only when a rate limit is hit. console.groq
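The headers in the table translate directly into a client-side throttle. A sketch (header names from the table above; the duration format for the reset headers follows the `2m59.56s` example, and the 500-token threshold is an arbitrary safety margin, not a Groq value):

```python
import re

def parse_reset(value: str) -> float:
    """Convert duration strings like '7.66s' or '2m59.56s' into seconds."""
    m = re.fullmatch(r"(?:(\d+)m)?([\d.]+)s", value)
    if not m:
        return 0.0
    return int(m.group(1) or 0) * 60 + float(m.group(2))

def backoff_seconds(status_code: int, headers: dict) -> float:
    """Decide how long to wait before the next request.

    On 429, honor retry-after (present only when a limit is hit); otherwise
    preemptively wait out the token window when the TPM budget is nearly gone.
    """
    if status_code == 429:
        return float(headers.get("retry-after", 1))
    remaining = int(headers.get("x-ratelimit-remaining-tokens", 10**9))
    if remaining < 500:  # assumed safety margin, tune for your workload
        return parse_reset(headers.get("x-ratelimit-reset-tokens", "0s"))
    return 0.0
```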
Options:
- Web: https://console.groq.com/playground
- API: https://console.groq.com/keys
- API Docs: https://console.groq.com/docs/api-reference#chat
- API Support: LiteLLM, Opencode.
- ==Current: 20 RPM (each model) + 1K requests/month (all models, trial keys)==
| Model | Trial rate limit | Production rate limit |
|---|---|---|
| Command A Reasoning | 20 req / min | Contact sales |
| Command A Translate | 20 req / min | Contact sales |
| Command A Vision | 20 req / min | Contact sales |
| Command A | 20 req / min | 500 req / min |
| Command R+ | 20 req / min | 500 req / min |
| Command R | 20 req / min | 500 req / min |
| Command R7B | 20 req / min | 500 req / min |

| Endpoint | Trial rate limit | Production rate limit |
|---|---|---|
| Embed | 2,000 inputs / min | 2,000 inputs / min |
| Embed (Images) | 5 inputs / min | 400 inputs / min |
| Rerank | 10 req / min | 1,000 req / min |
| Tokenize | 100 req / min | 2,000 req / min |
| EmbedJob | 5 req / min | 50 req / min |
| Default (other) | 500 req / min | 500 req / min |
- There are two API key types: evaluation (trial) keys and production keys; trial keys, as well as production keys on the newer chat models, are capped at ==1,000 API calls per month==. docs.cohere
- Chat models generally allow ==20 requests per minute== on trial keys, while mature production models (Command A, R, R+, R7B) allow up to 500 requests per minute; newer Command A variants require contacting sales for production limits. docs.cohere
- ==Non-chat endpoints have their own per-minute caps==, with Embed text sharing the same limits for trial and production, while others like Rerank and Tokenize get much higher production limits than trial. docs.cohere
- Any endpoint not explicitly listed falls under a default limit of 500 requests per minute for both trial and production keys, and rate-limit increases require contacting Cohere support. docs.cohere
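Trial keys cap chat models at 20 requests per minute, so a client-side limiter avoids burning calls on 429s. A minimal sliding-window sketch (not a Cohere SDK feature; the clock/sleep parameters exist only to make it testable):

```python
import time
from collections import deque

class MinuteRateLimiter:
    """Client-side guard for a fixed requests-per-minute cap, e.g. the
    20 req/min trial limit above. Call acquire() before each API request;
    it sleeps just long enough to stay under the cap."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.max = max_per_minute
        self.sent = deque()                    # timestamps within the last 60 s
        self.clock, self.sleep = clock, sleep  # injectable for testing

    def acquire(self):
        now = self.clock()
        while self.sent and now - self.sent[0] >= 60:
            self.sent.popleft()                # drop entries older than the window
        if len(self.sent) >= self.max:
            self.sleep(60 - (now - self.sent[0]))
            now = self.clock()
        self.sent.append(now)
```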
Options:
- Web: https://dashboard.cohere.com/playground/chat
- API: https://dashboard.cohere.com/api-keys
- API Docs: https://docs.cohere.com/reference/about
- API Support: LiteLLM, Opencode.
- ==Current: Low: 15 RPM + 150 RPD | High: 10 RPM + 50 RPD | 8K in / 4K out tokens per request==
| Tier / Model group | Metric | Copilot Free |
|---|---|---|
| Low | Requests per minute | 15 |
| Low | Requests per day | 150 |
| Low | Tokens per request | 8000 in, 4000 out |
| Low | Concurrent requests | 5 |
| High | Requests per minute | 10 |
| High | Requests per day | 50 |
| High | Tokens per request | 8000 in, 4000 out |
| High | Concurrent requests | 2 |
- Free playground and API usage are limited by requests per minute, requests per day, tokens per request, and concurrent requests, with different tiers for low, high, and embedding models. docs.github
- Special model groups (Azure OpenAI families, DeepSeek, xAI Grok) have stricter RPM/RPD and concurrency but can offer larger output token caps at higher tiers. docs.github
Options:
- Web: https://github.com/marketplace/models
- API: https://github.com/settings/personal-access-tokens
- API Docs: https://docs.github.com/en/github-models
- ==Current: 1 RPS + 500K tokens/min + 1B tokens/month==
| Feature | Free plan value | Notes |
|---|---|---|
| Price | €0 / $0 | Personal use for life and work. |
| Flash answers | Up to 150 per day | Quick “flash” responses. |
| Web searches | Base quota | Paid plans allow “Up to 5x Free”. |
| Think mode | Base quota | Paid plans: “Up to 30x Free”. |
| Deep research (preview) | Base quota | Paid plans: “Up to 5x Free”. |
| Memories | 500 | Saved and recallable user memories. |
| Libraries / storage | Limited | Higher tiers raise to 15–30 GB. |
| Document uploads | Base quota | Paid: “Up to 20x Free”. |
| Image generation | Base quota | Paid: “Up to 40x Free”. |
| Code interpreter | Base quota | Paid: “Up to 5x Free”. |
| Projects | Unlimited | Can group chats into projects. |
| Connectors directory | Full access | Access to connectors directory. |
| Custom MCP connectors | Not included | Marked as “Custom” only on higher tiers. |
| Voice / canvas / agents | Limited or not listed | Described as “Custom” mainly for paid tiers. |
| Customer support | Help center only | No chat/email support on Free. |
- The Free plan gives access to Mistral’s SOTA models in Le Chat but applies hard caps on usage-based features such as ==flash answers (150/day)==, web searches, extended “think” reasoning, deep research, document uploads, image generation, and code interpreter runs, all at a baseline level that paid plans multiply (for example “Up to 40x Free” for images).
- You can store up to 500 memories, have unlimited projects, and full access to the connectors directory, but storage (libraries) is limited and advanced enterprise features like domain verification, audit logs, SAML SSO, and white-labeling are not available. mistral
Options:
- Web: https://console.mistral.ai/build/playground
- API: https://admin.mistral.ai/organization/api-keys
- API Docs: https://docs.mistral.ai/api
- API Support: LiteLLM, Opencode, Kilocode.
- CLI: https://console.mistral.ai/codestral/cli
- Code: https://console.mistral.ai/codestral
- Android: https://play.google.com/store/apps/details?id=ai.mistral.chat
- iOS: https://apps.apple.com/us/app/le-chat-by-mistral-ai/id6740410176
- ==Current: 30 RPM + 900 RPH + 14.4K RPD + 1M TPD==
| Model | TPM | TPH | TPD | RPM | RPH | RPD |
|---|---|---|---|---|---|---|
| gpt-oss-120b | 64,000 | 1,000,000 | 1,000,000 | 30 | 900 | 14,400 |
| llama3.1-8b | 64,000 | 1,000,000 | 1,000,000 | 30 | 900 | 14,400 |
| qwen-3-235b-a22b-instruct-2507 | 64,000 | 1,000,000 | 1,000,000 | 30 | 900 | 14,400 |
| zai-glm-4.7 | 64,000 | 1,000,000 | 1,000,000 | 10 | 100 | 100 |
- Due to high demand on `zai-glm-4.7` and `qwen-3-235b-a22b-instruct-2507`, Cerebras has temporarily reduced free-tier rate limits on those models and says it is working to restore them.
- Cerebras measures limits on both requests (RPM, RPH, RPD) and tokens (TPM, TPH, TPD); whichever limit you hit first throttles further calls. inference-docs.cerebras
- Free-tier org limits for most models are ==30 RPM, 900 RPH, 14.4K RPD and 64K TPM, 1M TPH, 1M TPD==, while the heavier `zai-glm-4.7` model is restricted to ==10 RPM, 100 RPH, and 100 RPD== with the same token caps. inference-docs.cerebras
- Every response includes rate-limit headers (limit, remaining, and reset for requests/day and tokens/minute); exceeding limits returns HTTP 429, and raising quotas requires contacting Cerebras. inference-docs.cerebras
- TPM (tokens/min); TPH (tokens/hour); TPD (tokens/day)
- RPM (requests/min); RPH (requests/hour); RPD (requests/day)
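Since exceeding any request or token limit returns HTTP 429, a simple exponential-backoff wrapper covers the common case. A sketch under the assumption that your transport exposes the status code; `call` is a hypothetical zero-argument function returning `(status, body)`:

```python
import time

def with_retries(call, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Retry `call` on HTTP 429 with exponential backoff.

    `call` returns (status_code, body); a 429 here means some request or
    token limit in the table above is exhausted. `sleep` is injectable
    so the loop can be tested without real delays.
    """
    for attempt in range(max_attempts):
        status, body = call()
        if status != 429:
            return status, body
        sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError("rate limit still exceeded after retries")
```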
Options:
- Web: https://cloud.cerebras.ai/platform
- API: https://cloud.cerebras.ai/platform/
- API Docs: https://inference-docs.cerebras.ai/api-reference/chat-completions
- API Support: LiteLLM, Opencode.
- ==Current: 1K RPM; 100K RPD; Free Workers AI: 10K Neurons/day==
| Feature | Workers Free | Workers Paid |
|---|---|---|
| Requests | 100,000/day, 1,000/min | No limit |
| Worker memory | 128 MB | 128 MB |
| CPU time | 10 ms | 5 min (HTTP request), 15 min (Cron Trigger) |
| Duration | No limit | No limit for Workers; 15 min limit for Cron Triggers, Durable Object Alarms, and Queue Consumers |
Cloudflare Workers Free gives you limited daily traffic and tighter per-request resources compared to paid plans. developers.cloudflare
- You can handle up to ==100,000 requests per day and 1,000 requests per minute per account==, after which routes either bypass the Worker or return a 1027 error depending on your fail-open/closed setting. developers.cloudflare
- Each Worker invocation can make 50 subrequests, open 6 simultaneous outgoing connections, and use up to 128 MB memory with a 1-second startup limit and a 3 MB compressed bundle size. developers.cloudflare
- You can have 100 Workers per account, each with up to 64 environment variables of 5 KB each, and serve static assets with up to 20,000 files per Worker version and 25 MiB per file. developers.cloudflare
- Cache API usage is limited to 50 cache operations (put/match/delete) per request, and the maximum HTTP request body size is 100 MB on the Free Cloudflare plan. developers.cloudflare
Options:
- API: https://dash.cloudflare.com/profile/api-tokens
- API Docs: https://developers.cloudflare.com/workers-ai/get-started/workers-wrangler/
- API Support: LiteLLM, Opencode.
- ==Current: $0.10 / month usage==
| Plan | API requests / 5 min | Resolver requests / 5 min | Pages requests / 5 min |
|---|---|---|---|
| Free user | 1,000 * | 5,000 * | 200 * |
- Anonymous and Free user limits may change over time depending on platform health. huggingface
- Hugging Face defines separate buckets for the Hub API, Resolvers (file downloads via `/resolve/`), and Pages, each with its own quota over a 5-minute fixed window. huggingface
- A Free user can make about 1,000 API calls, 5,000 resolver calls, and 200 page views per 5-minute window, though these values are explicitly marked as subject to change. huggingface
- When you hit a limit, the Hub returns HTTP 429 with standardized `RateLimit` and `RateLimit-Policy` headers describing the remaining quota and reset time; the recommended mitigations are to authenticate with `HF_TOKEN`, spread requests over time, or prefer Resolver endpoints where possible. huggingface
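The standardized `RateLimit` header can be parsed into remaining quota and reset time. A sketch assuming a `limit=…, remaining=…, reset=…` key-value syntax (based on the IETF RateLimit-header draft; the exact wire format should be confirmed against a real 429 response from the Hub):

```python
def parse_ratelimit(header: str) -> dict:
    """Parse 'limit=1000, remaining=421, reset=287' into integer fields.

    The field syntax is an assumption; Hugging Face documents the header
    names (RateLimit, RateLimit-Policy) but not, in these notes, the exact
    value format, so verify before relying on this.
    """
    out = {}
    for part in header.split(","):
        key, _, value = part.strip().partition("=")
        if value.isdigit():
            out[key] = int(value)
    return out
```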
Options:
- Web: https://huggingface.co/playground
- API: https://huggingface.co/settings/tokens
- API Docs: https://huggingface.co/docs/inference-providers/index
- API Support: LiteLLM, Opencode.
- ==Current: UNLIMITED: 40 RPM==
NVIDIA does not publish concrete free-tier model limits for NIM; the forum thread cited here offers no detailed table beyond one mentioned value. forums.developer.nvidia
| Aspect | Trial / free experience detail |
|---|---|
| Published per‑model limits | Not published |
| Example mentioned by the user | 40 requests per minute (from NVIDIA API catalog trial) |
| Token / context window limits | Not disclosed |
- An NVIDIA staff moderator confirms that they do not publish specific limits for each NIM model, even for the trial / catalog experience. forums.developer.nvidia
- The thread references a 40 requests per minute cap in the catalog trial, but NVIDIA does not confirm or extend this with token or context‑window numbers. forums.developer.nvidia
Options:
- Web: https://build.nvidia.com/models
- API: https://build.nvidia.com/settings/api-keys
- API Docs: https://docs.api.nvidia.com/nim/reference/llm-apis
- API Support: LiteLLM, Opencode.
- ==Current: $5/- Per Month==
Vercel AI Gateway’s free tier is credit-based rather than having explicit RPM/TPM quotas. vercel
| Plan | Monthly credit | When free ends |
|---|---|---|
| AI Gateway | $5 | When you purchase any AI Credits |
- Every team gets $5 of AI Gateway Credits every 30 days after making the first AI Gateway request; these credits can be used on any provider/model in the catalog at standard, no‑markup rates. vercel
Options:
- Web: https://ai-sdk.dev/playground
- API: https://vercel.com/.../~/ai-gateway/api-keys
- API Docs: https://vercel.com/docs/ai-gateway
- API Support: LiteLLM.
- ==Current: UNLIMITED: 1 RPS — till 429 error==
| Limit type | Value / behavior |
|---|---|
| Concurrent requests | 1 request per user at a time (global concurrency limit) |
| Over-limit behavior | Additional requests return HTTP 429 |
| Pricing | Free to use |
| Streaming requests | Tokens released immediately after active cancellation |
| Non-streaming | Model continues running; tokens released only after completion, even if canceled |
- The platform allows only one active request per user at any time; any extra concurrent request receives a 429 error. platform.iflow
- Usage is currently completely free, but users are asked to avoid unnecessary high concurrency to protect shared resources. platform.iflow
- The list of Models: https://platform.iflow.cn/en/models
Options:
- Web: https://iflow.cn/
- API: https://iflow.cn/?open=setting
- API Docs: https://platform.iflow.cn/docs
- CLI: https://cli.iflow.cn/
- ==Current: $5/- Per Month==
Perplexity Pro affects billing/credits, not API rate limits directly. perplexity
| Plan | Monthly API credits | Credit refresh timing | Notes |
|---|---|---|---|
| Perplexity Pro | $5 | 1st day of each month | Auto-applied |
- Pro subscribers automatically receive $5 in API credits on the first day of each month, and new subscribers see the credit appear within about 10–20 minutes after subscribing.
Options:
- Web: https://www.perplexity.ai/
- Web: https://www.perplexity.ai/account/api/playground/search
- Browser: https://www.perplexity.ai/comet
- API: https://www.perplexity.ai/account/api/keys
- API Docs: https://docs.perplexity.ai/docs/agent-api/quickstart
- API Support: LiteLLM, Opencode.
- ==Current: 10 RPH, 50K Credits-Per-Day, $0.025/- Per Day==
| Plan type | Requests | Models available | Credit / cost limits | Notable exclusions |
|---|---|---|---|---|
| Free Unverified | 10 requests per hour | Gemma 3 4b, Gemma 3 12b, Gemma 3n 4b | No explicit daily credit cap mentioned | All other models |
| Free Verified | 10 req/hour on FREE models; 10 req/day on cheap paid models | Any model costing < 50,000 credits per request | 50,000 credits per day; model cost ≤ $0.025 or 50,000 credits | Audio, video, and most image models (typically > 50,000 credits) |
- On Free Unverified, new users get 10 requests per hour and can only use three Gemma 3 family models (4b, 12b, 3n‑4b), with no broader model access. help.aimlapi
- On Free Verified, after adding a payment method, you can use any model that costs under 50,000 credits per request, but you are limited to 10 requests per hour on FREE‑label models, 10 requests per day on low‑cost paid models, and a total of 50,000 credits per day. help.aimlapi
- Even when verified, audio, video, and most image models are not free, because a single call to those models usually exceeds the 50,000‑credit threshold. help.aimlapi
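Taking the stated equivalence (50,000 credits ≈ $0.025) at face value, the daily credit cap converts into a per-model call budget. A small sketch (the conversion rate is derived from the numbers above, not an official constant):

```python
DAILY_CREDITS = 50_000            # Free Verified daily cap, from the notes above
USD_PER_CREDIT = 0.025 / 50_000   # implied by "$0.025 or 50,000 credits"

def calls_per_day(cost_credits: int) -> int:
    """How many calls to a model costing `cost_credits` credits fit in the
    daily budget (ignores the separate 10 requests/day cap on paid models)."""
    return DAILY_CREDITS // cost_credits
```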
Options:
- Web: https://aimlapi.com/app
- API: https://aimlapi.com/app/keys
- API Docs: https://docs.aimlapi.com/
- API Docs: https://docs.aimlapi.com/integrations/our-integration-list (Integrations)
- ==Current: UNLIMITED: 1 RPS - 2 Free Models==
Zen does not have a classic “free tier” with rate limits; instead, some models are priced at $0 per 1M tokens, and the platform lets you cap spend with monthly limits. opencode
| Model | Input price / 1M tokens | Output price / 1M tokens | Cached read / 1M tokens | Cached write / 1M tokens | Notes |
|---|---|---|---|---|---|
| Big Pickle | Free | Free | Free | – | Free during limited beta |
| GPT 5 Nano | Free | Free | Free | – | Always-free listed model |
| MiMo V2 Flash Free | Free | Free | Free | – | Free during limited beta |
| Nemotron 3 Super Free | Free | Free | Free | – | Free during limited beta |
| MiniMax M2.5 Free | Free | Free | Free | – | Free during limited beta |
- Zen is pay‑as‑you‑go; there is no fixed free allowance, but models like Big Pickle and GPT 5 Nano are currently priced at $0 for input, output, and cached reads, so they effectively function as free to use while available. opencode
Options:
- API: https://opencode.ai/workspace/
- API Docs: https://opencode.ai/docs/zen/
- API Support: Opencode.
- ==Current: UNLIMITED: 1 request / 15 s - 2 Free Models==
Pollinations.ai provides a free, open-source API alternative compatible with OpenAI-style endpoints for text, image, and audio generation. pollinations
No signup or API keys are needed for basic use, prioritizing privacy with zero data storage. github
- Supports models like GPT-5, Claude, Gemini for text; Flux, GPT Image for images; and OpenAI-audio for speech. pollinations-ai
- Fully open-source on GitHub, used in 500+ community projects including bots and apps.
- Earn daily "Pollen Credits" via contributions for advanced access; basic is unlimited and free. pollinations
Access via simple URLs:
- Images: `https://pollinations.ai/p/<your prompt>` github
- Text: `https://text.pollinations.ai/<your prompt>` github
- Audio: append `?model=openai-audio&voice=nova` to the text URL. github
Integrate in code, for example via Python requests for images or React hooks for apps. github
Offers OpenAI-compatible text models listable via API; proxy-like access to premium models without direct OpenAI keys. enter.pollinations
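Because prompts live in the URL path, they must be percent-encoded. A sketch of building the endpoints above (the `?model=` parameter mirrors the audio example; prompt values are placeholders):

```python
from urllib.parse import quote

def image_url(prompt: str) -> str:
    """Image endpoint: the prompt is part of the URL path."""
    return "https://pollinations.ai/p/" + quote(prompt)

def text_url(prompt: str, model: str = "") -> str:
    """Text endpoint; an optional ?model= parameter selects the backend,
    e.g. model='openai-audio' for speech as noted above."""
    url = "https://text.pollinations.ai/" + quote(prompt)
    if model:
        url += "?model=" + model
    return url
```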
Options:
- ==Current: UNLIMITED: 1 RPS — till 429 error==
LongCat is a free LLM API platform compatible with OpenAI and Anthropic formats, offering daily token quotas, several in‑house models, and simple HTTP/SDK integration, with token-output limits but no explicit global QPS limits documented. longcat
- Supported models include ==LongCat-Flash-Chat, LongCat-Flash-Thinking, LongCat-Flash-Thinking-2601, and LongCat-Flash-Lite==, all available via both API formats, with different performance profiles (general chat, deep thinking, MoE, upgraded deep thinking). longcat
- Every account gets free daily token quotas: ==500,000 tokens/day== for the Flash-Chat / Flash-Thinking / Flash-Thinking-2601 models and ==50,000,000 tokens/day== for LongCat-Flash-Lite; quotas reset at Beijing midnight and do not roll over. longcat
- You can apply via the Usage page to raise the free quota for the first three models up to ==5,000,000 tokens/day== (Flash-Lite is excluded), or email the team with an API key for more; both input and output tokens count, and streaming and non-streaming requests consume quota equally. longcat
- ==Usage can be checked in real time== on the Usage page; the platform is currently in public beta and does not support paid quota purchases. longcat
- If you exceed platform rate limits, the ==API returns HTTP 429== with an error payload containing `code: "rate_limit_exceeded"`, `type: "rate_limit_error"`, and a `retry_after` field (in seconds, example value 60) telling you when to retry. longcat
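Since both input and output tokens count against the daily quota, a tiny local tracker makes it easy to stop before the 500K/day allowance runs out. A sketch (client-side bookkeeping only; LongCat's Usage page remains the authoritative count):

```python
class DailyTokenBudget:
    """Track consumption against a fixed daily token quota, e.g. the
    500,000 tokens/day free allowance above. Both prompt and completion
    tokens count, matching the note that input and output both consume quota."""

    def __init__(self, quota: int):
        self.quota = quota
        self.used = 0

    def record(self, prompt_tokens: int, completion_tokens: int) -> int:
        """Add one request's usage; return tokens still available today."""
        self.used += prompt_tokens + completion_tokens
        return max(self.quota - self.used, 0)

    @property
    def exhausted(self) -> bool:
        return self.used >= self.quota
```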
Options:
- ==Current: Free: 60 RPM + 1K RPD | Standard: 120 RPM + 1.5K RPD==
For the paid Gemini Code Assist subscriptions (what they effectively treat as the “Gemini Pro” style plans), the CLI lists higher fixed quotas. geminicli
| Plan | Requests / user / day | Requests / user / minute | Models used |
|---|---|---|---|
| Code Assist Standard edition | 1500 | 120 | Gemini family (auto-chosen) |
| Code Assist Enterprise edition | 2000 | 120 | Gemini family (auto-chosen) |
- Upgrading from the free Google-login tier to Gemini Code Assist Standard raises your quota to 1500 model requests per user per day with 120 requests per user per minute, spread across the Gemini family as selected by the CLI. geminicli
- The Enterprise edition increases only the daily cap (to 2000 requests per user per day) while keeping the same 120 requests per minute limit. geminicli
- These paid tiers give fixed-price, predictable quotas; if you still need more or want to avoid hitting daily caps, you must switch to pay‑as‑you‑go via a Gemini API key or Vertex AI, where limits are governed by API pricing tiers and Vertex quotas rather than these fixed per-user request counts. geminicli
- ==Current: 1 RPS — Till 429 Error Threshold==
The page documents configuration knobs, not hosted rate limits; the only “limits” you can set or hit are things like session turns and token budgets. platform.iflow
| Setting / limit | Default value | What it does |
|---|---|---|
| Current RPS guidance | ~1 RPS until 429 | Recommended request rate |
| maxSessionTurns | -1 | Unlimited turns per chat session |
| tokensLimit | 128000 | Maximum context window length |
| compressionTokenThreshold | 0.8 | Auto-compresses context once usage reaches this fraction of tokensLimit |
- iFlow CLI itself does not impose a strict free-tier quota; instead, it recommends keeping around 1 request per second to avoid 429 errors from whichever backend you configure. platform.iflow
- ==Current: Varies by provider: ~60 RPM + ~2,000 RPD on some free plans==
Kilo Code itself does not impose numeric rate limits; it relies on free quotas from external providers and on models that are priced at $0. kilo
| Category | Model / Provider | Cost in Kilo Code |
|---|---|---|
| OpenRouter free models | Qwen3 Coder (free) | Free via OpenRouter |
| OpenRouter free models | Z.AI GLM‑4.5 Air (free) | Free via OpenRouter |
| OpenRouter free models | DeepSeek R1 0528 (free) | Free via OpenRouter |
| OpenRouter free models | MoonshotAI Kimi K2 (free) | Free via OpenRouter |
- ==Current: 1 RPS + 500K tokens/min + 1B tokens/month==
Mistral’s docs describe how limits work but do not publish concrete free‑tier numbers on this page. docs.mistral
| Aspect | Free API tier behavior |
|---|---|
| Availability | Yes, a dedicated free API tier |
| Purpose | Trying and exploring the API, not production workloads |
| Limit types | Requests per second (RPS), tokens per minute/month |
| Scope | Limits applied at workspace level |
| Configuration/visibility | Exact limits shown only in AI Studio “limits” page per workspace |
| Upgrades | Higher tiers provide higher limits; contact support to increase |
- ==Current: UNLIMITED: 1 RPS - 2 Free Models==
Zen does not have a classic “free tier” with rate limits; instead, some models are priced at $0 per 1M tokens, and the platform lets you cap spend with monthly limits. opencode
| Model | Input price / 1M tokens | Output price / 1M tokens | Cached read / 1M tokens | Cached write / 1M tokens | Notes |
|---|---|---|---|---|---|
| Big Pickle | Free | Free | Free | – | Free during limited beta |
| GPT 5 Nano | Free | Free | Free | – | Always-free listed model |
- Zen is pay‑as‑you‑go; there is no fixed free allowance, but models like Big Pickle and GPT 5 Nano are currently priced at $0 for input, output, and cached reads, so they effectively function as free to use while available. opencode
- ==Current: 60 RPM + 2000 RPD - No Token Limits==
| Free option | Requests per day | Requests per minute | Token limits |
|---|---|---|---|
| Qwen OAuth | 2,000 | 60 | No token counting |
- With Qwen OAuth, individual users get 2,000 requests per day and 60 requests per minute, advertised as completely free with no explicit token limit tracking on your side. npmjs
- Regional free tiers add more options: ModelScope offers 2,000 free API calls per day for Mainland China, and OpenRouter offers up to 1,000 free calls per day internationally, each subject to their own backend quotas and terms. npmjs
- Qwen Code warns that a single workflow cycle may trigger multiple API calls, so heavy commands can consume your free request budget faster than expected even though there is no per-call token cap. npmjs
- ==Current: 50 requests per month.==
| Plan | Monthly price | Premium requests (chat, agents, reviews, CLI) | Code completions | Models access |
|---|---|---|---|---|
| Free | $0 | 50 per month | 2,000 per month | Haiku 4.5, GPT‑4.1, GPT‑5 mini, and more |
- The Free Copilot plan includes 50 premium requests per month, which are consumed by ==chat, agent mode, code review, coding agents, and Copilot CLI features==; once used up, those premium features stop working until the next month. github
- You also get 2,000 code completions per month in supported editors; after hitting that cap, inline suggestions are disabled until your quota resets. github
- Free users still have access to multiple modern models (such as Claude Haiku 4.5, GPT‑4.1, GPT‑5 mini) but see rate limiting and slower response times during high usage, and they cannot buy extra premium requests or access enterprise governance features. github
- Gemini CLI
- Gemini 3 - gemini-3.1-pro-preview, gemini-3-flash-preview
- Gemini 2.5 - gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite
- OpenAI Codex
- gpt-5.4
- gpt-5.3-codex
- gpt-5.2-codex
- gpt-5.2
- gpt-5.1-codex-max
- gpt-5.1-codex-mini
- IFlow CLI
- GLM-4.7
- iFlow-ROME-30BA3B
- DeepSeek-V3.2
- GLM-5
- Qwen3-Coder-Plus
- Kimi-K2-Thinking
- MiniMax-M2.5
- Kimi-K2.5
- Kimi-K2-0905
- Kilocode CLI
- Kilo Auto Free
- CoreThink
- Giga Potato
- Giga Potato Thinking
- MiniMax M2.5
- Trinity Large Preview
- Stepfun Step 3.5 Flash
- Nemotron 3 Super
- Groq Code Fast 1 Optimized
- Mistral Vibe CLI
- Devstral 2
- Devstral Small
- Local
- OpenCode Zen CLI
- Big Pickle
- GPT 5 Nano
- MiniMax M2.5 Free
- Nemotron 3 Super Free
- MiMo V2 Flash Free
- Trinity Large Preview Free
- Qwen CLI
- Latest Qwen Coder model - Qwen 3.5 Plus
- GitHub Copilot CLI
- Claude Haiku 4.5
- GPT-5 mini
- GPT-4.1
- Cline CLI
- arcee-ai/trinity-large-preview:free
- openrouter/free
Updated on February 6th, 2026 16:55:44
OpenCode connects to over 75 LLM providers via the AI SDK and Models.dev, including OpenAI, Anthropic, ==Google==, Together AI, ==Mistral AI==, ==Groq==, Meta, ==Cohere==, AWS Bedrock, Azure OpenAI, DeepSeek, ==Perplexity API==, ==Hugging Face==, Replicate, Fireworks AI, Lepton AI, Novita AI, AI21 Labs, Aleph Alpha, Bamboo AI, ==Cerebras==, ==Cloudflare Workers AI==, EleutherAI, Falcon, GLM (Zhipu), Google Vertex AI, xAI (Grok), IBM watsonx, Inflection, LightOn, Moonshot AI, ==NVIDIA Nemo==, ==Ollama==, ==OpenRouter==, PaddlePaddle, ==Qwen (Alibaba)==, Stability AI, Yi, and many additional OpenAI-compatible endpoints accessible through custom configurations or OpenCode Zen. opencode
Kilo Code supports over 30 AI providers, including cloud options like Anthropic, OpenAI, ==Google Gemini==, DeepSeek, ==Mistral==, and Zhipu AI; local and self-hosted setups such as ==Ollama==, LM Studio, and any OpenAI-compatible endpoints; AI gateways and routers including ==OpenRouter==, Glama, Requesty, and its own Kilo Gateway; plus enterprise choices like AWS Bedrock and Google Vertex AI. kilo
LiteLLM (often called llmlite in shorthand contexts) is a Python library and proxy server that unifies access to 100+ LLM providers through a single OpenAI-compatible format, proxying calls to endpoints like Bedrock, Azure OpenAI, Vertex AI, ==Google AI Studio==, Anthropic, AWS SageMaker, ==Cohere==, ==Hugging Face==, Replicate, ==Groq==, ==Mistral AI==, ==Cloudflare Workers AI==, DeepInfra, ==Perplexity AI==, DeepSeek, Anyscale, IBM watsonx.ai, Voyage AI, FriendliAI, Galadriel, ==OpenRouter==, ==Ollama==, vLLM, Together AI, Fireworks AI, Novita AI, ==Cerebras==, Baseten, AI21, Aleph Alpha, NLP Cloud, Petals, Dashscope (Qwen), Sarvam.ai, OVHCloud AI Endpoints, ==CometAPI==, CompactifAI, DataRobot, ElevenLabs, Fal AI, Featherless AI, GradientAI, Helicone, Hyperbolic, Infinity, Jina AI, Lambda AI, Llamafile, LM Studio, LlamaGate, Manus, Meta Llama, Milvus, MiniMax, Moonshot AI, Morph, Nebius AI Studio, Novita AI, Nscale, ==NVIDIA NIM==, Oracle OCI, Poe, Predibase, PublicAI, Recraft, RunwayML, SambaNova, SAP Generative AI Hub, Scaleway, Stability AI, Synthetic, Snowflake, Topaz, Triton Inference Server, ==Vercel AI Gateway==, Volcano Engine, Weights & Biases Inference, Xiaomi MiMo, Xinference, Zhipu AI (Z.AI), and many more via custom or OpenAI-compatible setups. docs.litellm