AI Providers

01. Google Gemini API

  • ==Current: Flash: 5/10 RPM, 20 RPD, 250K TPM | Gemma: 30 RPM + 14.4K RPD + 15K TPM==

Gemini API Rate Limits

| Model | Category | RPM | TPM | RPD |
| --- | --- | --- | --- | --- |
| Gemini 2.5 Flash | Text-out models | 5 | 250K | 20 |
| Gemini 2.5 Flash Lite | Text-out models | 10 | 250K | 20 |
| Gemini 2.5 Flash Native Audio Dialog | Live API | Unlimited | 1M | Unlimited |
| Gemini 2.5 Flash TTS | Multi-modal models | 3 | 10K | 10 |
| Gemini 3 Flash | Text-out models | 5 | 250K | 20 |
| Gemini 3.1 Flash Lite | Text-out models | 15 | 250K | 500 |
| Gemini Embedding 1 | Other models | 100 | 30K | 1K |
| Gemini Embedding 2 | Other models | 100 | 30K | 1K |
| Gemini Robotics ER 1.5 Preview | Other models | 10 | 250K | 20 |
| Gemma 3 12B | Other models | 30 | 15K | 14.4K |
| Gemma 3 1B | Other models | 30 | 15K | 14.4K |
| Gemma 3 27B | Other models | 30 | 15K | 14.4K |
| Gemma 3 2B | Other models | 30 | 15K | 14.4K |
| Gemma 3 4B | Other models | 30 | 15K | 14.4K |
  • requests per minute (RPM)
  • tokens per minute (TPM)
  • requests per day (RPD)
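
A minimal Python sketch, assuming a free-tier key used against the public generateContent REST endpoint: it simply spaces calls out so the 5 RPM cap from the table above is never hit. The model name and pacing value are taken from the table; everything else is illustrative.

```python
import os
import time
import requests

# Assumes GEMINI_API_KEY holds an AI Studio API key (free tier).
URL = ("https://generativelanguage.googleapis.com/v1beta/models/"
       "gemini-2.5-flash:generateContent")
HEADERS = {"x-goog-api-key": os.environ["GEMINI_API_KEY"]}

prompts = ["Give me one fact about rate limiting.", "Summarize RPM vs TPM."]
for prompt in prompts:
    resp = requests.post(
        URL,
        headers=HEADERS,
        json={"contents": [{"parts": [{"text": prompt}]}]},
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["candidates"][0]["content"]["parts"][0]["text"])
    time.sleep(60 / 5)  # 12 s between requests keeps a Flash key under 5 RPM
```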



02. Ollama Cloud Models

  • ==Current: Cloud deployment - 5 Hour Session Usage + 5 Days Weekly Usage (LIMITS)==

Ollama Cloud imposes tiered rate limits to manage capacity and prevent abuse. These vary by plan, with no exact numerical quotas publicly detailed beyond general descriptions.

The free tier is designed for light usage such as chat and quick model tests; it includes hourly and daily caps, plus per-minute restrictions on rapid API calls.

Free tier (Ollama Cloud usage):

  • Session usage - resets in 5 hours
  • Weekly usage - resets in 5 days



03. Openrouter Free Models API

  • ==Current: 20 RPM + 1K RPD, :free Models==

OpenRouter uses global, credit-based limits plus a few hard caps. openrouter

Rate limits

  • Requests using :free model variants are capped at 20 requests per minute. openrouter
  • If you have bought less than 10 credits, you can send 50 free-model requests per day; with at least 10 credits purchased, this increases to 1000 per day. openrouter

Credits and key info

  • Each API key has a credit limit (or unlimited), a reset schedule, and remaining credits, all retrievable via GET https://openrouter.ai/api/v1/key. openrouter
  • The key metadata also tracks total, daily, weekly, and monthly usage for both regular and BYOK (bring your own key) traffic, and flags whether the account is on the free tier. openrouter
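
A small sketch of querying that key endpoint with the requests library. The field names printed below reflect the response shape at the time of writing and should be treated as assumptions to verify against the current docs.

```python
import os
import requests

# Inspect the key's credit limit, usage so far, and free-tier flag.
resp = requests.get(
    "https://openrouter.ai/api/v1/key",
    headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}"},
    timeout=30,
)
resp.raise_for_status()
data = resp.json().get("data", {})
print("limit:", data.get("limit"))              # credit limit (None = unlimited)
print("usage:", data.get("usage"))              # credits consumed so far
print("is_free_tier:", data.get("is_free_tier"))
```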

Other limitations

  • Making multiple accounts or API keys does not bypass rate limits, which are enforced globally per user. openrouter
  • Cloudflare DDoS protection can temporarily block IPs that send traffic far above normal patterns, and a negative credit balance can trigger 402 errors even on free models until credits are added. openrouter



04. Groq API

  • ==Current: Super-Fast, 30-60 RPM, 250-1K RPD, 100-500k TPD==
| MODEL ID | RPM | RPD | TPM | TPD | ASH | ASD |
| --- | --- | --- | --- | --- | --- | --- |
| allam-2-7b | 30 | 7K | 6K | 500K | - | - |
| canopylabs/orpheus-arabic-saudi | 10 | 100 | 1.2K | 3.6K | - | - |
| canopylabs/orpheus-v1-english | 10 | 100 | 1.2K | 3.6K | - | - |
| groq/compound | 30 | 250 | 70K | - | - | - |
| groq/compound-mini | 30 | 250 | 70K | - | - | - |
| llama-3.1-8b-instant | 30 | 14.4K | 6K | 500K | - | - |
| llama-3.3-70b-versatile | 30 | 1K | 12K | 100K | - | - |
| meta-llama/llama-4-scout-17b-16e-instruct | 30 | 1K | 30K | 500K | - | - |
| meta-llama/llama-prompt-guard-2-22m | 30 | 14.4K | 15K | 500K | - | - |
| meta-llama/llama-prompt-guard-2-86m | 30 | 14.4K | 15K | 500K | - | - |
| moonshotai/kimi-k2-instruct | 60 | 1K | 10K | 300K | - | - |
| moonshotai/kimi-k2-instruct-0905 | 60 | 1K | 10K | 300K | - | - |
| openai/gpt-oss-120b | 30 | 1K | 8K | 200K | - | - |
| openai/gpt-oss-20b | 30 | 1K | 8K | 200K | - | - |
| openai/gpt-oss-safeguard-20b | 30 | 1K | 8K | 200K | - | - |
| qwen/qwen3-32b | 60 | 1K | 6K | 500K | - | - |
| whisper-large-v3 | 20 | 2K | - | - | 7.2K | 28.8K |
| whisper-large-v3-turbo | 20 | 2K | - | - | 7.2K | 28.8K |

Groq enforces per-organization limits on requests, tokens, and audio duration, plus response headers to help you throttle when you hit caps. console.groq

What is limited

  • Metrics: requests per minute/day (RPM, RPD), tokens per minute/day (TPM, TPD), and audio seconds per hour/day (ASH, ASD) for speech models. console.groq
  • Cached tokens from prompt caching do not count against token limits. console.groq
  • Limits apply at the organization level, so all API keys and users in that org share the same quotas. console.groq
| Header | Value | Notes |
| --- | --- | --- |
| retry-after | 2 | In seconds |
| x-ratelimit-limit-requests | 14400 | Always refers to Requests Per Day (RPD) |
| x-ratelimit-limit-tokens | 18000 | Always refers to Tokens Per Minute (TPM) |
| x-ratelimit-remaining-requests | 14370 | Always refers to Requests Per Day (RPD) |
| x-ratelimit-remaining-tokens | 17997 | Always refers to Tokens Per Minute (TPM) |
| x-ratelimit-reset-requests | 2m59.56s | Always refers to Requests Per Day (RPD) |
| x-ratelimit-reset-tokens | 7.66s | Always refers to Tokens Per Minute (TPM) |
  • If you exceed limits, the API returns HTTP 429 with a retry-after header indicating how many seconds to wait before retrying; this header appears only when a rate limit is hit. console.groq
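
A hedged sketch of a retry loop against Groq's OpenAI-compatible chat endpoint that honors the retry-after and x-ratelimit-* headers listed above. The model name is one of the free-tier models from the table; the retry count is arbitrary.

```python
import os
import time
import requests

GROQ_URL = "https://api.groq.com/openai/v1/chat/completions"

def chat(messages, model="llama-3.1-8b-instant", max_attempts=5):
    """Retry on HTTP 429 using Groq's retry-after header."""
    headers = {"Authorization": f"Bearer {os.environ['GROQ_API_KEY']}"}
    for _ in range(max_attempts):
        resp = requests.post(
            GROQ_URL,
            headers=headers,
            json={"model": model, "messages": messages},
            timeout=60,
        )
        if resp.status_code == 429:
            # retry-after only appears when a rate limit was actually hit.
            time.sleep(float(resp.headers.get("retry-after", 2)))
            continue
        resp.raise_for_status()
        # Remaining daily requests / per-minute tokens, per the header table above.
        print(resp.headers.get("x-ratelimit-remaining-requests"),
              resp.headers.get("x-ratelimit-remaining-tokens"))
        return resp.json()["choices"][0]["message"]["content"]
    raise RuntimeError("still rate limited after retries")

print(chat([{"role": "user", "content": "Say hello in five words."}]))
```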



05. Cohere API

  • ==Current: 20 RPM (each model) + 1,000 requests/month (all models)==

Chat API rate limits (Markdown table)

| Model | Trial rate limit | Production rate limit |
| --- | --- | --- |
| Command A Reasoning | 20 req / min | Contact sales |
| Command A Translate | 20 req / min | Contact sales |
| Command A Vision | 20 req / min | Contact sales |
| Command A | 20 req / min | 500 req / min |
| Command R+ | 20 req / min | 500 req / min |
| Command R | 20 req / min | 500 req / min |
| Command R7B | 20 req / min | 500 req / min |

Other endpoints (Markdown table)

| Endpoint | Trial rate limit | Production rate limit |
| --- | --- | --- |
| Embed | 2,000 inputs / min | 2,000 inputs / min |
| Embed (Images) | 5 inputs / min | 400 inputs / min |
| Rerank | 10 req / min | 1,000 req / min |
| Tokenize | 100 req / min | 2,000 req / min |
| EmbedJob | 5 req / min | 50 req / min |
| Default (other) | 500 req / min | 500 req / min |

Summary of rate limits and limitations

  • There are two API key types: evaluation (trial) keys and production keys; trial keys and production keys on newer chat models are capped at ==1,000 API calls per month==. docs.cohere
  • Chat models generally allow ==20 requests per minute== on trial keys, while mature production models (Command A, R, R+, R7B) allow up to 500 requests per minute; newer Command A variants require contacting sales for production limits. docs.cohere
  • ==Non-chat endpoints have their own per-minute caps==, with Embed text sharing the same limits for trial and production, while others like Rerank and Tokenize get much higher production limits than trial. docs.cohere
  • Any endpoint not explicitly listed falls under a default limit of 500 requests per minute for both trial and production keys, and rate-limit increases require contacting Cohere support. docs.cohere
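
As a rough illustration, the sketch below paces calls to the v2 Chat endpoint so a trial key stays under the 20 requests/minute cap. The model name and payload shape follow Cohere's public Chat API but should be treated as assumptions, not the only valid form.

```python
import os
import time
import requests

URL = "https://api.cohere.com/v2/chat"
HEADERS = {"Authorization": f"Bearer {os.environ['CO_API_KEY']}"}
TRIAL_RPM = 20  # trial keys: 20 chat requests per minute

prompts = ["Hello!", "Name three uses of embeddings."]
for prompt in prompts:
    resp = requests.post(
        URL,
        headers=HEADERS,
        json={"model": "command-r",
              "messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json())
    time.sleep(60 / TRIAL_RPM)  # 3 s gap = at most 20 requests per minute
```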



06. GitHub Models API

  • ==Current: Low: 15 RPM + 150 RPD | High: 10 RPM + 50 RPD | 8K IN, 4k OUT TPR==
| Tier / Model group | Metric | Copilot Free |
| --- | --- | --- |
| Low | Requests per minute | 15 |
| Low | Requests per day | 150 |
| Low | Tokens per request | 8000 in, 4000 out |
| Low | Concurrent requests | 5 |
| High | Requests per minute | 10 |
| High | Requests per day | 50 |
| High | Tokens per request | 8000 in, 4000 out |
| High | Concurrent requests | 2 |

Summary of rate limits and limitations

  • Free playground and API usage are limited by requests per minute, requests per day, tokens per request, and concurrent requests, with different tiers for low, high, and embedding models. docs.github
  • Special model groups (Azure OpenAI families, DeepSeek, xAI Grok) have stricter RPM/RPD and concurrency but can offer larger output token caps at higher tiers. docs.github
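
Because the per-request token caps are fixed (8,000 in / 4,000 out), a simple client-side pre-check can avoid wasted calls. The 4-characters-per-token heuristic below is only an approximation, not GitHub's actual tokenizer.

```python
# Rough, client-side guard for the 8,000-in / 4,000-out per-request caps.
MAX_INPUT_TOKENS = 8000
MAX_OUTPUT_TOKENS = 4000

def fits_request(prompt: str, requested_output_tokens: int) -> bool:
    # ~4 characters per token is a crude heuristic; the service counts
    # tokens with its own tokenizer, so leave yourself some margin.
    approx_input_tokens = len(prompt) / 4
    return (approx_input_tokens <= MAX_INPUT_TOKENS
            and requested_output_tokens <= MAX_OUTPUT_TOKENS)

print(fits_request("Summarize this file ...", requested_output_tokens=1024))
```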



07. Mistral API

  • ==Current: 1 RPS, 500K tokens/min, 1B tokens/month==

Free plan – feature table

| Feature | Free plan value | Notes |
| --- | --- | --- |
| Price | €0 / $0 | Personal use for life and work. |
| Flash answers | Up to 150 per day | Quick “flash” responses. |
| Web searches | Base quota | Paid plans allow “Up to 5x Free”. |
| Think mode | Base quota | Paid plans: “Up to 30x Free”. |
| Deep research (preview) | Base quota | Paid plans: “Up to 5x Free”. |
| Memories | 500 | Saved and recallable user memories. |
| Libraries / storage | Limited | Higher tiers raise to 15–30 GB. |
| Document uploads | Base quota | Paid: “Up to 20x Free”. |
| Image generation | Base quota | Paid: “Up to 40x Free”. |
| Code interpreter | Base quota | Paid: “Up to 5x Free”. |
| Projects | Unlimited | Can group chats into projects. |
| Connectors directory | Full access | Access to connectors directory. |
| Custom MCP connectors | Not included | Marked as “Custom” only on higher tiers. |
| Voice / canvas / agents | Limited or not listed | Described as “Custom” mainly for paid tiers. |
| Customer support | Help center only | No chat/email support on Free. |

Rate limits and other limitations (Free only)

  • The Free plan gives access to Mistral’s SOTA models in Le Chat but applies hard caps on usage-based features such as ==flash answers (150/day)==, web searches, extended “think” reasoning, deep research, document uploads, image generation, and code interpreter runs, all at a baseline level that paid plans multiply (for example “Up to 40x Free” for images).
  • You can store up to 500 memories, have unlimited projects, and full access to the connectors directory, but storage (libraries) is limited and advanced enterprise features like domain verification, audit logs, SAML SSO, and white-labeling are not available. mistral



08. Cerebras API

  • ==Current: 30 RPM + 900 RPH + 14k RPD + 1000K TPD==

Free tier rate‑limit table (Markdown)

| Model | TPM | TPH | TPD | RPM | RPH | RPD |
| --- | --- | --- | --- | --- | --- | --- |
| gpt-oss-120b | 64,000 | 1,000,000 | 1,000,000 | 30 | 900 | 14,400 |
| llama3.1-8b | 64,000 | 1,000,000 | 1,000,000 | 30 | 900 | 14,400 |
| qwen-3-235b-a22b-instruct-2507 | 64,000 | 1,000,000 | 1,000,000 | 30 | 900 | 14,400 |
| zai-glm-4.7 | 64,000 | 1,000,000 | 1,000,000 | 10 | 100 | 100 |

  • Due to high demand on zai-glm-4.7 and qwen-3-235b-a22b-instruct-2507, free-tier rate limits have been temporarily reduced; Cerebras says it is working to restore them as quickly as possible.

Summary of free‑tier rate limits and other limitations

  • Cerebras measures limits on both requests (RPM, RPH, RPD) and tokens (TPM, TPH, TPD), and whichever limit you hit first will throttle further calls. inference-docs.cerebras

  • Free‑tier org limits for most models are ==30 RPM, 900 RPH, 14.4K RPD and 60K TPM, 1M TPH, 1M TPD==, while the heavier zai-glm-4.7 model is restricted to ==10 RPM, 100 RPH, and 100 RPD== with the same token caps. inference-docs.cerebras

  • Every response includes rate‑limit headers (limit, remaining, and reset for requests/day and tokens/minute); exceeding limits returns HTTP 429, and increasing quotas requires contacting Cerebras. inference-docs.cerebras

  • TPM (tokens/min); TPH (tokens/hour); TPD (tokens/day)

  • RPM (requests/min); RPH (requests/hour); RPD (requests/day)
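
A minimal sketch that makes one call to Cerebras's OpenAI-compatible endpoint and prints whatever x-ratelimit-* headers come back, so you can see remaining daily requests and per-minute tokens before the next call. Exact header names are not assumed here; they are simply echoed.

```python
import os
import requests

resp = requests.post(
    "https://api.cerebras.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['CEREBRAS_API_KEY']}"},
    json={"model": "llama3.1-8b",   # a free-tier model from the table above
          "messages": [{"role": "user", "content": "ping"}]},
    timeout=60,
)
# Dump every rate-limit header the service returned.
for name, value in resp.headers.items():
    if name.lower().startswith("x-ratelimit"):
        print(name, "=", value)
resp.raise_for_status()
```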



09. Cloudflare API

  • ==Current: 1K RPM; 100K RPD; Free Workers: 10k Neurons/day==
| Feature | Workers Free | Workers Paid |
| --- | --- | --- |
| Requests | 100,000 requests/day, 1,000 requests/min | No limit |
| Worker memory | 128 MB | 128 MB |
| CPU time | 10 ms | 5 min HTTP request, 15 min Cron Trigger |
| Duration | No limit | No limit for Workers; 15 min duration limit for Cron Triggers, Durable Object Alarms and Queue Consumers |

Cloudflare Workers Free gives you limited daily traffic and tighter per-request resources compared to paid plans. developers.cloudflare


10. Huggingface Inference API

  • ==Current: $0.10 / month usage==

Free plan rate‑limit table (Markdown)

| Plan | API requests / 5 min | Resolver requests / 5 min | Pages requests / 5 min |
| --- | --- | --- | --- |
| Free user | 1,000 * | 5,000 * | 200 * |

  • Anonymous and Free user limits may change over time depending on platform health. huggingface

Summary – rate limits and other limitations (Free only)

  • Hugging Face defines separate buckets for Hub API, Resolvers (file downloads via /resolve/), and Pages, each with its own quota over a 5‑minute fixed window. huggingface
  • A Free user can make about 1,000 API calls, 5,000 resolver calls, and 200 page views per 5‑minute window, but these values are explicitly marked as subject to change for Free and anonymous users. huggingface
  • When you hit a limit, the Hub returns HTTP 429 with standardized RateLimit and RateLimit-Policy headers that describe remaining quota and reset time, and the recommended mitigation is to authenticate with HF_TOKEN, spread requests over time, or prefer Resolver endpoints where possible. huggingface
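
A short sketch of an authenticated Hub API call that prints the RateLimit / RateLimit-Policy headers mentioned above. The /api/models listing endpoint is used only as a convenient example; HF_TOKEN is your Hugging Face access token.

```python
import os
import requests

resp = requests.get(
    "https://huggingface.co/api/models",
    params={"limit": 5},
    headers={"Authorization": f"Bearer {os.environ['HF_TOKEN']}"},
    timeout=30,
)
print("status:", resp.status_code)                  # 429 once the window is exhausted
print("RateLimit:", resp.headers.get("RateLimit"))
print("RateLimit-Policy:", resp.headers.get("RateLimit-Policy"))
resp.raise_for_status()
```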



11. Nvidia NIM API

  • ==Current: UNLIMITED: 40 RPM==

NVIDIA does not publish concrete free-tier model limits for NIM; the forum thread referenced here mentions only a single value, so there is no detailed table beyond it. forums.developer.nvidia

Minimal “table” from the discussion

| Aspect | Trial / free experience detail |
| --- | --- |
| Published per-model limits | Not published |
| Example mentioned by the user | 40 requests per minute (from NVIDIA API catalog trial) |
| Token / context window limits | Not disclosed |

Summary – rate limits and other limitations (trial / free)

  • An NVIDIA staff moderator confirms that they do not publish specific limits for each NIM model, even for the trial / catalog experience. forums.developer.nvidia
  • The thread references a 40 requests per minute cap in the catalog trial, but NVIDIA does not confirm or extend this with token or context‑window numbers. forums.developer.nvidia
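
Given only that unofficial ~40 RPM figure, a conservative client can simply pace its catalog calls. The endpoint below is the API catalog's OpenAI-compatible interface, and the model id is an illustrative catalog entry; neither is an NVIDIA-published limit.

```python
import os
import time
import requests

NIM_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
HEADERS = {"Authorization": f"Bearer {os.environ['NVIDIA_API_KEY']}"}

prompts = ["Explain NIM in one sentence.", "What is a rate limit?"]
for prompt in prompts:
    resp = requests.post(
        NIM_URL,
        headers=HEADERS,
        json={"model": "meta/llama-3.1-8b-instruct",  # example catalog model
              "messages": [{"role": "user", "content": prompt}]},
        timeout=60,
    )
    resp.raise_for_status()
    print(resp.json()["choices"][0]["message"]["content"])
    time.sleep(60 / 40)  # stay under the ~40 requests/minute figure
```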



12. Vercel API

  • ==Current: $5/- Per Month==

Vercel AI Gateway’s free tier is credit-based rather than having explicit RPM/TPM quotas. vercel

| Plan | Monthly credit | When free ends |
| --- | --- | --- |
| AI Gateway | $5 | When you purchase any AI Credits |

Summary – rate limits and other limitations (free only)

  • Every team gets $5 of AI Gateway Credits every 30 days after making the first AI Gateway request; these credits can be used on any provider/model in the catalog at standard, no‑markup rates. vercel



13. Iflow API

  • ==Current: UNLIMITED: 1 RPS — till 429 error==
| Limit type | Value / behavior |
| --- | --- |
| Concurrent requests | 1 request per user at a time (global concurrency limit) |
| Over-limit behavior | Additional requests return HTTP 429 |
| Pricing | Free to use |
| Streaming requests | Tokens released immediately after active cancellation |
| Non-streaming requests | Model continues running; tokens released only after completion, even if canceled |

Summary – rate limits and limitations (free only)

  • The platform allows only one active request per user at any time; any extra concurrent request receives a 429 error. platform.iflow
  • Usage is currently completely free, but users are asked to avoid unnecessary high concurrency to protect shared resources. platform.iflow
  • The list of Models: https://platform.iflow.cn/en/models
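
A sketch of the safest client shape for this policy: strictly sequential requests with a short back-off on 429. The base URL and model name below are placeholders (assumptions); take the real values from the iFlow platform docs and the model list above.

```python
import os
import time
import requests

IFLOW_URL = "https://example-iflow-endpoint/v1/chat/completions"  # placeholder URL

def chat_serial(prompts, model="glm-4.7"):  # model name is illustrative only
    """Send prompts one at a time; back off briefly whenever 429 is returned."""
    headers = {"Authorization": f"Bearer {os.environ['IFLOW_API_KEY']}"}
    results = []
    for prompt in prompts:
        while True:
            resp = requests.post(
                IFLOW_URL,
                headers=headers,
                json={"model": model,
                      "messages": [{"role": "user", "content": prompt}]},
                timeout=120,
            )
            if resp.status_code == 429:  # another request was still in flight
                time.sleep(2)
                continue
            resp.raise_for_status()
            results.append(resp.json())
            break
    return results
```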



14. Perplexity API

  • ==Current: $0/- Per Month==

Perplexity Pro affects billing/credits, not API rate limits directly. perplexity

| Plan | Monthly API credits | Credit refresh timing | Notes |
| --- | --- | --- | --- |
| Perplexity Pro | $0 | 1st day of each month | Auto-applied |

Summary – limits and other conditions (Perplexity Pro)

  • Pro subscribers automatically receive $0 in API credits on the first day of each month, and new subscribers see the credit appear within about 10–20 minutes after subscribing.



15. AI ML API

  • ==Current: 10 RPH, 50K Credits-Per-Day, $0.025/- Per Day==
| Plan type | Requests | Models available | Credit / cost limits | Notable exclusions |
| --- | --- | --- | --- | --- |
| Free Unverified | 10 requests per hour | Gemma 3 4b, Gemma 3 12b, Gemma 3n 4b | No explicit daily credit cap mentioned | All other models |
| Free Verified | 10 req/hour on FREE models; 10 req/day on cheap paid models | Any model costing < 50,000 credits per request | 50,000 credits per day; model cost ≤ $0.025 or 50,000 credits | Audio, video, and most image models (typically > 50,000 credits) |

Summary of rate limits and other limitations

  • On Free Unverified, new users get 10 requests per hour and can only use three Gemma 3 family models (4b, 12b, 3n‑4b), with no broader model access. help.aimlapi
  • On Free Verified, after adding a payment method, you can use any model that costs under 50,000 credits per request, but you are limited to 10 requests per hour on FREE‑label models, 10 requests per day on low‑cost paid models, and a total of 50,000 credits per day. help.aimlapi
  • Even when verified, audio, video, and most image models are not free, because a single call to those models usually exceeds the 50,000‑credit threshold. help.aimlapi



16. Opencode Zen API

  • ==Current: UNLIMITED: 1 RPS - Free 2 Models==

Zen does not have a classic “free tier” with rate limits; instead, some models are priced at $0 per 1M tokens, and the platform lets you cap spend with monthly limits. opencode

| Model | Input price / 1M tokens | Output price / 1M tokens | Cached read / 1M tokens | Cached write / 1M tokens | Notes |
| --- | --- | --- | --- | --- | --- |
| Big Pickle | Free | Free | Free | | Free during limited beta |
| GPT 5 Nano | Free | Free | Free | | Always-free listed model |
| MiMo V2 Flash | Free | Free | Free | Free | Free during limited beta |
| Nemotron 3 Super | Free | Free | Free | | Free during limited beta |
| MiniMax M2.5 | Free | Free | Free | | Free during limited beta |

Summary – limits and other conditions for “free” usage

  • Zen is pay‑as‑you‑go; there is no fixed free allowance, but models like Big Pickle and GPT 5 Nano are currently priced at $0 for input, output, and cached reads, so they effectively function as free to use while available. opencode



17. Pollination AI API

  • ==Current: UNLIMITED: 1 RP15S - Free 2 Models==

Pollinations.ai provides a free, open-source API alternative compatible with OpenAI-style endpoints for text, image, and audio generation. pollinations

No signup or API keys are needed for basic use, prioritizing privacy with zero data storage. github

Key Features

  • Supports models like GPT-5, Claude, Gemini for text; Flux, GPT Image for images; and OpenAI-audio for speech. pollinations-ai
  • Fully open-source on GitHub, used in 500+ community projects including bots and apps.
  • Earn daily "Pollen Credits" via contributions for advanced access; basic is unlimited and free. pollinations

Usage Examples

Access via simple URLs:

  • Images: https://pollinations.ai/p/your prompt here github
  • Text: https://text.pollinations.ai/your prompt github
  • Audio: Append ?model=openai-audio&voice=nova to text URL. github

Integrate in code, like Python requests for images or React hooks for apps. github
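
A minimal Python example built directly on the URL patterns above; prompts are URL-encoded and no API key is used.

```python
import requests
from urllib.parse import quote

prompt = "a watercolor fox in a forest"

# Image: the /p/ endpoint returns the generated image bytes directly.
img = requests.get(f"https://pollinations.ai/p/{quote(prompt)}", timeout=120)
with open("fox.jpg", "wb") as f:
    f.write(img.content)

# Text: the text endpoint returns plain text for the prompt in the path.
txt = requests.get(
    f"https://text.pollinations.ai/{quote('Write a haiku about foxes')}",
    timeout=60,
)
print(txt.text)
```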

OpenAI Compatibility

Offers OpenAI-compatible text models listable via API; proxy-like access to premium models without direct OpenAI keys. enter.pollinations



18. Long Cat

  • ==Current: UNLIMITED: 1 RPS — till 429 error==

LongCat is a free LLM API platform compatible with OpenAI and Anthropic formats, offering daily token quotas, several in‑house models, and simple HTTP/SDK integration, with token-output limits but no explicit global QPS limits documented. longcat

Overall Summary

  • Supported models include ==LongCat-Flash-Chat, LongCat-Flash-Thinking, LongCat-Flash-Thinking-2601, and LongCat-Flash-Lite==, all available via both API formats, with different performance profiles (general chat, deep thinking, MoE, upgraded deep thinking). longcat

  • Every account gets free daily token quotas: ==500,000 tokens/day== for the Flash-Chat / Flash-Thinking / Flash-Thinking-2601 models, and ==50,000,000 tokens/day== for LongCat-Flash-Lite; quotas reset at Beijing midnight and do not roll over. longcat

  • You can apply via the Usage page to raise the free quota for the first three models up to ==5,000,000 tokens/day== (Flash-Lite is excluded), or email the team with an API key for more quota; both input and output tokens count, and streaming vs non-streaming consume quota equally. longcat

  • ==Usage can be checked in real time== on the Usage page, and at the moment the platform is in public beta and does not support paid quota purchases. longcat

  • If you exceed platform rate limits, the ==API returns HTTP 429== with an error payload containing code: "rate_limit_exceeded", type: "rate_limit_error", and a retry_after field (in seconds, example value 60) telling you when to retry. longcat
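
A hedged sketch of handling that 429 payload: on rate limiting, sleep for retry_after seconds and retry. The base URL is an assumption based on LongCat's OpenAI-compatible format and should be replaced with the endpoint from its docs; the model name comes from the list above.

```python
import os
import time
import requests

LONGCAT_URL = "https://api.longcat.chat/openai/v1/chat/completions"  # assumed endpoint

def chat(prompt, model="LongCat-Flash-Chat", max_attempts=5):
    headers = {"Authorization": f"Bearer {os.environ['LONGCAT_API_KEY']}"}
    for _ in range(max_attempts):
        resp = requests.post(
            LONGCAT_URL,
            headers=headers,
            json={"model": model,
                  "messages": [{"role": "user", "content": prompt}]},
            timeout=120,
        )
        if resp.status_code == 429:
            body = resp.json()
            err = body.get("error", body)        # payload nesting is an assumption
            time.sleep(float(err.get("retry_after", 60)))
            continue
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]["content"]
    raise RuntimeError("still rate limited after retries")
```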



19. Google Gemini CLI

  • ==60 RPM + 1k RPD; 120 RPM + 1.5k RPD==

For the paid Gemini Code Assist subscriptions (what they effectively treat as the “Gemini Pro” style plans), the CLI lists higher fixed quotas. geminicli

| Plan | Requests / user / day | Requests / user / minute | Models used |
| --- | --- | --- | --- |
| Code Assist Standard edition | 1500 | 120 | Gemini family (auto-chosen) |

Summary – Gemini “Pro”/paid limits and behavior

  • Upgrading from the free Google-login tier to Gemini Code Assist Standard raises your quota to 1500 model requests per user per day with 120 requests per user per minute, spread across the Gemini family as selected by the CLI. geminicli
  • The Enterprise edition increases only the daily cap (to 2000 requests per user per day) while keeping the same 120 requests per minute limit. geminicli
  • These paid tiers give fixed-price, predictable quotas; if you still need more or want to avoid hitting daily caps, you must switch to pay‑as‑you‑go via a Gemini API key or Vertex AI, where limits are governed by API pricing tiers and Vertex quotas rather than these fixed per-user request counts. geminicli

20. Iflow CLI

  • ==Current: 1 RPS — Till 429 Error Threshold==

The page documents configuration knobs, not hosted rate limits; the only “limits” you can set or hit are things like session turns and token budgets. platform.iflow

| Setting / limit | Default value | What it does |
| --- | --- | --- |
| Current RPS guidance | ~1 RPS until 429 | Recommended request rate |
| maxSessionTurns | -1 | Unlimited turns per chat session |
| tokensLimit | 128000 | Maximum context window length |
| compressionTokenThreshold | 0.8 | Auto-compresses context at this fraction of tokensLimit |

Summary for free usage

  • iFlow CLI itself does not impose a strict free-tier quota; instead, it recommends keeping around 1 request per second to avoid 429 errors from whichever backend you configure. platform.iflow

21. Kilo Code CLI

  • ==Current: Varies by provider: ~60 RPM + ~2,000 RPD on some free plans==

Kilo Code itself does not impose numeric rate limits; it relies on free quotas from external providers and on models that are priced at $0. kilo

| Category | Model / Provider | Cost in Kilo Code |
| --- | --- | --- |
| OpenRouter free models | Qwen3 Coder (free) | Free via OpenRouter |
| OpenRouter free models | Z.AI GLM‑4.5 Air (free) | Free via OpenRouter |
| OpenRouter free models | DeepSeek R1 0528 (free) | Free via OpenRouter |
| OpenRouter free models | MoonshotAI Kimi K2 (free) | Free via OpenRouter |

22. Mistral Vibe CLI

  • ==Current: 1 RPS + 500K tokens/min + 1B tokens/month==

Mistral’s docs describe how limits work but do not publish concrete free‑tier numbers on this page. docs.mistral

| Aspect | Free API tier behavior |
| --- | --- |
| Availability | Yes, a dedicated free API tier |
| Purpose | Trying and exploring the API, not production workloads |
| Limit types | Requests per second (RPS), tokens per minute/month |
| Scope | Limits applied at workspace level |
| Configuration / visibility | Exact limits shown only in AI Studio “limits” page per workspace |
| Upgrades | Higher tiers provide higher limits; contact support to increase |

23. Opencode CLI

  • ==Current: UNLIMITED: 1 RPS - Free 2 Models==

Zen does not have a classic “free tier” with rate limits; instead, some models are priced at $0 per 1M tokens, and the platform lets you cap spend with monthly limits. opencode

| Model | Input price / 1M tokens | Output price / 1M tokens | Cached read / 1M tokens | Cached write / 1M tokens | Notes |
| --- | --- | --- | --- | --- | --- |
| Big Pickle | Free | Free | Free | | Free during limited beta |
| GPT 5 Nano | Free | Free | Free | | Always-free listed model |

Summary – limits and other conditions for “free” usage

  • Zen is pay‑as‑you‑go; there is no fixed free allowance, but models like Big Pickle and GPT 5 Nano are currently priced at $0 for input, output, and cached reads, so they effectively function as free to use while available. opencode

24. Qwen CLI

  • ==Current: 60 RPM + 2000 RPD - No Token Limits==
| Free option | Requests per day | Requests per minute | Token limits |
| --- | --- | --- | --- |
| Qwen OAuth | 2,000 | 60 | No token counting |

Summary of rate limits and other limitations

  • With Qwen OAuth, individual users get 2,000 requests per day and 60 requests per minute, advertised as completely free with no explicit token limit tracking on your side. npmjs
  • Regional free tiers add more options: ModelScope offers 2,000 free API calls per day for Mainland China, and OpenRouter offers up to 1,000 free calls per day internationally, each subject to their own backend quotas and terms. npmjs
  • Qwen Code warns that a single workflow cycle may trigger multiple API calls, so heavy commands can consume your free request budget faster than expected even though there is no per-call token cap. npmjs

25. GitHub Copilot CLI

  • ==Current: 50 requests per month.==
| Plan | Monthly price | Premium requests (chat, agents, reviews, CLI) | Code completions | Models access |
| --- | --- | --- | --- | --- |
| Free | $0 | 50 per month | 2,000 per month | Haiku 4.5, GPT‑4.1, GPT‑5 mini, and more |

Summary – rate limits and other limitations (Free only)

  • The Free Copilot plan includes 50 premium requests per month, which are consumed by ==chat, agent mode, code review, coding agents, and Copilot CLI features==; once used up, those premium features stop working until the next month. github
  • You also get 2,000 code completions per month in supported editors; after hitting that cap, inline suggestions are disabled until your quota resets. github
  • Free users still have access to multiple modern models (such as Claude Haiku 4.5, GPT‑4.1, GPT‑5 mini) but see rate limiting and slower response times during high usage, and they cannot buy extra premium requests or access enterprise governance features. github

Models in CLI

  1. Gemini CLI
    1. Gemini 3 - gemini-3.1-pro-preview, gemini-3-flash-preview
    2. Gemini 2.5 - gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite
  2. OpenAI Codex
    1. gpt-5.4
    2. gpt-5.3-codex
    3. gpt-5.2-codex
    4. gpt-5.2
    5. gpt-5.1-codex-max
    6. gpt-5.1-codex-mini
  3. IFlow CLI
    1. GLM-4.7
    2. iFlow-ROME-30BA3B
    3. DeepSeek-V3.2
    4. GLM-5
    5. Qwen3-Coder-Plus
    6. Kimi-K2-Thinking
    7. MiniMax-M2.5
    8. Kimi-K2.5
    9. Kimi-K2-0905
  4. Kilocode CLI
    1. Kilo Auto Free
    2. CoreThink
    3. Giga Potato
    4. Giga Potato Thinking
    5. MiniMax M2.5
    6. Trinity Large Preview
    7. Stepfun Step 3.5 Flash
    8. Nemotron 3 Super
    9. Groq Code Fast 1 Optimized
  5. Mistral Vibe CLI
    1. Devstral 2
    2. Devstral Small
    3. Local
  6. OpenCode Zen CLI
    1. Big Pickle
    2. GPT 5 Nano
    3. MiniMax M2.5 Free
    4. Nemotron 3 Super Free
    5. MiMo V2 Flash Free
    6. Trinity Large Preview Free
  7. Qwen CLI
    1. Latest Qwen Coder model - Qwen 3.5 Plus
  8. GitHub Copilot CLI
    1. Claude Haiku 4.5
    2. GPT-5 mini
    3. GPT-4.1
  9. Cline CLI
    1. arcee-ai/trinity-large-preview:free
    2. openrouter/free

Updated on February 6th, 2026 16:55:44


Opencode CLI - AI Providers Connect

OpenCode connects to over 75 LLM providers via the AI SDK and Models.dev, including OpenAI, Anthropic, ==Google==, Together AI, ==Mistral AI==, ==Groq==, Meta, ==Cohere==, AWS Bedrock, Azure OpenAI, DeepSeek, ==Perplexity API==, ==Hugging Face==, Replicate, Fireworks AI, Lepton AI, Novita AI, AI21 Labs, Aleph Alpha, Bamboo AI, ==Cerebras==, ==Cloudflare Workers AI==, EleutherAI, Falcon, GLM (Zhipu), Google Vertex AI, xAI (Grok), IBM watsonx, Inflection, LightOn, Moonshot AI, ==NVIDIA Nemo==, ==Ollama==, ==OpenRouter==, PaddlePaddle, ==Qwen (Alibaba)==, Stability AI, Yi, and many additional OpenAI-compatible endpoints accessible through custom configurations or OpenCode Zen. opencode

Kilocode CLI - AI Providers Connect

Kilo Code supports over 30 AI providers, including cloud options like Anthropic, OpenAI, ==Google Gemini==, DeepSeek, ==Mistral==, and Zhipu AI; local and self-hosted setups such as ==Ollama==, LM Studio, and any OpenAI-compatible endpoints; AI gateways and routers including ==OpenRouter==, Glama, Requesty, and its own Kilo Gateway; plus enterprise choices like AWS Bedrock and Google Vertex AI. kilo

LLMLite

LiteLLM (often called llmlite in shorthand contexts) is a Python library and proxy server that unifies access to 100+ LLM providers through a single OpenAI-compatible format, proxying calls to endpoints like Bedrock, Azure OpenAI, Vertex AI, ==Google AI Studio==, Anthropic, AWS SageMaker, ==Cohere==, ==Hugging Face==, Replicate, ==Groq==, ==Mistral AI==, ==Cloudflare Workers AI==, DeepInfra, ==Perplexity AI==, DeepSeek, Anyscale, IBM watsonx.ai, Voyage AI, FriendliAI, Galadriel, ==OpenRouter==, ==Ollama==, vLLM, Together AI, Fireworks AI, Novita AI, ==Cerebras==, Baseten, AI21, Aleph Alpha, NLP Cloud, Petals, Dashscope (Qwen), Sarvam.ai, OVHCloud AI Endpoints, ==CometAPI==, CompactifAI, DataRobot, ElevenLabs, Fal AI, Featherless AI, GradientAI, Helicone, Hyperbolic, Infinity, Jina AI, Lambda AI, Llamafile, LM Studio, LlamaGate, Manus, Meta Llama, Milvus, MiniMax, Moonshot AI, Morph, Nebius AI Studio, Novita AI, Nscale, ==NVIDIA NIM==, Oracle OCI, Poe, Predibase, PublicAI, Recraft, RunwayML, SambaNova, SAP Generative AI Hub, Scaleway, Stability AI, Synthetic, Snowflake, Topaz, Triton Inference Server, ==Vercel AI Gateway==, Volcano Engine, Weights & Biases Inference, Xiaomi MiMo, Xinference, Zhipu AI (Z.AI), and many more via custom or OpenAI-compatible setups. docs.litellm
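
A minimal LiteLLM usage sketch: one completion() call in the OpenAI-compatible format, switching providers only by the model prefix. The Groq model here is just an example; the matching API key is read from the usual environment variable.

```python
from litellm import completion

# Assumes GROQ_API_KEY is set; swap the "provider/model" prefix to route the
# same call shape to any other supported provider.
resp = completion(
    model="groq/llama-3.1-8b-instant",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(resp.choices[0].message.content)
```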

