Key takeaways
- LiteLLM is an open-source library and proxy server that gives you a single OpenAI-compatible API endpoint for 100-plus LLM providers, including OpenAI, Anthropic, Google Vertex AI, AWS Bedrock, Azure, Ollama, Cohere, and Hugging Face
- There are two main modes: the Python SDK (a drop-in replacement for the OpenAI client in your code) and the Proxy Server (a self-hosted gateway that any OpenAI-compatible tool can point at without code changes)
- Install with pip install 'litellm[proxy]' or run the official Docker image; Docker Compose with PostgreSQL is the production-recommended path
- A single config.yaml file controls your model list, routing strategy, fallback chain, and spend limits
- Load balancing is built in: the router supports simple-shuffle (weighted random, the default), least-busy, usage-based, and latency-based routing across multiple API keys or providers
- Fallbacks are declarative: if the primary model fails, LiteLLM automatically retries the next provider in the list with no changes to your application code
- Virtual keys (requires PostgreSQL) let you issue separate keys per team or user with per-key budget caps and automatic spend tracking in USD
- Current stable release is v1.90.0 (June 2026); the project ships weekly minor releases
What is LiteLLM and why would I run it?
LiteLLM solves the problem of N different provider SDKs by giving you one consistent API surface regardless of which LLM you use. Without it, switching from OpenAI to Anthropic means rewriting your API calls, error handling, and streaming logic. With LiteLLM, you change one string (the model name) and everything else stays the same.
The project has two distinct use cases.
Python SDK. You add from litellm import completion to your application and call any provider the same way you would call the OpenAI client. This is the lightweight path: no server process, no network hop, just a library that translates between provider APIs at the function-call level.
Proxy Server (the gateway path). LiteLLM runs a local HTTP server on port 4000 that speaks the OpenAI API spec. Point any tool, any framework, or any user's OPENAI_BASE_URL at http://localhost:4000 and it routes requests to whichever backend you configured. This is the more powerful use case because it centralizes routing, logging, and spend tracking outside your application code. Cursor, Open WebUI, LangChain, LlamaIndex, and anything else that supports an OpenAI endpoint work without modification.
The proxy is also the right choice when you are managing LLM access for a team. You issue virtual keys, set per-key budgets, and see exactly what each user or service is spending.
How do I install LiteLLM?
For a quick local test, pip install is the fastest path. For production use with virtual keys and spend tracking, Docker Compose with PostgreSQL is the recommended approach.
Option 1: pip (fastest for local testing)
pip install 'litellm[proxy]'
Start the proxy with a model wired up:
litellm --model ollama/llama3 --port 4000
Or point it at a config file:
litellm --config config.yaml --port 4000
Option 2: uv (recommended for isolated installs)
uv tool install 'litellm[proxy]'
litellm --config config.yaml
Option 3: Docker Compose (production)
Docker Compose is the production-recommended path because it bundles the proxy with a PostgreSQL database, which is required for virtual keys and spend tracking.
# Pull the official docker-compose.yml
curl -O https://raw.githubusercontent.com/BerriAI/litellm/main/docker-compose.yml
# Create your environment file
echo 'LITELLM_MASTER_KEY="sk-1234"' > .env
echo 'LITELLM_SALT_KEY="sk-your-random-salt"' >> .env
# Start everything
docker compose up -d
The proxy will be available at http://localhost:4000. The admin UI is at http://localhost:4000/ui.
The LITELLM_SALT_KEY encrypts stored API credentials and cannot be changed after your first write to the database, so set it before you do anything else.
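Because the salt key cannot be rotated later, it is worth generating a properly random value rather than typing one by hand. The following is a minimal sketch using Python's standard secrets module; the helper name make_key and the 32-byte length are illustrative assumptions, not LiteLLM requirements, and the sk- prefix simply mirrors the example values above.

```python
import secrets


def make_key(prefix: str = "sk-", nbytes: int = 32) -> str:
    """Return a random, URL-safe key with the given prefix (hypothetical helper)."""
    return prefix + secrets.token_urlsafe(nbytes)


if __name__ == "__main__":
    # Print lines that can be pasted into the .env file.
    print(f'LITELLM_MASTER_KEY="{make_key()}"')
    print(f'LITELLM_SALT_KEY="{make_key()}"')
```

Store the generated salt key somewhere safe (a password manager or secrets store), since losing it would leave the stored credentials unreadable.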
How do I configure LiteLLM with a config file?
Create a config.yaml in your working directory. This file is the single source of truth for which models are available, how they map to providers, and what your routing and budget rules are.
model_list:
  # OpenAI
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY
  # Anthropic
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-5
      api_key: os.environ/ANTHROPIC_API_KEY
  # Azure OpenAI
  - model_name: azure-gpt-4o
    litellm_params:
      model: azure/my-gpt4o-deployment
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2025-01-01-preview"
  # Ollama (local)
  - model_name: llama3-local
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434
  # AWS Bedrock
  - model_name: bedrock-claude
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0
      aws_region_name: us-east-1
litellm_settings:
  num_retries: 3
  request_timeout: 30
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
The os.environ/ syntax tells LiteLLM to read those values from environment variables at startup, so you never have to put raw keys inside the YAML file. Set the matching variables in your shell or .env file before starting the proxy.
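To make the os.environ/ convention concrete, here is a minimal sketch of how such references could be resolved against the environment. It illustrates the idea only and is not LiteLLM's actual loader; the function name resolve_env_refs and the KeyError on a missing variable are assumptions made for this example. A check like this can also serve as a pre-flight test that every variable the config refers to is actually set.

```python
import os
from typing import Any

ENV_PREFIX = "os.environ/"


def resolve_env_refs(node: Any) -> Any:
    """Recursively replace 'os.environ/NAME' strings with the value of $NAME."""
    if isinstance(node, dict):
        return {key: resolve_env_refs(value) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_env_refs(item) for item in node]
    if isinstance(node, str) and node.startswith(ENV_PREFIX):
        name = node[len(ENV_PREFIX):]
        if name not in os.environ:
            # Fail loudly so a missing key is caught before the first request.
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]
    return node


if __name__ == "__main__":
    os.environ["OPENAI_API_KEY"] = "sk-example"  # placeholder value for the demo
    params = {"model": "openai/gpt-4o", "api_key": "os.environ/OPENAI_API_KEY"}
    print(resolve_env_refs(params))
```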
Start the proxy pointing at this file:
litellm --config config.yaml
How do I use LiteLLM as an OpenAI-compatible proxy?
LiteLLM's proxy exposes OpenAI-compatible endpoints, so any client library that supports a custom base URL should work without code changes. Set the base URL to http://localhost:4000 and use your LiteLLM master key (or a virtual key) as the API key.
Using the OpenAI Python SDK:
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",
    api_key="sk-1234",  # Your LiteLLM master key (or a virtual key)
)

response = client.chat.completions.create(
    model="claude-sonnet",  # Matches the model_name in your config.yaml
    messages=[{"role": "user", "content": "Explain load balancing in one sentence."}],
)
print(response.choices[0].message.content)
Using curl:
curl http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello from the proxy!"}]
  }'
The model name in the request body must match a model_name field in your config.yaml. The proxy translates that to the actual provider and model behind the scenes.
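The alias lookup described above can be pictured as a simple table scan. The sketch below uses a trimmed copy of the model_list as plain Python data; resolve_alias is a hypothetical helper written for illustration and does not reflect how the proxy is implemented internally.

```python
from typing import List

# Trimmed-down version of the model_list from config.yaml, as plain Python data.
MODEL_LIST: List[dict] = [
    {"model_name": "gpt-4o", "litellm_params": {"model": "openai/gpt-4o"}},
    {"model_name": "claude-sonnet", "litellm_params": {"model": "anthropic/claude-sonnet-4-5"}},
]


def resolve_alias(requested: str, model_list: List[dict]) -> List[str]:
    """Return the provider/model strings registered under a model_name alias."""
    targets = [
        entry["litellm_params"]["model"]
        for entry in model_list
        if entry["model_name"] == requested
    ]
    if not targets:
        raise LookupError(f"no model_name {requested!r} in model_list")
    return targets


if __name__ == "__main__":
    print(resolve_alias("claude-sonnet", MODEL_LIST))
```

In this sketch, a client that sends the provider string (anthropic/claude-sonnet-4-5) instead of the alias (claude-sonnet) gets a lookup error, which is the mismatch to check for first when a request is rejected as an unknown model.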
Using the LiteLLM Python SDK directly (no proxy):
import os

from litellm import completion

os.environ["ANTHROPIC_API_KEY"] = "your-key"

response = completion(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
How do I add load balancing and fallbacks?
Load balancing and fallbacks are configured in config.yaml and require no application code changes. LiteLLM handles them entirely in the routing layer.
Load balancing across multiple deployments:
List the same model_name more than once with different backends. LiteLLM distributes traffic across them using whatever routing_strategy you set.
model_list:
  # Two OpenAI keys for the same model
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY_1
      rpm: 500
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY_2
      rpm: 500
  # Fallback target: Anthropic
  - model_name: claude-fallback
    litellm_params:
      model: anthropic/claude-sonnet-4-5
      api_key: os.environ/ANTHROPIC_API_KEY
router_settings:
  routing_strategy: least-busy
  num_retries: 2
  timeout: 30
litellm_settings:
  fallbacks:
    - {"gpt-4o": ["claude-fallback"]}
  context_window_fallbacks:
    - {"gpt-4o": ["claude-fallback"]}
The fallbacks setting applies when the primary returns an error (rate limit, server error, auth failure). The context_window_fallbacks setting applies specifically when the request exceeds the primary model's context window.
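The retry-then-fallback behavior can be sketched in a few lines of Python. This is an illustration of the general pattern under simplifying assumptions (every exception is treated as retryable, and each candidate gets the same number of attempts); complete_with_fallbacks and the call argument are hypothetical names, not part of LiteLLM's API.

```python
from typing import Callable, Dict, List, Optional

# Same shape as the fallbacks entry in config.yaml.
FALLBACKS: Dict[str, List[str]] = {"gpt-4o": ["claude-fallback"]}


def complete_with_fallbacks(
    model: str,
    call: Callable[[str], str],
    fallbacks: Dict[str, List[str]],
    num_retries: int = 2,
) -> str:
    """Try `model` (1 + num_retries attempts), then each fallback in order."""
    last_error: Optional[Exception] = None
    for candidate in [model, *fallbacks.get(model, [])]:
        for _ in range(1 + num_retries):
            try:
                return call(candidate)
            except Exception as exc:  # a real router would inspect the error type
                last_error = exc
    raise RuntimeError(f"all deployments failed for {model}") from last_error


if __name__ == "__main__":
    def fake_call(name: str) -> str:
        # Stand-in for a provider request: the primary is always down.
        if name == "gpt-4o":
            raise TimeoutError("primary unavailable")
        return f"answer from {name}"

    print(complete_with_fallbacks("gpt-4o", fake_call, FALLBACKS))
```

The application only ever asks for gpt-4o; which deployment ends up answering is decided entirely inside the routing function, which is the property the declarative config gives you.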
The four routing strategies:
- simple-shuffle: weighted random pick based on rpm/tpm limits (the default)
- least-busy: routes to the deployment with the fewest in-flight requests
- usage-based-routing: tracks live TPM and RPM against configured limits
- latency-based-routing: prefers the deployment with the lowest recent latency
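To show how two of these strategies differ, here is a small sketch of simple-shuffle and least-busy selection. It is a conceptual model only: the Deployment class and both function names are assumptions made for this example, and the real router tracks considerably more state.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Deployment:
    name: str
    rpm: int            # configured requests-per-minute limit
    in_flight: int = 0  # requests currently being served


def simple_shuffle(deployments: List[Deployment], rng: random.Random) -> Deployment:
    """Weighted random pick: a higher rpm means proportionally more traffic."""
    weights = [d.rpm for d in deployments]
    return rng.choices(deployments, weights=weights, k=1)[0]


def least_busy(deployments: List[Deployment]) -> Deployment:
    """Pick the deployment with the fewest in-flight requests."""
    return min(deployments, key=lambda d: d.in_flight)


if __name__ == "__main__":
    pool = [
        Deployment("openai-key-1", rpm=500, in_flight=3),
        Deployment("openai-key-2", rpm=500, in_flight=1),
    ]
    print(least_busy(pool).name)
    print(simple_shuffle(pool, random.Random()).name)
```

With equal rpm values, simple-shuffle splits traffic roughly evenly over time, while least-busy reacts to the current load on each deployment.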
How do I manage virtual keys and track spending?
Virtual keys let you issue separate credentials per user, team, or service, each with its own budget cap. The proxy logs every request and accumulates cost in USD automatically.
Virtual keys require a PostgreSQL database. The Docker Compose setup includes one. If you are running the pip install path, add database_url to your general_settings:
general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: "postgresql://user:password@localhost:5432/litellm"
Generate a virtual key with a budget and model restriction:
curl http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "models": ["gpt-4o", "claude-sonnet"],
    "max_budget": 10.00,
    "duration": "30d",
    "metadata": {"team": "backend-eng"}
  }'
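Once the key has accumulated $10 in spend, the proxy rejects further requests made with it. Note that duration controls how long the key itself stays valid (here it expires after 30 days); resetting the budget on a schedule is a separate setting, budget_duration. The admin UI at http://localhost:4000/ui gives you a dashboard view of spend per key, per user, and per team without needing to hit the API directly.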
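The budget check itself amounts to simple bookkeeping. The sketch below models a key with a USD cap and an expiry time; it assumes duration is the key's lifetime, and the VirtualKey class, its method names, and the use of PermissionError are all hypothetical choices for illustration rather than LiteLLM's internals.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class VirtualKey:
    max_budget: float     # USD cap, like "max_budget" in the request above
    expires_at: datetime  # derived from "duration" (assumed to be the key lifetime)
    spend: float = 0.0    # accumulated cost in USD

    def check(self, now: datetime) -> None:
        """Raise if the key is expired or has reached its budget."""
        if now >= self.expires_at:
            raise PermissionError("key expired")
        if self.spend >= self.max_budget:
            raise PermissionError("budget exceeded")

    def record(self, cost_usd: float) -> None:
        """Add the cost of one completed request to the running total."""
        self.spend += cost_usd


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    key = VirtualKey(max_budget=10.00, expires_at=now + timedelta(days=30))
    key.check(now)     # passes: no spend yet
    key.record(10.00)  # one expensive request uses the whole budget
    try:
        key.check(now)
    except PermissionError as err:
        print(f"rejected: {err}")
```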
Troubleshooting
- Provider authentication fails immediately after startup
  What you see: AuthenticationError or HTTP 401 from the provider on the first request.
  What it is: The provider API key environment variable is not set, is set to the wrong name, or was not exported before starting the proxy.
  The fix: Check echo $OPENAI_API_KEY (or the relevant variable) in the same shell session where you run litellm. If the variable is empty, set it and restart. Confirm the variable name matches what is in your config.yaml after the os.environ/ prefix.
- Clients get 401 when hitting the proxy itself
  What you see: Requests from your application get Invalid proxy server token.
  What it is: The client is sending the raw provider key (for example, sk-ant-...) as the Bearer token, but the proxy expects its own master key.
  The fix: Set Authorization: Bearer <your-LITELLM_MASTER_KEY> in all client requests, not the upstream provider key. The proxy holds the provider keys internally and injects them when forwarding requests.
- Fallbacks never trigger even when the primary is failing
  What you see: The proxy returns the primary provider's error directly without trying the fallback.
  What it is: The model_name in your fallbacks config does not exactly match the model_name value the client sent in the request.
  The fix: Open the proxy logs (--detailed_debug flag) and confirm the model name in the incoming request matches the key in your fallbacks mapping. Both the primary key and the fallback value must match model_name entries in your model_list.
- Gemini requests fail with credential errors
  What you see: LiteLLM attempts to authenticate with GCP service account credentials when you are using Google AI Studio (not Vertex AI).
  What it is: Without the gemini/ prefix, LiteLLM routes the request to Vertex AI and looks for GCP credentials it does not have.
  The fix: Use model: gemini/gemini-2.0-flash (AI Studio, uses GEMINI_API_KEY) rather than model: vertex_ai/gemini-2.0-flash (Vertex, uses GCP service account). These are two different providers in the LiteLLM routing table.
- Virtual key requests fail with "failed-to-connect-to-db"
  What you see: Any request using a virtual key returns an auth error mentioning the database.
  What it is: The proxy cannot reach PostgreSQL, so it cannot verify the key. Virtual key validation requires a live database connection; if the connection is down, all virtual key requests fail.
  The fix: Confirm your DATABASE_URL is correct and the database is reachable from the proxy container. Run docker compose logs db to check if PostgreSQL started cleanly. Verify the connection string format: postgresql://user:password@host:port/dbname.
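For the fallback-name mismatch described in the troubleshooting list, a small pre-flight check can catch the problem before the proxy starts. This is a sketch under the assumption that the config has already been loaded into Python dictionaries and lists; check_fallback_names is a hypothetical helper, not a LiteLLM function.

```python
from typing import Dict, List


def check_fallback_names(
    model_list: List[dict], fallbacks: List[Dict[str, List[str]]]
) -> List[str]:
    """Return a message for every fallback name missing from model_list."""
    known = {entry["model_name"] for entry in model_list}
    problems = []
    for mapping in fallbacks:
        for primary, targets in mapping.items():
            for name in [primary, *targets]:
                if name not in known:
                    problems.append(f"{name!r} is not a model_name in model_list")
    return problems


if __name__ == "__main__":
    model_list = [{"model_name": "gpt-4o"}, {"model_name": "claude-fallback"}]
    # "gpt-4" is a typo for "gpt-4o", so this fallback would never trigger.
    fallbacks = [{"gpt-4": ["claude-fallback"]}]
    for problem in check_fallback_names(model_list, fallbacks):
        print(problem)
```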
Frequently asked questions
Is LiteLLM free and open source?
Yes. The core library and proxy server are MIT-licensed and available at github.com/BerriAI/litellm. There is a commercial Enterprise tier that adds RBAC, SSO, and some advanced guardrail features, but everything covered in this guide is in the free open-source version.
Does LiteLLM support Ollama and local models?
Yes. Set the provider prefix to ollama/ and the api_base to your Ollama server address (default http://localhost:11434). The model name after the slash must match an Ollama model you have already pulled. You can run Ollama behind the LiteLLM proxy and give multiple clients access to your local models through a single endpoint with the same virtual key controls you use for cloud providers.
What is the difference between the LiteLLM proxy and the LiteLLM Python SDK?
The Python SDK is a library you import into your application. It translates provider APIs at the function-call level and is useful when you control the codebase and just want a unified interface. The proxy is a standalone HTTP server. It is useful when you want to centralize access across many tools and teams, or when you cannot modify the client applications (for example, connecting Cursor or Open WebUI to a non-OpenAI model). You can run both: use the SDK in your own code and the proxy for external tools.
Does LiteLLM work with LangChain, LlamaIndex, and similar frameworks?
Yes. Any framework that accepts a custom openai_api_base or base_url parameter works without modification. Set the base URL to your proxy address and the API key to your master key or a virtual key.
What happens to in-flight requests if the proxy restarts?
In-flight requests are dropped. The proxy is stateless at the request level (state lives in PostgreSQL). For production, run at least two proxy replicas behind a load balancer so a rolling restart does not cause downtime. The official Docker Compose file is single-replica by default; add replicas via docker compose scale litellm=2 or use Kubernetes with multiple pods. Keep in mind that two replicas cannot both bind the same fixed host port, so a service that publishes port 4000 directly needs its port mapping adjusted before scaling.
Resources
- BerriAI/litellm on GitHub: full source code, releases, and issue tracker
- Official LiteLLM documentation: proxy config reference, virtual keys, routing strategies, and provider setup
- LiteLLM Docker deployment guide: production deployment with Docker Compose and PostgreSQL
- Ollama setup guide: use local models through LiteLLM
Last updated: June 27, 2026.