LiteLLM Proxy: One OpenAI-Compatible API for 100-Plus LLM Providers

Key takeaways

LiteLLM is an open-source library and proxy server that gives you a single OpenAI-compatible API endpoint for 100-plus LLM providers, including OpenAI, Anthropic, Google Vertex AI, AWS Bedrock, Azure, Ollama, Cohere, and Hugging Face
There are two main modes: the Python SDK (a drop-in replacement for the OpenAI client in your code) and the Proxy Server (a self-hosted gateway that any OpenAI-compatible tool can point at without code changes)
Install with pip install 'litellm[proxy]' or run the official Docker image; Docker Compose with PostgreSQL is the production-recommended path
A single config.yaml file controls your model list, routing strategy, fallback chain, and spend limits
Load balancing is built in: you can use simple-shuffle (weighted random, the default), least-busy, usage-based, or latency-based routing across multiple API keys or providers
Fallbacks are declarative: if the primary model fails, LiteLLM automatically retries the next provider in the list with no changes to your application code
Virtual keys (requires PostgreSQL) let you issue separate keys per team or user with per-key budget caps and automatic spend tracking in USD
Current stable release is v1.90.0 (June 2026); the project ships weekly minor releases

What is LiteLLM and why would I run it?

LiteLLM solves the problem of N different provider SDKs by giving you one consistent API surface regardless of which LLM you use. Without it, switching from OpenAI to Anthropic means rewriting your API calls, error handling, and streaming logic. With LiteLLM, you change one string (the model name) and everything else stays the same.

The project has two distinct use cases.

Python SDK. You add from litellm import completion to your application and call any provider the same way you would call the OpenAI client. This is the lightweight path: no server process, no network hop, just a library that translates between provider APIs at the function-call level.

Proxy Server (the gateway path). LiteLLM runs a local HTTP server on port 4000 that speaks the OpenAI API spec. Point any tool, any framework, or any user's OPENAI_BASE_URL at http://localhost:4000 and it routes requests to whichever backend you configured. This is the more powerful use case because it centralizes routing, logging, and spend tracking outside your application code. Cursor, Open WebUI, LangChain, LlamaIndex, and anything else that supports an OpenAI endpoint work without modification.

The proxy is also the right choice when you are managing LLM access for a team. You issue virtual keys, set per-key budgets, and see exactly what each user or service is spending.

How do I install LiteLLM?

For a quick local test, pip install is the fastest path. For production use with virtual keys and spend tracking, Docker Compose with PostgreSQL is the recommended approach.

Option 1: pip (fastest for local testing)

pip install 'litellm[proxy]'

Start the proxy with a model wired up:

litellm --model ollama/llama3 --port 4000

Or point it at a config file:

litellm --config config.yaml --port 4000

Option 2: uv (recommended for isolated installs)

uv tool install 'litellm[proxy]'
litellm --config config.yaml

Option 3: Docker Compose (production)

Docker Compose is the production-recommended path because it bundles the proxy with a PostgreSQL database, which is required for virtual keys and spend tracking.

# Pull the official docker-compose.yml
curl -O https://raw.githubusercontent.com/BerriAI/litellm/main/docker-compose.yml

# Create your environment file
echo 'LITELLM_MASTER_KEY="sk-1234"' > .env
echo 'LITELLM_SALT_KEY="sk-your-random-salt"' >> .env

# Start everything
docker compose up -d

The proxy will be available at http://localhost:4000. The admin UI is at http://localhost:4000/ui.

The LITELLM_SALT_KEY encrypts stored API credentials and cannot be changed after your first write to the database, so set it before you do anything else.
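
The sk-1234 and sk-your-random-salt values above are placeholders. As a minimal sketch of generating real random values, the script below uses Python's secrets module and keeps the sk- prefix from the example; the function names make_key and render_env are illustrative, not part of LiteLLM.

```python
import secrets


def make_key(prefix: str = "sk-", nbytes: int = 32) -> str:
    """Return the prefix followed by URL-safe random text."""
    return prefix + secrets.token_urlsafe(nbytes)


def render_env(master_key: str, salt_key: str) -> str:
    """Render both variables in the same format as the echo commands above."""
    return (
        f'LITELLM_MASTER_KEY="{master_key}"\n'
        f'LITELLM_SALT_KEY="{salt_key}"\n'
    )


if __name__ == "__main__":
    # Print to stdout; redirect into .env yourself so nothing is overwritten by accident.
    print(render_env(make_key(), make_key()), end="")
```

Because the salt key cannot be rotated later, store the generated value somewhere safe before the first database write.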

How do I configure LiteLLM with a config file?

Create a config.yaml in your working directory. This file is the single source of truth for which models are available, how they map to providers, and what your routing and budget rules are.

model_list:
  # OpenAI
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY

  # Anthropic
  - model_name: claude-sonnet
    litellm_params:
      model: anthropic/claude-sonnet-4-5
      api_key: os.environ/ANTHROPIC_API_KEY

  # Azure OpenAI
  - model_name: azure-gpt-4o
    litellm_params:
      model: azure/my-gpt4o-deployment
      api_base: os.environ/AZURE_API_BASE
      api_key: os.environ/AZURE_API_KEY
      api_version: "2025-01-01-preview"

  # Ollama (local)
  - model_name: llama3-local
    litellm_params:
      model: ollama/llama3
      api_base: http://localhost:11434

  # AWS Bedrock
  - model_name: bedrock-claude
    litellm_params:
      model: bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0
      aws_region_name: us-east-1

litellm_settings:
  num_retries: 3
  request_timeout: 30

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY

The os.environ/ syntax tells LiteLLM to read those values from environment variables at startup, so you never have to put raw keys inside the YAML file. Set the matching variables in your shell or .env file before starting the proxy.

Start the proxy pointing at this file:

litellm --config config.yaml
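
Before starting the proxy, it can help to confirm that every os.environ/ reference in the file has a matching variable set. The sketch below scans the config text with a regular expression rather than a YAML parser, so it is only a rough pre-flight check; the helper names are hypothetical.

```python
import os
import re

# Matches values written as os.environ/VARIABLE_NAME anywhere in the config text.
ENV_REF = re.compile(r"os\.environ/([A-Za-z_][A-Za-z0-9_]*)")


def referenced_env_vars(config_text: str) -> list[str]:
    """Return the distinct variable names referenced in the config, in order."""
    seen: list[str] = []
    for name in ENV_REF.findall(config_text):
        if name not in seen:
            seen.append(name)
    return seen


def missing_env_vars(config_text: str, environ=None) -> list[str]:
    """Return referenced variables that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in referenced_env_vars(config_text) if not env.get(name)]


if __name__ == "__main__":
    sample = "api_key: os.environ/OPENAI_API_KEY\nmaster_key: os.environ/LITELLM_MASTER_KEY\n"
    # With an explicit (fake) environment, only the unset variable is reported.
    print(missing_env_vars(sample, {"OPENAI_API_KEY": "sk-test"}))
```

To check a real file, read it with open("config.yaml").read() and pass the text to missing_env_vars.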

How do I use LiteLLM as an OpenAI-compatible proxy?

LiteLLM's proxy implements the OpenAI API's request and response formats, so any client library that supports a custom base URL works without code changes beyond configuration. Set the base URL to http://localhost:4000 and use your LiteLLM master key (or a virtual key) as the API key.

Using the OpenAI Python SDK:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000",
    api_key="sk-1234",  # Your LiteLLM master key (or a virtual key)
)

response = client.chat.completions.create(
    model="claude-sonnet",  # Matches the model_name in your config.yaml
    messages=[{"role": "user", "content": "Explain load balancing in one sentence."}],
)

print(response.choices[0].message.content)

Using curl:

curl http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-1234" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello from the proxy!"}]
  }'

The model name in the request body must match a model_name field in your config.yaml. The proxy translates that to the actual provider and model behind the scenes.
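
To see which names the proxy will accept, you can query its OpenAI-style model listing. This is a standard-library sketch that assumes the proxy exposes GET /v1/models and returns the usual {"data": [{"id": ...}]} shape; the helper names are illustrative.

```python
import json
import urllib.request


def build_models_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for the proxy's model listing."""
    url = base_url.rstrip("/") + "/v1/models"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {api_key}"})


def model_ids(response_body: str) -> list[str]:
    """Extract the model names from an OpenAI-style list response."""
    payload = json.loads(response_body)
    return [entry["id"] for entry in payload.get("data", [])]


def list_models(base_url: str, api_key: str) -> list[str]:
    """Call the proxy and return the model names it advertises."""
    request = build_models_request(base_url, api_key)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return model_ids(resp.read().decode("utf-8"))
```

With the proxy running, list_models("http://localhost:4000", "sk-1234") should return the model_name values from your config.yaml.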

Using the LiteLLM Python SDK directly (no proxy):

from litellm import completion
import os

os.environ["ANTHROPIC_API_KEY"] = "your-key"

response = completion(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
)

print(response.choices[0].message.content)

How do I add load balancing and fallbacks?

Load balancing and fallbacks are configured in config.yaml and require no application code changes. LiteLLM handles them entirely in the routing layer.

Load balancing across multiple deployments:

List the same model_name more than once with different backends. LiteLLM distributes traffic across them using whatever routing_strategy you set.

model_list:
  # Two OpenAI keys for the same model
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY_1
      rpm: 500

  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
      api_key: os.environ/OPENAI_API_KEY_2
      rpm: 500

  # Fallback target: Anthropic
  - model_name: claude-fallback
    litellm_params:
      model: anthropic/claude-sonnet-4-5
      api_key: os.environ/ANTHROPIC_API_KEY

router_settings:
  routing_strategy: least-busy
  num_retries: 2
  timeout: 30

litellm_settings:
  fallbacks:
    - {"gpt-4o": ["claude-fallback"]}
  context_window_fallbacks:
    - {"gpt-4o": ["claude-fallback"]}

The fallbacks mapping applies when the primary still returns an error (rate limit, server error, auth failure) after its retries are exhausted. The context_window_fallbacks mapping applies specifically when the request exceeds the primary model's context window.
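
The control flow behind a fallback chain can be sketched in a few lines of Python. This is an illustration of the idea only, not LiteLLM's router code: call_with_fallbacks and demo_backend are hypothetical names, and a real router would inspect the error type rather than catching every exception.

```python
from typing import Callable, Mapping, Optional, Sequence


def call_with_fallbacks(
    model: str,
    call: Callable[[str], str],
    fallbacks: Mapping[str, Sequence[str]],
    num_retries: int = 2,
) -> tuple[str, str]:
    """Try `model` up to 1 + num_retries times, then each fallback in order.

    Returns (model_that_answered, response); re-raises the last error if
    every candidate fails.
    """
    last_error: Optional[Exception] = None
    for candidate in [model, *fallbacks.get(model, [])]:
        for _attempt in range(1 + num_retries):
            try:
                return candidate, call(candidate)
            except Exception as exc:  # simplified: treat every error as retryable
                last_error = exc
    assert last_error is not None
    raise last_error


def demo_backend(model: str) -> str:
    """Stand-in for a provider call: the primary always fails."""
    if model == "gpt-4o":
        raise RuntimeError("simulated rate limit from primary")
    return f"answer from {model}"


if __name__ == "__main__":
    print(call_with_fallbacks("gpt-4o", demo_backend, {"gpt-4o": ["claude-fallback"]}))
```

The application only ever asks for gpt-4o; which backend actually answered is a routing-layer detail.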

The four routing strategies:

- simple-shuffle: weighted random pick based on rpm/tpm limits (the default)
- least-busy: routes to the deployment with the fewest in-flight requests
- usage-based-routing: tracks live TPM and RPM against configured limits
- latency-based-routing: prefers the deployment with the lowest recent latency
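
To make the difference between two of these strategies concrete, here is a toy model of a weighted random pick and a least-busy pick. It is a simplified illustration under assumed data structures (the Deployment class and both functions are hypothetical), not how LiteLLM implements its router.

```python
import random
from dataclasses import dataclass


@dataclass
class Deployment:
    name: str
    rpm: int            # configured requests-per-minute limit
    in_flight: int = 0  # requests currently being served


def pick_simple_shuffle(deployments, rng=random):
    """Weighted random pick: a higher rpm limit earns a proportionally larger share."""
    weights = [d.rpm for d in deployments]
    return rng.choices(deployments, weights=weights, k=1)[0]


def pick_least_busy(deployments):
    """Pick the deployment with the fewest in-flight requests (first wins ties)."""
    return min(deployments, key=lambda d: d.in_flight)


if __name__ == "__main__":
    pool = [Deployment("key-1", rpm=500, in_flight=3), Deployment("key-2", rpm=500, in_flight=1)]
    print(pick_least_busy(pool).name)
    print(pick_simple_shuffle(pool).name)
```

With equal rpm values, as in the config above, the weighted pick splits traffic roughly evenly, while least-busy reacts to how many requests each deployment is handling at that moment.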

How do I manage virtual keys and track spending?

Virtual keys let you issue separate credentials per user, team, or service, each with its own budget cap. The proxy logs every request and accumulates cost in USD automatically.

Virtual keys require a PostgreSQL database. The Docker Compose setup includes one. If you are running the pip install path, add database_url to your general_settings:

general_settings:
  master_key: os.environ/LITELLM_MASTER_KEY
  database_url: "postgresql://user:password@localhost:5432/litellm"

Generate a virtual key with a budget and model restriction:

curl http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "models": ["gpt-4o", "claude-sonnet"],
    "max_budget": 10.00,
    "duration": "30d",
    "metadata": {"team": "backend-eng"}
  }'

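Once the key has accumulated $10 in spend, the proxy rejects further requests with a 429. Note that duration in the request above sets the key's lifetime (the key expires after 30 days), not a budget window; to have the spend counter reset on a schedule, also pass budget_duration (for example "30d"). The admin UI at http://localhost:4000/ui gives you a dashboard view of spend per key, per user, and per team without needing to hit the API directly.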
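
The same key request can be built from Python. This sketch uses only the standard library and separates two fields that are easy to confuse: duration (how long the key stays valid) and budget_duration (how often the key's spend resets). The function name and argument layout are illustrative.

```python
import json
import urllib.request


def build_key_request(base_url, master_key, models, max_budget,
                      budget_duration=None, duration=None, metadata=None):
    """Build a POST request for the proxy's /key/generate endpoint."""
    payload = {"models": list(models), "max_budget": max_budget}
    if budget_duration is not None:
        payload["budget_duration"] = budget_duration  # how often spend resets
    if duration is not None:
        payload["duration"] = duration  # how long the key stays valid
    if metadata:
        payload["metadata"] = metadata
    return urllib.request.Request(
        base_url.rstrip("/") + "/key/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_key_request("http://localhost:4000", "sk-1234",
                            ["gpt-4o", "claude-sonnet"], 10.00,
                            budget_duration="30d", metadata={"team": "backend-eng"})
    print(req.full_url, req.data.decode("utf-8"))
```

Send the request with urllib.request.urlopen(req) once the proxy and its database are running; the response body contains the generated key.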

Troubleshooting

Provider authentication fails immediately after startup

What you see: AuthenticationError or HTTP 401 from the provider on the first request.
What it is: The provider API key environment variable is not set, is set to the wrong name, or was not exported before starting the proxy.
The fix: Check echo $OPENAI_API_KEY (or the relevant variable) in the same shell session where you run litellm. If the variable is empty, set it and restart. Confirm the variable name matches what is in your config.yaml after the os.environ/ prefix.

Clients get 401 when hitting the proxy itself

What you see: Requests from your application get Invalid proxy server token.
What it is: The client is sending the raw provider key (for example, sk-ant-...) as the Bearer token, but the proxy expects its own master key.
The fix: Set Authorization: Bearer <your-LITELLM_MASTER_KEY> in all client requests, not the upstream provider key. The proxy holds the provider keys internally and injects them when forwarding requests.

Fallbacks never trigger even when the primary is failing

What you see: The proxy returns the primary provider's error directly without trying the fallback.
What it is: The model_name in your fallbacks config does not exactly match the model_name value the client sent in the request.
The fix: Open the proxy logs (--detailed_debug flag) and confirm the model name in the incoming request matches the key in your fallbacks mapping. Both the primary key and the fallback value must match model_name entries in your model_list.

Gemini requests fail with credential errors

What you see: LiteLLM attempts to authenticate with GCP service account credentials when you are using Google AI Studio (not Vertex AI).
What it is: Without the gemini/ prefix, LiteLLM routes the request to Vertex AI and looks for GCP credentials it does not have.
The fix: Use model: gemini/gemini-2.0-flash (AI Studio, uses GEMINI_API_KEY) rather than model: vertex_ai/gemini-2.0-flash (Vertex, uses GCP service account). These are two different providers in the LiteLLM routing table.

Virtual key requests fail with "failed-to-connect-to-db"

What you see: Any request using a virtual key returns an auth error mentioning the database.
What it is: The proxy cannot reach PostgreSQL, so it cannot verify the key. Virtual key validation requires a live database connection; if the connection is down, all virtual key requests fail.
The fix: Confirm your DATABASE_URL is correct and the database is reachable from the proxy container. Run docker compose logs db to check if PostgreSQL started cleanly. Verify the connection string format: postgresql://user:password@host:port/dbname.
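
A malformed connection string is quick to rule out before looking at the network. The sketch below checks only the format of a postgresql://user:password@host:port/dbname URL using urllib.parse; it does not connect to anything, and check_database_url is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def check_database_url(url: str) -> list[str]:
    """Return a list of format problems found in a PostgreSQL connection string."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("postgresql", "postgres"):
        problems.append(f"unexpected scheme {parts.scheme!r}")
    if not parts.username:
        problems.append("missing user")
    if not parts.hostname:
        problems.append("missing host")
    if not parts.path.strip("/"):
        problems.append("missing database name")
    try:
        parts.port  # accessing .port raises ValueError when it is not a valid number
    except ValueError:
        problems.append("invalid port")
    return problems


if __name__ == "__main__":
    print(check_database_url("postgresql://user:password@localhost:5432/litellm"))
```

An empty list means the string is well-formed; reachability still has to be confirmed from inside the proxy container. Passwords containing characters such as @ or / need to be percent-encoded, or the parts will be split in the wrong place.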

LiteLLM inside SUSE AI

LiteLLM is not only a self-hosted tool; it is also a first-class component of SUSE AI 1.0, SUSE's Kubernetes-based enterprise AI platform. The SUSE AI "AI Library" ships LiteLLM as an installable application alongside Ollama, Open WebUI, vLLM, Milvus, OpenSearch, Qdrant, PyTorch, MLflow, and Kubeflow. Its job there is exactly the role this guide describes: the OpenAI-compatible gateway that fronts whatever models the cluster serves, so platform users and applications talk to one governed endpoint instead of to individual model servers.

The SUSE packaging differs from the Docker Compose setup above in delivery, not in substance:

Helm chart from SUSE's OCI registry: AI Library applications install from oci://dp.apps.rancher.io/charts/ (the SUSE Application Collection), authenticated with your SUSE registration. Configuration happens through a Helm override YAML rather than a mounted config.yaml, and SUSE's docs recommend helm show values against the chart to see the supported options for the packaged version.
Bundled dependencies: the chart pulls in its PostgreSQL and Redis dependencies as subcharts from the same registry, so virtual keys and spend tracking work out of the box without wiring an external database.
Same engine underneath: it is upstream LiteLLM. The model_list semantics, master key behaviour, virtual keys, provider prefixes, and fallback routing covered on this page all apply unchanged.

Common issues in the SUSE AI deployment

Because it is the same proxy, most failures are the general ones in the troubleshooting list above, wearing Kubernetes clothes. The ones specific to this packaging:

ImagePullBackOff on install. The SUSE registries are authenticated; every AI Library install requires registry secrets in the namespace and a matching global.imagePullSecrets entry in the override file. A missing or misnamed secret is the most common first-install failure, and it shows up as pods stuck pulling images, not as a LiteLLM error.
Ollama unreachable: localhost means the wrong thing. The api_base for an ollama/ model must point at the in-cluster service DNS name (for example http://ollama.<namespace>.svc.cluster.local:11434), not http://localhost:11434. Inside the LiteLLM pod, localhost is the LiteLLM pod. This is the Kubernetes version of the most common LiteLLM-plus-Ollama mistake.
Virtual keys stop working after a pod reschedule. If the bundled PostgreSQL runs without persistent storage configured, its data lives in the pod. Configure persistence (storage class and size) in the override before creating virtual keys you care about, and back it up like any other database, because the virtual keys and spend history live there.
Upstream docs describe features the packaged version doesn't have yet. The chart pins a specific LiteLLM version, which can trail the upstream releases that docs.litellm.ai describes. When an option from the upstream docs is rejected, check the chart's app version and its helm show values output before assuming a config error.
Master key hygiene. In a cluster, the master key belongs in a Kubernetes Secret referenced from the override, not as a plain value in an override file that gets committed to Git. Same rule as the Docker deployment, higher blast radius.
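
The Ollama addressing issue above is easy to catch mechanically. As a rough sketch, the helpers below flag api_base values that point at localhost in a block of YAML text and build the in-cluster service URL in the form shown above. The names are hypothetical, the namespace suse-ai in the example is an assumption, and the regular expression handles only simple one-line api_base entries.

```python
import re
from urllib.parse import urlsplit

# Captures the value of simple one-line "api_base: <value>" entries, quoted or not.
API_BASE = re.compile(r"^\s*api_base:\s*[\"']?(\S+?)[\"']?\s*$", re.MULTILINE)
LOCAL_HOSTS = ("localhost", "127.0.0.1")


def local_api_bases(config_text: str) -> list[str]:
    """Return api_base values whose host is localhost or 127.0.0.1."""
    return [
        value
        for value in API_BASE.findall(config_text)
        if urlsplit(value).hostname in LOCAL_HOSTS
    ]


def in_cluster_url(service: str, namespace: str, port: int = 11434) -> str:
    """Build the Kubernetes service DNS URL for a service in a namespace."""
    return f"http://{service}.{namespace}.svc.cluster.local:{port}"


if __name__ == "__main__":
    sample = "litellm_params:\n  model: ollama/llama3\n  api_base: http://localhost:11434\n"
    print(local_api_bases(sample))
    print(in_cluster_url("ollama", "suse-ai"))  # "suse-ai" is an example namespace
```

Any value the first function reports should be replaced with the service address for the namespace where Ollama actually runs.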

Frequently asked questions

Is LiteLLM free and open source? Yes. The core library and proxy server are MIT-licensed and available at github.com/BerriAI/litellm. There is a commercial Enterprise tier that adds RBAC, SSO, and some advanced guardrail features, but everything covered in this guide is in the free open-source version.

Does LiteLLM support Ollama and local models? Yes. Set the provider prefix to ollama/ and the api_base to your Ollama server address (default http://localhost:11434). The model name after the slash must match an Ollama model you have already pulled. You can run Ollama behind the LiteLLM proxy and give multiple clients access to your local models through a single endpoint with the same virtual key controls you use for cloud providers.

What is the difference between the LiteLLM proxy and the LiteLLM Python SDK? The Python SDK is a library you import into your application. It translates provider APIs at the function-call level and is useful when you control the codebase and just want a unified interface. The proxy is a standalone HTTP server. It is useful when you want to centralize access across many tools and teams, or when you cannot modify the client applications (for example, connecting Cursor or Open WebUI to a non-OpenAI model). You can run both: use the SDK in your own code and the proxy for external tools.

Does LiteLLM work with LangChain, LlamaIndex, and similar frameworks? Yes. Any framework that accepts a custom openai_api_base or base_url parameter works without modification. Set the base URL to your proxy address and the API key to your master key or a virtual key.

Is LiteLLM part of SUSE AI? Yes. SUSE AI 1.0 ships LiteLLM as an installable application in its AI Library, alongside Ollama, Open WebUI, vLLM, Milvus, MLflow, and others, delivered as a Helm chart from SUSE's OCI registry and configured through an override file. It plays the same role there that this guide describes: the single OpenAI-compatible endpoint in front of the cluster's models. Everything on this page applies to the SUSE deployment because it is the same upstream LiteLLM; the SUSE-specific parts are registry authentication, imagePullSecrets, in-cluster service addressing, and the bundled PostgreSQL and Redis subcharts. See the section above for the packaging details and its common failure points.

What happens to in-flight requests if the proxy restarts? In-flight requests are dropped. The proxy is stateless at the request level (state lives in PostgreSQL). For production, run at least two proxy replicas behind a load balancer so a rolling restart does not cause downtime. The official Docker Compose file is single-replica by default; add replicas via docker compose scale litellm=2 or use Kubernetes with multiple pods.

Resources

BerriAI/litellm on GitHub: full source code, releases, and issue tracker
Official LiteLLM documentation: proxy config reference, virtual keys, routing strategies, and provider setup
LiteLLM Docker deployment guide: production deployment with Docker Compose and PostgreSQL
Ollama setup guide: use local models through LiteLLM

Last updated: June 27, 2026.