AnythingLLM: Self-Hosted RAG and Chat for Local and Cloud LLMs

Key takeaways

AnythingLLM (version 1.15.0 as of this writing) is a free, MIT-licensed, self-hosted platform for chatting with documents and running AI agents using any LLM provider
Docker is the recommended install method for servers and teams; the desktop app is a one-click option for single users on a personal machine
Multi-user mode requires Docker. The desktop app is single-user only. Docker adds three access roles: Admin, Manager, and Default
Workspaces are isolated environments, each with its own document library, system prompt, and optional LLM override. Documents pinned to one workspace are invisible to all others
AnythingLLM supports 30+ LLM providers: Ollama, LM Studio, Local AI, OpenAI, Anthropic, Gemini, Groq, Mistral, AWS Bedrock, and more
An embedding model must be configured before document chat works. If you skip this, uploads appear to succeed but the LLM has no documents to search
When running in Docker on Linux, you cannot reach Ollama at localhost. Use 172.17.0.1 (or host.docker.internal on Mac/Windows) as the Ollama base URL instead
Minimum requirements for Docker: 2 GB RAM and 10 GB of free disk space, with more needed for large document libraries or local model inference

How do I install AnythingLLM?

AnythingLLM offers three install paths: a desktop app, a Docker container, and a managed cloud service.

Desktop app (single user, no Docker needed)

Download the macOS, Windows, or Linux installer from useanything.com. The app bundles a built-in LLM engine, a CPU-based embedder, and a LanceDB vector store. Nothing else is required. The trade-off is that the desktop app supports only one user account and does not include embeddable chat widgets, user management, or white-labeling.

Docker (recommended for servers and teams)

Docker is the right choice for always-on server deployments, multi-user setups, or when you want to expose a chat interface to a team. You need Docker installed and at least 2 GB of free RAM.

On Linux or macOS, create a storage folder and an empty .env file, then start the container:

export STORAGE_LOCATION="$HOME/anythingllm" && \
mkdir -p "$STORAGE_LOCATION" && \
touch "$STORAGE_LOCATION/.env" && \
docker run -d -p 3001:3001 \
  --cap-add SYS_ADMIN \
  -v "${STORAGE_LOCATION}:/app/server/storage" \
  -v "${STORAGE_LOCATION}/.env:/app/server/.env" \
  -e STORAGE_DIR="/app/server/storage" \
  mintplexlabs/anythingllm

On Windows (PowerShell):

$env:STORAGE_LOCATION="$HOME\Documents\anythingllm"
If(!(Test-Path $env:STORAGE_LOCATION)) {New-Item $env:STORAGE_LOCATION -ItemType Directory}
If(!(Test-Path "$env:STORAGE_LOCATION\.env")) {New-Item "$env:STORAGE_LOCATION\.env" -ItemType File}
docker run -d -p 3001:3001 `
  --cap-add SYS_ADMIN `
  -v "${env:STORAGE_LOCATION}:/app/server/storage" `
  -v "$env:STORAGE_LOCATION\.env:/app/server/.env" `
  -e STORAGE_DIR="/app/server/storage" `
  mintplexlabs/anythingllm

Open http://localhost:3001 once the container is running. The volume mounts (-v) are not optional. Without them, your workspaces, documents, and settings live only inside the container and are lost whenever the container is removed or recreated, for example when you pull a newer image.

For a longer-running deployment, Docker Compose is easier to manage:

services:
  anythingllm:
    image: mintplexlabs/anythingllm
    container_name: anythingllm
    ports:
      - "3001:3001"
    cap_add:
      - SYS_ADMIN
    environment:
      - STORAGE_DIR=/app/server/storage
      - JWT_SECRET=replace-this-with-a-long-random-string
    volumes:
      - anythingllm_storage:/app/server/storage
    restart: always

volumes:
  anythingllm_storage:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /path/to/your/data/folder

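Replace /path/to/your/data/folder with a directory that already exists (Docker does not create the bind target for a volume defined this way) and set JWT_SECRET to your own long random value. Start it with docker compose up -d. Logs are available with docker compose logs -f anythingllm.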
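The JWT_SECRET value in the Compose file is a placeholder. One way to produce a replacement is Python's standard secrets module. The sketch below is only an illustration: the helper name make_jwt_secret and the 32-byte default are assumptions of this example, not AnythingLLM requirements.

```python
import secrets


def make_jwt_secret(num_bytes: int = 32) -> str:
    """Return a random hex string to use as a signing secret.

    num_bytes=32 is an assumed default for this sketch, not a value
    required by AnythingLLM.
    """
    # token_hex reads from the OS random source and emits 2 hex chars per byte.
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Paste the printed line over the JWT_SECRET entry in the Compose file.
    print(f"JWT_SECRET={make_jwt_secret()}")
```

Generate the value once and keep it stable across restarts; changing it later would likely invalidate existing login sessions.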

How do I connect AnythingLLM to a local model?

Go to Settings, then LLM Preference, and select your local provider from the list. Enter the base URL and AnythingLLM fetches available models automatically.

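For Ollama running on the same machine as the desktop app, the default URL http://127.0.0.1:11434 works without changes. In Docker, localhost and 127.0.0.1 resolve to the container itself, not to your host machine, so they will never find Ollama. On Mac or Windows, use http://host.docker.internal:11434. On Linux, use http://172.17.0.1:11434 (the gateway address of Docker's default bridge network), or add --add-host=host.docker.internal:host-gateway to your docker run command so that host.docker.internal resolves on Linux too. Ollama listens only on 127.0.0.1 by default, so on Linux you may also need to start it with OLLAMA_HOST=0.0.0.0 before the container can connect. The docker run command with the extra host mapping: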

docker run -d -p 3001:3001 \
  --cap-add SYS_ADMIN \
  --add-host=host.docker.internal:host-gateway \
  -v "${STORAGE_LOCATION}:/app/server/storage" \
  -v "${STORAGE_LOCATION}/.env:/app/server/.env" \
  -e STORAGE_DIR="/app/server/storage" \
  mintplexlabs/anythingllm

Before you change any AnythingLLM settings, verify Ollama is running by opening http://127.0.0.1:11434 in your browser on the host. You should see plain text that says "Ollama is running." If that page does not load, the problem is with Ollama, not with AnythingLLM.
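The URL rules and the browser check above can also be scripted. The following is a minimal sketch, not part of AnythingLLM: the function names are hypothetical, the Linux address assumes Docker's default bridge network, and the check simply looks for the "Ollama is running" text in the response body.

```python
import http.client
import platform
import urllib.request

OLLAMA_PORT = 11434


def ollama_base_url(in_docker: bool, host_os: str) -> str:
    """Pick the Ollama base URL following the rules described above.

    host_os is the OS of the machine running Docker or the desktop app,
    as reported by platform.system(): "Linux", "Darwin", or "Windows".
    """
    if not in_docker:
        # Desktop app and Ollama share the same loopback interface.
        return f"http://127.0.0.1:{OLLAMA_PORT}"
    if host_os == "Linux":
        # Default Docker bridge gateway; custom networks use other addresses.
        return f"http://172.17.0.1:{OLLAMA_PORT}"
    # Docker Desktop on macOS and Windows resolves this name to the host.
    return f"http://host.docker.internal:{OLLAMA_PORT}"


def is_ollama_running(base_url: str, timeout: float = 3.0) -> bool:
    """Return True if base_url answers with the 'Ollama is running' text."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        # Refused connections, DNS failures, timeouts, HTTP errors,
        # and malformed URLs all count as "not running".
        return False
    return "Ollama is running" in body


if __name__ == "__main__":
    # Run this on the host, not inside the container.
    url = ollama_base_url(in_docker=False, host_os=platform.system())
    status = "reachable" if is_ollama_running(url) else "not reachable"
    print(f"{url} -> {status}")
```

If the host check passes but AnythingLLM in Docker still cannot list models, the remaining suspect is the address entered in LLM Preference rather than Ollama itself.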

Beyond Ollama, AnythingLLM supports over 30 providers. Local options include LM Studio, Local AI, and KoboldCPP. Cloud options include OpenAI, Anthropic, Google Gemini, Groq, Mistral, Cohere, Perplexity AI, Together AI, OpenRouter, Azure OpenAI, Hugging Face, and AWS Bedrock. You can also override the system-level LLM on a per-workspace basis, so a fast small model handles general chat while a larger model is reserved for deep document analysis in a separate workspace.

What is a workspace in AnythingLLM?

A workspace is an isolated chat environment with its own document library, system prompt, conversation history, and optional LLM override. Documents pinned to one workspace are not visible in any other workspace, and each workspace can be configured independently.

A practical example: create one workspace for your product documentation, one for internal HR policies, and one for personal research notes. A team member who has access to the product workspace cannot see or search the HR documents unless you explicitly grant them access (in multi-user mode).

Create a workspace from the left sidebar by clicking "Add workspace." After creating it, open the workspace settings to:

- Write a custom system prompt that gives the workspace a specific role or persona
- Select a different LLM provider or model for that workspace only
- Adjust how aggressively AnythingLLM uses document context versus the model's general knowledge

To add documents, open the workspace and click the document manager icon. Upload files from your computer or import from a URL, GitHub repository, or YouTube transcript. Uploaded files go to a staging area first. To make them searchable, move them from the left panel (uploaded) to the right panel (pinned to workspace). Only pinned documents are searched when you chat.

How does RAG work in AnythingLLM?

RAG (Retrieval-Augmented Generation) lets the LLM answer questions using your uploaded documents rather than relying only on its training data. When you upload a file, AnythingLLM breaks it into smaller chunks, runs each chunk through an embedding model to produce a vector, and stores those vectors in a database. When you send a chat message, your message is also converted to a vector, the most similar document chunks are retrieved, and those chunks are injected into the prompt alongside your question before it goes to the LLM.
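The chunk, embed, retrieve, and inject steps can be illustrated with a toy example. This is a sketch of the general technique, not AnythingLLM's implementation: the word-count "embedding" stands in for a real embedding model, an in-memory list stands in for the vector database, and every function name here is hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split text into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy embedding: a sparse word-count vector (a real model returns dense floats)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, store: list[tuple[str, Counter]], top_k: int = 2) -> list[str]:
    """Return the top_k stored chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:top_k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Inject the retrieved chunks into the prompt ahead of the question."""
    context = "\n---\n".join(context_chunks)
    return f"Context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    document = (
        "The refund window is 30 days from delivery. "
        "Our office is closed on public holidays. "
        "Shipping takes five business days."
    )
    # Index: chunk the document and store each chunk with its vector.
    store = [(c, embed(c)) for c in chunk_text(document, max_words=8)]
    question = "How long is the refund window?"
    print(build_prompt(question, retrieve(question, store, top_k=1)))
```

The final string is what would be sent to the LLM. Swapping the toy embed function for a real embedding model changes the quality of the matches but not the shape of the pipeline.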

The embedding model is separate from the LLM and must be configured on its own. Go to Settings, then Embedding Preference. For quick local use, the built-in AnythingLLM embedder requires no additional setup. For better accuracy on larger document sets, use Ollama with a dedicated embedding model such as nomic-embed-text:

ollama pull nomic-embed-text

Then set the Ollama base URL in Embedding Preference to match your Ollama address (same networking rules as for the LLM apply). Cloud embedding options include OpenAI, Azure OpenAI, Cohere, and Voyage AI.
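Before selecting the model in Embedding Preference, it can help to confirm that Ollama actually lists it. The sketch below assumes Ollama's /api/tags endpoint, which returns the locally available models as JSON. The helper names are hypothetical, and the base URL is whichever address applies to your setup.

```python
import json
import urllib.request


def model_names(tags_payload: dict) -> set[str]:
    """Extract model names, without the ':tag' suffix, from an /api/tags response."""
    return {m["name"].split(":")[0] for m in tags_payload.get("models", [])}


def has_model(tags_payload: dict, wanted: str) -> bool:
    """Return True if the wanted model (with or without a tag) is installed."""
    return wanted.split(":")[0] in model_names(tags_payload)


def fetch_tags(base_url: str, timeout: float = 5.0) -> dict:
    """Fetch the list of local models from a running Ollama instance."""
    url = f"{base_url.rstrip('/')}/api/tags"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Assumes Ollama is reachable at this address from where the script runs.
    tags = fetch_tags("http://127.0.0.1:11434")
    if has_model(tags, "nomic-embed-text"):
        print("nomic-embed-text is available")
    else:
        print("missing: run 'ollama pull nomic-embed-text' first")
```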

Supported document types include PDF, DOCX, TXT, CSV, Markdown, and HTML files. When the LLM answers using a document, AnythingLLM shows a citation so you can trace the claim back to the original passage. If the LLM is not citing your documents at all, the most common cause is that the documents are uploaded but not pinned to the workspace.

How do I enable multi-user mode?

Multi-user mode is enabled in Settings under Security. It is available in the Docker version only. Toggle on "Multi-User Mode" and set an admin password when prompted.

Three roles are available. Admin has full access to all settings, workspaces, users, and configuration. Manager can create workspaces and invite users but cannot change system-level settings such as the LLM provider or API keys. Default users can only access the workspaces they have been assigned to and cannot change any settings.

To add a user, go to Settings, then Users, and click Invite User. Assign the user to one or more workspaces at creation time. Users log in at the same URL as the app with their own credentials.

If you need the server to be accessible to remote users, place AnythingLLM behind a reverse proxy with HTTPS. A minimal Caddy configuration:

anythingllm.yourdomain.com {
  reverse_proxy localhost:3001
}

Caddy handles TLS certificates automatically via Let's Encrypt. Do not expose port 3001 directly to the internet without a TLS layer in front of it.

What can I do with AI agents in AnythingLLM?

Agents let the LLM use tools to complete tasks rather than just returning text. To start an agent session, type @agent at the beginning of your message in any workspace. AnythingLLM checks whether your chosen LLM supports tool use and activates agent mode if it does.

Built-in agent skills include web browsing and scraping, searching your pinned documents, listing and summarizing documents in the workspace, reading and writing files, generating charts, and querying SQL databases. Integration skills for Gmail, Google Calendar, and Outlook are also included.

Beyond built-in skills, you can extend agents in two ways. Agent Flows (a visual workflow builder in the interface) lets you chain tools together into a repeatable sequence without writing code. You can also connect tools via MCP (Model Context Protocol), the same standard used by Claude Desktop and other AI apps. Any MCP server you already have running can be connected to AnythingLLM agents.

Common problems and fixes

Ollama shows "loading available models" forever
What you see: The model dropdown in LLM Preference spins without showing any models, or shows "--loading available models--" with nothing below it.
What it is: The Ollama URL points to the Docker container's own loopback address rather than the host machine. Inside the container, localhost and 127.0.0.1 refer to the container, not to your computer.
The fix: Before changing anything in AnythingLLM, open http://127.0.0.1:11434 in a browser on the host and confirm you see "Ollama is running." If that page does not load, fix Ollama first. Then change the URL in AnythingLLM to http://host.docker.internal:11434 on Mac or Windows, or http://172.17.0.1:11434 on Linux.

Documents upload with a green checkmark but the LLM ignores them
What you see: Files upload without errors and appear in the document manager, but when you chat, the LLM answers from its training data and never cites anything from your files.
What it is: Either no embedding model is configured (so the documents were stored but never indexed), or the document was left in the upload staging area and not pinned to the workspace.
The fix: Go to Settings, then Embedding Preference, and choose a provider. Then open the workspace, click the document manager, and drag the document from the left column (uploaded) to the right column (pinned to this workspace). Only documents in the right column are searched during chat.

Uploading a large PDF causes the job to freeze or the container to restart
What you see: A large PDF (100+ pages) begins processing, shows a progress indicator for several minutes, then freezes or disappears. The container may restart silently.
What it is: The embedding process holds the full document in memory during chunking. Files with hundreds of pages can exceed the memory available to the container.
The fix: Split the PDF into segments of 50 pages or fewer before uploading. If you need to handle large files regularly, increase the container's memory limit by adding --memory=4g to your docker run command. Also confirm you have at least 10 GB of free disk space, since the vector store grows with each embedded document.

Inference is very slow even with a capable GPU
What you see: Every response from the LLM takes 30 seconds or more, even for short questions, despite the machine having a modern GPU.
What it is: AnythingLLM does not run inference itself. The bottleneck is in Ollama (or whichever backend you are using). Ollama falls back to CPU if it cannot access the GPU, which happens when CUDA or ROCm drivers are not found or when Ollama was started without the right permissions.
The fix: On the host machine, load a model with ollama run <model-name>, then run ollama ps in a second terminal. The PROCESSOR column shows whether the model is running on the GPU or the CPU. If it reports CPU, review the Ollama GPU setup documentation for your operating system. Also check that the model fits in VRAM: a 7B parameter model typically needs around 5-6 GB of VRAM to run entirely in GPU memory.

Permission denied errors on the storage volume on Linux
What you see: The container starts but the logs show EACCES: permission denied when AnythingLLM tries to write to the storage directory.
What it is: The AnythingLLM container process runs as UID 1000 by default. If the host storage directory was created by root or another user, the container cannot write to it.
The fix: On your host machine, run chown -R 1000:1000 ~/anythingllm (replace the path with your actual STORAGE_LOCATION). Then restart the container. If you need to run the container as a different UID, add -e UID=$(id -u) -e GID=$(id -g) to your docker run command.
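For the large-PDF fix in the list above, the split points are simple arithmetic. The sketch below only computes page ranges of at most 50 pages; it does not read or write PDF files, so the actual split still needs a PDF tool of your choice. The function name page_ranges is hypothetical.

```python
def page_ranges(total_pages: int, max_pages: int = 50) -> list[tuple[int, int]]:
    """Return 1-based inclusive (first, last) page ranges of at most max_pages each."""
    if total_pages < 0 or max_pages < 1:
        raise ValueError("total_pages must be >= 0 and max_pages must be >= 1")
    return [
        (start, min(start + max_pages - 1, total_pages))
        for start in range(1, total_pages + 1, max_pages)
    ]


if __name__ == "__main__":
    # A 120-page PDF becomes three uploads: pages 1-50, 51-100, and 101-120.
    for first, last in page_ranges(120):
        print(f"pages {first}-{last}")
```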

Frequently asked questions

Is AnythingLLM free and open source? Yes. AnythingLLM is MIT-licensed and the complete source code is available at github.com/Mintplex-Labs/anything-llm. The self-hosted version (Docker or desktop) has no usage caps, no paid tier, and no feature restrictions. A managed cloud version is available at cloud.useanything.com if you prefer not to run your own server, but self-hosting is fully free.

Does AnythingLLM support multiple users? Yes, in the Docker version. Enable multi-user mode in Settings under Security, then create accounts with Admin, Manager, or Default roles. The desktop app is single-user only and does not support additional accounts.

What LLMs does AnythingLLM support? More than 30 providers. Local options include Ollama, LM Studio, Local AI, KoboldCPP, and the built-in model that ships with the desktop app. Cloud options include OpenAI, Anthropic, Google Gemini, Groq, Mistral, Cohere, Perplexity AI, Together AI, OpenRouter, Azure OpenAI, Hugging Face, and AWS Bedrock. You can assign different providers at the system level, the workspace level, and the agent level.

What vector databases does AnythingLLM support? LanceDB is the default and runs embedded with no extra configuration. You can switch to Chroma, Milvus, Pinecone, Qdrant, Weaviate, or PostgreSQL with pgvector for larger deployments. Change the provider in Settings under Vector Database.

Can I embed a chat widget on my website? Yes, in the Docker version with multi-user mode enabled. Each workspace has an Embed option in its settings that generates an iframe snippet or a JavaScript script tag. The widget connects back to your AnythingLLM server, so the server must be reachable from wherever the page is hosted.

Resources

AnythingLLM on GitHub: full source code, issue tracker, changelogs, and Docker setup docs from Mintplex Labs
Official AnythingLLM documentation: install guides, LLM and embedder configuration, agent skills, and API reference
Ollama setup guide: how to run local models for AnythingLLM on your own hardware
Dify guide: alternative self-hosted AI platform with a visual workflow builder

Last updated: June 27, 2026.