Implement on-prem RAG with Qdrant and Ollama for a self-hosted KB

Built by Mabura Ze Guru
Created on June 08, 2026

Description


This n8n template provides a self-hosted RAG (retrieval-augmented generation) implementation.

How it works
The template provides two workflows: one to maintain the knowledge base and one to query it.
Uploaded documents are embedded with an Ollama embedding model and saved into the Qdrant vector store.
When a query is made, the most relevant documents are retrieved from the vector store and sent to the LLM as context for generating a response.
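
The retrieval step described above can be sketched in a few lines of Python. This is only a toy illustration, not the code that n8n or Qdrant runs: the three-dimensional vectors, the helper names (cosine_similarity, retrieve, build_prompt) and the prompt wording are all assumptions made for the example. In the real workflow the vectors come from the Ollama embedding model and the similarity search happens inside Qdrant.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vector, store, top_k=2):
    """Return the texts of the top_k entries most similar to the query.

    store is a list of (text, vector) pairs, standing in for the vector store.
    """
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]


def build_prompt(question, context_chunks):
    """Combine the retrieved chunks and the question into one LLM prompt."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Hypothetical knowledge base with hand-made 3-dimensional vectors.
store = [
    ("Qdrant listens on port 6333.", [1.0, 0.0, 0.0]),
    ("Ollama listens on port 11434.", [0.0, 1.0, 0.0]),
    ("n8n is a workflow automation tool.", [0.0, 0.0, 1.0]),
]

# Pretend this is the embedding of the user's question.
query_vector = [0.9, 0.1, 0.0]

chunks = retrieve(query_vector, store, top_k=2)
prompt = build_prompt("Which port does Qdrant use?", chunks)
print(prompt)
```

The chunk whose vector points in nearly the same direction as the query is ranked first, and only the top-ranked chunks are passed to the LLM as context.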

How to use
Start the workflow by clicking Execute workflow.
Use the file upload form to upload a document into the knowledge base (the Qdrant database).
Click Open chat to start asking questions related to the uploaded documents.

Setup steps
The steps below show how to set up the services on Amazon Linux. Consult the documentation for your OS for the equivalent steps.

Install Ollama on-prem
mkdir ollama
cd ollama
curl -fsSL https://ollama.com/install.sh | sh
ollama --version
Install the required models (on Amazon Linux)

ollama pull llama3:8b
ollama pull mistral:7b
ollama pull nomic-embed-text:latest
Access Ollama via http://localhost:11434
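
One way to confirm that Ollama is reachable and that the embedding model is installed is to request a single embedding over HTTP. The sketch below uses only the Python standard library and assumes Ollama's /api/embeddings endpoint with a "model" and "prompt" body. The helper names are hypothetical. The returned vector length should match the 768 configured for the knowledge-base collection.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"    # address from the setup steps
EMBED_MODEL = "nomic-embed-text:latest"  # model pulled in the setup steps


def build_embedding_request(text, base_url=OLLAMA_URL, model=EMBED_MODEL):
    """Build (but do not send) a POST request asking Ollama for one embedding."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embedding_length(text):
    """Send the request to a running Ollama instance and return the vector length."""
    request = build_embedding_request(text)
    with urllib.request.urlopen(request, timeout=30) as response:
        return len(json.load(response)["embedding"])


# With Ollama running, call: print(embedding_length("hello"))
```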
Fire up Qdrant (e.g. via Docker)
docker run -p 6333:6333 qdrant/qdrant
Access Qdrant via http://localhost:6333/dashboard
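Create a Qdrant collection named knowledge-base configured with a vector size of 768, the output dimension of nomic-embed-text.
NB: When running Qdrant in Docker, mount a persistent volume if you want to keep the data across container restarts (Qdrant stores its data under /qdrant/storage in the container).
Point the Qdrant and Ollama nodes in the workflow to your on-premise Qdrant and Ollama instances.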
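
As an alternative to the dashboard, the knowledge-base collection can be created through Qdrant's REST API. The sketch below assumes the PUT /collections/{name} endpoint and uses only the Python standard library. The Cosine distance metric is an assumption, since the setup steps do not name one, and the helper names are hypothetical.

```python
import json
import urllib.request

QDRANT_URL = "http://localhost:6333"  # address from the setup steps
COLLECTION = "knowledge-base"         # collection name from the setup steps
VECTOR_SIZE = 768                     # vector length from the setup steps


def build_create_collection_request(distance="Cosine"):
    """Build (but do not send) the PUT request that creates the collection.

    The distance metric is an assumption; the setup steps do not specify one.
    """
    body = json.dumps(
        {"vectors": {"size": VECTOR_SIZE, "distance": distance}}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{QDRANT_URL}/collections/{COLLECTION}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def create_collection():
    """Send the request to a running Qdrant instance and return its JSON reply."""
    request = build_create_collection_request()
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


# With Qdrant running, call: print(create_collection())
```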

Need Help?
Join the Discord or ask in the Forum!

Happy RAGing!

Nodes Used (6)

AI Agent
@n8n/n8n-nodes-langchain.agent
Default Data Loader
@n8n/n8n-nodes-langchain.documentDefaultDataLoader
Embeddings Ollama
@n8n/n8n-nodes-langchain.embeddingsOllama
Ollama Chat Model
@n8n/n8n-nodes-langchain.lmChatOllama
Qdrant Vector Store
@n8n/n8n-nodes-langchain.vectorStoreQdrant
Simple Memory
@n8n/n8n-nodes-langchain.memoryBufferWindow