Build a local RAG chatbot with Ollama, Qwen, BGE-M3 and Postgres PGVector

Built by Wassim Abid

Created on July 29, 2026

Description

Build a fully local RAG chatbot using Ollama that works without tool calling, which suits smaller open-source models like Qwen whose native function calling is missing or unreliable. This template lets you run a private, self-hosted AI assistant with retrieval-augmented generation using only your own hardware.

How it works

1. A Webhook receives the user's chat message.
2. A small classifier LLM (Qwen 7B) analyzes the input and decides: is this small talk, or a real question that needs the knowledge base?
3. For small talk, a dedicated AI agent responds conversationally with chat memory.
4. For real questions, the classifier generates focused sub-queries, which are sent through a loop-based RAG pipeline:
   - Each sub-query is embedded using BGE-M3 and matched against a Postgres PGVector store.
   - Results are filtered by a relevance score threshold (>0.4).
   - Chunks are aggregated and deduplicated across all sub-queries.
5. An Answer Generator agent (Qwen 14B) produces a sourced answer using a strict 3-step format: short answer → sources → follow-up question.
6. Both paths use Postgres-backed chat memory for multi-turn conversations.
7. A post-processing step removes <think> tags that some reasoning models produce.
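
The filtering, deduplication, and <think> clean-up steps can be sketched outside n8n as follows. This is a minimal Python sketch, not the code shipped in the template's Code nodes: the chunk shape (dicts with "text" and "score" keys) and the function names are assumptions made for illustration, and it assumes a higher score means a more relevant chunk.

```python
import re

# Threshold taken from the description above (results must score > 0.4).
SCORE_THRESHOLD = 0.4

# Matches a whole <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def filter_and_dedupe(results_per_query, threshold=SCORE_THRESHOLD):
    """Merge retrieval results from all sub-queries.

    results_per_query is a list (one entry per sub-query) of lists of
    chunks. Each chunk is a dict with hypothetical keys "text" and "score".
    Chunks at or below the threshold are dropped; when the same text is
    returned by several sub-queries, only the best-scoring copy is kept.
    """
    best = {}
    for results in results_per_query:
        for chunk in results:
            if chunk["score"] <= threshold:
                continue
            key = chunk["text"].strip()
            if key not in best or chunk["score"] > best[key]["score"]:
                best[key] = chunk
    # Most relevant chunks first, so the answer prompt leads with them.
    return sorted(best.values(), key=lambda c: c["score"], reverse=True)


def strip_think_tags(text):
    """Remove <think>...</think> blocks emitted by some reasoning models."""
    return THINK_BLOCK.sub("", text).strip()
```

Keeping the best-scoring copy of a duplicated chunk, rather than the first one seen, is a design choice of this sketch; the template may resolve duplicates differently.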

Set up steps

1. Install Ollama and pull the required models:
   - ollama pull qwen2.5:7b (classifier + small talk)
   - ollama pull qwen3:14b (answer generation)
   - ollama pull bge-m3 (embeddings)
2. Set up PostgreSQL with the pgvector extension enabled.
3. Create your vector store: ingest your documents into the PGVector store using BGE-M3 embeddings (you can use n8n's built-in document loaders for this).
4. Configure credentials in n8n:
   - Ollama connection (default: http://localhost:11434)
   - PostgreSQL connection for both chat memory and vector store
5. Customize the webhook path and connect it to your frontend or API client.
6. Optional: Adjust the relevance score threshold, swap models for larger/smaller ones, or modify the system prompts to match your use case.
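
Once the workflow is active, an API client could call the webhook roughly like this. This is a sketch using only the Python standard library. The webhook path "rag-chat" and the payload field names "chatInput" and "sessionId" are assumptions, not values confirmed by the template; match them to whatever your Webhook node and memory nodes actually expect. The host and port assume a default local n8n install.

```python
import json
import urllib.request

# Hypothetical URL: default local n8n port plus a made-up webhook path.
WEBHOOK_URL = "http://localhost:5678/webhook/rag-chat"


def build_chat_request(message, session_id, url=WEBHOOK_URL):
    """Build a JSON POST request carrying one chat message.

    The field names "chatInput" and "sessionId" are assumptions; the
    session id is what would let Postgres chat memory group the turns
    of one conversation.
    """
    payload = {"chatInput": message, "sessionId": session_id}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_message(message, session_id, url=WEBHOOK_URL, timeout=120):
    """Send the message and return the raw response body as text.

    The generous timeout reflects that local models can be slow to answer.
    """
    request = build_chat_request(message, session_id, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Reusing the same session id across calls is what would give you multi-turn behavior; a new id starts a fresh conversation.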

Nodes Used (7)

AI Agent: @n8n/n8n-nodes-langchain.agent
Basic LLM Chain: @n8n/n8n-nodes-langchain.chainLlm
Code: n8n-nodes-base.code
Embeddings Ollama: @n8n/n8n-nodes-langchain.embeddingsOllama
Ollama Chat Model: @n8n/n8n-nodes-langchain.lmChatOllama
Postgres Chat Memory: @n8n/n8n-nodes-langchain.memoryPostgresChat
Postgres PGVector Store: @n8n/n8n-nodes-langchain.vectorStorePGVector
