Route AI queries cost‑efficiently with GPT‑4o‑mini, GPT‑4o and confidence scoring

Built by ResilNext
Created on June 05, 2026

Description



This workflow implements a cost-optimized AI routing system in n8n. It decides whether a request can be answered by a low-cost model or must be escalated to a higher-quality model, based on a confidence score for the low-cost model's response.

The goal is to minimize LLM usage costs while maintaining high answer quality.

A query is first processed by the cheaper model, and a confidence-scoring AI agent then evaluates the response. If the confidence score falls below the configured threshold, the workflow automatically escalates the request to the more capable model.

This approach is useful for building scalable AI systems where most queries can be answered cheaply, while complex queries still receive high-quality responses.

How It Works

Webhook Trigger
Receives a user query from an external application.

Workflow Configuration
Defines parameters such as:
confidence threshold
cheap model cost
expensive model cost

Cheap Model Response
The query is first processed using GPT-4o-mini to minimize cost.

Confidence Evaluation
An AI agent analyzes the response quality.
It evaluates accuracy, completeness, clarity, and relevance.

Structured Output Parsing
The evaluator returns structured data including:
confidence score
explanation
escalation recommendation
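
As a rough illustration of how the parsed evaluator output could be checked before it is used, the sketch below validates a JSON reply in Python. The field names `confidence`, `explanation`, and `shouldEscalate`, as well as the 0 to 1 score range, are assumptions made for this example; the workflow's actual schema is whatever its Structured Output Parser node defines.

```python
import json
from dataclasses import dataclass


@dataclass
class Evaluation:
    confidence: float      # assumed scale: 0.0 (lowest) to 1.0 (highest)
    explanation: str
    should_escalate: bool


def parse_evaluation(raw: str) -> Evaluation:
    """Parse the evaluator's JSON reply; raise ValueError if it is malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"evaluator did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("evaluator output must be a JSON object")

    confidence = data.get("confidence")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise ValueError("confidence must be a number")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")

    explanation = data.get("explanation")
    if not isinstance(explanation, str):
        raise ValueError("explanation must be a string")

    should_escalate = data.get("shouldEscalate")
    if not isinstance(should_escalate, bool):
        raise ValueError("shouldEscalate must be a boolean")

    return Evaluation(float(confidence), explanation, should_escalate)
```

Rejecting malformed output early keeps a badly formatted evaluator reply from being mistaken for a low or high score.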

Decision Logic
If the confidence score is below the configured threshold, the workflow escalates the request. Otherwise, the GPT-4o-mini response is kept as the final answer.
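
The routing decision can be sketched outside n8n as a small Python function. This is a minimal sketch, not the workflow's actual node logic: the three callables stand in for the GPT-4o-mini call, the confidence evaluator, and the GPT-4o call, and all names here (`route_query`, `RoutingResult`, `score_confidence`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RoutingResult:
    answer: str
    confidence: float
    escalated: bool


def route_query(
    query: str,
    cheap_model: Callable[[str], str],
    score_confidence: Callable[[str, str], float],
    expensive_model: Callable[[str], str],
    confidence_threshold: float,
) -> RoutingResult:
    """Answer with the cheap model first; escalate only on low confidence."""
    draft = cheap_model(query)
    confidence = score_confidence(query, draft)
    # Escalate only when the score is strictly below the threshold.
    if confidence < confidence_threshold:
        return RoutingResult(expensive_model(query), confidence, True)
    return RoutingResult(draft, confidence, False)
```

Passing the models in as callables keeps the decision rule testable with stubs, without any API calls. A score exactly equal to the threshold is treated as good enough here, following the "below the configured threshold" wording.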

Expensive Model Escalation
The query is reprocessed using GPT-4o for a higher-quality answer.

Cost Calculation
Token usage is analyzed to estimate:
total cost
cost difference between models
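
The arithmetic behind such an estimate might look like the Python sketch below. It assumes per-1k-token rates like those in the Workflow Configuration node and interprets "cost difference" as how much more the cheap call's tokens would have cost on the expensive model; that interpretation, the function names, and the output keys are assumptions. The evaluator's own token usage is left out for brevity.

```python
def token_cost(tokens: int, cost_per_1k_tokens: float) -> float:
    """Cost of a call, given its token count and a per-1k-token rate."""
    return tokens / 1000 * cost_per_1k_tokens


def estimate_costs(
    cheap_tokens: int,
    expensive_tokens: int,   # 0 when the request was not escalated
    cheap_rate: float,       # e.g. cheapModelCostPer1kTokens
    expensive_rate: float,   # e.g. expensiveModelCostPer1kTokens
) -> dict:
    cheap_cost = token_cost(cheap_tokens, cheap_rate)
    expensive_cost = token_cost(expensive_tokens, expensive_rate)
    # What the cheap call's tokens would have cost at the expensive rate.
    same_tokens_at_expensive_rate = token_cost(cheap_tokens, expensive_rate)
    return {
        "cheapCost": cheap_cost,
        "expensiveCost": expensive_cost,
        "totalCost": cheap_cost + expensive_cost,
        "costDifference": same_tokens_at_expensive_rate - cheap_cost,
    }
```

Note that an escalated request pays for both calls, so `totalCost` only beats calling the expensive model directly when most queries stay on the cheap path.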

Final Response Formatting
The workflow returns:
AI response
model used
confidence score
escalation status
estimated cost
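
For illustration, the five returned fields could be assembled into a JSON body as in the Python sketch below. The key names (`response`, `modelUsed`, `confidence`, `escalated`, `estimatedCost`) are guesses that mirror the list above, not the workflow's confirmed output schema.

```python
import json


def format_response(
    answer: str,
    escalated: bool,
    confidence: float,
    estimated_cost: float,
) -> str:
    """Build the JSON body returned to the caller (hypothetical key names)."""
    body = {
        "response": answer,
        "modelUsed": "gpt-4o" if escalated else "gpt-4o-mini",
        "confidence": confidence,
        "escalated": escalated,
        # Per-request costs are tiny, so keep six decimal places.
        "estimatedCost": round(estimated_cost, 6),
    }
    return json.dumps(body)
```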

Setup Instructions

Create an OpenAI credential in n8n.

Configure the following nodes:
Cheap Model (GPT-4o-mini)
Expensive Model (GPT-4o)
OpenAI Chat Model (used by the confidence evaluator agent)

Adjust configuration values in the Workflow Configuration node:
confidenceThreshold
cheapModelCostPer1kTokens
expensiveModelCostPer1kTokens

Deploy the workflow and send requests to the Webhook URL.

Example webhook payload:

{
  "query": "Explain how photosynthesis works."
}
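
A client could send this payload with Python's standard library, roughly as sketched below. The sketch assumes the Webhook node accepts a JSON `POST` and replies with a JSON body; both are assumptions, and the helper names are hypothetical. Substitute the Webhook URL shown in your own n8n instance.

```python
import json
import urllib.request


def build_webhook_request(webhook_url: str, query: str) -> urllib.request.Request:
    """Build a JSON POST request carrying the example payload shape."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the Webhook node is configured for POST
    )


def send_query(webhook_url: str, query: str, timeout: float = 30.0) -> dict:
    """Send the query and decode the reply, assuming it is a JSON object."""
    request = build_webhook_request(webhook_url, query)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

Calling `send_query(your_webhook_url, "Explain how photosynthesis works.")` would then return the workflow's reply as a dictionary, provided the workflow is active and reachable.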

Nodes Used (5)

AI Agent: @n8n/n8n-nodes-langchain.agent
Code: n8n-nodes-base.code
OpenAI: @n8n/n8n-nodes-langchain.openAi
OpenAI Chat Model: @n8n/n8n-nodes-langchain.lmChatOpenAi
Structured Output Parser: @n8n/n8n-nodes-langchain.outputParserStructured