Route AI queries cost‑efficiently with GPT‑4o‑mini, GPT‑4o and confidence scoring

0 views

Built by

ResilNext

Created on July 28, 2026

Description

This workflow implements a cost-optimized AI routing system using n8n. It intelligently decides whether a request should be handled by a low-cost model or escalated to a higher-quality model based on response confidence.

The goal is to minimize LLM usage costs while maintaining high answer quality.

A query is first processed by a cheaper model. The response is then evaluated by a confidence-scoring AI agent. If the response quality is insufficient, the workflow automatically escalates the request to a more capable model.

This approach is useful for building scalable AI systems where most queries can be answered cheaply, while complex queries still receive high-quality responses.

How It Works

Webhook Trigger
Receives a user query from an external application.

Workflow Configuration
Defines parameters such as:
confidence threshold
cheap model cost
expensive model cost

Cheap Model Response
The query is first processed using GPT-4o-mini to minimize cost.

Confidence Evaluation
An AI agent analyzes the response quality.
It evaluates accuracy, completeness, clarity, and relevance.

Structured Output Parsing
The evaluator returns structured data including:
confidence score
explanation
escalation recommendation.

Decision Logic
If the confidence score is below the configured threshold, the workflow escalates the request.

Expensive Model Escalation
The query is reprocessed using GPT-4o for a higher-quality answer.

Cost Calculation
Token usage is analyzed to estimate:
total cost
cost difference between models.

Final Response Formatting
The workflow returns:
AI response
model used
confidence score
escalation status
estimated cost.

Setup Instructions

Create an OpenAI credential in n8n.

Configure the following nodes:
Cheap Model (GPT-4o-mini)
Expensive Model (GPT-4o)
OpenAI Chat Model used by the confidence evaluator agent.

Adjust configuration values in the Workflow Configuration node:
confidenceThreshold
cheapModelCostPer1kTokens
expensiveModelCostPer1kTokens

Deploy the workflow and send requests to the Webhook URL.

Example webhook payload:

{
"query": "Explain how photosynthesis works."
}

Nodes Used (5)

AI Agent

@n8n/n8n-nodes-langchain.agent

Code

n8n-nodes-base.code

OpenAI

@n8n/n8n-nodes-langchain.openAi

OpenAI Chat Model

@n8n/n8n-nodes-langchain.lmChatOpenAi

Structured Output Parser

@n8n/n8n-nodes-langchain.outputParserStructured

Route AI queries cost‑efficiently with GPT‑4o‑mini, GPT‑4o and confidence scoring

Description

Nodes Used (5)

Select Nodes to Filter