Evaluate AI Agent Response Relevance using OpenAI and Cosine Similarity

Built by

Jimleuk

Created on July 28, 2026

Description

This n8n template demonstrates how to calculate the evaluation metric "Relevance", which in this scenario measures how relevant the agent's response is to the user's question.

The scoring approach is adapted from the open-source evaluations project RAGAS. The source is available here:
https://github.com/explodinggradients/ragas/blob/main/ragas/src/ragas/metrics/_answer_relevance.py

How it works
This evaluation works best for Q&A agents.
For our scoring, we take the agent's response and ask another AI to generate a question from it. This generated question is then compared to the original question using cosine similarity.
A high score indicates that the response is relevant and that the agent successfully answered the question, whereas a low score means the agent may have added too much irrelevant information, gone off script, or hallucinated.
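
The comparison step above can be sketched in Python as follows. This is a minimal illustration, not the template's actual Code node: the function names `cosine_similarity` and `relevance_score` are hypothetical, the embedding vectors are assumed to have been fetched already (for example from an embeddings API), and averaging over several generated questions is an assumption that reduces to a single comparison when only one question is generated.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # A zero vector has no direction; treat it as "no similarity".
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def relevance_score(
    original_question: Sequence[float],
    generated_questions: Sequence[Sequence[float]],
) -> float:
    """Mean similarity between the original question's embedding and the
    embedding of each question generated from the agent's response.
    (Hypothetical helper; averaging is an assumption of this sketch.)"""
    if not generated_questions:
        return 0.0
    sims = [cosine_similarity(original_question, g) for g in generated_questions]
    return sum(sims) / len(sims)
```

With real embeddings, a generated question that closely paraphrases the original should score near 1, while a question about an unrelated topic should score noticeably lower.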

Requirements
n8n version 1.94+
Check out this Google Sheet for sample data: https://docs.google.com/spreadsheets/d/1YOnu2JJjlxd787AuYcg-wKbkjyjyZFgASYVV0jsij5Y/edit?usp=sharing

Nodes Used (7)

AI Agent

@n8n/n8n-nodes-langchain.agent

Basic LLM Chain

@n8n/n8n-nodes-langchain.chainLlm

Code

n8n-nodes-base.code

Evaluation

n8n-nodes-base.evaluation

HTTP Request

n8n-nodes-base.httpRequest

OpenAI Chat Model

@n8n/n8n-nodes-langchain.lmChatOpenAi

Structured Output Parser

@n8n/n8n-nodes-langchain.outputParserStructured
