Evaluation metric example: String similarity


Built by

David Roberts

Created on July 28, 2026

Description

AI evaluation in n8n

This is a template for n8n's evaluation feature.

Evaluation is a technique for getting confidence that your AI workflow performs reliably, by running a test dataset containing different inputs through the workflow.

By calculating a metric (score) for each input, you can see where the workflow is performing well and where it isn't.

How it works

This template shows how to calculate a workflow evaluation metric: text similarity, measured character-by-character.

The workflow takes images of handwritten codes, extracts the code from each image, and compares it with the expected answer from the dataset.

[Image: example of a handwritten code]

The workflow works as follows:

1. We use an evaluation trigger to read in our dataset. It is wired up in parallel with the regular trigger, so the workflow can be started from either one.
2. We download the image and use AI to extract the code.
3. If we’re evaluating (i.e. the execution started from the evaluation trigger), we calculate the string distance metric.
4. We pass this information back to n8n as a metric.
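
The metric calculation in the steps above can be sketched in a few lines. This is an illustration only, not the template's actual Code node: it assumes the string distance is the Levenshtein (edit) distance and that the score is normalised to a value between 0 and 1 by dividing by the longer string's length. The function names `levenshtein` and `similarity` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def similarity(actual: str, expected: str) -> float:
    """Assumed normalisation: 1.0 means identical strings,
    0.0 means every character differs."""
    longest = max(len(actual), len(expected))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein(actual, expected) / longest
```

For example, `similarity("AB12CO", "AB12CD")` differs in one of six characters and scores about 0.83, while an exact match scores 1.0. Normalising by length keeps scores comparable across codes of different lengths, which makes it easier to spot the inputs where extraction is weakest.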

Nodes Used (4)

Code

n8n-nodes-base.code

Evaluation

n8n-nodes-base.evaluation

HTTP Request

n8n-nodes-base.httpRequest

OpenAI

@n8n/n8n-nodes-langchain.openAi
