Extract Invoice Data from PDFs to JSON with Gemini AI and XML Transformation

Built by Mauricio Perera
Created on June 05, 2026

Description

This n8n workflow converts PDF invoices into structured, ready-to-use JSON using AI and an intermediate XML transformation, without writing any code.

🚀 How it works

Upload form → The user uploads a PDF file.
Text extraction → The PDF content is extracted as plain text.
XML schema definition → A standard invoice structure is defined with fields such as:

Invoice number
Customer and issuer details
Items with description, quantity, and price
Totals and taxes
Bank account details
AI (Gemini) → The model rewrites the extracted text as valid XML that follows the predefined schema.
XML cleanup → Extra tags, line breaks, and unnecessary formatting are removed from the model output.
JSON conversion → The XML is transformed into a clean, structured JSON object, ready for integrations, APIs, or storage.
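
For readers who want to see what the cleanup and conversion steps amount to, here is a minimal Python sketch. It is not how the workflow itself is built (the workflow uses n8n nodes only). The tag names, the sample model output, and the helper names clean_xml, element_to_dict, and xml_to_json are all assumptions made for illustration; the actual schema is defined inside the workflow.

```python
import json
import re
import xml.etree.ElementTree as ET

FENCE = "`" * 3  # Markdown code fence that a model may wrap around its XML


def clean_xml(raw: str) -> str:
    """Remove code fences, the XML declaration, and whitespace between tags."""
    text = raw.replace(FENCE + "xml", "").replace(FENCE, "")
    text = re.sub(r"<\?xml[^>]*\?>", "", text)
    text = re.sub(r">\s+<", "><", text)
    return text.strip()


def element_to_dict(elem: ET.Element):
    """Convert an element to a dict; leaf elements become stripped strings."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # A repeated tag (e.g. several <item> elements) becomes a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def xml_to_json(raw: str) -> str:
    """Clean the raw model output, parse it, and serialize it as JSON."""
    root = ET.fromstring(clean_xml(raw))
    return json.dumps({root.tag: element_to_dict(root)}, indent=2)


# Hypothetical model output; field names are illustrative only.
sample = (
    FENCE + "xml\n"
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    "<invoice>\n"
    "  <invoice_number>INV-001</invoice_number>\n"
    "  <items>\n"
    "    <item><description>Widget</description><quantity>2</quantity><price>9.50</price></item>\n"
    "    <item><description>Gadget</description><quantity>1</quantity><price>20.00</price></item>\n"
    "  </items>\n"
    "  <total>39.00</total>\n"
    "</invoice>\n" + FENCE
)

print(xml_to_json(sample))
```

In this sketch all values stay strings, XML attributes are ignored, and an invoice with a single item yields an object rather than a one-element list, so a downstream consumer would need to normalize those cases.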

✨ Benefits

Transforms unstructured PDFs into normalized JSON data.
No coding required, only n8n nodes.
Scalable to different invoice formats with minimal adjustments.
Leverages AI to interpret complex textual content.

🛠️ Use cases

Automating invoice data capture.
Integration with ERPs, CRMs, or databases.
Generating financial reports from PDFs.

Nodes Used (1)

Google Gemini
@n8n/n8n-nodes-langchain.googleGemini