Extract Invoice Data from PDFs to JSON with Gemini AI and XML Transformation
This n8n workflow converts PDF invoices into structured, ready-to-use JSON using AI and an intermediate XML transformation, without writing any code.
🚀 How it works
Upload form → The user uploads a PDF file.
Text extraction → The PDF content is extracted as plain text.
XML schema definition → A standard invoice structure is defined with fields such as:
Invoice number
Customer and issuer details
Items with description, quantity, and price
Totals and taxes
Bank account details
AI (Gemini) → The model rewrites the PDF text as valid XML that follows the predefined schema.
XML cleanup → Extra tags, line breaks, and unnecessary formatting are removed from the XML.
JSON conversion → The XML is transformed into a clean, structured JSON object, ready for integrations, APIs, or storage.
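The workflow itself needs only n8n nodes, but for readers who want to see what the cleanup and conversion steps amount to, here is a minimal Python sketch of the same idea. It is not how the n8n nodes are implemented. The element names (`invoice_number`, `items`, `item`, and so on), the sample reply, and the helper functions are hypothetical, and the sketch assumes the model wraps its XML in a markdown code fence, which is common but not guaranteed.

```python
import json
import re
import xml.etree.ElementTree as ET

FENCE = "`" * 3  # markdown code fence that models often wrap around XML

# Hypothetical model reply; the element names are illustrative only.
raw_reply = (
    FENCE + "xml\n"
    "<invoice>\n"
    "  <invoice_number>INV-001</invoice_number>\n"
    "  <customer><name>Acme Ltd</name></customer>\n"
    "  <items>\n"
    "    <item><description>Widget</description><quantity>2</quantity><price>9.50</price></item>\n"
    "    <item><description>Gadget</description><quantity>1</quantity><price>20.00</price></item>\n"
    "  </items>\n"
    "  <total>39.00</total>\n"
    "</invoice>\n" + FENCE
)


def clean_xml(text: str) -> str:
    """Remove markdown fences and the whitespace between tags."""
    text = text.replace(FENCE + "xml", "").replace(FENCE, "")
    text = re.sub(r">\s+<", "><", text)
    return text.strip()


def element_to_value(element: ET.Element):
    """Return a string for a leaf element, or a dict for one with children."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = element_to_value(child)
        if child.tag in result:
            # Repeated tags (several <item> elements) collapse into a list.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def xml_to_json(text: str) -> str:
    """Clean the model reply, parse it as XML, and serialize it as JSON."""
    root = ET.fromstring(clean_xml(text))
    return json.dumps({root.tag: element_to_value(root)}, indent=2)


invoice_json = xml_to_json(raw_reply)
print(invoice_json)
```

In this sketch every value stays a string, and a tag that appears only once (for example an invoice with a single `item`) becomes an object rather than a one-element list, so downstream consumers may need to normalize both cases.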
✨ Benefits
Transforms unstructured PDFs into normalized JSON data.
No coding required, only n8n nodes.
Adapts to different invoice formats with minimal adjustments.
Leverages AI to interpret complex textual content.
🛠️ Use cases
Automating invoice data capture.
Integration with ERPs, CRMs, or databases.
Generating financial reports from PDFs.