Generate production database schemas from Excel and CSV with OpenAI and LangChain

Built by ResilNext
Created on June 05, 2026

Description

Overview

This workflow automatically converts CSV or Excel files into a production-ready database schema using AI and rule-based validation.

It profiles the uploaded data to infer column types, candidate relationships, and data quality issues, then generates a normalized schema. The output includes SQL DDL scripts, an ERD diagram, a data dictionary, and a load plan.

This removes most of the manual schema design work and speeds up database setup from raw data.

How It Works

File Upload (Webhook)
Accepts CSV or XLSX files via webhook endpoint
Initializes workflow configuration (thresholds, retry limits)

File Extraction
Detects file format (CSV or Excel)
Extracts rows into structured JSON
Merges extracted datasets
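
The extraction step can be sketched as follows. This is a minimal illustration, not the workflow's own node code: the function names `detect_format` and `extract_csv_rows` are hypothetical, and the format check relies only on the fact that XLSX files are ZIP archives, which begin with the bytes `PK\x03\x04`.

```python
import csv
import io

XLSX_MAGIC = b"PK\x03\x04"  # XLSX workbooks are ZIP archives and start with these bytes


def detect_format(payload):
    """Return 'xlsx' for ZIP-based workbooks; anything else is assumed to be CSV."""
    return "xlsx" if payload.startswith(XLSX_MAGIC) else "csv"


def extract_csv_rows(payload):
    """Parse CSV bytes into one dict per row, keyed by the header names."""
    text = payload.decode("utf-8-sig")  # tolerate a UTF-8 byte-order mark
    return list(csv.DictReader(io.StringIO(text)))


sample = b"id,name\n1,Ada\n2,Grace\n"
rows = extract_csv_rows(sample)
print(detect_format(sample), rows)
```

Every value comes back as a string at this stage; type detection is left to the profiling step.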

Data Cleaning & Profiling
Removes duplicates and normalizes values
Detects data types (integer, float, date, boolean, string)
Computes column statistics (nulls, uniqueness, distributions)
Generates file hash and sample dataset
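
A rough sketch of what type detection, column statistics, and file hashing could look like. The helper names, the single `YYYY-MM-DD` date format, and the choice of SHA-256 are assumptions made for illustration; the workflow's actual rules are not documented here.

```python
import hashlib
from datetime import datetime


def infer_type(value):
    """Classify one raw string as boolean, integer, float, date, or string."""
    v = value.strip()
    if v.lower() in {"true", "false"}:
        return "boolean"
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            caster(v)
            return name
        except ValueError:
            pass
    try:
        datetime.strptime(v, "%Y-%m-%d")  # assumed date format
        return "date"
    except ValueError:
        return "string"


def profile_column(values):
    """Summarize a column: inferred type, null count, and distinct non-null count."""
    non_null = [v for v in values if v is not None and v.strip() != ""]
    types = {infer_type(v) for v in non_null}
    if len(types) == 1:
        col_type = next(iter(types))
    elif types == {"integer", "float"}:
        col_type = "float"  # widen mixed numeric columns
    else:
        col_type = "string"  # empty or mixed columns fall back to string
    return {
        "type": col_type,
        "nulls": len(values) - len(non_null),
        "unique": len(set(non_null)),
    }


def file_hash(payload):
    """Hex digest that identifies the uploaded file (SHA-256 is an assumption)."""
    return hashlib.sha256(payload).hexdigest()


print(profile_column(["1", "2.5", "", None]))
# {'type': 'float', 'nulls': 2, 'unique': 2}
```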

Column Profiling Engine
Identifies potential primary keys
Detects cardinality and uniqueness levels
Suggests foreign key relationships based on value overlap
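
The two profiling ideas above can be illustrated with a small sketch. It assumes rows are dicts of strings, treats a column as a primary key candidate when every value is present and distinct, and measures overlap as the share of distinct child values found in the parent column. The function names are hypothetical.

```python
def primary_key_candidates(rows):
    """Columns whose values are all non-empty and all distinct."""
    if not rows:
        return []
    candidates = []
    for col in rows[0]:
        values = [row[col] for row in rows]
        complete = all(v not in (None, "") for v in values)
        if complete and len(set(values)) == len(values):
            candidates.append(col)
    return candidates


def overlap_ratio(child_values, parent_values):
    """Share of distinct non-empty child values that also occur in the parent column."""
    child = {v for v in child_values if v not in (None, "")}
    if not child:
        return 0.0
    return len(child & set(parent_values)) / len(child)


customers = [
    {"customer_id": "1", "name": "Ada"},
    {"customer_id": "2", "name": "Ada"},
]
orders = [
    {"order_id": "10", "customer_id": "1"},
    {"order_id": "11", "customer_id": "1"},
    {"order_id": "12", "customer_id": "3"},
]

print(primary_key_candidates(customers))  # ['customer_id']
print(overlap_ratio(
    [o["customer_id"] for o in orders],
    [c["customer_id"] for c in customers],
))  # 0.5
```

Here only half of the distinct `customer_id` values in `orders` exist in `customers`, so this pair would be a weak foreign key suggestion.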

AI Schema Generation
Uses an AI agent to design normalized tables
Assigns SQL data types based on real data
Defines primary keys, foreign keys, constraints, and indexes
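
In the workflow this step is handled by the AI agent, so the following is only a rule-based stand-in that shows what "SQL data types based on real data" can mean in practice. The PostgreSQL-style type names, the 32-bit integer cutoff, and the `sql_type` function are all assumptions, not the agent's logic.

```python
INT32_MAX = 2_147_483_647


def sql_type(inferred, values):
    """Pick a SQL type from an inferred logical type plus the observed values."""
    non_empty = [v for v in values if v not in (None, "")]
    if inferred == "integer":
        biggest = max((abs(int(v)) for v in non_empty), default=0)
        return "INTEGER" if biggest <= INT32_MAX else "BIGINT"
    if inferred == "float":
        return "DOUBLE PRECISION"
    if inferred == "boolean":
        return "BOOLEAN"
    if inferred == "date":
        return "DATE"
    # Strings: size the column to the longest observed value.
    longest = max((len(v) for v in non_empty), default=1)
    return f"VARCHAR({longest})"


print(sql_type("integer", ["1", "3000000000"]))  # BIGINT
print(sql_type("string", ["Ada", "Grace", ""]))  # VARCHAR(5)
```

A real design would likely add headroom to `VARCHAR` lengths, since the sample may not contain the longest value the column will ever hold.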

Validation Layer
Ensures schema matches actual data
Validates:
Data types
Primary key uniqueness
Foreign key overlap (>70%, adjustable threshold)
Constraint consistency
Detects circular dependencies
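
Two of the checks above, primary key uniqueness and foreign key overlap, can be sketched as a function that returns a list of problems. The schema layout (`primary_key`, `foreign_keys`), the constant name `FK_OVERLAP_THRESHOLD`, and the message wording are hypothetical; only the 70% figure comes from the description.

```python
FK_OVERLAP_THRESHOLD = 0.70  # mirrors the ">70%" rule; the real setting name is not documented


def validate(schema, data):
    """Return human-readable problems; an empty list means these checks passed."""
    problems = []
    for table, spec in schema.items():
        pk = spec["primary_key"]
        pk_values = [row[pk] for row in data[table]]
        if len(set(pk_values)) != len(pk_values):
            problems.append(f"{table}.{pk}: primary key is not unique")
        for column, (parent, parent_col) in spec.get("foreign_keys", {}).items():
            child = {row[column] for row in data[table] if row[column] not in (None, "")}
            parent_values = {row[parent_col] for row in data[parent]}
            # An empty child column gives no evidence for the relationship.
            ratio = len(child & parent_values) / len(child) if child else 0.0
            if ratio <= FK_OVERLAP_THRESHOLD:
                problems.append(
                    f"{table}.{column} -> {parent}.{parent_col}: "
                    f"overlap {ratio:.0%} is not above {FK_OVERLAP_THRESHOLD:.0%}"
                )
    return problems


schema = {
    "customers": {"primary_key": "customer_id"},
    "orders": {
        "primary_key": "order_id",
        "foreign_keys": {"customer_id": ("customers", "customer_id")},
    },
}
data = {
    "customers": [{"customer_id": "1"}, {"customer_id": "2"}],
    "orders": [
        {"order_id": "10", "customer_id": "1"},
        {"order_id": "11", "customer_id": "2"},
        {"order_id": "12", "customer_id": "9"},
    ],
}
problems = validate(schema, data)
print(problems)
```

With this data, two of the three distinct `customer_id` values in `orders` match a customer, so the overlap is about 67% and the relationship is flagged.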

Revision Loop
If validation fails:
Sends feedback to AI agent
Regenerates schema
Retries up to configured limit
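
The loop above amounts to "generate, validate, feed the problems back, repeat". A minimal sketch, assuming the generator and validator are passed in as callables; `generate`, `validate`, and `max_retries` are hypothetical names, and the stubs below stand in for the AI agent and the validation layer.

```python
def revise_until_valid(generate, validate, max_retries):
    """Regenerate a schema until validation passes or the retry budget is spent.

    Returns (schema, retries_used, remaining_problems).
    """
    feedback = []
    schema = None
    for attempt in range(max_retries + 1):  # first attempt plus up to max_retries retries
        schema = generate(feedback)
        feedback = validate(schema)
        if not feedback:
            return schema, attempt, []
    return schema, max_retries, feedback


# Stubs for illustration: the first draft lacks a primary key, the revision fixes it.
def fake_generate(feedback):
    return {"pk": "id"} if feedback else {"pk": None}


def fake_validate(schema):
    return [] if schema["pk"] else ["missing primary key"]


result = revise_until_valid(fake_generate, fake_validate, max_retries=2)
print(result)  # ({'pk': 'id'}, 1, [])
```

Returning the remaining problems when the budget runs out lets the caller decide whether to fail the request or return the last draft with warnings.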

Schema Output Generation
Generates:
SQL DDL scripts
ERD (Mermaid format)
Data dictionary
Load plan with dependency graph
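
To make the first two outputs concrete, here is a small sketch that renders SQL DDL and a Mermaid `erDiagram` from an in-memory schema description. The schema layout and the functions `to_ddl` and `to_mermaid` are hypothetical; the workflow's real output format may differ.

```python
def to_ddl(tables):
    """Render one CREATE TABLE statement per table."""
    statements = []
    for name, spec in tables.items():
        lines = [f"  {col} {sql_type}" for col, sql_type in spec["columns"].items()]
        lines.append(f"  PRIMARY KEY ({spec['primary_key']})")
        for col, (parent, parent_col) in spec.get("foreign_keys", {}).items():
            lines.append(f"  FOREIGN KEY ({col}) REFERENCES {parent} ({parent_col})")
        statements.append(f"CREATE TABLE {name} (\n" + ",\n".join(lines) + "\n);")
    return "\n\n".join(statements)


def to_mermaid(tables):
    """Render each foreign key as a one-to-many relationship in Mermaid ER syntax."""
    out = ["erDiagram"]
    for name, spec in tables.items():
        for col, (parent, _) in spec.get("foreign_keys", {}).items():
            out.append(f"  {parent} ||--o{{ {name} : {col}")
    return "\n".join(out)


tables = {
    "customers": {
        "columns": {"customer_id": "INTEGER", "name": "VARCHAR(50)"},
        "primary_key": "customer_id",
    },
    "orders": {
        "columns": {"order_id": "INTEGER", "customer_id": "INTEGER"},
        "primary_key": "order_id",
        "foreign_keys": {"customer_id": ("customers", "customer_id")},
    },
}
ddl = to_ddl(tables)
erd = to_mermaid(tables)
print(ddl)
print(erd)
```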

Load Plan Engine
Computes optimal table insertion order
Detects circular dependencies
Suggests batching strategy
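
Insertion order and cycle detection are commonly handled with a topological sort. The sketch below uses Kahn's algorithm as one possible approach; whether the workflow uses this exact algorithm is not stated, and `load_order` is a hypothetical name. Self-references are ignored here, which is a simplifying assumption.

```python
from collections import deque


def load_order(dependencies):
    """Order tables so that parents load before children.

    `dependencies` maps each table to the tables it references.
    Returns (order, stuck), where `stuck` lists tables in or behind a cycle.
    """
    pending = {t: set(parents) - {t} for t, parents in dependencies.items()}
    for parents in list(pending.values()):
        for parent in parents:
            pending.setdefault(parent, set())  # parents mentioned only as references

    order = []
    ready = deque(sorted(t for t, parents in pending.items() if not parents))
    while ready:
        table = ready.popleft()
        order.append(table)
        del pending[table]
        for other in sorted(pending):
            if table in pending[other]:
                pending[other].discard(table)
                if not pending[other]:
                    ready.append(other)
    return order, sorted(pending)


deps = {
    "orders": {"customers"},
    "order_items": {"orders", "products"},
    "customers": set(),
    "products": set(),
}
print(load_order(deps))
# (['customers', 'products', 'orders', 'order_items'], [])
print(load_order({"a": {"b"}, "b": {"a"}}))
# ([], ['a', 'b'])
```

Tables left in the second list cannot be ordered by dependencies alone, which is the signal that a circular dependency needs to be broken, for example by loading one side with the foreign key deferred.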

Combine & Explain
Merges all outputs
Optionally adds an AI explanation of schema decisions

Response Output
Returns structured JSON via webhook:
SQL schema
ERD summary
Data dictionary
Load plan
Optional explanation

Setup Instructions

Configure OpenAI credentials (used by the AI agent)
Adjust thresholds if needed (FK overlap, retries, confidence)
Activate the workflow and copy the webhook URL
Send a POST request with a CSV or XLSX file to the webhook URL
Review the generated outputs in the webhook response
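
As a hedged illustration of the upload step, the sketch below builds a POST request with Python's standard library. The URL is a placeholder, and sending the file as a raw `text/csv` body is an assumption: the webhook may instead expect a multipart form upload, so check the Webhook node's settings before relying on this.

```python
import urllib.request

# Hypothetical placeholder; replace with the webhook URL copied from n8n.
WEBHOOK_URL = "https://example.invalid/webhook/schema-generator"


def build_upload_request(url, payload, content_type="text/csv"):
    """Build (but do not send) a POST request carrying the file bytes as the body."""
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={"Content-Type": content_type},
    )


request = build_upload_request(WEBHOOK_URL, b"id,name\n1,Ada\n")
print(request.get_method(), request.full_url)

# To send it against a real endpoint:
#   with urllib.request.urlopen(request) as response:
#       body = response.read()  # JSON with the schema, ERD, dictionary, and load plan
```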

Use Cases

Auto-generate database schema from CSV/Excel files
Data migration and onboarding pipelines
Rapid database prototyping
Reverse engineering datasets
AI-assisted data modeling

Requirements

n8n (latest version recommended)
OpenAI API credentials
LangChain nodes enabled
CSV or XLSX input file

Nodes Used (4)

AI Agent (@n8n/n8n-nodes-langchain.agent)
Code (n8n-nodes-base.code)
OpenAI Chat Model (@n8n/n8n-nodes-langchain.lmChatOpenAi)
Structured Output Parser (@n8n/n8n-nodes-langchain.outputParserStructured)