Generate production database schemas from Excel and CSV with OpenAI and LangChain
Go to WorkflowDescription
Overview
This workflow automatically converts CSV or Excel files into a production-ready database schema using AI and rule-based validation.
It analyzes uploaded data, detects column types, relationships, and data quality, then generates a normalized schema. The output includes SQL DDL scripts, ERD diagrams, a data dictionary, and a load plan.
This eliminates manual schema design and accelerates database setup from raw data.
How It Works
File Upload (Webhook)
Accepts CSV or XLSX files via webhook endpoint
Initializes workflow configuration (thresholds, retry limits)
File Extraction
Detects file format (CSV or Excel)
Extracts rows into structured JSON
Merges extracted datasets
Data Cleaning & Profiling
Removes duplicates and normalizes values
Detects data types (integer, float, date, boolean, string)
Computes column statistics (nulls, uniqueness, distributions)
Generates file hash and sample dataset
Column Profiling Engine
Identifies potential primary keys
Detects cardinality and uniqueness levels
Suggests foreign key relationships based on value overlap
AI Schema Generation
Uses an AI agent to design normalized tables
Assigns SQL data types based on real data
Defines primary keys, foreign keys, constraints, and indexes
Validation Layer
Ensures schema matches actual data
Validates:
Data types
Primary key uniqueness
Foreign key overlap (>70%)
Constraint consistency
Detects circular dependencies
Revision Loop
If validation fails:
Sends feedback to AI agent
Regenerates schema
Retries up to configured limit
Schema Output Generation
Generates:
SQL DDL scripts
ERD (Mermaid format)
Data dictionary
Load plan with dependency graph
Load Plan Engine
Computes optimal table insertion order
Detects circular dependencies
Suggests batching strategy
Combine & Explain
Merges all outputs
Optional AI explanation of schema decisions
Response Output
Returns structured JSON via webhook:
SQL schema
ERD summary
Data dictionary
Load plan
Optional explanation
Setup Instructions
Activate the workflow and copy the webhook URL
Send a POST request with a CSV or XLSX file
Configure OpenAI credentials (used by AI agent)
Adjust thresholds if needed (FK overlap, retries, confidence)
Execute workflow and review generated outputs
Use Cases
Auto-generate database schema from CSV/Excel files
Data migration and onboarding pipelines
Rapid database prototyping
Reverse engineering datasets
AI-assisted data modeling
Requirements
n8n (latest version recommended)
OpenAI API credentials
LangChain nodes enabled
CSV or XLSX input file