API Schema Extractor

Built by Polina Medvedieva
Created on June 07, 2026

Description

This workflow automates finding API documentation for a list of services, extracting the API operations it describes, and generating a custom schema for each service. It works in three distinct stages (research, extraction, and schema generation), and each stage tracks its progress in a Google Sheet.
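The hand-off between stages can be pictured as a filter over the sheet rows. The sketch below is only an illustration: it assumes one hypothetical status column per stage (`research_status`, `extraction_status`, `generation_status`) and treats a missing status as "pending". The real sheet may name or lay out its columns differently.

```python
# Hypothetical status columns, one per stage, listed in pipeline order.
STAGES = ["research", "extraction", "generation"]


def rows_for_stage(rows, stage):
    """Return the sheet rows a given stage should process.

    Assumption: a row is eligible when its own status for this stage is
    "pending" (a missing value counts as pending) and, for every stage after
    the first, the previous stage's status is "ok".
    """
    idx = STAGES.index(stage)
    eligible = []
    for row in rows:
        if row.get(f"{stage}_status", "pending") != "pending":
            continue  # already processed (ok) or failed (error)
        if idx > 0 and row.get(f"{STAGES[idx - 1]}_status") != "ok":
            continue  # previous stage has not succeeded yet
        eligible.append(row)
    return eligible


# Illustrative rows; the service names are placeholders.
rows = [
    {"service": "A", "research_status": "ok", "extraction_status": "pending"},
    {"service": "B", "research_status": "error", "extraction_status": "pending"},
    {"service": "C", "research_status": "pending"},
]
for stage in STAGES:
    print(stage, [r["service"] for r in rows_for_stage(rows, stage)])
```

A row that ends in "error" is never picked up again by later stages, which matches the description of Stage 2 and Stage 3 only taking services whose previous stage succeeded.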

🙏 Jim Le deserves major kudos for helping to build this sophisticated three-stage workflow that cleverly automates API documentation processing using a smart combination of web scraping, vector search, and LLM technologies.

How it works
Stage 1 - Research:
Fetches pending services from a Google Sheet
Uses Google search to find API documentation
Employs Apify for web scraping to filter relevant pages
Stores webpage contents and metadata in Qdrant (vector database)
Updates progress status in Google Sheet (pending, ok, or error)
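The node list includes a Recursive Character Text Splitter and a Default Data Loader, which suggests that page contents are chunked and tagged with metadata before they go into Qdrant. The following is a simplified sketch of recursive character splitting, without the chunk-overlap option of the real splitter. The metadata keys (`service`, `url`, `chunk`) are hypothetical.

```python
def split_text(text, chunk_size=200, separators=("\n\n", "\n", " ", "")):
    """Split text into chunks of at most chunk_size characters.

    Tries the coarsest separator first; any piece that is still too long is
    split again with the next, finer separator. The final "" separator falls
    back to fixed-width slicing, so it must stay last in the tuple.
    """
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    sep, *finer = separators
    if sep == "":
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = f"{current}{sep}{part}" if current else part
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)
        if len(part) <= chunk_size:
            current = part
        else:
            # Piece is too long on its own: recurse with finer separators.
            chunks.extend(split_text(part, chunk_size, tuple(finer)))
            current = ""
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


def to_records(text, url, service, chunk_size=200):
    """Pair each chunk with metadata (hypothetical keys) for the vector store."""
    return [
        {"text": chunk, "metadata": {"service": service, "url": url, "chunk": i}}
        for i, chunk in enumerate(split_text(text, chunk_size))
    ]


page = "Authentication\n\nUse an API key.\n\nEndpoints\n\nGET /customers lists customers."
for record in to_records(page, "https://example.com/docs", "ExampleService", chunk_size=40):
    print(record)
```

Storing the source URL and service name alongside each chunk is what later allows the vector store to be queried per service.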

Stage 2 - Extraction:
Processes services that completed research successfully
Queries vector store to identify products and offerings
Runs further queries for relevant API documentation
Uses Gemini (LLM) to extract API operations
Records extracted operations in Google Sheet
Updates progress status (pending, ok, or error)
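The vector-store queries in this stage rank stored chunks by similarity to a query embedding, restricted to the current service. Qdrant performs this server-side with an index and payload filters. The in-memory linear scan below is only a sketch of the idea. It assumes cosine similarity and a hypothetical `service` payload key, and it uses toy 2-D vectors in place of Gemini embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(points, query_vector, service, top_k=3):
    """Rank one service's points by similarity to the query (linear scan)."""
    # Payload filter: only consider chunks stored for this service.
    candidates = [p for p in points if p["payload"].get("service") == service]
    ranked = sorted(
        candidates, key=lambda p: cosine(query_vector, p["vector"]), reverse=True
    )
    return ranked[:top_k]


# Toy 2-D vectors stand in for embeddings of stored chunks.
points = [
    {"id": 1, "vector": [1.0, 0.0], "payload": {"service": "A"}},
    {"id": 2, "vector": [0.0, 1.0], "payload": {"service": "A"}},
    {"id": 3, "vector": [1.0, 0.0], "payload": {"service": "B"}},
    {"id": 4, "vector": [0.9, 0.1], "payload": {"service": "A"}},
]
print([p["id"] for p in search(points, [1.0, 0.0], "A", top_k=2)])
```

Point 3 matches the query exactly but is excluded because it belongs to a different service. Without the filter, other services' documentation could leak into the extraction prompt.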

Stage 3 - Generation:
Takes services with successful extraction
Retrieves all API operations from the database
Combines and groups operations into a custom schema
Uploads final schema to Google Drive
Updates final status in sheet with file location
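The grouping step can be sketched as collecting the extracted operation rows by resource and de-duplicating them before serializing the result for upload. The row fields (`resource`, `operation`, `description`) and the output shape are hypothetical. They illustrate the idea and do not reproduce the workflow's actual custom schema format.

```python
import json
from collections import defaultdict


def build_schema(service, operations):
    """Group extracted operation rows (hypothetical fields) by resource."""
    grouped = defaultdict(dict)
    for op in operations:
        resource = op["resource"].strip().lower()  # normalize resource names
        name = op["operation"].strip()
        # Keep the first description seen when an operation is duplicated.
        grouped[resource].setdefault(name, op.get("description", ""))
    return {
        "service": service,
        "resources": [
            {
                "name": resource,
                "operations": [
                    {"name": name, "description": desc}
                    for name, desc in sorted(ops.items())
                ],
            }
            for resource, ops in sorted(grouped.items())
        ],
    }


ops = [
    {"resource": "Customers", "operation": "list", "description": "List customers"},
    {"resource": "customers", "operation": "create", "description": "Create a customer"},
    {"resource": "Invoices", "operation": "get", "description": "Retrieve an invoice"},
    {"resource": "customers", "operation": "list", "description": "Duplicate row"},
]
schema = build_schema("ExampleService", ops)
print(json.dumps(schema, indent=2))  # text that could be written to a Drive file
```

Sorting resources and operations keeps the output stable between runs, which makes successive schema files easier to compare.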

Ideal for:
Development teams needing to catalog multiple APIs
API documentation initiatives
Creating standardized API schema collections
Automating API discovery and documentation

Accounts required:
Google account (for Sheets and Drive access)
Apify account (for web scraping)
Qdrant database
Gemini API access

Set up instructions:
Prepare your Google Sheets document with the service information. Here's an example of a Google Sheet; you can copy it and change or remove the values under the columns. Also, make sure to update the Google Sheets nodes with the correct Google Sheet ID.
Configure the Google Sheets OAuth2 credentials, as well as the credentials for the required third-party services (Apify, Qdrant) and Gemini.
Ensure proper permissions for Google Drive access.
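The Google Sheet ID is the path segment after `/spreadsheets/d/` in the document's URL. Below is a small helper, hypothetical and not part of the workflow, that pulls it out so it can be pasted into the Google Sheets nodes. The example ID is made up.

```python
import re

# The ID is the segment between "/spreadsheets/d/" and the next "/".
_SHEET_ID = re.compile(r"/spreadsheets/d/([a-zA-Z0-9_-]+)")


def sheet_id_from_url(url):
    """Return the spreadsheet ID from a Google Sheets URL."""
    match = _SHEET_ID.search(url)
    if match is None:
        raise ValueError(f"not a Google Sheets URL: {url}")
    return match.group(1)


print(sheet_id_from_url("https://docs.google.com/spreadsheets/d/1AbC_d-9xyz/edit#gid=0"))
```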

Nodes Used (11)

Code (n8n-nodes-base.code)
Default Data Loader (@n8n/n8n-nodes-langchain.documentDefaultDataLoader)
Embeddings Google Gemini (@n8n/n8n-nodes-langchain.embeddingsGoogleGemini)
Google Drive (n8n-nodes-base.googleDrive)
Google Gemini Chat Model (@n8n/n8n-nodes-langchain.lmChatGoogleGemini)
Google Sheets (n8n-nodes-base.googleSheets)
HTTP Request (n8n-nodes-base.httpRequest)
Information Extractor (@n8n/n8n-nodes-langchain.informationExtractor)
Qdrant Vector Store (@n8n/n8n-nodes-langchain.vectorStoreQdrant)
Recursive Character Text Splitter (@n8n/n8n-nodes-langchain.textSplitterRecursiveCharacterTextSplitter)
Text Classifier (@n8n/n8n-nodes-langchain.textClassifier)