Firecrawl Website Content Extractor
Firecrawl Website Content Extractor (n8n Workflow)
This n8n automation workflow uses the Firecrawl API to extract structured data (e.g., quotes and authors) from web pages such as Quotes to Scrape, and retries when the extraction results are not ready yet.
Workflow Overview
Purpose:
Crawl and extract structured web data using Firecrawl
Wait for asynchronous scraping to complete
Retrieve and validate results
Support retries if content is not ready
Step-by-Step Node Breakdown
1. Manual Trigger
Node: When clicking "Test workflow"
Used to manually test or execute the workflow during setup or debugging.
2. Firecrawl Extract API Request
Node: Extract
Sends a POST request to https://api.firecrawl.dev/v1/extract
Payload includes:
urls: List of pages to crawl (https://quotes.toscrape.com/*)
prompt: "Extract all quotes and their corresponding authors from the website."
schema: JSON schema defining expected structure (quotes[], each with text and author)
> Uses an HTTP Header Auth credential for the Firecrawl API
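The request described above can be sketched in Python as follows. This is a minimal illustration, not the workflow's actual node configuration: the exact JSON Schema (including the `required` lists), the `Bearer` header format, and the `YOUR_API_KEY` placeholder are assumptions, and the helper names are hypothetical. The request is only prepared, never sent.

```python
import json
import urllib.request

EXTRACT_URL = "https://api.firecrawl.dev/v1/extract"


def build_extract_payload(urls, prompt):
    """Assemble the body described above: urls, prompt, and schema."""
    # Assumed JSON Schema: a "quotes" array whose items carry text and author.
    schema = {
        "type": "object",
        "properties": {
            "quotes": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "text": {"type": "string"},
                        "author": {"type": "string"},
                    },
                    "required": ["text", "author"],
                },
            }
        },
        "required": ["quotes"],
    }
    return {"urls": urls, "prompt": prompt, "schema": schema}


def build_extract_request(api_key, payload):
    """Prepare (but do not send) the POST request with header authentication."""
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed header format
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = build_extract_payload(
    ["https://quotes.toscrape.com/*"],
    "Extract all quotes and their corresponding authors from the website.",
)
request = build_extract_request("YOUR_API_KEY", payload)
```

Passing `request` to `urllib.request.urlopen` would submit the job. In n8n the same fields go into the HTTP Request node's JSON body, with the key supplied by the Header Auth credential.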
3. Wait for 30 Seconds
Node: 30 Secs
Gives Firecrawl time to finish processing in the background
Prevents hitting the API before results are ready
4. Get Results
Node: Get Results
Performs a GET request to the status URL, built from the job ID returned by the Extract node ({{ $('Extract').item.json.id }}), to retrieve the extraction results.
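As a rough equivalent of the n8n expression, the sketch below builds the status URL from the Extract response. The path layout (the job ID appended to the extract endpoint) is an assumption about the Firecrawl API, and `results_url` is a hypothetical helper name.

```python
from urllib.parse import quote


def results_url(extract_response, base="https://api.firecrawl.dev/v1/extract"):
    """Build the status URL from the `id` field of the Extract response."""
    job_id = extract_response["id"]  # same field the n8n expression reads
    # Percent-encode the ID so it is always a single safe path segment.
    return f"{base}/{quote(str(job_id), safe='')}"


url = results_url({"id": "abc-123"})
```

A `KeyError` here means the Extract call returned no `id`, which is worth surfacing before any waiting starts.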
5. Condition Check
Node: If
Checks if the data array is empty (i.e., no results yet)
If data is empty:
Waits 10 more seconds and retries
If data is available:
Passes data to the next step (e.g., processing or storage)
6. Retry Delay
Node: 10 Seconds
Waits briefly before sending another GET request to Firecrawl
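The wait, check, and retry steps (nodes 3 to 6) amount to a polling loop, sketched below. `fetch` stands in for the Get Results request and is injected so the example runs without network access. The `max_retries` cap is an addition of this sketch; the workflow as described loops until data arrives.

```python
import time


def is_empty(data):
    """Mirror the If node: missing or empty data means results are not ready."""
    return not data


def poll_results(fetch, initial_wait=30, retry_wait=10, max_retries=20,
                 sleep=time.sleep):
    """Wait, fetch, and retry until `data` is non-empty or retries run out."""
    sleep(initial_wait)  # node 3: give Firecrawl time to process
    for attempt in range(max_retries + 1):
        result = fetch()  # node 4: Get Results
        if not is_empty(result.get("data")):  # node 5: condition check
            return result
        if attempt < max_retries:
            sleep(retry_wait)  # node 6: retry delay
    raise TimeoutError(f"no data after {max_retries} retries")


# Demo with canned responses and a no-op sleep, so nothing actually waits.
canned = iter([
    {"data": []},
    {"data": {"quotes": [{"text": "t", "author": "a"}]}},
])
demo = poll_results(lambda: next(canned), sleep=lambda seconds: None)
```

Capping the retries keeps a failed or expired job from looping forever. In n8n a similar guard could be built with a counter field checked in the If node.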
7. Edit Fields (Optional Output Formatting)
Node: Edit Fields
Placeholder to structure or format the extracted results (quotes and authors)
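One possible formatting step is sketched below: it flattens the extracted quotes into one row per quote. It assumes the result carries a `data` object with a `quotes` list, as in the schema; the output field names `quote` and `author` are hypothetical choices, not something the workflow prescribes.

```python
def format_quotes(result):
    """Flatten an extraction result into one trimmed row per quote."""
    data = result.get("data") or {}
    rows = []
    for item in data.get("quotes", []):
        rows.append({
            "quote": item["text"].strip(),    # hypothetical output field name
            "author": item["author"].strip(),
        })
    return rows


rows = format_quotes({
    "data": {
        "quotes": [
            {"text": "  It is our choices, Harry.  ", "author": "J.K. Rowling"},
        ]
    }
})
```

In the workflow itself, the Edit Fields node would do the same mapping with expressions instead of code.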
Sticky Note: Firecrawl Setup Guide
Included as an embedded reference:
10% Firecrawl Discount
Instructions to:
Add Firecrawl API credentials in n8n
Use Firecrawl Community Node for self-hosted instances
Set up the schema and prompt for targeted data extraction
Key Features
API-based crawling with schema-structured output
Smart waiting + retry mechanism
AI prompt integration for intelligent data parsing
Flexible for different URLs, prompts, and schemas
Sample Output (matching the schema)
{
  "quotes": [
    {
      "text": "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.",
      "author": "Albert Einstein"
    },
    {
      "text": "It is our choices, Harry, that show what we truly are, far more than our abilities.",
      "author": "J.K. Rowling"
    }
  ]
}
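To illustrate the "retrieve and validate results" goal, the sketch below checks that a document has the shape shown above. It is a hand-written check, not a full JSON Schema validator, and `validate_quotes` is a hypothetical helper name.

```python
def validate_quotes(doc):
    """Return a list of problems; an empty list means the shape is valid."""
    quotes = doc.get("quotes") if isinstance(doc, dict) else None
    if not isinstance(quotes, list):
        return ["'quotes' must be a list"]
    errors = []
    for i, item in enumerate(quotes):
        for field in ("text", "author"):
            value = item.get(field) if isinstance(item, dict) else None
            # Each quote needs a non-blank string for both fields.
            if not isinstance(value, str) or not value.strip():
                errors.append(f"quotes[{i}].{field} must be a non-empty string")
    return errors


sample = {
    "quotes": [
        {
            "text": "It is our choices, Harry, that show what we truly are, "
                    "far more than our abilities.",
            "author": "J.K. Rowling",
        }
    ]
}
problems = validate_quotes(sample)
```

Returning a list of messages rather than raising on the first error makes it easier to log every malformed entry from one extraction run.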