🌐 Firecrawl Website Content Extractor


Built by

Aashit Sharma

Created on July 28, 2026

Description

🌐 Firecrawl Website Content Extractor (n8n Workflow)

This n8n workflow uses the Firecrawl API to extract structured data (for example, quotes and authors) from web pages such as Quotes to Scrape. Because extraction runs asynchronously, the workflow waits and then retries until the results are ready.

🔁 Workflow Overview

🎯 Purpose:
Crawl and extract structured web data using Firecrawl
Wait for asynchronous scraping to complete
Retrieve and validate results
Support retries if content is not ready

🔧 Step-by-Step Node Breakdown

1. 🧪 Manual Trigger
Node: When clicking ‘Test workflow’
Used to manually test or execute the workflow during setup or debugging.

2. 📤 Firecrawl Extract API Request
Node: Extract
Sends a POST request to https://api.firecrawl.dev/v1/extract
Payload includes:
urls: List of pages to crawl (https://quotes.toscrape.com/*)
prompt: "Extract all quotes and their corresponding authors from the website."
schema: JSON schema defining expected structure (quotes[], each with text and author)

> 📌 Uses an HTTP Header Auth credential for Firecrawl API
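
The request described in this step can be sketched in Python as follows. This is a minimal illustration that only builds the request and does no network I/O. The helper name `build_extract_request`, the exact JSON schema, and the use of a Bearer token in the `Authorization` header are assumptions; the workflow's own schema and credential settings are not shown on this page.

```python
import json

EXTRACT_URL = "https://api.firecrawl.dev/v1/extract"

# Assumed schema shape: quotes[], each item with text and author.
QUOTES_SCHEMA = {
    "type": "object",
    "properties": {
        "quotes": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "text": {"type": "string"},
                    "author": {"type": "string"},
                },
                "required": ["text", "author"],
            },
        }
    },
    "required": ["quotes"],
}


def build_extract_request(api_key: str) -> dict:
    """Assemble the POST request parts (hypothetical helper, no network call)."""
    return {
        "method": "POST",
        "url": EXTRACT_URL,
        "headers": {
            # Assumption: the header auth credential carries a Bearer token.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        "body": {
            "urls": ["https://quotes.toscrape.com/*"],
            "prompt": "Extract all quotes and their corresponding authors from the website.",
            "schema": QUOTES_SCHEMA,
        },
    }


if __name__ == "__main__":
    request = build_extract_request("YOUR-API-KEY")  # placeholder, not a real key
    print(json.dumps(request["body"], indent=2))
```

Changing the URLs, prompt, or schema in the body is all that is needed to target a different site.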

3. ⏱️ Wait for 30 Seconds
Node: 30 Secs
Gives Firecrawl time to finish processing in the background
Prevents hitting the API before results are ready

4. 📥 Get Results
Node: Get Results
Sends a GET request to the extraction status URL, using the job ID returned by the Extract node ({{ $('Extract').item.json.id }}), to retrieve the extraction results.

5. ✅❌ Condition Check
Node: If
Checks if the data array is empty (i.e., no results yet)
If data is empty:
Waits 10 more seconds and retries
If data is available:
Passes data to the next step (e.g., processing or storage)

6. 🔁 Retry Delay
Node: 10 Seconds
Waits briefly before sending another GET request to Firecrawl
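
Steps 3 to 6 together form a wait-and-poll loop. The sketch below mirrors that logic outside n8n, as an illustration only. The names `has_results` and `wait_for_results` are hypothetical, the status call is passed in as a function so that no particular HTTP client is assumed, and the `max_attempts` cap is an addition of this sketch: the workflow description does not mention a retry limit.

```python
import time
from typing import Any, Callable


def has_results(response: dict) -> bool:
    """Mirror the If node: a missing or empty `data` means no results yet."""
    return bool(response.get("data"))


def wait_for_results(
    fetch_status: Callable[[], dict],
    initial_wait: float = 30.0,   # step 3: wait 30 seconds
    retry_wait: float = 10.0,     # step 6: wait 10 seconds between retries
    max_attempts: int = 10,       # assumption: cap added for this sketch
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Wait, then poll the status endpoint until `data` is non-empty."""
    sleep(initial_wait)
    for attempt in range(max_attempts):
        response = fetch_status()            # step 4: get results
        if has_results(response):            # step 5: condition check
            return response["data"]
        if attempt < max_attempts - 1:
            sleep(retry_wait)
    raise TimeoutError(f"no results after {max_attempts} attempts")


if __name__ == "__main__":
    # Fake status responses: empty twice, then populated.
    fake = iter([{"data": []}, {"data": []}, {"data": {"quotes": []} or {"ok": True}}])
    print(wait_for_results(lambda: next(fake), sleep=lambda seconds: None))
```

Injecting `sleep` keeps the loop testable without real delays; in production the default `time.sleep` applies.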

7. 🛠️ Edit Fields (Optional Output Formatting)
Node: Edit Fields
Placeholder to structure or format the extracted results (quotes and authors)
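
Since this node is only a placeholder, the following is one possible formatting step, not what the workflow actually does. It assumes the extracted data has the `quotes` shape described above; the function name `format_quotes` and the cleanup rules (trimming whitespace and surrounding quotation marks, dropping incomplete entries) are hypothetical choices.

```python
from typing import Dict, List


def format_quotes(data: dict) -> List[Dict[str, str]]:
    """Return cleaned {text, author} rows, skipping incomplete entries."""
    rows = []
    for quote in data.get("quotes", []):
        # Trim whitespace, then any straight or curly double quotes around the text.
        text = str(quote.get("text", "")).strip().strip("\u201c\u201d\"")
        author = str(quote.get("author", "")).strip()
        if text and author:
            rows.append({"text": text, "author": author})
    return rows


if __name__ == "__main__":
    sample = {"quotes": [{"text": " \u201cHello\u201d ", "author": " Someone "}]}
    print(format_quotes(sample))
```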

🧾 Sticky Note: Firecrawl Setup Guide

Included as an embedded reference:
🔗 10% Firecrawl Discount
🧰 Instructions to:
Add Firecrawl API credentials in n8n
Use Firecrawl Community Node for self-hosted instances
Set up the schema and prompt for targeted data extraction

✅ Key Features

🔌 API-based crawling with schema-structured output
⏱️ Built-in wait and retry: a 30-second initial wait, then a retry every 10 seconds until results are available
🧠 AI prompt integration for intelligent data parsing
⚙️ Flexible for different URLs, prompts, and schemas

📦 Sample Output

{
  "quotes": [
    {
      "text": "The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.",
      "author": "Albert Einstein"
    },
    {
      "text": "It is our choices, Harry, that show what we truly are, far more than our abilities.",
      "author": "J.K. Rowling"
    }
  ]
}
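
To cover the "retrieve and validate results" goal, output of this shape could be checked with a small hand-written validator such as the sketch below. The function name `validate_quotes` and the specific rules (each entry must have non-empty string `text` and `author` fields) are assumptions based on the schema description above; the workflow itself only checks whether data is empty.

```python
from typing import List


def validate_quotes(payload: object) -> List[str]:
    """Return a list of problems; an empty list means the payload looks valid."""
    if not isinstance(payload, dict):
        return ["payload must be an object"]
    quotes = payload.get("quotes")
    if not isinstance(quotes, list):
        return ["'quotes' must be an array"]
    errors = []
    for index, quote in enumerate(quotes):
        if not isinstance(quote, dict):
            errors.append(f"quotes[{index}] must be an object")
            continue
        for field in ("text", "author"):
            value = quote.get(field)
            if not isinstance(value, str) or not value.strip():
                errors.append(f"quotes[{index}].{field} must be a non-empty string")
    return errors


if __name__ == "__main__":
    sample = {"quotes": [{"text": "It is our choices, Harry...", "author": "J.K. Rowling"}]}
    print(validate_quotes(sample))
```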

Nodes Used (1)

HTTP Request

n8n-nodes-base.httpRequest
