💡🌐 Essential Multipage Website Scraper with Jina.ai

Built by Joseph LePage
Created on June 05, 2026

Description

Use this workflow responsibly and follow local rules and regulations.

This n8n workflow automates multi-page website scraping with Jina.ai and stores the scraped content in Google Drive. Here's how it works:

Main Features
The workflow automatically scrapes multiple pages from a website's sitemap and saves each page's content as a separate Google Drive document.

Key Components
Input Configuration
Starts with a sitemap URL (default: https://ai.pydantic.dev/sitemap.xml)
Processes the sitemap to extract individual page URLs
Includes filtering options to target specific topics or pages
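The workflow performs this parsing and filtering inside its own nodes; the Python sketch below only illustrates the same logic outside n8n. It assumes a standard sitemap that uses the sitemaps.org namespace, and `filter_urls` is a hypothetical stand-in for whatever rules you configure in the "Filter By Topics or Pages" node. The example URLs are made up.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def extract_urls(sitemap_xml: str) -> list[str]:
    """Return the <loc> value of every <url> entry in a sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def filter_urls(urls: list[str], keywords: list[str]) -> list[str]:
    """Keep URLs containing any keyword (case-insensitive); no keywords keeps all."""
    if not keywords:
        return urls
    wanted = [k.lower() for k in keywords]
    return [u for u in urls if any(k in u.lower() for k in wanted)]


# Made-up sitemap used only for illustration.
SAMPLE_SITEMAP = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/agents/</loc></url>
  <url><loc>https://example.com/docs/tools/</loc></url>
  <url><loc>https://example.com/blog/release-notes/</loc></url>
</urlset>"""

if __name__ == "__main__":
    all_urls = extract_urls(SAMPLE_SITEMAP)
    print(filter_urls(all_urls, ["docs"]))
```

This sketch does not follow sitemap index files (`<sitemapindex>`), which point to further sitemaps rather than to pages.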

Scraping Process
Uses Jina.ai's web scraper to extract content from each URL
Converts webpage content into clean markdown format
Extracts page titles automatically for document naming
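As a rough illustration of the scraping step, the Python sketch below assumes the workflow calls Jina's public Reader endpoint (`https://r.jina.ai/` followed by the target URL) and that the plain-text response begins with a `Title:` line. Both the endpoint prefix and the response layout are assumptions, not details taken from the template; `SAMPLE_RESPONSE` is invented text shaped like that assumed layout.

```python
import urllib.request

# Assumed endpoint: Jina's public Reader service, which takes the target URL as a path suffix.
JINA_READER_PREFIX = "https://r.jina.ai/"


def reader_url(page_url: str) -> str:
    """Build the Reader URL for a page."""
    return JINA_READER_PREFIX + page_url


def fetch_page_text(page_url: str, timeout: float = 30.0) -> str:
    """Fetch a page through Reader without an API key and return the response text."""
    with urllib.request.urlopen(reader_url(page_url), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_title(reader_text: str, fallback: str = "Untitled") -> str:
    """Return the value of the first 'Title:' line (assumed Reader layout)."""
    for line in reader_text.splitlines():
        if line.startswith("Title:"):
            title = line[len("Title:"):].strip()
            return title or fallback
    return fallback


# Invented response text, shaped like the assumed Reader output.
SAMPLE_RESPONSE = (
    "Title: Agents\n\n"
    "URL Source: https://example.com/docs/agents/\n\n"
    "Markdown Content:\n"
    "# Agents\n\nExample body text.\n"
)
```

Only `fetch_page_text` touches the network; the title parsing works on any string, so it can be checked offline against saved responses.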

Storage Integration
Creates individual Google Drive documents for each scraped page
Names documents using the format "URL - Page Title"
Saves content in markdown format for better readability
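The template's exact naming expression isn't shown here, so the Python sketch below only approximates the "URL - Page Title" rule. The whitespace cleanup and the "Untitled" fallback are assumptions added for illustration, and the upload itself (done by the Google Drive node) is not reproduced.

```python
def document_name(page_url: str, title: str) -> str:
    """Build a Drive document name in the "URL - Page Title" format."""
    # Collapse runs of whitespace and newlines so the name stays on one line.
    clean_title = " ".join(title.split()) or "Untitled"
    return f"{page_url} - {clean_title}"


if __name__ == "__main__":
    print(document_name("https://example.com/docs/agents/", "Agents\n  Overview"))
    # prints "https://example.com/docs/agents/ - Agents Overview"
```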

Usage Instructions
Set your target website's sitemap URL in the "Set Website URL" node
Configure the "Filter By Topics or Pages" node to select specific content
Adjust the "Limit" node (default: 20 pages) to control how many pages are processed per run
Connect your Google Drive account
Run the workflow to begin automated scraping

Additional Features
Built-in rate limiting through the Wait node to prevent overloading servers
Batch processing capability for handling large sitemaps
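The Limit, batching, and Wait behavior can be approximated in plain Python as shown below. This is a sketch, not the template's configuration: `batch_size` and `wait_seconds` are hypothetical values, since the template's actual Wait duration and batch settings aren't stated, and `handle` stands in for the scrape-and-save step.

```python
import time
from typing import Callable, Iterator


def limit(items: list[str], max_items: int = 20) -> list[str]:
    """Keep only the first max_items entries, like the Limit node's default of 20."""
    return items[:max_items]


def batches(items: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def process_politely(
    urls: list[str],
    handle: Callable[[str], None],
    batch_size: int = 5,
    wait_seconds: float = 2.0,
) -> int:
    """Call handle() on each URL, sleeping between batches; return the count handled."""
    handled = 0
    for index, batch in enumerate(batches(urls, batch_size)):
        if index > 0:
            time.sleep(wait_seconds)  # pause before each later batch, similar to a Wait node
        for url in batch:
            handle(url)
            handled += 1
    return handled


if __name__ == "__main__":
    pages = [f"https://example.com/page-{i}/" for i in range(25)]
    seen: list[str] = []
    print(process_politely(limit(pages), seen.append, wait_seconds=0.1))  # prints 20
```

Sleeping between batches rather than after every request keeps total run time predictable while still spacing out load on the target server.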

The workflow needs no Jina.ai API key, so it can be used right away; the Limit and Wait nodes help keep the scraping responsible.

Nodes Used (3)

Code (n8n-nodes-base.code)
Google Drive (n8n-nodes-base.googleDrive)
HTTP Request (n8n-nodes-base.httpRequest)