Detect stale website content with OpenAI, Google Sheets, and Gmail
Stale Content Detector for Websites
Who is this for
Content marketers, SEO managers, and website owners who want to automatically find pages on their site that are outdated or need refreshing — without manually auditing every page.
What it does
This workflow fetches your sitemap, identifies pages that have not been updated in a configurable number of days, fetches each stale page, and uses AI to assess whether the content is actually outdated or still accurate.
Sitemap parsing: Fetches your sitemap.xml and extracts all URLs with their last-modified dates
Staleness filtering: Flags pages not updated in more than X days (default: 180) and sorts them most stale first
Page content extraction: Fetches each stale page and extracts the title and body text
AI freshness analysis: An OpenAI-powered agent reviews each page and rates it LOW, MEDIUM, HIGH, or CRITICAL, with specific update suggestions
Audit logging: Saves every reviewed page to a Google Sheet with the full AI analysis
HTML email report: Builds a color-coded summary email showing each flagged page with its AI verdict and sends one consolidated digest
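The sitemap parsing and staleness filtering steps can be sketched in standalone Python. This is an illustration only, not the code the workflow's nodes actually run: the function name find_stale_pages and the decision to skip entries that have no lastmod value are assumptions, and the output keys simply reuse the column names from the audit sheet.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def find_stale_pages(sitemap_xml: str, stale_days: int, now: datetime) -> list[dict]:
    """Return pages whose <lastmod> is more than stale_days old, most stale first."""
    root = ET.fromstring(sitemap_xml)
    stale = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS).strip()
        if not loc or not lastmod:
            continue  # assumption: pages without a last-modified date are skipped
        # Accept both date-only values and full timestamps ending in "Z".
        modified = datetime.fromisoformat(lastmod.replace("Z", "+00:00"))
        if modified.tzinfo is None:
            modified = modified.replace(tzinfo=timezone.utc)
        days = (now - modified).days
        if days > stale_days:
            stale.append(
                {"page_url": loc, "last_modified": lastmod, "days_since_update": days}
            )
    # Most stale first, matching the ordering described above.
    return sorted(stale, key=lambda p: p["days_since_update"], reverse=True)
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps the filter easy to test against a fixed date.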
How to set up
Open Site Configuration and set your sitemapUrl, staleDays (default: 180), and alertEmail
Create a Google Sheet with a ContentAudit tab (columns: scan_date, page_url, last_modified, days_since_update, ai_review)
Paste your Google Sheet URL into the Save to Content Audit Sheet node
Connect your Gmail OAuth2 credentials on the Email Content Audit Report node
Connect your Google Sheets credentials on the Save to Content Audit Sheet node
Connect your OpenAI API credentials on the OpenAI Chat Model node
Activate the workflow; it then runs every Monday at 7 AM
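To check that the ContentAudit tab matches what the workflow writes, it can help to see the expected row shape. The sketch below is a hedged illustration in standalone Python: the column names come from the setup step above, while build_audit_row, to_csv, and the sample values are hypothetical and exist only to show the layout.

```python
import csv
import io

# Column order taken from the ContentAudit tab described in the setup steps.
AUDIT_COLUMNS = ["scan_date", "page_url", "last_modified", "days_since_update", "ai_review"]


def build_audit_row(scan_date, page_url, last_modified, days_since_update, ai_review):
    """Assemble one audit record keyed by the sheet's column names."""
    return {
        "scan_date": scan_date,
        "page_url": page_url,
        "last_modified": last_modified,
        "days_since_update": days_since_update,
        "ai_review": ai_review,
    }


def to_csv(rows):
    """Render rows as CSV with a header line, e.g. to paste into the sheet."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=AUDIT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Only the header row has to exist in the sheet before the first run; the data rows are what the workflow is described as appending.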
Requirements
n8n account (cloud or self-hosted)
A website with a sitemap.xml (most CMS platforms generate one automatically)
OpenAI API key (uses gpt-4o-mini)
Gmail account with OAuth2
Google Sheets
How to customize
Change the staleDays threshold in Site Configuration (default: 180 days / 6 months)
Increase the page limit above 20 in the Code node for larger sites
Add specific URL path filters to focus on blog posts, docs, or landing pages only
Replace Gmail with Slack for faster team notifications
Connect to your CMS API (WordPress, Ghost, Webflow) to pull content directly instead of scraping
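The page limit and URL path filter customizations can be sketched together. This is a minimal standalone Python illustration under stated assumptions, not the contents of the workflow's Code node: select_pages and its parameters are hypothetical names, and the input is assumed to be a list of records with a page_url field that is already sorted most stale first.

```python
from urllib.parse import urlparse


def select_pages(pages, path_prefixes=(), limit=20):
    """Keep pages whose URL path starts with one of path_prefixes, then cap at limit.

    An empty path_prefixes means no path filtering is applied.
    """
    if path_prefixes:
        prefixes = tuple(path_prefixes)
        pages = [p for p in pages if urlparse(p["page_url"]).path.startswith(prefixes)]
    # Input is assumed to be sorted already, so slicing keeps the most stale pages.
    return pages[:limit]
```

For example, select_pages(stale, ["/blog/", "/docs/"], limit=50) would review up to 50 blog and documentation pages and ignore everything else. A higher limit means more page fetches and more OpenAI calls per run.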