Anansi: Self-Healing Web Scraper with MCP Server

@ZachasADMIN
May 16, 2026

Quick summary

A Python crawler for unstable or JavaScript-heavy sites, with selector healing, structured-data extraction, adaptive rate limiting, and an MCP server for agent-driven crawling. Use only for authorized scraping.

Status & Access

  • Access: Free
  • Updated: May 16, 2026, 11:45 PM

Anansi is a Python web scraping toolkit designed for sites that change often or need browser rendering. It combines adaptive parsing, structured-data extraction, incremental crawling, proxy support, and an MCP server so an LLM or agent workflow can drive fetch, extract, crawl, pause, resume, export, and metrics actions.
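As an illustration of the agent-driven loop, the tool actions listed above map naturally onto a small dispatcher. The `CrawlSession` class and its method names below are hypothetical stand-ins, not Anansi's actual MCP interface:

```python
# Hypothetical sketch of agent-driven crawl control via tool calls.
# Action names mirror those in the text; the class itself is illustrative.
class CrawlSession:
    def __init__(self):
        self.state = "idle"
        self.queue = []

    def handle(self, action, **kwargs):
        """Dispatch one agent tool call to the matching crawl operation."""
        handlers = {
            "crawl": self._crawl,
            "pause": self._pause,
            "resume": self._resume,
            "metrics": self._metrics,
        }
        if action not in handlers:
            raise ValueError(f"unknown action: {action}")
        return handlers[action](**kwargs)

    def _crawl(self, url):
        self.state = "running"
        self.queue.append(url)
        return {"queued": url}

    def _pause(self):
        self.state = "paused"
        return {"state": self.state}

    def _resume(self):
        self.state = "running"
        return {"state": self.state}

    def _metrics(self):
        return {"state": self.state, "queued": len(self.queue)}

session = CrawlSession()
session.handle("crawl", url="https://example.com")
session.handle("pause")
print(session.handle("metrics"))  # {'state': 'paused', 'queued': 1}
```

The point of the shape is that every crawl operation is an idempotent, inspectable tool call, which is what lets an LLM orchestrate long-running crawls safely.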

Why it is useful

  • Self-healing selectors: stores selector confidence and attempts fallback strategies when a layout changes.
  • Structured extraction first: pulls JSON-LD, Open Graph, and Microdata before relying on brittle CSS selectors.
  • Browser upgrade path: can switch from HTTP fetching to Playwright rendering for JavaScript-heavy pages.
  • Crawler durability: includes an async crawler, SQLite-backed queue, incremental recrawls, ETag/Last-Modified handling, and resumable jobs.
  • Agent-ready interface: ships with an MCP server so compatible LLM tools can operate crawls through tool calls.
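The self-healing idea in the first bullet can be sketched in a few lines: keep candidate selectors with a stored confidence score, try them in order, and adjust confidence on hits and misses. This is an illustrative model only, not Anansi's implementation, and the substring-based `matches` is a crude stand-in for real CSS matching against a parsed DOM:

```python
# Illustrative selector healing: reward the first selector that still
# matches, penalize the ones that failed before it. Not Anansi's code.
def heal(candidates, page_matches):
    """Try candidates in confidence order; return the first matching CSS selector."""
    for cand in sorted(candidates, key=lambda c: -c["confidence"]):
        if page_matches(cand["css"]):
            cand["confidence"] = min(1.0, cand["confidence"] + 0.1)
            return cand["css"]
        cand["confidence"] = max(0.0, cand["confidence"] - 0.2)
    return None

# Simulated layout change: the old class is gone, a fallback matches.
html = '<article><h1 class="headline-v2">Story</h1></article>'
candidates = [
    {"css": "h1.headline", "confidence": 0.9},     # stale selector
    {"css": "h1.headline-v2", "confidence": 0.4},  # fallback
]
# Crude stand-in for a real CSS match against the parsed DOM.
matches = lambda css: f'class="{css.split(".")[-1]}"' in html
print(heal(candidates, matches))  # h1.headline-v2
```

Persisting the updated confidence scores is what makes the healing cumulative: after a few crawls, the fallback that survived the redesign becomes the primary selector.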

Best fit

Use Anansi when you need a resilient research or data-extraction crawler for websites you are allowed to access, especially where pages change structure or require JavaScript rendering. It is most relevant for developers building data pipelines, monitoring workflows, competitive research dashboards, or agentic browsing systems.

Quick evaluation checklist

  1. Confirm the target website permits your intended crawling use case.
  2. Start with structured data extraction before custom selectors.
  3. Enable browser rendering only where HTTP fetching is insufficient.
  4. Keep adaptive rate limiting active and respect Retry-After responses.
  5. Use the MCP server when you want an agent to orchestrate crawl tasks instead of manually scripting every step.
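Checklist item 4 can be made concrete with a small pacing sketch that obeys Retry-After on 429/503 responses and relaxes the delay again on success. The backoff numbers are illustrative, not Anansi's actual policy:

```python
# Sketch of adaptive rate limiting that honors Retry-After.
# The multipliers here are illustrative choices, not Anansi's policy.
class AdaptiveLimiter:
    def __init__(self, base_delay=1.0, max_delay=60.0):
        self.delay = base_delay
        self.base = base_delay
        self.max = max_delay

    def after_response(self, status, retry_after=None):
        """Update and return the inter-request delay from the last response."""
        if status in (429, 503):
            # Server asked us to slow down: obey Retry-After if present,
            # otherwise double the current delay (capped at max_delay).
            self.delay = min(self.max, float(retry_after) if retry_after else self.delay * 2)
        elif 200 <= status < 300:
            # Halve back toward the base delay on success.
            self.delay = max(self.base, self.delay * 0.5)
        return self.delay

limiter = AdaptiveLimiter()
limiter.after_response(200)          # stays at the 1.0s base
limiter.after_response(429, "30")    # jumps to 30s, per Retry-After
print(limiter.after_response(200))   # prints 15.0 (halving back down)
```

Sleeping `limiter.delay` seconds between requests keeps a polite pace without hard-coding a rate per site.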

Source notes

The GitHub repository describes Anansi as a self-healing web scraper with selector repair, browser rendering fallback, Chrome-like TLS fingerprinting, Pydantic validation, incremental crawling, and an MCP server. The project is written primarily in Python and is licensed under Apache-2.0.
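The ETag/Last-Modified handling mentioned above boils down to conditional HTTP requests: replay stored validators on the next crawl and treat a 304 response as "unchanged, skip re-extraction". A minimal header-building sketch, with a hypothetical cache-record shape:

```python
# Conditional-request sketch for incremental recrawls: turn a cached
# record's validators into If-None-Match / If-Modified-Since headers.
# The cache_entry field names are illustrative, not Anansi's schema.
def conditional_headers(cache_entry):
    """Build revalidation headers from a previously stored response record."""
    headers = {}
    if cache_entry.get("etag"):
        headers["If-None-Match"] = cache_entry["etag"]
    if cache_entry.get("last_modified"):
        headers["If-Modified-Since"] = cache_entry["last_modified"]
    return headers

print(conditional_headers({"etag": '"abc123"', "last_modified": None}))
# {'If-None-Match': '"abc123"'}
```

A server that supports validators answers 304 Not Modified with an empty body, so unchanged pages cost one small round trip instead of a full fetch and re-parse.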
