Anansi: Self-Healing Web Scraper with MCP Server

@ZachasADMIN
May 16, 2026

Quick summary

A Python crawler for unstable or JavaScript-heavy sites, with selector healing, structured-data extraction, adaptive rate limiting, and an MCP server for agent-driven crawling. Use only for authorized scraping.

Status & Access

  • Access: Free
  • Updated: May 16, 2026, 11:45 PM

Anansi is a Python web scraping toolkit designed for sites that change often or need browser rendering. It combines adaptive parsing, structured-data extraction, incremental crawling, proxy support, and an MCP server so an LLM or agent workflow can drive fetch, extract, crawl, pause, resume, export, and metrics actions.
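As an illustration of the agent-driven loop, the tool actions listed above map naturally onto a small dispatcher. The `CrawlSession` class and its method names below are hypothetical stand-ins, not Anansi's actual MCP interface:

```python
# Hypothetical sketch of agent-driven crawl control via tool calls.
# Action names mirror those in the text; the class itself is illustrative.
class CrawlSession:
    def __init__(self):
        self.state = "idle"
        self.queue = []

    def handle(self, action, **kwargs):
        """Dispatch one agent tool call to the matching crawl operation."""
        handlers = {
            "crawl": self._crawl,
            "pause": self._pause,
            "resume": self._resume,
            "metrics": self._metrics,
        }
        if action not in handlers:
            raise ValueError(f"unknown action: {action}")
        return handlers[action](**kwargs)

    def _crawl(self, url):
        self.state = "running"
        self.queue.append(url)
        return {"queued": url}

    def _pause(self):
        self.state = "paused"
        return {"state": self.state}

    def _resume(self):
        self.state = "running"
        return {"state": self.state}

    def _metrics(self):
        return {"state": self.state, "queued": len(self.queue)}

session = CrawlSession()
session.handle("crawl", url="https://example.com")
session.handle("pause")
print(session.handle("metrics"))  # {'state': 'paused', 'queued': 1}
```

The point of the shape is that every crawl operation is an idempotent, inspectable tool call, which is what lets an LLM orchestrate long-running crawls safely.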

Why it is useful

  • Self-healing selectors: stores selector confidence and attempts fallback strategies when a layout changes.
  • Structured extraction first: pulls JSON-LD, Open Graph, and Microdata before relying on brittle CSS selectors.
  • Browser upgrade path: can switch from HTTP fetching to Playwright rendering for JavaScript-heavy pages.
  • Crawler durability: includes an async crawler, SQLite-backed queue, incremental recrawls, ETag/Last-Modified handling, and resumable jobs.
  • Agent-ready interface: ships with an MCP server so compatible LLM tools can operate crawls through tool calls.
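The self-healing idea in the first bullet can be sketched in a few lines: keep candidate selectors with a stored confidence score, try them in order, and adjust confidence on hits and misses. This is an illustrative model only, not Anansi's implementation, and the substring-based `matches` is a crude stand-in for real CSS matching against a parsed DOM:

```python
# Illustrative selector healing: reward the first selector that still
# matches, penalize the ones that failed before it. Not Anansi's code.
def heal(candidates, page_matches):
    """Try candidates in confidence order; return the first matching CSS selector."""
    for cand in sorted(candidates, key=lambda c: -c["confidence"]):
        if page_matches(cand["css"]):
            cand["confidence"] = min(1.0, cand["confidence"] + 0.1)
            return cand["css"]
        cand["confidence"] = max(0.0, cand["confidence"] - 0.2)
    return None

# Simulated layout change: the old class is gone, a fallback matches.
html = '<article><h1 class="headline-v2">Story</h1></article>'
candidates = [
    {"css": "h1.headline", "confidence": 0.9},     # stale selector
    {"css": "h1.headline-v2", "confidence": 0.4},  # fallback
]
# Crude stand-in for a real CSS match against the parsed DOM.
matches = lambda css: f'class="{css.split(".")[-1]}"' in html
print(heal(candidates, matches))  # h1.headline-v2
```

Persisting the updated confidence scores is what makes the healing cumulative: after a few crawls, the fallback that survived the redesign becomes the primary selector.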

Best fit

Use Anansi when you need a resilient research or data-extraction crawler for websites you are allowed to access, especially where pages change structure or require JavaScript rendering. It is most relevant for developers building data pipelines, monitoring workflows, competitive research dashboards, or agentic browsing systems.

Quick evaluation checklist

  1. Confirm the target website permits your intended crawling use case.
  2. Start with structured data extraction before custom selectors.
  3. Enable browser rendering only where HTTP fetching is insufficient.
  4. Keep adaptive rate limiting active and respect Retry-After responses.
  5. Use the MCP server when you want an agent to orchestrate crawl tasks instead of manually scripting every step.
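Checklist item 4 can be made concrete with a small pacing sketch that obeys Retry-After on 429/503 responses and relaxes the delay again on success. The backoff numbers are illustrative, not Anansi's actual policy:

```python
# Sketch of adaptive rate limiting that honors Retry-After.
# The multipliers here are illustrative choices, not Anansi's policy.
class AdaptiveLimiter:
    def __init__(self, base_delay=1.0, max_delay=60.0):
        self.delay = base_delay
        self.base = base_delay
        self.max = max_delay

    def after_response(self, status, retry_after=None):
        """Update and return the inter-request delay from the last response."""
        if status in (429, 503):
            # Server asked us to slow down: obey Retry-After if present,
            # otherwise double the current delay (capped at max_delay).
            self.delay = min(self.max, float(retry_after) if retry_after else self.delay * 2)
        elif 200 <= status < 300:
            # Halve back toward the base delay on success.
            self.delay = max(self.base, self.delay * 0.5)
        return self.delay

limiter = AdaptiveLimiter()
limiter.after_response(200)          # stays at the 1.0s base
limiter.after_response(429, "30")    # jumps to 30s, per Retry-After
print(limiter.after_response(200))   # prints 15.0 (halving back down)
```

Sleeping `limiter.delay` seconds between requests keeps a polite pace without hard-coding a rate per site.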

Source notes

The GitHub repository describes Anansi as a self-healing web scraper with selector repair, browser rendering fallback, Chrome-like TLS fingerprinting, Pydantic validation, incremental crawling, and an MCP server. The project is written primarily in Python and is licensed under Apache-2.0.
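The ETag/Last-Modified handling mentioned above boils down to conditional HTTP requests: replay stored validators on the next crawl and treat a 304 response as "unchanged, skip re-extraction". A minimal header-building sketch, with a hypothetical cache-record shape:

```python
# Conditional-request sketch for incremental recrawls: turn a cached
# record's validators into If-None-Match / If-Modified-Since headers.
# The cache_entry field names are illustrative, not Anansi's schema.
def conditional_headers(cache_entry):
    """Build revalidation headers from a previously stored response record."""
    headers = {}
    if cache_entry.get("etag"):
        headers["If-None-Match"] = cache_entry["etag"]
    if cache_entry.get("last_modified"):
        headers["If-Modified-Since"] = cache_entry["last_modified"]
    return headers

print(conditional_headers({"etag": '"abc123"', "last_modified": None}))
# {'If-None-Match': '"abc123"'}
```

A server that supports validators answers 304 Not Modified with an empty body, so unchanged pages cost one small round trip instead of a full fetch and re-parse.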
