Scrape Changing Websites with Anansi Self-Healing Selectors and MCP

A Python crawler for unstable or JavaScript-heavy sites, with selector healing, structured-data extraction, adaptive rate limiting, and an MCP server for agent-driven crawling. Use only for authorized scraping.

Original: Anansi GitHub repository
May 16, 2026

Status & Access
Access: Free
Updated: Jun 1, 2026, 09:54 AM

LinkLoot AI review

Code checked, run has gaps

Score: 69/100
Code execution prepared for isolation

Reviewed loot: Scrape Changing Websites with Anansi Self-Healing Selectors and MCP

My take: Anansi: Self-Healing Web Scraper with MCP Server is backed by practical evidence: the install, the dependency checks, and the relevant sandbox steps all ran in isolation.

User decision: Verify first

The core value is well supported: installation, local function test, and MCP protocol check ran in the sandbox.
Judges how careful a user should be when trying it: permissions, network use, dependencies, and hard warnings.
Reasons to use it
  • Keeps promise: The core value is well supported: installation, local function test, and MCP protocol check ran in the sandbox.
  • Easy to try: Judges whether a normal user can repeat the first setup with reasonable effort.
  • Worth following: Judges whether the loot still looks worth using or following after this review.
  • Sources, external URL, and visible link/site signals were reviewed.
Reasons to be careful
  • The runner found 1 place(s) that can start programs, use install scripts, or run code dynamically. For this loot: try it in a test environment first, do not use real tokens/cookies, then...
  • 5 spots mention credentials, browser sessions, root/admin mode, proxies, or similar access-sensitive behavior. This fits the tool category, but it means testing should happen with throwaw...
  • A lockfile helps make dependency review reproducible. Without one, the installed dependency set can change later.
  • The review could not extract exact dependency versions for a reliable public vulnerability lookup.
Keeps promise: 90/100
Safe to try: 45/100
Easy to try: 83/100
Trust signals: 68/100
Worth following: 72/100
LLQI audit
Verdict: RISKY
Security & trust: 68/100
Functionality & value: 78/100
Quality & structure: 72/100
sources checked
code signals checked
snapshot optional
Dependency coverage open

Automated AI review. Decision aid, not a safety guarantee. · 2026-06-01 05:01:09 UTC

Anansi is a Python web scraping toolkit designed for sites that change often or need browser rendering. It combines adaptive parsing, structured-data extraction, incremental crawling, proxy support, and an MCP server so an LLM or agent workflow can drive fetch, extract, crawl, pause, resume, export, and metrics actions.

Why it is useful

  • Self-healing selectors: stores selector confidence and attempts fallback strategies when a layout changes.
  • Structured extraction first: pulls JSON-LD, Open Graph, and Microdata before relying on brittle CSS selectors.
  • Browser upgrade path: can switch from HTTP fetching to Playwright rendering for JavaScript-heavy pages.
  • Crawler durability: includes an async crawler, SQLite-backed queue, incremental recrawls, ETag/Last-Modified handling, and resumable jobs.
  • Agent-ready interface: ships with an MCP server so compatible LLM tools can operate crawls through tool calls.
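
The "structured extraction first" idea from the list above can be sketched with the Python standard library alone. This is a minimal illustration, not Anansi's actual API: `JsonLdParser`, `extract_jsonld`, `extract_title`, and `h1_text` are hypothetical names, and the regex fallback stands in for whatever selector strategy a real crawler would use.

```python
import json
import re
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects parsed <script type="application/ld+json"> blocks."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.items.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is skipped, not fatal


def extract_jsonld(html):
    """Return every JSON-LD object found in the page."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    return parser.items


def h1_text(html):
    """Hypothetical fallback: the text of the first <h1> element."""
    match = re.search(r"<h1[^>]*>(.*?)</h1>", html, re.S)
    return match.group(1).strip() if match else None


def extract_title(html, fallbacks):
    """Prefer the JSON-LD "name" field, then try each fallback in order.

    Returns (value, source_label) so callers can track which strategy worked.
    """
    for item in extract_jsonld(html):
        if isinstance(item, dict) and item.get("name"):
            return item["name"], "json-ld"
    for label, extractor in fallbacks:
        value = extractor(html)
        if value:
            return value, label
    return None, None
```

Calling `extract_title(page_html, [("h1", h1_text)])` returns the value together with the label of the strategy that produced it. Recording that label per field is one simple way to notice when a page has stopped exposing structured data and the crawler is leaning on brittle fallbacks.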

Best fit

Use Anansi when you need a resilient research or data-extraction crawler for websites you are allowed to access, especially where pages change structure or require JavaScript rendering. It is most relevant for developers building data pipelines, monitoring workflows, competitive research dashboards, or agentic browsing systems.

Quick evaluation checklist

  1. Confirm the target website permits your intended crawling use case.
  2. Start with structured data extraction before custom selectors.
  3. Enable browser rendering only where HTTP fetching is insufficient.
  4. Keep adaptive rate limiting active and respect Retry-After responses.
  5. Use the MCP server when you want an agent to orchestrate crawl tasks instead of manually scripting every step.
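
Step 4 of the checklist can be illustrated with a small delay calculator. This is a sketch under stated assumptions, not Anansi's rate limiter: `parse_retry_after` and `next_delay` are hypothetical helpers, and the doubling factor, the 0.9 recovery factor, and the `floor`/`ceiling` defaults are arbitrary illustrative choices.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_retry_after(value, now=None):
    """Return the Retry-After wait in seconds, or None if it cannot be parsed.

    The header may hold either a number of seconds or an HTTP-date.
    """
    if value is None:
        return None
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())


def next_delay(current, status, retry_after=None, floor=0.5, ceiling=60.0):
    """Adapt the per-host delay (in seconds) from the last response."""
    if status in (429, 503):
        server_wait = parse_retry_after(retry_after)
        backed_off = min(ceiling, current * 2)
        # Never wait less than the server asked for, even above the ceiling.
        return max(backed_off, server_wait or 0.0)
    if 200 <= status < 300:
        # Recover slowly after successes, but never go below the floor.
        return max(floor, current * 0.9)
    return current
```

For example, `next_delay(1.0, 429, "10")` yields a 10-second wait because the server's request outranks the doubled delay, while a run of successful responses gradually shrinks the delay back toward `floor`.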

Source notes

The GitHub repository describes Anansi as a self-healing web scraper with selector repair, browser rendering fallback, Chrome-like TLS fingerprinting, Pydantic validation, incremental crawling, and an MCP server. The project is written primarily in Python and is licensed under Apache-2.0.
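
Incremental crawling with ETag/Last-Modified handling generally rests on HTTP conditional requests. The sketch below shows that mechanism in isolation; `conditional_headers` and `update_cache` are hypothetical helpers, and the cache record shape (`etag`, `last_modified`) is an assumption rather than Anansi's storage format.

```python
def conditional_headers(cached):
    """Build request headers from validators saved on a previous fetch."""
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    return headers


def update_cache(cached, status, response_headers):
    """Return (cache_record, changed) after a response.

    A 304 Not Modified means the stored copy is still current, so the page
    does not need to be re-parsed and the old record is kept.
    """
    if status == 304:
        return cached, False
    # Header names are case-insensitive, so normalize before looking them up.
    lowered = {name.lower(): value for name, value in response_headers.items()}
    record = {
        "etag": lowered.get("etag"),
        "last_modified": lowered.get("last-modified"),
    }
    return record, True
```

A crawler would send `conditional_headers(record)` with each recrawl request and skip extraction whenever `update_cache` reports `changed` as `False`.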
