
Scrape Changing Websites with Anansi Self-Healing Selectors and MCP

A Python crawler for sites whose layouts change often or that rely heavily on JavaScript, with self-healing selectors, structured-data extraction, adaptive rate limiting, and an MCP server for agent-driven crawling. Use it only for scraping you are authorized to do.

Original: Anansi GitHub repository

@ZachasADMIN

May 16, 2026

#web scraping · #mcp · #python · #crawler · #ai agents · #data extraction · #automation

Status & Access


Access: Free

Updated: Jul 13, 2026, 09:02 PM

LinkLoot AI review

Tool has value, start small

AI take: 69/100

Quick look at value, setup, permissions, and everyday caveats.

My take: Anansi is worth trying, but only in a throwaway project with test data and tightly scoped permissions. From there, judge whether installation, startup, and its core scraping features fit your setup.

Direct value

Can save time as a small tool if it fits your workflow and you start with test data.

Check first

Do not start with real tokens, private repos, or production data.

What you get

The practical value only shows up in a small test of your own: install it, start it, and try it on a harmless example page you are allowed to crawl.

What to watch

Before relying on it, check install, startup, and permissions against your setup.

Automated AI review. Decision aid, not a safety guarantee. · 2026-06-08 16:58:49 UTC

Anansi is a Python web scraping toolkit designed for sites that change often or need browser rendering. It combines adaptive parsing, structured-data extraction, incremental crawling, proxy support, and an MCP server so an LLM or agent workflow can drive fetch, extract, crawl, pause, resume, export, and metrics actions.
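Structured-data extraction is the easiest of these pieces to picture. As a minimal sketch, assuming only JSON-LD embedded in `<script type="application/ld+json">` tags (Open Graph and Microdata are left out), a standard-library helper could collect and parse those blocks. `JsonLdCollector` and `extract_json_ld` are hypothetical names for illustration, not Anansi's API.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json"> element."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self._chunks: list = []
        self.blocks: list = []

    def handle_starttag(self, tag, attrs):
        # Attribute values can be None (e.g. a bare attribute), so default to "".
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self._chunks = []

    def handle_data(self, data):
        # Script bodies may arrive in several chunks, so buffer them.
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self.blocks.append("".join(self._chunks))
            self._inside = False


def extract_json_ld(html: str) -> list:
    """Return every JSON-LD item on the page, skipping blocks that are not valid JSON."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    items = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block: leave this data to a selector-based fallback
        # A single block may hold one object or a list of objects.
        items.extend(data if isinstance(data, list) else [data])
    return items


if __name__ == "__main__":
    page = (
        '<html><head><script type="application/ld+json">'
        '{"@type": "Product", "name": "Desk"}'
        "</script></head><body>...</body></html>"
    )
    print(extract_json_ld(page))  # [{'@type': 'Product', 'name': 'Desk'}]
```

Skipping invalid blocks instead of raising keeps one broken script tag from failing the whole page, which fits the idea of trying structured data first and falling back to selectors only when it is missing.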

Why it is useful

Self-healing selectors: stores selector confidence and attempts fallback strategies when a layout changes.
Structured extraction first: pulls JSON-LD, Open Graph, and Microdata before relying on brittle CSS selectors.
Browser upgrade path: can switch from HTTP fetching to Playwright rendering for JavaScript-heavy pages.
Crawler durability: includes an async crawler, SQLite-backed queue, incremental recrawls, ETag/Last-Modified handling, and resumable jobs.
Agent-ready interface: ships with an MCP server so compatible LLM tools can operate crawls through tool calls.
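To make the self-healing idea concrete, here is a minimal sketch that assumes nothing about Anansi's internals: each extraction strategy carries a success-rate confidence score, strategies are tried best-first, and a strategy that starts failing after a layout change sinks below a fallback that still works. `HealingExtractor`, `regex_strategy`, and the smoothing formula are hypothetical; regular expressions stand in for CSS selectors only to keep the example dependency-free.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# A strategy takes raw HTML and returns the extracted value, or None on a miss.
Strategy = Callable[[str], Optional[str]]


@dataclass
class StrategyStats:
    name: str
    strategy: Strategy
    hits: int = 0
    attempts: int = 0

    @property
    def confidence(self) -> float:
        # Smoothed success rate: an untried strategy starts at 0.5.
        return (self.hits + 1) / (self.attempts + 2)


class HealingExtractor:
    """Try strategies best-first and update their confidence after every page."""

    def __init__(self, strategies: dict) -> None:
        self.stats = [StrategyStats(name, fn) for name, fn in strategies.items()]

    def extract(self, html: str) -> Tuple[Optional[str], Optional[str]]:
        # sorted() is stable, so ties keep their registration order.
        for stat in sorted(self.stats, key=lambda s: s.confidence, reverse=True):
            stat.attempts += 1
            value = stat.strategy(html)
            if value is not None:
                stat.hits += 1
                return value, stat.name
        return None, None  # every strategy missed; the caller decides what to do


def regex_strategy(pattern: str) -> Strategy:
    """Build a strategy from a regex whose first group captures the value."""
    compiled = re.compile(pattern, re.S)

    def run(html: str) -> Optional[str]:
        match = compiled.search(html)
        return match.group(1).strip() if match else None

    return run


if __name__ == "__main__":
    extractor = HealingExtractor({
        "price_span": regex_strategy(r'<span class="price">(.*?)</span>'),
        "price_itemprop": regex_strategy(r'itemprop="price"[^>]*>(.*?)<'),
    })
    old_layout = '<span class="price">19.99</span>'
    new_layout = '<div itemprop="price">19.99</div>'
    print(extractor.extract(old_layout))  # ('19.99', 'price_span')
    print(extractor.extract(new_layout))  # ('19.99', 'price_itemprop')
```

The second call shows the "healing" step in miniature: the old selector misses on the new layout, the fallback succeeds, and from then on the fallback is tried first. A real tool would persist these counts between runs; this sketch keeps them in memory.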

Best fit

Use Anansi when you need a resilient research or data-extraction crawler for websites you are allowed to access, especially where pages change structure or require JavaScript rendering. It is most relevant for developers building data pipelines, monitoring workflows, competitive research dashboards, or agentic browsing systems.

Quick evaluation checklist

Confirm the target website permits your intended crawling use case.
Start with structured data extraction before custom selectors.
Enable browser rendering only where HTTP fetching is insufficient.
Keep adaptive rate limiting active and respect Retry-After headers.
Use the MCP server when you want an agent to orchestrate crawl tasks instead of manually scripting every step.
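The Retry-After item is worth getting right, because the header can carry either a number of seconds or an HTTP date. The sketch below assumes a simple per-host policy: double the delay on 429 or 503, defer to Retry-After whenever it is present, and decay back toward a base delay on success. `AdaptiveDelay`, `parse_retry_after`, and the constants are hypothetical and are not Anansi's rate limiter.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def parse_retry_after(value: str, now: datetime) -> Optional[float]:
    """Seconds to wait according to a Retry-After value, or None if it is unparseable.

    The header may hold delay-seconds ("120") or an HTTP-date.
    """
    value = value.strip()
    if value.isdigit():
        return float(value)
    try:
        when = parsedate_to_datetime(value)
    except (TypeError, ValueError):  # older Python versions raise TypeError here
        return None
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)  # assume UTC for naive dates
    return max(0.0, (when - now).total_seconds())


class AdaptiveDelay:
    """Per-host delay that backs off on 429/503 and recovers slowly on success."""

    def __init__(self, base: float = 1.0, ceiling: float = 300.0) -> None:
        self.base = base
        self.ceiling = ceiling
        self.delay = base

    def update(self, status: int, retry_after: Optional[str], now: datetime) -> float:
        if status in (429, 503):
            hinted = parse_retry_after(retry_after, now) if retry_after else None
            if hinted is not None:
                # Never wait less than the server asked for.
                self.delay = max(self.base, hinted)
            else:
                # No usable hint: double the delay, up to the ceiling.
                self.delay = min(self.ceiling, self.delay * 2)
        else:
            # Success: decay by 10% per response, but never below the base delay.
            self.delay = max(self.base, self.delay * 0.9)
        return self.delay
```

A crawler would call `update()` after each response from a host and sleep for the returned number of seconds before its next request to that host.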

Source notes

The GitHub repository describes Anansi as a self-healing web scraper with selector repair, browser rendering fallback, Chrome-like TLS fingerprinting, Pydantic validation, incremental crawling, and an MCP server. The project is written primarily in Python and is licensed under Apache-2.0.
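Incremental crawling of this kind usually rests on standard HTTP conditional requests: store the `ETag` and `Last-Modified` values from the last fetch, send them back as `If-None-Match` and `If-Modified-Since`, and skip re-extraction when the server answers 304 Not Modified. Here is a minimal SQLite-backed sketch, assuming one row per URL; `ValidatorStore` is a hypothetical helper, not part of Anansi, and it leaves the actual HTTP fetching to whatever client you use.

```python
import sqlite3
from typing import Dict


class ValidatorStore:
    """Keep each URL's ETag and Last-Modified so recrawls can send conditional requests."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS validators ("
            "url TEXT PRIMARY KEY, etag TEXT, last_modified TEXT)"
        )

    def conditional_headers(self, url: str) -> Dict[str, str]:
        """Headers to add to the next request for this URL."""
        row = self.db.execute(
            "SELECT etag, last_modified FROM validators WHERE url = ?", (url,)
        ).fetchone()
        headers: Dict[str, str] = {}
        if row is not None:
            etag, last_modified = row
            if etag:
                headers["If-None-Match"] = etag
            if last_modified:
                headers["If-Modified-Since"] = last_modified
        return headers

    def record(self, url: str, status: int, response_headers: Dict[str, str]) -> bool:
        """Save fresh validators; return True only for a 200 that needs re-extraction."""
        if status == 304:
            return False  # Not Modified: the stored copy and validators stay valid
        if status != 200:
            return False  # errors and redirects: leave the old validators untouched
        # HTTP header names are case-insensitive, so normalize before lookup.
        lowered = {k.lower(): v for k, v in response_headers.items()}
        self.db.execute(
            "INSERT OR REPLACE INTO validators (url, etag, last_modified) VALUES (?, ?, ?)",
            (url, lowered.get("etag"), lowered.get("last-modified")),
        )
        self.db.commit()
        return True
```

On the next crawl, merge `conditional_headers(url)` into the request headers; a 304 reply means the page can be skipped without parsing it again.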

Sources & links


Anansi GitHub repository (github.com), primary source

More from this topic

Use Gemini Managed Agents for long-running MCP work without babysitting timeouts

Use GitHub Issue Fields GA before agents turn triage into guesswork

Move Shopify agents to UCP Cart MCP before the old Storefront cart tools expire

Edit agent-made videos through a JSON timeline with FableCut

Make Codex SSH and mobile agent sessions less brittle after the July update

Audit OpenClaw Skills Before Install with Aegis Audit
