#ai agents
Loot, blog posts and adjacent themes connected to this topic.
Related reads
Cloudflare adds temporary accounts so AI agents can deploy Workers without signup
Cloudflare Temporary Accounts let AI agents deploy Workers with Wrangler, keep the preview live for 60 minutes, iterate during that window, …
LedgerAgent tests structured state for policy-bound tool-calling agents
A new arXiv preprint proposes LedgerAgent, an inference-time method that keeps customer-service agent state in a separate ledger before poli…
WorkClaw Launches AI Coworkers for Slack, Teams, and 3,000+ Apps
WorkClaw is positioning AI coworkers as shared team members inside Slack and Microsoft Teams, with cloud-hosted workspaces, customizable ski…
API to MCP Launches a Hosted Path From Business APIs to Agent Tools
API to MCP is pitching a hosted way to turn REST and GraphQL APIs into remote MCP servers for Codex, Cursor, Claude Code, and other agent cl…
SIA Tests Self-Improving AI Across Agent Harnesses and Model Weights
A new arXiv paper and official implementation show SIA updating both an agent scaffold and model weights, with reported gains on LawBench, G…
MosaicLeaks shows how research-agent search queries can leak private data
MosaicLeaks is a new benchmark for deep-research agents that shows how external web queries can expose private enterprise facts through the …
Hugging Face Shows How to Benchmark Whether Tools Are Agent-Friendly
Hugging Face published an agent-evaluation harness that tests whether coding agents can use a library efficiently, not only whether they rea…
CEO-Bench Tests Whether AI Agents Can Run a Startup for 500 Days
DeepSeek V4 Vision quietly arrives in chat, but the API gap still matters
DeepSeek appears to have rolled out image upload and visual understanding in its web chat, but official API docs still frame DeepSeek V4 as …
WorkBench Revisited Shows Why Workplace Agent Scores Need Source-Level Checks
WorkBench Revisited updates a workplace-agent benchmark with 2026 model runs, but the arXiv abstract and GitHub repository currently surface…
Taste Lab turns website design DNA into agent-ready briefs
Taste Lab analyzes a website's visual decisions, tokens, and trade-offs so AI coding agents can reuse a design direction without blindly cop…
GitHub Agent Finder brings ARD discovery into Copilot
GitHub Agent Finder lets Copilot discover allowed agents, skills, tools, and MCP servers through the open Agentic Resource Discovery specifi…
CoDA-Bench tests whether coding agents can find the right data before writing code
CoDA-Bench is a new ICML 2026 benchmark for code agents that must search noisy data folders, identify relevant files, write code, and answer…
GitHub expands Copilot Agent Tasks API to paid individual plans
GitHub now lets Copilot Pro, Pro+, and Max users start and track Copilot cloud agent tasks through the Agent Tasks REST API. The practical v…
Deep-XPIA tests prompt injection across multi-agent handoffs
Deep-XPIA is an open-source benchmark for cross-prompt injection in multi-agent systems, with live Claude Haiku measurements, a confused-dep…
Novu Connect turns one AI agent into a multi-channel teammate
Novu Connect is a new Agent Communication Infrastructure layer for connecting Claude Managed Agents and custom agents to Slack, Microsoft Te…
OpenAI Agent Builder and Evals shutdown: what to migrate before November 30
OpenAI has scheduled Agent Builder, the Evals platform, and reusable prompt objects for shutdown on November 30, 2026, with Evals becoming r…
GitHub Copilot code review adds org runners, content exclusions, and longer instructions
GitHub added governance controls for Copilot code review: organization-level runner defaults, content exclusion support, and no 4,000-charac…
Firecrawl Prometheus turns web data requests into maintained collectors
Firecrawl launched Prometheus, an experimental forward-deployed agent that turns plain-English web data requests into Firecrawl SDK collecto…
SuperHQ Puts Coding Agents Inside Local microVM Sandboxes
SuperHQ is an early open source app for running AI coding agents in isolated local microVMs, with diff review and an auth gateway that keeps…
VS Code 1.124 makes agent sessions easier to queue, navigate, and govern
Visual Studio Code 1.124 sharpens the Agents window with background sessions, keyboard navigation, restored layouts, smarter Autopilot, brow…
Hugging Face Serge puts AI code review inside GitHub pull requests
Hugging Face released Serge, an open-source GitHub-native AI code reviewer that follows repository-owned review rules and works with OpenAI-…
OpenEnv gets broader open-source backing for agentic RL environments
Hugging Face says OpenEnv is moving under broader open-source coordination, positioning it as a protocol layer for agentic reinforcement lea…
GitHub Copilot SDK is now generally available for agent-powered apps
GitHub has moved Copilot SDK to general availability, giving teams a stable way to embed Copilot's agent runtime into apps, internal tools, …
Agents' Last Exam tests AI agents on real professional workflows
Agents' Last Exam is a new Berkeley-led benchmark for computer-use AI agents, with long-horizon professional tasks, verifiable outcomes, pub…
GitHub Agentic Workflows Moves Into Public Preview
GitHub Agentic Workflows is now in public preview, letting teams define AI-driven repository automation in Markdown and run it through GitHu…
Albato's AppSumo deal adds AI agents to no-code automation
Albato is back on AppSumo with a lifetime automation offer, and the June update adds Albato Copilot plus autonomous AI Agents for building a…
GitHub Copilot CLI adds an experimental security review command
GitHub Copilot CLI now has an experimental /security-review command that checks local code changes for high-impact vulnerability patterns be…
Browse.sh turns browser-agent memory into reusable web skills
Browserbase's Browse.sh gives AI agents a catalog of reusable browser skills, with Product Hunt traction showing fresh demand for web automa…