AI agents are entering their rebuild era as enterprises confront the reliability problem
As enterprise AI agents move into production, organizations are confronting a growing reliability problem. Many teams are discovering that LLM performance alone does not determine whether agents succeed in production. Long-running AI workflows must survive crashes, preserve state, recover from failures, manage inference costs, and coordinate across APIs, tools, and enterprise systems. After a first […]
Researchers automated LLM reasoning strategy design and cut token usage by 69.5%
Test-time scaling (TTS) has emerged as a proven method to improve the performance of large language models in real-world applications by giving them extra compute cycles at inference time. However, TTS strategies have historically been handcrafted, relying heavily on human intuition to dictate the rules of the model’s reasoning. To address this bottleneck, researchers from […]
Merck and Mastercard are seeing real agentic AI results. Both say the plumbing came first.
Merck is using AI agents to cut drug discovery cycles by a third and ship compliant marketing materials up to 80% faster, but VP of Digital Platforms Sean Finnerty says the only reason it’s working is that they built the infrastructure first. And the pharmaceutical manufacturer is seeing promising early results: AI is generating […]
Are designers the new SWEs? Figma Make’s new two-way GitHub integration turns designs into live, production code — with built-in governance
Cloud design software company Figma is officially transforming its AI design assistant, Figma Make, from a prototyping sandbox into a live, visual software editor that connects natively to production codebases. Announced today, the update allows product managers, designers, and non-technical builders to import an existing Git repository directly into the Figma desktop app, visually edit […]
SQL query logs hold the context AI agents need to stop hallucinating joins
When Miro’s data team pointed AI agents directly at its Snowflake environment, the agents got the wrong answer more than 65% of the time. The problem wasn’t the model — it was context. With more than 10,000 tables and no semantic layer to guide routing, the agents had no way to know which data assets […]
Control within connection: How data sovereignty is rewriting the rules of critical infrastructure
Presented by Equinix
Digital systems are central to economic resilience. But the governance models supporting them were designed for a bygone era, when systems were smaller, often centralized, and rarely crossed multiple jurisdictions. This structural mismatch is driving the realization across boardrooms and governments that data sovereignty is not only core to critical infrastructure, but […]
MiniMax teases upcoming M3 model with new sparse attention mechanism and 15.6X long-context response speed boost
Among the many Chinese AI companies and laboratories vying for market share and attention (no pun intended) on the global marketplace, MiniMax stands out for its commitment to providing frontier-level intelligence across a range of modalities, including text, coding, and video (through its Hailuo model series) — often under permissive, enterprise-friendly, standard open source licenses. […]
DataGrail report finds your vendor may be sending data to AI models you never approved
The data processing agreement (DPA) — the bedrock contract companies use to evaluate how vendors handle personal data — can no longer be trusted at face value. That is the central, and arguably most alarming, conclusion of DataGrail’s Privacy and AI Trends Report 2026, released today. The San Francisco-based privacy platform analyzed 2,400 popular business […]
DeepSWE blows up the AI coding leaderboard, crowns GPT-5.5, and finds Claude Opus exploiting a benchmark loophole
For months, the leading AI coding benchmarks have told enterprise buyers a comforting but misleading story: the top models are all roughly the same. OpenAI’s GPT-5 family, Anthropic’s Claude Opus, and Google’s Gemini Pro have clustered within a narrow band on Scale AI’s SWE-Bench Pro leaderboard, making it nearly impossible for engineering leaders to determine […]
The attack dominating financial services doesn’t steal passwords. It resets MFA and steals the token.
The attacker who hit the most financial services organizations over the past 12 months never phished a password. They called an IT support line, convinced an employee to reset their MFA, and registered their own device on the network. CrowdStrike’s 2026 Financial Services Threat Landscape Report, released this month and covering activity from April 2025 […]
