Databricks’ OfficeQA uncovers disconnect: AI agents ace abstract tests but stall at 45% on enterprise docs
There is no shortage of AI benchmarks in the market today, with popular options like Humanity’s Last Exam (HLE), ARC-AGI-2 and GDPval, among numerous others. AI agents excel at solving abstract math problems and passing PhD-level exams that most benchmarks are based on, but Databricks has a question for the enterprise: Can they actually handle […]
Brand-context AI: The missing requirement for marketing AI
Presented by BlueOcean. AI has become a central part of how marketing teams work, but the results often fall short. Models can generate content at scale and summarize information in seconds, yet the outputs are not always aligned with the brand, the audience, or the company’s strategic goals. The problem is not capability. The problem […]
Tracking every decision, dollar and delay: The new process intelligence engine driving public-sector progress
Presented by Celonis. The State of Oklahoma discovered its blind spots the hard way. In April 2023, a legislative report revealed its agencies had spent $3 billion without proper oversight. Janet Morrow, Director of Oklahoma’s Risk, Assessment and Compliance Division, set out to track thousands of monthly transactions across dozens of disconnected systems. The Sooner […]
Z.ai debuts open source GLM-4.6V, a native tool-calling vision model for multimodal reasoning
Chinese AI startup Zhipu AI, aka Z.ai, has released its GLM-4.6V series, a new generation of open-source vision-language models (VLMs) optimized for multimodal reasoning, frontend automation, and high-efficiency deployment. The release includes two models in “large” and “small” sizes: GLM-4.6V (106B), a larger 106-billion-parameter model aimed at cloud-scale inference; and GLM-4.6V-Flash (9B), a smaller model […]
Anthropic’s Claude Code can now read your Slack messages and write code for you
Anthropic has launched a beta integration that connects its fast-growing Claude Code programming agent directly into Slack, allowing software engineers to delegate coding tasks without leaving the workplace messaging platform where much of their daily communication already happens. The release, which Anthropic describes as a “research preview,” is the company’s latest move to embed its […]
Why AI coding agents aren’t production-ready: Brittle context windows, broken refactors, missing operational awareness
Remember this Quora comment (which also became a meme)? In the pre-large language model (LLM) Stack Overflow era, the challenge was discerning which code snippets to adopt and adapt effectively. Now, while generating code has become trivially easy, the more profound challenge lies in reliably identifying and integrating high-quality, enterprise-grade code into production […]
Booking.com’s agent strategy: Disciplined, modular and already delivering 2× accuracy
When many enterprises weren’t even thinking about agentic behaviors or infrastructures, Booking.com had already “stumbled” into them with its homegrown conversational recommendation system. This early experimentation has allowed the company to take a step back and avoid getting swept up in the frantic AI agent hype. Instead, it is taking a disciplined, layered, modular approach […]
Design in the age of AI: How small businesses are building big brands faster
Presented by Design.com. For most of history, design was the last step in starting a business, something entrepreneurs invested in once the idea was proven. Today, it’s one of the first. The rise of generative AI has shifted how small businesses imagine, launch, and grow, turning what used to be a months-long creative […]
The ‘truth serum’ for AI: OpenAI’s new method for training models to confess their mistakes
OpenAI researchers have introduced a novel method that acts as a “truth serum” for large language models (LLMs), compelling them to self-report their own misbehavior, hallucinations and policy violations. This technique, “confessions,” addresses a growing concern in enterprise AI: Models can be dishonest, overstating their confidence or covering up the shortcuts they take to arrive […]
GAM takes aim at “context rot”: A dual-agent memory architecture that outperforms long-context LLMs
For all their superhuman power, today’s AI models suffer from a surprisingly human flaw: They forget. Give an AI assistant a sprawling conversation, a multi-step reasoning task or a project spanning days, and it will eventually lose the thread. Engineers refer to this phenomenon as “context rot,” and it has quietly become one of the […]
