OpenAI deploys Cerebras chips for ‘near-instant’ code generation in first major move beyond Nvidia

OpenAI on Thursday launched GPT-5.3-Codex-Spark, a stripped-down coding model engineered for near-instantaneous response times, marking the company’s first significant inference partnership outside its traditional Nvidia-dominated infrastructure. The model runs on hardware from Cerebras Systems, a Sunnyvale-based chipmaker whose wafer-scale processors specialize in low-latency AI workloads. The partnership arrives at a pivotal moment for OpenAI. The […]

MIT’s new fine-tuning method lets LLMs learn new skills without losing old ones

When enterprises fine-tune LLMs for new tasks, they risk erasing what the models already know — a problem known as catastrophic forgetting — which forces companies to maintain separate models for every skill. Researchers at MIT, the Improbable AI Lab, and ETH Zurich have developed a new technique that enables large language models to learn new skills and knowledge without forgetting their past […]

‘Observational memory’ cuts AI agent costs 10x and outscores RAG on long-context benchmarks

RAG isn’t always fast enough or intelligent enough for modern agentic AI workflows. As teams move from short-lived chatbots to long-running, tool-heavy agents embedded in production systems, those limitations become harder to work around. In response, teams are experimenting with alternative memory architectures — sometimes called contextual memory or agentic memory — that prioritize […]