The Year of the Agent
If there's one word that defined AI in 2025, it's "agentic." The industry's focus shifted decisively from models that respond to prompts toward systems that can take autonomous action. As Andrej Karpathy put it, this will be the decade of AI agents. The numbers back him up: according to G2's Enterprise AI Agents Report, 57% of companies now have AI agents running in production, not just in experiments or pilots but doing actual work.
The shift matters because agents represent a fundamentally different value proposition than chatbots. A chatbot answers questions; an agent researches a topic across multiple sources, synthesises findings, drafts a report, and distributes it to stakeholders, all from a single instruction. The human provides the goal; the AI figures out the steps. This is closer to how we delegate to competent colleagues than how we use traditional software.
Gartner reported a staggering 1,445% surge in enquiries about multi-agent systems between early 2024 and mid-2025. Rather than deploying one massive model to handle everything, leading organisations are implementing "puppeteer" architectures: orchestrators that coordinate teams of specialised agents, each optimised for a particular task. One agent handles research, another writes, another reviews, another takes action. The approach is more reliable and often more cost-effective than trying to build a single system that does everything.
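The orchestrator pattern is easy to sketch. In the toy example below the "agents" are plain functions standing in for LLM-backed workers; the function names and return strings are illustrative, not any particular framework's API.

```python
# Minimal sketch of a "puppeteer" orchestrator coordinating specialised agents.
# Each agent is a stub; in a real system each would call a model with its own
# prompt, tools, and context.

def research_agent(topic: str) -> str:
    return f"notes on {topic}"          # stand-in for a web/RAG research step

def writer_agent(notes: str) -> str:
    return f"draft based on [{notes}]"  # stand-in for a drafting model call

def reviewer_agent(draft: str) -> str:
    return f"approved: {draft}"         # stand-in for a review/critique pass

def orchestrate(topic: str) -> str:
    """The orchestrator owns the workflow; each agent owns one task."""
    notes = research_agent(topic)
    draft = writer_agent(notes)
    return reviewer_agent(draft)

print(orchestrate("quarterly sales"))
```

The point of the pattern is visible even at this scale: the orchestrator holds the sequencing logic, so any single agent can be swapped, retried, or upgraded without touching the others.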
This doesn't mean agents are ready for every application. Unpredictable behaviour remains a fundamental challenge. The same agent may produce different results for identical inputs due to the non-deterministic nature of the underlying models. For mission-critical applications where consistency is paramount, this variability creates real problems. Most organisations are deploying agents for tasks where some variability is acceptable, keeping humans in the loop for high-stakes decisions.
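The human-in-the-loop pattern mentioned above can be expressed as a simple dispatch gate; the action names and the stakes threshold here are invented for illustration.

```python
# Sketch of a human-in-the-loop gate: agent-proposed actions that are
# high-stakes get queued for human review rather than executed directly.
# The set of high-stakes actions is an illustrative assumption.

HIGH_STAKES = {"refund", "contract_change"}

def dispatch(action: str, payload: str) -> str:
    if action in HIGH_STAKES:
        return f"queued for human review: {action}({payload})"
    return f"executed: {action}({payload})"

print(dispatch("summarise", "ticket #123"))  # low-stakes: runs autonomously
print(dispatch("refund", "$450"))            # high-stakes: waits for a human
```

Variability in the model's output still exists, but the gate bounds its blast radius: the worst a misbehaving agent can do to a high-stakes action is put a bad proposal in a review queue.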
Reasoning Models Come of Age
OpenAI kicked off the "reasoning revolution" in late 2024 with o1, but 2025 was the year reasoning became a standard feature across the industry. The concept is deceptively simple: instead of generating an immediate response, the model first reasons through the problem step by step, then provides its answer. This "thinking time" dramatically improves performance on complex tasks: mathematical proofs, multi-step logic problems, nuanced analysis, and sophisticated coding challenges.
OpenAI doubled down with o3 and o3-mini early in 2025, followed by GPT-5 in August and GPT-5.1 in late autumn. The progression demonstrated how quickly the technology is advancing: GPT-5.1 introduced adaptive reasoning, dynamically adjusting thinking time based on task complexity. Simple questions get quick answers; hard problems get extended analysis. By December, GPT-5.2 emphasised long-context processing, "reasoning tokens," and agent workflows, clearly positioning reasoning as central to how AI systems will work going forward.
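The idea behind adaptive reasoning can be illustrated with a toy router. This is not OpenAI's implementation: the complexity heuristic and effort levels below are invented, and in practice the model itself decides how much to think rather than relying on a hand-written rule.

```python
# Toy illustration of adaptive reasoning: route a request to a cheap fast
# path or an expensive "extended thinking" path based on a crude complexity
# estimate.

def estimate_complexity(prompt: str) -> int:
    # Crude proxy: multi-part or long questions earn more thinking budget.
    return prompt.count("?") + len(prompt.split()) // 20

def route(prompt: str) -> str:
    # In a real API call this would set a reasoning-effort parameter.
    return "high" if estimate_complexity(prompt) >= 2 else "low"

print(route("What is 2 + 2?"))                        # low
print(route("Prove the theorem? Then generalise it?"))  # high
```

The economic logic is the same regardless of the mechanism: spending thinking tokens only where they change the answer is what makes reasoning affordable at scale.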
Anthropic's response came in May with Claude 4, branded as Claude Opus 4 and Claude Sonnet 4. Opus 4 was positioned as the flagship for sustained autonomous reasoning, capable of working on complex problems for up to seven hours, while Sonnet 4 offered a cost-effective option for general-purpose use. The focus on extended reasoning aligns with the agentic trend: agents need models that can think through multi-step problems, not just generate quick responses.
Google's Gemini 3 Pro made a strong entrance, becoming the first model to break the 1500 Elo barrier on LMArena with a score of 1501. Across benchmarks, the picture is one of tight competition: GPT-5.2, Claude 4.5 Sonnet, and Gemini 3 trade positions depending on the specific task, with each excelling in different areas. For practitioners, this means the choice of model matters less than it did a year ago. Any of the frontier models can handle most tasks competently. The question has shifted from "which model is best?" to "how do we integrate LLMs reliably with our data, manage costs, and ensure safety?"
The Model Context Protocol
Perhaps the most underappreciated development of 2025 was the rise of the Model Context Protocol (MCP). Running an MCP server has become almost as common as running a web server among teams building AI applications. The protocol provides a standardised way for AI models to access external tools, data sources, and capabilities, and its widespread adoption is solving one of the most practical problems in AI deployment.
Before MCP, every integration between an AI system and an external tool required custom development. Want your AI to access your CRM? Build a custom integration. Want it to query your database? Build another one. Want it to use a third-party API? Another integration. This approach was expensive, fragile, and didn't scale. MCP provides a common interface that any compliant tool can implement, dramatically reducing the integration burden.
The practical implication is that organisations can now assemble AI capabilities more like building with standardised components than crafting bespoke systems. A company might use one MCP server for database access, another for email, another for calendar, and another for their specific business applications. The AI system connects to all of them through the same protocol, and the team can swap out or upgrade individual components without rebuilding everything.
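The value of a shared interface is easiest to see in code. The sketch below is a hand-rolled analogy, not the actual MCP wire protocol (which is JSON-RPC based) or any official SDK; the class and method names are illustrative.

```python
# Illustration of the idea behind MCP: every tool server exposes the same
# interface, so the client code never changes when a tool is swapped.

from typing import Protocol

class ToolServer(Protocol):
    def list_tools(self) -> list[str]: ...
    def call(self, tool: str, **kwargs) -> str: ...

class DatabaseServer:
    def list_tools(self) -> list[str]:
        return ["query"]
    def call(self, tool: str, **kwargs) -> str:
        return f"rows for {kwargs['sql']}"   # stand-in for a real DB query

class CalendarServer:
    def list_tools(self) -> list[str]:
        return ["next_event"]
    def call(self, tool: str, **kwargs) -> str:
        return "standup at 09:30"            # stand-in for a calendar API

def discover_tools(servers: list[ToolServer]) -> list[str]:
    # The client iterates over servers uniformly; adding or replacing a
    # server requires no changes here.
    return [name for s in servers for name in s.list_tools()]

print(discover_tools([DatabaseServer(), CalendarServer()]))
print(DatabaseServer().call("query", sql="SELECT 1"))
```

Before a common protocol, each of those servers would have meant a bespoke integration; with one, the client treats them interchangeably.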
This matters because, as one industry observer noted, "Most organisations aren't agent-ready. The exciting work is exposing the APIs that you have in your enterprises today." MCP is the infrastructure that makes that possible. It's not glamorous technology, but it's the kind of standardisation that enables ecosystems to flourish.
Coding Agents Transform Development
The impact of AI on software development deserves special attention because it's where the agent revolution has advanced furthest. 2025 saw an explosion of coding agents: tools that don't just suggest code completions but autonomously implement features, fix bugs, and modify codebases based on high-level instructions.
The quiet release of Claude Code in February may have been the most impactful single event of the year for developers. Windsurf, Void Editor, GitHub Copilot's agent mode, CodeRabbit, Gemini CLI, and OpenAI's Codex all followed or evolved in the same direction. These aren't autocomplete tools; they're autonomous agents that understand your codebase, can reason about what changes are needed, and implement those changes across multiple files with appropriate tests.
The benchmarks reflect this progress. Claude 4.5 Sonnet achieved 77.2% on SWE-bench Verified, meaning it successfully resolved that proportion of real GitHub issues from actual repositories, with actual test suites that had to pass. It became the first model to crack the 60% barrier on Terminal-Bench 2.0, a particularly challenging benchmark for real-world coding tasks. These aren't toy problems; they're the kind of issues that professional developers spend their days solving.
For development teams, this changes the economics and workflows of software creation. The most effective pattern we're seeing isn't full autonomy but what might be called "pair programming with an agent." The human provides direction, reviews outputs, handles the genuinely novel problems, and makes architectural decisions. The agent handles implementation, boilerplate, testing, and iteration on feedback. Neither could work as effectively alone, but together they're dramatically more productive than traditional approaches.
Economics and Accessibility
One of the most important trends of 2025 was the dramatic reduction in AI costs. The cost of generating a response from a frontier model has dropped by a factor of 1,000 over the past two years, bringing it roughly in line with the cost of a basic web search. In June 2025, OpenAI slashed o3 input costs by 80%, bringing prices to $2 per million input tokens and $8 per million output tokens. These levels would have seemed fantastical even in early 2024.
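The quoted o3 prices make the new economics concrete. The per-request token counts below are illustrative assumptions, not measured figures.

```python
# Back-of-envelope cost check using the June 2025 o3 prices quoted above:
# $2 per million input tokens, $8 per million output tokens.

INPUT_PER_M = 2.00   # USD per million input tokens
OUTPUT_PER_M = 8.00  # USD per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1e6) * INPUT_PER_M + (output_tokens / 1e6) * OUTPUT_PER_M

# A hypothetical email-triage call: ~1,000 tokens in, ~300 tokens out.
cost = request_cost(1_000, 300)
print(f"${cost:.4f} per email")                 # $0.0044 per email
print(f"${cost * 10_000:.2f} per 10k emails")   # $44.00 per 10k emails
```

At well under half a cent per email, analysing every message in a corporate inbox moves from unthinkable to routine, which is exactly the shift the next paragraph describes.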
This cost reduction is transformative because it changes what's economically viable. Use cases that couldn't possibly justify the expense a year ago, such as processing every customer email with AI analysis or running continuous AI monitoring of systems, now make financial sense. The barrier to experimentation has essentially disappeared; any organisation can try AI approaches to their problems without significant financial risk.
The flip side is that cost is no longer a meaningful differentiator between approaches. The conversation has shifted from "can we afford to do this with AI?" to "how do we do this with AI well?" The constraints are now around integration, data quality, reliability, and organisational readiness, not the cost of the AI capabilities themselves.
Enterprise Adoption Reality
The statistics on enterprise AI adoption in 2025 are striking. IBM and Morning Consult surveyed 1,000 developers building AI applications for enterprises: 99% reported they are exploring or developing AI agents. Not experimenting casually, but actively building. Gartner predicts that 40% of enterprise applications will embed AI agents by the end of 2026, up from less than 5% in 2025. A McKinsey Global Survey found that organisations adopting agentic AI are seeing up to 20% gains in operational efficiency.
These numbers describe a fundamental shift in how organisations think about software and automation. AI isn't a separate initiative or an experimental project anymore. It's becoming a default consideration in how systems are designed and processes are structured. The question isn't "should we use AI?" but "where should we use AI first, and how do we do it well?"
That said, there's a gap between interest and capability. Most organisations aren't truly "agent-ready." They lack the instrumented APIs that agents need to take action, the data quality that AI systems require, and the organisational processes to manage AI reliability. The companies seeing the best results are those that invested in these foundations before trying to deploy advanced AI capabilities. The lesson for 2026 is clear: the exciting work isn't just building agents, it's building the infrastructure that makes agents useful.
What This Means for 2026
Looking ahead, several trends seem likely to shape the next year. Agent capabilities will continue advancing, but the focus will shift toward reliability and enterprise-readiness. Organisations that built foundations in 2025 (good APIs, clean data, integration infrastructure) will be positioned to move faster. Those that didn't will spend 2026 catching up.
Multi-agent architectures will become more sophisticated and more common. The pattern of specialised agents coordinated by orchestrators offers better reliability and easier maintenance than monolithic systems. Expect to see more standardisation around how agents communicate and collaborate.
The distinction between "AI companies" and "companies using AI" will continue to blur. As AI capabilities become table stakes, competitive advantage will come from how effectively organisations integrate AI into their specific workflows and value propositions, not from using AI at all.
For organisations planning their AI strategy, the message from 2025 is encouraging: the technology works, the costs are manageable, and the patterns for successful deployment are becoming clear. The challenge isn't whether AI can help; it's building the capability to capture that help effectively. That's a problem of execution, not possibility, and execution is something good organisations know how to do.