Cursor's new coding model Composer 2 is here: It beats Claude Opus 4.6 but still trails GPT-5.4
Cursor, a San Francisco AI coding platform from startup Anysphere valued at $29.3 billion, has launched Composer 2, a new in-house coding model now available inside its agentic AI coding environment, and it posts dramatically improved benchmark scores over its prior in-house model.
It is also launching Composer 2 Fast, a higher-priced but faster variant, and making it the default experience for users.
Here's the pricing breakdown:
- Composer 2 Standard: $0.50/$2.50 per 1 million input/output tokens
- Composer 2 Fast: $1.50/$7.50 per 1 million input/output tokens
That's a big drop from Cursor's previous in-house model, Composer 1.5, released in February, which cost $3.50 per million input tokens and $17.50 per million output tokens; Composer 2 Standard is about 86% cheaper on both counts.
Composer 2 Fast is also roughly 57% cheaper than Composer 1.5.
Cache reads, that is, sending some of the same prompt tokens to the model again, are billed at a lower rate: $0.20 per million tokens for Composer 2 and $0.35 per million for Composer 2 Fast, versus $0.35 per million for Composer 1.5.
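The percentage savings above follow directly from the per-token list prices. The Python sketch below reproduces them and estimates the cost of one agent session; it uses only the prices cited in this article, ignores cache-read discounts, and the workload of 2 million input and 400,000 output tokens is a hypothetical example, not a figure from Cursor.

```python
# Per-million-token list prices (USD) as cited in this article: (input, output)
PRICES = {
    "Composer 1.5": (3.50, 17.50),
    "Composer 2 Standard": (0.50, 2.50),
    "Composer 2 Fast": (1.50, 7.50),
}


def percent_cheaper(new: float, old: float) -> float:
    """How much cheaper `new` is than `old`, as a percentage."""
    return (1 - new / old) * 100


def session_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Session cost in USD at list prices (cache-read discounts ignored)."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


base_in, base_out = PRICES["Composer 1.5"]
for model in ("Composer 2 Standard", "Composer 2 Fast"):
    new_in, new_out = PRICES[model]
    print(f"{model}: input {percent_cheaper(new_in, base_in):.1f}% cheaper, "
          f"output {percent_cheaper(new_out, base_out):.1f}% cheaper")

# Hypothetical agent session: 2M input tokens, 400k output tokens
for model in PRICES:
    print(f"{model}: ${session_cost(model, 2_000_000, 400_000):.2f}")
```

With that hypothetical token mix, the session costs $14.00 on Composer 1.5, $2.00 on Composer 2 Standard and $6.00 on Composer 2 Fast; the ratios hold for any mix because both input and output prices fell by the same proportion.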
It also matters that this appears to be a Cursor-native release, not a broadly distributed standalone model. In the company's announcement and model documentation, Composer 2 is described as available in Cursor, tuned for Cursor's agent workflow and integrated with the product's tool stack.
The materials provided don't mention separate availability through external model platforms or as a general-purpose API outside the Cursor environment.
Cursor is pitching long-horizon coding, not just better completions
The deeper technical claim in this release is not merely that Composer 2 scores higher than Composer 1.5. It's that Cursor says the model is better suited to long-horizon agentic coding.
In its blog post, Cursor says the quality gains come from its first continued pretraining run, which gave it a stronger base for scaled reinforcement learning. From there, the company says it trained Composer 2 on long-horizon coding tasks and that the model can solve problems requiring hundreds of actions.
That framing is important because it addresses one of the biggest unresolved issues in coding AI. Many models are good at isolated code generation. Far fewer remain reliable across a longer workflow that includes reading a repository, deciding what to change, editing multiple files, running commands, interpreting failures and continuing toward a goal.
Cursor's documentation reinforces that this is the use case it cares about. It describes Composer 2 as an agentic model with a 200,000-token context window, tuned for tool use, file edits and terminal operations inside Cursor.
It also notes training techniques such as self-summarization for long-running tasks. For developers already using Cursor as their main environment, that tighter tuning may matter more than a generic leaderboard claim.
The benchmark gains are substantial, even if GPT-5.4 still leads on one key chart
Cursor's published results show a clear improvement over prior Composer models. The company lists Composer 2 at 61.3 on CursorBench, 61.7 on Terminal-Bench 2.0, and 73.7 on SWE-bench Multilingual.
That compares with Composer 1.5 at 44.2, 47.9 and 65.9, and Composer 1 at 38.0, 40.0 and 56.9.
The release is more measured than some model launches because Cursor is not claiming across-the-board leadership.
On Terminal-Bench 2.0, which measures how well an AI agent performs tasks in command-line, terminal-style interfaces, GPT-5.4 still leads at 75.1, while Composer 2 scores 61.7, ahead of Opus 4.6 at 58.0, Opus 4.5 at 52.1 and Composer 1.5 at 47.9.
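To put the published numbers side by side, the Python sketch below tabulates the scores quoted above, computes Composer 2's point gains over Composer 1.5, and ranks every model with a Terminal-Bench 2.0 score. All figures are the ones Cursor reported, as cited in this article; the differences are simple subtraction, not an independent evaluation.

```python
# Scores as published by Cursor and cited in this article
SCORES = {
    "Composer 2": {"CursorBench": 61.3, "Terminal-Bench 2.0": 61.7, "SWE-bench Multilingual": 73.7},
    "Composer 1.5": {"CursorBench": 44.2, "Terminal-Bench 2.0": 47.9, "SWE-bench Multilingual": 65.9},
    "Composer 1": {"CursorBench": 38.0, "Terminal-Bench 2.0": 40.0, "SWE-bench Multilingual": 56.9},
}

# Terminal-Bench 2.0 scores for non-Composer models, as cited in this article
OTHER_TERMINAL_BENCH = {"GPT-5.4": 75.1, "Opus 4.6": 58.0, "Opus 4.5": 52.1}


def point_gain(new: str, old: str, benchmark: str) -> float:
    """Difference in points between two Composer models on one benchmark."""
    return round(SCORES[new][benchmark] - SCORES[old][benchmark], 1)


for bench in SCORES["Composer 2"]:
    print(f"{bench}: +{point_gain('Composer 2', 'Composer 1.5', bench)} vs Composer 1.5")

# Merge every model that has a Terminal-Bench 2.0 score and sort highest first
terminal = {name: s["Terminal-Bench 2.0"] for name, s in SCORES.items()}
terminal.update(OTHER_TERMINAL_BENCH)
ranking = sorted(terminal.items(), key=lambda item: item[1], reverse=True)
for position, (name, score) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {score}")

# Composer 2's remaining gap to the Terminal-Bench 2.0 leader
gap = round(terminal["GPT-5.4"] - terminal["Composer 2"], 1)
print(f"Gap to GPT-5.4: {gap} points")
```

Run as-is, it reports gains of 17.1, 13.8 and 7.8 points over Composer 1.5 and places Composer 2 second on Terminal-Bench 2.0, 13.4 points behind GPT-5.4.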
That makes Cursor's pitch more pragmatic and arguably more useful for buyers. The company is not saying Composer 2 is the single best model at everything. It's saying the model has moved into a more competitive quality tier while offering more attractive economics and tighter integration with the product developers are already using.
Cursor also included a performance-versus-cost chart for its CursorBench benchmarking suite that appears designed to make a Pareto-style argument for Composer 2.
In that graphic, Composer 2 sits at a better cost-to-performance point than Composer 1.5 and compares favorably with the higher-cost GPT-5.4 and Opus 4.6 settings shown by Cursor. The company's message is not merely that Composer 2 scores higher than its predecessor, but that it may offer a more efficient cost-to-intelligence tradeoff for everyday coding work inside Cursor.
Why the "locked to Cursor" point matters for buyers
For readers deciding whether to use Composer 2, the most important question may not be benchmark performance alone. It may be whether they want a model optimized for Cursor's own product experience.
That can be a strength. According to the documentation, Composer 2 can access Cursor's agent tool stack, including semantic code search, file and folder search, file reads, file edits, shell commands, browser control and web access.
That kind of integration can be more valuable than raw model quality if the goal is to complete real software tasks rather than produce impressive one-shot answers.
But it also narrows the addressable audience. Teams looking for a model they can deploy broadly across multiple external tools and platforms should recognize that Cursor is presenting Composer 2 as a model for Cursor users, not as a generally available standalone foundation model.
The bigger picture: Cursor is making an operational argument
The significance of Composer 2 is not that Cursor has suddenly taken the top spot on every coding benchmark. It has not. The more important point is that Cursor is making an operational argument: its model is getting better, its pricing is low enough to encourage broader use, and its faster tier is responsive enough that the company is comfortable making it the default despite the higher cost.
That combination may resonate with engineering teams that increasingly care less about abstract model prestige and more about whether an assistant can stay useful across long coding sessions without becoming prohibitively expensive.
Cursor's broader pricing structure helps frame the competitive pressure around this launch. On its current pricing page, Cursor offers a free Hobby tier, a Pro plan at $20 per month, Pro+ at $60 per month, and Ultra at $200 per month for individual users, with higher tiers offering more usage across models from OpenAI, Anthropic and Google.
On the business side, Teams costs $40 per user per month, while Enterprise is custom-priced and adds pooled usage, centralized billing, usage analytics, privacy controls, SSO, audit logs and granular admin controls. In other words, Cursor is not just charging for access to a coding model. It's charging for a managed application layer that sits on top of multiple model providers while adding team features, governance and workflow tooling.
That business model is increasingly under pressure as first-party AI companies push deeper into coding themselves. OpenAI and Anthropic are no longer just selling models through third-party products; they are also shipping their own coding interfaces, agents and evaluation frameworks, such as Codex and Claude Code, raising the question of how much room remains for an intermediary platform.
Commenters on X, while unverified and not necessarily representative of the broader market, have increasingly described moving from Cursor to Anthropic's Claude Code, especially power users drawn to terminal-first workflows, longer-running agent behavior and lower perceived overhead.
Some of these posts describe frustration with Cursor's pricing, context loss or editor-centric experience, while praising Claude Code as a more direct and fully agentic way to work. Even treated cautiously, that kind of social chatter points to the strategic problem Cursor faces: it has to prove that its integrated platform, team controls and now its own in-house models add enough value to justify sitting between developers and the model makers' increasingly capable coding products.
That makes Composer 2 strategically important for Cursor.
By offering an in-house model that is cheaper than Composer 1.5, tuning it tightly to Cursor's own tool stack and making a faster version the default, the company is trying to show that it provides more than a wrapper around external systems.
The challenge is that as first-party coding products improve, developers and enterprise buyers may increasingly ask whether they need a separate AI coding platform at all, or whether the model makers' own tools are becoming sufficient on their own.