Z.ai debuts faster, cheaper GLM-5-Turbo model for agents and 'claws', but it's not open source
Chinese AI startup Z.ai, known for its powerful, open source GLM family of large language models (LLMs), has announced GLM-5-Turbo, a new, proprietary variant of its open source GLM-5 model aimed at agent-driven workflows. The company positions it as a faster model tuned for OpenClaw-style tasks such as tool use, long-chain execution and persistent automation.
It's available now through Z.ai's application programming interface (API) on third-party provider OpenRouter, with roughly a 202.8K-token context window, 131.1K max output tokens, and listed pricing of $0.96 per million input tokens and $3.20 per million output tokens. That makes it about $0.04 cheaper in combined input and output cost per million tokens than its predecessor, according to our calculations.
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Total Cost |
|---|---|---|---|
| Grok 4.1 Fast | $0.20 | $0.50 | $0.70 |
| Gemini 3 Flash | $0.50 | $3.00 | $3.50 |
| Kimi-K2.5 | $0.60 | $3.00 | $3.60 |
| GLM-5-Turbo | $0.96 | $3.20 | $4.16 |
| GLM-5 | $1.00 | $3.20 | $4.20 |
| Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 |
| Qwen3-Max | $1.20 | $6.00 | $7.20 |
| Gemini 3 Pro | $2.00 | $12.00 | $14.00 |
| GPT-5.2 | $1.75 | $14.00 | $15.75 |
| GPT-5.4 | $2.50 | $15.00 | $17.50 |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $18.00 |
| Claude Opus 4.6 | $5.00 | $25.00 | $30.00 |
| GPT-5.4 Pro | $30.00 | $180.00 | $210.00 |
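To make the per-million-token prices above concrete, here is a minimal cost sketch. The prices mirror the table; the model list is trimmed to a few rows and the token counts are made-up workload figures, not vendor data.

```python
# Per-million-token API prices from the comparison table above.
PRICES = {  # model: (input $/1M tokens, output $/1M tokens)
    "GLM-5-Turbo": (0.96, 3.20),
    "GLM-5": (1.00, 3.20),
    "Gemini 3 Flash": (0.50, 3.00),
}

def run_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one run at the listed API prices."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A hypothetical agent run consuming 200K input tokens and emitting 20K output tokens:
print(round(run_cost("GLM-5-Turbo", 200_000, 20_000), 4))  # 0.256
print(round(run_cost("GLM-5", 200_000, 20_000), 4))        # 0.264
```

At this workload the Turbo variant's lower input price yields the roughly $0.01-per-run savings over base GLM-5 implied by the table.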
Second, Z.ai is also adding the model to its GLM Coding subscription product, its packaged coding assistant service. That service has three tiers: Lite at $27 per quarter, Pro at $81 per quarter, and Max at $216 per quarter.
Z.ai's March 15 rollout note says Pro subscribers get GLM-5-Turbo in March, while Lite subscribers get the base GLM-5 in March and must wait until April for GLM-5-Turbo. The company is also taking early-access applications for enterprises via a Google Form, which suggests some customers may get access ahead of that schedule depending on capacity.
Z.ai describes GLM-5-Turbo as designed for “fast inference” and “deeply optimized for real-world agent workflows involving long execution chains,” with improvements in complex instruction decomposition, tool use, scheduled and persistent execution, and stability across extended tasks.
The release gives developers a new option for building OpenClaw-style autonomous AI agents, and it serves as a signal about where model vendors think enterprise demand is heading: away from chat interfaces and toward systems that can reliably execute multi-step work.
That's now where much of the competition is moving as well, especially among vendors trying to win over developers and enterprise teams building internal assistants, workflow orchestrators and coding agents.
Built for execution, not just conversation
Z.ai’s materials frame GLM-5-Turbo as a model for production-like agent behavior rather than static prompt-response use.
The pitch centers on reliability in practical task flows: better command following, stronger tool invocation, improved handling of scheduled and persistent tasks, and faster execution across longer logical chains. That positioning puts the model squarely in the market for agents that do more than answer questions.
It's aimed at systems that can gather information, call tools, break down instructions and keep working through complex task sequences with less supervision.
Rather than a straightforward successor to GLM-5, GLM-5-Turbo appears to be a more execution-focused variant: tuned for speed, tool use and long-chain agent stability, while the base GLM-5 remains Z.ai's broader open-source flagship.
GLM-5-Turbo looks especially competitive in OpenClaw scenarios such as information search and gathering, office and daily tasks, data analysis, development and operations, and automation. These are company-supplied materials, not independent validation, but they make the intended product positioning clear.
Background: Z.ai and GLM-5 set the stage for Turbo
Founded in 2019 as a Tsinghua University spinoff in Beijing, Z.ai, formerly Zhipu AI, is now one of China's best-known foundation model companies. The company remains headquartered in Beijing and is led by CEO Zhang Peng.
Z.ai listed on the Hong Kong Stock Exchange on January 8, 2026, with shares priced at HK$116.20 and opening at HK$120, for a stated market capitalization of HK$52.83 billion, making it China's largest independent large language model developer.
As of September 30, 2025, its models had reportedly been used by more than 12,000 enterprise customers, more than 80 million end-user devices and more than 45 million developers worldwide.
Z.ai’s last major release, GLM-5, which debuted in February 2026, provides useful context for what the company is now trying to do with GLM-5-Turbo.
GLM-5 is an open-source flagship model carrying an MIT license. It posted a record-low hallucination score on the AA-Omniscience Index and debuted a native “Agent Mode” that could turn prompts or source materials into ready-to-use .docx, .pdf and .xlsx files.
That earlier launch was also framed as a major technical step up for the company. GLM-5 scaled to 744 billion parameters with 40 billion active per token in a mixture-of-experts architecture, used 28.5 trillion pretraining tokens, and relied on a new asynchronous reinforcement-learning infrastructure called “slime” to reduce training bottlenecks and support more complex agentic behavior.
In that light, GLM-5-Turbo looks less like a replacement for GLM-5 than a narrower commercial offshoot: a variant that keeps the long-context, agentic orientation of the flagship line but emphasizes speed, stability and execution in real-world agent chains.
Developer features and model packaging
On the technical side, Z.ai has been packaging the GLM-5 family with the sorts of capabilities developers now expect from serious agent-facing models, including long context handling, tools, reasoning support and structured integrations.
OpenRouter's GLM-5-Turbo page lists support for tools, tool choice and response formatting, while also surfacing live performance data including average throughput and latency.
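For context, a tool-enabled request to OpenRouter follows the OpenAI-compatible chat-completions format, so the tools and tool-choice support listed on the model page maps onto a request body like the sketch below. The model slug `z-ai/glm-5-turbo` and the `get_weather` function are assumptions for illustration, not confirmed identifiers.

```python
import json

# OpenAI-compatible chat-completions body with a function tool attached.
payload = {
    "model": "z-ai/glm-5-turbo",  # assumed slug, not confirmed
    "messages": [{"role": "user", "content": "What's the weather in Beijing?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool for the example
            "description": "Look up current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    "tool_choice": "auto",  # let the model decide whether to call the tool
}

# This body would be POSTed to OpenRouter's /api/v1/chat/completions endpoint
# with an Authorization header; here we just serialize and inspect it.
body = json.dumps(payload)
print(json.loads(body)["tool_choice"])
```

When the model elects to use the tool, the response carries a `tool_calls` entry whose arguments the caller executes before sending the result back as a `tool` role message — the invocation loop whose error rate OpenRouter tracks.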
OpenRouter's provider telemetry offers a useful deployment-level comparison between GLM-5 and GLM-5-Turbo, though the data is not entirely apples-to-apples because GLM-5 appears across multiple providers while GLM-5-Turbo is served only by Z.ai.
On throughput, GLM-5-Turbo averages 48 tokens per second on OpenRouter, which puts it below the fastest GLM-5 endpoints shown in the screenshots, including Fireworks at 70 tok/s and Friendli at 58 tok/s, but above Together's 40 tok/s.
On raw first-token latency, GLM-5-Turbo is slower in the available data, posting 2.92 seconds versus 0.41 seconds for Friendli's GLM-5 endpoint, 1.00 second for Parasail and 1.08 seconds for DeepInfra.
But the picture improves on end-to-end completion time: GLM-5-Turbo is shown at 8.16 seconds, faster than the GLM-5 endpoints, which range from 9.34 seconds on Fireworks to 11.23 seconds on DeepInfra.
The most notable operational advantage is in tool reliability. GLM-5-Turbo shows a 0.67% tool call error rate, materially lower than the GLM-5 providers shown, where error rates range from 2.33% to 6.41%.
For enterprise teams, that suggests a model that may not win on initial responsiveness in its current OpenRouter routing but could still be better suited to longer agent runs, where completion stability and a lower tool failure rate matter more than the fastest first token.
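The trade-off between first-token latency and throughput can be checked with a back-of-envelope model: end-to-end time is roughly time-to-first-token plus output tokens divided by throughput. The implied token count below is solved from the published GLM-5-Turbo numbers, so treat it as an estimate of the probe completion's length, not a reported figure.

```python
def total_time(ttft_s: float, tokens: int, tok_per_s: float) -> float:
    """Estimate end-to-end completion time: first-token latency + streaming time."""
    return ttft_s + tokens / tok_per_s

# GLM-5-Turbo on OpenRouter: 2.92 s to first token, 48 tok/s, 8.16 s end-to-end.
# Solving for tokens implies roughly (8.16 - 2.92) * 48 output tokens.
tokens = round((8.16 - 2.92) * 48)
print(tokens)                               # 252
print(round(total_time(2.92, tokens, 48), 2))  # 8.17
```

Under this model, a provider with a faster first token but lower throughput loses its edge as completions get longer — consistent with Turbo trailing on responsiveness yet finishing whole runs sooner.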
Benchmarking and pricing
A ZClawBench radar chart released by Z.ai shows GLM-5-Turbo as especially competitive in OpenClaw scenarios such as information search and gathering, office and daily tasks, data analysis, development and operations, and automation.
These are company-supplied benchmark visuals, not independent validation, but they do help explain how Z.ai wants the two models understood: GLM-5 as the broader coding and open flagship, and Turbo as the more targeted agent-execution variant.
A more nuanced licensing signal
One notable caveat is licensing. Z.ai says GLM-5-Turbo is currently closed-source, but it also says the model's capabilities and findings will be folded into its next open-source model release. That is an important distinction: the company is not clearly promising to open-source GLM-5-Turbo itself.
Instead, it is saying that lessons, techniques and improvements from this release will inform a future open model. That makes the launch more nuanced than a clean break from openness.
Z.ai's earlier GLM strategy leaned heavily on open releases and open-weight distribution, which helped it build visibility among developers.
China's AI market may be rebalancing away from open source
GLM-5-Turbo's licensing posture also lands in a wider Chinese market context that makes the launch more notable than a simple product update.
In recent weeks, reporting around Alibaba's Qwen unit has raised fresh questions about how China's leading AI labs will balance open releases with commercial pressure.
Earlier this month, Qwen division head Lin Junyang stepped down, becoming the third senior Qwen executive to depart in 2026, though Alibaba's Qwen family remains one of the most prolific open-model efforts anywhere, with more than 400 open-source models released since 2023 and more than 1 billion downloads.
Reuters then reported on March 16 that Alibaba CEO Eddie Wu would take direct control of a newly formed AI-focused business group consolidating Qwen and other models, amid scrutiny over strategy, profitability and the brutal price competition surrounding open-model offerings in China.
Even without overstating these developments, they help frame the broader question hanging over the sector: whether the economics of frontier AI are starting to push even historically open-leaning Chinese labs toward a more segmented strategy.
That doesn't mean Chinese labs are abandoning open source. But the pattern is becoming harder to ignore: open models help drive adoption, developer goodwill and ecosystem reach, while certain high-value variants aimed at enterprise agents, coding workflows and other commercially attractive use cases may increasingly arrive first as proprietary products.
Seen in that light, GLM-5-Turbo looks like more than a speed-focused product update. It may be another sign that parts of China's AI sector are moving toward the hybrid playbook already used by OpenAI, Anthropic and Google in the U.S.: openness as distribution, proprietary systems as business.
That would not mark the end of open-source AI from Chinese labs, but it could mean their most strategically important agent-focused offerings appear first behind closed access, even if some of their underlying advances later make their way into open releases.
For developers evaluating agent platforms, that makes GLM-5-Turbo both a product launch and a useful signal. Z.ai is still speaking the language of open models. But with this release, it is also showing that some of its most commercially relevant work may arrive first as proprietary infrastructure for enterprise-grade agent systems.