OpenCV founders launch AI video startup to tackle OpenAI and Google

Last Updated: November 20, 2025


A new artificial intelligence startup founded by the creators of the world's most widely used computer vision library has emerged from stealth with technology that generates realistic human-centric videos up to five minutes long, a dramatic leap beyond the capabilities of rivals including OpenAI's Sora and Google's Veo.

CraftStory, which launched Tuesday with $2 million in funding, is introducing Model 2.0, a video generation system that addresses one of the most significant limitations plaguing the nascent AI video industry: duration. While OpenAI's Sora 2 tops out at 25 seconds and most competing models generate clips of 10 seconds or less, CraftStory's system can produce continuous, coherent video performances that run as long as a typical YouTube tutorial or product demonstration.

The breakthrough could unlock substantial commercial value for enterprises struggling to scale video production for training, marketing, and customer education, markets where brief AI-generated clips have proven inadequate despite their visual polish.

"If you really try to create a video with one of these video generation systems, you find that a lot of the times you want to implement a certain creative vision, and regardless of how detailed the instructions are, the systems basically ignore part of your instructions," said Victor Erukhimov, CraftStory's founder and CEO, in an exclusive interview with VentureBeat. "We developed a system that can generate videos basically as long as you need them."

How parallel processing solves the long-form video problem

CraftStory's advance rests on what the company describes as a parallelized diffusion architecture, a fundamentally different approach to how AI models generate video compared with the sequential methods employed by most competitors.

Traditional video generation models work by running diffusion algorithms on increasingly large three-dimensional volumes where time represents the third axis. To generate a longer video, these models require proportionally larger networks, more training data, and significantly more computational resources.
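
To make the scaling argument concrete, here is a rough Python sketch of how a single space-time latent volume grows with clip length. Every constant in it (24 fps, 720p frames, 8x spatial and 4x temporal compression, 2x2 patches) is an assumption chosen for illustration, as is the premise that every token attends to every other token; none of these numbers describes Sora, Veo, or CraftStory's model.

```python
# Illustrative constants only (assumptions): they do not describe any real product.
FPS = 24                  # assumed output frame rate
HEIGHT, WIDTH = 720, 1280 # assumed output resolution
SPATIAL_DOWN = 8          # assumed autoencoder spatial compression
TEMPORAL_DOWN = 4         # assumed autoencoder temporal compression
PATCH = 2                 # assumed 2x2 spatial patch per token


def latent_tokens(seconds: float) -> int:
    """Tokens in one space-time latent volume covering `seconds` of video."""
    frames = int(seconds * FPS)
    latent_frames = frames // TEMPORAL_DOWN
    h = HEIGHT // SPATIAL_DOWN // PATCH
    w = WIDTH // SPATIAL_DOWN // PATCH
    return latent_frames * h * w


def attention_pairs(seconds: float) -> int:
    """Query-key pairs if every token attends to every other token (assumed)."""
    n = latent_tokens(seconds)
    return n * n


for secs in (10, 25, 300):
    print(f"{secs:>4}s: {latent_tokens(secs):>12,} tokens, "
          f"{attention_pairs(secs):.2e} attention pairs")
```

Under these assumptions a five-minute volume holds 30 times as many tokens as a 10-second one, while full attention over it involves 900 times as many query-key pairs. That gap is one way to see why a single monolithic volume becomes expensive as duration grows.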

CraftStory instead runs multiple smaller diffusion algorithms simultaneously across the entire duration of the video, with bidirectional constraints connecting them. "The latter part of the video can influence the former part of the video too," Erukhimov explained. "And that is pretty important, because if you do it one by one, then an artifact that appears in the first part propagates to the second, and then it accumulates."

Rather than generating eight seconds and then stitching on additional segments, CraftStory's system processes all five minutes simultaneously through interconnected diffusion processes.
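
CraftStory has not published how its parallel diffusion processes are coupled, so the following is only a toy sketch of the general idea. It swaps in overlap averaging in the style of published methods such as MultiDiffusion rather than the company's own technique. Each frame is reduced to a single float, `toy_denoise_window` is a hypothetical placeholder for a learned per-window denoiser, and the window size, stride, and step count are arbitrary. At every step each window is denoised independently from the same current state, and frames shared by two windows take the average of their estimates.

```python
from typing import List


def window_starts(length: int, size: int, stride: int) -> List[int]:
    """Start indices of overlapping windows that together cover every frame."""
    if stride > size:
        raise ValueError("stride must not exceed window size, or frames are skipped")
    starts = list(range(0, max(length - size, 0) + 1, stride))
    if starts[-1] + size < length:  # make sure the tail of the clip is covered
        starts.append(length - size)
    return starts


def toy_denoise_window(window: List[float], alpha: float = 0.5) -> List[float]:
    """Hypothetical stand-in for a learned denoiser: pull frames toward the window mean."""
    mean = sum(window) / len(window)
    return [(1 - alpha) * x + alpha * mean for x in window]


def parallel_window_denoise(latents: List[float], size: int = 8,
                            stride: int = 4, steps: int = 10) -> List[float]:
    x = list(latents)
    starts = window_starts(len(x), size, stride)
    for _ in range(steps):
        total = [0.0] * len(x)
        count = [0] * len(x)
        # Every window reads the same current state, so this loop could run
        # in parallel with one worker per window.
        for s in starts:
            for i, v in enumerate(toy_denoise_window(x[s:s + size])):
                total[s + i] += v
                count[s + i] += 1
        # Averaging over overlaps couples neighbouring windows, letting
        # information flow both forward and backward in time.
        x = [t / c for t, c in zip(total, count)]
    return x


impulse = [0.0] * 32
impulse[-1] = 1.0  # a feature that exists only in the last frame
print(parallel_window_denoise(impulse, steps=1)[0])           # 0.0
print(parallel_window_denoise(impulse, steps=10)[0] > 0.0)    # True
```

After one step the first frame is still exactly zero; after ten steps it is small but positive, because the impulse in the last frame travels backward one stride per step through the overlaps. In a strictly sequential scheme the first segment would already be finalized before the last one was generated, so no such backward influence could occur.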

Crucially, CraftStory trained its model on proprietary footage rather than relying solely on internet-scraped videos. The company hired studios to shoot actors using high-frame-rate camera systems that capture crisp detail even in fast-moving elements like fingers, avoiding the motion blur inherent in standard 30-frames-per-second YouTube clips.

"What we showed is that you don't need a lot of data and you don't need a lot of training budget to create high-quality videos," Erukhimov said. "You just need high-quality data."

Model 2.0 currently operates as a video-to-video system: users upload a still image to animate and a "driving video" containing a person whose movements the AI will replicate. CraftStory provides preset driving videos shot with professional actors, who receive revenue shares when their motion data is used, or users can upload their own footage.

The system generates 30-second clips at low resolution in roughly 15 minutes. An advanced lip-sync system synchronizes mouth movements to scripts or audio tracks, while gesture alignment algorithms ensure body language matches speech rhythm and emotional tone.
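
The throughput figure is easier to compare with other systems as a ratio. The short Python snippet below simply divides wall-clock time by output duration using the article's approximate numbers; the helper name is hypothetical, and the result says nothing about how generation time scales for longer or higher-resolution clips.

```python
def compute_seconds_per_video_second(clip_seconds: float,
                                     wall_clock_minutes: float) -> float:
    """Wall-clock seconds spent per second of generated video."""
    return wall_clock_minutes * 60 / clip_seconds


# Approximate figures from the article: a 30-second clip in about 15 minutes.
print(compute_seconds_per_video_second(30, 15))  # 30.0
```

In other words, at the quoted low resolution the system runs at roughly 30 times slower than real time.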

Fighting a war chest battle with $2 million against billions

CraftStory's funding comes almost entirely from Andrew Filev, who sold his project management software company Wrike to Citrix for $2.25 billion in 2021 and now runs Zencoder, an AI coding company. The modest raise stands in stark contrast to the billions flowing into competing efforts: OpenAI has raised over $6 billion in its latest funding round alone.

Erukhimov pushed back on the notion that massive capital is a prerequisite for success. "I don't necessarily buy the thesis that compute is the path to success," he said. "It definitely helps if you have compute. But if you raise a billion dollars on a PowerPoint, in the end, no one is happy, neither the founders nor the investors."

Filev defended the David-versus-Goliath approach. "When you invest in startups, you're essentially betting on people," he said in an interview with VentureBeat. "To paraphrase Margaret Mead: never underestimate what a small group of thoughtful, committed engineers and scientists can build."

He argued that CraftStory benefits from a focused strategy. "The big labs are in an arms race to build general-purpose video foundation models," Filev said. "CraftStory is riding that wave and going very deep into a specific format: long-form, engaging, human-centric video."

Why computer vision expertise matters in generative AI video

Erukhimov's credibility stems from his deep roots in computer vision rather than the transformer architectures that have dominated recent AI advances. He was an early contributor to OpenCV, the Open Source Computer Vision Library, which has become the de facto standard for computer vision applications, with over 84,000 stars on GitHub.

When Intel reduced its support for OpenCV in the mid-2000s, Erukhimov co-founded Itseez with the explicit goal of maintaining and advancing the library. The company expanded OpenCV significantly and pivoted toward automotive safety systems before Intel acquired it in 2016.

Filev said this background is precisely what makes Erukhimov well-positioned for video generation. "What people often miss is that generative AI video isn't just about the generative part. It's about understanding motion, facial dynamics, temporal coherence, and how humans actually move," Filev said. "Victor has spent his career mastering exactly these problems."

Enterprise focus targets training videos and product demos

While much of the public excitement around AI video generation has centered on creative tools for consumers, CraftStory is pursuing a decidedly enterprise-focused strategy.

"We're definitely interested in B2B more than consumer," Erukhimov said. "We're interested in companies, especially software companies, being able to make cool training videos and product videos and launch videos."

The logic is simple: corporate training, product tutorials, and customer education videos often run several minutes and require consistent quality throughout. A 10-second AI clip cannot effectively demonstrate how to use enterprise software or explain a complex product feature.

"If you need a longer-form video, then you should go with us," Erukhimov said. "We can create up to five minutes, consistent video, high quality."

Filev echoed this assessment. "One huge gap in this market is the lack of models that can generate consistent videos over longer sequences, and that's extremely important for real-world use," he said. "If you're making a commercial for your company, a 10-second video, no matter how good it looks, just isn't enough. You need 30 seconds, you need two minutes; you need more."

The company anticipates cost savings for customers. Filev suggested that "a small business owner could create content in minutes that previously would have cost $20,000 and taken two months to produce."

CraftStory is also courting creative agencies that produce video content for corporate clients, with the value proposition centered on cost and speed: agencies can record an actor on camera and transform that footage into a finished AI video, rather than managing expensive multi-day shoots.

The next major development on CraftStory's roadmap is a text-to-video model that would allow users to generate long-form content directly from scripts. The team is also developing support for moving-camera scenarios, including the popular "walk-and-talk" format common in high-end advertising.

Where CraftStory fits in a fragmented competitive landscape

CraftStory enters a crowded and quickly evolving market. OpenAI's Sora 2, while not yet publicly available, has generated significant buzz. Google's Veo models are advancing rapidly. Runway, Pika, and Stability AI all offer video generation tools with different capabilities.

Erukhimov acknowledged the competitive pressure but emphasized that CraftStory serves a distinct niche focused on human-centric videos. He positioned rapid innovation and market capture as the company's primary strategy rather than relying on technical moats.

Filev sees the market fragmenting into distinct layers, with big tech companies serving as "API providers of powerful, general-purpose generation models" while specialized players like CraftStory focus on specific use cases. "If the big players are building the engines, CraftStory is building the production studio and assembly line on top," he said.

Model 2.0 is available now at app.craftstory.com/model-2.0, with the company offering early access to users and enterprises interested in testing the technology. Whether a lightly funded startup can capture meaningful market share against deep-pocketed incumbents remains uncertain, but Erukhimov is characteristically confident about the opportunity ahead.

"AI-generated video will soon become the primary way companies tell their stories," he said.

