Yerevan, Armenia — October 2025.
Robi Labs, an independent AI research company building next-generation multimodal models, today announced MoVi — its groundbreaking Image-to-Video and Text-to-Video generation model. MoVi represents Robi Labs’ latest leap in AI creativity, designed to seamlessly generate synchronized motion and sound directly from natural language or static images.
A Model That Brings Imagination to Motion
MoVi (short for Motion Vision) is a multimodal generative model capable of producing lifelike video clips complete with synchronized audio, speech, and ambient sound — all from a single prompt.
Built on Robi Labs’ proprietary multimodal fusion architecture, MoVi jointly understands visual, textual, and auditory cues, allowing it to generate coherent, emotionally aligned scenes.
“We wanted to build a model that doesn’t just move pixels — it moves feelings,” said Alen Hovhannisyan, Founder and CEO of Robi Labs. “MoVi is our vision of where creativity meets cognition — where text, image, and sound merge to tell stories naturally.”
How It Works
MoVi uses a dual-tower architecture — one for video and one for audio — connected through cross-modal attention layers that ensure perfect temporal alignment. This design allows MoVi to generate both speech-synchronized lip motion and environmental sound without external synchronization tools.
Each generation produces a 5-second cinematic video at up to 24 FPS, supporting multiple aspect ratios such as 16:9, 1:1, and 9:16 — making it ideal for creative, educational, and social formats.
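Robi Labs has not published MoVi's architecture in detail, but the dual-tower design described above can be illustrated with a toy sketch. The following NumPy snippet shows the general idea of a cross-modal attention layer in which video-frame tokens query audio tokens, so each frame's representation is conditioned on temporally relevant sound features. All shapes, names, and the single-head, randomly initialized attention here are illustrative assumptions, not MoVi's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(video_tokens, audio_tokens, d_k=32, seed=0):
    """Single-head cross-attention: video queries attend over audio keys/values.

    video_tokens: (T_v, D) per-frame features from the video tower
    audio_tokens: (T_a, D) per-step features from the audio tower
    Returns video features fused with audio context, shape (T_v, D).
    """
    rng = np.random.default_rng(seed)
    D = video_tokens.shape[1]
    # Random projections stand in for learned parameters.
    W_q = rng.standard_normal((D, d_k)) / np.sqrt(D)
    W_k = rng.standard_normal((D, d_k)) / np.sqrt(D)
    W_v = rng.standard_normal((D, D)) / np.sqrt(D)

    Q = video_tokens @ W_q            # (T_v, d_k) queries from video
    K = audio_tokens @ W_k            # (T_a, d_k) keys from audio
    V = audio_tokens @ W_v            # (T_a, D)  values from audio

    scores = Q @ K.T / np.sqrt(d_k)   # (T_v, T_a) video-to-audio affinities
    attn = softmax(scores, axis=-1)   # each frame distributes attention over audio
    fused = video_tokens + attn @ V   # residual fusion keeps the video signal
    return fused

# Toy example: 120 video frames (5 s at 24 FPS) attending to 250 audio steps.
video = np.random.default_rng(1).standard_normal((120, 64))
audio = np.random.default_rng(2).standard_normal((250, 64))
out = cross_modal_attention(video, audio)
print(out.shape)  # (120, 64)
```

Because attention weights are computed per frame against the full audio sequence, alignment emerges from the learned weights themselves rather than from an external synchronization step, which is the property the release attributes to MoVi's design.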
Key Capabilities
🖼️ Image-to-Video: Animate any static image into dynamic, expressive motion.
✍️ Text-to-Video: Generate short, story-driven video scenes from a natural language description.
🔊 Audio-Video Synchronization: Native speech and ambient sound generation in perfect sync with visuals.
🧠 Multimodal Understanding: Joint reasoning across text, image, and audio representations.
🎞️ Resolution Flexibility: Supports HD generation with cinematic motion smoothness.
Coming Soon
MoVi is currently in closed pre-release testing within Robi Labs’ internal research platform. A public demo and API access will be made available later this year via movi.robiai.com.
Developers, artists, and researchers can sign up for early access through the Robi Labs website or follow updates via @robilabs on social media.
About Robi Labs
Robi Labs is an independent AI research company based in Armenia, focused on building human-centric AI systems with local and global impact. Its growing ecosystem includes Lexa (multimodal language model), Picasoe (image generation), Framex (visual synthesis engine), and now MoVi — a step toward full multimodal intelligence that sees, hears, and understands.
For more information, visit robiai.com.