Yerevan, Armenia — October 2025.
Robi Labs, an independent AI research company building next-generation multimodal models, today announced MoVi — its groundbreaking Image-to-Video and Text-to-Video generation model. MoVi represents Robi Labs’ latest leap in AI creativity, designed to seamlessly generate synchronized motion and sound directly from natural language or static images.
A Model That Brings Imagination to Motion
MoVi (short for Motion Vision) is a multimodal generative model capable of producing lifelike video clips complete with synchronized audio, speech, and ambient sound — all from a single prompt.
Built on Robi Labs’ proprietary multimodal fusion architecture, MoVi jointly understands visual, textual, and auditory cues, allowing it to generate coherent, emotionally aligned scenes.
“We wanted to build a model that doesn’t just move pixels — it moves feelings,” said Alen Hovhannisyan, Founder and CEO of Robi Labs. “MoVi is our vision of where creativity meets cognition — where text, image, and sound merge to tell stories naturally.”
How It Works
MoVi uses a dual-tower architecture — one for video and one for audio — connected through cross-modal attention layers that ensure perfect temporal alignment. This design allows MoVi to generate both speech-synchronized lip motion and environmental sound without external synchronization tools.
Each generation produces a 5-second cinematic video at up to 24 FPS, supporting multiple aspect ratios such as 16:9, 1:1, and 9:16 — making it ideal for creative, educational, and social formats.
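Robi Labs has not published MoVi's architecture in detail, but the dual-tower design described above can be illustrated with a toy sketch. The following NumPy snippet shows the general idea of a cross-modal attention layer in which video-frame tokens query audio tokens, so each frame's representation is conditioned on temporally relevant sound features. All shapes, names, and the single-head, randomly initialized attention here are illustrative assumptions, not MoVi's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(video_tokens, audio_tokens, d_k=32, seed=0):
    """Single-head cross-attention: video queries attend over audio keys/values.

    video_tokens: (T_v, D) per-frame features from the video tower
    audio_tokens: (T_a, D) per-step features from the audio tower
    Returns video features fused with audio context, shape (T_v, D).
    """
    rng = np.random.default_rng(seed)
    D = video_tokens.shape[1]
    # Random projections stand in for learned parameters.
    W_q = rng.standard_normal((D, d_k)) / np.sqrt(D)
    W_k = rng.standard_normal((D, d_k)) / np.sqrt(D)
    W_v = rng.standard_normal((D, D)) / np.sqrt(D)

    Q = video_tokens @ W_q            # (T_v, d_k) queries from video
    K = audio_tokens @ W_k            # (T_a, d_k) keys from audio
    V = audio_tokens @ W_v            # (T_a, D)  values from audio

    scores = Q @ K.T / np.sqrt(d_k)   # (T_v, T_a) video-to-audio affinities
    attn = softmax(scores, axis=-1)   # each frame distributes attention over audio
    fused = video_tokens + attn @ V   # residual fusion keeps the video signal
    return fused

# Toy example: 120 video frames (5 s at 24 FPS) attending to 250 audio steps.
video = np.random.default_rng(1).standard_normal((120, 64))
audio = np.random.default_rng(2).standard_normal((250, 64))
out = cross_modal_attention(video, audio)
print(out.shape)  # (120, 64)
```

Because attention weights are computed per frame against the full audio sequence, alignment emerges from the learned weights themselves rather than from an external synchronization step, which is the property the release attributes to MoVi's design.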
Key Capabilities
🖼️ Image-to-Video: Animate any static image into dynamic, expressive motion.
✍️ Text-to-Video: Generate short, story-driven video scenes from a natural language description.
🔊 Audio-Video Synchronization: Native speech and ambient sound generation in perfect sync with visuals.
🧠 Multimodal Understanding: Joint reasoning across text, image, and audio representations.
🎞️ Resolution Flexibility: Supports HD generation with cinematic motion smoothness.
Coming Soon
MoVi is currently in closed pre-release testing within Robi Labs’ internal research platform. A public demo and API access will be made available later this year via movi.robiai.com.
Developers, artists, and researchers can sign up for early access through the Robi Labs website or follow updates via @robilabs on social media.
About Robi Labs
Robi Labs is an independent AI research company based in Armenia, focused on building human-centric AI systems with local and global impact. Its growing ecosystem includes Lexa (multimodal language model), Picasoe (image generation), Framex (visual synthesis engine), and now MoVi — a step toward full multimodal intelligence that sees, hears, and understands.
For more information, visit robiai.com.