Shortly after Microsoft's Copilot AI gained the ability to generate audio clips from text prompts, Google has introduced VideoPoet, a large language model (LLM) that pushes the boundaries of video generation, producing 10-second clips with fewer artifacts. The model supports an array of video generation tasks, including text-to-video, image-to-video, video stylization, inpainting, and video-to-audio conversion.

It generates 10-second video clips from text prompts and can also animate still images

VideoPoet sets itself apart by excelling at generating coherent, large-motion videos. The model produces ten-second clips, outpacing competitors such as Runway's Gen-2. Notably, VideoPoet does not rely on task-specific data for video generation, distinguishing it from models that require detailed input for optimal results.

This versatility comes from its multimodal large-model design, positioning VideoPoet to potentially become a mainstream approach to video generation.

Google's VideoPoet departs from the prevailing trend in video generation models, which predominantly rely on diffusion-based approaches. Instead, VideoPoet harnesses the power of large language models. It integrates various video generation tasks within a single LLM, eliminating the need for separately trained components for each function.
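To illustrate how one autoregressive model can serve many video tasks, here is a minimal sketch in Python. Everything in it (the stand-in tokenizers, the special tokens, and the `build_sequence` helper) is hypothetical and does not reflect VideoPoet's actual, unreleased implementation; the point is only that each task reduces to a different layout of discrete tokens fed to the same decoder-only model.

```python
# Hypothetical sketch: one autoregressive LLM handles many video tasks
# by encoding every task as a single stream of discrete tokens.
# None of these names mirror VideoPoet's real (unreleased) API.

from typing import List

# Assumed special tokens marking the task and modality boundaries.
BOS, T2V, I2V, INPAINT, SEP = "<bos>", "<t2v>", "<i2v>", "<inpaint>", "<sep>"

def text_tokens(prompt: str) -> List[str]:
    """Stand-in text tokenizer: one token per word."""
    return prompt.lower().split()

def visual_tokens(frames: int) -> List[str]:
    """Stand-in visual tokenizer: pretend each frame maps to one
    discrete token (real systems use hundreds of tokens per frame)."""
    return [f"<vid_{i}>" for i in range(frames)]

def build_sequence(task: str, prompt: str, context_frames: int = 0) -> List[str]:
    """Lay out the conditioning tokens for a given task. A single model
    would then autoregressively continue this sequence with new video
    tokens, regardless of which task produced the prefix."""
    seq = [BOS, task] + text_tokens(prompt) + [SEP]
    if context_frames:  # image/video conditioning, e.g. animating a still
        seq += visual_tokens(context_frames) + [SEP]
    return seq

# Text-to-video: condition on text only.
print(build_sequence(T2V, "A dog surfing a wave"))
# Image-to-video: condition on text plus one tokenized input frame.
print(build_sequence(I2V, "Make the dog start paddling", context_frames=1))
```

Because every task shares this single token interface, one set of model weights can in principle be trained on all of them at once, which is the property the announcement attributes to VideoPoet.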

The resulting videos vary in length, action, and style depending on the input text. VideoPoet can also convert input images into animations based on a provided prompt, demonstrating its adaptability across different types of input.

The release of VideoPoet adds a new dimension to AI-driven video generation, hinting at the possibilities that lie ahead in 2024.
