Alibaba’s research team has unveiled AtomoVideo, a high-fidelity framework for image-to-video generation. The team has released a paper and image-to-video examples of AtomoVideo alongside comparison samples from Runway’s Gen-2 and Pika 1.0.

Simpler but less artifact-ridden video output

Keeping in mind that AtomoVideo is a first-generation product, the provided samples look promising, though they are still far from realistic. Surprisingly, a comparison with Runway’s second-generation model (Gen-1 was released in February 2023) shows that the just-unveiled model does a better job of avoiding jarring transitions between frames.

For example, in a comparison sample of an astronaut in space, the reflective visor simply vanishes from Gen-2’s output as the astronaut moves around. AtomoVideo keeps the motion comparatively simple, but it avoids that artifact. In another sample, Gen-2 depicts people vanishing while skiing on the snow, while Pika 1.0 produces movement on the slope that is hard to reconcile with physics. AtomoVideo again keeps things relatively simple yet manages to avoid such mistakes. Nonetheless, these comparison samples are most likely curated rather than randomly generated.

Key features of Alibaba’s AtomoVideo

AtomoVideo’s strengths include maintaining high fidelity to the input image, ensuring smooth motion transitions, and supporting the prediction of subsequent video frames. The framework is also compatible with various existing T2I (text-to-image) models and offers high semantic controllability, letting users customize video content to their specific preferences.

AtomoVideo achieves its remarkable performance by leveraging pre-trained T2I models as a foundation and enhancing them with one-dimensional spatiotemporal convolution and attention modules. These additional layers enable the framework to capture intricate details and styles while ensuring temporal consistency throughout the generated videos. By incorporating advanced image semantics through Cross-Attention mechanisms, AtomoVideo further enhances its ability to produce videos with precise semantic control.
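The general pattern the paragraph describes, inserting temporal layers into a frozen, pre-trained spatial (T2I) backbone, can be sketched as a small PyTorch module. This is a minimal illustration of the idea, not AtomoVideo’s actual code: the class name, shapes, and layer choices are assumptions, and the real framework also wires image semantics in through Cross-Attention, which is omitted here.

```python
import torch
import torch.nn as nn

class TemporalBlock(nn.Module):
    """Hypothetical adapter inserted after a frozen spatial T2I layer:
    a 1D convolution and self-attention that act only along the time axis,
    so the pre-trained spatial weights stay untouched."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # 1D convolution mixes information across neighboring frames.
        self.temporal_conv = nn.Conv1d(channels, channels, kernel_size=3, padding=1)
        # Self-attention over frames helps distant frames stay consistent.
        self.temporal_attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, channels, height, width)
        b, t, c, h, w = x.shape
        # Fold spatial positions into the batch so both ops run per-pixel over time.
        y = x.permute(0, 3, 4, 2, 1).reshape(b * h * w, c, t)
        y = self.temporal_conv(y)          # convolve along the time axis
        y = self.norm(y.permute(0, 2, 1))  # -> (b*h*w, frames, channels)
        attn_out, _ = self.temporal_attn(y, y, y)
        y = (y + attn_out).permute(0, 2, 1).reshape(b, h, w, c, t)
        y = y.permute(0, 4, 3, 1, 2)       # back to (batch, frames, channels, h, w)
        # Residual connection: the frozen spatial features pass through unchanged.
        return x + y

# Usage: a 5-frame, 8-channel feature map keeps its shape through the block.
block = TemporalBlock(channels=8)
video_features = torch.randn(2, 5, 8, 4, 4)
out = block(video_features)
```

The residual connection is the key design choice: because the temporal path is additive, the pre-trained T2I backbone can remain frozen while only the new layers learn motion, which is consistent with the article’s claim that the framework stays compatible with various existing T2I models.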

Despite the impressive capabilities demonstrated by AtomoVideo, the research team has yet to provide an online platform for users to experience the technology firsthand. Nonetheless, Alibaba’s AtomoVideo framework represents a significant addition to the field of image-to-video synthesis.
