Google – Imagen Video
Imagen Video was created by the Google Research Brain Team. It generates high resolution videos with Cascaded Diffusion Models. The first step is to take an input text prompt and encode it into textual embeddings with a T5 text encoder. A base Video Diffusion Model then generates a 16 frame video at 24×48 resolution and 3 frames per second; this is then followed by multiple Temporal Super-Resolution (TSR) and Spatial Super-Resolution (SSR) models to upsample and generate a final 128 frame video at 1280×768 resolution and 24 frames per second – resulting in 5.3s of high definition video!
Overview
Imagen Video was created by the Google Research Brain Team. It generates high resolution videos with Cascaded Diffusion Models. The first step is to take an input text prompt and encode it into textual embeddings with a T5 text encoder. A base Video Diffusion Model then generates a 16 frame video at 24×48 resolution and 3 frames per second; this is then followed by multiple Temporal Super-Resolution (TSR) and Spatial Super-Resolution (SSR) models to upsample and generate a final 128 frame video at 1280×768 resolution and 24 frames per second – resulting in 5.3s of high definition video!
Ask AI about Google – Imagen Video
Live Q&AGet instant answers from leading AI assistants for use cases, pricing, and alternatives.