On April 9, 2025, Google Cloud NEXT 2025 opened, beginning three days of announcements for the global tech community. At the conference, Google announced a series of upgrades aimed at enterprise AI adoption, covering underlying infrastructure, generative media models, multimodal models such as Gemini, and smarter data analysis tools. According to Google, Vertex AI is now the only generative AI platform with models covering all four media formats: video, image, audio, and music. In practice, an enterprise can start from a text prompt, move from static images to full videos, and add music and audio to produce creative assets that are ready for immediate use.

This update introduces the music generation model “Lyria” and adds new features to the existing Veo 2 (video), Chirp 3 (audio), and Imagen 3 (image) models, upgrading the content creation experience across the board.

Lyria: Text-to-Music Model, Creating Exclusive Brand Soundtracks

Lyria is Google’s latest text-to-music generative model, now available in preview on Vertex AI (access requires allowlist approval). It generates high-quality, richly detailed music across a range of styles, helping enterprises to:

  • Build a sonic brand experience: Create exclusive music for marketing campaigns, product launches, and physical spaces to strengthen brand identity and emotional connection.
  • Accelerate video production workflows: When producing videos, podcasts, and other digital content, teams no longer need to spend time searching for royalty-free music. Lyria can quickly generate an original soundtrack that fits the scene and pacing.

Example Application: A prompt describing high-tension bebop jazz, with an emphasis on improvisation and fast-paced musical dialogue, can produce a track that captures the ambiance of a late-night jazz club.

Veo 2: From Video Generation to Complete Post-Production, An All-in-One Platform is Born

Veo 2 is Google’s leading video generation model. This Vertex AI update adds more powerful editing features, so users can not only generate videos but also handle post-production and visual effects adjustments:

  • Inpainting: Remove background clutter, logos, and other distracting elements from videos naturally and seamlessly. For example, it can cleanly remove the wires supporting an actor in a shot.
  • Outpainting: Extend horizontal videos into vertical short videos, quickly adapting to different social media platforms.
  • Cinematic Director Function: Control shot composition, camera movement, pacing, and more to achieve cinematic footage without professional skills.
  • Interpolation: Create natural transitions between two video segments, enhancing overall smoothness and professionalism.
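
To make the outpainting use case concrete, here is a minimal Python sketch of the arithmetic behind turning a horizontal clip into a vertical one. It is not the Veo 2 API. The function name `outpaint_canvas`, the 1920x1080 source, and the 9:16 target are illustrative assumptions. The sketch keeps the original frame at full size, centers it, and reports how much of the new canvas a model would have to generate.

```python
def outpaint_canvas(src_w, src_h, ratio_w, ratio_h):
    """Smallest canvas with aspect ratio ratio_w:ratio_h that contains the source frame.

    Hypothetical helper for illustration only; not part of any Google API.
    """
    def ceil_div(a, b):
        # Integer ceiling division, avoids floating-point rounding.
        return -(-a // b)

    if src_w * ratio_h >= src_h * ratio_w:
        # Source is wider than the target ratio: keep the width, add rows.
        canvas_w, canvas_h = src_w, ceil_div(src_w * ratio_h, ratio_w)
    else:
        # Source is taller than the target ratio: keep the height, add columns.
        canvas_w, canvas_h = ceil_div(src_h * ratio_w, ratio_h), src_h

    pad_x, pad_y = canvas_w - src_w, canvas_h - src_h
    return {
        "canvas": (canvas_w, canvas_h),
        # Center the original frame; any odd pixel goes to the right/bottom.
        "pad_left": pad_x // 2,
        "pad_right": pad_x - pad_x // 2,
        "pad_top": pad_y // 2,
        "pad_bottom": pad_y - pad_y // 2,
        # Share of the final canvas that is newly generated content.
        "generated_fraction": 1 - (src_w * src_h) / (canvas_w * canvas_h),
    }


# Assumed example: a 1920x1080 landscape clip extended to 9:16 vertical.
plan = outpaint_canvas(1920, 1080, 9, 16)
print(plan["canvas"], plan["pad_top"], plan["pad_bottom"])
print(f"{plan['generated_fraction']:.0%} of each frame is newly generated")
```

Under these assumptions, roughly two thirds of every vertical frame is new content, which is why a generative model is used here rather than simple cropping or scaling.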

Chirp 3: Just 10 Seconds of Audio, Create Exclusive Voice Characters

Chirp 3 is Google’s audio generation and understanding model, and this update adds two new features:

  • Instant Custom Voice: Upload a 10-second audio sample to generate a custom voice, suitable for contact centers, independent content creators, brand voice design, and similar scenarios. The feature includes built-in safety verification to help ensure lawful use.
  • Diarization: Distinguish utterances from different speakers in a recording, making meeting transcripts and podcast analysis considerably more practical and easier to read.
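
As a rough illustration of why diarization makes transcripts easier to read, the Python sketch below merges speaker-tagged words into speaker turns. The `Word` record, its fields, and the sample data are hypothetical. They stand in for whatever a diarization service returns and do not reflect Chirp 3's actual response format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    # Hypothetical record shape: one recognized word with a speaker label.
    speaker: str
    start: float  # seconds
    end: float    # seconds
    text: str


def merge_turns(words):
    """Group consecutive words from the same speaker into a single turn."""
    turns = []
    for w in words:
        if turns and turns[-1]["speaker"] == w.speaker:
            # Same speaker keeps talking: extend the current turn.
            turns[-1]["end"] = w.end
            turns[-1]["text"] += " " + w.text
        else:
            # Speaker changed (or first word): start a new turn.
            turns.append(
                {"speaker": w.speaker, "start": w.start, "end": w.end, "text": w.text}
            )
    return turns


# Made-up sample data for illustration.
words = [
    Word("A", 0.0, 0.4, "Welcome"),
    Word("A", 0.4, 0.9, "everyone"),
    Word("B", 1.2, 1.5, "Thanks"),
    Word("A", 1.8, 2.1, "Let's"),
    Word("A", 2.1, 2.5, "begin"),
]
turns = merge_turns(words)
for t in turns:
    print(f"[{t['start']:.1f}-{t['end']:.1f}] {t['speaker']}: {t['text']}")
```

Once speech is grouped this way, meeting minutes can attribute each statement to a speaker, and podcast analysis can measure how long each person talks.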

Imagen 3: Higher Quality Text-to-Image and Image Repair Capabilities

Imagen 3 is Google’s most advanced text-to-image model. This update strengthens its image repair and object removal capabilities:

  • High-quality Inpainting: Quickly fill in missing or damaged areas.
  • Natural Object Removal: Remove clutter and passersby from photos, leaving the scene natural and seamless.

Security and Responsible Governance: The Foundation of Enterprise-Grade AI

Google implements responsible AI principles across all generative models, providing enterprises with assurance for safe use:

  • Digital Watermarking (SynthID): Embeds invisible watermarks in generated images, videos, and audio files to help reduce misattribution and misuse.
  • Safety Filtering Mechanism: Helps prevent the generation of harmful content, with ongoing improvements to model safety.
  • Data Governance and Privacy Protection: Customer data is not used to train models, and data is processed in accordance with the customer’s instructions.
  • Copyright Protection (Indemnification): Google pledges to indemnify customers against third-party intellectual property claims on generated content, subject to certain conditions.

Industry Cases: How Global Brands Utilize Vertex AI Generative Media Models

More and more enterprises are applying Vertex AI in their day-to-day operations. Kraft Heinz is one example: after adopting Veo 2 and Imagen 3, the company compressed its content development process from 8 weeks to 8 hours, significantly reducing costs and accelerating creative output.
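
To put the “8 weeks to 8 hours” figure in perspective, the short Python calculation below converts it into a speedup factor. The source does not say whether the 8 weeks are calendar time or working time, so both readings are shown. The 5-day week and 8-hour day are assumptions for illustration.

```python
AFTER_HOURS = 8  # reported duration after adopting Veo 2 and Imagen 3

# Reading 1 (assumption): 8 calendar weeks of elapsed time.
calendar_hours = 8 * 7 * 24

# Reading 2 (assumption): 8 working weeks at 5 days x 8 hours.
working_hours = 8 * 5 * 8

speedup_calendar = calendar_hours / AFTER_HOURS
speedup_working = working_hours / AFTER_HOURS

print(f"Calendar-time reading: {calendar_hours} h -> {AFTER_HOURS} h, about {speedup_calendar:.0f}x faster")
print(f"Working-time reading:  {working_hours} h -> {AFTER_HOURS} h, about {speedup_working:.0f}x faster")
```

Under either reading, the reported change is well over an order of magnitude.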

By integrating four major generative models for video, image, audio, and music, Vertex AI is no longer just an AI platform but a key tool for enterprises to transform their creative productivity. As its features continue to evolve, Google Cloud will help more brands accelerate their creative processes and deliver more impactful digital experiences. To learn more about cloud technology, upcoming events, or industry applications, feel free to contact Microfusion Technology. We will keep sharing the latest AI and cloud insights, so watch for our event announcements. We look forward to seeing you there!