AI creativity is taking another step forward, and 2026 is shaping up to bring significant transformation. The latest development comes from Google, which has integrated Lyria 3, Google DeepMind's most advanced music generation model, directly into the Gemini app.
Announced on February 18th, 2026, and now rolling out in beta globally, this feature empowers anyone aged 18 and older to generate high-fidelity, custom 30-second audio tracks complete with instrumentals, vocals, and automatically created lyrics.
Simply describe your idea in text, upload a photo (or even a short video clip) for inspiration, and watch Gemini turn it into a fully realized musical piece, complete with matching AI-generated cover art.
This launch marks a major evolution in consumer-facing AI music tools, bringing professional-grade generation capabilities, which were previously limited to labs or select creators, into everyday use via a free, intuitive interface.
What is Lyria 3?
Lyria 3 is Google DeepMind’s most sophisticated music generation model to date, marking a significant leap forward in artificial intelligence’s creative capabilities.
Building upon the foundation established by earlier Lyria iterations, this latest version delivers transformative improvements across audio quality, creative flexibility, and user accessibility. The model represents a convergence of cutting-edge machine learning techniques with an intuitive understanding of musical composition, positioning it as a powerful tool for both amateur creators and professional musicians seeking to explore new sonic territories.
At the heart of Lyria 3’s advancement lies its dramatically enhanced audio fidelity and realism. The model produces tracks characterized by seamlessly flowing note-to-note transitions, sophisticated instrumentation with genuine depth and texture, and vocal performances that capture the nuanced expressiveness of human singers.
Perhaps most notably, Lyria 3 introduces automatic lyrics generation: a groundbreaking feature that eliminates the need for user-supplied text by autonomously crafting original, contextually appropriate lyrics that align with the creative prompt. This capability transforms the model from a mere accompaniment generator into a complete songwriting partner, capable of delivering fully realized musical compositions from a single descriptive input.
The model’s creative control mechanisms offer unprecedented precision, allowing users to sculpt every dimension of their musical vision. Creators can specify granular parameters including genre, emotional mood, exact tempo measurements (such as 90 BPM for upbeat tracks), vocal characteristics (male or female voices, singing versus rap delivery, laid-back versus energetic performance styles), instrumental arrangements, dynamic range, and overall atmospheric qualities. This level of control empowers users to move beyond generic outputs and craft music that genuinely reflects their artistic intentions, whether they’re seeking a specific nostalgic sound or exploring entirely novel sonic combinations.
Lyria 3’s versatility extends across an impressive spectrum of musical styles and cultural contexts. The model demonstrates fluency in diverse genres ranging from pop, funk, and Motown to lo-fi, Afrobeat, K-pop, 90s rap, Latin pop, R&B, EDM, classical, jazz, and experimental soundscapes.
Its multilingual vocal capabilities span English, Japanese, Korean, Hindi, Spanish, Portuguese, German, and French, with additional languages in development, a feature that opens doors for global creators and cross-cultural musical exploration. This broad genre and language support positions Lyria 3 as a truly universal creative tool, capable of honoring traditional musical forms while facilitating innovative fusion and experimentation.
Perhaps the most imaginative feature of Lyria 3 is its image-to-music capability, which bridges visual and auditory creative domains in a novel way. Users can upload photographs, and the model interprets the mood, scene, or narrative suggested by the image to inspire corresponding musical compositions.
For instance, a photograph of a dog hiking through a forest might generate an adventurous, folksy acoustic piece celebrating exploration and natural wonder. This multimodal approach not only demonstrates the model’s sophisticated understanding of emotional and contextual associations but also offers creators an entirely new pathway for musical inspiration; one that begins with visual storytelling and culminates in sonic expression.
Google emphasizes responsible development: the model is trained with "great care regarding copyright and partner agreements." It focuses on original expression rather than direct imitation; for instance, if you reference an artist, Gemini interprets it as stylistic inspiration (e.g., mood or vibe) rather than copying. Outputs include an invisible SynthID digital watermark to clearly identify them as AI-generated.
How to create your own custom tracks
Getting started is straightforward and requires no musical experience or equipment. The feature is free (with higher daily generation limits for Google AI Plus, Pro, or Ultra subscribers) and accessible via the Gemini web app or mobile app.
Visit gemini.google.com in your browser (or open the Gemini mobile app) and sign in with your Google account. Below the main prompt box, click Tools or look for a music note icon, then select Create music. Alternatively, go directly to gemini.google.com/music if available in your region.
Then, upload a photo or short video clip. Gemini analyzes it to capture the atmosphere, making this a perfect way to personalize tracks around memories, places, pets, or events.
After this, write a descriptive request. The more details, the better the result. Include:
- Genre and sub-style
- Mood or emotion (upbeat, nostalgic, chill, epic)
- Tempo or energy level
- Vocal preferences (e.g., soulful female vocals, fast rap)
- Theme, story, or lyrics idea (or let it auto-generate)
- Instruments or production elements
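Gemini accepts free-form text, so there is no required prompt syntax; still, the elements listed above can be organized systematically. The sketch below is purely illustrative (the function, field names, and example values are hypothetical, not part of any Gemini API) and simply shows one way to assemble those elements into a single descriptive request:

```python
# Illustrative helper for assembling a descriptive music prompt from the
# elements listed above. The structure and wording are hypothetical; Gemini
# accepts any free-form description, so this just organizes your ideas.

def build_music_prompt(genre, mood, tempo=None, vocals=None,
                       theme=None, instruments=None):
    """Combine optional prompt elements into one descriptive request."""
    parts = [f"A {mood} {genre} track"]
    if tempo:
        parts.append(f"around {tempo} BPM")        # tempo or energy level
    if vocals:
        parts.append(f"with {vocals}")             # vocal preferences
    if instruments:
        parts.append(f"featuring {', '.join(instruments)}")
    if theme:
        parts.append(f"about {theme}")             # theme, story, or lyrics idea
    return ", ".join(parts) + "."

prompt = build_music_prompt(
    genre="lo-fi hip hop",
    mood="nostalgic, chill",
    tempo=90,
    vocals="laid-back female vocals",
    theme="late-night city walks",
    instruments=["mellow piano", "vinyl crackle"],
)
print(prompt)
# A nostalgic, chill lo-fi hip hop track, around 90 BPM, with laid-back
# female vocals, featuring mellow piano, vinyl crackle, about late-night
# city walks.
```

The point is not the code itself but the habit it encodes: naming the genre, mood, tempo, vocals, instruments, and theme explicitly tends to produce far more specific results than a one-line request.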
Browse the template gallery for quick starts (e.g., Afropop hooks, R&B romance, 90s rap vibes), customize from there, then submit the prompt. Within seconds to a minute, Gemini produces a 30-second high-quality audio track along with custom cover art powered by Google's Nano Banana image model.
You can then download the MP3 file. Share via link, social media, messaging apps, or export for personal use (e.g., social posts, stories, intros, or fun gifts).