Vocal Isolation: What It Is and Which Model to Choose

What Is Vocal Isolation?

Vocal isolation (also known as vocal separation or vocal removal) is the process of using AI to split a song into its individual components — typically separating the vocals from the instrumental track. This is a core part of how MyKaraoke Video works, as it allows us to create karaoke-ready instrumental tracks directly from any song.

Why Is It Needed?

When you upload a song to MyKaraoke Video, the AI needs to accomplish two things:

Create an instrumental track — by removing the vocals, we generate a clean backing track that can be used in your karaoke video.
Sync the lyrics — the AI listens to the vocals to determine exactly when each word is sung, so the lyrics can be highlighted in time with the music.

Without vocal isolation, there would be no way to produce a karaoke version of a song from a standard audio file. It's the foundation that makes the entire process possible.

Base Model vs Pro Model

MyKaraoke Video offers two AI models for vocal separation, each suited to different needs.

Base Model

The Base model is our standard vocal isolation engine. It does a solid job of separating vocals from instrumentals and is included with all plans.

Key characteristics:

Costs 1 credit per minute of audio
Produces a clean instrumental and a vocal track
Separating backing vocals from lead vocals requires an additional processing step after the initial separation
Great for most use cases, especially lyric videos

Pro Model

The Pro model is powered by a partner's vocal isolation engine, one of the best available. It delivers noticeably higher separation quality than the Base model.

Key characteristics:

Costs 5 credits per minute of audio
Superior vocal isolation — cleaner instrumentals with fewer artifacts
Separates lead vocals, backing vocals, and instrumentals all in a single pass — no extra processing step needed
Better handling of complex mixes and songs with heavy effects
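
To compare what a song would cost on each model, the per-minute rates above can be turned into a quick estimate. The sketch below is illustrative only: the function name estimate_credits is hypothetical, and rounding partial minutes up to the next whole minute is an assumption, since this article does not say how partial minutes are billed.

```python
import math

# Credit rates per minute of audio, as stated in this article.
CREDITS_PER_MINUTE = {"base": 1, "pro": 5}


def estimate_credits(duration_seconds: float, model: str) -> int:
    """Estimate the credits needed to process one song.

    Assumption (not confirmed by the article): a partial minute is
    rounded up to the next whole minute before the rate is applied.
    """
    if model not in CREDITS_PER_MINUTE:
        raise ValueError(f"unknown model: {model!r}")
    if duration_seconds < 0:
        raise ValueError("duration_seconds must not be negative")
    minutes = math.ceil(duration_seconds / 60)
    return minutes * CREDITS_PER_MINUTE[model]


# A 3-minute song on each model.
print(estimate_credits(180, "base"))  # 3
print(estimate_credits(180, "pro"))   # 15
```

Under the rounding assumption, a song of 3 minutes 20 seconds would be estimated as 4 minutes, so 4 credits on Base and 20 on Pro.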

Which Model Should I Choose?

Choose the Base model if:

You're creating lyric videos: lyric videos keep the original vocals, so the separated instrumental track isn't used in your final video. The Base model is more than sufficient for syncing lyrics, and you'll save credits.
You don't need backing vocal separation: if a simple vocals-out instrumental is all you need, the Base model handles this well.
You want to conserve credits: at 1 credit per minute, the Base model costs one-fifth as much as the Pro model.

Choose the Pro model if:

You're creating karaoke videos and want the best quality — since your audience will hear the instrumental track, cleaner separation makes a real difference. At 5 credits per minute, a typical 3-minute song costs just 15 credits.
You need control over backing vocals — the Pro model gives you separate lead and backing vocal tracks in one go, so you can decide whether to keep backing vocals in your karaoke track or remove them entirely.
You're working with complex or busy mixes — songs with lots of layered instruments, effects, or overlapping vocal harmonies benefit most from the Pro model's superior separation.
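
The guidance above can be condensed into a small decision helper. This is a simplified sketch, not part of MyKaraoke Video: the function recommend_model and its parameter names are hypothetical, and the rule that lyric videos always map to the Base model is a simplification of the advice in this section.

```python
def recommend_model(
    video_type: str,
    want_best_quality: bool = False,
    need_backing_vocal_control: bool = False,
    complex_mix: bool = False,
) -> str:
    """Return "base" or "pro", following the guidance in this article.

    Simplifying assumption: a lyric video keeps the original vocals,
    so the separated instrumental is not heard and Base is suggested.
    """
    if video_type not in ("lyric", "karaoke"):
        raise ValueError(f"unknown video type: {video_type!r}")
    if video_type == "lyric":
        return "base"
    # For karaoke, any of the three Pro criteria tips the choice to Pro.
    if want_best_quality or need_backing_vocal_control or complex_mix:
        return "pro"
    return "base"


print(recommend_model("lyric"))                            # base
print(recommend_model("karaoke", want_best_quality=True))  # pro
```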

Quick Comparison

Feature	Base Model	Pro Model
Credit cost	1 credit/min	5 credits/min
Vocal/instrumental separation	✅	✅
Lead + backing vocal separation	Requires extra step	✅ Single pass
Separation quality	Good	Best available
Best for lyric videos	✅ Recommended	Works, but not necessary
Best for karaoke videos	Good	✅ Recommended

Summary

Both models get the job done, but they serve different priorities. If you're focused on lyric videos or want to keep things simple and cost-effective, the Base model at 1 credit per minute is a great choice. If you're producing karaoke content where audio quality is front and center and you want full control over lead and backing vocals without extra steps, the Pro model at 5 credits per minute is well worth the upgrade.
