This website lets you clone anyone’s voice in under 30 seconds

OpenVoice is an exciting glimpse into the future of voice technology.

by
Kevin Raposo
January 3, 2024

image of a laptop and microphone setup in a dark room

Image: KnowTechie

Just a heads up, if you buy something through our links, we may get a small share of the sale. It’s one of the ways we keep the lights on here. Click here for more.

Have you ever wished to have the captivating voice of Morgan Freeman narrate your daily life? Or perhaps you’ve imagined your GPS speaking in the sultry tones of Scarlett Johansson?

Thanks to an innovative new tool from MyShell.ai, called OpenVoice, this and much more are now within reach.

So, what is it? OpenVoice is an instant voice cloning tool that can mimic any voice from just a short audio sample.

But the real magic is that it doesn’t stop at imitating someone’s voice; it captures the individual characteristics that make that voice unique.

OpenVoice allows granular control over voice styles, including emotion, accent, rhythm, pauses, and intonation, a feature that other voice cloning tools simply don’t offer.

The technology works by decoupling the components of a voice as much as possible, meaning the tone, style, and language are treated as individual elements.

This enables the base voice, style, and language to be manipulated independently, offering an impressive level of customization.
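
To make the idea of decoupling concrete, here is a minimal Python sketch. It is not OpenVoice's code: the `VoiceRequest` class and its field names are hypothetical, chosen only to show that once tone, style, and language live in separate fields, one can change without touching the others.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class VoiceRequest:
    """Hypothetical container: one field per decoupled component."""
    tone_color: str  # whose voice it sounds like
    style: str       # emotion, rhythm, pauses, intonation
    language: str    # language of the generated speech


base = VoiceRequest(tone_color="reference_speaker", style="default", language="English")

# Change only the style; tone color and language stay as they were.
whispering = replace(base, style="whispering")

# Change only the language; the cloned tone color is untouched.
french = replace(base, language="French")

print(whispering)
print(french)
```

In the real tool these components are learned representations, not strings, but the principle of editing one while holding the others fixed is the same.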

What really sets OpenVoice apart from its predecessors, like ElevenLabs, is its zero-shot cross-lingual voice cloning ability. This means that OpenVoice can mimic voices in languages that aren’t included in its training set.

So, if you’ve ever wanted your audiobook read in French by the voice of an English speaker, OpenVoice has got you covered.

How to clone a voice with MyShell’s OpenVoice

Although the technology is complex, using OpenVoice is surprisingly simple.

All it requires is a short audio clip from the desired speaker, and within seconds, you can generate speech in that person’s voice, in multiple languages, and with a range of emotions and styles.

Here’s a step-by-step guide on how to use MyShell’s OpenVoice based on the instructions provided on their GitHub page:

Clone the OpenVoice repository

You can do this by navigating to the OpenVoice GitHub repository and clicking the green ‘Code’ button. Then click ‘Download ZIP’ to download the repository files to your local system.
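
If you would rather script the unpacking step, the sketch below extracts a downloaded ZIP with Python's standard library. The file name `OpenVoice-main.zip` and the destination folder are assumptions; adjust them to match whatever your browser saved.

```python
import zipfile
from pathlib import Path


def extract_repo(zip_path, dest_dir="."):
    """Extract a downloaded repository ZIP and return its top-level names."""
    zip_path = Path(zip_path)
    if not zip_path.is_file():
        raise FileNotFoundError(f"ZIP not found: {zip_path}")
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        # GitHub ZIPs normally contain a single top-level folder.
        return sorted({name.split("/")[0] for name in archive.namelist()})


if __name__ == "__main__":
    # Assumed file name; change it if your download is named differently.
    downloaded = Path("OpenVoice-main.zip")
    if downloaded.is_file():
        print(extract_repo(downloaded))
    else:
        print(f"{downloaded} not found; download it first.")
```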

Create and activate a Python environment

Create a new Python environment and activate it. If you’re using Anaconda, you can do this with the following commands in your terminal:

conda create -n openvoice python=3.9
conda activate openvoice

Install required packages

Install the required packages with the following commands in your terminal:

conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt

Download the checkpoint from here and extract it to the checkpoints folder.
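
Before moving on, it can help to confirm the setup actually took. The following is a small, hedged preflight sketch: the `preflight` function is hypothetical (it is not part of OpenVoice), and the default package names and the `checkpoints` folder name are assumptions drawn from the commands and instructions above.

```python
import sys
from importlib.util import find_spec
from pathlib import Path


def preflight(packages=("torch", "torchaudio"), checkpoints_dir="checkpoints"):
    """Return a list of human-readable problems; an empty list means all clear."""
    problems = []
    # The conda command above pins Python 3.9.
    if sys.version_info[:2] != (3, 9):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python 3.9 expected, found {found}")
    for name in packages:
        # find_spec returns None when a top-level package cannot be located.
        if find_spec(name) is None:
            problems.append(f"package not importable: {name}")
    if not Path(checkpoints_dir).is_dir():
        problems.append(f"folder missing: {checkpoints_dir}")
    return problems


for problem in preflight():
    print("WARNING:", problem)
```

Run it from the repository folder with the `openvoice` environment active; no output means none of the checks failed.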

How to use OpenVoice

Note: Things get a bit technical here. If you don’t have any coding experience or are not familiar with Python environments, this is probably going to go over your head. But if you enjoy a bit of punishment, then let’s move forward.

Flexible Voice Style Control: You can see an example of how OpenVoice enables flexible style control over the cloned voice in demo_part1.ipynb.
Cross-Lingual Voice Cloning: You can see an example for languages seen or unseen in the MSML training set in demo_part2.ipynb.
Gradio Demo: You can launch a local Gradio demo with the following command in your terminal:

python -m openvoice_app --share

Advanced Usage: The base speaker model can be replaced with any model (in any language and style) that you prefer.

OpenVoice coding example on a purple background (Image: KnowTechie)

You can use the se_extractor.get_se function as demonstrated in the demo to extract the tone color embedding for the new base speaker.
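
As a rough sketch of that step, the helper below wraps the call the way the demo notebooks appear to use it. Treat the details as assumptions: the argument order (reference audio path, then converter), the `target_dir` and `vad` keywords, and the `(embedding, audio_name)` return pair may differ in your checkout. The `extract_tone_color` helper itself is hypothetical, and it takes the `se_extractor` module as a parameter so the snippet does not depend on how the repository is laid out.

```python
def extract_tone_color(se_extractor, converter, reference_audio, target_dir="processed"):
    """Return the tone color embedding for a reference clip.

    se_extractor: the module imported from the OpenVoice repository.
    converter:    an already-loaded tone color converter from the demo.
    Assumption: get_se returns an (embedding, audio_name) pair.
    """
    embedding, _audio_name = se_extractor.get_se(
        reference_audio,
        converter,
        target_dir=target_dir,
        vad=True,  # assumed flag: trim silence with voice activity detection
    )
    return embedding
```

Inside the repository you would pass in the imported `se_extractor` module, the converter object built in the demo notebook, and the path to your own audio clip.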

Tips to Generate Natural Speech: Many single-speaker and multi-speaker TTS methods that generate natural speech are readily available.

By simply replacing the base speaker model with the model you prefer, you can push the speech naturalness to a level you desire.

Please note that this repository is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which prohibits commercial usage.

How much does OpenVoice cost?

The service is currently free to use, and the team at MyShell.ai has made the source code and trained model available on GitHub, allowing developers to experiment and extend the technology.

What’s OpenVoice’s potential?

OpenVoice isn’t just a fun gimmick.

It has the potential to revolutionize industries, from entertainment and media, where it could be used to dub films or create personalized chatbots, to accessibility, where it could give a voice to those who have lost their own.

While the potential for misuse, such as deepfake audio or identity theft, is a concern, the team at MyShell.ai is committed to following ethical guidelines and exploring safeguards to prevent such misuse.

OpenVoice voice cloning tech example (Image: KnowTechie)

In terms of speed and accuracy, OpenVoice outshines its competitors. The tool is computationally efficient, and the team claims it can generate a second of speech in just 85 milliseconds.
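
To put the 85-millisecond figure in perspective, here is a quick back-of-the-envelope calculation. It simply takes the team's claim at face value and assumes the rate stays constant for longer clips, which will depend on your hardware; the `generation_time_seconds` helper is only for illustration.

```python
# Figure quoted by the team: 85 ms of compute per 1 s of generated speech.
ms_per_second_of_speech = 85

real_time_factor = ms_per_second_of_speech / 1000  # compute time / audio time
speedup = 1000 / ms_per_second_of_speech           # times faster than real time


def generation_time_seconds(audio_seconds):
    """Estimated compute time for a clip, assuming the rate stays constant."""
    return audio_seconds * real_time_factor


print(f"real-time factor: {real_time_factor:.3f}")
print(f"speedup: {speedup:.1f}x real time")
print(f"10-minute clip: {generation_time_seconds(600):.0f} s")
```

In other words, the claim works out to roughly 12 times faster than real time, so ten minutes of narration would take a little under a minute to generate.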

OpenVoice is an exciting glimpse into the future of voice technology.

With its ability to clone any voice instantly, the possibilities seem endless. So why not give it a try and see who you could become?

The brains behind this powerhouse include Zengyi Qin from the halls of MIT and MyShell; Wenliang Zhao and Xumin Yu, both from Tsinghua University; and, last but not least, Ethan Sun from MyShell.

Have any thoughts on this? Drop us a line below in the comments, or carry the discussion to our Twitter or Facebook.
