This website lets you clone anyone’s voice in under 30 seconds
OpenVoice is an exciting glimpse into the future of voice technology.
Just a heads up, if you buy something through our links, we may get a small share of the sale. It’s one of the ways we keep the lights on here. Click here for more.
Have you ever wished to have the captivating voice of Morgan Freeman narrate your daily life? Or perhaps you’ve imagined your GPS speaking in the sultry tones of Scarlett Johansson?
Thanks to an innovative new tool from MyShell.ai, called OpenVoice, this and much more are now within reach.
So, what is it? OpenVoice is an instant voice cloning tool that can mimic any voice from just a short audio sample.
But the real magic is that it doesn’t stop at imitating someone’s voice; it also lets you pick out and adjust the individual characteristics of the speech it generates.
OpenVoice allows granular control over voice styles, including emotion, accent, rhythm, pauses, and intonation, which the team says earlier instant voice cloning tools could not offer.
The technology works by decoupling the components of a voice as much as possible, meaning the tone, style, and language are treated as individual elements.
This enables the base voice, style, and language to be manipulated independently, offering an impressive level of customization.
What really sets OpenVoice apart from competitors like ElevenLabs is its zero-shot cross-lingual voice cloning ability. This means that OpenVoice can mimic voices in languages that aren’t included in its training set.
So, if you’ve ever wanted your audiobook read in French by the voice of an English speaker, OpenVoice has got you covered.
How to clone a voice with MyShell’s OpenVoice
Although the technology is complex, using OpenVoice is surprisingly simple.
All it requires is a short audio clip from the desired speaker, and within seconds, you can generate speech in that person’s voice, in multiple languages, and with a range of emotions and styles.
Here’s a step-by-step guide on how to use MyShell’s OpenVoice based on the instructions provided on their GitHub page:
Clone the OpenVoice repository, or download it as a ZIP file from the GitHub page
Create and activate a Python environment
conda create -n openvoice python=3.9
conda activate openvoice
Install required packages
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
pip install -r requirements.txt
Download the checkpoint from here and extract it to the checkpoints folder.
How to use OpenVoice
Note: Things get a bit technical here. If you don’t have any coding experience or are not familiar with Python environments, this is probably going to go over your head. But if you enjoy a bit of punishment, then let’s move forward.
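Before opening the demos, it can be worth confirming that the setup steps actually took effect. The following is a minimal sketch and not part of OpenVoice: the check_setup helper is hypothetical, and it assumes you run it from the repository root after extracting the checkpoint into the checkpoints folder as described above.

```python
import sys
from pathlib import Path


def check_setup(repo_dir, expected_python=(3, 9)):
    """Return a list of problems found; an empty list means the basics look fine.

    Hypothetical helper for illustration only. It checks what the guide
    above asks for and nothing more.
    """
    problems = []
    repo = Path(repo_dir)

    # The guide creates the conda environment with python=3.9.
    if sys.version_info[:2] != tuple(expected_python):
        found = "{}.{}".format(*sys.version_info[:2])
        wanted = "{}.{}".format(*expected_python)
        problems.append(f"Python {found} is active, expected {wanted}")

    # requirements.txt should be present if the repository was cloned or unzipped here.
    if not (repo / "requirements.txt").is_file():
        problems.append("requirements.txt not found; is this the repository root?")

    # The checkpoint has to be extracted into a non-empty 'checkpoints' folder.
    checkpoints = repo / "checkpoints"
    if not checkpoints.is_dir() or not any(checkpoints.iterdir()):
        problems.append("checkpoints folder is missing or empty")

    return problems


if __name__ == "__main__":
    issues = check_setup(".")
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Setup looks fine.")
```

If the script prints warnings, revisit the matching installation step before launching the notebooks or the Gradio demo.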
- Flexible Voice Style Control: You can see an example of how OpenVoice enables flexible style control over the cloned voice in demo_part1.ipynb.
- Cross-Lingual Voice Cloning: You can see an example for languages seen or unseen in the MSML training set in demo_part2.ipynb.
- Gradio Demo: You can launch a local Gradio demo with the following command in your terminal:
python -m openvoice_app --share
Advanced Usage: The base speaker model can be replaced with any model (in any language and style) that you prefer. You can use the se_extractor.get_se function, as demonstrated in the demo, to extract the tone color embedding for the new base speaker.
Tips to Generate Natural Speech: Many single-speaker and multi-speaker TTS methods that generate natural speech are readily available.
By simply replacing the base speaker model with the model you prefer, you can push speech naturalness to the level you want.
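To make the idea of a swappable base speaker concrete, here is a toy sketch of the design. It processes no audio and does not use the OpenVoice API: Speech, convert_tone, make_pipeline, and both base models are hypothetical names invented for illustration. It only tracks which component each property of a clip came from, on the assumption that the base model sets the style and a separate converter sets the tone color.

```python
from dataclasses import dataclass, replace
from typing import Callable


@dataclass(frozen=True)
class Speech:
    """Toy stand-in for an audio clip; it only records where each property came from."""
    text: str
    style_from: str  # model that set rhythm, accent, and emotion
    tone_from: str   # speaker whose tone color the clip carries


def convert_tone(clip: Speech, target_speaker: str) -> Speech:
    """Hypothetical converter: swaps the tone color only, leaving text and style untouched."""
    return replace(clip, tone_from=target_speaker)


def make_pipeline(base_tts: Callable[[str], Speech]) -> Callable[[str, str], Speech]:
    """Build a text-to-cloned-speech function around any base speaker model."""
    def speak(text: str, target_speaker: str) -> Speech:
        return convert_tone(base_tts(text), target_speaker)
    return speak


# Two interchangeable base speaker models (both hypothetical).
def default_base(text: str) -> Speech:
    return Speech(text, style_from="default-base", tone_from="default-base")


def expressive_base(text: str) -> Speech:
    return Speech(text, style_from="expressive-base", tone_from="expressive-base")


if __name__ == "__main__":
    for base in (default_base, expressive_base):
        speak = make_pipeline(base)
        print(speak("Hello there", "reference-speaker"))
```

Swapping default_base for expressive_base changes where the style comes from, while the tone color still comes from the reference speaker. That separation is what lets you upgrade the base model without touching the cloning step.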
How much does OpenVoice cost?
The service is currently free to use, and the team at MyShell.ai has made the source code and trained model available on GitHub, allowing developers to experiment and extend the technology.
What’s OpenVoice’s potential?
OpenVoice isn’t just a fun gimmick.
It has the potential to revolutionize industries, from entertainment and media, where it could be used to dub films or create personalized chatbots, to accessibility, where it could give a voice to those who have lost their own.
While the potential for misuse, such as deepfake audio or identity theft, is a concern, the team at MyShell.ai is committed to following ethical guidelines and exploring safeguards to prevent such misuse.
The team also claims that OpenVoice outshines its competitors in speed and accuracy. The tool is computationally efficient, and it can reportedly generate a second of speech in just 85 milliseconds.
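To put the 85-millisecond figure in perspective, here is a quick back-of-the-envelope sketch. The function names are made up for illustration, the number is the team's claim rather than a measurement, and the estimate assumes synthesis time scales linearly with audio length, which may not hold on your hardware.

```python
def real_time_factor(synthesis_ms, audio_ms=1000.0):
    """Ratio of compute time to audio duration; below 1.0 means faster than real time."""
    return synthesis_ms / audio_ms


def synthesis_time_seconds(audio_seconds, ms_per_audio_second=85.0):
    """Estimated compute time for a clip, assuming linear scaling (an assumption)."""
    return audio_seconds * ms_per_audio_second / 1000.0


if __name__ == "__main__":
    rtf = real_time_factor(85.0)
    print(f"Real-time factor: {rtf:.3f}")
    print(f"Speed-up over real time: {1.0 / rtf:.1f}x")
    print(f"Estimated time for 10 minutes of audio: {synthesis_time_seconds(600):.0f} s")
```

At the claimed rate, the real-time factor is 0.085, or a bit under 12 times faster than playback, so ten minutes of audio would take roughly 51 seconds to generate.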
OpenVoice is an exciting glimpse into the future of voice technology.
With its ability to clone any voice instantly, the possibilities seem endless. So why not give it a try and see who you could become?
The brains behind this powerhouse include Zengyi Qin of MIT and MyShell; Wenliang Zhao and Xumin Yu, both of Tsinghua University; and, last but not least, Ethan Sun of MyShell.