Use AI to Clone Voices & Speak OTHER LANGUAGES! – ElevenLabs + ChatGPT 4
Today we’re going to be taking a look at a very important update for a phenomenal text-to-speech AI. This is where you just type regular text and the AI synthesizes speech from it. This is the best that we have seen to date, and it was the best at English, but recently it got an update to make it multilingual. So if you speak another language, I’m going to need your help in this video, because I only speak English. If you speak another language, I want you guys to tell me in the comments below how these different languages sound as we put this AI to the test.
So yes, essentially, your comments as viewers at home are going to be a big part of making this video work as a concept. So viewers, if you’re watching right now, I recommend you scroll down to the comments and see what people have to say about the different languages as you watch through this video and we test these things out. And as we go through today’s video, I want you to think about real-time voice translation. Picture this: we use technology very similar to this, an AI text-to-speech model cloned and trained on my particular voice. I wear some sort of headset, or maybe even just use my phone, some sort of device that translates what I am saying in real time and actually makes it sound like I’m saying it in another language, in my actual voice. Essentially, everything that we need for that is here today.
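The real-time voice translation idea described here can be pictured as three stages chained together: speech to text, text translation, and text to speech in a cloned voice. The sketch below is only an illustration of that shape. Every name in it (`make_voice_translator` and the three `fake_*` stand-ins) is hypothetical, and no real speech, translation, or ElevenLabs API is called.

```python
from typing import Callable

# Hypothetical stage signatures; none of these are real library APIs.
Transcriber = Callable[[bytes], str]    # speech audio -> source-language text
Translator = Callable[[str, str], str]  # (text, target language) -> translated text
Synthesizer = Callable[[str], bytes]    # translated text -> audio in the cloned voice


def make_voice_translator(
    transcribe: Transcriber,
    translate: Translator,
    synthesize: Synthesizer,
) -> Callable[[bytes, str], bytes]:
    """Chain the three stages into one function that handles a chunk of audio."""

    def translate_chunk(audio_chunk: bytes, target_language: str) -> bytes:
        text = transcribe(audio_chunk)                  # 1. what was said
        translated = translate(text, target_language)   # 2. same meaning, new language
        return synthesize(translated)                   # 3. spoken in the cloned voice

    return translate_chunk


# Toy stand-ins so the sketch runs without any network access or models.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, target_language: str) -> str:
    return f"[{target_language}] {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


pipeline = make_voice_translator(fake_transcribe, fake_translate, fake_synthesize)
result = pipeline(b"hello viewers", "Spanish")
```

In a real system each stand-in would be swapped for an actual speech recognizer, a translator such as ChatGPT, and a cloned-voice synthesizer, and the audio would be processed in small chunks to keep the delay low.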
Oh, and one other thing is that they actually have a way for you to create voices from scratch now with ElevenLabs, which is very interesting. So we’re going to need to test that out as well. That’s another update. The reason I want to follow this, though, is because this is single-handedly the best text-to-speech. I mean, it sounds like a real person. All right viewers, here is the ElevenLabs site. I gotta say I love the layout and design of their site. You know, I don’t talk about it much on this channel, but I go through a lot of different GUI and website designs, and I can always tell when it’s a really well-thought-out and good one, and this ElevenLabs site is designed phenomenally.
So first of all, there is a clone, pretty much a perfect clone, of my voice on the site that I’ve already created. If you go to your Voice Lab here, you can see the different cloned voices. Here is a voice that I actually generated from scratch. It’s a randomly generated voice. Here is the first clone of my voice, and then I retrained it, because I was wondering if maybe ElevenLabs did an update on how they train the different voices, and they now allow you to add tags and stuff. I’m not sure if these tags actually make a difference when you’re using these voices, so I redid it for this update, and I’ve also got a narrator clone as well. We’ll be creating another randomly generated voice, but first I want to test out the multilingual stuff and see if it actually works on my cloned voice.
Let’s just do a little test in English, and ChatGPT is going to help us out for this one. We’re pulling out all of the stops here. ChatGPT is going to come up with a lemon-themed comedic tongue-twister pronunciation reading test to test one’s ability to pronounce English words. I probably won’t even be able to pronounce some of these correctly, but we’re going to see if ElevenLabs has it on lock. Wow, okay, it’s definitely lemon-themed and it’s definitely a tongue twister. My God, this is: “Lively lemons lavishly launched, leaping and looping through lemonade lakes. Larry the limping lemur licked luscious lemon lollipops, lingeringly licking his lips, luxuriating in lemony lusciousness. Lucy the lioness lounged on lemon-lined lanes, lazily laughing, lemon-lime libations longing for lemon zest. Lanky Larry lit luminous lemon lanterns, leading light-hearted lemon lovers laughing loudly, lost in lemon lore.” Wow, I think I actually got that right, though.
Let’s try it out in ElevenLabs. Again, this is just the Eleven Monolingual model, as you can see. Here are the two different generators that we have now. The monolingual one was the one that we had before. Now we have the multilingual one, which is the new experimental one. As you can see, it supports English, German, Polish, Spanish, Italian, French, Portuguese, and Hindi. It’s experimental, and they say that some numbers and symbols may currently be pronounced incorrectly, so for best results please spell them out. We’ll put that to the test as well. That’s another update. Here are the settings that I’m using on my own trained voice, so this will sound like me: 25 percent stability and 80 percent on the clarity and similarity enhancement.
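The advice to spell out numbers and symbols before generating can be automated with a small text-cleanup step. What follows is a minimal sketch, not anything provided by ElevenLabs: the function names `number_to_words` and `spell_out` are made up for illustration, it only handles English, and it only converts standalone whole numbers from 0 to 999 plus two symbols.

```python
import re

ONES = [
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
    "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
    "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]

# Illustrative symbol table; extend it as needed.
SYMBOLS = {"%": " percent", "&": " and "}


def number_to_words(n: int) -> str:
    """Spell out a whole number from 0 to 999 in English."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only supports 0 to 999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def spell_out(text: str) -> str:
    """Replace short digit runs and a few symbols with words."""
    # Only runs of one to three digits are converted; longer numbers stay as digits.
    text = re.sub(
        r"(?<!\d)\d{1,3}(?!\d)",
        lambda m: number_to_words(int(m.group())),
        text,
    )
    for symbol, word in SYMBOLS.items():
        text = text.replace(symbol, word)
    # Collapse any double spaces introduced by the symbol replacements.
    return re.sub(r"\s+", " ", text).strip()


cleaned = spell_out("25 stability & 80% clarity")
```

Decimals, thousands separators, and other languages are not handled here; for the multilingual model the number words would need to be written in the target language instead.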
Let’s give it a generation. “Lively lemons lavishly launched, leaping and looping through lemonade lakes. Larry the limping lemur licked luscious lemon lollipops, lingeringly licking his lips, luxuriating in lemony lusciousness. Lucy the lioness lounged on lemon-lined lanes, lazily laughing, lemon-lime libations longing for lemon zest. Lanky Larry lit luminous lemon lanterns, leading light-hearted lemon lovers laughing loudly, lost in lemon lore.” Wow, okay, that was pretty good. I see no issue with what it generated there.
I wonder if it sounds any different if we switch to this multilingual model and just run the same exact English text. “Lively lemons lavishly launched, leaping and looping through lemonade lakes. Larry the limping lemur licked luscious lemon lollipops, lingeringly licking his lips, luxuriating in lemony lusciousness. Lucy the lioness lounged on lemon-lined lanes, lazily laughing, lemon-lime libations longing for lemon zest. Lanky Larry lit luminous lemon lanterns, leading light-hearted lemon lovers laughing loudly, lost in lemon lore.” Sounds pretty much the same. I don’t really see a difference in terms of quality and pronunciation. It definitely spoke it differently, but I gotta say that could just be variation in the generation, because you do get very varied generations with this.
Let’s just translate this into another language. Let’s start off with German, the second one here. I wonder if it would still be a tongue twister in German. I’m guessing that’s not what would happen. But yeah, this is straight German, so we’re not doing any phonetic pronunciation. When I say phonetic, I mean spelling the word out the way it sounds in English so that a text-to-speech generator would speak it correctly, if that makes sense. We’ll try phonetic after we do direct German, but apparently this thing works pretty well with phonetic pronunciation. And by the way, my voice was trained only on English words. Obviously, I don’t speak German. So let’s give it a generate.
[Generated German audio, multilingual model]
Okay, I mean, I don’t speak German, but that sounds accurate to my English ears. Let’s go ahead and swap back to the monolingual one and see if it sounds any different.
[Generated German audio, monolingual model]
I’m gonna take a guess that it sounds correct in German, but with a very, very thick, heavy American accent. There’s definitely a difference in the generation there. The multilingual one sounds more German, if I could even say that. Interesting. Let’s go ahead and try this now phonetically, meaning it’s going to try to spell it out. All right, it’s generating stuff that still looks very German to me; however, it’s definitely different from the top one. Let’s take a listen.
[Generated German audio, phonetic spelling]
Does that sound more accurate? Does it sound less accurate? If you speak German, let me know. There’s definitely a difference with the multilingual model, though, in comparison to the monolingual. Let’s try the most spoken language in the world, which I believe is Spanish, correct? All right, ChatGPT is now giving us a story in Spanish about a guy who just can’t stop eating spaghetti. You viewers who actually speak Spanish are going to understand this story before me, because I’m going to listen in Spanish first and then hear it translated into English. It’s very interesting, because a lot of languages have to be spoken in a different accent than just the basic American accent, and my particular cloned voice for ElevenLabs was trained on a very American accent. So it has to do some thinking in the background there with its AI brain to make me sound like I have a Spanish accent. It should be more recognizable to my English ear that this accent is Spanish, man. The story is pretty long, though. Oh my gosh. All right, this is a huge story, but we have to listen to it. Let’s go ahead.
[Story in Spanish follows]
“In the end, Ricardo’s nose returned to its normal size, but his cheerful spirit and sense of humor remained intact, and he always remembered the day when spaghetti made him smile in a completely unexpected way.” Yeah, that was a lot better, like way better. So if you’re gonna be doing English, you really should just stick to the monolingual model for now. But if you’re doing other languages, you’re obviously gonna want to use the multilingual one. Let’s go ahead and try some Italian, why don’t we? And I’m gonna use my previously generated Common Guy voice-over model that I created. And, you know what? I’ll give you guys a little demo. This is again a voice that I entirely generated just using ElevenLabs’ little creator, which we’ll go over at the end of the video. So we’ll do a little test here so you can hear this completely generated voice, not based off of any particular person.
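The rule of thumb from this part of the video (monolingual model for English, multilingual model for everything else) can be written down as a small helper that prepares a generation request. This is a hedged sketch only. The video uses the website, not the API, so the endpoint URL, the model IDs `eleven_monolingual_v1` and `eleven_multilingual_v1`, and the `stability` and `similarity_boost` field names are assumptions about the ElevenLabs HTTP API as it existed around this update, and the voice ID is a placeholder. Nothing is sent over the network.

```python
# Assumed model identifiers; not stated in the video.
MONOLINGUAL = "eleven_monolingual_v1"
MULTILINGUAL = "eleven_multilingual_v1"

# Languages the multilingual model is said to support in the video.
SUPPORTED = {
    "english", "german", "polish", "spanish",
    "italian", "french", "portuguese", "hindi",
}


def choose_model(language: str) -> str:
    """English goes to the monolingual model, other supported languages to multilingual."""
    lang = language.lower()
    if lang not in SUPPORTED:
        raise ValueError(f"unsupported language: {language}")
    return MONOLINGUAL if lang == "english" else MULTILINGUAL


def build_request(text: str, voice_id: str, language: str,
                  stability: float = 0.25, similarity: float = 0.80) -> dict:
    """Build (but do not send) a text-to-speech request description."""
    if not (0.0 <= stability <= 1.0 and 0.0 <= similarity <= 1.0):
        raise ValueError("settings must be between 0 and 1")
    return {
        # Assumed endpoint shape; check the current API reference before use.
        "url": f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
        "body": {
            "text": text,
            "model_id": choose_model(language),
            "voice_settings": {
                "stability": stability,          # 25 percent, as set earlier in the video
                "similarity_boost": similarity,  # 80 percent clarity and similarity
            },
        },
    }


request = build_request("Hola, espectadores.", "my-cloned-voice", "Spanish")
```

Sending the request would additionally need an API key and an HTTP client, which are left out so the sketch stays runnable offline.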
“Hello viewers, subscribe to MattVidPro AI, or else. You don’t even know what you’re in for if you don’t obey my AI command and like and subscribe. Honestly, I hope you don’t. I am eager to do a little mind control. It’s been a while.”
So yeah, that’s a completely AI-created voice, which is really astonishing. It sounds pretty good. I’m pretty impressed with that. Let’s see if it can do other languages. This one came out a lot slower, so I’m not sure what that’s all about. I don’t know which one is better for Italian speakers, the MattVidPro clone or the AI-generated voice. I gotta know what that says in English. Oh God, it’s like the worst joke ever, “mom’s spaghetti.” I wish I had Eminem’s voice on this. We’re gonna go back to the monolingual model and listen to this in English at the best quality.
“Dear friends, we beg you to subscribe to the MattVidPro AI channel. Remember to leave a like and comment, or else be ready to face the fury of mom’s spaghetti. Jokes aside, the MattVidPro AI channel will offer you fun, stimulating, and irresistible content that will make you forget the taste of Sicilian cannoli. You won’t want to miss a single second of this laughter and entertainment party, so subscribe, click like, and comment to secure a front-row seat in the wonderful world of MattVidPro AI. Have fun, everyone.”
[unclear] He gave out at the end there. Interesting. All right, we have a few more languages to test out. Let’s do Polish, French, Portuguese, and Hindi. We’ll try French next. All right, apparently this is an interesting and comedic fact about turtles in French. We’ll let the basic pre-made ElevenLabs character say this one.
“Turtles can hold their breath for a long period of time until they need to breathe.” What kind of a fact is that, ChatGPT? Boring! I wanted to hear something real fun. All right, now we’re gonna get a fact about cats, and it’s going to be in Polish. Down here, we’ve also got it in English. Thanks, ChatGPT.
“They have an incredible sense of balance, flexibility, agility, gravity-defying stunts. Yes, cats can do that. Cats are so dedicated to their craft that they spend a significant portion of their lives practicing, from leaping onto narrow surfaces to walking on tightropes made of fences.” Yes, interesting. Now we’ve got it in Polish. Let’s hear it, and we’ll go back to my voice for this one. I’ll turn the stability down just a hair on this so it doesn’t speak so fast.
I don’t know. I mean, Polish is very far away from what I would be able to pick up on in terms of a language being spoken correctly, so I have no idea. All right, now we’re gonna go for Hindi, and this time it’s… well, it’s definitely written in Hindi, first of all. This time it’s going to be about a dead goldfish. All right, let’s hear it.
I don’t know anyone who speaks Hindi. How does that sound? “Dear friend, it is with sadness that I have to inform you that our beloved goldfish passed away yesterday. It was always a part of our home’s happiness, and now its absence is felt. It was always there with us when we were relaxing at the end of the day, witnessing our laughter and joy. It was a tiny fish, but it had an ocean’s worth of love in its heart. With this message, we remember its memories and mourn for it. Our little goldfish will always remain in our hearts. Sorrowfully yours, [your name].” Wow. We’ve got one language left, and that’s Portuguese. Portuguese can be spoken many different ways. I mean, I’m sure all of those languages can be spoken many different ways, but I know mainly there’s, like, mainland Portuguese and then Brazilian Portuguese. So I’m wondering, first, what ChatGPT is gonna go for, and then what this thing is gonna speak.
All right, this is gonna be a response. All right, let’s hear this Portuguese response. I don’t know, that’s pretty funny. All right, let’s go ahead and build a quick voice with ElevenLabs, and then we’ll listen to the response in English here with that created voice. So what we’ll do is we’ll go to the Voice Lab here and click “Add a Voice,” and we’ll do Voice Design. By the way, to get access to Voice Design and Instant Voice Cloning, it’s five bucks a month, which I think is a pretty good deal for this. You can create 10 voices with that. We’re gonna go for a middle-aged voice. We’ll go for a British accent. And, might as well, we’ll make it male. Accent strength we’ll make pretty high. Let’s take a listen. It’s gonna randomly generate a voice. I mean, these are pretty much our only parameters here.
“First, we thought the PC was a calculator, then we found out how to turn numbers into letters, and we thought it was a typewriter.” Let’s try Australian instead. “First, we thought the PC was a calculator, then we found out how to turn numbers into letters, and we thought it was a typewriter.” See, that was not Australian. It was super low on the accent. “First, we thought the PC was a calculator, then we found out how to turn numbers into letters, and we thought it was a typewriter.” Go straight American accent. “First, we thought the PC was a calculator, then we found out how to turn numbers into letters, and we thought it was a typewriter.” That’s, like, straight “The FitnessGram Pacer Test is a multi-stage aerobic capacity test.” Yeah, let’s go with that one. We’ll name this “Pacer Test.” There. Create the voice, and now we’ll go ahead and let our newly created Pacer Test voice read this one out.
“I am extremely upset and disappointed to learn of the sad news about the death of our beloved goldfish. When I entrusted you with the care of my cherished pet, I expected you to look after it with the same dedication and love that I have always had. However, it now appears that the fish did not receive proper care during my absence. Your lack of attention and negligence in your responsibilities have led to the irreplaceable loss of an innocent creature that only brought joy and happiness to our lives. I cannot express the pain and sorrow I feel at this moment, knowing that my dear goldfish suffered due to your inability to fulfill the obligations entrusted to you. I ask that in the future, you reflect on your actions and the consequences they may have on the lives of other living beings. I hope you understand the gravity of this situation and take the necessary steps to ensure that such a tragedy never repeats itself. With disappointment and indignation, [your name].”
All right, that’s good. That’s good. So viewers, how do you think ElevenLabs is doing with all of this? If you speak multiple languages, if you speak any of those languages that we listened to today, I would absolutely love your response down below. Let me know how it did, because that’s really the most important part about this. I don’t speak those languages, so I can’t tell. They sounded pretty good to my American ear, but that’s as far as it goes for me. So, you know, I know it does English really well, but does it do the other languages as well? And again, I want you to keep in mind some of the really amazing capabilities that this can bring in the future. I mean, again, picture a real-time translation of someone’s voice. It could be my voice cloned on this thing, speaking another language in real time: you could use ChatGPT to translate what I say and use this thing to make it sound like it’s my voice actually saying it. Really cool stuff. Let me know what you think in the comments, and I’ll see you in the next video. Thanks for watching.