I recently stumbled upon a Substack article by Ethan Mollick, a professor of management at the Wharton School of the University of Pennsylvania, titled “A quick and sobering guide to cloning yourself.”
“With just a photograph and 60 seconds of audio, you can now create a deepfake of yourself in just a matter of minutes by combining a few cheap AI tools. I’ve tried it myself, and the results are mind-blowing, even if they’re not completely convincing. Just a few months ago, this was impossible. Now, it’s a reality.”
As a former radio guy, I was more interested in the audio portion of Professor Mollick’s experiment.
“Clone a voice from a clean sample recording. Samples should contain 1 speaker and be over 1 minute long and not contain background noise. Currently works best on US-English accent.”
I created an account at ElevenLabs, picked one of their voices, and pasted in some text from my blog bio.
For $5 a month (first month free), you can synthesize your own voice. I uploaded a recording of me reading that same bio.
Finally, I pasted in some text from one of my blog posts and my voice was “cloned.”
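If you’d rather script it than click through the website, ElevenLabs also exposes an HTTP API that does the same thing. Here’s a rough sketch of those two steps in Python. The API key, file names, and voice name below are placeholders, and the exact endpoints and fields may have changed since I tried this, so treat it as an illustration rather than a recipe.

```python
import requests

API_KEY = "your-elevenlabs-api-key"  # placeholder: copy from your account settings
HEADERS = {"xi-api-key": API_KEY}

# Step 1: clone a voice from a clean sample recording
# (one speaker, over a minute, no background noise).
with open("steve_bio.mp3", "rb") as sample:  # placeholder file name
    resp = requests.post(
        "https://api.elevenlabs.io/v1/voices/add",
        headers=HEADERS,
        data={"name": "Steve"},      # placeholder voice name
        files={"files": sample},
    )
resp.raise_for_status()
voice_id = resp.json()["voice_id"]

# Step 2: synthesize new text in the cloned voice.
tts = requests.post(
    f"https://api.elevenlabs.io/v1/text-to-speech/{voice_id}",
    headers=HEADERS,
    json={"text": "Some text from one of my blog posts."},
)
tts.raise_for_status()

# Save the returned audio.
with open("cloned_steve.mp3", "wb") as out:
    out.write(tts.content)
```

The website does all of this for you behind the scenes; I just used the site.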
Just to be clear, the first audio is one of their “voices.” The second audio is a recording of my voice. The real me, if you will. And the third audio is the synthesized Steve voice. I’m not sure someone could tell the difference. I sort of prefer the synthesized reading over my own. In two years (?), this technology will be so good it will be nearly impossible to tell real from cloned.