'Frank Sinatra' was crooning about hot tubs in 2020, more than 20 years after his death.
Key points:
- Artificial intelligence is being used to create deepfakes of singers' voices
- There are legal and moral questions around the use of the technology
- Some musicians are beginning to see potential for its use
However, it was only as a deepfake that the iconic voice of Ol' Blue Eyes sang, "Ohh, it's hot tub time" over a warbled backing track of piano and horns.
The song, titled Hot Tub Christmas, is the product of artificial intelligence algorithms developed by San Francisco company OpenAI, which counts Microsoft among its investors.
Produced by the company's AI system, known as Jukebox, which can generate new songs and vocals that sound almost exactly like real artists, Hot Tub Christmas is just one of thousands of audio deepfakes that AI researchers and enthusiasts have shared online.
There are legal and moral questions around the technology, but some artists are beginning to see it as an opportunity to create a digital likeness, which may allow their voice to live forever.
What is an audio deepfake?
Deepfakes are realistic video or audio of events that never actually took place, generated by artificial intelligence and what's known as machine learning — basically, computer algorithms that can improvise by ingesting and recompiling a large amount of data.
The technology has already been used to create fake videos of Tom Cruise, which set off alarm bells in national security circles.
It is also sometimes used in image-based abuse, when people's faces are added into pornographic material without their consent.
Audio deepfakes — usually an AI recreation of someone's voice — have, so far, received less attention.
However, earlier this year one recreated an old Eminem song for a 2020 audience, and another controversially recreated the voice of late chef Anthony Bourdain for use in a documentary.
How are audio deepfakes made?
OpenAI's Jukebox system was trained by ingesting and examining 1.2 million songs, their corresponding lyrics and information, such as artist names, genres and years of release.
Using this data, Jukebox can create new music samples from scratch.
An OpenAI spokesperson said that Jukebox and similar algorithms were still in their early stages of development.
"Some might say they're like students at a conservatory who are learning how to imitate the great musicians of the past," they said.
To deepfake … or not
Last year, American rapper Jay-Z's company, Roc Nation, claimed its copyright had been infringed by deepfake audio of the hip hop mogul published on a YouTube channel called Voice Synthesis.
The anonymous person behind the channel had used the AI-powered Tacotron 2 system developed by Google to make it sound like Jay-Z had rapped the "To be, or not to be" soliloquy from Shakespeare's Hamlet, and sung the Billy Joel song We Didn't Start the Fire.
Roc Nation's claim against the deepfakes alleged the YouTube channel had "unlawfully" used AI "to impersonate [their] client's voice".
The channel's creator, whose videos did not have ads, argued the clips weren't malicious and did not generate income.
YouTube initially removed the deepfakes, but the Google-owned platform later reinstated them with little explanation, highlighting the murky legal situation around recreating someone's voice using artificial intelligence.
Can you copyright a voice?
The way a particular human voice sounds is not protected under copyright law, which can leave artists in a precarious position if someone tries to create an audio deepfake of them.
Queensland University of Technology senior law lecturer Dr Kylie Pappalardo said the copyright holders of an artist's sound recordings could potentially still claim copyright infringement if they could prove their recordings were copied in the process of creating a deepfake, whether someone made money from it or not.
"So, if Jay-Z and his company can't prove any of those steps, then their allegations might fall over."
Chris Chow, the managing director of entertainment law firm Creative Lawyers, said the likelihood of copyright infringement also depended on the way an algorithm worked.
"In the context of music, if an AI system simply listened to sound recordings without reproducing them, and then an entirely new sound recording was created — as is the case with some deepfakes — without including any part of the original sound recordings, it would be highly unlikely that copyright infringement of any of the original sound recordings would have occurred," Mr Chow said.
Complicating factor
The situation was more complicated if a newly created deepfake contained any part of an existing copyright-protected "musical work", such as an existing melody, Mr Chow said.
"From a copyright perspective, there are complexities and intricacies at play which make deepfakes a difficult phenomenon to analyse and explain in brevity, and even more challenging to sue over," he said.
Dr Pappalardo said that while the boundaries of copyright around audio deepfakes were still being tested, Australian artists might have other protections against their voices being deepfaked without their permission.
This is because Australia recognises an artist's moral rights in their performance as part of the Copyright Act 1968, and these rights cannot be transferred to another person.
Legal differences in US
Such rights do not exist in the same way in the United States, partly because of the freedom of speech promised by that country's First Amendment.
However, Dr Pappalardo said, celebrities in America could sometimes use US privacy laws and right of publicity protections to stop unauthorised use of their likeness.
OpenAI said it was conducting research into intellectual property rights, and its Jukebox system was only for non-commercial activities.
Australian music copyright agency APRA AMCOS — which represents more than 108,000 songwriters, composers and publishers — said it did not yet have a position on audio deepfakes, but was watching the technology closely.
Moral concerns around voices living forever
Dr Pappalardo's colleague, Professor Patrik Wikstrom, said audio deepfakes and attempts to make an artist's voice live forever were very problematic from a moral point of view.
"Would it be accepted by fans? That's very doubtful, I think. I struggle to see a [deepfake] song on the charts. The music industry is all about novelty. Artists recreate themselves constantly, but what would that look like if we hold onto the voice?
"I would be very surprised if that would be a big thing – I think it would be just a curiosity in the music industry. An artist is so much more than their voice.
"There is no shortage of talent. It's not that we don't have talented singers and musicians who can start new careers."
Creating your own 'digital twin'
Despite the legal ambiguity and ethical questions around audio deepfakes, some are beginning to see potential for the technology's use in the music industry.
Some believe record companies will use deepfakes to exploit a new kind of commercial immortality for their artists, amid the increasing prevalence of things such as hologram concerts for musicians who are no longer alive.
There is already a deepfake audio platform called Marvel.ai, where celebrities can license AI-generated clips of their own voice, and musicians are beginning to think about how they can control their own digital likeness.
In July, American electronic musician, singer and composer Holly Herndon released a "digital twin" of herself called Holly+.
Anyone can upload audio to the Holly+ platform, which turns that audio into a new composition using a deepfake version of Herndon's voice.
The system is trained using multiple hours of Herndon's raw vocals, according to its developers, Never Before Heard Sounds, who hope to create similar tools for other artists in the future.
Death steering future of audio deepfakes
Herndon's Holly+ platform is governed by a Decentralised Autonomous Organisation, or DAO.
It's a group of friends, family, collectors and other artists who Herndon allows to vote on which new works using her voice model can be sold, even after she dies.
Herndon takes a 10 per cent cut of profits from the Holly+ compositions which the DAO agrees can be sold, for the use of her voice.
The rights to profitable music catalogues are often passed to family estates when an artist dies. Herndon says such estates can focus too much on "short-term gains", so the Tennessee native is taking a more collaborative approach.
Herndon said that, because of technological advances, she believed creating convincing voices using AI would soon become standard practice for artists and celebrities.
"In stepping in front of a complicated issue, we think we have found a way to allow people to perform through my voice, reduce confusion by establishing official approval, and invite everyone to benefit from the proceeds generated from its use.