A prominent Australian novelist says writers face a "David and Goliath" battle to regain ownership of their work after thousands of books were "stolen" by a Silicon Valley-based AI aggregator.
Esperance author Fleur McDonald has written 21 books that have gone on to sell more than 750,000 copies.
But those decades of work were quickly nullified after she learned more than half of her books had been uploaded to an AI training program without her consent.
The revelation came as a shock, but turned to anger when Ms McDonald realised the scale of the "theft".
"It took a while for it to filter through to understand the ramifications of it," she said.
"I was just angry that it could happen, and then when I realised the extent of it, I was angry for a lot of the [other] authors."
The database in question, Books3, contains more than 180,000 works of literature, taken without permission, and used to develop AI linguistic software.
Ms McDonald said 12 of her 21 books had been uploaded, but it was likely her whole catalogue had been used.
"There's still quite a few websites out there — there's no transparency in all of this, so they haven't let people know what books they're using," she said.
"We suspect that the whole lot has been taken."
Ms McDonald said the AI capture had become so pervasive that she believed early transcripts of her next book, due for release at the end of October, may have already been ingested.
"I think AI is such a fast-evolving beast that we possibly didn't know what was coming," she said.
Tech in new ethical territory
Dozens of other top Australian authors including Tim Winton, Miles Franklin and Jane Harper have also been swept up in the Silicon Valley-based scandal, raising questions about the ethics of AI companies and the people who run them.
RMIT Professor of Information Sciences Lisa Given said the rapid development in AI had exposed long-overlooked ethical shortfalls within the industry.
"The computing world, traditionally, has not had a history necessarily of training its scientists around ethical practices — I think they now have some new territory," she said.
"I don't believe that these creators are Machiavellian or that they're setting out to do harm.
"But, once you move that approach into society at large, you are going to have people that use those systems, not with social good at the front of their minds."
Professor Given said authors seeking remuneration or to stop tech giants using their work faced an uphill battle.
"The genie's out of the bottle, as they say; it's very challenging when things are already being used, and people start to get creative about what those outputs can be," she said.
"Unfortunately, there's often a huge drive in human nature towards, at least in our current situation, people wanting to make money, they want to create products.
"Sometimes they're happy to turn a blind eye to who might be hurt in the process or for constraints and controls to be put in place."
Responding to the widespread co-opting of copyrighted work has been complicated by the fact that Australian copyright laws are unenforceable in other countries, such as the US.
Books3 creator Shawn Presser told The Atlantic he developed the dataset as a training resource for other developers to compete with tech giants such as OpenAI.
OpenAI, the developer of ChatGPT, is believed to have trained the AI system using two mystery datasets known as Books1 and Books2.
Federal Arts Minister Tony Burke was contacted and asked how the government would help authors caught up in the scandal, but did not respond.
US lawsuit to set precedent
Australian Publishers Association policy and government relations manager Stuart Glover said the international nature of the dispute meant there were few channels available to those seeking reparation.
"What Australian publishers face in terms of Australian-authored books, and also in terms of international-authored books that have been published in Australian editions in Australia, is a kind of jurisdictional problem about how to take action," Dr Glover said.
"Some AI machine operators are saying there hasn't been any illegal copying of material, so there's a contestation of the claim that there's been a copyright breach."
He said lawsuits being brought against AI and tech companies in the US by American authors would test the limits of what protections were available, and what could be done for authors closer to home.
"The outcomes of those suits will go a long way to determining what the next steps are and, for example, what the industry and authors might be able to do here, and what actions might be appropriate for government to take," he said.