AI is Colliding with The Law at an Increasing Rate
Judges Hate ChatGPT Technology… Well, Not All of Them
There are judges in England. Yes, they have them. At a recent conference, a judge presenting on a topic unrelated to AI admitted that he had recently used ChatGPT to assist in drafting an eventually published legal opinion. This was not portrayed as a mistake or some concealed, now-revealed improper use of the tool. Instead, the judge marveled at how efficiently it let him both summarize topics for his opinion and draft portions consistent with his intended decision.
Unlike the lawyer in the U.S. who recently, quite unintentionally, filed a pleading filled with hallucinated case citations, this judge was fully aware of what he was doing. He already knew the legal basis for his decision, could verify the portion he used ChatGPT for, and was not relying on the tool to supply answers he did not have.
According to the Law Society Gazette, Lord Justice Birss spoke about AI, ChatGPT and generative large language models at a conference, saying: “I think what is of most interest is that you can ask these large language models to summarise information.”
The judge was enthusiastic in his praise of the utility of the tool to make judges more efficient in researching, preparing and writing their opinions.
ChatGPT falsely listed an American law professor as one of many legal scholars who had sexually harassed someone. In a case of damaging hallucination, the ChatGPT response cited a non-existent Washington Post article.
AI Can Act. This Is Precisely The Problem.
As many of you know, the unions representing Hollywood writers and actors have been on strike for months now. The most noticeable effect of that strike thus far has been the cancellation of late night comedy shows (among others, which are broadcasting re-runs or other content entirely). British actor Stephen Fry recently spoke at a conference, referencing his membership in that union, the strike and AI.
Among those warning about the technology’s potential to cause harm is British actor and author Stephen Fry, who told an audience at the CogX Festival in London on Thursday about his personal experience of having his identity digitally cloned without his permission.
“I’m a proud member of [actors’ union SAG-AFTRA]. As you know, we’ve been on strike for three months now. And one of the burning issues is AI,” Fry said.
The actors have a reasonable career-, industry-, even paradigm-ending concern: the possibility that notoriously efficiency-minded studios will use AI to digitally replicate their images without fairly compensating them for the use of their likenesses. (Can you hear the studios' lawyers scrambling to rewrite actor contracts for all future movies, television shows and commercials?)
One argument is that those paying actors can simply structure the actors' pay as X amount, covering both their work on the given movie or show and the studio's right to use AI to represent that actor in that movie or show indefinitely into the future. (Remember those "contracts in perpetuity" from first-year law school? All those arguments when we thought we knew so much, and then, well, you got real familiar with unconscionable contracts. Good times.)
Fry went on to give a powerful example. He played a clip from a documentary narrated in his own voice.
“I said not one word of that—it was a machine,” he said. “They used my reading of the seven volumes of the Harry Potter books, and from that dataset an AI of my voice was created, and it made that new narration.” He pointed out that the audio of his voice was not a creative mashup of recordings of his actual voice. It was the creation of an AI algorithm that was fed typewritten text and was able to mimic all the familiar pauses, intonations and other subtle sounds that we are used to hearing in naturally spoken language.
Again with the trade-offs. The sympathy for actors, whose thousands of hours of voice and video depictions sit available to studios as rich datasets about their entire personhood, ripe for hijacking by an AI algorithm and a studio owner, is real. However, simply taking a guillotine to the whole enterprise seems at once too broad a solution and impossible to implement. Authors are now weighing in.
What Actually Happens To Create LLMs?
The most recent class action lawsuit against the usual suspects in the world of LLMs, OpenAI, Microsoft and others, was filed recently in the United States District Court for the Southern District of New York. Up to now, the most well known of those suing for the use of their content was author/comedian Sarah Silverman. This most recent lawsuit was filed by the following plaintiffs, as listed in the complaint: AUTHORS GUILD, DAVID BALDACCI, MARY BLY, MICHAEL CONNELLY, SYLVIA DAY, JONATHAN FRANZEN, JOHN GRISHAM, ELIN HILDERBRAND, CHRISTINA BAKER KLINE, MAYA SHANBHAG LANG, VICTOR LAVALLE, GEORGE R.R. MARTIN, JODI PICOULT, DOUGLAS PRESTON, ROXANA ROBINSON, GEORGE SAUNDERS, SCOTT TUROW, and RACHEL VAIL, individually and on behalf of others similarly situated.
This is not only a legitimate complaint to have filed given the recent spate of such claims, but one that certainly bolsters the arguments to be made. It is unlikely that any judge, attorney or other member of the staff of this district court has not read more than one of these authors. These are names we all recognize. They cannot be dismissed as merely some unknown content creators claiming their images were used to train image-generating AI tools like MidJourney or DALL-E.
The Core Claim
These authors and the guild itself argue that the only means by which LLMs, such as ChatGPT, can be trained, is by copying their works as part of their datasets. The complaint notes that OpenAI does not specify the contents of its datasets, but has publicly admitted they include copyrighted works.
This is the esoteric question. The parsing here will undoubtedly border on the linguistic equivalent of Twister. What does it mean to "copy" something? I have addressed what seems an apt analogy to this in the past. If a struggling author purchases all of Grisham's books, reads them all, memorizes huge passages, evaluates how he sets scenes, builds characters, creates narrative arcs, etc., and then writes a unique book using these same techniques, such that his book appears to be "in the style" of Grisham, what can Grisham do about that? Can he claim that the new author "copied" his books by reading and memorizing them?
What if it turns out that OpenAI trashes its current model and simply goes out and purchases a copy of each of the copyrighted works that were formerly in its dataset? Can it not read those works now that they have been purchased? And is having a human read a copyrighted work that they purchased, obviously not a wrong under the law, different from putting a machine to that task instead?
In a fairly recent case on web scraping, a federal court ruled that the First Amendment protected a company using a web scraping tool to gather publicly available LinkedIn information (provided that use did not degrade the LinkedIn service). The court analogized this to a reporter at a public agency meeting writing notes in a notepad versus recording the meeting with his phone. The means of recording the otherwise publicly available information did not change the character of that information from First Amendment material to something proprietary. The interesting thing about this case in the AI context is that, undoubtedly, any publicly available articles, quotes and posts on LinkedIn are protected by copyright. Can the web scraping company be required to compensate all persons whose copyrighted posts it scraped (i.e., copied, if that is considered copying)?
In reality, copyright law is not designed to prevent or even punish someone who purchases a book, proceeds to put each page on a copier and print them out… and then does nothing else. No one cares about the waste of time involved in the purposeless exercise of copying a book exclusively for personal use (i.e., no sale, dissemination or even communication that you have copied it), even if the book is copyrighted. The point of copyright law is to protect the ability of the creator to exclusively monetize their work for a limited period of time. These lawsuits will have another hurdle to overcome, as the Supreme Court has repeatedly stated its reluctance to extend copyright coverage to conduct not explicitly addressed in the relevant statutes:
The judiciary's reluctance to expand the protections afforded by the copyright without explicit legislative guidance is a recurring theme. See, e. g., Teleprompter Corp. v. Columbia Broadcasting System, Inc., 415 U. S. 394 (1974); Fortnightly Corp. v. United Artists Television, Inc., 392 U. S. 390 (1968); White-Smith Music Publishing Co. v. Apollo Co., 209 U. S. 1 (1908); Williams & Wilkins Co. v. United States, 203 Ct. Cl. 74, 487 F. 2d 1345 (1973), aff'd by an equally divided Court, 420 U. S. 376 (1975). Sound policy, as well as history, supports our consistent deference to Congress when major technological innovations alter the market for copyrighted materials. Congress has the constitutional authority and the institutional ability to accommodate fully the varied permutations of competing interests that are inevitably implicated by such new technology. Sony Corp. of America v. Universal City Studios, Inc., 464 US 417 - Supreme Court 1984.
Is the use of copyrighted works to build LLMs explicitly prohibited by any federal copyright law? No. The argument of these authors and others then has to be that the current notion of what the act covers encompasses the use to which the Defendants put their works. We shall see.
One Last Argument - The Irony
My ChatGPT Prompt: “Write a paragraph about why it is a violation of copyright for LLMs to use copyrighted works in its datasets.”

“Using copyrighted works in datasets for Language Learning Models (LLMs) without proper authorization constitutes a violation of copyright law because it involves the reproduction, distribution, and potentially the public display or performance of the copyrighted material. Copyright law is designed to protect the rights of creators and to incentivize the production of new creative works by granting exclusive rights to authors and creators. By incorporating copyrighted works into its datasets without permission, an LLM would be bypassing the established economic system that rewards creators for their efforts. Furthermore, the unauthorized use could harm the market for the original work or deprive the copyright holder of potential revenue. It's essential that companies and individuals respect these rights to maintain a fair and vibrant creative ecosystem.” — ChatGPT SEP 24, 2023.
The next logical prompt: “Write a paragraph about why it is not a violation of copyright for LLMs to use copyrighted works in its datasets.”
“In some arguments, incorporating copyrighted works into datasets for Language Learning Models (LLMs) does not necessarily constitute a violation of copyright law, especially if the usage is transformative, non-commercial, or falls under "fair use" provisions. For one, LLMs often process vast amounts of data to generalize patterns in language, and they don't typically reproduce, distribute, or publicly display the original works in their entirety. Instead, they extract and internalize statistical information to generate new content. This kind of use can be seen as transformative rather than duplicative. Moreover, the primary purpose of such models is educational or research-based, which can sometimes be exempt from typical copyright restrictions. Also, the inclusion of copyrighted material in these datasets does not usually harm the market value of the original work, as they aren't used as direct substitutes. It's a nuanced issue, and while careful consideration is essential, there are valid reasons to argue that LLM use might not infringe upon copyright laws.”
Who will win, ChatGPT or ChatGPT?
One last note for those of you who are software developers (current and former) and also legally trained: I noticed a user on Twitter mention that the “chat with your PDF documents” concept is becoming the “Hello World” of LLM development. Y’all know what I mean. Until next time….
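For the developers in the room, the "chat with your PDF" pattern usually boils down to three steps: chunk the document, retrieve the chunk most relevant to the question, and wrap that chunk into a prompt for an LLM. Here is a deliberately toy sketch of just the retrieval-and-prompt step, assuming plain text instead of a real PDF and crude word overlap instead of vector embeddings; all the function names are illustrative, not from any particular library.

```python
import re

def chunk(text):
    """Naive chunking: one chunk per sentence."""
    return [s.strip() for s in text.split(". ") if s.strip()]

def tokens(s):
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"\w+", s.lower()))

def retrieve(question, chunks):
    """Pick the chunk sharing the most words with the question.
    Real systems use embedding similarity instead of word overlap."""
    q = tokens(question)
    return max(chunks, key=lambda c: len(q & tokens(c)))

def build_prompt(question, document_text):
    """Assemble the prompt that would be sent to an LLM."""
    context = retrieve(question, chunk(document_text))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

doc = (
    "The plaintiffs allege that training datasets include copyrighted works. "
    "The complaint was filed in the Southern District of New York. "
    "Fair use will likely be the central defense."
)
print(build_prompt("Where was the complaint filed?", doc))
```

A production version swaps in a PDF text extractor, an embedding model for retrieval, and an API call to a model like GPT-4 in place of the `print`, but the shape of the program, and why it has become a "Hello World", is exactly this small.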