Post by bulkey on Dec 27, 2023 10:34:45 GMT -5
Some 30 years ago, I was raising money for my academic department in D.C. and they had me talk to a prominent lawyer-alum. He had recently moved into the field of digital copyright--at the time it was all about copyrighting code. Sounded a bit "niche" to me then. Now, it's among the most important questions facing us. In the academy of course we struggle with AI as plagiarism, but I never appreciated how copyright infringement of materials like the Times can redirect profits from their original authors. Entire article easily accessed at: www.bbc.com/news/technology-67826601

US news organisation the New York Times is suing ChatGPT-owner OpenAI over claims its copyright was infringed to train the system.
The lawsuit, which also names Microsoft as a defendant, says the firms should be held responsible for "billions of dollars" in damages. ChatGPT and other large language models (LLMs) "learn" by analysing a massive amount of data often sourced online. ....
The lawsuit claims "millions" of articles published by the New York Times were used without its permission to make ChatGPT smarter, and claims the tool is now competing with the newspaper as a trustworthy information source.
It alleges that when asked about current events, ChatGPT will sometimes generate "verbatim excerpts" from New York Times articles, which cannot be accessed without paying for a subscription.
According to the lawsuit, this means readers can get New York Times content without paying for it - meaning it is losing out on subscription revenue as well as advertising clicks from people visiting the website. ....

NY Times itself has a longer article here: www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html

Among interesting points: The Times is the first major American media organization to sue the companies, the creators of ChatGPT and other popular A.I. platforms, over copyright issues associated with its written works. .... The actress Sarah Silverman joined a pair of lawsuits in July that accused Meta and OpenAI of having “ingested” her memoir as a training text for A.I. programs. Novelists expressed alarm when it was revealed that A.I. systems had absorbed tens of thousands of books, leading to a lawsuit by authors including Jonathan Franzen and John Grisham. Getty Images, the photography syndicate, sued one A.I. company that generates images based on written prompts, saying the platform relies on unauthorized use of Getty’s copyrighted visual materials.
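For the technically inclined, the "verbatim excerpts" allegation is ultimately a measurable claim: you can check how long a run of consecutive words a model's output shares with an article. A minimal sketch using Python's standard library; the two sample texts and the function are my own illustration, not anything from the lawsuit or from OpenAI.

```python
from difflib import SequenceMatcher

def longest_verbatim_run(article: str, output: str) -> list[str]:
    """Return the longest run of consecutive words shared by the two texts."""
    a, b = article.lower().split(), output.lower().split()
    match = SequenceMatcher(None, a, b, autojunk=False).find_longest_match(
        0, len(a), 0, len(b)
    )
    return a[match.a : match.a + match.size]

article = "The committee voted on Tuesday to approve the measure after hours of debate."
output = "Sources report the committee voted on Tuesday to approve the measure, surprising analysts."
run = longest_verbatim_run(article, output)
print(f"{len(run)} consecutive shared words: {' '.join(run)}")
```

A real analysis would run something like this across millions of article/output pairs and report the distribution of run lengths; the complaint's exhibits reportedly include side-by-side comparisons of exactly this kind.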
Post by swash on Dec 27, 2023 13:12:33 GMT -5
Fascinating. Was the material used for training properly purchased, or accessed for free via the open internet? If someone with an eidetic memory were to read all of those sources legally, no one can fault them for having remembered. Not sure they'll win that point.

Can they demonstrate that their material represents a substantive enough percentage of the total training base to render damage? Those bots reviewed an enormous pile of material ... far greater than all of us together could consume in our entire lifetimes. So the Times may represent a millionth of the total training material, maybe less. How would we even measure that percentage: by words, topics, articles, sentences, megabytes, time to process? (A toy comparison follows at the end of this post.) If they prove even, say, one hundredth of one percent ... will those who despise that particular publication for its clear political bent scream for equal time/representation?

Precedent already permits: a literary critique, a synopsis, a description of the text, a derivative of the theme using different characters or circumstances, and many more uses of the original outside of direct citation.

NYT articles by their nature have a measurable value at publication. For the vast majority of their content, that value drops precipitously in the following weeks and quickly approaches zero asymptotically. Anyone can read older articles today ... for free ... on the NYT website. Example selected at random: Top Stories of 2017.

The attribution portion can/should be addressed. Using someone else's words is not allowed, and AI training will need to include recognition and understanding of same. Trouble for the NYT is that they would have to prove that there is competitive gain to be had from the unattributed works. A high bar. If a student (or university president) uses unattributed text, that person can be punished by consumers of the derivative works, but rarely by the original creators. The Times would need to demonstrate that the plagiarist both gained from the action AND that said action harmed the copyright owner. Perhaps their best hope, but not an easy row to hoe.

Finally, in order to win the above battle, the AI would need to be shown to be in competition with the plaintiff. And the ground gets really squirrelly there, because the publisher may well use GPT to produce articles it then publishes ... for profit.
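On the measurement question, here is a toy sketch showing how the same "share of the training base" comes out very differently depending on the yardstick. Every figure below is invented for illustration; nobody outside the labs knows the real training-mix proportions.

```python
# Toy comparison: the Times' share of a hypothetical training corpus,
# measured three different ways. All numbers are made up for illustration.
corpus = {"articles": 3_000_000_000, "words": 500_000_000_000, "megabytes": 4_000_000}
times  = {"articles":    10_000_000, "words":   8_000_000_000, "megabytes":    60_000}

for unit in corpus:
    share = times[unit] / corpus[unit]
    print(f"Times share by {unit}: {share:.3%}")
```

With these invented numbers the answer ranges from a third of a percent (by article count) to over one and a half percent (by words or megabytes): the choice of unit alone swings the figure by a factor of five.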
Post by bulkey on Dec 27, 2023 13:24:37 GMT -5
Such a good point, swash, about the diminishing value of material over time (what in options trading is called theta). And yet, authors of books can easily gain remuneration for copyrighted materials beyond "fair use." So, what is the worth of news, either in print or digital? How long does it retain that value? Presumably that is the harm. The gain is that ultimately ChatGPT is charging a fee.
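To push the theta analogy: if an article's commercial value decays roughly exponentially after publication, a toy calculation shows how quickly it approaches zero. The starting value and the two-week half-life below are pure assumptions, chosen only to illustrate the shape of the curve.

```python
import math

# Toy model of news value decay, echoing option theta. The 100-unit
# starting value and 14-day half-life are assumptions, not data.
V0 = 100.0
half_life_days = 14.0
decay = math.log(2) / half_life_days   # continuous decay rate

for day in (0, 7, 30, 90, 365):
    value = V0 * math.exp(-decay * day)
    print(f"day {day:3d}: value = {value:8.4f}")
```

Under those assumptions an article retains roughly a fifth of its value after a month and is effectively worthless within a year, which is exactly the asymptote swash described.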
Post by bulkey on Dec 28, 2023 8:49:16 GMT -5
Noah Feldman, who is really, really smart, wrote a long piece on this in today's Bloomberg. Here: www.bloomberg.com/opinion/articles/2023-12-28/the-new-york-times-has-an-edge-in-suit-against-openai-microsoft?srnd=premium&sref=UGneIVZ3

Fair use on this requires me to cut more than I want, but this is his final argument (edited). LLM = large language model.

But Microsoft and OpenAI will have a hard time refuting the final point — that their product, which relies on newsgathering businesses like the Times, will harm those businesses. ChatGPT and other LLMs cannot go out into the world to gather and vet new facts. They are restricted, for the foreseeable future, to “learning” from information that has already been published.
It follows that for LLMs to provide useful information, someone else — that is, a human LLM — must first gather the information, ascertain that it is accurate, and publish it. This is the essence of newsgathering. It’s costly to get it right.
What’s more, to know that we can rely on news, we need it to come from an institution that we can trust — one with a track record and a reputation it has a business interest in upholding. Otherwise, we would not have news. We would have an iterative echo chamber untethered from reality.
Here is where the fundamental public interest in the maintenance of the free press becomes relevant to the fair use question. If you can get information more cheaply from an LLM than from the New York Times, you might drop your subscription. But if everyone did that, there would be no New York Times at all. Put another way, OpenAI and Microsoft need the New York Times and other news organizations to exist if they are to provide reliable news as part of their service. Rationally and economically, therefore, they ought to be obligated to pay for the information they are using. .... The courts will need to be attuned to all this. If they don’t get it right, Congress will have to act. The news infrastructure is already tottering. If we destroy it altogether, democracy will be the loser.
Post by swash on Dec 28, 2023 12:50:48 GMT -5
Two thoughts:

1. The crux of this argument seems to assume that LLMs will be used primarily for reporting current events (thus the differentiation from newsgathering). That's like saying science infringes upon weather forecasting. Generative AI (the LLM) is fundamentally an effort to introduce contextual language to systems. Until recently, one might dictate to a phone and it would do a fair job at interpreting the words, but Siri has no context and cannot know which homonym to select; it might intersperse items humorously because it cannot know that a dialogue about women and basketball is unlikely to include kitchen implements or exoplanets. But ChatGPT is vastly broader. It is created to understand that context. Think of it as moving from interacting with a 2-year-old to a precocious middle-schooler with a huge vocabulary. (A toy sketch of context-based word selection follows at the end of this post.) Using it for ditching schoolwork or for writing articles is not the purpose. Those are simply the first (least creative) applications of the new capabilities.

2. I fear that this will spur posts we do not need or want, but we cannot consider the statement above without it ... The other point ... news you can trust ... is a critical societal issue. By every measure, those outlets (on all sides, certainly including the NYT) are growing more and more slanted over time. Gone are the days of "the most trusted man [person] in America" coming from ANY news outlet. The very thought is absurd today. Any wonder, then, that people choose to "get all of their news" from social media? It should be no surprise that traditional sources are crumbling from sheer disinterest.
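Here is the promised toy sketch of context-based word selection. The homophone pair and the context vocabularies are invented for the example, but it shows the kind of decision a contextless dictation system fumbles and a context-aware model gets right.

```python
# Invented example: a dictation system hears a sound that could be either
# word and picks the candidate whose vocabulary overlaps the conversation.
HOMOPHONES = {
    "flower|flour": {
        "flower": {"garden", "bloom", "petal", "vase", "spring"},
        "flour":  {"baking", "dough", "recipe", "kitchen", "cup"},
    }
}

def pick_word(sound: str, context: set[str]) -> str:
    """Choose the candidate sharing the most vocabulary with the context."""
    candidates = HOMOPHONES[sound]
    return max(candidates, key=lambda w: len(candidates[w] & context))

print(pick_word("flower|flour", {"women", "basketball", "season", "spring"}))  # flower
print(pick_word("flower|flour", {"recipe", "dough", "kitchen"}))               # flour
```

A real LLM does this implicitly across its whole vocabulary and context window rather than with hand-built word lists, which is the leap from the 2-year-old to the middle-schooler.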
Post by knightsbridgeaz on Dec 28, 2023 15:26:03 GMT -5
Your 2nd point is well taken - I hate the amount of slant and opinion passing for news that I read on what are purportedly "news" sites. Traditional media, I think, generally gets the facts mostly right; the slant lies in whether or not they report something at all and in the context they give it.

The only reason I'm posting is the second part of your point, people choosing to "get all of their news" from social media - getting any news from social media is an issue. Of course, trusting any source is getting harder. If you saw the piece on AI done by the weekly program Anderson Cooper hosts, they alternated an AI-generated Anderson Cooper with the real one in the intro (done by someone who is going into the field but had just graduated from high school). It was totally scary.
Post by UHF on Jan 23, 2024 14:31:24 GMT -5
LISP/Lisp in college in the early 1980s.