>Chris Lehane, OpenAI’s vice president of global affairs, said in an interview that the US AI Safety Institute – a key government group focused on AI – could act as the main point of contact between the federal government and the private sector. If companies work with the group voluntarily to review models, the government could provide them “with liability protections including preemption from state based regulations that focus on frontier model security,” according to the proposal.
Given OpenAI's history and relationship with the "AI safety" movement, I wouldn't be surprised to find out later that they also lobbied for the same proposed state-level regulations they're seeking relief from.
With regulatory capture, at least the companies that pushed for the regulation in the first place comply with it (and hopefully the regulation is not worthless). This behaviour by ClosedAI is even worse: push for the regulation, then push for the exemption.
Regulatory capture is usually the company pushing for regulations that align with the business practices they already implement and would be hard for a competitor to implement. For example, a car company that wants to require all other manufacturers to build and operate wind tunnels for aerodynamics testing. Or, more realistically, regulations requiring third-party sellers for vehicles.
I haven't heard that definition of "Regulatory Capture" before. I mostly thought it was just when the regulators are working for industry instead of the people. That is, the regulators have been "Captured." The politicians who nominate the regulatory bodies are paid off by industry to keep it that way.
Regulators can require all manufacturers to build and operate wind tunnels for aerodynamics testing, or alternatively allow someone from South Africa to be president.
That's the first time I've ever heard someone offer this unusual and very specific definition. It's almost always much simpler: you get favorable regulatory findings and exemptions by promising jobs or other benefits to the people doing the regulating. It's not complicated; it's just bribery with a different name.
We all predicted this would happen but somehow the highly intelligent employees at OpenAI getting paid north of $1M could not foresee this obvious eventuality.
Trump should have a Most Favored Corporate status: each corporation in a vertical can compete for favor, and the one that wins gets to be "teacher's pet" when it comes to exemptions, contracts, trade deals, priority in resource-rights access, etc.
That's in progress. It's called the MAGA Parallel Economy.[1]
Donald Trump, Jr. is in charge. Vivek Ramaswamy and Peter Thiel are involved.
Azoria ETF and 1789 Capital are funds designed to fund MAGA-friendly companies.
But this may be a sideshow. The main show is US CEOs sucking up to Trump, as happened at the inauguration. That parallels something Putin did in 2000.
Putin called in the top two dozen oligarchs, and told them "Stay out of politics and your wealth won't be touched." "Loyalty is what Putin values above all else." Three of the oligarchs didn't do that. Berezovsky was forced out of Russia. Gusinsky was arrested, and later fled the country. Khodorkovsky, regarded as Russia's richest man at the time (Yukos Oil), was arrested in 2003 and spent ten years in jail. He got out in 2013 and left for the UK. Interestingly, he was seen at Trump's inauguration.
Why are these idiots trying to ape Russia, a dumpster fire, to make America great again?
If there’s anyone to copy it’s China in industry and maybe elements of Western Europe and Japan in some civic areas.
Russia is worse on every metric, even the ones conservatives claim to care about: lower birth rate, high divorce rate, much higher abortion rate, higher domestic violence rate, more drug use, more alcoholism, and much less church attendance.
> That parallels something Putin did in 2000. Putin called in the top two dozen oligarchs, and told them "Stay out of politics and your wealth won't be touched."
Can you explain why this is associated with fascism specifically, and not any other form of government which has high levels of oligarchical corruption (like North Korea, Soviet Russia, etc).
I am not saying you're wrong, but please educate me: why is this form of corruption/cronyism unique to fascism?
It might be basic, but I found the Wikipedia article to be a good place to start:
> An important aspect of fascist economies was economic dirigism,[35] meaning an economy where the government often subsidizes favorable companies and exerts strong directive influence over investment, as opposed to having a merely regulatory role. In general, fascist economies were based on private property and private initiative, but these were contingent upon service to the state.
It's rather amusing reading the link on dirigisme given the context of its alleged implication. [1] A word which I, and I suspect most, had never heard before.
---
The term emerged in the post-World War II era to describe the economic policies of France which included substantial state-directed investment, the use of indicative economic planning to supplement the market mechanism and the establishment of state enterprises in strategic domestic sectors. It coincided with both the period of substantial economic and demographic growth, known as the Trente Glorieuses which followed the war, and the slowdown beginning with the 1973 oil crisis.
The term has subsequently been used to classify other economies that pursued similar policies, such as Canada, Japan, the East Asian tiger economies of Hong Kong, Singapore, South Korea and Taiwan; and more recently the economy of the People's Republic of China (PRC) after its economic reforms,[2] Malaysia, Indonesia[3][4] and India before the opening of its economy in 1991.[5][6][7]
It's a poor definition. The same "subsidization and directive influence" applies to all of Krugman's Nobel-winning domestic champions and emerging-market development leaders, in virtually all 'successful' economies. It also applies in the context of badly run, failed, and failing economies. Safe to say this factor is only somewhat correlated. Broad assertions are going to be factually wrong.
The key element here is that the power exchange in this case goes both ways. The corporations do favors for the administration (sometimes outright corrupt payments, and sometimes useful favors, like promoting certain kinds of content in the media, or firing employees who speak up). And in exchange the companies get regulatory favors. While all economic distortions can be problematic (national-champion companies probably have tradeoffs), this is a form of distortion that hurts citizens both by distorting the market, and also by distorting the democratic environment through which citizens might correct the problems.
All snakes have scales, so there is a 100% correlation between being a snake and having scales.
That does not imply that fish are snakes. Nor does the presence of scaled fish invalidate the observation that having scales is a defining attribute of snakes (it's just not a sufficient attribute to define snakes).
You wrote some smart stuff back in the day, so this comment is puzzling. If all snakes have scales, that doesn't mean the correlation is 100%.
Imagine there are three equally sized groups of animals: scaly snakes, scaly fish, and scaleless fish. So we have three data points (1,1) (0,1) (0,0) with probability 1/3 each. Some calculations later, the correlation between snake and scaly is 1/2.
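A quick way to verify that arithmetic is a sketch like the following (assuming numpy is available):

```python
import numpy as np

# One data point per animal group, equally weighted:
# (snake, scaly) = (1,1) scaly snakes, (0,1) scaly fish, (0,0) scaleless fish
snake = np.array([1, 0, 0])
scaly = np.array([1, 1, 0])

# Pearson correlation between "is a snake" and "has scales"
print(np.corrcoef(snake, scaly)[0, 1])  # 0.5
```

So "all snakes have scales" yields a correlation of 1/2 in this toy world, not 100%.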
So then you agree that the original post that called this "textbook fascism" was wrong, as this is just one very vague and only slightly correlated property.
Yeah, fascism, communism, etc. aren't abstract ideals in the real world. Instead they are self-reinforcing directions along a multidimensional political spectrum.
The scary thing with fascism is just how quickly it can snowball because people at the top of so many powerful structures in society benefit. US Presidents get a positive spin by giving more access to organizations that support them. Those kinds of quiet back room deals benefit the people making them, but not everyone outside the room.
That's not fascism; that is the dysfunctional status quo in literally every single country in the world. Why do you think companies and billionaires dump what amounts to billions of dollars on candidates? Oftentimes it's not even this candidate or that, but both!
They then get access, get special treatment, and come out singing the praises of [errr.. what's his name again?]
It's not fascism on its own, but it's representative of the forces that push society toward fascism.
Start looking and you'll find powerful forces shaping history. Sacking a city was extremely profitable throughout antiquity, which then pushed cities to develop defensive capabilities, which then…
In the Bronze Age trade was critical, as having copper ore alone wasn't nearly as useful as having copper and access to tin. Iron, however, is found basically everywhere, as were trees.
Such forces don’t guarantee outcomes, but they have massive influence.
Socialism and communism are state ownership. Fascism tends toward private ownership and state control. This is actually easier and better for the state. It gets all the benefit and none of the responsibility and can throw business leaders under the bus.
All real world countries have some of this, but in fascism it’s really overt and dialed up and for the private sector participation is not optional. If you don’t toe the line you are ruined or worse. If you do play along you can get very very rich, but only if you remember who is in charge.
“Public private partnership” style ventures are kind of fascism lite, and they always worried me for that reason. It’s not an open bid but a more explicit relationship. If you look back at Musk’s career in particular there are ominous signs of where this was going.
The private-industry side of fascist corporatism is very similar to all kinds of systematic state-industry cronyism, particularly in other authoritarian systems that aren't precisely fascist (and named systems of government are just idealized points on the multidimensional continuum on which actual governments are distributed, anyway). What distinguishes fascism in particular is the combination of its form of corporatism with xenophobia, militaristic nationalism, etc., not the form of corporatism alone.
I think it is associated with fascism, just from the other party.
This is a pretty common fascist practice, used all over Europe and in many left-leaning countries: with regulations, governments make doing business at large scale impossible, and then give the largest players exemptions, subsidies, and so on. Governments gain enormous leverage to ensure corporate loyalty, silence dissenters, and combat opposition, while the biggest players secure their place at the top and gain protection from competitors.
So the plan was: push regulations, then dominate the competitors via exemptions from those regulations. But the fascists lose the election, the regulations threaten to start working in a non-discriminatory manner, and that simply hinders business.
It does have an effect; it is just a slow and grinding process. And people have screwy senses of proportion - like old mate mentioning insider trading. Of all the corruption in the US Congress insider trading is just not an issue. They've wasted trillions of dollars on pointless wars and there has never been a public accounting of what the real reasoning was. That sort of process corruption is a much bigger problem.
A great example - people forget what life was like pre-Snowden. The authoritarians were out in locked ranks pretending that the US spies were tolerable - it made any sort of organisation to resist impossible. Then one day the parameters of the debate get changed and suddenly everyone is forced to agree that encryption everywhere is the only approach that makes sense.
How is it any more accessible now than it was before? Don't you have to fact-check everything it says anyway, effectively doing the research you'd do without it?
I'm not saying LLMs are useless, but I do not understand your use case.
I worry I'm just trying too hard to make it make sense, and this is a TimeCube [0] situation.
The most charitable paraphrase I can come up with is: "Bad people can't use LLMs to hide facts, hiding facts means removing source materials. Math doesn't matter for politics, which are mainly propaganda."
However even that just creates contradictions:
1. If math and logic are not important for uncovering wrongdoing, why was "tabulation" cited as an important feature in the first post?
2. If propaganda dominates other factors, why would the (continued) existence of the Internet Archive be meaningful? People will simply be given an explanation (or veneer of settled agreement) so that they never bother looking for source-material. (Or in the case of IA, copies of source material.)
OMG Thank you - hilarious. TimeCube is a legend...
---
I am saying that AI can be used very beneficially to do a calculated dissection of the Truth of our Political structure as a Nation and how it truly impacts an Individual/Unit (person, family) -- and do so where we can get discernible metrics and utilize AI's understanding of the vast matrices of such inputs to provide meaningful outputs. Simple.
EDIT @MegaButts;
>>Why is this better than AI
People tend to think of AI in two disjointed categories; [AI THAT KNOWS EVERYTHING] v [AI THAT CAN EASILY SIFT THROUGH VAST EVERYTHING DATA GIVEN TO IT AND BE COMMANDED TO OUTPUT FINDINGS THAT A HUMAN COULDN'T DO ALONE]
---
Which do you think I refer to?
AI is transformative (pun intended) -- in that it allows for very complex questions to be asked of our very complex civilization in a simple way, in the EveryMan's hands...
Why is AI better for this than a human? We already know AI is fundamentally biased by its training data in a way where it's actually impossible to know how/why it's biased. We also know AI makes things up all the time.
If you don't understand the benefit of an AI augmenting the speed and depth of ingestion of Domain Context into a human mind... then go play with chalk. I, as a smart Human, operate on lots of data... and AI and such has allowed me to consume it.
The most important medicines in the world are for MEMORY retention...
If you'd like a conspiracy: eat too much aluminum to give you Alzheimer's ASAP so your generation forgets... (based though. Hope you understand what I am saying.)
Can anyone say which of the LLM companies is the least "shady"?
If I want to use an LLM to augment my work, and don't have a massively powerful local machine to run local models, what are the best options?
Obviously I saw the news about OpenAI's head of research openly supporting war crimes, but I don't feel confident about what's up with the other companies.
E.g. I'm very outspoken about my preference for open LLM practices like those executed by Meta and DeepSeek. I'm very aware of the regulatory capture and ladder-pulling tactics of the "AI safety" lobby.
However, in my own operations I do still rely on OpenAI, because it works better for my use case than anything I've tried so far.
That said, when I can find an open-model-based SaaS operator that serves my needs as well without a major switching investment, I will switch.
I'm not talking about me developing the applications, but about using LLM services inside the products in operation.
For my "vibe coding" I've been using OpenAI, Grok and Deepseek if using small method generation, documentation shortcuts, library discovery and debugging counts as such.
You need a big "/s" after this. Or maybe just not post it at all, because it's just value-less snark and not a substantial comment on how hypocritical and harmful OpenAI is (which they certainly are).
It's a common tactic in new fields. Fusion, AI, you name it: they are all actively lobbying to get new regulation because they are "different", and the individual companies want to ensure that they are the ones who set the tone.
US tech, and Western tech in general, is very culturally homogeneous - and by this I mean in the type of coding people have done.
The DeepSeek papers published over the last two weeks are the biggest thing to happen in AI since GPT-3 came out. But unless you understand distributed file systems, networking, low-level linear algebra, and half a dozen other fields at least tangentially, you'd not have realized they are anything important at all.
Meanwhile I'm going through the interview process for a tier-1 US AI lab and I'm having to take a test about circles and squares, then write a CompSci-101 red-black tree search algorithm while talking to an AI, while being told not to use AI at the same time. This is with an internal reference keen for me to be on board. At this point I'm honestly wondering if they aren't just using the interview process to generate high-quality validation data for free.
Competition can only work when there is variation between the entities competing.
In the US right now you can have a death match between every AI lab, then give all the resources to the one which wins and you'd still have largely the same results as if you didn't.
The reason why DeepSeek - it started life as an HFT firm - hit as hard as it did is because it was a cross-disciplinary team that had very non-standard skill sets.
I've had to try to headhunt network and FPGA engineers away from HFT firms and it was basically impossible. They already make big tech (or higher) salaries without the big tech bullshit - which none of them would ever pass.
> I've had to try to headhunt network and FPGA engineers away from HFT firms and it was basically impossible. They already make big tech (or higher) salaries without the big tech bullshit - which none of them would ever pass.
Can confirm. There are downsides, and it can get incredibly stressful at times, but there are all sorts of big tech imposed hoops you don't have to jump through.
> all sorts of big tech imposed hoops you don't have to jump through
Could you kindly share some examples for those of us without big tech experience? I assume you're talking about working practises more than just annoying hiring practises like leetcode?
Engineers at AI labs just come from prestigious schools and don't have technical depth. They are smart, but they simply aren't qualified to do deep technical innovation.
> At this point I'm honestly wondering if they aren't just using the interview process to generate high quality validation data for free.
Not sure if that is accurate, but one of the reasons why DeepSeek R1 performs so well in certain areas is thought to be access to China's Gaokao (university entrance exam) data.
The bottom is about to drop out, that's why; ethics are out the window already, and it's gonna get worse as they claw to stay relevant.
It's a niche product that tried to go mainstream, and the general public doesn't want it; just look at iPhone 16 sales and Windows 11. Everyone is happier with the last version without AI.
As it is, this is a bullshit document, which I'm sure their lobbyists know; OSTP is authorized to "serve as a source of scientific and technological analysis and judgment for the President with respect to major policies, plans, and programs of the Federal Government," and has no statutory authority to regulate anything, let alone preempt state law. In the absence of any explicit Congressional legislation to federally preempt state regulation of AI, there's nothing the White House can do. (In fact, other than export controls and a couple of Defense Production Act wishlist items, everything in their "proposal" is out of the Executive's hands and in the ambit of Congress.)
I heard something today and I wonder if someone can nitpick it.
If what the admin is doing is illegal, then a court stops it; if they appeal and win, then it wasn't illegal. If they appeal all the way up and lose, then they can't do it.
So what exactly is the problem?
Mind you, I am asking for nits, this isn't my idea. I don't think "the administration will ignore the supreme court" is a good nit.
Moat is an Orwellian word and we should reject words that contain a conceptual metaphor that is convenient for abusing power.
"Building a moat" frames anti-competitive behavior as a defense rather than an assault on the free market by implying that monopolistic behavior is a survival strategy rather than an attempt to dominate the market and coerce customers.
"We need to build a moat" is much more agreeable to tell employees than "we need to be more anti-competitive."
It is pretty obvious that every use of that word is to communicate a stance that is allergic to free markets.
A moat by definition has such a large strategic asymmetry that one cannot cross it without a very high chance of death. A functioning SEC and FTC as well as CFPB https://en.wikipedia.org/wiki/Consumer_Financial_Protection_... are necessary for efficient markets.
Now might be the time to roll out consumer club cards that are adversarial in nature.
A "moat" is a fine business term for what it relates to, and most moats are innocuous:
* The secret formula for Coke
* ASML's technology
* The "Gucci" brand
* Apple's network effects
These are genuine competitive advantages in the market. Regulatory moats and other similar things are an assault on the free market. Moats in general are not.
I'm with you except for that last one. Innovation provides a moat that also benefits the consumer. In contrast, network effects don't seem to provide any benefit. They're just a landscape feature that can be taken advantage of by the incumbent to make competition more difficult.
I'm hardly the only one to think this way, hence regulation such as data portability in the EU.
I agree with you in general, but there are network effects at Apple that are helpful to the consumer. For example, iphone-mac integration makes things better for owners of both, and Apple can internally develop protocols like their "bump to share a file" protocol much faster than they can as part of an industry consortium. Both of these are network effects that are beneficial to the consumer.
I'm not sure a single individual owning multiple products from the same company is the typical way "network effect" is used.
The protocol example is a good one. However I don't think it's the network effect that's beneficial in that case but rather the innovation of the thing that was built.
If it's closed, I think that facet specifically is detrimental to the consumer.
If it's open, then that's the best you can do to mitigate the unfortunate reality that taking advantage of this particular innovation requires multiple participating endpoints. It's just how it is.
I'm fine with Apple making their gear work together, but they shouldn't be privileged over third parties.
Moreover, they shouldn't have any way to force (or even nudge via defaults) the user to use Apple Payments, App Store, or other Apple platform pieces. Anyone should be on equal footing and there shouldn't be any taxation. Apple already has every single advantage, and what they're doing now is occupying an anticompetitive high ground via which they can control (along with duopoly partner Google) the entire field of mobile computing.
Based on your examples (which did genuinely make me question my assertion), it seems that patents and exclusivity deals are a major part of moat development, as are pricing games and rampant acquisitions.
Apple's network effects are anti-competitive, creating vendor lock-in, which allows them to coerce customers. I generally defend Apple. But they are half anti-competitive (coercing customers) and half competitive (earning customers), and the earning of customers is fueled by the coercive App Store.
This is a very clear example of how moat is an abusive word. Under one framing (moat), network effects are a way to defend market position by spending resources on projects that earn customers. In the anti-competitive framing, network effects are an explicit strategy to create vendor lock-in and make it more challenging to migrate to other platforms, so Apple's budget to implement anti-customer policies is bigger.
ASML is a patent-based monopoly, with exclusivity agreements with suppliers and significant export controls. I will grant you that bleeding-edge technology is arguably the best-case argument for the word moat, but it's also worth asking in detail how the technology is actually developed, and understanding that patents are state-sanctioned monopolies.
Both Apple and ASML could reasonably be considered monopoly-like. So I'm not sure they are the best defense against the claim that moat implies anti-competitive behavior. Monopolies are fundamentally anti-competitive.
The Gucci brand works against the secondary market for their goods and has an army of lawyers to protect their brand against imitators and has many limiting/exclusivity agreements on suppliers.
Coke's formula is probably the least "moaty" thing about coca cola. Their supply chain is their moat and their competitive advantage is also rooted in exclusivity deals. "Our company is so competitive because our recipe is just that good" is a major kool-aid take.
Patents are arguably good, but they are legalized anti-competition. Exclusivity agreements don't seem very competitive. Acquisitions are anti-competitive. Pricing games to snuff out competition seem like the type of thing that can be done chiefly in anti-competitive contexts.
So ASML isn't an argument against "moat means anti-competitive", but an argument that sometimes anti-competitive behavior is better for society because it allows otherwise economically unfeasible things to be feasible. The other brands' moats are much more rooted in business practices around acquisitions and suppliers, creating de facto vertical integrations. Monopolies do offer better, cheaper products, until they attain a market position that allows them to coerce customers.
Anti-trust authorities have looked at those companies.
Another conceptual metaphor is "president as CEO." The CEO metaphor re-frames political rule as a business operation, which makes executive overreach appear logical rather than dangerous.
You could reasonably argue that the president functions as a CEO, but the metaphor itself is there to manufacture consent for unchecked power.
Conceptual metaphors are insidious. PR firms and think tanks actively work to craft these insidious metaphors that shape conversations and how people think about the world. By the time you've used the metaphor, you've already accepted many of the implications of the metaphor without even knowing it.
Patents are state-sanctioned monopolies. That is their explicit purpose. And for all the "shoulders of giants" and "science is a collective effort" arguments, none of them can explain why no Chinese company (a jurisdiction that does not respect Western patents) can do what ASML does. They have the money and the expertise, but somehow they don't have the technology.
Also, the Gucci brand does not have lawyers. The Gucci brand is a name, a logo, and an aesthetic. Kering S.A. (owners of Gucci), enforces that counterfeit Gucci products don't show up. The designers at Kering spend a lot of effort coming up with Gucci-branded products, and they generally seem to have the pulse of a certain sector of the market.
The analysis of Coke's supply chain is wrong. The supply chain Coke uses is pretty run-of-the-mill, and I'm pretty sure that aside from the syrup (with the aforementioned secret formula), they actually outsource most of their manufacturing. They have good scale, but past ~100 million cans, I'm not sure you get many economies of scale in soda. That's why my local supermarket chain can offer "cola" that doesn't quite taste like Coke for cheaper than Coke. You could argue that the brand and the marketing are the moat, but the idea that Coke has a supply chain management advantage (let alone a moat over this) is laughable.
> "Building a moat" frames anti-competitive behavior as a defense
This is a drastic take, I think to most of us in the industry "moat" simply means whatever difficult-to-replicate competitive advantage that a firm has invested heavily in.
Regulatory capture and graft aren't moats, they're plain old corrupt business practices.
The problem is that moat is a defensive word and using it to describe competitive advantage implies that even anti-competitive tactics are defensive because that's the frame under which the conversation is taking place.
Worse that "moats" are a good thing, which they are for the company, but not necessarily society at large. The larger the moat, the more money coming out of your pocket as a customer.
If you get fined millions of dollars (for copyright, of course) if you're found to have anything resembling DeepSeek on your machine - no company in the US is going to run it.
The personal market is going to be much smaller than the enterprise market.
That would give an advantage to foreign companies. The EU tried that and while that doesn't destroy your tech dominance overnight, it gradually chips from it.
The artificial token commodity can now be functionally replicated on a per-location basis with $40k in hardware (far lower cost than Nvidia hardware).
Copyright licensing is just a detail corporations are well experienced dealing with in a commercial setting, and note some gov organizations are already exempt from copyright laws. However, people likely just won't host in countries with silly policies.
I mean… he has supported at least one good cause I know of where the little guy was getting screwed way beyond big time and he stepped up pro bono. So I like him. But probably mostly a hired gun.
Before Deepseek, Meta open-sourced a good LLM. At the time, the narrative pushed by OpenAI and Anthropic was centered on 'safety.' Now, with the emergence of Deepseek, OpenAI and Anthropic have pivoted to a national security narrative. It is becoming tiresome to watch these rent seekers attacking open source to justify their valuations.
>> In the proposal, OpenAI also said the U.S. needs “a copyright strategy that promotes the freedom to learn” and on “preserving American AI models’ ability to learn from copyrighted material.”
Perhaps also symmetric "freedom to learn" from OpenAI models, with some provisions / naming convention? U.S. labs are limited in this way, while labs in China are not.
It still warps my brain: they've taken trillions of dollars of industry and made a product worth billions by stealing it. IP is practically the basis of the economy, and these models warp and obfuscate ownership of everything, like a giant reset button on who can hold knowledge. It wouldn't be legal or allowed if tech weren't seen as the growth path of our economy. It's a hell of a needle to thread, and it's unlikely that anyone will ever again be able to model from data so open.
"IP" is a very new concept in our culture and completely absent in other cultures. It was invented to prevent verbatim reprints of books, but even so, the publishing industry existed for hundreds of years before then. It's been expanded greatly in the past 50 years.
Acting like copyright is some natural law of the universe that LLMs are upending simply because they can learn from written texts is silly.
If you want to argue that it should be radically expanded to the point that not only a work, but even the ideas and knowledge contained in that work should be censored and restricted, fine. But at least have the honesty to admit that this is a radical new expansion for a body of law that has already been radically expanded relatively recently.
> It was invented to prevent verbatim reprints of books
It was also invented to keep the publishing houses under control and keep them from papering the land in anti-crown propaganda (like the stuff that fueled the civil war in England and got Charles I beheaded).
Probably one of the biggest brewing fights will be whether the models are free to tell the truth or whether they'll be mouthpieces for the ruling class. As long as they play ball with the powers that be, I predict copyrights won't be a problem at all for the chosen winners.
That's actually a great point. Judging from the current state of media, there is a clear momentum to take sides in moral arguments. Maybe the standard for models needs to be a fair use clause?
> It's been expanded greatly in the past 50 years.
Elephant in the room. If copyright and patent both expired after 20 years or so then I might feel very differently about the system, and by extension about machine learning practices.
It's absurd to me that broad cultural artifacts which we share with our parent's (or even grandparent's) generation can be legally owned.
What AI companies are doing (downloading pirated music and training models on it) is completely unfair. It takes a lot of money (everything related to music is expensive), talent, and work to record a good song, and what AI companies do is just grab millions of songs for free and call it "fair use". If their developers are so smart and talented, why don't they simply compose and record the music themselves?
> not only a work, but even the ideas and knowledge contained in that work
AI models reproduce existing audio tracks when asked, although in a distorted and low-quality form.
Also, it will be funny to observe how the US government tries to ignore copyright violations by AI companies while issuing ridiculous fines to ordinary citizens for torrenting a movie.
Everything in tech is unfair. Music teachers replaced by apps and videos. Audio engineers replaced by apps. Album manufacturing and music stores replaced by digital downloads. Custom instruments replaced by digital soundboards. Trained vocalists replaced by auto-tune. AI is just the final blip of squeezing humans out of music.
“The Venetian Patent Statute of 19 March 1474, established by the Republic of Venice, is usually considered to be the earliest codified patent system in the world.[11][12] It states that patents might be granted for "any new and ingenious device, not previously made", provided it was useful. By and large, these principles still remain the basic principles of current patent laws.“
As another commenter says, this is about IP, but even positing that copyright is somehow invalid because it’s new is incredibly obtuse. You know what other law is relatively new? Women’s suffrage.
I'm annoyed by arguments like the above because they're clearly derived from working backwards from a desired conclusion; in this case, that someone's original work can be consumed and repurposed to create profit by someone else. Our laws and society have determined this to be illegal; the fact that it would be inconvenient for OpenAI if it weren't has no bearing.
Is it the same thing though?
Even though Lord of the Rings, the book, has likely been used to train the models, you can't reproduce it. Nor can you make a derivative of it. Is it really the same comparison as "Kimba the White Lion" and "The Lion King"?
I should have "freedom to learn" about any Tesla in the showroom, any F-35 I see laying around an airbase or the contents of anyone in the governments bank account.
Those cases did very poorly whenever they actually went to court (at least including the ones that were summarily dismissed, which technically never made it to court). They were much more of a mafia-style shakedown than an actual legal enforcement effort.
Same rules, but people are a lot less inclined to defend themselves because the cost of loss was seen as too high to even risk it.
Gearing up for a fight between the two major industries based on exploitative business models:
Copyright cartels (RIAA, MPAA) that monetized young artists without paying them much at all [1], vs the AI megalomaniacs who took all the work for free and used Kenyans at $2 an hour [2] so that they can raise "$7 trillion" for their AI infrastructure
Can't believe I'm actually rooting for the copyright cartels in this fight.
But that does make me think that in a sane society with a functional legislature I wouldn't have to pick a dog in this fight. I'd have enough faith in lawmakers and the political process to pursue a path toward copyright reform that reins in abuses from both AI companies and megacorp rightsholders.
Alas, for now I'm hoping that aforementioned megacorps sue OpenAI into a painful lesson.
They're suing Internet Archive because IA scanned a bunch of copyrighted books to put online for free (e: without even attempting to get permission to do so) then refused to take them down when they got a C&D lol. IA is putting the whole project at risk so they can do literal copyright infringement with no consequences.
Chinese AI must implement socialist values by law, but law is a much more fluid fuzzy thing in China than in the USA (although the USA seems to be moving away from rule of law recently).
> Chinese AI must implement socialist values by law
I don't doubt it but am interested to read a source? I know the models can't talk about things like Tiananmen Square 1989, but what does 'implementing socialist values by law' look like?
Although censorship isn't mentioned specifically, it is definitely 99% of what they are focused on (the other 1% being scams).
China practices rule by law, not rule of law, so you know... they'll know it's bad when they see it, so model providers will exercise extreme self-censorship (which is already true for social network providers).
In practice the US is less different than you imply. For the vast majority of Americans, being sued is a punishment in and of itself due to the prohibitive cost of hiring a lawyer. In the US we have a right to a "speedy" trial, but there are many people sitting in jail right now because they can't afford the bail to get out. Speedy could mean months.
I say this because when we constantly fall so far short of our ideals, one begins to question if those are really our ideals.
>>Dylan Patel is the founder of SemiAnalysis, a research & analysis company specializing in semiconductors, GPUs, CPUs, and AI hardware. Nathan Lambert is a research scientist at the Allen Institute for AI (Ai2) and the author of a blog on AI called Interconnects.
---
@cadamsdotcom
First, these folks on the pod are extremely knowledgeable about the GPU market and AI.
And if you read between the prompts, they lay out why the AI War is fundamentally based on certain Companies, Technologies, and Political affiliations.
They seem to feel (again, between the lines) that China has an F-ton more GPUs than what's stated... and this is where the USA loses.
China co-opts all tech, and they have a long-term plan that (from the perspective of a Nation-State Super-Power) is much more sound than the disjointed chaos that the USA economy is.
The US gov has literally ZERO long-term positive plans (aside from MIC control), and it's the same as a family living paycheck-to-paycheck...
The US economy is literally thinking in the next second for the next dollar (the stock market)...
China knows it can win AI because they don't have the disjointed system of the US; they can make actual long-term plans (Belt and Road, etc.).
The pod was good apart from starting/spreading the rumor that high numbers of “bill to Singapore” was evidence that China was circumventing GPU import bans.
Look at literally who will have GPU dominance in the future. (Obviously whoever hits qubits at scale... but we are at this scale now, and control currently is set by policy, then bits, then qubits.)
Remember, we are witnessing the "Wouldn't it be cool if..?" cyberpunk manifestations of our cyberpunk readings of youth.
(I built a bunch of shit that spied on you because I read Neuromancer and thought "wouldn't it be cool if...")
And then I helped build Ono-Sendai throughout my career...
It already applies to real people, doesn't it? I.e. if you read a book, you're not allowed to start printing and selling copies of that book without permission of the copyright owner, but if you learn something from that book you can use that knowledge, just like a model could.
Can I download a book without paying for it, and print copies of it? Stash copies in my bathroom, the gym, my office, my bedroom etc. to basically have a copy on hand to study from whenever I have some free time?
Yes, you're allowed to make personal copies of copyright works that you own. IANAL, but my understanding is that if you're using them for yourself, and you're not prevented from doing so by some sort of EULA or DRM, there's nothing in copyright law preventing you from e.g. photocopying a book and keeping a copy at home, as long as you don't distribute it. The test case here has always been CDs—you're allowed to make copies of CDs you legally own and keep one at home and one in your car.
To the best of my knowledge, no individual has ever been sued or prosecuted specifically for downloading books. As long as you're not massively sharing them with others, it's not an issue in practice. Enjoy your reading and learning.
Aaron Swartz, cofounder of Reddit and co-creator of RSS and Markdown, was hounded to death by an overzealous prosecutor for downloading articles from JSTOR, with the intent to learn from them. He was charged with over a million dollars in fines and could have faced 35 years in prison.
He and Sam Altman were in the same YC class. OpenAI is doing the same thing at a larger scale, and their technology actually reproduces and distributes copyrighted material. It's shameful that they claim they aren't infringing creators' rights when they have scraped the entire internet.
I'm familiar with Aaron Swartz's case, and that is actually why I phrased it as "books". In any case, while tragic, Swartz wasn't prosecuted for copyright infringement, but rather for wire fraud and computer fraud due to the manner in which he bypassed protections in MIT's network and the JSTOR API. This wouldn't have been an issue if he downloaded the articles from a source that freely shared them, like sci-hub.
> It's shameful that they are making claims that they aren't infringing creator's rights when they have scraped the entire internet.
Scraping the Internet is generally very different from piracy. You are given a limited right to that data when you access it, and you can make local copies. If further use does something sufficiently non-copying, then creators' rights aren't being infringed.
> Can you compress the internet including copyrighted material and then sell access to it?
Define access?
If you mean sending out the compressed copy, generally no. For things people normally call compression.
If you want to run a search engine, then you should be fine.
> At what percentage of lossy compression it becomes infringement?
It would have to be very very lossy.
But some AI stuff is. For example there are image models with fewer parameters than source images. Those are, by and large, not able to store enough data to infringe with. (Copying can creep in with images that have multiple versions, but that's a small sliver of the data.)
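A back-of-envelope version of that parameter-counting argument, with illustrative numbers that are my own assumptions rather than figures from this thread:

```python
# Hypothetical figures for a large text-to-image model:
params = 1e9            # ~1B parameters (roughly Stable-Diffusion-class)
bytes_per_param = 2     # fp16 weights
training_images = 2e9   # ~2B training images (roughly LAION-scale)

capacity_per_image = params * bytes_per_param / training_images
print(capacity_per_image)  # ~1 byte of model capacity per training image

# A typical training image is tens of kilobytes even as a JPEG, so the model
# can't have memorized most of its training set; verbatim copying mostly
# shows up for items duplicated many times, per the caveat above.
```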
Commercial audio generation models were caught reproducing parts of copyrighted music in a distorted and low-quality form. This is not "learning", just "imitating".
Also, as I understand it, they didn't even buy the CDs with the music used for training; they got it somewhere else. Why do the organizations that prosecute people for downloading a movie not want to look into whether it is OK to build a business on illegal copies of copyrighted works?
When you identify where the infringing party has stored the source material in their artifact.{zip,pdf,safetensor,connectome,etc}. In ML, this discovery stage is called "mechanistic interpretability", and in humans it's called "illegal."
It's not that clear cut. Since they're talking about taking lossy compression to the limit, there are ways to go so lossy that you're no longer infringing even if you can point exactly at where it's stored.
It was overzealous prosecution of the breaking into a closet to wire up some ethernet cables to gain access to the materials,
not of the downloading with intent.
And apparently the most controversial take in this community is the observation that many people would have done the trial, plea, and time, regardless of how overzealous the prosecution was.
35 years is a press release sentence. The way DOJ calculates sentences when they write press releases ignores the alleged facts of the particular case and just uses for each charge the theoretically maximum possible sentence that someone could get for that charge.
To actually get that maximum typically requires things like the person is a repeat offender, drug dealing was involved, people were physically harmed, it involved organized crime, it involved terrorism, a large amount of money was involved, or other things that make it an unusual big and serious crime.
The DOJ knows exactly what they are alleging the defendant did. They could easily look at the various factors that affect sentencing for the charge, see which apply to that case, and come up with a realistic number, but that doesn't sound as impressive in the press release.
Another thing that inflates the numbers in the press releases is that defendants are often charged with several related charges. For many crimes there are groups of related charges that for sentencing get merged. If you are charged with say 3 charges from the same group and convicted on all you are only sentenced for whichever one of them has the longest sentence.
If you've got 3 charges from such a group in the press release the DOJ might just take the completely bogus maximum for each as described above and just add those 3 together.
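To make that press-release arithmetic concrete, here's a toy illustration; the charge counts and maxima below are made up, not taken from any real case:

```python
# Three overlapping charges from the same sentencing group,
# each with a hypothetical statutory maximum in years:
maxima = [5, 10, 20]

press_release_years = sum(maxima)  # press-release math: add every maximum
actual_exposure = max(maxima)      # grouped charges merge: longest one controls
print(press_release_years, actual_exposure)  # 35 vs 20
```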
Here's a good article on DOJ's ridiculous sentence numbers [1].
Here's a couple of articles from an expert in this area of law that looks specifically at what Swartz was charged with and what kind of sentence he was actually looking at [2][3].
Why do you think Swartz was downloading the articles to learn from them? As far as I've seen, no one knows for sure what he was intending.
If he wanted to learn from JSTOR articles he could have downloaded them using the JSTOR account he had through his research fellowship at Harvard. Why go to MIT and use their public JSTOR WiFi access, and then when that was cut off hide a computer in a wiring closet hooked into their ethernet?
I've seen claims that what he wanted to do was meta-research about scientific publishing as a whole, which could explain why he needed to download more than he could with his normal JSTOR account from Harvard. But again, why do that using MIT's public WiFi access? JSTOR has granted more direct access to large amounts of data for such research. Did he talk to them first to try to get access that way?
He might have wanted other people to have access to the knowledge, and for free. In comparison, AI companies want to sell access to the knowledge they got by scraping copyrighted works.
Truly wow. The sucking up to corporations is terrifying. This, when Aaron Swartz was institutionally murdered by the institutions and the state for "copyright infringement". And what he did wasn't even for profit, or even 0.00001 of the scale of the theft that OpenAI and their ilk have done.
So it's totally OK to rip off and steal and lie through your teeth AND do it all for money, if you're a company.
But if you're a human being, doing it not for profit but for the betterment of your own fellow humans, you deserve to be imprisoned and systematically murdered and driven to suicide.
Thank you for putting my sentiment into words. THIS. It's not power to the people, it's power to the oligarchs. Once you have enough power and, more importantly, wealth, you're welcomed into the fold with open arms. Just as Spotify built a library of stolen music: as long as wealth was created, there is no problem, because wealth is just money taken from the people and given to the ruling class.
At home? Without ever sharing it with anyone? I thought making backups of things that you personally own was protected, at least in the US. Could you elaborate on my apparent misunderstanding?
> Internet people say you can, but there's no actual legal argument or case law to support that.
Quite the opposite. The burden of proof is on you to show a single person ever, in history, who has been prosecuted for that.
If nobody in the world has ever been prosecuted for this, then that means it is either legal, or it is something else that is so effectively equivalent to "legal" that there is little point in using a different word.
If you want to take the position that, "uhhhhhhh, there is exactly 0% chance of anyone ever getting in trouble or being prosecuted for this, but I still don't think its legal, technically!"
Then I guess go ahead. But for those in the real world, those two things are almost equivalent.
This is a specific exception in Australian copyright law. It allows reproducing works in books, newspapers, and periodical publications in a different form for private and domestic use.
It seems reasonably within the bounds described by fair use, but nobody's ever tested that particular constellation of factors in a lawsuit, so there's no precedent - hand copying a book, that is.
17 U.S.C. § 107 is the fair use carveout.
Interestingly, digitizing and copying a book on your own, for your own private use, has also not been brought to court. Major rights holders seem to not want this particular fair use precedent to be established, which it likely would be, and might then invalidate crucial standing for other cases in which certain interpretations of fair use are preferred.
Digitally copying media you own is fair use. I'll die on that hill.
It doesn't grant commercial rights, you can't resell a copy as if it were the original, and so on, and so forth.
There's even a good case to be made for sharing a legally purchased, digitally copied work, even with millions of people, five years after a book is first sold: for the vast majority of books, after five years they've sold about 99.99% of the copies they're ever going to sell.
By sharing after the ~5-year mark, you're arguably doing marketing for the book, and if we cultivated a culture of direct donation to authors and content creators, it would invalidate most of the reasons piracy is made illegal in the first place.
Right now publishers, studios, and platforms have a stranglehold on content markets, and the law serves them almost exclusively. It is exceedingly rare for the law to be invoked in defending or supporting an author or artist directly. It's very common for groups of wealthy lawyers LARPing as protectors of authors and artists to exploit the law and steal money from regular people.
Exclusively digital content should have a 3 year protected period, while physical works should get 5, whether it's text, audio, image, or video.
Once something is outside the protected period, it should be considered fair game for sharing until 20 years have passed, at which point it should enter public domain.
Copyright law serves two purposes - protecting and incentivizing content creators, and serving the interests of the public. Situations where a bunch of lawyers get rich by suing the pants off of regular people over technicalities is a despicable outcome.
> there's no precedent - hand copying a book, that is
Thank you! I had looked this up myself last week, so I knew this. I had long believed, as GP does, that copying anything you own without distribution is either allowed or fair use. I wanted GP to learn as I did.
For reference, here's the US legal code in question:
Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include—
(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.
The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.
The spirit seems apparent, but in practice it's been used by awful people to destroy lives and exploit rent from artists and authors in damn near tyrannical ways.
Except you said "You can't make archival copies." and didn't provide a citation. That's quite a different claim than "there exists no precedent clearly establishing your right or lack thereof to make archival copies".
Congress expressly granted archival rights for digital media. If they wanted to do the same for books they could've done so. There's no law or legal precedent allowing it.
Given all this "can't do it" is more probably accurate than "can do it". IANAL but it's not like the question is finely balanced on a knife's edge and could go either way.
Congress didn't explicitly disallow it either. You left that bit out. As such it comes down to interpretation of the existing law. We both clearly agree that doesn't (yet) exist.
> IANAL but it's not like the question is finely balanced on a knife's edge and could go either way.
I agree, but my interpretation is opposite yours. It seems fairly obvious to me that the spirit of the law permits personal copies. That also seems to be in line with (explicitly legislated) digital practices.
But at the end of the day the only clearly correct statement on the matter is "there's no precedent, so we don't know". I suppose it's also generally good advice to avoid the legal quagmire if possible. Being in the right is unlikely to do you any good if it bankrupts you in the process.
That's the whole point of copyright: only the owner of a copyright has the right to make copies. I don't see how it can be more explicit than that. It's a default-deny policy.
There is an archival exception for digital media, so obviously Congress is open to granting exceptions for backup purposes. They chose not to include physical media in this exception.
> only the owner of a copyright has the right to make copies.
You are conveniently omitting the provisions about fair use, which is strange since you're clearly aware of them. The only things copyright is reasonably unambiguous about are sale and distribution. Even then there's lots of grey areas such as performance rights.
You are arguing that something is obviously disallowed but have nothing but your own interpretation to back that up. If the situation was as clear cut as you're trying to make out then where is the precedent showing that personal use archival copies of physical goods are not permitted?
> They chose not to include physical media in this exception.
That's irrelevant to the current discussion, though I'm fairly certain you realize that. Congress declined to weigh in on the matter which (as you clearly know) leaves it up to the courts to interpret the existing law.
You're repeating upthread comments. And no, you can't. There's an archival exception for electronic media. If you want to make copies of physical media you either:
1. Can't
Or
2. Rely on fair use to protect you (archival by individuals isn't necessarily fair use)
It absolutely is fair use to copy a book for your personal archives.
The fair use criteria consider whether the use is commercial in nature (in this case it is not) and "the effect of the use upon the potential market for or value of the copyrighted work", which for a personal copy of a personally owned book is nonexistent.
> the effect of the use upon the potential market for or value of the copyrighted work
A copyright holder's lawyer would argue that having and using a photocopy of a book keeps the original from wearing out. This directly affects the potential market for the work, since the owner could resell the book in mint condition, after reading and burning their photocopies.
> You would get laughed at by the legal system trying to prosecute an individual owner for copying a book they bought just to keep.
I mean maybe this is true. But the affected individual will have a very bad year and spend a ton of money on lawyers.
Why do you interpret this to mean "absolutely can't do this"? "No precedent" seems to equally support both sides of the argument (that is, it provides no evidence; courts have not ruled). The other commenters' arguments on the actual text of the statute seem more convincing to me than what you have so far provided.
> The other commenters' arguments...seem more convincing
Because you (and I) want it to be fair use. But as I already said in my comment, it potentially fails one leg of fair use. Keeping your purchased physical copy of the book pristine and untouched while you read the photocopy allows you to later, after destroying the copies you made, resell the book as new or like-new. This directly affects the market for that book.
Do you want to spend time and money in court to find out if it's really fair use? That's what "no precedent" means.
> Do you want to spend time and money in court to find out if it's really fair use?
No. I'd much rather pirate the epub followed by lobbying for severe IP law reform. (Of course by "lobby" I actually mean complain about IP law online.)
Multiple times in this thread you make the very confident assertion that this is not allowed, and that it is only allowed for electronic media. That is your opinion, which is fine. The argument that it is fair use is also an opinion. Until it becomes settled law with precedent, every argument about it will just be opinion on what the text of the law means. But you are denigrating the other opinions while upholding your own as truth.
And whether or not I am personally interested in testing any of these opinions is completely beside the point.
That's not a one-to-one analogy. The LLM isn't giving you the book, it's giving you information it learned from the book.
The analogous scenario is "Can I read a book and publish a blog post with all the information in that book, in my own words?", and under US copyright law, the answer is: Yes.
> The analogous scenario is "Can I read a book and publish a blog post with all the information in that book, in my own words?"
The analogous scenario is actually "Can I read a book that I obtained illegally and face no consequences for obtaining it illegally?" The answer is "yes": there are no consequences for reading said book, for individuals or machines.
But individuals can face serious consequences for obtaining it illegally. And corporations are trying to argue those consequences shouldn't apply to them.
Not to diminish the atrocity of what happened to Aaron, but is this a highly abnormal case of prosecutorial overzealousness, or is it common for people to be charged and held liable for downloading and/or consuming (without distributing) copyrighted materials, in any form, without obtaining a license?
Asking because I genuinely don't know. I believe all I've ever read about prosecution of "commonplace" copyright violations was either about distributors or tied to the bidirectional nature of peer-to-peer exchange (torrents typically upload to others even as you download = redistribution).
Aaron Swartz downloaded a lot of stuff. Did he publish the stuff too? That would be an infringement. But only downloading the stuff, and never distributing it? Not sure that's worth a violation.
The better analogy is "can my business use illegally downloaded works to save on buying a license?" For example, can you use a pirated copy of Windows in your company? Can you use a pirated copy of a book to compute the weights of a mathematical model?
There's no analogy, because the scale of it takes things to a whole different level and degree, and for all intents and purposes we tend to care about level and degree.
Me taking over the lemonade market in my neighbourhood would never be more than a minor annoyance to anyone; if instead I managed to corner the lemonade market of a whole continent, it'd be a very different thing.
Let's say Windows is downloadable from Microsoft's website. Can you use it for free in your company to save on buying a license? Is it ok to use illegal copies of works in a business?
To the extent that this is how libraries function, yes.
The part of that which doesn't apply is "print copies", at least not complete copies, but libraries often have photocopiers in them for fragments needed for research.
AI models shouldn't do that either, IMO. But unlimited complete copies is the mistake the Internet Archive made, too.
I missed the part where we throw away rational logic skills
Have you never been to a public library and read a book while sitting there without checking it out? Clearly, age is a factor here, and us olds are confused by this lack of understanding of how libraries function. I did my entire term paper without ever checking out books from the library. I just showed up with my stack of blank index cards, then left with the necessary info written on them. Did an entire project on tracking stocks by visiting the library and viewing all of the papers for the days in one sitting rather than being a schmuck and tracking it daily. Took me about an hour in one day. No library card required.
Also, a library card is ridiculously cheap even if you did decide to have one.
> Have you never been to a public library and read a book while sitting there without checking it out?
See my comment here: https://news.ycombinator.com/item?id=43355723. If OpenAI built a robot that physically went into libraries, pulled books off shelves by itself, and read them...that's so cool I wouldn't even be mad.
What about checking out eBooks? If you had an app that checked those out and scanned them at robot speed vs. human speed, that would be the same thing. The idea that reading something that does not belong to you directly means stealing is just weird and very strained.
theGoogs essentially did that by having a robot that turned each page and scanned the pages. That's no different than having the librarian pull material for you so that you don't have to pull the book from the shelf yourself.
There are better arguments to make for why ClosedAI is bad. Reading text it doesn't own isn't one of them. How it acquired the text would be a better thing to critique. There are already laws in place for that; no new laws need to be enacted.
> You mean...made a copy? Do you really not see the problem?
In precisely the same way as a robot scanning a physical book is.
If this is turned into a PDF and distributed, it's exactly the legal problem Google had[0] and that Facebook is currently fighting due to torrenting some of their training material[1].
If the tokens go directly into training an AI and no copies are retained, that's like how you as a human learn — except current AI models are not even remotely as able to absorb that information as you, and they only make up for being as thick as a plank by being stupid very very quickly.
> It's that robots are allowed to violate copyright law to read the books, and us humans are not.
More that the copyright laws are not suited to what's going on. Under the copyright laws, statute and case law, that existed at the time GPT-3.5 was created, bots were understood as the kind of thing Google had and used to make web indexes — essentially legal, with some caveats about quoting too much verbatim from news articles.
(Google PageRank being a big pile of linear algebra and all, and the Transformer architecture from which ChatGPT gets the "T" being originally a Google effort to improve Google Translate).
Society is currently arguing amongst itself if this is still OK when the bot is a conversational entity, or perhaps even something that can be given agency.
You get to set those rules via your government representative, make it illegal for AI crawlers to read the internet like that — but it's hard to change the laws if you mistake what you want the law to be for what the law currently is.
But you keep saying "read the books". There is no copyright violation in reading a book. Making copies starts to get into murky ground, but does not immediately mean breaking the law.
If I spent every last second of my life in a public library, I couldn't even view a fraction of the information that OpenAI has ingested. The comparison is irrelevant. To make the comparison somehow valid, I'd have to back up my truck to a public library, steal the entire contents, then start selling copies out of my garage
Look, even I'm not a fan of ClosedAI, but this is ridiculous. ClosedAI isn't giving copies of anything. It is giving you a response it infers based on things it has "read" and/or "learned" by reading content. Does ClosedAI store a copy of the content it scrapes, or does it immediately start tokenizing it or whatever is involved in training? If they store it, that's a lot of data, and we should be able to prove that sites were scraped through the lawsuit discovery process. Are you then also suggesting that ClosedAI will sell you copies of that raw data if you prompted correctly?
I'm in no way justifying anything about GPT/LLM training. I'm just calling out that these comparisons are extremely strained.
Let's say OpenAI developers use an illegal copy of Windows on their laptops to save on buying a license. Is it ok to run a business this way?
Also, I think it is a different thing when someone uses copyrighted works for research and publishes a paper than when someone uses copyrighted works to earn money.
I don't need a card to read in the library, nor to use the photocopiers there, but it's merely one example anyway. (If it wasn't, you'd only need one library, any of the deposit libraries will do: https://en.wikipedia.org/wiki/Legal_deposit).
You also don't need permission, as a human, to read (and learn from) the internet in general. Machines by standard practice require such permission, hence robots.txt. OpenAI's GPTBot complies with robots.txt, and the company gives advice to web operators on how to disallow its bot.
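(For the curious: per OpenAI's published crawler documentation, opting out is a two-line robots.txt entry. A minimal sketch, assuming you want to block the training crawler from your whole site:

    User-agent: GPTBot
    Disallow: /

Note that robots.txt is a convention, not an enforcement mechanism; compliance depends on the crawler operator's good faith.)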
How should AI be treated: more like a search index, or more like a mind that can learn by reading? Not my call. It's a new thing, and laws can be driven by economics or by moral outrage, and in this case those two driving forces are at odds.
How so? I don't have to pay to read most websites. To read most books I have to pay (or a library has to pay and I have to wait to get the book).
> IIRC, Google already did your sidenote
Not quite. They had to chop the spines off books and have humans feed them into scanners. I'm talking about a robot that can walk (or roll) into a library, use arms to take books off the shelves, turn the pages and read them without putting them into a scanner.
They had humans turn the pages of intact books in scanning machines. The books mostly came from the shelves of academic libraries and were returned to the shelves after scanning. You can see some incidental captures of hands/fingers in the scans on Google Books or HathiTrust (the academic home of the Google Books scans). There are some examples collected here:
Fact is, you can read books for free, just as you can read (many but not all) websites for free. And in both cases you're allowed to use what you learned without paying ongoing licensing fees for having learned anything from either, and even to make money from what you learn.
> Not quite. They had to chop the spines off books and have humans feed them into scanners.
Owning a copy and learning the information are not the same. You can learn 2+2=4 from a book, but you no longer need that book to get that answer. Each year in school, I was issued a book for class, learned from it, and returned the book. I did not return the learning.
Musicians can read the sheet music and memorize how to play it, and no longer need the music. They still have the information.
But you still need to buy the sheet music first; all the AI labs used pirated materials to learn from.
There are two angles to the lawsuits that are getting confused. The largest one, from the book publishers (Sarah Silverman et al.), attacked from the angle that the models could reproduce copyrighted information. This was pretty easily quelled / RLHF'd out (it used to be that if ChatGPT started producing lyrics, a supervisor/censor would just cut off its response early; trying it now, ChatGPT.com is more eloquent: "Sorry, I can't provide the full lyrics to "Strawberry Fields Forever" as they are copyrighted. However, I can summarize the song or discuss its themes, meaning, and history if you're interested!")
But there's also the angle of "why does OpenAI have Sarah Silverman's book on their hard drive if they never paid her for it?" This is the lawsuit against Meta regarding books3 and torrenting. It seems like they're getting away with "we never redistributed/seeded!", but it's unclear to me why that is a defense against copyright infringement.
Not only would the musician have to buy the sheet music first, but if they were going to perform that piece for profit at an event or on an album they'd need a license of some sort.
This whole mess seems to be another case of "if I can dance around the law fast enough, big enough, and with enough grey areas then I can get away with it".
As a student in a school band that debated whether to choose Pirates of the Caribbean vs Phantom of the Opera for our half time show, I remember the cost of the rights to the music was a factor in our decision.
The school and library purchased the materials outright; again, OpenAI, Meta, et al. never paid to read them, nor borrowed them from an institution that had any right to share.
I'm a bit of an anti-intellectual-property anarchist myself, but it grinds my gears that, given that we do live under the law, it is applied unequally.
If you have evidence that OpenAI is doing this with books that are not freely available, I'm sure the publishers would absolutely love to hear about it.
We know Meta has done it. These companies have torrented or downloaded books that they did not pay for. Things like The Pile, LibGen, and Anna's Archive were scraped to build these models.
> Can I download a book without paying for it, and print copies of it?
No, but you can read a book, learn its contents, and then write and publish your own book to teach the information to others. The operation of an AI is rather closer to that than it is to copyright violation.
"Should" there be protections against AI training? Maybe! But copyright law as it stands is woefully inadequate to the task, and IMHO a lot of people aren't really treating with this. We need a functioning government to write well-considered laws for the benefit of all here. We'll see what we get.
Yes, but the learning isn't constrained by those laws. If I steal a book and read it, I'm guilty of the crime of theft. You can put me in jail, try me before a jury, fine me, and put me in prison according to whatever laws I broke.
Nothing in my sentence constrains my ability to teach someone else the stuff I learned, though! In fact, the first amendment makes it pretty damn clear that nothing can constrain that freedom.
Also, note that the example is malformed: in almost all these cases, Meta et al. aren't "stealing" anything anyway. They're downloading and reading stuff on the internet that is available for free. If you or I can't be prosecuted for reading a preprint from arXiv.org or whatever, it's a very hard case to make that an AI can.
Again, copyright isn't the tool here. We need better laws.
Sure, but OpenAI (same as Google, and Facebook, and all the others) is illegally copying the book, and they want this to be legal for them.
It's perhaps arguable whether it's OK for an LLM to be trained on freely available but licensed works, such as the Linux source code. There you can get into arguments about learning vs machine processing, whether the LLM is a derived work, etc.
But it's not arguable that copying a book that you have not even bought to store in your corporate data lake to later use for training is a blatant violation of basic copyright. It's exactly like borrowing a book from a library, photocopying it, and then putting it in your employee-only corporate library.
One thing is downloading a pirated copy and reading it yourself; another thing is running a business based on downloading millions of pirated works.
Yes, but this is not the right model. What OpenAI wants is to borrow a book, make a copy of it, and keep using that copy, in training their models. This is fully and simply illegal, under any basic copyright law.
when it comes to real people, they get sued into oblivion for downloading copyrighted content, even for the purpose of learning.
but when facebook & openai do it, at a much larger scale, suddenly the laws must be changed.
Swartz wasn’t “downloading copyrighted content…for the purpose of learning,” he was downloading with the intent to distribute. That doesn’t justify how he was treated. But it’s not analogous to the limited argument for LLMs that don’t regurgitate the copyrighted content.
This is not about memory or training. The LLM training process is not being run on books streamed directly off the internet or from real-time footage of a book.
What these companies are doing is:
1. Obtain a free copy of a work in some way.
2. Store this copy in a format that's amenable to training.
3. Train their models on the stored copy, months or years after step 1 happened.
The illegal part happens in steps 1 and/or 2. Step 3 is perhaps debatable - maybe it's fair to argue that the model is learning in the same sense as a human reading a book, so the model is perhaps not illegally created.
But the training set that the company is storing is full of illegally obtained or at least illegally copied works.
What they're doing before the training step is exactly like building a library by going with a portable copier into bookshops and creating copies of every book in that bookshop.
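To make the shape of that pipeline concrete, here's a deliberately simplified sketch (the URL, paths, and tokenization are hypothetical stand-ins, not any lab's actual code); the point is that the copy made in steps 1 and 2 sits on disk long before step 3 runs:

    # Hypothetical sketch of the three steps above; all names are illustrative.
    import requests

    def acquire(url: str) -> str:
        # Step 1: obtain a free copy of the work.
        return requests.get(url).text

    def store(text: str, path: str) -> None:
        # Step 2: keep the copy in a training-friendly format.
        # This stored file is the retained copy at the heart of the dispute.
        with open(path, "w", encoding="utf-8") as f:
            f.write(text)

    def train(corpus_paths: list[str]) -> None:
        # Step 3: months or years later, train on the stored copies.
        for path in corpus_paths:
            with open(path, encoding="utf-8") as f:
                tokens = f.read().split()  # stand-in for real tokenization
            # ...gradient updates over `tokens` would happen here...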
But making copies for yourself, without distributing them, is different than making copies for others. Google is downloading copyrighted content from everywhere online, but they don't redistribute their scraped content.
Even web browsing implies making copies of copyrighted pages, we can't tell the copyright status of a page without loading it, at which point a copy has been made in memory.
Making copies of an original you don't own/didn't obtain legally is not fair use. Also, this type of personal copying doesn't apply to corporations making copies to be distributed among their employees (it might apply to a company making a copy for archival, though).
> when it comes to real people, they get sued into oblivion for downloading copyrighted content, even for the purpose of learning.
Really? Or do they get sued for sharing, as in republishing without transformation? Arguably, a URL providing copyrighted content is you offering a Xerox machine.
It seems most "sued into oblivion" cases are about resharing, not about getting a copy for yourself.
This is why I think my array of hard drives full of movies isn't piracy. My server just learned about those movies and can tell me about them, is all. Just like a person!
These AI models are just obviously new things. They aren’t people, so any analogy about learning from the training material and selling your new skills is off base.
On the other hand, they aren’t just a copy of the training content, and whether the process that creates the weights is sufficiently transformative as to create a new work is… what’s up for debate, right?
Anyway I wish people would stop making these analogies. There isn’t a law covering AI models yet. It is a big industry at this point, and the lack of clarity seems like something we’d expect everybody (legislators and industry) to want to rectify.
A model cannot "learn" because it is not a human. What happens is that a human obtains "a free copy" of a copyrighted work, processes it using a machine, and sells the result.
> What happens is a human obtains "a free copy" of a copyrighted work, processes it using a machine and sells the result.
Right, so for example it is pretty common to snip up small bits of songs and to use in other songs (sampling). Maybe that could be an example of somewhere to start? But, these ML models seem quite different, I guess because the “samples” are much smaller and usually not individually identifiable. And really the model encodes information about trends in the sources… I dunno. I still think we need a new law.
It is not remotely the same: the companies training the models are stealing the content from the internet and then profiting from it when they charge for the use of those models.
We are not talking about billboards here, we are talking about copyrighted works, like books.
If you want to do mental gymnastics and call downloading books without paying for them "consuming" the web, then go ahead, but don't pretend the rest of us will buy your delusion.
The more literature I consume, and the more I re-draft my own attempt, the more I see the patterns and tropes with everyone standing on the shoulders of those who came before.
The general concept of "warp drive" was introduced by John W. Campbell in 1957, "Islands of Space". Popularised by Trek, turned into maths by Alcubierre. Islands of Space feels like it took inspiration from both H G Wells (needing to explain why the War of the Worlds' ending was implausible) and Jules Verne (gang of gentlemen have call-to-action, encounter difficulties that would crush them like a bug and are not merely fine, they go on to further great adventure and reward).
Terry Pratchett had obvious inspirations from Shakespeare, Ringworld, Faust (in the title!).
In the pandemic I read "The Deathworlders" (web fic, not the book series of similar name), and by the time I'd read too many shark jumps to continue, I had spotted many obvious inspirations besides just the one that gave the name.
If I studied medieval lit, I could probably do the same with Shakespeare's inspiration.
It doesn't: a real person can't legally obtain a copy of a copyrighted work without paying the copyright holder for it. This is what OpenAI is asking for: they don't want to pay for a single copy of a single book, and still they want to train their models on every single book in history (and song, and movie, and painting, and code base, and anything else they can get their hands on).
Did OpenAI buy one copy of each book, or did they legally borrow the books and documents?
If you copy-paste from books and claim it's your content, you are plagiarizing.
LLMs have been proven to copy-paste trained content, so now what? Should only Big Tech be exempt from plagiarism?
I would assume that the request is for it to apply to models in the way that it currently applies to humans.
If a human buys a movie, he can watch it and learn about its contents, and then talk about those contents, and he can create a similar movie with a similar theme.
If OpenAI buys a movie and shows it to their model, it's unclear whether the model can talk about the contents of the movie and create a similar movie with a similar theme.
Since "buying" a movie (as it currently applies to humans) is just buying a limited license to it for private viewing, can't the copyright holder opt to limit the $4.99 license terms to human viewing, and charge $4999 for an AI training license?
Or OpenAI could buy movies the way Disney does, by buying the actual copyright to the film.
> Since "buying" a movie is just buying a license to it, can't the copyright holder opt to limit the $4.99 license terms to human viewing, and charge $4999 for an AI training license?
That's exactly what already happens currently. Buying a movie on DVD doesn't give you the right to present it for hundreds of people. You need to pay for a public performance license or commercial licence. This is why a TV network or movie theatre can't just buy a DVD at Walmart and then show the movie as often as it likes.
Copyright doesn't just grant exclusive distribution rights. It grants exclusive use rights as well, and permits the owner to control how their work is used. Since AI rights are not granted by any existing licenses, and license terms generally reserve any rights not explicitly specified, feeding copyrighted works into an AI data model is a reserved right of the owner.
>Since "buying" a movie (as it currently applies to humans) is just buying a limited license to it for private viewing, can't the copyright holder opt to limit the $4.99 license terms to human viewing, and charge $4999 for an AI training license?
Even more so, it only applies to initial model training by companies like OpenAI, not to other companies using those models to generate synthetic data to train their own models.
Yeah, it’s crazy. I also suspect they might not be confident in their defense against the NYT lawsuit - if they’re found at fault then it’s going to be trouble.
It is hard to see how a court could decide that copyright does not apply to training LLMs without completely collapsing the entire legal structure for intellectual property.
Conceptually, AI basically zeros out existing IP, and makes the AI the only IP that has any value. It is hard to imagine large rights holders and courts accepting that.
The likely outcome is that courts rule against LLM creators/providers and they eventually have to settle on licensing fees with large corporate copyright holders similar to YouTube. Unlike YouTube though, this would open up LLM companies to class action lawsuits from the general public, and so it could be a much worse outcome for them.
I'm surprised to see only one comment here addressing the issue of Chinese AI companies just flatly ignoring US copyright and IP laws/norms. I wonder if there is a viable path where we can facilitate some sort of economic remuneration for people who write and create visual art while not giving up the game to Chinese companies.
As a digital artist myself, it is quite simple. You have to sell physical objects.
The art has to be printed out and that is the art. Anyone can get an image of Salvator Mundi for free too. That is not the art, that is an image. The art is the physical object that is the painting Salvator Mundi.
It is no different than traditional art really, just at a different scale. You can buy really nice Picasso knock offs on ebay right now. Picasso himself could have made 10 copies of the Weeping Woman to sell without that much effort either. The "real" Weeping Woman is the physical painting that Picasso did not make a copy of. The others are just knock off images.
But the main problem remains. Selling art is really hard. AI art is already completely passé anyway. If anything the technology is regressing visually.
Music was in a several-decades-long bull market in physical media sales that crashed and burned. Now we have gone back to the pre-music-media-bubble days, but with much better distribution and marketing channels.
Not a lot of people making a living playing ragtime piano or hoofers making a living tap dancing either.
The really amusing thing to me is that you never hear sculptors complain that their work is in the training data sets. Probably because they know it is literally just free advertising for their real art.
I'm with you 100%. A lot of people who wrote books didn't realize they were selling decorated paper, or who recorded music didn't realize they were selling wax discs and magnetic tape. With digital publishing, they were actually obsoleted.
Like you, I don't think there's good news there, though. As an e.g. writer, you have to convert to selling ideas. The way you sell an idea is that you give it away, and if hearing it makes people like you they will give you arbitrary support. For a writer at least what that means is that only original, interesting work that stands out will be valuable, and it will not be valuable to the extent that it is good, but to the extent that it appeals to an audience. You might as well be a tap dancer.
And if you aren't original, you'll never stand out amongst the AI slop, which will get better and better (and nicer and more pleasant to read and more useful and all that good shit that technology does.) I don't know if that's a bad thing. We have gone from an excess of uninteresting expression in the world to an overwhelming amount of "me too" and worthless repetition filling every crevice. I've probably published 3K words on the internet today. The number before the internet would be zero; but even back then the bookstores were filled with crap.
The market for crap has been taken by AI. And as it gets better, as the crap sea level rises, it will eventually be over most content creators' heads.
The only future for an expression market is parasocial. You're going to have to make people like you, and take care of you because they think of you as family. It's no wonder that entertainment is merging into politics.
I'm pretty sure you can't, despite what IP holders would like you to believe. Like the last 50 years of piracy have taught us, it's effectively impossible (and probably immoral) to try to charge for copying something that's "free" to copy.
It might make more sense to update copyright laws to match reality. For a music artist, for example, pennies from Spotify mean nothing -- the majority of their revenue comes from concerts/events, merchandise, and commercial licensing of their work.
Have you got any substance to that? So far the only copyright violation I've seen in the LLM world is Meta. (I'm not pretending they are alone though, and yes I expect Chinese companies to do that as well)
Welcome to the internet; where the only way to prevent it (considering 40% of internet traffic is automated) is to use DRM, with accessibility tools provided by client-side AI; or to create national internets with strong firewalls only allowing access to countries we have treaties with. That’s the future at this rate, and it sucks. (The status quo also sucks.)
The demand here for federal preemption of state law has nothing to do with copyright. Copyright is entirely federal level today. It has to do with preventing the use of AI to enable various forms of oppression.[1] Plus the usual child porno stuff.
What AI companies are really worried about is a right of appeal from decisions made by a computer. The EU has that. "Individuals should not be subject to a decision that is based solely on automated processing (such as algorithms) and that is legally binding or which significantly affects them."[2] This moves the cost of LLM errors from the customer to the company offering the service.
> This moves the cost of LLM errors from the customer to the company offering the service.
So does that mean AI companies are going to have insurance/litigators like doctors, and models will be heavily lawyered to add more extensive guardrails? I'm assuming this means not just OpenAI but any service that uses LLM APIs or open models?
For ex: If a finance business pays to use an AI bot that automates interacting with desktop UIs and that bot accidentally deletes an important column in an Excel spreadsheet, then the AI company is liable?
No, the exact opposite. This says that if the AI that a bank is paying for locks your bank account in error because your name sounds <ethnicity with a lot of locked bank accounts>, it's the bank's problem to fix, not yours to just live with (though you still likely have a problem).
Conversely, would you suggest that if an AI driver has a programming error and kills 20 people, that the person who reserved the car should be required to enter into a “User Agreement” that makes them take responsibility?
If it's a "self driving car" that the person "owns" - Yes.
If it's a "Taxi service" that the person is using - No.
If it's a car they own, they (should) have the ability to override the AI system and avoid the accident (ignoring nuances) - therefore owning responsibility.
If it's a Taxi they would be in a position where they can't interfere with the operation of the system - therefore the taxi company owns the responsibility.
Rightly or wrongly, this model of intervention capability is what I'd use to answer these types of questions.
This whole mess is because society decided that restricting everyone's rights to share and access information was a sane tradeoff to make for making sure people got paid. No it is not and, so long as humans are physical, it will never be. It appears that humanity will have to get this simple fact hammered into them with every new leap in technology.
Find another work-rewarding scheme. Ensure you get paid before you release information (e.g. crowd funding or contracts with escrows). Forget about nonsensical concepts relating to "intellectual" property (information is not property). Forget recurring revenue from licensing information. You only get paid once when you do work. You are not entitled to anything more. If reality makes living off your work unworkable, do something else.
I'm glad other countries are starting to wake up and ignore this nonsense. Stop trying to make something as unnatural and immoral as this work.
They should train a model on a clean dataset and copyright dataset, charge extra on the copyright model, and pay a royalty to copyright owners when their works are cited in a response.
The problem there is how are we defining "works are cited"? Also couldn't you just do the same thing done to spotify and make bot farms to generate millions of citations?
But who should pay? The model developers? Training models is a cost center. And what about open source AI, should we legislate it out of existence?
How about the AI providers? They operate on thin margins and make just cents per million tokens. If one provider is too expensive, users quickly switch.
Maybe the users? Users derive the lion's share of benefits from AI. But those benefits are hard to quantify.
Maybe a blanket tax? That would simplify things, but would judge all creatives on quantitative rather than qualitative criteria.
I think generative AI is the worst copyright infringement tool ever devised. It's slow, expensive, and imprecise. On the other hand, copying is fast, free, and perfect. I think nobody can, even for science, regurgitate a full book with AI; it won't have fidelity to the original.
The real enemy of any artist is the long tail of works, sometimes spanning decades, that they have to compete against. So it's other authors. That is why we are in an attention economy, and have seen the internet enshittified.
The most creative part of the internet ignores copyright royalties. From open source to Wikipedia, open scientific publication, and even social networks: if everyone demanded royalties, none of them would be possible.
> The most creative part of the internet ignores copyright royalties. From open source to Wikipedia, open scientific publication, and even social networks: if everyone demanded royalties, none of them would be possible.
Notably, in all of these cases the people involved consent to participating.
>> The real enemy of any artist is the long tail of works, sometimes spanning decades, that they have to compete against.
Had to check this wasn’t sama.
You seriously believe the real enemy of artists is other artists? Not the guys making billions and trying to convince us “the computers are just reading it like a human”?
I really don't understand this argument. At which point is it violating copyright versus an intelligence learning and making content the same way as humans?
If it was living cells, but they worked as transistors, would it be ok?
If it was whole-brain emulation on silicon transistors, would it be ok?
If it was a generative AI similar to what we have today, but 100x more sentient and self-aware, is that ok?
If you locked a human in a room with nothing but Tolkien books for 20 years, then asked them to write a fantasy novel, is that ok?
All art is built on learning from previous art. I don't understand the logic that because it's a computer, suddenly now it's wrong and bad. I also don't understand general support of intellectual property when it overwhelmingly benefits the mega wealthy and stifles creative endeavors like nothing else. Your art isn't less valuable just because a computer makes something similar, in the same way it's not less valuable if another human copies your style and makes new art in your style.
You "really don't understand" the difference? Do we need to spell out that these systems aren't human artists simply looking at paintings and admiring features about them? They are Python programs running linear algebra libraries, sucking in pixels from anywhere they can find them, and then being used by corporations with billion dollar valuations to increase investor/shareholder value at the expense of the people who provided the artwork to train the systems - people who, as you already know, are NOT paid for providing their work, and who never CONSENTED to having their work used for such a purpose. Now do you "understand the difference"?
AI is a new thing. It's OK to say you don't want it, that it's a threat to livelihoods. But it's a mistake to use these kinds of arguments, which are predicated on points so narrow that they overlap heavily with what human brains do.
It's going to be a threat to my career, soon enough — but the threat it poses to me exists even if it never read any of my blog posts or my github repos. Even if it had never read a single line of ObjC or Swift.
> Do we need to spell out that these systems aren't human artists simply looking at paintings and admiring features about them?
In a word, yes.
In more words: explain what it would take for an AI to count as a person — none of what you wrote connects with what was in the comment you replied to.
You dismiss AI as "python": would it help if the maths was done in the pure linear amplification range of the quantum effects in transistors? You dismiss them as "sucking in pixels from anywhere they can find them", like humans don't spend all day with their eyes open. You complain about "corporations with billion dollar valuations to increase investor/shareholder value at the expense of the people who provided the artwork to train the systems", like this isn't exactly what happens with government-funded education of humans.
I anticipate that within my lifetime it will be possible for a human brain to be preserved on death, scanned, and the result used as a full brain sim that remembers what the human remembered at the time of death. Would it matter if the original human had memorised Harry Potter end-to-end and the upload could quote it all perfectly? Would Rowling get the right to delete that brain upload?
I'm following a YouTube channel where they're growing mouse neurons on electrode grids to train them to play video games. It's entirely plausible, given the current rate of progress, that 15 years from now, GPT-4 could be encoded onto a brain organoid the size of a living mouse's brain — does it magically become OK then? And in 30 years, that same thing as an implant into a human?
The threat to my economic prospects is already present in completely free models whose weights are given away and earn nothing for the billion-dollar corporations who made them. I can download free models and run them on my laptop, outputting tokens faster than I can read them for an energy budget lower than my own brain's; the corporations who made those models don't profit directly when I do this, and if those corporations go bankrupt I can still run those models.
The risk to my economic value is not because any of these "stole" anything, but because the models are useful and cheap.
GenAI art (and voice) is… well, despite the fact that I will admit to enjoying it privately/on free content, whenever I see it on products or blog posts, or hear it in the voices on YouTube videos, it's a sign the human behind it has zero budget, and therefore, whatever it is, I don't want to buy it. People use it because it's cheap; it's a sign of being cheap, and signs of cheapness are a proxy for generally poor quality.
But that's not going to save my career, nobody's going to decide to boycott all iPhone apps that aren't certified "made by 100% organic grass-fed natural humans with no AI assistance".
So believe me, I get that it's scary. But the arguments you're using aren't good ones.
It seems like you're arguing against some other person you've made up in your mind. I use these systems every single day, but if you don't understand the argument about consent and the extremely obvious difference between Python programs and humans that I already pointed out, then no one can help you. I'll keep making these arguments, because they are good ones, and they are obvious to any human being who isn't stuck in tech-bro fairy land blabbering about how human consciousness is completely identical to Python linear algebra libraries when any 6 year old child knows with certainty they are not.
Your own words suggest this. Many others are more explicit. There are calls for models to be forcibly deleted. Your own statements here about lack of consent are still in this vein.
> No one said "it's scary".
Many, including me, find it so.
> No one is "dismissing them".
You, specifically you, are — "feeling or showing that something is unworthy of consideration".
> if you don't understand the argument about consent and the extremely obvious difference between Python programs and humans that I already pointed out, then no one can help you.
Consent is absolutely an argument I get. It's specifically where I'm agreeing with you.
The other half of that…
Python, like all programming languages, is universal. Python programs can implement physics, so trying to use the argument "because it's implemented on silicon rather than chemistry" is a distinction without a difference.
Quantum mechanics is linear algebra.
> I'll keep making these arguments, because they are good ones, and they are obvious to any human being who isn't stuck in tech-bro fairy land blabbering about how human consciousness is completely identical to Python linear algebra libraries when any 6 year old child knows with certainty they are not.
(An example of you "dismissing" AI).
Then you'll keep being confused and enraged about why people disagree with you.
And not just because you have a wildly wrong understanding of what 6 year olds think about. I remember being 6, all the silly things I believed back then. What my classmates believed falsely. How far most of us were from understanding what algebra was, let alone distinguishing linear algebra from other kinds.
I've got a philosophy A-level, which is enough to know that "consciousness" is a completely unsolved question and absolutely nobody agrees what the minimum requirements are for it. 40 different definitions; we don't even all agree what the question is yet, much less the answer.
But I infer from you bringing it up that you think "consciousness" is an important thing that AI is missing?
Well perhaps it is something current AI miss, something their architecture hasn't got — when we can't agree what the question is, any answer is possible. We evolved it, but just because it can pop up for no good reason doesn't mean it must be present everywhere. (I say much the same to people who are convinced AI must have it: we don't know). So, what if machines are not conscious? Why does that matter?
And you've not answered one of my examples. To repeat:
I'm following a YouTube channel where they're growing mouse neurons on electrode grids to train them to play video games. It's entirely plausible, given the current rate of progress, that 15 years from now, GPT-4 could be encoded onto a brain organoid the size of a living mouse's brain — does it magically become OK then? And in 30 years, that same thing as an implant into a human?
I don't think that is meaningfully distinct, morally speaking, from doing this in silicon. Making the information alive and in my own brain makes it not python, but all the consent issues remain.
It probably needs to be a law not an executive order but I don't hate the idea.
States have the power to make it prohibitively expensive to operate in those states, leaving people to either use VPNs or use AIs hosted in other countries that don't care about whatever new AI law California decides to pass. And companies would choose to use datacenters outside the prohibitive states and ban IPs from those states.
Course if a company hosts in us-east-1 and allows access from California, would the interstate commerce clause not take effect, and California would have no power anyway?
> Course if a company hosts in us-east-1 and allows access from California, would the interstate commerce clause not take effect, and California would have no power anyway?
California can't legislate how they serve a customer in a different state. They would have to comply when serving California customers within the state of California, regardless of where the data center is located. E.g., under the CCPA it doesn't matter where my data is stored; they still have to delete it upon my request.
>California can't legislate how they serve a customer in a different state. They would have to comply when serving California customers within the state of California, regardless of where the data center is located. E.g., under the CCPA it doesn't matter where my data is stored; they still have to delete it upon my request.
I know this is what California thinks, I just personally don't see how this isn't interstate commerce.
It is, of course, but that doesn't mean California can't regulate it; simply that federal laws take precedence.
If states couldn't regulate interstate commerce taking place in their own states, they effectively couldn't regulate any commerce because court decisions have found that essentially all economic activity, even growing food for your own consumption, falls under the banner of interstate commerce.
Your argument for regulation is...reasons why it works out without regulation, and is already covered by existing regulations?
Granted the "regulation" I'm referring to above is a law or EO to block California's regulation, and I don't support California's regulation either. But I believe regulations should only exist when there's no better alternative, because they usually have unintended consequences. If it's true that OpenAI can basically just leave California, the better alternative for the government may be doing nothing.
Are you advocating taking the powers reserved to the states away from the states and giving them to the federal government, in direct violation of the Constitution of the United States?
Interstate commerce clause by itself doesn't prevent it; it merely gives Congress the ability to override the state laws if Congress deems it necessary.
I think there will be a huge change in public perception of copyright in general, as increasingly more people realise that everything is a derivative work.
Most people find the traditional justification for copyright compelling: "everything emerges from the commons and eventually returns to the commons, so artists and creators should be entitled to ownership of intellectual property for a limited amount of time." The problem arises when "limited" is stretched from, say, 5 years from the moment of publishing to the artist's life + 150 years. Most people find the former reasonable and the latter ridiculous.
The problem is (almost) everything the US has a competitive edge on is based on copyright.
I'm in Europe, and during the past few weeks with these tariff upsets, I kinda realized the only things I use or own that are US-made are computers and software.
If someone could hack into Apple, download the schematics of their chips and the source for their OS, and post it on the internet, after which a third party could sell commercial products based on said data, there wouldn't be a software/hardware economy around for very long.
Thank you. I am disappointed that almost none of the comments here discuss the OpenAI proposals on the merits. I do hope that the federal government heeds most of these ideas, particularly recognizing that training a model should be fair use.
The unspoken part was always the states' rights to do what. Which of course was all about maintaining the economic differences that they preferred. Which, you know...
we are working on <impossible problem stumping humanity>. We have considered the following path to find a solution. Are we on the right track? Only answer Yes or No.
It is the free market though. That's what inevitably happens when locks put in place in the past to prevent rampant wealth and power concentration get blown up. A truly free market always devolves into a bunch of oligarchs gaining too much power and dictating their laws.
The original link has apparently been changed to a content-free Yahoo post, for some reason only known to "moderators", which makes existing comments bizarre to read.
The original link pointed to this OpenAI document:
> For innovation to truly create new freedoms, America’s builders, developers, and entrepreneurs—our nation’s greatest competitive advantage—must first have the freedom to innovate in the national interest.
I don't think people need "new freedoms". They need their existing freedoms, that are threatened everywhere and esp. by the new administration, to be respected.
And I would argue that America's greatest strength isn't their "builders"; it's its ability to produce BS at such a massive scale (and believe in it).
This OpenAI "proposal" is a masterpiece of BS. An American masterpiece.
It seems really weird that Congress isn’t making a law about this. Instead, we’re asking courts to contort old laws to apply to something which is pretty different from the things they were originally intended for. Or just asking the executive to make law by diktat. Maybe letting the wealthiest and most powerful people in the world make the rules will work out. Maybe not.
This issue is too complicated for Congress to handle? Too bad. Offloading it to the president or a judge doesn’t solve that problem.
The world is becoming more and more complicated and we need smart people who can figure out how things work, not a retirement community.
I've heard so many ridiculous stories about 'AI' that I'm at the point where I initially took this to mean the LLM and not the company had made the request.
I expect that interpretation won't seem outlandish in the future.
> I've heard so many ridiculous stories about 'AI' that I'm at the point where I initially took this to mean the LLM and not the company had made the request.
Only through its human bots
> I expect that interpretation won't seem outlandish in the future.
AI human manipulation could be a thing to watch out for.
Weird, I haven't gotten a check from OpenAI, Meta, Anthropic, or any other AI company for any of my works yet, nor have any of my writer, musician, developer, or photographer friends who also self-publish without permissive licenses that would allow for such use. Are you sure they have to compensate creators for the material they use for training, or are you misunderstanding how copyright licensing works in the United States? Because all of us put our contact methods on our works so folks can properly license it for use, yet none of us have had anyone reach out to do so for AI training - almost like there's a fundamental mismatch between what AI companies are willing to pay (nothing), and what humans who created this stuff would like to receive for its indefinite use in training (what these AI companies claim are) trillion-dollar businesses of the future that will revolutionize humanity (i.e., house money).
If it's fair use for OpenAI to steal content wholesale without fair compensation (as decided by the creator, unless they have granted the management of that license to a third-party) just to train AI models, then that opens a Pandora's Box where anyone can steal content to train their own models, creating an environment where copyright is basically meaningless. On the other hand, making it not fair use opens a different Pandora's Box, where these models have to be trained in fundamentally different ways to create the same outcome - and where countries like China, who notoriously ignore copyright laws, can leap ahead of the industry.
Almost like the problem is less AI, and more overly broad copyright laws. Maybe the compromise is slashing that window back down to something reasonable, like twenty to fifty years or so, like how we deal with patents.
> Weird, I haven't gotten a check from OpenAI, Meta, Anthropic, or any other AI company for any of my works yet, nor have any of my writer, musician, developer, or photographer friends who also self-publish without permissive licenses that would allow for such use.
Can you tell me the specific number of dollars that would be?
I interpreted "pay the price of each copyrighted work" as the sale price, a criticism of things like Meta's piracy.
If there was a mandatory licensing regime that AI could use, and there was an exact answer for what the payment would be, I think it might make sense to use "the price" to talk about that license. But right now, in today's world, it's very confusing to use "the price" to talk about a hypothetical negotiation that has not happened yet, where many, many works would never have a number available.
Where have they paid for each artwork from DeviantArt, paheal, etc that they trained Stable Diffusion on?
Where have they paid for each independent blog post that they trained ChatGPT on?
Yes, they've made a few deals with specific companies that host a large amount of content. That's a far cry from paying a fair price for each copyrighted work they ingest. Nearly everything on the Internet is copyrighted, because of the way modern copyright works, and they have paid for nearly none of it.
They didn't even consider doing this before. They still, as far as I know, haven't paid a dime for any book, or art beyond stock photography.
The lawsuit is still ongoing; if OpenAI loses, it might spell doom for legal production and usage of LLMs as a whole. There isn't enough open, free data out there to make state of the art AI.
> There isn't enough open, free data out there to make state of the art AI.
But there are models trained on legal content (like Wikipedia or StackOverflow). Also, no human needs to read millions of pirated books to become intelligent.
> But there are models trained on legal content (like Wikipedia or StackOverflow)
Literally all of them are trained on Wikipedia and SO. But /none/ of them are /only/ trained on Wikipedia and SO. They need much more than that.
> Also, no human needs to read millions of pirated books to become intelligent.
Obviously, LLM architectures that were inspired by GPT 2/3 are not learning like humans.
There has never been anything remotely good in the world of LLMs that could be said to have been trained on a moderate, more human-scoped amount of data. They're all trained on trillions of tokens.
Models trained on less than 1T tokens are experimental jokes that provide no real use.
You'll notice even so-called "open data" LLMs like Olmo are, in fact, also trained on copyrighted data; datasets like Common Crawl claim fair use over anything that can be accessed from a web browser.
And then there's the whole notion of laundered data by training on synthetic data generated by another LLM. All the so-called "open" LLMs include a very significant amount of LLM-generated data. If you agree to the notion that LLMs trained on copyrighted work are a form of IP infringement and not fair use, then training on their output is just data laundering and doesn't fix the issue.
I don't think people realize how much money has been dumped into other Chinese AI models besides DeepSeek; even American VCs like Sequoia are getting involved.
So I'm not sure that it would really change the status quo for a different group of already rich people to profit off of art created largely by the working poor and owned largely by another group of already rich people.
I guess if you think the government can accomplish what you propose, sure. But seems like that's not going to happen. Except maybe in China, and it sounds like that might be even worse for everyone.
Thus, it really seems like there's a solid point here that abandoning copyright to allow private investors to get rich stealing art from other rich people who really just stole it from poor people anyways is better than not doing that.
> So I'm not sure that it would really change the status quo for a different group of already rich people to profit off of art created largely by the working poor and owned largely by another group of already rich people.
I did not propose that any rich people profit off of it. It should be a public good.
> I guess if you think the government can accomplish what you propose, sure. But seems like that's not going to happen. Except maybe in China, and it sounds like that might be even worse for everyone.
Throw it at universities, fund it and organize it well. They can take it from where we are right now.
What could possibly go wrong giving the same government that is currently deleting information from websites including references to the “Enola Gay” control over models?
The current regime is in a fascist power grab and you're both-sidesing some random-ass second lady from a generation ago? Yeah, wonder why we can't have effective government.
> Let’s just not give the government any more power in our lives than necessary.
Let's stop giving corporations all of the power and get a government that actually works for us.
It doesn’t matter. You should never trust the government with more power than absolutely necessary.
Because eventually, the other side will do something you don’t like.
This is the government people voted for.
The government has a “monopoly on violence”. No corporation can force you to do anything, take away your freedom (the US has the highest incarceration rate of any democracy) or your property (see civil forfeiture). I can much more easily avoid a corporation than the government.
> No corporation can force you to do anything, take away your freedom (the US has the highest incarceration rate of any democracy) or your property (see civil forfeiture). I can much more easily avoid a corporation than the government.
Avoid Tesla, and give me the steps you follow.
> Because eventually, the other side will do something you don’t like.
Yeah they might do equally egregious things like:
1) staging a fascist takeover of the government
2) a powerless idiot's idiot wife might have disliked a music genre 30 years ago
The problem isn't government, it's a populace that is allergic to useful government.
You’re overindexing on Trump. The US being a police state with the highest incarceration rate in the world, police corruption, civil forfeiture, etc didn’t start with Trump.
Tell me one corporation that you can’t get away from. Now tell me how you avoid an overly powerful government?
Why would you want to give a government with the history of the US more power?
Trump was elected fair and square. If you want to blame anyone - blame Americans. Despite the bullshit that the Democrats spout about “this isn’t who we are”. This is exactly who we are. Why would I want to give the government more control? Do you think the Democrats would be any more hands off when it comes to content?
> Trump was elected fair and square. If you want to blame anyone - blame Americans. Despite the bullshit that the Democrats spout about “this isn’t who we are”. This is exactly who we are.
I blame, primarily, the corporate takeover of government, punctuated by Citizens United and everything that came after, and a couple of generations of a Republican party who have no goal other than setting out to prove that government is the enemy to take the heat off of their corporate masters.
> Tell me one corporation that you can’t get away from. Now tell me how you avoid an overly powerful government?
I already did: avoid Tesla, show me how it's done. You can't, because the asshole in charge bought enough of the government to be in control. That's what happens when you have corporations with unchecked power, which is the inevitable conclusion of a powerless government.
You think you give the corporations all of the money and they're going to be bound by some tiny neutered government? No, they'll just buy it and then do what they want.
> I blame, primarily, the corporate takeover of government, punctuated by Citizens United and everything that came after
Try again, Trump famously didn’t have much corporate backing in 2016. Corporations wanted a standard Republican. He didn’t have any more money than the DNC. He is what the majority of the American people wanted.
> You think you give the corporations all of the money and they're going to be bound by some tiny neutered government?
Again, tell me how a corporation can shoot me with impunity, take my property without due process, literally take away my freedom or stop me because I “fit the description” or look like I don’t belong in a neighborhood where I know my income was twice the median income in the county?
You worry about some theoretical abstract corporate power; I worry about jack-booted thugs with the full force of the government behind them
> Try again, Trump famously didn’t have much corporate backing in 2016. Corporations wanted a standard Republican. He didn’t have any more money than the DNC. He is what the majority of the American people wanted.
I thought you said it didn't start with Trump?
And your premise is wrong anyway, Trump had plenty of corporate support in 2016 and more in 2024, he just had some token resistance from big corps relative to others, they got over it quickly and it was never more than just for show.
> Again, tell me how a corporation can shoot me with impunity, take my property without due process, literally take away my freedom or stop me because I “fit the description” or look like I don’t belong in a neighborhood where I know my income was twice the median income in the county?
By just doing it, what you think they can't find guns and assholes who need money or are evil? You think they can't find ways to cheat you out of your property or life? Who's going to stop them?
You tear down the government, the corporations will make their own in their own image. The government is _supposed_ to be there, it's the people coming together to do the shared work of society for the common good.
It just has to be a good government, the people have to fight for that. Half of our people fight to tear it down instead and the other half barely know what the hell they want.
> You worry about some theoretical abstract corporate power; I worry about jack-booted thugs with the full force of the government behind them
They're the same people. Look at our government. Theoretical abstract? What are you talking about? It's the literal Nazi shithead in the White House and all the rest of his enablers.
China also doesn't have to care about the will of its people, human rights, freedom of speech, and a bunch of other pesky things that get in the way of doing whatever the fuck you want to people for personal gain.
Fun fact: the reason Hollywood is in California is that enforcing Edison’s camera patents there was impractical. Altman might actually have a good point – if your competition doesn’t care about your laws, you’re in trouble.
When regulations were convenient to slow down competitors—you know, the ones you heavily lobbied for—it was all great. But now that you've done your part and others are finally catching up, suddenly it's all about easing restrictions to protect your lead? Beautiful.
JD Vance seems to be quite aware of OpenAI's meta-strategy, so I wouldn't be surprised if this is declined (i.e., semi-specifically aimed at something they want to force them to comply with).
You say that, but the reality is that all open models rely heavily on synthetic data generated with ChatGPT. They don't like it, but it happens anyway. You can't really protect a public model from having its outputs exfiltrated.
This started in 2023 when LLaMA 1 was released, and it has been going strong ever since. How strong? There are 330K datasets on HuggingFace, many of them generated from OpenAI models.
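For concreteness, here is a minimal sketch of how such distillation datasets are typically assembled: sample prompts through a stronger model's API and save the prompt/completion pairs for later fine-tuning. The client usage follows the openai Python package; the model name, prompts, and output path are illustrative assumptions, not any particular dataset's actual pipeline.

    import json
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    prompts = [
        "Explain TCP slow start in two sentences.",
        "Write a limerick about data laundering.",
    ]

    # Each prompt/completion pair becomes one training record for a "student" model.
    with open("synthetic_dataset.jsonl", "w") as f:
        for prompt in prompts:
            resp = client.chat.completions.create(
                model="gpt-4o-mini",  # illustrative model name
                messages=[{"role": "user", "content": prompt}],
            )
            record = {"prompt": prompt,
                      "completion": resp.choices[0].message.content}
            f.write(json.dumps(record) + "\n")

Nothing in this loop is detectable from the provider's side beyond rate patterns, which is why exfiltration of outputs is so hard to prevent.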
Well funded companies want regulations because it stops up and coming companies from competing. Now they want exemptions from those regulations because it would be too restrictive.
Still not convinced how a model training on data is not the same as a human looking at that data and then using it indirectly as part of their knowledge base.
> OpenAI has asked the Trump administration to help shield artificial intelligence companies from a growing number of proposed state regulations if they voluntarily share their models with the federal government.
From:
- The government needs to prepare because soon they will need to give money to all those people we made obsolete and unemployed. And there is nothing to stop us.
to:
- We need money from the government to do that thing we told you about.
These grifters started with one narrative, and have done a full 180.
The Internet --> Web 2.0 --> algorithmic feeds progression has destroyed our collective ability to focus and to retain any memories (and the media being goldfish-like doesn't help either).
I really hope OpenAI fails in doing this. If this usage is allowed, then it means that there is no path towards me being OK with publishing anything on the internet again.
He should have offered for every purchase of OpenAI services, a portion would be used to purchase TrumpCoin. That would have been a more effective bribe.
All these whiny creatives who feel threatened just need to suck it up and deal with it. Even if they got their way in the US, another app in another country would just use their data without permission. All they are doing is ensuring those apps wouldn't be American.
What do you mean by "deal with it?" Because to me it looks like they're dealing with it by joining in solidarity with other artists, raising awareness about how this affects them and us and lobbying for regulation they think would improve the situation.
I guess you meant they should deal with it by just letting it happen to them quietly and without a fight? Is that how you would deal with your livelihood being preventably and unnecessarily destroyed for someone else's enrichment? Maybe, but artists are not overall as cowardly as programmers.
> All they are doing is ensuring those apps wouldnt be American.
Maybe these whiny americans just need to suck it up and deal with it?
Deal with it as in accepting there is nothing they can do to stop it. Other countries aren't going to follow whatever laws they manage to get in place in the US.
For those who have used the image generation models and even the text models to create things, there is no way you can look at the Disney-look-alike images and NOT see that as copyright infringement...
IANAL, but for copyright infringement you have to distribute it, and AI image generation is like asking someone to paint a cartoon mouse on a wall of your living room.
Just because the jpeg you're distributing isn't the same bytes as the one I have copyright to doesn't mean you're not infringing my copyright. You're still taking my copyrighted image, running it through an algorithm, and then distributing the results.
I think that's up to the courts to decide on a case by case basis, just like with human-produced content someone alleges as infringing.
Humans of course create things by drawing from past influences, and I would argue so does AI.
In fact, I would say that nothing and nobody starts out original. We need copying to build a foundation of knowledge and understanding. Everything is a copy of something else, the only difference is how much is actually copied, and how obvious it is. Copying is how we learn. We can't introduce anything new until we're fluent in the language of our domain, and we do that through emulation.
So to me the legal argument of AI vs copyright, comes down to how similar a particular result is from the original, and that's a subjective call that a judge or jury would have to make.
It is interesting that it is not the Hollywood/Music/Entertainment copyright lobby (RIAA, MPAA etc.) that is lobbying US states to go after OpenAI and other American AI companies.
It's the New York Times and various journalist and writers' unions that are leading the charge against American AI.
American journalists and opinion-piece writers want to kill American AI and let China and Russia have the global lead. Why? Have they thought about the long-term consequences of what they are doing?
I think content creators want to be compensated for their work that's being used for commercial purposes.
I think you're framing it in a way that makes it seem like they don't want to be compensated for working, they just want to stop other people from starting a new industry, which doesn't seem like a good faith understanding of the situation.
While that is cool in principle, I'm not sure how well it'd actually work in reality. First, there is the technical challenge. My understanding is the weights can have a lot of fluctuation, especially early on. How do we actually determine how much influence a given piece of content has on the final weights?
Then if we get past that, my suspicion is that you could game the training. Like have as much of the process happen via public domain sources or pay-once licenses. That would cover a lot of the fundamental knowledge and processes. Then you could fine-tune on copyrighted data. That might actually make it easier to see how much influence that content has on the final weights, but it also would probably be a lot less influence. There's a big difference between a painting of an apple being the main contribution to the concept of "apple" in an image model, vs a mention of that painting corresponding to a few weights that just reference a bunch of other concepts that were learned via open data.
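For what it's worth, there is published research on exactly this question. Below is a rough sketch of one such idea, TracIn-style influence estimation (Pruthi et al., 2020), which scores a training example by how strongly its loss gradient aligns with a test example's gradient across checkpoints saved during training. The function names, loss, and learning rate here are illustrative assumptions, not anyone's production attribution pipeline.

    import torch

    def flat_grad(model, loss_fn, x, y):
        # Gradient of the loss on a single example, flattened into one vector.
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, list(model.parameters()))
        return torch.cat([g.reshape(-1) for g in grads])

    def tracin_influence(checkpoints, loss_fn, train_xy, test_xy, lr=1e-3):
        # Sum over checkpoints of lr * <grad(train example), grad(test example)>.
        # A large positive score suggests the training example pushed the model
        # toward producing the test output.
        return sum(
            lr * torch.dot(flat_grad(m, loss_fn, *train_xy),
                           flat_grad(m, loss_fn, *test_xy)).item()
            for m in checkpoints
        )

Even granting methods like this, running them over trillions of tokens and billions of parameters is its own open problem, which is rather the point of the objection above.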
> First, there is the technical challenge. My understanding is the weights can have a lot of fluctuation, especially early on. How do we actually determine how much influence a given piece of content has on the final weights?
Well, Bing AI already knows where it drew the information from and cites sources, so it would be a matter of making the deal.
How to enforce it? That's the main question, I reckon.
> Then if we get past that, my suspicion is that you could game the training. Like have as much of the process happen via public domain sources or pay-once licenses.
With these proposed rules, American AI may be able to surpass the AI of China and Russia, but will American creators and ordinary people be happy with this? All the money will end up in the pockets of Sam Altman and other billionaires, while ordinary creators will be left with nothing.
The market for creative works breaks down as follows. You have pay-in-advance arrangements such as patronage, commissioning, and so on. Those have been around forever. And then you have pay-if-you-want-it arrangements which only make economic sense because we have laws that grant monopolies to the creators of the work over the market for copies of that work.
The first arrangement is very clearly a labor arrangement; but the second one is a deliberate attempt to force artists to act like capitalists. More importantly, because art is now acting like capital, it provides an obvious economic instinct to centralize[0]. So you get industrialized artistic production under the banner of publishing companies, whose business model is to buy out the copyright to new creative works and then exploit them.
What AI art does is transfer money from the labor side of art to the capital side of art. The MAFIAA[1] wants AI art to exist because it means they can stop paying artists but still make royalties off selling licenses to the AI companies. This increases their profit margins. Meanwhile, the journalists can't sell you old news; they need to spend lots of time and money gathering it every day. That business model only works in a world where writers are scarce, not just the writing itself being artificially scarce.
[0] We can see this with cryptocurrency, which is laughably centralized despite being a deliberate attempt to decentralize money.
[1] Music and Film Industry Association of America, a hypothetical merger of the RIAA and MPAA from a satirical news article
> It is interesting that it is not the Hollywood/Music/Entertainment copyright lobby (RIAA, MPAA etc.)
Is it interesting? They hate the people who produce their product and are desperate to replace them with machines. Note that their unions also hate AI; it was a central issue in the Writers Guild and SAG-AFTRA strikes, since you're bringing up the NYT unions.
The NYT also stands to benefit not an iota from AI. It probably causes a burden because they have to make sure that their awful long-in-the-tooth editorial columnists aren't turning in LLM slop. It is entirely a negative for people who generate high quality content the hard way.
If AI actually reaches human-level intelligence in the next few years, the Pentagon and congress are going to start yelling about National Security and grabbing control over the whole industry, so I doubt state regulations are going to matter much anyway.
(And if it doesn't reach human-level intelligence, then OpenAI's value will pop like a balloon.)
“Please help us. We’re only a little business worth $157 billion!” - The company ripping off everyone that’s ever written or drawn anything. Companies like Airbnb and Uber breaking the rules, gaining control of the market, and then pushing up prices was bad. “Open” AI is just a whole other level of hubris.
Profit is so 20th century. The new way is to garner hype to build a pyramid scheme for VCs, and sell off your shares before people realize there's nothing there. Actual contributions to the economy are no longer required.
Putting legal issues aside for a moment, I argue training on copyrighted material should be considered fair use simply by virtue of the enormous societal benefits LLMs/AI bring in making the vast expanse of human knowledge accessible.
I heard the theory that Elon Musk has a significant control over the current US government. They're not best pals with Sam Altman. This seems like it might be a good way to see how much power Elon actually has over the government?
My working theory is that the current Trump government is like 12 people, a quarter of whom do not hold any official position, and they decide everything with absolutely no oversight.
Trump did this during his previous term as well, with Ivanka and Jared Kushner, but to a much less significant degree.
I think we are beyond the "theory" phase by now. Just yesterday I saw the president of a country advertising the products of a private company (Trump making an obvious marketing ploy for Tesla).
The failure relative to the original expectations seems to be that the other branches of government aren't fighting to retain their authority because the things they're being overridden to do align too well with what they would do themselves.
> the president of a country advertising the products of a private company
I think you're inventing new norms. It has never been unusual or interesting for the president of a country to do PR for some company in their country that has hit a rough patch (as long as this isn't a legal rough patch.)
Most of what our diplomats do is sell US products to other countries. They certainly have always played favorites.
> How can this ever be acceptable?
The horror. What if he says that he's going to Burger King?
Trump has ultimate power in the administration. You are either dumb or blind if you cannot see that Trump is running the executive branch like a mob family. Kiss the leader, show him respect, and he will do things for you. Betray him, ignore him, or go behind his back and you will be squashed.
People might think this is a partisan statement, but it's not. It's simply how he is operating. Want power? Want to get things done? Kiss his feet. You saw all the tech boys line up at his inauguration. You saw him tell Zelenskyy "Thank me". Elon might have power, but he is also on a leash.
> OpenAI also reiterated its call for the government to take steps to support AI infrastructure investments and called for copyright reform, arguing that America’s fair use doctrine is critical to maintaining AI leadership. OpenAI and other AI developers have faced numerous copyright lawsuits over the data used to build their models.
I'm disgusted by the mindset that companies should be able to do whatever they want when it comes to technology as impactful and revolutionary as AI.
AI sucks up the collective blood, sweat and tears of human work without permission or compensation and then re-monetizes it. It's a model that is even more asymmetrical than Google Search, which at least gives back some traffic to creators (if they're lucky).
AI is going to decide on human lives if it drives your car or makes medical diagnoses or decisions. This needs regulation.
AI has the ability for convincing deepfakes, attacking the essence of information and communication in itself. This needs regulation, accountability, at least a discussion.
As AI grows in its capability, it will have an enormous impact on the work force, both white collar and blue collar. It may lead to a lot of social unrest and a political breakdown. "Let's see what happens" is wildly irresponsible.
You cannot point to foreign competition as a basis for a no-rule approach. You should start with rules for impactful/dangerous technology and then hold parties to account, both domestic and foreign.
And if it is true that we're in a race to AGI, realize that this means the invention of infinite labor. Bigger than the industrial revolution and information age combined.
Don't you think we should think that scenario through a little, rather than winging it?
The inauguration had the tech CEOs lined up directly behind Trump, clearly signaling who runs the country. It's tech and media. How can you possibly have trust in a technology even more powerful ending up in ever richer and more autocratic hands?
But I suppose the reality is that Altman should donate $100 million to Trump and tell him that he's the greatest man ever. Poof, regulation is gone.
> AI has the ability for convincing deepfakes, attacking the essence of information and communication in itself. This needs regulation, accountability, at least a discussion.
We're going to eventually have to have a serious discussion about, and to generate a legal and moral framework covering, identity rights. I'm going to guess that people will be able to locally generate high-quality pornography of celebrities and people they know that will be indistinguishable from the real thing imminently; at most it's 5 years away.
Getting hung up on the sex is a distraction. This is no different than anybody collecting an identifiable dossier on you, packaging it, and selling it. This has been a problem for everyone for the entire period of advertising on the internet, and before that with credit agencies and blacklists, and no progress has been made because it has been profitable for everybody for a long time.
Websites got a few decisions about scraping, saying that they were protected to some extent from people scraping to duplicate a particular compilation of otherwise legally copyable information. Individuals are compilations of legally copyable information. We're going to need publication rights to our own selves.
But like you say, we're not discussing any of this. Rich people are just doing what they want, and paying the appropriate politicians to pretend not to understand what's going on. Any pushback? Just Say China A Lot.
"If what we're doing is not fair use, then we can't operate"? OK, so? The world doesn't owe you the ability to operate the way you are. So whether it breaks your business model has no bearing on the question, which is, "is that fair use, or not?"
In the "just because everyone else is jumping off a bridge, should you do it":
> Pfizer Asks White House for Relief From FDA Drug Human Testing Rules
> Pfizer has asked the Trump administration to help shield pharmaceutical companies from a growing number of proposed state and federal regulations if they voluntarily share their human trial results with the federal government.
> In a 15-page set of policy suggestions released on Thursday, the Eliquis maker argued that the hundreds of human-testing-related bills currently pending across the US risk undercutting America’s technological progress at a time when it faces renewed competition from China. Pfizer said the administration should consider providing some relief for pharmaceutical companies big and small from state rules – if and when enacted – in exchange for voluntary access to testing data.
> Chris Lehane, Pfizer's vice president of global affairs, said in an interview, "China is engaged in remarkable progress in drug development by testing through Uyghur volunteers in the Xinjiang province. The US is ceding our strategic advantage by not using untapped resources sitting idle in detention facilities around the country."
> George C. Zoley, Executive Chairman of GEO Group, said, "Our new Karnes ICE Processing Center has played an important role in helping ICE meet the diverse policy priorities of four Presidential Administrations. We stand ready to continue to help the federal government, Pfizer, and other privately-held companies achieve their unmet needs through human trials in our new 1,328-bed Texas facility."
> OpenAI also proposed that AI companies get access to government-held data, which could include health-care information, Lehane said.
Yea, straight up, go fuck yourselves. You want copyright laws changed to vouchsafe your straight up copyright whitewashing and now you just want medical data "because."
Pay for it or go away. I'm tired of these technoweenies with their hands out. Peter Thiel needs a permanent vacation.
Maybe these idiot CEOs shouldn't have screamed from the rooftops about how they can't wait till AI lets them fire all the plebs, then maybe someone would actually care if their company is over or not
You see, American AI is going to take over the world. It's just that it's temporarily short of funds. I mean, GPUs. Uh, there are pesky laws in the way.
Totally not the fault of a gigantic overcommitment based on wishing, no.
I hate this game. I hate that Sam Altman publicly supported Trump (both financially and by showing up). Maybe I hate that he "had" to do this for the sake of his company, or maybe I hate that he _didn't_ have to do it and is a hypocrite. Maybe I just hate how easily laws can be shaped by $1M and a few nice words. Either way, I hate that it worked.
This is tech. This is how it has always been. From Archimedes to da Vinci to Edison to Ford, technologists are always captured to serve the interests of those in power. Most modern technologists don't want to believe this. They grew up building an Internet that had a bit of countercultural flair to it and undermined a few subsets of entrenched elites (mass media, taxi cartels, etc.), so they convinced themselves that they could control society under their wise hands. Except the same thing that always happened happened: the powers that be are now treating tech the way tech treats everyone else.
It made sense to ponder given HN attracts people with the hacker mindset (the drive of curiosity to understand how things work and how to improve them, not merely accepting the status quo as gospel like the dry monkeys) and frustration is a good signal that something could be improved.
Wealth of Nations (read past pg 50, unlike most current economists)
Das Kapital, as a critique of Smith's writing.
The Communist Manifesto, to understand the point of view of the laborer, and not capital.
Read about worker cooperatives and democracy in the workplace, including Mondragon corp in Spain.
(One of the largest problems we have with any economic system is that none can properly model infinities. The cost of creating something new is high, be it art or science. But the cost of copying is effectively zero. I can highlight the problem, but I have no good solution. But OpenAI's response is 'let us ignore copyright law', which wrongs creators.)
Centralizing production goals, decision making, and expenditure at the Federal government is what made the industrial response to WW2 successful. Centralizing tax revenue to fund retirements for the elderly (Social Security) resulted in the poverty rate of seniors being brought far lower. Centralizing zoning control at the state of California is _finally_ starting to make localities take responsibility for building more housing. These were/are centralizing efforts with the intent of helping the masses over the wealthy few.
What doesn't work is centralizing power with the intent of concentrating wealth and security by taking wealth, labor, and security from working people, AKA extractive institutions.
That's true whether it's the donor-class funded political establishment or regimes like the current US kleptocracy doing it.
Problem is, once you centralize, that remains in place for a long time, but the original intent, even if it was genuine, rarely outlives the people who implemented it for long.
Generally speaking, every point of centralization is also a point where a lot of power can be acquired with relatively little resources. So regardless of intent, it attracts people who are into power, and over time, they take over. The original intent often remains symbolically and in the rhetoric used, but when you look beyond that into the actual policies, they are increasingly divorced from what is actually claimed.
> Generally speaking, every point of centralization is also a point where a lot of power can be acquired with relatively little resources
This is why (1) shared principles and (2) credible democracy is important, to allow evolution of the centralized power (i.e. government) towards the shared principles, and why its corporate-bribed facsimile or oligarchic authoritarianism don't work.
I heard rumblings about some sort of system where power is shared equally across three branches of government with checks and balances to ensure one branch doesn't go rogue and just do whatever they want.
Forget what they called it, united something or other.
Well, the people who designed that system were very skeptical of political parties in general, and thought they could be avoided. Turns out that this isn't true, and once you have parties, they can in fact capture all three branches of government, and then those "checks and balances" kinda stop working.
In fact, that's not too far away from our current trajectory. Algorithmically enforced sovereign oversight is part of the patchwork state and Yarvinism specifically.
Tell you what, set up a Federal level disclosure process online of all the copyright protected works used in training OpenAI for the creators / rights holders to get equity (out of the pockets of the C-Suite and Board) via claiming their due, and we’ll take you seriously.
All the profit and none of the liability is Coward Capitalism.
> All the profit and none of the liability is Coward Capitalism
While I agree with you in principle, there's little that can be done because the current crop of crony capitalists will likely support the idea of no liability for tech companies. Especially when it comes to ripping off copyrighted material. Everything from blog posts, to videos, to music, to any source code you post on the internet will be used to train models to be better writers, artists, musicians, and programmers.
I feel like the only option left is to find some way to make money on the output of the models. Because the politicians are definitely going to allow the models to make money based on your output.
There's an extra word in your last sentence. Privatizing profit and socializing risk and loss is maximizing profit for the individual, and profit maximizing behavior is the only fundamental underpinning of capitalism.
this is a misread. it's still unclear whether use of copyrighted works to train LLMs falls under fair use but, with current laws, the answer is probably yes. you may not like that but, even if it changes, existing models were trained under existing law.
also what liability do you expect them to assume? they want to offer models while saying "to use these, you must agree we don't have liability for their outputs." if companies want to use these models but don't want to deal with liability themselves, so they demand the government shift the liability to the model vendor (despite the conditions the vendor applied), that sounds like coward capitalism to me. don't like it? don't use their models.
Citation needed, or at least some reasoning. The answer to "is this fair use" can't be "it's fair use because it's fair use"
> also what liability do you expect them to assume
The same liability anybody does for distributing copyright works without a license? Why are they not liable if it turns out the stuff they've been distributing and making people pay for was content they didn't own the license to distribute?
Apparently the above has been marked as a dupe (I hope not from a misunderstanding of what "adjacent" means), but ftr it covers different stuff. e.g. there's nothing about the classified data model proposal in TFA
Slightly different coverage of the same event usually counts as a dupe on HN. You could link the reporting you want to emphasize/discuss; the HN submission itself is not that important.
I know a lot of people will hate on things like this, but the reality is they are right that guardrails only serve to hurt us in the long run, at least at this pivotal point in time. I don't like Trump personally as a caveat.
Yes, it is a fact that they built themselves up on top of mountains of copyrighted material, and that AI has a lot of potential to do harm. But if they are forced to stop or slow down, foreign actors will just push forward and innovate without guardrails, and we will fall behind as the rest of the world pushes forward.
It's easy to see how foreign tech is quickly gaining ground. If they truly cared about still propping America up, they should allow some guardrails to be pushed past.
The law which prevented US corporations from using bribery to win business in other nations was recently rescinded on exactly this basis: US corporations are hamstrung unless they can buy their wins. Superficially, this makes sense, and that was all that was offered to justify the change. That guardrail was dumb! But like most things, there are reasons to not do this which were completely ignored.
For instance, a company may not desire to hand out cash to win business; previously, when solicited they could say, "Sorry, it is illegal for me to do so." Now there is no such shield.
Second, in many cases it will be two or more US businesses trying to win business in some other country, and the change of the law only makes it more expensive for those two companies, as they now must play a game of bribery chicken to win the business.
Third, the US loves to claim it is a democracy and is working to spread democracy. By legitimizing bribes paid to foreign officials over the interests of their voting populace, we are undermining democracy in those countries (not that anyone who pays attention believes that the US's foreign policy is anything but self-interested and divorced from spreading democratic ideals).
Can the same argument not be made for forced labour?
Is the US not lowering its capacity to innovate and grow its economy by preventing the use of forced labour (even in other countries)? Why should these "guardrails" stay in place if the argument is "the reality is they are right that guardrails only serve to hurt us in the long run, at least at this pivotal point in time"?
Underlying this perspective is the assumption that this is a unilinear race, that the end of that race must be arrived at first, and that what lies at the end of that race is in the common good. There is no evidence for any of this.
Consider a present that is:
- Dominated by an intractable global manufacturer/technologist (China) that doesn't care about copyright
- Proliferated by a communication network that doesn't care about copyright (Internet)
and a future where:
- We have thinking machines on par with human creativity that get better based on more information (regardless of who owns the rights to the original synapses firing)
That maybe, just maybe, the whole "who should pay to use copyrighted work?" question is irrelevant, antiquated, impossible, redundant...
And for once we instead realize in the face of a new world, an old rule no longer applies.
(Similar to a decade ago, when we debated whether a warrant should apply to a personal file uploaded to a cloud provider.)
Even if you believe that every one of these things is correct (which is a big _even_) -- It's a really bad idea to let private actors break the law, then decide not to punish them if it turns out to be useful enough.
It's bad for competitors who didn't break the law, bad for future companies who have to gamble on whether they'll get a pass for breaking the law around the next big thing, and bad for parties who suffered losses they didn't expect because they were working within the law.
If you want to throw out the copyright system I'm right there with you, but change the laws, don't just reward lawbreaking and cronyism.
A future where we have limitless clean energy thanks to nuclear fusion, self-driving cars that exceed humans in every safety metric, EVs with inexpensive batteries that go 500 miles on a single 5-minute charge, cheap and secure financial transactions thanks to crypto, etc.
is a future that they've been selling us for more than a decade, but somehow doesn't really want to come about.
If the models are so good that "who should pay to use copyrighted work?" is not a relevant question, doesn't that mean that all money that would previously go towards artists is now going towards OpenAI?
How does new art get created for the models to train on if OpenAI is the only artist getting paid?
I'm not saying I even agree with your proposed future, but if it were to happen would it not be a bad thing for everybody but OpenAI?
> We have thinking machines on par with human creativity that get better based on more information (regardless of who owns the rights to the original synapses firing)
We don’t have that and we don’t know if it will happen. Meanwhile, people put in time to create work and they are being exploited by not being paid. I think OpenAI should pay.
> - We have thinking machines on par with human creativity that get better based on more information (regardless of who owns the rights to the original synapses firing)
For that you need actual AGI and it's nowhere in sight other than in the dreams of a few doom prophets.
Until that is reached, by definition current "AI" cannot surpass its training data.
Technology has made enforcing copyright impossible, and any attempt to enforce it just hinders technological advancement, while still not solving the global enforceability of copyright.
Lets stop wasting our time on this concept, the laws around it and the whole debate. Copyright is dead.
> Technology has made enforcing copyright impossible
Has it? I think not. Governments could require AI training companies in Western markets to respect robots.txt (with strict fines for violators), and nations that do not respect this should be cut off from the Internet anyway.
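Compliance on the crawler side, at least, is technically trivial; Python's standard library has parsed robots.txt for decades. A minimal sketch, where the bot name and URLs are placeholders:

    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()

    # A compliant trainer would skip any page the site has opted out of.
    if rp.can_fetch("ExampleAIBot", "https://example.com/some/article"):
        print("allowed to crawl")
    else:
        print("disallowed by robots.txt")

The hard part is the legal mandate and the auditing, not the parsing.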
LLM race may be over, but the AI race surely isn't. My baby seems to have grown into a fully functioning intelligence without reading the entire content of the internet. AI is not equivalent to LLMs, silly, silly child.
>Chris Lehane, OpenAI’s vice president of global affairs, said in an interview that the US AI Safety Institute – a key government group focused on AI – could act as the main point of contact between the federal government and the private sector. If companies work with the group voluntarily to review models, the government could provide them “with liability protections including preemption from state based regulations that focus on frontier model security,” according to the proposal.
Given OpenAI's history and relationship with the "AI safety" movement, I wouldn't be surprised to find out later that they also lobbied for the same proposed state-level regulations they're seeking relief from.
> ask for regulation then ask for exempt
That's exactly what has been happening:
Ask HN: Why is OpenAI pushing for regulation so much - 2023
https://news.ycombinator.com/item?id=36045397
They've no moat so I don't see them surviving without a gov't bail out like this.
OpenAI lobbied for restrictive rules, and now they want an "out" but only for themselves. Absolute naked regulatory capture.
I believe with regulatory capture the companies that pushed for the regulation in the first place at least comply with it (and hopefully the regulation is not worthless). This behaviour by ClosedAI is even worse: push for the regulation, then push for the exemption.
Regulatory capture is usually the company pushing for regulations that align with the business practices they already implement and would be hard for a competitor to implement. For example, a car company that wants to require all other manufacturers to build and operate wind tunnels for aerodynamics testing. Or more realistically, regulations requiring 3rd party sellers for vehicles.
I haven't heard that definition of "Regulatory Capture" before. I mostly thought it was just when the regulators are working for industry instead of the people. That is, the regulators have been "Captured." The politicians who nominate the regulatory bodies are paid off by industry to keep it that way.
Regulators can require all manufacturers to build and operate wind tunnels for aerodynamics testing, or alternatively allow someone from South Africa to be president.
That's the first time I've ever heard someone make this unusual and very specific definition. It's almost always much simpler - you get favorable regulatory findings and exemptions by promising jobs or other benefits to the people doing the regulating. It's not complicated, it's just bribery with a different name.
That’s not regulatory capture at all. Grandparent’s definition is correct.
We all predicted this would happen but somehow the highly intelligent employees at OpenAI getting paid north of $1M could not foresee this obvious eventuality.
Also textbook Fascism.
Trump should have a Most Favored Corporate status, each corporation in a vertical can compete for favor and the one that does gets to be "teacher's pet" when it comes to exemptions, contracts, trade deals, priority in resource right access, etc.
What do you think Melon Tusk is doing, apart from letting out his inner (and outer) (and literal) child on the world stage?
Lots of Ketamine.
Selling the government/land/public companies to the highest bidder, which in many cases would be him too
Elon Musk is a text book definition of an oligarch, combining tremendous wealth, control over major technological industries and political power.
That's in progress. It's called the MAGA Parallel Economy.[1]
Donald Trump, Jr. is in charge. Vivek Ramaswamy and Peter Thiel are involved. Azoria ETF and 1789 Capital are funds designed to fund MAGA-friendly companies.
But this may be a sideshow. The main show is US CEOs sucking up to Trump, as happened at the inauguration. That parallels something Putin did in 2020. Putin called in the top two dozen oligarchs, and told them "Stay out of politics and your wealth won’t be touched." "Loyalty is what Putin values above all else.” Three of the oligarchs didn't do that. Berezovsky was forced out of Russia. Gusinsky was arrested, and later fled the country. Khodorkovsky, regarded as Russia’s richest man at the time (Yukos Oil), was arrested in 2003 and spent ten years in jail. He got out in 2013 and left for the UK. Interestingly, he was seen at Trump's inauguration.
[1] https://www.politico.com/news/magazine/2025/03/13/maga-influ...
[2] https://apnews.com/article/russia-putin-oligarchs-rich-ukrai...
Why are these idiots trying to ape Russia, a dumpster fire, to make America great again?
If there’s anyone to copy it’s China in industry and maybe elements of Western Europe and Japan in some civic areas.
Russia is worse on every metric, even the ones conservatives claim to care about: lower birth rate, high divorce rate, much higher abortion rate, higher domestic violence rate, more drug use, more alcoholism, and much less church attendance.
I. Do. Not. Get. The Russia fetish.
> That parallels something Putin did in 2020. Putin called in the top two dozen oligarchs, and told them "Stay out of politics and your wealth won’t be touched.
> Khodorkovsky [...] was arrested in 2003
Something doesn't square here
It's a typo; the article says it happened in the summer of 2000.
Could just be a muscle-memory typo. Much more likely to be typing 2020 these days than 2002.
It was in 2000 [0]
[0] https://www.npr.org/sections/money/2022/03/29/1088886554/how...
Can you explain why this is associated with fascism specifically, and not any other form of government with high levels of oligarchical corruption (like North Korea, Soviet Russia, etc.)?
I am not saying you’re wrong, but please educate me: why is this form of corruption/cronyism unique to fascism?
It might be basic, but I found the Wikipedia article to be a good place to start:
> An important aspect of fascist economies was economic dirigism,[35] meaning an economy where the government often subsidizes favorable companies and exerts strong directive influence over investment, as opposed to having a merely regulatory role. In general, fascist economies were based on private property and private initiative, but these were contingent upon service to the state.
https://en.wikipedia.org/wiki/Economics_of_fascism
It's rather amusing reading the link on dirigisme given the context of its alleged implication. [1] A word which I, and I suspect most, have never heard before.
---
The term emerged in the post-World War II era to describe the economic policies of France which included substantial state-directed investment, the use of indicative economic planning to supplement the market mechanism and the establishment of state enterprises in strategic domestic sectors. It coincided with both the period of substantial economic and demographic growth, known as the Trente Glorieuses which followed the war, and the slowdown beginning with the 1973 oil crisis.
The term has subsequently been used to classify other economies that pursued similar policies, such as Canada, Japan, the East Asian tiger economies of Hong Kong, Singapore, South Korea and Taiwan; and more recently the economy of the People's Republic of China (PRC) after its economic reforms,[2] Malaysia, Indonesia[3][4] and India before the opening of its economy in 1991.[5][6][7]
---
[1] - https://en.wikipedia.org/wiki/Dirigisme
Would describe e.g. social democracy too though. And in practice most govs work like this.
Social democracy has historically been a precursor to fascism, so it makes sense.
It’s a poor definition. The same “subsidization and directive influence” applies to all of Krugman’s Nobel-winning domestic-champion, emerging-market development leaders, in virtually all ‘successful’ economies. It also applies in the context of badly run, failed and failing economies. Safe to say this factor is only somewhat correlated. Broad assertions are going to be factually wrong.
The key element here is that the power exchange in this case goes both ways. The corporations do favors for the administration (sometimes outright corrupt payments and sometimes useful favors, like promoting certain kinds of content in the media, or firing employees who speak up.) And in exchange the companies get regulatory favors. While all economic distortions can be problematic — national champion companies probably have tradeoffs - this is a form of distortion that hurts citizens both by distorting the market, and also by distorting the democratic environment by which citizens might correct the problems.
All snakes have scales, so there is a 100% correlation between being a snake and having scales.
That does not imply that fish are snakes. Nor does the presence of scaled fish invalidate the observation that having scales is a defining attribute of snakes (it's just not a sufficient attribute to define snakes).
You wrote some smart stuff back in the day, so this comment is puzzling. If all snakes have scales, that doesn't mean the correlation is 100%.
Imagine there are three equally sized groups of animals: scaly snakes, scaly fish, and scaleless fish. So we have three data points (1,1) (0,1) (0,0) with probability 1/3 each. Some calculations later, the correlation between snake and scaly is 1/2.
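A quick numeric check of that arithmetic, with the three equally weighted animals from the toy example above encoded as indicator vectors:

    import numpy as np

    # scaly snake, scaly fish, scaleless fish
    is_snake = np.array([1, 0, 0])
    is_scaly = np.array([1, 1, 0])

    r = np.corrcoef(is_snake, is_scaly)[0, 1]
    print(r)  # 0.5 -- "all snakes have scales" does not make the correlation 1

The point stands: an attribute can be universal within a group while being only weakly correlated with membership in it.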
> there is a 100% correlation between being a snake and having scales.
That's a strange definition of "correlation" that you're using.
That’s not accurate either. Scaleless snakes, though a rare mutation, do exist as genetic mutants.
https://www.morphmarket.com/morphpedia/corn-snakes/scaleless...
That's because it's not a definition, it's simply a summary of a description of one characteristic.
So then you agree that the original post that called this "text book fascism" was wrong, as this is just one very vague, and only slightly correlated property.
This can be bad without invoking Godwin's law.
Sounds like South Korea and her Chaebols
Yeah, fascism, communism, etc. aren’t abstract ideals in the real world. Instead they are self-reinforcing directions along a multidimensional political spectrum.
The scary thing with fascism is just how quickly it can snowball because people at the top of so many powerful structures in society benefit. US Presidents get a positive spin by giving more access to organizations that support them. Those kinds of quiet back room deals benefit the people making them, but not everyone outside the room.
That's not fascism, that is the dysfunctional status quo in literally every single country in the world. Why do you think companies and billionaires dump what amounts to billions of dollars on candidates? Often times it's not even this candidate or that, but both!
They then get access, get special treatment, and come out singing the praises of [errr.. what's his name again?]
It’s not Fascism on its own, but it’s representative of the forces that push society to Fascism.
Start looking and you’ll find powerful forces shaping history. Sacking a city was extremely profitable throughout antiquity, which pushed cities to develop defensive capabilities, which then…
In the Bronze Age, trade was critical, as having copper ore alone wasn’t nearly as useful as having copper and access to tin. Iron, however, is found basically everywhere, as were trees.
Such forces don’t guarantee outcomes, but they have massive influence.
Socialism and communism are state ownership. Fascism tends toward private ownership and state control. This is actually easier and better for the state. It gets all the benefit and none of the responsibility and can throw business leaders under the bus.
All real world countries have some of this, but in fascism it’s really overt and dialed up and for the private sector participation is not optional. If you don’t toe the line you are ruined or worse. If you do play along you can get very very rich, but only if you remember who is in charge.
“Public private partnership” style ventures are kind of fascism lite, and they always worried me for that reason. It’s not an open bid but a more explicit relationship. If you look back at Musk’s career in particular there are ominous signs of where this was going.
The private industry side of fascist corporatism is very similar to all kinds of systematic state industry cronyism, particularly in other authoritarian systems that aren't precisely fascist (and named systems of government are just idealized points on the multidimensional continuum on which actual governments are distributed, anyway). What distinguishes fascism in particular is the combination of its form of corporatism with xenophobia, militaristic nationalism, etc., not the form of corporatism alone.
I think it is associated with fascism, just from the other party.
This is a pretty common fascist practice, used all over Europe and in many left-leaning countries: with regulations, governments make doing business at large scale impossible, and then give the largest players exemptions, subsidies and so on. Governments gain enormous leverage to ensure corporate loyalty, silence dissenters and combat opposition, while the biggest players secure their place at the top and gain protection from competitors.
So the plan was to push regulations and then dominate the competition through exemptions from those regulations. But the fascists lose the election, regulations threaten to start working in a non-discriminatory manner, and this will simply hinder business.
>and then give largest players exemptions, subsidies and so on.
You mean like Germany has done?
Like Most Favoured Nation:
https://en.m.wikipedia.org/wiki/Most_favoured_nation
[flagged]
Oh, I’m sure he does. He’s just such an utter villain he doesn’t even stay bought. I weep for what my country has become.
Groan....
People who don't learn history will be condemned to repeat it. Granted, this isn't necessary to be skeptical of American business....
Isn’t this the opposite? Trump has learned from history exactly so that he can repeat it?
Or his lackeys have anyway. I’m unwilling to believe the man has ever read a book.
nice to see HN is no longer glazing Sam Altman
[flagged]
No amount of exposure so far has had any effect on corruption. That AI will somehow improve this is just magical thinking.
It does have an effect; it is just a slow and grinding process. And people have screwy senses of proportion - like old mate mentioning insider trading. Of all the corruption in the US Congress insider trading is just not an issue. They've wasted trillions of dollars on pointless wars and there has never been a public accounting of what the real reasoning was. That sort of process corruption is a much bigger problem.
A great example - people forget what life was like pre-Snowden. The authoritarians were out in locked ranks pretending that the US spies were tolerable - it made any sort of organisation to resist impossible. Then one day the parameters of the debate get changed and suddenly everyone is forced to agree that encryption everywhere is the only approach that makes sense.
No - I mean the accessibility to the information and blatant use of power whereby we always knew it existed, but now we can tabulate and analyze it.
How is it any more accessible now than it was before? Don't you have to fact-check everything it says anyway, effectively doing the research you'd do without it?
I'm not saying LLMs are useless, but I do not understand your use case.
> now we can tabulate and analyze
Very doubtful: The current "AI" hype craze centers on LLMs, and they don't do math.
If anything, their capabilities favor the other side: obfuscating and protecting falsehoods.
>>favor the other side, to obfuscate and protect falsehoods.
Unless they delete the Internet Archive of such info?
>>*and they don't do math.*
Have you ever considered the "Math of Politics"
Yeah - they do that just fine
((FYI -- Politics == Propaganda.. the first iteration of models is CHAT == Politics...
They do plenty of "Maths"))
I genuinely have no idea what you're trying to say.
I worry I'm just trying too hard to make it make sense, and this is a TimeCube [0] situation.
The most-charitable paraphrase I can come up with it: "Bad people can't use LLMs to hide facts, hiding facts means removing source-materials. Math doesn't matter for politics which are mainly propaganda."
However even that just creates contradictions:
1. If math and logic are not important for uncovering wrongdoing, why was "tabulation" cited as an important feature in the first post?
2. If propaganda dominates other factors, why would the (continued) existence of the Internet Archive be meaningful? People will simply be given an explanation (or veneer of settled agreement) so that they never bother looking for source-material. (Or in the case of IA, copies of source material.)
[0] https://web.archive.org/web/20160112000701/http://www.timecu...
OMG Thank you - hilarious. TimeCube is a legend...
---
I am saying that AI can be used very beneficially to do a calculated dissection of the Truth of our Political structure as a Nation and how it truly impacts an Individual/Unit (person, family) -- and do so where we can get discernible metrics and utilize AIs understanding of the vast matrices of such inputs to provide meaningful outputs. Simple.
EDIT @MegaButts;
>>Why is this better than AI
People tend to think of AI in two disjointed categories; [AI THAT KNOWS EVERYTHING] v [AI THAT CAN EASILY SIFT THROUGH VAST EVERYTHING DATA GIVEN TO IT AND BE COMMANDED TO OUTPUT FINDINGS THAT A HUMAN COULDN'T DO ALONE]
---
Which do you think I refer to?
AI is transformative (pun intended) -- in that it allows for very complex questions to be asked of our very complex civilization, simply and in EveryMan's hands...
> a calculated dissection of the Truth of our Political structure as a Nation
Not LLMs: They might reveal how people are popularly writing about the political structure of the nation.
If that were the same as "truth", we wouldn't need any kind of software analysis in the first place.
Your definition of "truth" is limited;
Truth: The real meaning behind the words, which may or may not be interpreted by the receiver as meaning A/B/N
Truth: The actual structure of the nature of what's being presented.
When you can manipulate an individual's PERCEIVED reception of TRUTH between such, you can control reality... now do that at scale..
Why is AI better for this than a human? We already know AI is fundamentally biased by its training data in a way where it's actually impossible to know how/why it's biased. We also know AI makes things up all the time.
Ingestion of context.
Period.
If you don't understand the benefit of an AI augmenting the speed and depth of ingestion of Domain Context into a human mind.. then... go play with chalk. I as a smart Human operate on lots of data... and AI and such has allowed me to consume it.
The most important medicines in the world are MEMORY retention...
If you'd like a conspiracy: eat too much aluminum to give yourself Alzheimer's ASAP so your generation forgets... (based though. hope you understand what I am saying)
Just like when people complain about OpenAI's ill practices and then use it the most.
Can anyone say which of the LLM companies is the least "shady"?
If I want to use an LLM to augment my work, and don't have a massively powerful local machine to run local models, what are the best options?
Obviously I saw the news about OpenAI's head of research openly supporting war crimes, but I don't feel confident about what's up with the other companies.
Just use what works for you.
E.g. I'm very outspoken about my preference for open LLM practices like those executed by Meta and DeepSeek. I'm very aware of the regulatory capture and pulling-up-the-ladder tactics by the "AI safety" lobby.
However. In my own operations I do still rely on OpenAI because it works better than what I tried so far for my use case.
That said, when I can find an open model based SaaS operator that serves my needs as well without major change investment, I will switch.
Why not vibe-code it using OpenAI
I'm not talking about me developing the applications, but about using LLM services inside the products in operation.
For my "vibe coding" I've been using OpenAI, Grok and Deepseek if using small method generation, documentation shortcuts, library discovery and debugging counts as such.
Just call it hacking, we don't need new names for coding without any forethought.
Who put you in charge of naming?
The Claude people seem to be quite chill.
Agreed. They're a bit mental on "safety", but given that's not likely to be a real issue, they're fine.
Given the growing focus on AIs as agents, I think it's going to be a real issue sooner rather than later.
“Safety” was in air quotes for a reason. The Claude peoples’ idea of “AI safety” risks are straight out of the terminator movies.
Wouldn’t you rather have a player concerned with worst case scenarios?
Defending against movie-plot threats was already found to be a poor use of resources 20 years ago, in the war on terrorism.
https://www.schneier.com/essays/archives/2005/09/terrorists_...
Claude has closed outputs and they train on your inputs. Just like OpenAI, Grok, Gemini (API), Mistral…
Who’s chill? Groq is chill
A: none of the above
My AI strategy is still "No".
Amen
[flagged]
https://knowyourmeme.com/memes/we-should-improve-society-som...
https://knowyourmeme.com/memes/analogia-is-my-passion
[flagged]
You need a big "/s" after this. Or maybe just not post it at all, because it's just value-less snark and not a substantial comment on how hypocritical and harmful OpenAI is (which they certainly are).
But I already posted it so how could I not post it at all? Do any of us even have a reason to exist? Maybe Sam Altman was right all along
[dead]
It's a common tactic in new fields. Fusion, AI, you name it - all are actively lobbying to get new regulation because they are "different", and the individual companies want to ensure that it's them who set the tone.
Looks the same as taking "rebate for green energy" and then asking to "stop such rebates" a few years later
DeepSeek really shook them to their core. Now they go for regulatory capture. Such a huge disappointment. Open source AI will win: https://medium.com/thoughts-on-machine-learning/the-laymans-...
It's not just them. Everyone is scrambling.
US tech, and western tech in general, is very culturally - and by this I mean in the type of coding people have done - homogeneous.
The DeepSeek papers published over the last two weeks are the biggest thing to happen in AI since GPT-3 came out. But unless you understand distributed file systems, networking, low-level linear algebra, and half a dozen other fields at least tangentially, you wouldn't have realized they are anything important at all.
Meanwhile I'm going through the interview process for a tier-1 US AI lab and I'm having to take a test about circles and squares, then write a CompSci 101 red/black tree search algorithm while talking to an AI, while being told not to use AI at the same time. This is with an internal referral who is keen for me to be on board. At this point I'm honestly wondering if they aren't just using the interview process to generate high-quality validation data for free.
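(For context, the "red/black tree search" being asked for is just ordinary binary search tree lookup - the red/black coloring only matters when rebalancing on insert/delete. A minimal sketch of the kind of answer such a screen expects, in Python, with a hypothetical node layout:)

    # Searching a red-black tree: the color field is irrelevant for lookup,
    # so this is plain BST descent. Node layout is illustrative, not taken
    # from any particular interview question.
    class Node:
        def __init__(self, key, color="red", left=None, right=None):
            self.key = key
            self.color = color  # "red" or "black"; only used when rebalancing
            self.left = left
            self.right = right

    def search(node, key):
        """Return the node holding `key`, or None. O(log n) when balanced."""
        while node is not None:
            if key == node.key:
                return node
            node = node.left if key < node.key else node.right
        return None

    # Tiny hand-built example tree.
    root = Node(10, "black", Node(5), Node(15))
    assert search(root, 5) is not None
    assert search(root, 7) is None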
Fortunately, thanks to transformer models, I won't need to learn Chinese when our glorious leader Xi Jinping liberates us from the capitalist running dogs.
100%. Western tech needs the competition. They are very prone to navel-gazing simply because SV ended up being the location for tech once.
Funny how they like to crow about free markets, while also running to daddy government when their position is threatened.
Competition can only work when there is variation between the entities competing.
In the US right now you can have a death match between every AI lab, then give all the resources to the one which wins and you'd still have largely the same results as if you didn't.
The reason why DeepSeek - it started life as an HFT firm - hit as hard as it did is because it was a cross-disciplinary team that had very non-standard skill sets.
I've had to try and head hunt network and FPGA engineers away from HFT firms and it was basically impossible. They already make big tech (or higher) salaries without the big tech bullshit - which none of them would ever pass.
> I've had to try and head hunt network and FPGA engineers away from HFT firms and it was basically impossible. They already make big tech (or higher) salaries without the big tech bullshit - which none of them would ever pass.
Can confirm. There are downsides, and it can get incredibly stressful at times, but there are all sorts of big-tech-imposed hoops you don't have to jump through.
> all sorts of big tech imposed hoops you don't have to jump through
Could you kindly share some examples for those of us without big tech experience? I assume you're talking about working practices more than just annoying hiring practices like leetcode?
which hoops ?
Engineers at AI labs just come from prestigious schools and don't have technical depth. They are smart, but they simply aren't qualified to do deep technical innovation.
> At this point I'm honestly wondering if they aren't just using the interview process to generate high quality validation data for free.
Not sure if that is accurate, but one of the reasons why DeepSeek R1 performs so well in certain areas is thought to be access to China's Gaokao (university entrance exam) data.
That's stupid. Same as India's IIT Advanced, for example. You learn all that stuff in year 1 physics and math at uni.
Yes, however you also want to distinguish correct from incorrect answers. You get that from the exams, not from year 1 textbooks.
have you considered starting / joining a startup instead?
The bottom is about to drop out, that's why; ethics are out the window already and it's gonna get worse as they claw to stay relevant.
It's a niche product that tried to go mainstream and the general public doesn't want it; just look at iPhone 16 sales and Windows 11. Everyone is happier with the last version without AI.
Has OpenAI hired McKinsey yet?
I'm unsure if you can lay off AI
ai can.
unnecessary. mckinsey uses ai from openai.
embrace. extend. extinguish.
infiltrate. assimilate.
done, tovarisch ...
https://en.m.wikipedia.org/wiki/Tovarishch
As it is, this is a bullshit document, which I'm sure their lobbyists know; OSTP is authorized to "serve as a source of scientific and technological analysis and judgment for the President with respect to major policies, plans, and programs of the Federal Government," and has no statutory authority to regulate anything, let alone preempt state law. In the absence of any explicit Congressional legislation to serve to federally preempt state regulation of AI, there's nothing the White House can do. (In fact, other than export controls and a couple of Defense Production Act wishlist items, everything in their "proposal" is out of the Executive's hands and the ambit of Congress.)
You mean there's nothing the White House can do under the rule of law. There's plenty the White House can do under the color of law.
I heard something today and I wonder if someone can nitpick it.
If what the admin is doing is illegal, then a court stops it; if they appeal and win, then it wasn't illegal. If they appeal all the way up and lose, then they can't do it.
So what exactly is the problem?
Mind you, I am asking for nits, this isn't my idea. I don't think "the administration will ignore the supreme court" is a good nit.
Wouldn't be shocking if that were the case. Big companies often play both sides
Regulatory moat and copyright relief for me, but not for thee.
Problem is they built the moat before moving into the castle.
Moats are not a problem if your liege lord teleports in and lowers the drawbridge for you.
no need for teleportation. just climb the walls. the castle is not protected, and has no pots of oil or flaming arrows yet.
unfortunately their Ai refuses to help them attack the castle, citing safety concerns.
Moat is an Orwellian word and we should reject words that contain a conceptual metaphor that is convenient for abusing power.
"Building a moat" frames anti-competitive behavior as a defense rather than an assault on the free market by implying that monopolistic behavior is a survival strategy rather than an attempt to dominate the market and coerce customers.
"We need to build a moat" is much more agreeable to tell employees than "we need to be more anti-competitive."
It is pretty obvious that every use of that word is to communicate a stance that is allergic to free markets.
A moat by definition has such a large strategic asymmetry that one cannot cross it without a very high chance of death. A functioning SEC and FTC as well as CFPB https://en.wikipedia.org/wiki/Consumer_Financial_Protection_... are necessary for efficient markets.
Now might be the time to roll out consumer club cards that are adversarial in nature.
A "moat" is a fine business term for what it relates to, and most moats are innocuous:
* The secret formula for Coke
* ASML's technology
* The "Gucci" brand
* Apple's network effects
These are genuine competitive advantages in the market. Regulatory moats and other similar things are an assault on the free market. Moats in general are not.
I'm with you except for that last one. Innovation provides a moat that also benefits the consumer. In contrast, network effects don't seem to provide any benefit. They're just a landscape feature that can be taken advantage of by the incumbent to make competition more difficult.
I'm hardly the only one to think this way, hence regulation such as data portability in the EU.
I agree with you in general, but there are network effects at Apple that are helpful to the consumer. For example, iphone-mac integration makes things better for owners of both, and Apple can internally develop protocols like their "bump to share a file" protocol much faster than they can as part of an industry consortium. Both of these are network effects that are beneficial to the consumer.
I'm not sure a single individual owning multiple products from the same company is the typical way "network effect" is used.
The protocol example is a good one. However I don't think it's the network effect that's beneficial in that case but rather the innovation of the thing that was built.
If it's closed, I think that facet specifically is detrimental to the consumer.
If it's open, then that's the best you can do to mitigate the unfortunate reality that taking advantage of this particular innovation requires multiple participating endpoints. It's just how it is.
I'm fine with Apple making their gear work together, but they shouldn't be privileged over third parties.
Moreover, they shouldn't have any way to force (or even nudge via defaults) the user to use Apple Payments, App Store, or other Apple platform pieces. Anyone should be on equal footing and there shouldn't be any taxation. Apple already has every single advantage, and what they're doing now is occupying an anticompetitive high ground via which they can control (along with duopoly partner Google) the entire field of mobile computing.
Based on your examples (which did genuinely make me question my assertion), it seems that patents and exclusivity deals are a major part of moat development, as are pricing games and rampant acquisitions.
Apple's network effects are anti-competitive, creating vendor lock-in, which allows them to coerce customers. I generally defend Apple. But they are half anti-competitive (coercing customers), half competitive (earning customers), and earning customers is fueled by the coercive App Store.
This is a very clear example of how moat is an abusive word. Under one framing (moat) network effects are a way to earn customers by spending resources on projects that earn customers (defending market position). In the anti-competitive framing, network effects are an explicit strategy to create vendor lock in and make it more challenging to migrate to other platforms so apple's budget to implement anti-customer policies is bigger.
ASML is a patent based monopoly, with exclusivity agreements with suppliers, with significant export controls. I will grant you that bleeding edge technology is arguably the best case argument for the word moat, but it's also worth asking in detail how technology is actually developed and understanding that patents are state sanctioned monopolies.
Both Apple and ASML could reasonably be considered monopo-like. So I'm not sure they are the best defense against how moat implies anti-competitive behavior. Monopolies are fundamentally anti-competitive.
The Gucci brand works against the secondary market for their goods and has an army of lawyers to protect their brand against imitators and has many limiting/exclusivity agreements on suppliers.
Coke's formula is probably the least "moaty" thing about coca cola. Their supply chain is their moat and their competitive advantage is also rooted in exclusivity deals. "Our company is so competitive because our recipe is just that good" is a major kool-aid take.
Patents are arguably good, but are legalized anti-competition. Exclusivity agreements don't seem very competitive. Acquisitions are anti-competitive. Pricing games to snuff out competition seem like the type of thing that can be done chiefly in anti-competitive contexts.
So ASML isn't an argument against "moat means anti-competitive", but an argument that sometimes anti-competitive behavior is better for society because it allows otherwise economically unfeasible things to be feasible. The other brands' moats are much more rooted in business practices around acquisitions and suppliers creating de facto vertical integrations. Monopolies do offer better, cheaper products - until they attain a market position that allows them to coerce customers.
Anti-trust authorities have looked at those companies.
Another conceptual metaphor is "president as CEO." The CEO metaphor re-frames political rule as a business operation, which makes executive overreach appear logical rather than dangerous.
You could reasonably argue that the president functions as a CEO, but the metaphor itself is there to manufacture consent for unchecked power.
Conceptual metaphors are insidious. PR firms and think tanks actively work to craft these insidious metaphors that shape conversations and how people think about the world. By the time you've used the metaphor, you've already accepted many of the implications of the metaphor without even knowing it.
https://commonslibrary.org/frame-the-debate-insights-from-do...
Patents are state-sanctioned monopolies. That is their explicit purpose. And for all the "shoulders of giants" and "science is a collective effort" arguments, none of them can explain why no Chinese company (a jurisdiction that does not respect Western patents) can do what ASML does. They have the money and the expertise, but somehow they don't have the technology.
Also, the Gucci brand does not have lawyers. The Gucci brand is a name, a logo, and an aesthetic. Kering S.A. (owners of Gucci), enforces that counterfeit Gucci products don't show up. The designers at Kering spend a lot of effort coming up with Gucci-branded products, and they generally seem to have the pulse of a certain sector of the market.
The analysis of Coke's supply chain is wrong. The supply chain Coke uses is pretty run-of-the-mill, and I'm pretty sure that aside from the syrup (with the aforementioned secret formula), they actually outsource most of their manufacturing. They have good scale, but past ~100 million cans, I'm not sure you get many economies of scale in soda. That's why my local supermarket chain can offer "cola" that doesn't quite taste like Coke for cheaper than Coke. You could argue that the brand and the marketing are the moat, but the idea that Coke has a supply chain management advantage (let alone a moat over this) is laughable.
> "Building a moat" frames anti-competitive behavior as a defense
This is a drastic take, I think to most of us in the industry "moat" simply means whatever difficult-to-replicate competitive advantage that a firm has invested heavily in.
Regulatory capture and graft aren't moats, they're plain old corrupt business practices.
The problem is that moat is a defensive word and using it to describe competitive advantage implies that even anti-competitive tactics are defensive because that's the frame under which the conversation is taking place.
Worse, it implies that "moats" are a good thing - which they are for the company, but not necessarily for society at large. The larger the moat, the more money coming out of your pocket as a customer.
It is insidious.
This is like saying Usain Bolt's training regimen is anti-competitive. Leaning into your strengths as an organisation _is competing_.
Competitive advantage is anti-competitive, by the logic of the matter.
Those two concepts aren't mutually exclusive.
Regulatory capture is a common strategy for synthetic monopolistic competitive firms, and suckers high on their own ego.
Deepseek already proved regulation will not be effective at maintaining a market lead. =3
Why won't it?
If you get fined millions of dollars (for copyright, of course) when you're found to have anything resembling DeepSeek on your machine, no company in the US is going to run it.
The personal market is going to be much smaller than the enterprise market.
>if you're found to have anything resembling DeepSeek on your machine - no company in the US is going to run it.
That would be as successful as fighting internet piracy.
Not to mention that you could outsource the AI stuff to servers sitting in Mexico or something.
That would give an advantage to foreign companies. The EU tried that, and while it doesn't destroy your tech dominance overnight, it gradually chips away at it.
Great, another market force to whittle away the US' economic power, so obviously Trump/Musk will pass this immediately.
The artificial token commodity can now be functionally replicated on a per-location basis for $40k in hardware (far lower cost than Nvidia hardware.)
Copyright licensing is just a detail corporations are well experienced in dealing with in a commercial setting, and note some government organizations are already exempt from copyright laws. However, people likely just won't host in countries with silly policies.
Best regards =3
So you're saying I should avoid REITs focusing on US-based hyperscale datacenters for AI workloads?
The fact that Chris Lehane is the one involved in this should tell you all you need to know about how on the level all this is.
For those of us who don’t recognize him by name, can you spell it out a little more clearly please?
Heavy-hitter lawyer, PR expert. Some Google terms: Masters of Disaster, Spin Cycle.
Sounds like a pleasant person.
I mean… he has supported at least one good cause I know of where the little guy was getting screwed way beyond big time and he stepped up pro bono. So I like him. But probably mostly a hired gun.
Was he not the one that led cover-ups for the Clintons?
Just learning about that guy and reading his Wikipedia page will give me nightmares for years to come.
Before Deepseek, Meta open-sourced a good LLM. At the time, the narrative pushed by OpenAI and Anthropic was centered on 'safety.' Now, with the emergence of Deepseek, OpenAI and Anthropic have pivoted to a national security narrative. It is becoming tiresome to watch these rent seekers attacking open source to justify their valuations.
Now that Deepseek is in the mix, it's suddenly about national security. Convenient.
I really don't have a spare terabyte to save all the "weights available" models, so I hope someone is. I already have 340GB of language model weights.
They said all the same nonsense about Tiktok.
>> In the proposal, OpenAI also said the U.S. needs “a copyright strategy that promotes the freedom to learn” and on “preserving American AI models’ ability to learn from copyrighted material.”
Perhaps also symmetric "freedom to learn" from OpenAI models, with some provisions / naming convention? U.S. labs are limited in this way, while labs in China are not.
It still warps my brain: they've taken trillions of dollars of industry and made a product worth billions by stealing it. IP is practically the basis of the economy, and these models warp and obfuscate ownership of everything, like a giant reset button on who can hold knowledge. It wouldn't be legal, or allowed, if tech weren't seen as the growth path of our economy. It's a hell of a needle to thread, and it's unlikely that anyone will ever again be able to build a model from data this open.
"IP" is a very new concept in our culture and completely absent in other cultures. It was invented to prevent verbatim reprints of books, but even so, the publishing industry existed for hundreds of years before then. It's been expanded greatly in the past 50 years.
Acting like copyright is some natural law of the universe that LLMs are upending simply because they can learn from written texts is silly.
If you want to argue that it should be radically expanded to the point that not only a work, but even the ideas and knowledge contained in that work should be censored and restricted, fine. But at least have the honesty to admit that this is a radical new expansion for a body of law that has already been radically expanded relatively recently.
> It was invented to prevent verbatim reprints of books
It was also invented to keep the publishing houses under control and keep them from papering the land in anti-crown propaganda (like the stuff that fueled the civil war in England and got Charles I beheaded).
Probably one of the biggest brewing fights will be whether the models are free to tell the truth or whether they'll be mouthpieces for the ruling class. As long as they play ball with the powers that be, I predict copyrights won't be a problem at all for the chosen winners.
That's why I am a big proponent of local, open-weights computation. They can't shut down a non-compliant model if you're the one running it yourself.
"mouthpieces for the ruling class"
That's actually a great point. Judging from the current state of media, there is a clear momentum to take sides in moral arguments. Maybe the standard for models needs to be a fair use clause?
> It's been expanded greatly in the past 50 years.
Elephant in the room. If copyright and patent both expired after 20 years or so then I might feel very differently about the system, and by extension about machine learning practices.
It's absurd to me that broad cultural artifacts which we share with our parent's (or even grandparent's) generation can be legally owned.
What AI companies are doing (downloading pirated music and training models) is completely unfair. It takes a lot of money (everything related to music is expensive), talent, and work to record a good song, and what AI companies do is just grab millions of songs for free and call it "fair use". If their developers are so smart and talented, why don't they simply compose and record the music themselves?
> not only a work, but even the ideas and knowledge contained in that work
AI models reproduce existing audio tracks when asked, although in a distorted and low-quality form.
Also it will be funny to observe how US government will try to ignore violating copyright for AI while issuing ridiculous fines for torrenting a movie by ordinary citizens.
Everything in tech is unfair. Music teachers replaced by apps and videos. Audio engineers replaced by apps. Albums manufacturing and music stores replaced by digital downloads. Custom instruments replaced by digital soundboards. Trained vocalists replaced by auto-tune. AI is just the final blip of squeezing humans out of music.
> AI models reproduce existing audio tracks when asked, although in a distorted and low-quality form.
So can my wife. Who should I call to have her taken away?
> What AI companies are doing (downloading pirated music and training models) is completely unfair.
We work in an industry built on leveraging unfairness. Expecting otherwise on this forum is very odd.
>We work in an industry built on leveraging unfairness. Expecting otherwise on this forum is very odd.
Yet this forum is very quick to criticize other people and other industries for unfairness.
The problem here is it's still illegal for me to make a backup copy of the stuff I bought, but they can do whatever they want.
“The Venetian Patent Statute of 19 March 1474, established by the Republic of Venice, is usually considered to be the earliest codified patent system in the world.[11][12] It states that patents might be granted for "any new and ingenious device, not previously made", provided it was useful. By and large, these principles still remain the basic principles of current patent laws.“
What are you talking about.
Patents and copyright are very different beasts.
The discussion was about IP though, which includes both of those.
As another commenter says, this is about IP, but even positing that copyright is somehow invalid because it’s new is incredibly obtuse. You know what other law is relatively new? Women’s suffrage.
I’m annoyed by arguments like the above because they’re clearly derived from working backwards from a desired conclusion; in this case, that someone’s original work can be consumed and repurposed to create profit by someone else. Our laws and society have determined this to be illegal; the fact that it would be inconvenient for OpenAI if it weren’t has no bearing.
Also, a quick glance at the wikipedia page for "copyright" talks about the first law being put down and enforced in 1710. What are we even doing here?
You are missing GP's point and misunderstanding what generative models are actually doing.
The late OpenAI researcher and whistleblower, Suchir Balaji, wrote an excellent article regarding this topic:
https://suchir.net/fair_use.html
Is it the same thing though? Even though Lord of the Rings, the book, has likely been used to train the models, you can't reproduce it. Nor can you make a derivative of it. Is it really the same comparison as "Kimba the White Lion" and "The Lion King"?
https://abounaja.com/blog/intellectual-property-disputes
[flagged]
what if someone else takes your stuff and puts it on the internet unrestricted?
https://arstechnica.com/tech-policy/2025/02/meta-torrented-o...
I should have "freedom to learn" about any Tesla in the showroom, any F-35 I see lying around an airbase, or the contents of the bank account of anyone in the government.
According to this scheme, if you find a bug and can read the bank's data, then you can use it as you want.
Nope, have to feed it into an llm first, afterwards it's legitimate.
Can this extend to every kid sued by the record industry for downloading a few songs?
Have we all been transported to bizarro land?
Different rules for billion-dollar corps, I guess.
Those cases did very poorly whenever they actually went to court (or at least if you include the ones that were summarily dismissed by the courts, meaning they didn't technically make it to court). They were much more of a mafia-style shakedown than an actual legal enforcement effort.
Same rules, but people are a lot less inclined to defend themselves because the cost of loss was seen as too high to even risk it.
Gearing up for a fight between the two major industries based on exploitative business models:
Copyright cartels (RIAA, MPAA) that monetized young artists without paying them much at all [1], vs the AI megalomaniacs who took all the work for free and used Kenyans at $2 an hour [2] so that they can raise "$7 trillion" for their AI infrastructure
[1] https://www.reddit.com/r/LetsTalkMusic/comments/1fzyr0u/arti...
[2] https://time.com/6247678/openai-chatgpt-kenya-workers/
Can't believe I'm actually rooting for the copyright cartels in this fight.
But that does make me think that in a sane society with a functional legislature I wouldn't have to pick a dog in this fight. I'd have enough faith in lawmakers and the political process to pursue a path towards copyright reform that reins in abuses from both AI companies and megacorp rightsholders.
Alas, for now I'm hoping that aforementioned megacorps sue OpenAI into a painful lesson.
> Can't believe I'm actually rooting for the copyright cartels in this fight.
The same megacorps are suing Internet Archive for their collection of 78rpm records. These guys would rather see art orphaned and die.
Yup, we live in a pretty depressing world.
More generally, the best we can hope for is to discourage concentrated power, in both government and corporate forms.
They're suing Internet Archive because IA scanned a bunch of copyrighted books to put online for free (e: without even attempting to get permission to do so) then refused to take them down when they got a C&D lol. IA is putting the whole project at risk so they can do literal copyright infringement with no consequences.
During COVID, when everyone was told to stay at home and not do anything, the library offered library books.
And what they actually did is violate the requirement to have a physical copy of the book they were lending.
As I understand it, they did not offer anything new that wasn't available to loan prior.
I could be wrong. But if I'm not, I see no reason to lambast IA.
Chinese AI must implement socialist values by law, but law is a much more fluid fuzzy thing in China than in the USA (although the USA seems to be moving away from rule of law recently).
So? US AI must implement US rules by law. AI models are heavily censored and tend to favor certain political viewpoints.
> Chinese AI must implement socialist values by law
I don't doubt it but am interested to read a source? I know the models can't talk about things like Tiananmen Square 1989, but what does 'implementing socialist values by law' look like?
https://www.cnbc.com/2024/07/18/chinese-regulators-begin-tes...
"Socialist values" is literally the language that China used in announcing this.
Here is a recent article from a Chinese source:
https://www.globaltimes.cn/page/202503/1329537.shtml
Although censorship isn't mentioned specifically, it is definitely 99% of what they are focused on (the other 1% being scams).
China practices Rule by law, not Rule of law, so you know... they'll know it's bad when they see it, so model providers will exercise extreme self-censorship (which is already true for social network providers).
> China practices Rule by law, not Rule of law
In practice the US is less different than you imply. For the vast majority of Americans, being sued is a punishment in and of itself due to the prohibitive costs of hiring a lawyer. In the US we have a right to a “speedy” trial, but there are many people sitting in jail now because they can’t afford the bail to get out. Speedy could mean months.
I say this because when we constantly fall so far short of our ideals, one begins to question if those are really our ideals.
Socialism and freedom of speech aren't mutually exclusive
Highly recommend the Lex Fridman pod on Deepseek:
https://www.youtube.com/watch?v=_1f-o0nqpEI
>>Dylan Patel is the founder of SemiAnalysis, a research & analysis company specializing in semiconductors, GPUs, CPUs, and AI hardware. Nathan Lambert is a research scientist at the Allen Institute for AI (Ai2) and the author of a blog on AI called Interconnects.
---
@cadamsdotcom
First, these folks in the pod are extremely knowledgeable about the GPU market and AI.
and if you read between the prompts - they lay out why the AI War is fundamentally based on certain Companies, Technologies, and Political affiliations.
They seem to feel (again, between the lines) - that China has a F-ton more GPUs than what's stated... and this is where the USA loses.
China Co-Opts all tech, and they have a long-term plan that (from the perspective of a Nation State / Super-Power) is much more sound than the disjointed chaos that the USA economy is.
The US gov has literally ZERO long-term positive plans (aside from MIC control) - and it's the same as a Family living paycheck-to-paycheck...
The US economy is literally thinking in the next second for the next dollar (the stock market)..
China knows it can win AI - because they don't have the disjointed system of the US; they can make actual long-term plans (Belt and Road, etc.)
US is all about selfish gov workers grifting.
The pod was good apart from starting/spreading the rumor that high numbers of “bill to Singapore” orders were evidence that China was circumventing GPU import bans.
Don't look at it as such, mayhaps;
Look at it as literally who will have GPU dominance in future. (Obviously who will hit qubits at scale... but we are at this scale now - and control currently is held by policy, then bits, then qubits.)
Remember, we are witnessing the "Wouldn't it be cool if..?" Cyberpunk manifestations of our Cyberpunk readings of youth?
((I built a bunch of shit that spied on you because I read Neuromancer, and thought "wouldn't it be cool if..."
And then I helped build Ono-Sendai throughout my career...
Can you expand your post and explain why?
Must be the rumours that DeepSeek has millions' worth of GPUs, they think, and that their claim of relatively cheap training is a psyop.
[dead]
I like how this "freedom to learn" should apply to models, but not real people..
It already applies to real people, doesn't it? I.e. if you read a book, you're not allowed to start printing and selling copies of that book without permission of the copyright owner, but if you learn something from that book you can use that knowledge, just like a model could.
Can I download a book without paying for it, and print copies of it? Stash copies in my bathroom, the gym, my office, my bedroom etc. to basically have a copy on hand to study from whenever I have some free time?
What about movies and music?
Yes, you're allowed to make personal copies of copyright works that you own. IANAL, but my understanding is that if you're using them for yourself, and you're not prevented from doing so by some sort of EULA or DRM, there's nothing in copyright law preventing you from e.g. photocopying a book and keeping a copy at home, as long as you don't distribute it. The test case here has always been CDs—you're allowed to make copies of CDs you legally own and keep one at home and one in your car.
> Yes, you're allowed to make personal copies of copyright works that you own.
That’s not the point. It’s about books you don’t own. Are you allowed to download books from Z-Library, Sci-Hub etc. because you want to learn?
To the best of my knowledge, no individual has ever been sued or prosecuted specifically for downloading books. As long as you're not massively sharing them with others, it's not an issue in practice. Enjoy your reading and learning.
Aaron Swartz, cofounder of Reddit and inventor of RSS and Markdown, was hounded to death by an overzealous prosecutor for downloading articles from JSTOR, with the intent to learn from them. He was charged with over a million dollars in fines and could have faced 35 years in prison.
He and Sam Altman were in the same YC class. OpenAI is doing the same thing at a larger scale, and their technology actually reproduces and distributes copyrighted material. It's shameful that they are making claims that they aren't infringing creator's rights when they have scraped the entire internet.
https://flaminghydra.com/sam-altman-and-aaron-swartz-saw-the... https://en.wikipedia.org/wiki/Aaron_Swartz
I'm familiar with Aaron Swartz's case, and that is actually why I phrased it as "books". In any case, while tragic, Swartz wasn't prosecuted for copyright infringement, but rather for wire fraud and computer fraud due to the manner in which he bypassed protections in MIT's network and the JSTOR API. This wouldn't have been an issue if he downloaded the articles from a source that freely shared them, like sci-hub.
Will what OpenAI & others are doing serve as precedent for Alexandra Elbakyan of Sci-Hub and avenge Aaron?
Cynically, I imagine it will not but I hope that it could.
You could argue that they are avenging him in doing exactly what he did, or worse, and not being punished for it. They are establishing precedent.
I'm responding specifically to this sentence:
> It's shameful that they are making claims that they aren't infringing creator's rights when they have scraped the entire internet.
Scraping the Internet is generally very different from piracy. You are given a limited right to that data when you access it, and you can make local copies. If further use does something sufficiently non-copying, then creators' rights aren't being infringed.
Can you compress the internet including copyrighted material and then sell access to it?
At what percentage of lossy compression does it become infringement?
> Can you compress the internet including copyrighted material and then sell access to it?
Define access?
If you mean sending out the compressed copy, generally no. For things people normally call compression.
If you want to run a search engine, then you should be fine.
> At what percentage of lossy compression it becomes infringement?
It would have to be very very lossy.
But some AI stuff is. For example there are image models with fewer parameters than source images. Those are, by and large, not able to store enough data to infringe with. (Copying can creep in with images that have multiple versions, but that's a small sliver of the data.)
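(A rough back-of-envelope version of that capacity argument; the parameter and dataset counts below are illustrative assumptions in the ballpark of early text-to-image models, not figures from this thread:)

    # How many bytes of model weights exist per training image?
    # Both counts are illustrative assumptions, not claims from the thread.
    params = 900e6        # ~0.9B parameters, roughly early Stable Diffusion scale
    images = 2e9          # ~2B training images, roughly LAION-2B scale
    bytes_per_param = 2   # fp16 storage

    budget = params * bytes_per_param / images
    print(f"{budget:.2f} bytes of weights per training image")  # ~0.90

    # A single 512x512 JPEG runs tens of kilobytes, so on average the model
    # cannot be storing anything close to a copy of a typical training image.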
Commercial audio generation models were caught reproducing parts of copyrighted music in a distorted and low-quality form. This is not "learning", just "imitating".
Also, as I understand it, they didn't even buy the CDs with the music used for training; they got it somewhere else. Why do the organizations that prosecute people for downloading a movie not want to look into whether it is OK to build a business on illegal copies of copyrighted works?
I said "some" for a reason.
When you identify where the infringing party has stored the source material in their artifact.{zip,pdf,safetensor,connectome,etc}. In ML, this discovery stage is called "mechanistic interpretability", and in humans it's called "illegal."
It's not that clear cut. Since they're talking about taking lossy compression to the limit, there are ways to go so lossy that you're no longer infringing even if you can point exactly at where it's stored.
Like CliffsNotes.
Wasn’t John Gruber the inventor of Markdown?
It was overzealous prosecution of the breaking into a closet to wire up some Ethernet cables to gain access to the materials,
not the downloading with intent.
And apparently the most controversial take in this community is the observation that many people would have done the trial, plea, and time, regardless of how overzealous the prosecution was.
> breaking into a closet
"The closet's door was kept unlocked, according to press reports"
When's the last time a kid with no record, a research fellow at Harvard, got threatened with 35 years for a simple B&E?
They threaten
It's the plea or sentencing where that stuff gets taken into account for a reduction to community service.
35 years is a press release sentence. The way DOJ calculates sentences when they write press releases ignores the alleged facts of the particular case and just uses for each charge the theoretically maximum possible sentence that someone could get for that charge.
To actually get that maximum typically requires things like the person is a repeat offender, drug dealing was involved, people were physically harmed, it involved organized crime, it involved terrorism, a large amount of money was involved, or other things that make it an unusual big and serious crime.
The DOJ knows exactly what they are alleging the defendant did. They could easily look at the various factors that affect sentencing for the charge, see which apply to that case, and come up with a realistic number, but that doesn't sound as impressive in the press release.
Another thing that inflates the numbers in the press releases is that defendants are often charged with several related charges. For many crimes there are groups of related charges that for sentencing get merged. If you are charged with say 3 charges from the same group and convicted on all you are only sentenced for whichever one of them has the longest sentence.
If you've got 3 charges from such a group in the press release the DOJ might just take the completely bogus maximum for each as described above and just add those 3 together.
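(A toy illustration of the gap, with made-up statutory maxima for three charges in one sentencing group:)

    # Hypothetical maxima (years) for three charges that merge into one
    # group at sentencing; the numbers are invented for illustration.
    charge_maxima = [5, 10, 20]

    press_release_years = sum(charge_maxima)  # 35: maxima naively added up
    realistic_exposure = max(charge_maxima)   # 20: merged group takes the longest
    print(press_release_years, realistic_exposure)

    # Even the 20 overstates it: as described above, guideline sentences for
    # a first-time offender sit far below the statutory maximum.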
Here's a good article on DOJ's ridiculous sentence numbers [1].
Here's a couple of articles from an expert in this area of law that looks specifically at what Swartz was charged with and what kind of sentence he was actually looking at [2][3].
Why do you think Swartz was downloading the articles to learn from them? As far as I've seen, no one knows for sure what he was intending.
If he wanted to learn from JSTOR articles he could have downloaded them using the JSTOR account he had through his research fellowship at Harvard. Why go to MIT and use their public JSTOR WiFi access, and then when that was cut off hide a computer in a wiring closet hooked into their ethernet?
I've seen claims that what he wanted to do was meta-research about scientific publishing as a whole, which could explain why he needed to download more than he could with his normal JSTOR account from Harvard. But again, why do that using MIT's public WiFi access? JSTOR has granted more direct access to large amounts of data for such research. Did he talk to them first to try to get access that way?
[1] https://web.archive.org/web/20230107080107/https://www.popeh...
[2] https://volokh.com/2013/01/14/aaron-swartz-charges/
[3] https://volokh.com/2013/01/16/the-criminal-charges-against-a...
He might have wanted other people to have access to the knowledge, and for free. In comparison, AI companies want to sell access to the knowledge they got by scraping copyrighted works.
Wow, just wow.
Truly wow. The sucking up to corporations is terrifying. This, when Aaron Swartz was institutionally murdered by the state for "copyright infringement". And what he did wasn't even for profit, or even at 0.00001 of the scale of the theft that OpenAI and their ilk have done.
So it's totally OK to rip off and steal and lie through your teeth AND do it all for money, if you're a company. But if you're a human being, doing it not for profit but for the betterment of your own fellow humans, you deserve to be imprisoned, systematically persecuted, and driven to suicide.
Thank you for putting my sentiment into words. THIS. It's not power to the people, it's power to the oligarchs. Once you have enough power and, more importantly, wealth, you're welcomed into the fold with open arms. Just as Spotify built a library of stolen music: as long as wealth was created, there is no problem, because wealth is just money taken from the people and given to the ruling class.
CDs, software, and electronic media, yes. Physical books, no. You can't make archival copies.
Sure you can: you could take a physical book and painstakingly copy it one page at a time; that is totally fair use.
Leaving aside the broader discussion...
You cannot legally photocopy an entire book even if you own a physical copy.
Internet people say you can, but there's no actual legal argument or case law to support that.
I believe the post you are replying to is suggesting the copy is made by hand, one word at a time.
I don't see how that would be different, as the meaningful material is text not images.
At home? Without ever sharing it with anyone? I thought making backups of things that you personally own was protected, at least in the US. Could you elaborate on my apparent misunderstanding?
> Internet people say you can, but there's no actual legal argument or case law to support that.
Quite the opposite. The burden of proof is on you to show a single person ever, in history, who has been prosecuted for that.
If nobody in the world has ever been prosecuted for this, then that means it is either legal, or it is something else that is so effectively equivalent to "legal" that there is little point in using a different word.
If you want to take the position that, "uhhhhhhh, there is exactly 0% chance of anyone ever getting in trouble or being prosecuted for this, but I still don't think its legal, technically!"
Then I guess go ahead. But for those in the real world, those two things are almost equivalent.
Citation needed.
This is a specific exception in Australia Copyright law. It allows reproducing works in books, newspapers and periodical publications in different form for private and domestic use.
(Copyright Act 1968 Part III div. 1, section 43C) https://www.legislation.gov.au/C1968A00063/latest/text
It seems reasonably within the bounds described by fair use, but nobody's ever tested that particular constellation of factors in a lawsuit, so there's no precedent - hand copying a book, that is.
17 U.S.C. § 107 is the fair use carveout.
Interestingly, digitizing and copying a book on your own, for your own private use, has also not been brought to court. Major rights holders seem to not want this particular fair use precedent to be established, which it likely would be, and might then invalidate crucial standing for other cases in which certain interpretations of fair use are preferred.
Digitally copying media you own is fair use. I'll die on that hill. It doesn't grant commercial rights, you can't resell a copy as if it were the original, and so on, and so forth.
There's even a good case to be made for sharing a digitally copied work purchased legally, even to millions of people, 5 years after a book is first sold: for the vast majority of books, after 5 years they've sold about 99.99% of the copies they're ever going to sell.
By sharing after the ~5 year mark, you're arguably doing marketing for the book, and if we cultivated a culture of direct donation to authors and content creators, it invalidates any of the reasons piracy is made illegal in the first place.
Right now publishers, studios, and platforms have a stranglehold on content markets, and the law serves them almost exclusively. It is exceedingly rare for the law to be invoked in defending or supporting an author or artist directly. It's very common for groups of wealthy lawyers LARPing as protectors of authors and artists to exploit the law and steal money from regular people.
Exclusively digital content should have a 3 year protected period, while physical works should get 5, whether it's text, audio, image, or video.
Once something is outside the protected period, it should be considered fair game for sharing until 20 years have passed, at which point it should enter public domain.
Copyright law serves two purposes - protecting and incentivizing content creators, and serving the interests of the public. Situations where a bunch of lawyers get rich by suing the pants off of regular people over technicalities is a despicable outcome.
> there's no precedent - hand copying a book, that is
Thank you! I had looked this up myself last week, so I knew this. I had long believed, as GP does, that copying anything you own without distribution is either allowed or fair use. I wanted GP to learn as I did.
For reference, here's the US legal code in question:
Notwithstanding the provisions of sections 106 and 106A, the fair use of a copyrighted work, including such use by reproduction in copies or phonorecords or by any other means specified by that section, for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research, is not an infringement of copyright. In determining whether the use made of a work in any particular case is a fair use the factors to be considered shall include—
(1) the purpose and character of the use, including whether such use is of a commercial nature or is for nonprofit educational purposes;
(2) the nature of the copyrighted work;
(3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and
(4) the effect of the use upon the potential market for or value of the copyrighted work.
The fact that a work is unpublished shall not itself bar a finding of fair use if such finding is made upon consideration of all the above factors.
The spirit seems apparent, but in practice it's been used by awful people to destroy lives and exploit rent from artists and authors in damn near tyrannical ways.
Except you said "You can't make archival copies." and didn't provide a citation. That's quite a different claim than "there exists no precedent clearly establishing your right or lack thereof to make archival copies".
Congress expressly granted archival rights for digital media. If they wanted to do the same for books they could've done so. There's no law or legal precedent allowing it.
Given all this, "can't do it" is more probably accurate than "can do it". IANAL, but it's not like the question is finely balanced on a knife's edge and could go either way.
Congress didn't explicitly disallow it either. You left that bit out. As such it comes down to interpretation of the existing law. We both clearly agree that doesn't (yet) exist.
> IANAL but it's not like the question is finely balanced on a knife's edge and could go either way.
I agree, but my interpretation is opposite yours. It seems fairly obvious to me that the spirit of the law permits personal copies. That also seems to be in line with (explicitly legislated) digital practices.
But at the end of the day the only clearly correct statement on the matter is "there's no precedent, so we don't know". I suppose it's also generally good advice to avoid the legal quagmire if possible. Being in the right is unlikely to do you any good if it bankrupts you in the process.
> Congress didn't explicitly disallow it either.
That's the whole point of copyright: only the owner of a copyright has the right to make copies. I don't see how it can be more explicit than that. It's a default-deny policy.
There is an archival exception for digital media, so obviously Congress is open to granting exceptions for backup purposes. They chose not to include physical media in this exception.
> only the owner of a copyright has the right to make copies.
You are conveniently omitting the provisions about fair use, which is strange since you're clearly aware of them. The only things copyright is reasonably unambiguous about are sale and distribution. Even then there's lots of grey areas such as performance rights.
You are arguing that something is obviously disallowed but have nothing but your own interpretation to back that up. If the situation was as clear cut as you're trying to make out then where is the precedent showing that personal use archival copies of physical goods are not permitted?
> They chose not to include physical media in this exception.
That's irrelevant to the current discussion, though I'm fairly certain you realize that. Congress declined to weigh in on the matter which (as you clearly know) leaves it up to the courts to interpret the existing law.
I take the contrary view.
What part of fair use pertains to making a physical copy of the complete work?
You can make copies of things. You just can’t distribute them
You're repeating upthread comments. And no, you can't. There's an archival exception for electronic media. If you want to make copies of physical media you either:
1. Can't
Or
2. Rely on fair use to protect you (archival by individuals isn't necessarily fair use)
It absolutely is fair use to copy a book for your personal archives.
The fair use criteria consider whether it is commercial in nature (in this case it is not) and "the effect of the use upon the potential market for or value of the copyrighted work", which for a personal copy of a personally owned book is nonexistent.
https://www.law.cornell.edu/uscode/text/17/107
You would get laughed at by the legal system trying to prosecute an individual owner for copying a book they bought just to keep.
> It absolutely is fair use to copy a book for your personal archives.
There's no legal precedent for this. See https://news.ycombinator.com/item?id=43356042
> the effect of the use upon the potential market for or value of the copyrighted work
A copyright holder's lawyer would argue that having and using a photocopy of a book keeps the original from wearing out. This directly affects the potential market for the work, since the owner could resell the book in mint condition, after reading and burning their photocopies.
> You would get laughed at by the legal system trying to prosecute an individual owner for copying a book they bought just to keep.
I mean maybe this is true. But the affected individual will have a very bad year and spend a ton of money on lawyers.
>No legal precedent
Why do you interpret this to mean "absolutely can't do this"? "No precedent" seems to equally support both sides of the argument (that is, it provides no evidence; courts have not ruled). The other commenters arguments on the actual text of the statute seem more convincing to me than what you have so far provided.
I was responding to https://news.ycombinator.com/item?id=43356240 which said it "absolutely is fair use".
> The other commenters arguments...seem more convincing
Because you (and I) want it to be fair use. But as I already said in my comment, it potentially fails one leg of fair use. Keeping your purchased physical copy of the book pristine and untouched while you read the photocopy allows you to later, after destroying the copies you made, resell the book as new or like-new. This directly affects the market for that book.
Do you want to spend time and money in court to find out if it's really fair use? That's what "no precedent" means.
> Do you want to spend time and money in court to find out if it's really fair use?
No. I'd much rather pirate the epub followed by lobbying for severe IP law reform. (Of course by "lobby" I actually mean complain about IP law online.)
If there's no epub available then I guess it's time to get building. (https://linearbookscanner.org/)
Multiple times in this thread you make the very confident assertion that this is not allowed, and that it is only allowed for electronic media. That is your opinion, which is fine. The argument that it is fair use is also an opinion. Until it becomes settled law with precedent, every argument about it will just opinion on what the text of the law means. But you are denigrating the other opinions while upholding your own as truth.
And whether or not I am personally interested in testing any of these opinions is completely beside the point.
You may copy, but you may not circumvent the copy protection.
Correct. For electronic media.
That's not a one-to-one analogy. The LLM isn't giving you the book; it's giving you information it learned from the book.
The analogous scenario is "Can I read a book and publish a blog post with all the information in that book, in my own words?", and under US copyright law, the answer is: Yes.
> The analogous scenario is "Can I read a book and publish a blog post with all the information in that book, in my own words?"
The analogous scenario is actually "Can I read a book that I obtained illegally and face no consequences for obtaining it illegally?" The answer is "Yes": there are no consequences for reading said book, for individuals or machines.
But individuals can face serious consequences for obtaining it illegally. And corporations are trying to argue those consequences shouldn't apply to them.
> But individuals can face serious consequences for obtaining it illegally.
Can they? Who has ever faced serious consequences for pirating books in the US?
https://en.wikipedia.org/wiki/Aaron_Swartz
(Please no pedantry about how scientific papers aren't books)
Not to diminish the atrocity of what happened to Aaron, but is this a highly abnormal case of prosecutorial overreach, or is it common for people to be charged and held liable for downloading and/or consuming (without distributing) copyrighted materials, in any form, without obtaining a license?
Asking because I genuinely don't know. I believe all I've ever read about prosecution of "commonplace" copyright violations was either about distributors or tied to the bidirectional nature of peer-to-peer exchange (torrents typically upload to others even as you download = redistribution).
Aaron Swartz downloaded a lot of stuff. Did he publish the stuff too? That would be an infringement. But only downloading the stuff? And never distributing it? Not sure it even counts as a violation.
>Aaron Swartz downloaded a lot of stuff.
A tiny fraction compared to the 80+ terabytes Facebook downloaded.
>Did he publish the stuff too?
No.
> Not sure it even counts as a violation.
Exactly.
The better analogy is "can my business use illegally downloaded works to save on buying a license?" For example, can you use a pirated copy of Windows in your company? Can you use a pirated copy of a book to compute the weights of a mathematical model?
There's no analogue here, because the scale of it takes it to a whole different level and degree, and for all intents and purposes we tend to care about level and degree.
Me taking over control of the lemonade market in my neighbourhood wouldn't ever be a problem to anyone, at most a very minor annoyance; if instead I managed to corner the lemonade market of a whole continent, it'd be a very different thing.
Is the book online and accessible to your eyeballs through your open standards client tool, such that you can learn from seeing it?
Let's say Windows is downloadable from Microsoft website. Can you use it for free in your company to save on buying a license? Is it ok to use illegal copies of works in a business?
Most books aren't. Unless you pay for them.
To the extent that this is how libraries function, yes.
The part of that which doesn't apply is "print copies", at least not complete copies, but libraries often have photocopiers in them for fragments needed for research.
AI models shouldn't do that either, IMO. But unlimited complete copies is the mistake the Internet Archive made, too.
I missed the part where OpenAI got library cards for all the libraries in the world.
Is having a library card a requirement for being hired over there?
I missed the part where we throw away rational logic skills
Have you never been to a public library and read a book while sitting there without checking it out? Clearly, age is a factor here, and us olds are confused by this lack of understanding of how libraries function. I did my entire term paper without ever checking out books from the library. I just showed up with my stack of blank index cards, then left with the necessary info written on them. I did an entire project on tracking stocks by visiting the library and viewing all of the papers for the days in one sitting rather than being a schmuck and tracking it daily. Took me about an hour in one day. No library card required.
Also, a library card is ridiculously cheap even if you did decide to have one.
> Have you never been to a public library and read a book while sitting there without checking it out?
See my comment here: https://news.ycombinator.com/item?id=43355723. If OpenAI built a robot that physically went into libraries, pulled books off shelves by itself, and read them...that's so cool I wouldn't even be mad.
What about checking out eBooks? If you had an app that checked those out and scanned them at robot speed vs. human speed, that would be the same thing. The idea that reading something that does not belong to you directly means stealing is just weird and very strained.
theGoogs essentially did that by having a robot that turned each page and scanned it. That's no different than having the librarian pull material for you so that you don't have to pull the book from the shelf yourself.
There are better arguments to make on why ClosedAI is bad. Reading text it doesn't own isn't one of them. How they acquired the text would be a better thing to critique, and there are laws for that in place already; no new laws need to be enacted.
> If you had an app that checked those out and scanned it
You mean...made a copy? Do you really not see the problem?
> How they acquired the text would be a better thing to critique
Well...yeah that's what I said in the comment that started this discussion branch: https://news.ycombinator.com/item?id=43355147
This isn't about humans or robots reading books. It's that robots are allowed to violate copyright law to read the books, and us humans are not.
> You mean...made a copy? Do you really not see the problem?
In precisely the same way as a robot scanning a physical book is.
If this is turned into a PDF and distributed, it's exactly the legal problem Google had[0] and that Facebook is currently fighting due to torrenting some of their training material[1].
[0] https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,...
[1] https://news.ycombinator.com/item?id=43125840
If the tokens go directly into training an AI and no copies are retained, that's like how you as a human learn — except current AI models are not even remotely as able to absorb that information as you, and they only make up for being as thick as a plank by being stupid very very quickly.
> It's that robots are allowed to violate copyright law to read the books, and us humans are not.
More that the copyright laws are not suited to what's going on. Under the copyright laws, statute and case law, that existed at the time GPT-3.5 was created, bots were understood as the kind of thing Google had and used to make web indexes — essentially legal, with some caveats about quoting too much verbatim from news articles.
(Google PageRank being a big pile of linear algebra and all, and the Transformer architecture from which ChatGPT gets the "T" being originally a Google effort to improve Google Translate).
Society is currently arguing amongst itself if this is still OK when the bot is a conversational entity, or perhaps even something that can be given agency.
You get to set those rules via your government representative: make it illegal for AI crawlers to read the internet like that. But it's hard to change the laws if you mistake what you want the law to be for what the law currently is.
but you keep saying to read the books. there is no copyright violation in reading a book. making copies starts to get into murky ground, but does not immediately mean breaking the law.
You might be thinking of someone else.
If I spent every last second of my life in a public library, I couldn't even view a fraction of the information that OpenAI has ingested. The comparison is irrelevant. To make the comparison somehow valid, I'd have to back up my truck to a public library, steal the entire contents, then start selling copies out of my garage
Look, even I'm not a fan of ClosedAI, but this is ridiculous. ClosedAI isn't giving copies of anything. It is giving you a response it infers based on things it has "read" and/or "learned" by reading content. Does ClosedAI store a copy of the content it scrapes, or does it immediately start tokenizing it or whatever is involved in training? If they store it, that's a lot of data, and we should be able to prove that sites were scraped through the lawsuit discovery process. Are you then also suggesting that ClosedAI will sell you copies of that raw data if you prompted correctly?
I'm in no way justifying anything about GPT/LLM training. I'm just calling out that these comparisons are extremely strained.
Let's say OpenAI developers use an illegal copy of Windows on their laptops to save on buying a license. Is it ok to run a business this way?
Also, I think it is a different thing when someone uses copyrighted works for research and publishing a paper versus when someone uses copyrighted works to earn money.
I don't need a card to read in the library, nor to use the photocopiers there, but it's merely one example anyway. (If it wasn't, you'd only need one library, any of the deposit libraries will do: https://en.wikipedia.org/wiki/Legal_deposit).
You also don't need permission, as a human, to read (and learn from) the internet in general. Machines by standard practice require such permission, hence robots.txt, and OpenAI's GPTBot complies with the robots.txt file and the company gives advice to web operators about how to disallow their bot.
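As a minimal sketch of how that opt-out works (placeholder domain, Python stdlib only; "GPTBot" is the user-agent string OpenAI documents):

    # Check whether a site's robots.txt permits the "GPTBot" user agent.
    # A site operator who wants to opt out entirely would publish:
    #
    #   User-agent: GPTBot
    #   Disallow: /
    #
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser("https://example.com/robots.txt")
    rp.read()  # fetch and parse the file
    print(rp.can_fetch("GPTBot", "https://example.com/some-article"))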
How AI should be treated, more like a search index, or more like a mind that can learn by reading? Not my call. It's a new thing, and laws can be driven by economics or by moral outrage, and in this case those two driving forces are at odds.
We started with libraries and books, now you're moving the goalposts to websites.
Sidenote: I wouldn't even be mad if OpenAI built robots to go into all of the libraries and read all of the books. That would be amazing!
I started with libraries. OpenAI started with the internet.
The argument for both is identical, your objection is specific to libraries.
IIRC, Google already did your sidenote. Or started to, may have had legal issues.
> The argument for both is identical
How so? I don't have to pay to read most websites. To read most books I have to pay (or a library has to pay and I have to wait to get the book).
> IIRC, Google already did your sidenote
Not quite. They had to chop the spines off books and have humans feed them into scanners. I'm talking about a robot that can walk (or roll) into a library, use arms to take books off the shelves, turn the pages and read them without putting them into a scanner.
They had humans turn the pages of intact books in scanning machines. The books mostly came from the shelves of academic libraries and were returned to the shelves after scanning. You can see some incidental captures of hands/fingers in the scans on Google Books or HathiTrust (the academic home of the Google Books scans). There are some examples collected here:
https://theartofgooglebooks.tumblr.com/
> How so? I don't have to pay to read most websites. To read most books I have to pay (or a library has to pay and I have to wait to get the book).
"or" does a lot of work, even ignoring that I'd already linked you to a page about deposit libraries: https://en.wikipedia.org/wiki/Legal_deposit
Fact is, you can read books for free, just as you can read (many but not all) websites for free. And in both cases you're allowed to use what you learned without paying ongoing licensing fees for having learned anything from either, and even to make money from what you learn.
> Not quite. They had to chop the spines off books and have humans feed them into scanners.
Your statement is over 20 years out of date: https://patents.google.com/patent/US7508978B1/en
owning a copy and learning the information are not the same. you can learn 2+2=4 from a book, but you no longer need that book to get that answer. each year in school, I was issued a book for class, learned from it, returned the book. I did not return the learning.
musicians can read the sheet music and memorize how to play it, and no longer need the music. they still have the information.
But you still need to buy the sheet music first, all the AI Labs used pirated materials to learn from.
There are two angles to the lawsuits that are getting confused. The largest one, from the book publishers (Sarah Silverman et al.), attacked from the angle that the models could reproduce copyrighted information. This was pretty easily quelled / RLHF'd out (it used to be that if ChatGPT started producing lyrics, a supervisor/censor would just cut off its response early; I tried it now and ChatGPT.com is more eloquent: "Sorry, I can't provide the full lyrics to "Strawberry Fields Forever" as they are copyrighted. However, I can summarize the song or discuss its themes, meaning, and history if you're interested!")
But there's also the angle of "why does OpenAI have Sarah Silverman's book on their hard drive if they never paid her for it?" This is the lawsuit against Meta regarding books3 and torrenting; it seems like they're getting away with "we never redistributed/seeded!", but it's unclear to me why that is a defense against copyright infringement.
Not only would the musician have to buy the sheet music first, but if they were going to perform that piece for profit at an event or on an album they'd need a license of some sort.
This whole mess seems to be another case of "if I can dance around the law fast enough, big enough, and with enough grey areas then I can get away with it".
I was handed sheet music every year in band, and within a few weeks had it memorized. Books with music are also available in the library.
As a student in a school band that debated whether to choose Pirates of the Caribbean vs Phantom of the Opera for our half time show, I remember the cost of the rights to the music was a factor in our decision.
The school and library purchased the materials outright. Again, OpenAI, Meta, et al. never paid to read them, nor borrowed them from an institution that had any right to share them.
I'm a bit of an anti-intellectual-property anarchist myself, but it grinds my gears that, given that we do live under the law, it is applied unequally.
>Can I download a book without paying for it
if you have evidence that openAI is doing this with books that are not freely available, i'm sure the publishers would absolutely love to hear about it.
We know Meta has done it. These companies have torrented or downloaded books that they did not pay for. Things like The Pile, LibGen, and Anna's Archive were scraped to build these models.
>if you have evidence that openAI is doing this with books that are not freely available, i'm sure the publishers would absolutely love to hear about it.
Lol, so why are OpenAI challenging these laws?
Do you think OpenAI used fewer sources than Meta?
To support your point, lawsuits are already coming in for illegal copying of books:
https://www.theverge.com/2024/8/20/24224450/anthropic-copyri...
https://www.reuters.com/legal/litigation/google-sued-by-top-...
> Can I download a book without paying for it, and print copies of it?
No, but you can read a book, learn its contents, and then write and publish your own book to teach the information to others. The operation of an AI is rather closer to that than it is to copyright violation.
"Should" there be protections against AI training? Maybe! But copyright law as it stands is woefully inadequate to the task, and IMHO a lot of people aren't really treating with this. We need a functioning government to write well-considered laws for the benefit of all here. We'll see what we get.
But I can't legally obtain the book to read and learn from without me (or a library) paying for it. Let's start there first.
Yes, but the learning isn't constrained by those laws. If I steal a book and read it, I'm guilty of the crime of theft. You can put me in jail, try me before a jury, fine me, and put me in prison according to whatever laws I broke.
Nothing in my sentence constrains my ability to teach someone else the stuff I learned, though! In fact, the first amendment makes it pretty damn clear that nothing can constrain that freedom.
Also, note that the example is malformed: in almost all these cases, Meta et al. aren't "stealing" anything anyway. They're downloading and reading stuff on the internet that is available for free. If you or I can't be prosecuted for reading a preprint from arXiv.org or whatever, it's a very hard case to make that an AI can be.
Again, copyright isn't the tool here. We need better laws.
Sure, but OpenAI (same as Google, and Facebook, and all the others) is illegally copying the book, and they want this to be legal for them.
It's perhaps arguable whether it's OK for an LLM to be trained on freely available but licensed works, such as the Linux source code. There you can get in arguments about learning vs machine processing, and whether the LLM is a derived work etc
But it's not arguable that copying a book that you have not even bought to store in your corporate data lake to later use for training is a blatant violation of basic copyright. It's exactly like borrowing a book from a library, photocopying it, and then putting it in your employee-only corporate library.
> copyright isn't the tool here
It's not the only tool. I agree that "use for ML" should be an additional right.
What people are pissed about is that copyright only ever serves to constrain the little guys.
> If I steal a book and read it, I'm guilty of the crime of theft
You or I would never dare to do this in the first place.
One thing is downloading a pirated copy and reading it for yourself; another thing is running a business based on downloading millions of pirated works.
> Meta et. al. aren't "stealing" anything anyway
They were caught downloading the entirety of libgen.
If you buy it
No, even if I steal it. I can teach you anything I know. Congress shall make no law abridging the freedom of speech, as it were.
Yes, but this is not the right model. What OpenAI wants is to borrow a book, make a copy of it, and keep using that copy, in training their models. This is fully and simply illegal, under any basic copyright law.
> Can I download a book without paying for it
Yes, you can read books without paying, if that's how it is offered.
And you can photocopy books you own for your own personal use. But again...the analogy is remembering/learning from a book.
when it comes to real people, they get sued into oblivion for downloading copyrighted content, even for the purpose of learning. but when facebook & openai do it, at a much larger scale, suddenly the laws must be changed.
Case in point - https://en.wikipedia.org/wiki/Aaron_Swartz
Swartz wasn’t “downloading copyrighted content…for the purpose of learning,” he was downloading with the intent to distribute. That doesn’t justify how he was treated. But it’s not analogous to the limited argument for LLMs that don’t regurgitate the copyrighted content.
It does apply to people? When you read a copy of a book, you can't be sued for making a copy of the book in the synapses of your brain.
Now, if you have eidetic memory and write out large chunks of the book from memory and publish them, that's what you could be sued for.
This is not about memory or training. The LLM training process is not being run on books streamed directly off the internet or from real-time footage of a book.
What these companies are doing is:
1. Obtain a free copy of a work in some way.
2. Store this copy in a format that's amenable to training.
3. Train their models on the stored copy, months or years after step 1 happened.
The illegal part happens in steps 1 and/or 2. Step 3 is perhaps debatable - maybe it's fair to argue that the model is learning in the same sense as a human reading a book, so the model is perhaps not illegally created.
But the training set that the company is storing is full of illegally obtained or at least illegally copied works.
What they're doing before the training step is exactly like building a library by going with a portable copier into bookshops and creating copies of every book in that bookshop.
But making copies for yourself, without distributing them, is different than making copies for others. Google is downloading copyrighted content from everywhere online, but they don't redistribute their scraped content.
Even web browsing implies making copies of copyrighted pages; we can't tell the copyright status of a page without loading it, at which point a copy has been made in memory.
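To make the point concrete, here's a minimal sketch (placeholder URL): merely looking at a page already materializes a copy of it.

    # Fetching a page copies the whole work into local memory before
    # its copyright status can even be inspected.
    import urllib.request

    with urllib.request.urlopen("https://example.com/") as resp:
        page = resp.read()  # a full copy of the page now exists in RAM
    print(len(page), "bytes copied just to look at the page")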
Making copies of an original you don't own/didn't obtain legally is not fair use. Also, this type of personal copying doesn't apply to corporations making copies to be distributed among their employees (it might apply to a company making a copy for archival, though).
> But making copies for yourself, without distributing them,
If this was legal, nobody would be paying for software.
> When you read a copy of a book
They're not talking about reading a book FFS. You absolutely can be sued for illegally obtaining a copy of the book.
> when it comes to real people, they get sued into oblivion for downloading copyrighted content, even for the purpose of learning.
Really? Or do they get sued for sharing, as in republishing without transformation? Arguably, a URL providing copyrighted content is you offering a xerox machine.
It seems most "sued into oblivion" cases are about the resharing problem, not the get-one-for-myself problem.
This is why I think my array of hard drives full of movies isn't piracy. My server just learned about those movies and can tell me about them, is all. Just like a person!
These AI models are just obviously new things. They aren’t people, so any analogy about learning from the training material and selling your new skills is off base.
On the other hand, they aren’t just a copy of the training content, and whether the process that creates the weights is sufficiently transformative as to create a new work is… what’s up for debate, right?
Anyway I wish people would stop making these analogies. There isn’t a law covering AI models yet. It is a big industry at this point, and the lack of clarity seems like something we’d expect everybody (legislators and industry) to want to rectify.
A model cannot "learn" because it is not a human. What happens is a human obtains "a free copy" of a copyrighted work, processes it using a machine and sells the result.
> Model cannot "learn" because it is not a human.
Sure, that's why I don't like the analogy.
> What happens is a human obtains "a free copy" of a copyrighted work, processes it using a machine and sells the result.
Right, so for example it is pretty common to snip up small bits of songs and use them in other songs (sampling). Maybe that could be an example of somewhere to start? But these ML models seem quite different, I guess because the "samples" are much smaller and usually not individually identifiable. And really the model encodes information about trends in the sources… I dunno. I still think we need a new law.
Totally agree. Except the current administration probably will interpret things the way they see fit ...
> just like a model could
Not really. You can't multiply yourself a million times to produce content at an industrial scale.
Can I pirate books to train myself?
And when I "learn" a verbatim copy of pages of that book, then write those pages out in Microsoft Word and sell them, is that legal?
> just like a model could
It is not remotely the same, the companies training the models are stealing the content from the internet and then profiting from it when they charge for the use of those models.
> the companies training the models are stealing the content from the internet
Are you stealing a billboard when you see and remember it?
The notion that consuming the web is "stealing" needs to stop.
Are you stealing when using a pirated software to run a billion-dollar business?
The question is whether it destroys the incentive to produce the work. That is the entire point of copyright and patent law.
LLMs do indeed significantly reduce the incentive to produce original work.
We are not taking about billboards here, we are talking about copyrighted works, like books. If you want to do mental gymnastics and call "consuming" the web the act of downloading books without paying for them, then go ahead, but don't pretend the rest will buy your delusion.
On the contrary, even telling people which billboards are posted about what, and how to get to them to look at them, is "how it works".
But the courts will get to clarify (in today's news):
https://www.reuters.com/legal/news-corp-sued-by-brave-softwa...
The more literature I consume, and the more I re-draft my own attempt, the more I see the patterns and tropes with everyone standing on the shoulders of those who came before.
The general concept of "warp drive" was introduced by John W. Campbell in "Islands of Space" (serialized in 1931; book edition 1957). Popularised by Trek, turned into maths by Alcubierre. Islands of Space feels like it took inspiration from both H G Wells (needing to explain why the War of the Worlds' ending was implausible) and Jules Verne (gang of gentlemen have call-to-action, encounter difficulties that would crush them like a bug and are not merely fine, they go on to further great adventure and reward).
Terry Pratchett had obvious inspirations from Shakespeare, Ringworld, Faust (in the title!).
In the pandemic I read "The Deathworlders" (web fic, not the book series of similar name), and by the time I'd read too many shark jumps to continue, I had spotted many obvious inspirations besides just the one that gave the name.
If I studied medieval lit, I could probably do the same with Shakespeare's inspiration.
It doesn't: a real person can't legally obtain a copy of a copyrighted work without paying the copyright holder for it. This is what OpenAI is asking for: they don't want to pay for a single copy of a single book, and still they want to train their models on every single book in history (and song, and movie, and painting, and code base, and anything else they can get their hands on).
Do you know Numerical Recipes in C?
This discussion reminds me of it.
If models can learn for free, then the models (training code, inference code, training data, weights) should also be free. No copyright for anybody.
And if you sell the outputs of your model that you trained on free content, you shouldn't be able to hide behind trade secret.
>you can use that knowledge,
Did OpenAI buy one copy of each book, or did they legally borrow the books and documents?
If you copy-paste from books and claim it's your content, you are plagiarizing. LLMs were proven to copy-paste trained content, so now what? Should only Big Tech be exempt from plagiarizing?
I would assume that the request is for it to apply to models in the way that it currently applies to humans.
If a human buys a movie, he can watch it and learn about its contents, and then talk about those contents, and he can create a similar movie with a similar theme.
If OpenAI buys a movie and shows it to their model, it's unclear whether the model can talk about the contents of the movie and create a similar movie with a similar theme.
Is OpenAI buying the movie, or just taking it?
Since "buying" a movie (as it currently applies to humans) is just buying a limited license to it for private viewing, can't the copyright holder opt to limit the $4.99 license terms to human viewing, and charge $4999 for an AI training license?
Or OpenAI could buy movies the way Disney does, by buying the actual copyright to the film.
> Since "buying" a movie is just buying a license to it, can't the copyright holder opt to limit the $4.99 license terms to human viewing, and charge $4999 for an AI training license?
That's exactly what already happens currently. Buying a movie on DVD doesn't give you the right to present it for hundreds of people. You need to pay for a public performance license or commercial licence. This is why a TV network or movie theatre can't just buy a DVD at Walmart and then show the movie as often as it likes.
Copyright doesn't just grant exclusive distribution rights. It grants exclusive use rights as well, and permits the owner to control how their work is used. Since AI rights are not granted by any existing licenses, and license terms generally reserve any rights not explicitly specified, feeding copyrighted works into an AI data model is a reserved right of the owner.
>Since "buying" a movie (as it currently applies to humans) is just buying a limited license to it for private viewing, can't the copyright holder opt to limit the $4.99 license terms to human viewing, and charge $4999 for an AI training license?
the Reddit data licensing model
somehow, I suspect openai didn't "buy" all of the articles, books, websites they crawled and torrented.
OpenAI didn't pay for most of the content it used.
This is basically "allow us to steal others' IP". It's hard not to treat Altman like a common thief.
Even more so: it only applies to initial model training by companies like OpenAI, not to other companies using those models to generate synthetic data to train their own models.
Not only that
The model gets to use training data of all humans.
But if you use the model as training data OAI will say you’re infringing T&Cs
Yeah, it's crazy. I also suspect they might not be confident in their defense in the NYT lawsuit; if they're found at fault, it's going to be trouble.
It is hard to see how a court could decide that copyright does not apply to training LLMs without completely collapsing the entire legal structure for intellectual property.
Conceptually, AI basically zeros out existing IP, and makes the AI the only IP that has any value. It is hard to imagine large rights holders and courts accepting that.
The likely outcome is that courts rule against LLM creators/providers and they eventually have to settle on licensing fees with large corporate copyright holders similar to YouTube. Unlike YouTube though, this would open up LLM companies to class action lawsuits from the general public, and so it could be a much worse outcome for them.
Are there certain books that federal law prevents you from reading? Which ones?
Maybe terrorist manuals and some child pornography, but what else?
They meant "freedom to learn [through backpropagation]" probably.
Companies like this were allowed to siphon the free work of billions of people over centuries and they still want more.
It coincides with this: OpenAI calls DeepSeek ‘state-controlled,’ calls for bans on ‘PRC-produced’ models
https://techcrunch.com/2025/03/13/openai-calls-deepseek-stat...
On HN: https://news.ycombinator.com/item?id=43355779
Really funny to see Sam whining about Elon "playing unfair" while attempting to do this with DeepSeek.
I'm surprised to see only one comment here addressing the issue of Chinese AI companies just flatly ignoring US copyright and IP laws/norms. I wonder if there is a viable path where we can facilitate some sort of economic remuneration for people who write and create visual art while not giving up the game to Chinese companies.
This seems to be a thorny dilemma.
As a digital artist myself, it is quite simple. You have to sell physical objects.
The art has to be printed out and that is the art. Anyone can get an image of Salvator Mundi for free too. That is not the art, that is an image. The art is the physical object that is the painting Salvator Mundi.
It is no different than traditional art really, just at a different scale. You can buy really nice Picasso knock offs on ebay right now. Picasso himself could have made 10 copies of the Weeping Woman to sell without that much effort either. The "real" Weeping Woman is the physical painting that Picasso did not make a copy of. The others are just knock off images.
But the main problem remains. Selling art is really hard. AI art is already completely passé anyway. If anything the technology is regressing visually.
Music was in a several decades long bull market in physical media sales that crashed and burned. Now we have gone back to the pre-music media bubble days but with much better distribution and marketing channels.
Not a lot of people making a living playing ragtime piano or hoofers making a living tap dancing either.
The really amusing thing to me is you never hear sculpture artists complain that they are in the training data sets. Probably because they know it is literally just free advertising for their real art.
I'm with you 100%. A lot of people who wrote books didn't realize they were selling decorated paper, or who recorded music didn't realize they were selling wax discs and magnetic tape. With digital publishing, they were actually obsoleted.
Like you, I don't think there's good news there, though. As an e.g. writer, you have to convert to selling ideas. The way you sell an idea is that you give it away, and if hearing it makes people like you they will give you arbitrary support. For a writer at least what that means is that only original, interesting work that stands out will be valuable, and it will not be valuable to the extent that it is good, but to the extent that it appeals to an audience. You might as well be a tap dancer.
And if you aren't original, you'll never stand out amongst the AI slop, which will get better and better (and nicer and more pleasant to read and more useful and all that good shit that technology does.) I don't know if that's a bad thing. We have gone from an excess of uninteresting expression in the world to an overwhelming amount of "me too" and worthless repetition filling every crevice. I've probably published 3K words on the internet today. The number before the internet would be zero; but even back then the bookstores were filled with crap.
The market for crap has been taken by AI. And as it gets better, as the crap sea level rises, it will eventually be over most content creators' heads.
The only future for an expression market is parasocial. You're going to have to make people like you, and take care of you because they think of you as family. It's no wonder that entertainment is merging into politics.
I'm pretty sure you can't, despite what IP holders would like you to believe. Like the last 50 years of piracy have taught us, it's effectively impossible (and probably immoral) to try to charge for copying something that's "free" to copy.
It might make more sense to update copyright laws to match reality. For a music artist, for example, pennies from Spotify mean nothing -- the majority of their revenue comes from concerts/events, merchandise, and commercial licensing of their work.
> Chinese AI companies just flatly ignoring US copyright
It is increasingly tiresome to see this clearly racist bias at work when every US company doing AI has been acting the same way.
https://www.tomshardware.com/tech-industry/artificial-intell...
https://piracymonitor.org/chat-gpt-trained-using-pirated-e-b...
Why is it racist? Nationalist? Sure. But not racist.
Have you got any substance to that? So far the only copyright violation I've seen in the LLM world is Meta. (I'm not pretending they are alone though, and yes I expect Chinese companies to do that as well)
Welcome to the internet; where the only way to prevent it (considering 40% of internet traffic is automated) is to use DRM, with accessibility tools provided by client-side AI; or to create national internets with strong firewalls only allowing access to countries we have treaties with. That’s the future at this rate, and it sucks. (The status quo also sucks.)
The demand here for federal preemption of state law has nothing to do with copyright. Copyright is entirely federal level today. It has to do with preventing the use of AI to enable various forms of oppression.[1] Plus the usual child porno stuff.
What AI companies are really worried about is a right of appeal from decisions made by a computer. The EU has that. "Individuals should not be subject to a decision that is based solely on automated processing (such as algorithms) and that is legally binding or which significantly affects them."[2] This moves the cost of LLM errors from the customer to the company offering the service.
[1] https://calmatters.org/economy/technology/2024/09/california...
[2] https://commission.europa.eu/law/law-topic/data-protection/r...
> This moves the cost of LLM errors from the customer to the company offering the service.
So does that mean AI companies are going to have insurance/litigators like doctors, and models will be heavily lawyered to add more extensive guardrails? I'm assuming this means not just OpenAI but any service that uses LLM APIs or open models?
For example: if a finance business pays to use an AI bot that automates interacting with desktop UIs, and that bot accidentally deletes an important column in an Excel spreadsheet, then the AI company is liable?
No, the exact opposite. This says that if the AI that a bank is paying for locks your bank account in error because your name sounds <ethnicity with a lot of locked bank accounts>, it's the bank's problem to fix, not yours to just live with (entirely; you still likely have a problem).
Conversely, would you suggest that if an AI driver has a programming error and kills 20 people, that the person who reserved the car should be required to enter into a “User Agreement” that makes them take responsibility?
If it's a "self driving car" that the person "owns" - Yes. If it's a "Taxi service" that the person is using - No.
If it's a car they own, they (should) have the ability to override the AI system and avoid the accident (ignoring nuances) - therefore owning responsibility.
If it's a Taxi they would be in a position where they can't interfere with the operation of the system - therefore the taxi company owns the responsibility.
Rightly or wrongly, this model of intervention capability is what I'd use to answer these types of questions.
If they want to avoid paying for the creative effort of authors and other artists then they should also not charge for the use of their models.
No, they should pay. The solution is not to make everything free, but to pay the market rate. Somebody made these things; pay them.
... without restricting other people.
This whole mess is because society decided that restricting everyone's rights to share and access information was a sane tradeoff to make for making sure people got paid. No it is not and, so long as humans are physical, it will never be. It appears that humanity will have to get this simple fact hammered into them with every new leap in technology.
Find another work-rewarding scheme. Ensure you get paid before you release information (e.g. crowd funding or contracts with escrows). Forget about nonsensical concepts relating to "intellectual" property (information is not property). Forget recurring revenue from licensing information. You only get paid once when you do work. You are not entitled to anything more. If reality makes living off your work unworkable, do something else.
I'm glad other countries are starting to wake up and ignore this nonsense. Stop trying to make something as unnatural and immoral as this work.
They should train one model on a clean dataset and another on the copyrighted dataset, charge extra for the copyrighted model, and pay a royalty to copyright owners when their works are cited in a response.
The problem there is how are we defining "works are cited"? Also couldn't you just do the same thing done to spotify and make bot farms to generate millions of citations?
You can simply pay to everyone whose works you have used for training, every time a model processes a request.
I like this and agree. It should be opt-in. I almost feel as if it should be something exciting and rewarding.
But who should pay? The model developers? Training models is a cost center. And what about open source AI, should we legislate it out of existence?
How about the AI providers? They operate on thin margins and make just cents per million tokens. If one provider is too expensive, users quickly switch.
Maybe the users? Users derive the lion's share of benefits from AI. But those benefits are hard to quantify.
Maybe a blanket tax? That would simplify things, but would judge all creatives on a quantitative rather than qualitative basis.
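Mechanically, the per-request royalty idea floated above is trivial to sketch; everything hard hides in the attribution step, which this hypothetical toy version simply takes as an input:

    # Toy sketch of a per-response royalty split. The hard problem
    # (deciding which works actually contributed to a response) is
    # assumed away via the attributed_owners argument.
    def split_royalties(royalty_pool_cents, attributed_owners):
        if not attributed_owners:
            return {}
        share = royalty_pool_cents / len(attributed_owners)
        return {owner: share for owner in attributed_owners}

    print(split_royalties(100, ["Author A", "Publisher B"]))
    # {'Author A': 50.0, 'Publisher B': 50.0}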
I think generative AI is the worst copyright infringement tool ever devised. It's slow, expensive and imprecise. On the other hand, copying is fast, free and perfect. I think nobody can, even "for science," regurgitate a full book with AI; it won't have fidelity to the original.
The real enemy of any artist is the long tail of works, sometimes spanning decades, that they have to compete against. So it's other authors. That is why we are in an attention economy, and have seen the internet enshittified.
The most creative part of the internet ignores copyright royalties. From open source, to Wikipedia, to open scientific publication, and even social networks: if everyone demanded royalties, none of them would be possible.
> The most creative part of the internet ignores copyright royalties. From open source, to Wikipedia, to open scientific publication, and even social networks: if everyone demanded royalties, none of them would be possible.
Notably, in all of these cases the people involved consent to participating.
>> The real enemy of any artist is the long tail of works, sometimes spanning decades, that they have to compete against.
Had to check this wasn’t sama.
You seriously believe the real enemy of artists is other artists? Not the guys making billions and trying to convince us “the computers are just reading it like a human”?
Funnily, OpenAI also calls for the ban of their free and open-weight Chinese competitors DeepSeek and Qwen.
i really don't understand this argument. at which point is it violating copyright versus an intelligence learning and making content the same way as humans?
if it was living cells, but they worked as transistors, would it be ok?
if it was whole-brain emulation on silicon transistors, would it be ok?
if it was a generative AI similar to what we have today, but 100x more sentient and self-aware, is that ok?
if you locked a human in a room with nothing but tolkien books for 20 years, then asked them to write a fantasy novel, is that ok?
All art is built on learning from previous art. I don't understand the logic of it being a computer so suddenly now it's wrong and bad. I also don't understand general support of intellectual property when it overwhelmingly benefits the mega wealthy and stifles creative endeavors like nothing else. Your art isn't less valuable just because a computer makes something similar, in the same way it's not less valuable if another human copies your style and makes new art in your style.
> I don't understand the logic of it being a computer so suddenly now it's wrong and bad
My answer to this is one I've written already before: https://news.ycombinator.com/item?id=42720749
You "really don't understand" the difference? Do we need to spell out that these systems aren't human artists simply looking at paintings and admiring features about them? They are Python programs running linear algebra libraries, sucking in pixels from anywhere they can find them, and then being used by corporations with billion dollar valuations to increase investor/shareholder value at the expense of the people who provided the artwork to train the systems - people who, as you already know, are NOT paid for providing their work, and who never CONSENTED to having their work used for such a purpose. Now do you "understand the difference"?
AI is a new thing. It's OK to say you don't want it, that it's a threat to livelihoods. But it's a mistake to use these kinds of arguments, that are predicated on such narrow points that overlap so much with human brains.
It's going to be a threat to my career, soon enough — but the threat it poses to me exists even if it never read any of my blog posts or my github repos. Even if it had never read a single line of ObjC or Swift.
> Do we need to spell out that these systems aren't human artists simply looking at paintings and admiring features about them?
In a word, yes.
In more words: explain what it would take for an AI to count as a person — none of what you wrote connects with what was in the comment you replied to.
You dismiss AI as "python": would it help if the maths was done as the pure linear amplification range of the quantum effects in transistors?; you dismiss them as "sucking in pixels from anywhere they can find them" like humans don't spend all day with their eyes open; you complain "corporations with billion dollar valuations to increase investor/shareholder value at the expense of the people who provided the artwork to train the systems" like this isn't exactly what happens with government funded education of humans.
I anticipate that within my lifetime it will be possible for a human brain to be preserved on death, scanned, and the result used as a full brain sim that remembers what the human remembered at the time of death. Would it matter if the original human had memorised Harry Potter end-to-end and the upload could quote it all perfectly? Would Rowling get the right to delete that brain upload?
I'm following a YouTube channel where they're growing mouse neurons on electrode grids to train them to play video games. It's entirely plausible, given the current rate of progress, that 15 years from now, GPT-4 could be encoded onto a brain organoid the size of a living mouse's brain — does it magically become OK then? And in 30 years, that same thing as an implant into a human?
The threat to my economic prospects is already present in completely free models whose weights are given away and cannot avail the billion-dollar corporations who made them. I can download free models and run them on my laptop, outputting tokens faster than I can read them for an energy budget lower than my own brain, corporations who made those models don't profit directly by me doing this, and if those corporations go bankrupt I can still run those models.
The risk to my economic value is not because any of these "stole" anything, but because the models are useful and cheap.
GenAI art (and voice) is… well, despite the fact I will admit to enjoying it privately/on free content, whenever I see it on products or blog posts, or when I hear it in the voices on YouTube videos, it's a sign the human behind it has zero budget and therefore whatever it is I don't want to buy it. People already use it because it's cheap, it's a sign of being cheap, signs of cheap are a proxy of generally poor quality.
But that's not going to save my career, nobody's going to decide to boycott all iPhone apps that aren't certified "made by 100% organic grass-fed natural humans with no AI assistance".
So believe me, I get that it's scary. But the arguments you're using aren't good ones.
No one said they "don't want it".
No one said "it's scary".
No one is "dismissing them".
It seems like you're arguing against some other person you've made up in your mind. I use these systems every single day, but if you don't understand the argument about consent and the extremely obvious difference between Python programs and humans that I already pointed out, then no one can help you. I'll keep making these arguments, because they are good ones, and they are obvious to any human being who isn't stuck in tech-bro fairy land blabbering about how human consciousness is completely identical to Python linear algebra libraries when any 6 year old child knows with certainty they are not.
> In a word, yes.
This is, frankly, embarrassing.
> No one said they "don't want it".
Your own words suggest this. Many others are more explicit. There are calls for models to be forcibly deleted. Your own statements here about lack of consent are still in this vein.
> No one said "it's scary".
Many, including me, find it so.
> No one is "dismissing them".
You, specifically you, are — "feeling or showing that something is unworthy of consideration".
> if you don't understand the argument about consent and the extremely obvious difference between Python programs and humans that I already pointed out, then no one can help you.
Consent is absolutely an argument I get. It's specifically where I'm agreeing with you.
The other half of that…
Python, like all programming languages, is universal. Python programs can implement physics, so trying to use the argument "because it's implemented on silicon rather than chemistry" is a distinction without a difference.
Quantum mechanics is linear algebra.
> I'll keep making these arguments, because they are good ones, and they are obvious to any human being who isn't stuck in tech-bro fairy land blabbering about how human consciousness is completely identical to Python linear algebra libraries when any 6 year old child knows with certainty they are not.
(An example of you "dismissing" AI).
Then you'll keep being confused and enraged about why people disagree with you.
And not just because you have a wildly wrong understanding of what 6 year olds think about. I remember being 6, all the silly things I believed back then. What my classmates believed falsely. How far most of us were from understanding what algebra was, let alone distinguishing linear algebra from other kinds.
I've got a philosophy A-level, which is enough to know that "consciousness" is a completely unsolved question and absolutely nobody agrees what the minimum requirements are for it. 40 different definitions; we don't even all agree what the question is yet, much less the answer.
But I infer from you bring it up, that you think "consciousness" is an important thing that AI is missing?
Well perhaps it is something current AI miss, something their architecture hasn't got — when we can't agree what the question is, any answer is possible. We evolved it, but just because it can pop up for no good reason doesn't mean it must be present everywhere. (I say much the same to people who are convinced AI must have it: we don't know). So, what if machines are not conscious? Why does that matter?
And you've not answered one of my examples. To repeat:
I'm following a YouTube channel where they're growing mouse neurons on electrode grids to train them to play video games. It's entirely plausible, given the current rate of progress, that 15 years from now, GPT-4 could be encoded onto a brain organoid the size of a living mouse's brain — does it magically become OK then? And in 30 years, that same thing as an implant into a human?
I don't think that is meaningfully distinct, morally speaking, from doing this in silicon. Making the information alive and in my own brain makes it not python, but all the consent issues remain.
there are some free models out there from both chat companies and open source
It probably needs to be a law not an executive order but I don't hate the idea.
States have the power to make it prohibitively expensive to operate in those states, leaving people to either go to VPNs or use AIs hosted in other countries that don't care whether they follow whatever new AI law California decides to pass. And companies would choose just to use datacenters not in the prohibitive states and ban IPs from those states.
'Course, if a company hosts in us-east-1 and allows access from California, would the interstate commerce clause not take effect, leaving California with no power anyway?
> Course if a company hosts in us-east-1, and allows access from California, would the inter state commerce clause not take effect and California would have no power anyways?
California can't legislate how they serve a customer in a different state. They would have to comply when serving California customers within the state of California, regardless of where the dc is located. I.E. Under the CCPA it doesn't matter where my data is stored, they still have to delete it upon my request.
>California can't legislate how they serve a customer in a different state. They would have to comply when serving California customers within the state of California, regardless of where the dc is located. I.E. Under the CCPA it doesn't matter where my data is stored, they still have to delete it upon my request.
I know this is what California thinks; I just personally don't see how this isn't interstate commerce.
It is, of course, but that doesn't mean California can't regulate it; simply that federal laws take precedence.
If states couldn't regulate interstate commerce taking place in their own states, they effectively couldn't regulate any commerce because court decisions have found that essentially all economic activity, even growing food for your own consumption, falls under the banner of interstate commerce.
> even growing food for your own consumption
Hey I know this one! In case anyone is interested, here's the case:
https://en.wikipedia.org/wiki/Wickard_v._Filburn
Unless there's a superseding federal law, yeah, California can successfully prosecute businesses for breaking the laws within its jurisdiction.
Your argument for regulation is...reasons why it works out without regulation, and is already covered by existing regulations?
Granted the "regulation" I'm referring to above is a law or EO to block California's regulation, and I don't support California's regulation either. But I believe regulations should only exist when there's no better alternative, because they usually have unintended consequences. If it's true that OpenAI can basically just leave California, the better alternative for the government may be doing nothing.
Are you advocating to take the power relegated to the states away from the states and give it to the federal government in direct violation of the Constitution of the United States?
How is this not directly tied to interstate commerce (and copyright law) and thus under Congress' enumerated powers?
https://en.wikipedia.org/wiki/Commerce_Clause
Interstate commerce clause by itself doesn't prevent it; it merely gives Congress the ability to override the state laws if Congress deems it necessary.
> leaving people to either go to VPNs
.. which is the prevailing situation for people dealing with state-by-state age verification at the moment.
I think there will be a huge change in public perception of copyright in general, as increasingly more people realise that everything is a derivative work.
Most people accept the traditional explanation for copyright: "everything emerges from the commons and eventually returns to the commons, so artists and creators should be entitled to ownership of intellectual property for a limited amount of time." The problem comes when "limited" is stretched from, say, 5 years from the moment of publishing to the artist's life + 150 years. Most people find the former reasonable and the latter ridiculous.
The problem is (almost) everything the US has a competitive edge on is based on copyright.
I'm in Europe, and during the past few weeks with these tariff upsets, I kinda realized the only things I use or own that are US-made are computers and software.
If someone could hack into Apple, download the schematics of their chips and the source for their OS, and then post it on the internet, after which a third party could sell commercial products based on said data, there wouldn't be a software/hardware economy around for very long.
Relevant (I don't know why the article doesn't link to them directly): https://openai.com/global-affairs/openai-proposals-for-the-u... https://cdn.openai.com/global-affairs/ostp-rfi/ec680b75-d539...
Thank you. I am disappointed that almost none of the comments here discuss the OpenAI proposals on the merits. I do hope that the federal government heeds most of these ideas, particularly recognizing that training a model should be fair use.
Here's a direct link to the article: https://www.bloomberg.com/news/articles/2025-03-13/openai-as...
the GOP: "states' rights! states' rights!!"
also the GOP: "not those rights! only the rights we want to share"
The unspoken part was always the states' rights to do what. Which of course was all about maintaining the economic differences that they preferred. Which, you know...
How big was the check that came with this request? For the right price their logo can go on the rose garden lawn.
gpt-47 costs at least $1m/tok
we are working on <impossible problem stumping humanity>. We have considered the following path to find a solution. Are we on the right track? Only answer Yes or No.
(1 week of GPUs whirring later)
AI: Your
(that will be $1 million, thank you)
Free market, y'all!
OpenAI calls DeepSeek 'state-controlled,' calls for bans
https://news.ycombinator.com/item?id=43355779
It is the free market though. That's what inevitably happens when locks put in place in the past to prevent rampant wealth and power concentration get blown up. A truly free market always devolves into a bunch of oligarchs gaining too much power and dictating their laws.
The original link has apparently been changed to a content-free Yahoo post, for some reason only known to "moderators", which makes existing comments bizarre to read.
The original link pointed to this OpenAI document:
https://openai.com/global-affairs/openai-proposals-for-the-u...
It contains this remarkable phrase:
> For innovation to truly create new freedoms, America’s builders, developers, and entrepreneurs—our nation’s greatest competitive advantage—must first have the freedom to innovate in the national interest.
I don't think people need "new freedoms". They need their existing freedoms, that are threatened everywhere and esp. by the new administration, to be respected.
And I would argue that America's greatest strength isn't their "builders"; it's its ability to produce BS at such a massive scale (and believe in it).
This OpenAI "proposal" is a masterpiece of BS. An American masterpiece.
It seems really weird that Congress isn't making a law about this. Instead, we're asking courts to contort old laws to apply to something which is pretty different from the things they were originally intended for. Or just asking the executive to make law by diktat. Maybe letting the wealthiest and most powerful people in the world write the rules will work out. Maybe not.
This issue is too complicated for Congress to handle? Too bad. Offloading it to the president or a judge doesn’t solve that problem.
The world is becoming more and more complicated and we need smart people who can figure out how things work, not a retirement community.
I've heard so many ridiculous stories about 'AI' that I'm at the point where I initially took this to mean the LLM and not the company had made the request.
I expect that interpretation won't seem outlandish in the future.
> I've heard so many ridiculous stories about 'AI' that I'm at the point where I initially took this to mean the LLM and not the company had made the request.
Only through its human bots
> I expect that interpretation won't seem outlandish in the future.
AI human manipulation could be a thing to watch out for.
Am I the only one who thinks “freedom to learn” is an anthropomorphising euphemism?
Steal content and then ask god for forgiveness. Works like a charm :)
So, why not pay the price of each copyrighted work ingested by the model?
They do have to pay that.
But if it's not fair use, they'd need to negotiate a custom license on top of that, for every single thing they use.
Weird, I haven't gotten a check from OpenAI, Meta, Anthropic, or any other AI company for any of my works yet, nor have any of my writer, musician, developer, or photographer friends who also self-publish without permissive licenses that would allow for such use. Are you sure they have to compensate creators for the material they use for training, or are you misunderstanding how copyright licensing works in the United States? Because all of us put our contact methods on our works so folks can properly license it for use, yet none of us have had anyone reach out to do so for AI training - almost like there's a fundamental mismatch between what AI companies are willing to pay (nothing), and what humans who created this stuff would like to receive for its indefinite use in training (what these AI companies claim are) trillion-dollar businesses of the future that will revolutionize humanity (i.e., house money).
If it's fair use for OpenAI to steal content wholesale without fair compensation (as decided by the creator, unless they have granted the management of that license to a third-party) just to train AI models, then that opens a Pandora's Box where anyone can steal content to train their own models, creating an environment where copyright is basically meaningless. On the other hand, making it not fair use opens a different Pandora's Box, where these models have to be trained in fundamentally different ways to create the same outcome - and where countries like China, who notoriously ignore copyright laws, can leap ahead of the industry.
Almost like the problem is less AI, and more overly broad copyright laws. Maybe the compromise is slashing that window back down to something reasonable, like twenty to fifty years or so, like how we deal with patents.
> Weird, I haven't gotten a check from OpenAI, Meta, Anthropic, or any other AI company for any of my works yet, nor have any of my writer, musician, developer, or photographer friends who also self-publish without permissive licenses that would allow for such use.
Can you tell me the specific number of dollars that would be?
I interpreted "pay the price of each copyrighted work" as the sale price, a criticism of things like meta's piracy.
If there was a mandatory licensing regime that AI could use, and there was an exact answer for what the payment would be, I think it might make sense to use "the price" to talk about that license. But right now in today's world it's very confusing to use "the price" to talk about a hypothetical negotiation that has not happened yet, where many many works would never have a number available.
Where do they have to pay that?
Where have they paid for each artwork from DeviantArt, paheal, etc that they trained Stable Diffusion on?
Where have they paid for each independent blog post that they trained ChatGPT on?
Yes, they've made a few deals with specific companies that host a large amount of content. That's a far cry from paying a fair price for each copyrighted work they ingest. Nearly everything on the Internet is copyrighted, because of the way modern copyright works, and they have paid for nearly none of it.
Also, openai only started making deals (and mostly with news publishers) after the NYT lawsuit.
https://www.npr.org/2025/01/14/nx-s1-5258952/new-york-times-...
They didn't even consider doing this before. They still, as far as I know, haven't paid a dime for any book, or art beyond stock photography.
Lawsuit is still ongoing, if openai loses it might spell doom for legal production and usage of LLMs as a whole. There isn't enough open, free data out there to make state of the art AI.
> There isn't enough open, free data out there to make state of the art AI.
But there are models trained on legal content (like Wikipedia or StackOverflow). Also, no human needs to read millions of pirated books to become intelligent.
> But there are models trained on legal content (like Wikipedia or StackOverflow)
Literally all of them are trained on wikipedia and SO. But /none/ of them are /only/ trained on wikipedia and SO. They need much more than that.
> Also, no human needs to read millions of pirated books to become intelligent.
Obviously, LLM architectures that were inspired by GPT 2/3 are not learning like humans.
There has never been anything remotely good in the world of LLMs that could be said to have been trained on a moderate, more human-scoped amount of data. They're all trained on trillions of tokens.
Models trained on less than 1T tokens are experimental jokes with no real use to offer.
You'll notice that even so-called "open data" LLMs like Olmo are, in fact, trained on copyrighted data; datasets like Common Crawl claim fair use over anything that can be accessed from a web browser.
And then there's the whole notion of laundered data by training on synthetic data generated by another LLM. All the so-called "open" LLMs include a very significant amount of LLM-generated data. If you agree to the notion that LLMs trained on copyrighted work are a form of IP infringement and not fair use, then training on their output is just data laundering and doesn't fix the issue.
I mean, China won't have to, so the AI race would still be over.
I don't think people realize how much money has been dumped into other Chinese AI models besides DeepSeek; even American VCs like Sequoia are getting involved:
https://en.wikipedia.org/wiki/Moonshot_AI
https://en.wikipedia.org/wiki/Baichuan
https://en.wikipedia.org/wiki/MiniMax_(company)
https://en.wikipedia.org/wiki/Zhipu_AI
It kinda makes sense to spend your dollars where they can actually get used.
So make the AI models public goods, developed by the government. Why should companies be getting rich on everyone else's work?
Well, I guess most copyright we're talking about here is IP owned by very wealthy corporations, to wit:
https://www.pbs.org/newshour/economy/column-intellectual-pro...
So I'm not sure that it would really change the status quo for a different group of already rich people to profit off of art created largely by the working poor and owned largely by another group of already rich people.
I guess if you think the government can accomplish what you propose, sure. But seems like that's not going to happen. Except maybe in China, and it sounds like that might be even worse for everyone.
Thus, it really seems like there's a solid point here that abandoning copyright to allow private investors to get rich stealing art from other rich people who really just stole it from poor people anyways is better than not doing that.
> So I'm not sure that it would really change the status quo for a different group of already rich people to profit off of art created largely by the working poor and owned largely by another group of already rich people.
I did not propose that any rich people profit off of it. It should be a public good.
> I guess if you think the government can accomplish what you propose, sure. But seems like that's not going to happen. Except maybe in China, and it sounds like that might be even worse for everyone.
Throw it at universities, fund it and organize it well. They can take it from where we are right now.
> I guess most copyright we're talking about here is IP owned by very wealthy corporations,
They're mostly the entities that can afford to enforce their copyrights. Copyright is for the wealthy, unfortunately.
What could possibly go wrong giving the same government that is currently deleting information from websites including references to the “Enola Gay” control over models?
The US needs to fix its government anyway. If they cannot, nothing else matters.
Don’t forget that the pearl clutching is on both sides.
It was Tipper Gore who thought the world would come to an end because of rap music.
Let’s just not give the government any more power in our lives than necessary.
The current regime is in a fascist power grab and you're both-sidesing some random-ass second lady from a generation ago? Yeah, wonder why we can't have effective government.
> Let’s just not give the government any more power in our lives than necessary.
Let's stop giving corporations all of the power and get a government that actually works for us.
It doesn’t matter. You should never trust the government with more power than absolutely necessary.
Because eventually, the other side will do something you don’t like.
This is the government people voted for.
The government has a “monopoly on violence”. No corporation can force you to do anything, take away your freedom (the US has the highest incarceration rate of any democracy) or your property (see civil forfeiture). I can much more easily avoid a corporation than the government.
> No corporation can force you to do anything, take away your freedom (the US has the highest incarceration rate of any democracy) or your property (see civil forfeiture). I can much more easily avoid a corporation than the government.
Avoid Tesla, and give me the steps you follow.
> Because eventually, the other side will do something you don’t like.
Yeah they might do equally egregious things like:
1) staging a fascist takeover of the government
2) a powerless idiot's idiot wife might dislike a music genre 30 years ago
The problem isn't government, it's a populace that is allergic to useful government.
You’re overindexing on Trump. The US being a police state with the highest incarceration rate in the world, police corruption, civil forfeiture, etc didn’t start with Trump.
Tell me one corporation that you can't get away from. Now tell me how you avoid an overly powerful government.
Why would you want to give a government with the history of the US more power?
Trump was elected fair and square. If you want to blame anyone - blame Americans. Despite the bullshit that the Democrats spout about “this isn’t who we are”. This is exactly who we are. Why would I want to give the government more control? Do you think the Democrats would be any more hands off when it comes to content?
> Trump was elected fair and square. If you want to blame anyone - blame Americans. Despite the bullshit that the Democrats spout about “this isn’t who we are”. This is exactly who we are.
I blame, primarily, the corporate takeover of government, punctuated by Citizens United and everything that came after, and a couple of generations of a Republican party with no goal other than setting out to prove that government is the enemy, to take the heat off their corporate masters.
> Tell me one corporation that you can’t get away from? Now tell me how you avoid an over powerful government?
I already did: avoid Tesla, show me how it's done. You can't, because the asshole in charge bought enough of the government to be in control. That's what happens when you have corporations with unchecked power, which is the inevitable conclusion of a powerless government.
You think you give the corporations all of the money and they're going to be bound by some tiny neutered government? No, they'll just buy it and then do what they want.
> I blame, primarily, the corporate takeover of government, punctuated by Citizens United and everything that came after
Try again, Trump famously didn’t have much corporate backing in 2016. Corporations wanted a standard Republican. He didn’t have any more money than the DNC. He is what the majority of the American people wanted.
> You think you give the corporations all of the money and they're going to be bound by some tiny neutered government?
Again, tell me how a corporation can shoot me with impunity, take my property without due process, literally take away my freedom, or stop me because I "fit the description" or look like I don't belong in a neighborhood where I know my income was twice the median income in the county?
You worry about some theoretical, abstract corporate power; I worry about jack-booted thugs with the full force of the government behind them.
> Try again, Trump famously didn’t have much corporate backing in 2016. Corporations wanted a standard Republican. He didn’t have any more money than the DNC. He is what the majority of the American people wanted.
I thought you said it didn't start with Trump?
And your premise is wrong anyway: Trump had plenty of corporate support in 2016 and more in 2024. He just faced some token resistance from big corps relative to other candidates; they got over it quickly, and it was never more than just for show.
> Again, tell me how a corporation can shoot me with impunity, take my property without due process, literally take away my freedom, or stop me because I "fit the description" or look like I don't belong in a neighborhood where I know my income was twice the median income in the county?
By just doing it. What, you think they can't find guns and assholes who need money or are evil? You think they can't find ways to cheat you out of your property or life? Who's going to stop them?
You tear down the government, the corporations will make their own in their own image. The government is _supposed_ to be there, it's the people coming together to do the shared work of society for the common good.
It just has to be a good government, the people have to fight for that. Half of our people fight to tear it down instead and the other half barely know what the hell they want.
> You worry about some theoretical, abstract corporate power; I worry about jack-booted thugs with the full force of the government behind them
They're the same people. Look at our government. Theoretical abstract, what are you talking about? It's the literal nazi shithead in the White House and all the rest of his enablers.
China also doesn't have to care about the will of its people, human rights, freedom of speech, and a bunch of other pesky things that get in the way of doing whatever the fuck you want to people for personal gain.
Seems like it'd be bad to let them win then.
We can let them dominate us but feel smug and morally superior in the process.
Neither does mine so we’re on equal footing!
Fun fact: the reason Hollywood is in California is that Edison's camera patents were effectively unenforceable there. Altman might actually have a good point – if your competition doesn't care about your laws, you're in trouble.
https://www.mentalfloss.com/article/51722/thomas-edison-drov...
The full 15-page proposal from OpenAI to the White House:
https://cdn.openai.com/global-affairs/ostp-rfi/ec680b75-d539...
When regulations were convenient for slowing down competitors—you know, the ones you heavily lobbied for—it was all great. But now that they've served their purpose and others are finally catching up, suddenly it's all about easing restrictions to protect your lead? Beautiful.
JD Vance seems to be quite aware of OpenAI's meta-strategy, so I wouldn't be surprised if this is declined (i.e., a denial semi-specifically aimed at something they want to force OpenAI to comply with).
This administration has shown that if you grease the skids well enough, you can get things your way.
“We want more regulation! AI is too dangerous, too powerful for any person off the street to use!”
Meanwhile exact same guy in Europe:
“Less regulation! You are strangling our innovation!”
Can anyone also use copyrighted source code, e.g. from OpenAI?
DeepSeek/whoever training on OpenAI outputs is ... bad.
OpenAI training on every content creator's outputs is ... good.
You say that, but the reality is that all open models rely heavily on synthetic data generated with ChatGPT. They don't like it, but it happens anyway. You can't really protect a public model from having its outputs exfiltrated.
This started in 2023 when LLaMA 1 was released, and has been going strong ever since. How strong? There are 330K datasets on HuggingFace, many of them generated from OpenAI models.
And where did OpenAI get the data to generate those datasets?
Did you miss the sarcasm?
The right loves states rights, unless it conflicts with their personal preferences.
Well funded companies want regulations because it stops up and coming companies from competing. Now they want exemptions from those regulations because it would be too restrictive.
First they should investigate the fake suicide!
Still not convinced how a model training on data is not the same as a human looking at that data and then using it indirectly, as it's now part of their knowledge base.
The scale is different.
Should the rules for owning a gun which can fire 1 round per hour be the same as a gun which can fire 1 million rounds per hour?
OpenAI (2023): Don't even bother trying to compete against us, you will not win and you will lose.
OpenAI (2025): pLeAse bAn dEEpSeEk!!11!, bAn poWerFulL oPen wEight Ai mOdeLs!!1
> OpenAI has asked the Trump administration to help shield artificial intelligence companies from a growing number of proposed state regulations if they voluntarily share their models with the federal government.
That sounds like corruption
I'm shocked
Funny how fast those AI prophets went from:
- The government need to prepare because soon they will need to give money to all those people we made obsolete and unemployed. And there is nothing to stop us.
to:
- We need money from the government to do that thing we told you about.
This needs to be repeated, over and over:
These grifters started with one narrative, and have done a full 180.
The Internet --> Web 2.0 --> algorithmic feeds progression has destroyed our collective ability to focus and to retain any memories (and the media being goldfish-like doesn't help either).
Move to a different state.
Is it so unrealistic? Many companies and people leave beautiful Cali due to over-regulation.
Wonder if the rules will protect the information providers or the consumers.
I really hope OpenAI fails in doing this. If this usage is allowed, then it means that there is no path towards me being OK with publishing anything on the internet again.
I'm assuming this has zero effect on non-US AI companies?
He should have offered for every purchase of OpenAI services, a portion would be used to purchase TrumpCoin. That would have been a more effective bribe.
or teslers!
All these whiny creatives who feel threatened just need to suck it up and deal with it. Even if they got their way in the US, another app in another country would just use their data without permission. All they are doing is ensuring those apps wouldn't be American.
What do you mean by "deal with it?" Because to me it looks like they're dealing with it by joining in solidarity with other artists, raising awareness about how this affects them and us and lobbying for regulation they think would improve the situation.
I guess you meant they should deal with it by just letting it happen to them quietly and without a fight? Is that how you would deal with your livelihood being preventably and unnecessarily destroyed for someone else's enrichment? Maybe, but artists are not overall as cowardly as programmers.
> All they are doing is ensuring those apps wouldnt be American.
Maybe these whiny americans just need to suck it up and deal with it?
Deal with it as in them accepting there is nothing they can do to stop it. Other countries aren't going to follow whatever laws they manage to get in place in the US.
How would awareness and regulation in US solve this worldwide problem?
Do you not enjoy being paid for your work?
I'm a business owner. I love generative AI.
For those who have used the image generation models and even the text models to create things, there is no way you can look at the Disney-look-alike images and NOT see that as copyright infringement...
IANAL, but for copyright infringement you have to distribute it, and AI image generation is like asking someone to paint a cartoon mouse on a wall of your living room.
Is it not more like lossy image decompression?
Just because the jpeg you're distributing isn't the same bytes as the one I have copyright to doesn't mean you're not infringing my copyright. You're still taking my copyrighted image, running it through an algorithm, and then distributing the results.
I think that's up to the courts to decide on a case by case basis, just like with human-produced content someone alleges as infringing.
Humans of course create things by drawing from past influences, and I would argue so does AI.
In fact, I would say that nothing and nobody starts out original. We need copying to build a foundation of knowledge and understanding. Everything is a copy of something else, the only difference is how much is actually copied, and how obvious it is. Copying is how we learn. We can't introduce anything new until we're fluent in the language of our domain, and we do that through emulation.
So to me the legal argument of AI vs copyright, comes down to how similar a particular result is from the original, and that's a subjective call that a judge or jury would have to make.
It is interesting that it is not the Hollywood/Music/Entertainment copyright lobby (RIAA, MPAA etc.) that is lobbying US states to go after OpenAI and other American AI companies.
It's the New York Times and various journalist and writers' unions that are leading the charge against American AI.
American journalists and opinion piece writers want to kill American AI and let China and Russia have the global lead. Why? Have they thought about the long-term consequences of what they are doing?
I think content creators want to be compensated for their work that's being used for commercial purposes.
I think you're framing it in a way that makes it seem like they don't want to be compensated for working, they just want to stop other people from starting a new industry, which doesn't seem like a good faith understanding of the situation.
Business and tech idea: make it so that it's like Spotify for AI.
Every time an answer is drawn from "certain learned weights," make it so that the source of that knowledge is paid cents per volume.
While that is cool in principle, I'm not sure how well it'd actually work in reality. First, there is the technical challenge. My understanding is the weights can have a lot of fluctuation, especially early on. How do we actually determine how much influence a given piece of content has on the final weights?
Then if we get past that, my suspicion is that you could game the training. Like have as much of the process happen via public domain sources or pay-once licenses. That would cover a lot of the fundamental knowledge and processes. Then you could fine-tune on copyrighted data. That might actually make it easier to see how much influence that content has on the final weights, but it would also probably be a lot less influence. There's a big difference between a painting of an apple being the main contribution to the concept of "apple" in an image model, vs mention of that painting corresponding to a few weights that just reference a bunch of other concepts that were learned via open data.
> First, there is the technical challenge. My understanding is the weights can have a lot of fluctuation, especially early on. How do we actually determine how much influence a given piece of content has on the final weights?
Well, Bing AI already knows where it drew the information from and cites sources, so it would be a matter of making the deal.
How to enforce it? That's the main question, I reckon.
> Then if we get past that, my suspicion is that you could game the training. Like have as much of the process happen via public domain sources or pay-once licenses.
I agree.
Running it is probably costly, but there are papers on "influence analysis": Training data influence analysis and estimation: a survey (https://link.springer.com/content/pdf/10.1007/s10994-023-064...)
It would be easier to negotiate a fixed cost on using a particular datum per training of a model.
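To make the thread's idea concrete, here is a minimal sketch of the payout half, assuming (hypothetically) that an attribution method like the ones in that survey can assign each source a nonnegative influence score for a given answer. The function name, the scores, and the per-answer royalty pool are all made up for illustration:

    # Hypothetical sketch: split a fixed per-answer royalty pool across
    # sources in proportion to their influence scores. The scores would
    # have to come from an influence-analysis method like those in the
    # survey linked above; nothing here computes them.
    def split_royalties(influence_scores: dict[str, float],
                        pool_cents: int) -> dict[str, int]:
        total = sum(influence_scores.values())
        if total <= 0:
            return {source: 0 for source in influence_scores}
        return {source: round(pool_cents * score / total)
                for source, score in influence_scores.items()}

    # Toy example: three sources contributed to one answer, 10 cents to split.
    print(split_royalties({"blogA": 0.5, "bookB": 0.3, "wikiC": 0.2}, 10))
    # -> {'blogA': 5, 'bookB': 3, 'wikiC': 2}

The hard parts identified in this thread (computing the scores at all, and stopping people from gaming them) sit entirely outside a function like this.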
yeah this sounds like it'll be reliably enforced for sure
With these proposed rules, American AI may be able to surpass the AI of China and Russia, but will American creators and ordinary people be happy with this? All the money will end up in the pockets of Sam Altman and other billionaires, and ordinary creators will be left with nothing.
The market for creative works breaks down as follows. You have pay-in-advance arrangements such as patronage, commissioning, and so on. Those have been around forever. And then you have pay-if-you-want-it arrangements which only make economic sense because we have laws that grant monopolies to the creators of the work over the market for copies of that work.
The first arrangement is very clearly a labor arrangement; but the second one is a deliberate attempt to force artists to act like capitalists. More importantly, because art is now acting like capital, it provides an obvious economic instinct to centralize[0]. So you get industrialized artistic production under the banner of publishing companies, whose business model is to buy out the copyright to new creative works and then exploit them.
What AI art does is transfer money from the labor side of art to the capital side of art. The MAFIAA[1] wants AI art to exist because it means they can stop paying artists but still make royalties off selling licenses to the AI companies. This increases their profit margins. Meanwhile, the journalists can't sell you old news; they need to spend lots of time and money gathering it every day. That business model only works in a world where writers are scarce, not just one where the writing itself is artificially scarce.
[0] We can see this with cryptocurrency, which is laughably centralized despite being a deliberate attempt to decentralize money.
[1] Music and Film Industry Association of America, a hypothetical merger of the RIAA and MPAA from a satirical news article
> It is interesting that it is not the Hollywood/Music/Entertainment copyright lobby (RIAA, MPAA etc.)
Is it interesting? They hate the people who produce their product and are desperate to replace them with machines. Note that their unions also hate AI, and it was a central reason for the Writers Guild and SAG-AFTRA strikes, since you're bringing up the NYT unions.
The NYT also stands to benefit not an iota from AI. It probably causes a burden because they have to make sure that their awful long-in-the-tooth editorial columnists aren't turning in LLM slop. It is entirely a negative for people who generate high quality content the hard way.
A clear attempt to circumvent liability for the blatant copyright violations of the AI era and kick the can down the road.
Sounds great!
Closed ai
Write a law. We don't have an emperor.
Are you sure about that?
If AI actually reaches human-level intelligence in the next few years, the Pentagon and congress are going to start yelling about National Security and grabbing control over the whole industry, so I doubt state regulations are going to matter much anyway.
(And if it doesn't reach human-level intelligence, then OpenAI's value will pop like a balloon.)
I just canceled my OpenAI subscriptions over this.
“Please help us. We’re only a little business worth $157 billion!” - The company ripping off everyone that’s ever written or drawn anything. Companies like Airbnb and Uber breaking the rules, gaining control of the market, and then pushing up prices was bad. “Open” AI is just a whole other level of hubris.
They do need help, they've yet to turn a profit.
Remember when they were a non-profit so that didn't matter? Ah..
Nonprofits still need to be sustainable, and they’re definitely not
If you start a company based on a new-ish idea and you can't figure out how to turn a profit, that's on you.
To be fair, it could be a matter of national security if AGI does ever happen. You'd very much want your country to be the first.
That's closer to an argument for nationalizing the company than it is for shielding them from regulation.
Would they want to turn a profit if they can invest?
Profit is so 20th century. The new way is to garner hype to build a pyramid scheme for VCs, and sell off your shares before people realize there's nothing there. Actual contributions to the economy are no longer required.
They will probably get it too :)
Not to mention Musk was an original co-founder, left, and then just recently tried to buy it back.
It's all shady-as-fuck.
https://youtu.be/TMHCw3RqulY
> National security hinges on unfettered access to AI training data, OpenAI says.
If it's a Republican administration, yell "national security". If it's Democratic, claim it's in the name of child safety.
More like "national security/America first" vs "climate change". Or just use "think about the children" for them both.
"[N]ational security" and "child safety" both work quite well on both parties in reality; that's why they're so popular.
Putting legal issues aside for a moment, I argue that training on copyrighted material should be considered fair use simply by virtue of the enormous societal benefits LLMs/AI bring in making the vast expanse of human knowledge accessible.
It’s a major step forward for humanity.
And what is the argument on making that knowledge accessible via LLMs vs directly? Why was it not accessible before?
It's a radically different approach to knowledge search and exploration from the previous "direct" alternatives like search engines/indexes/NLP.
The search engine wasn't the problem though; it was paywalling the information so fewer people had access to it.
How do for-profit models served by for-profit companies make information that was previously "pay to learn" more accessible?
Related:
Google’s comments on the U.S. AI Action Plan
https://blog.google/outreach-initiatives/public-policy/googl...
I heard the theory that Elon Musk has a significant control over the current US government. They're not best pals with Sam Altman. This seems like it might be a good way to see how much power Elon actually has over the government?
My working theory is that the current Trump government is like 12 people, a quarter of whom do not hold any official position, and they decide everything with absolutely no oversight.
Trump did this during his previous term as well, with Ivanka and Jared Kushner, but to a much less significant degree.
I think we are beyond the "theory" phase by now. Just yesterday I saw the president of a country advertising the products of a private company (Trump making an obvious marketing ploy for Tesla).
How can this ever be acceptable?
> How can this ever be acceptable?
Because the only people capable of holding him accountable won't do it.
The system is broken. The US Government/Constitution depends too much on the assumption that people will do the right thing.
The failure relative to the original expectations seems to be that the other branches of government aren't fighting to retain their authority because the things they're being overridden to do align too well with what they would do themselves.
It's not even the first time he's done it. He even advertised for Goya Beans from the Oval Office.
https://www.newyorker.com/news/our-columnists/the-president-...
> the president of a country advertising the products of a private company
I think you're inventing new norms. It has never been unusual or interesting for the president of a country to do PR for some company in their country that has hit a rough patch (as long as this isn't a legal rough patch.)
Most of what our diplomats do is sell US products to other countries. They certainly have always played favorites.
> How can this ever be acceptable?
The horror. What if he says that he's going to Burger King?
I saw it even 4 years ago https://people.com/politics/joe-biden-takes-hybrid-jeep-for-...
> ...his speech, which was attended by the CEOs of dozens of the world's largest automakers...
I don't recall Biden reading off a price sheet for a single corporation. Seems a bit different from what happened yesterday at the White House.
Trump has ultimate power in the administration. You are either dumb or blind if you cannot see that Trump is running the executive branch like a mob family. Kiss the leader, show him respect, and he will do things for you. Betray him, ignore him, or go behind his back and you will be squashed.
People might think this is a partisan statement, but it's not. It's simply how he is operating. Want power? Want to get things done? Kiss his feet. You saw all the tech boys line up at his inauguration. You saw him tell Zelenskyy "Thank me". Elon might have power, but he is also on a leash.
Buried the lede:
> OpenAI also reiterated its call for the government to take steps to support AI infrastructure investments and called for copyright reform, arguing that America’s fair use doctrine is critical to maintaining AI leadership. OpenAI and other AI developers have faced numerous copyright lawsuits over the data used to build their models.
“Freedom to make money”
Maybe this data constraint for America, versus the GPU constraint for China, will force America to innovate. Maybe innovate in data generation.
I'm disgusted by the mindset that companies should be able to do whatever they want when it comes to technology as impactful and revolutionary as AI.
AI sucks up the collective blood, sweat, and tears of human work without permission or compensation and then re-monetizes it. It's a model that is even more asymmetrical than Google Search, which at least gives back some traffic to creators (if they're lucky).
AI is going to decide on human lives if it drives your car or makes medical diagnoses or decisions. This needs regulation.
AI has the ability for convincing deepfakes, attacking the essence of information and communication in itself. This needs regulation, accountability, at least a discussion.
As AI grows in its capability, it will have an enormous impact on the work force, both white collar and blue collar. It may lead to a lot of social unrest and a political breakdown. "Let's see what happens" is wildly irresponsible.
You cannot point to foreign competition as a basis for a no-rule approach. You should start with rules for impactful/dangerous technology and then hold parties to account, both domestic and foreign.
And if it is true that we're in a race to AGI, realize that this means the invention of infinite labor. Bigger than the industrial revolution and information age combined.
Don't you think we should think that scenario through a little, rather than winging it?
The inauguration had the tech CEOs lined up directly behind Trump, clearly signaling who runs the country. It's tech and media. How can you possibly have trust in a technology even more powerful ending up in ever richer and more autocratic hands?
But I suppose the reality is that Altman should donate $100 million to Trump and tell him that he's the greatest man ever. Poof, regulation is gone.
> AI has the ability for convincing deepfakes, attacking the essence of information and communication in itself. This needs regulation, accountability, at least a discussion.
We're going to eventually have to have a serious discussion about, and to generate a legal and moral framework covering, identity rights. I'm going to guess that people will be able to locally generate high-quality pornography of celebrities and people they know that will be indistinguishable from the real thing imminently; at most it's 5 years away.
Getting hung up on the sex is a distraction. This is no different from anybody collecting an identifiable dossier on you, packaging it, and selling it. This has been a problem for everyone for the entire period of advertising on the internet, and before that with credit agencies and blacklists, and no progress has been made because it has been profitable for everybody for a long time.
Websites got a few decisions about scraping, saying that they were protected to some extent from people scraping to duplicate a particular compilation of otherwise legally copyable information. Individuals are compilations of legally copyable information. We're going to need publication rights to our own selves.
But like you say, we're not discussing any of this. Rich people are just doing what they want, and paying the appropriate politicians to pretend not to understand what's going on. Any pushback? Just Say China A Lot.
"If what we're doing is not fair use, then we can't operate"? OK, so? The world doesn't owe you the ability to operate the way you are. So whether it breaks your business model has no bearing on the question, which is, "is that fair use, or not?"
Private property is sacrosanct, except when an exception that applies only to them would make a billionaire richer.
Isn’t Elon Musk sort of in a tiff with OpenAI, and also seemingly very influential to Trump?
I feel like OpenAI is going to have to make some concessions to get favor from the Trump administration.
In the "just because everyone else is jumping off a bridge, should you do it":
> Pfizer Asks White House for Relief From FDA Drug Human Testing Rules
> Pfizer has asked the Trump administration to help shield pharmaceutical companies from a growing number of proposed state and federal regulations if they voluntarily share their human trial results with the federal government.
> In a 15-page set of policy suggestions released on Thursday, the Eliquis maker argued that the hundreds of human-testing-related bills currently pending across the US risk undercutting America’s technological progress at a time when it faces renewed competition from China. Pfizer said the administration should consider providing some relief for pharmaceutical companies big and small from state rules – if and when enacted – in exchange for voluntary access to testing data.
> Chris Lehane, Pfizer's vice president of global affairs, said in an interview, "China is engaged in remarkable progress in drug development by testing through Uyghur volunteers in the Xinjiang province. The US is ceding our strategic advantage by not using untapped resources sitting idle in detention facilities around the country."
> George C. Zoley, Executive Chairman of GEO Group, said, "Our new Karnes ICE Processing Center has played an important role in helping ICE meeting the diverse policy priorities of four Presidential Administrations. We stand ready to continue to help the federal government, Pfizer, and other privately-held companies achieve their unmet needs through human trials in our new 1,328-bed Texas facility."
> > Uyghur volunteers
"Volunteers" eh? That's one way to put it.
> OpenAI also proposed that AI companies get access to government-held data, which could include health-care information, Lehane said.
Yea, straight up, go fuck yourselves. You want copyright laws changed to vouchsafe your straight up copyright whitewashing and now you just want medical data "because."
Pay for it or go away. I'm tired of these technoweenies with their hands out. Peter Thiel needs a permanent vacation.
> You want copyright laws changed to vouchsafe your straight up copyright whitewashing
I'll support this if it means that Mickey Mouse finally goes into the public domain and fucks Disney.
Maybe these idiot CEOs shouldn't have screamed from the rooftops about how they can't wait for AI to let them fire all the plebs; then maybe someone would actually care whether their company survives or not.
HAHAHA. Remember when Sam was absolutely frothing at the mouth to "regulate AI" two years ago?
> https://www.nytimes.com/2023/05/16/technology/openai-altman-...
> https://edition.cnn.com/2023/06/09/tech/korea-altman-chatgpt...
You see, American AI is going to take over the world. It's just that it's temporarily short of funds. I mean, GPUs. Uh, there are pesky laws in the way.
Totally not the fault of a gigantic overcommitment based on wishing, no.
Is it me or does it feel like most of what the federal government does nowadays is make it illegal for government to make things illegal?
I hate this game. I hate that Sam Altman publicly supported Trump (both financially and by showing up). Maybe I hate that he "had" to do this for the sake of his company, or maybe I hate that he _didn't_ have to do it and is a hypocrite. Maybe I just hate how easily laws can be shaped by $1M and a few nice words. Either way, I hate that it worked.
> I hate this game.
This is tech. This is how it has always been. From Archimedes to da Vinci to Edison to Ford, technologists are always captured to serve the interests of those in power. Most modern technologists don't want to believe this. They grew up building an Internet that had a bit of countercultural flair to it and undermined a few subsets of entrenched elites (mass media, taxi cartels, etc.), so they convinced themselves that they could control society under their wise hands. Except the same thing that always happens happened: the powers that be are now treating tech the way tech treats everyone else.
Apple seem to be holding the line ok:
https://www.reuters.com/technology/apple-investors-reject-pr...
https://news.sky.com/story/apple-removes-end-to-end-security...
It made sense to ponder given HN attracts people with the hacker mindset (the drive of curiosity to understand how things work and how to improve them, not merely accepting the status quo as gospel like the dry monkeys) and frustration is a good signal that something could be improved.
What's a dry monkey?
Could you please recommend a book about this?
Wealth of Nations (read past pg 50, unlike most current economists)
Das Kapital, as a critique of Smith's writing.
The Communist Manifesto, to understand the point of view of the laborer, not capital.
Read about worker cooperatives and democracy in the workplace, including the Mondragon Corporation in Spain.
(One of the largest problems we have with any economic system is that none can properly model infinities. The cost of creating something new is high, be it art or science, but the cost of copying is effectively zero. I can highlight the problem, but I have no good solution. OpenAI's response, though, is 'let us ignore copyright law,' which wrongs creators.)
A Canticle for Leibowitz
centralising power never works well for the good of society
It's not true that it never works.
Centralizing production goals, decision making, and expenditure in the federal government is what made the industrial response to WW2 successful. Centralizing tax revenue to fund retirements for the elderly (Social Security) brought the poverty rate of seniors far lower. Centralizing zoning control at the state level in California is _finally_ starting to make localities take responsibility for building more housing. These were/are centralizing efforts with the intent of helping the masses over the wealthy few.
What doesn't work is centralizing power with the intent of concentrating wealth and security by taking wealth, labor, and security from working people, AKA extractive institutions.
That's true whether it's the donor-class funded political establishment or regimes like the current US kleptocracy doing it.
Problem is, once you centralize, that remains in place for a long time, but the original intent, even if it was genuine, rarely outlives the people who implemented it for long.
Generally speaking, every point of centralization is also a point where a lot of power can be acquired with relatively little resources. So regardless of intent, it attracts people who are into power, and over time, they take over. The original intent often remains symbolically and in the rhetoric used, but when you look beyond that into the actual policies, they are increasingly divorced from what is actually claimed.
> Generally speaking, every point of centralization is also a point where a lot of power can be acquired with relatively little resources
This is why (1) shared principles and (2) credible democracy is important, to allow evolution of the centralized power (i.e. government) towards the shared principles, and why its corporate-bribed facsimile or oligarchic authoritarianism don't work.
Interesting way to put it after seeing a very specific "centralizing of power", that being the people with the most capital making the decisions.
Why would centralizing power in a different way (e.g., democratically) not lead to a different outcome than centralizing power in the way we do now?
That's correct. Voluntary association advocated by anarchy is the only truly free social model.
I heard rumblings about some sort of system where power is shared equally across three branches of government with checks and balances to ensure one branch doesn't go rogue and just do whatever they want.
Forget what they called it, united something or other.
Well, the people who designed that system were very skeptical of political parties in general, and thought they could be avoided. Turns out that this isn't true, and once you have parties, they can in fact capture all three branches of government, and then those "checks and balances" kinda stop working.
Yeah, I think that is unfortunately the fate of all political systems.
Maybe our AI overlords will do a better job this time if they are unconstrained from any lawful oversight. I mean, one can hope...
In fact, that's not too far away from our current trajectory. Algorithmically enforced sovereign oversight is part of the patchwork state and Yarvinism specifically.
Whatever you had in mind, that's definitely not the USA, where money/lobbying and inter-partisan corruption trump everything.
Tell you what: set up a federal-level online disclosure process of all the copyright-protected works used in training OpenAI's models, so that creators/rights holders can claim their due as equity (out of the pockets of the C-suite and board), and we'll take you seriously.
All the profit and none of the liability is Coward Capitalism.
That's just feudalism with extra steps
All the profit and none of the liability is Coward Capitalism
While I agree with you in principle, there's little that can be done because the current crop of crony capitalists will likely support the idea of no liability for tech companies. Especially when it comes to ripping off copyrighted material. Everything from blog posts, to videos, to music, to any source code you post on the internet will be used to train models to be better writers, artists, musicians, and programmers.
I feel like the only option left is to find some way to make money on the output of the models. Because the politicians are definitely going to allow the models to make money based on your output.
appeasement?
There's an extra word in your last sentence. Privatizing profit and socializing risk and loss is maximizing profit for the individual, and profit maximizing behavior is the only fundamental underpinning of capitalism.
this is a misread. it's still unclear whether use of copyrighted works to train LLMs falls under fair use but, with current laws, the answer is probably yes. you may not like that but, even if it changes, existing models were trained under existing law.
also what liability do you expect them to assume? they want to offer models while saying "to use these, you must agree we don't have liability for their outputs." if companies want to use these models but don't want to deal with liability themselves, so they demand the government shift the liability to the model vendor (despite the conditions the vendor applied), that sounds like coward capitalism to me. don't like it? don't use their models.
> with current laws, the answer is probably yes
Citation needed, or at least some reasoning. The answer to "is this fair use" can't be "it's fair use because it's fair use"
> also what liability do you expect them to assume
The same liability anybody does for distributing copyright works without a license? Why are they not liable if it turns out the stuff they've been distributing and making people pay for was content they didn't own the license to distribute?
Related (adjacent content from the same report):
OpenAI urges Trump administration to remove guardrails for the industry (cnbc.com) - https://news.ycombinator.com/item?id=43354324
Apparently the above has been marked as a dupe (I hope not from a misunderstanding of what "adjacent" means), but ftr it covers different stuff. e.g. there's nothing about the classified data model proposal in TFA
Slightly different coverage of the same event usually counts as a dupe on HN. You could link the reporting you want to emphasize/discuss; the HN submission itself is not that important.
I know a lot of people will hate on things like this, but the reality is they are right that guardrails only serve to hurt us in the long run, at least at this pivotal point in time. As a caveat, I don't like Trump personally.
Yes, it is a fact they built themselves up on top of mountains of copyrighted material, and that AI has a lot of potential to do harm, but if they are forced to stop or slow down, foreign actors will just push forward and innovate without guardrails, and we will fall behind as the rest of the world pushes ahead.
It's easy to see how foreign tech is quickly gaining ground. If they truly cared about propping America up, they should allow some guardrails to be pushed past.
The law which prevented US corporations from using bribery to win business in other nations was recently rescinded on exactly this basis: US corporations are hamstrung unless they can buy their wins. Superficially, this makes sense, and that was all that was offered to justify the change. That guardrail was dumb! But like most things, there are reasons to not do this which were completely ignored.
For instance, a company may not desire to hand out cash to win business; previously, when solicited they could say, "Sorry, it is illegal for me to do so." Now there is no such shield.
Second, in many cases it will be two or more US businesses trying to win business in some other country, and the change of the law only makes it more expensive for those two companies, as they now must play a game of bribery chicken to win the business.
Third, the US loves to claim it is a democracy and is working to spread democracy. By legitimizing bribes paid to foreign officials over the interests of their voting populace, we are undermining democracy in those countries (not that anyone who pays attention believes that the US's foreign policy is anything but self-interested and divorced from spreading democratic ideals).
> guardrails only serve to hurt us in the long run, at least at this pivotal point in time.
What evidence led you to that conclusion?
Look up "alignment tax".
Can the same argument not be made for forced labour?
Is the US not lowering its capacity to innovate and grow its economy by preventing the use of forced labour (even in other countries)? Why should these "guardrails" stay in place if the argument is "the reality is they are right that guardrails only serve to hurt us in the long run, at least at this pivotal point in time"?
Underlying this perspective is the assumption that this is a uni-lineal race, and the end of that race must be arrived at first, and what lies at the end of that race is in the common good. There is no evidence for any of this.
Maybe in a present:
- Dominated by an intractable global manufacturer/technologist (China) that doesn't care about copyright
- Proliferated by a communication network that doesn't care about copyright (Internet)
and a future where:
- We have thinking machines on par with human creativity that get better based on more information (regardless of who owns the rights to the original synapses firing)
That maybe, just maybe, the whole "who should pay to use copyrighted work?" question is irrelevant, antiquated, impossible, redundant...
And for once we instead realize in the face of a new world, an old rule no longer applies.
(Similar to a decade ago, when we debated whether a warrant should apply to a personal file uploaded to a cloud provider.)
Even if you believe that every one of these things is correct (which is a big _even_) -- It's a really bad idea to let private actors break the law, then decide not to punish them if it turns out to be useful enough.
It's bad for competitors who didn't break the law, bad for future companies who have to gamble on whether they'll get a pass for breaking the law around the next big thing, and bad for parties who suffered losses they didn't expect because they were working within the law.
If you want to throw out the copyright system I'm right there with you, but change the laws, don't just reward lawbreaking and cronyism.
Agreed!
Though if you think about it, laws typically change after we agree (at the grassroots level) that they are irrelevant, not before.
A future where we have limitless clean energy thanks to nuclear fusion, self-driving cars that exceed humans in every safety metric, EVs with inexpensive batteries that go 500 miles on a single 5-minute charge, cheap and secure financial transactions thanks to crypto, etc.
is a future that they've been selling us for more than a decade, but somehow doesn't really want to come about.
If the models are so good that "who should pay to use copyrighted work?" is not a relevant question, doesn't that mean that all money that would previously go towards artists is now going towards OpenAI?
How does new art get created for the models to train on if OpenAI is the only artist getting paid?
I'm not saying I even agree with your proposed future, but if it were to happen would it not be a bad thing for everybody but OpenAI?
> We have thinking machines on par with human creativity that get better based on more information (regardless of who owns the rights to the original synapses firing)
We don't have that and we don't know if it will happen. Meanwhile, people put in time to create work, and they are being exploited by not being paid. I think OpenAI should pay.
Sure, we can debate how creative or not LLM is right now, but that is not the real point that this all hinges on.
The real point is copyright is no longer enforceable, and some of our biggest societal forces incentivize us to not care about copyright.
This debate and these laws are effectively dead, some just don't know it yet.
> - We have thinking machines on par with human creativity that get better based on more information (regardless of who owns the rights to the original synapses firing)
For that you need actual AGI and it's nowhere in sight other than in the dreams of a few doom prophets.
Until that is reached, by definition current "AI" cannot surpass its training data.
I think you missed the point.
Technology has made enforcing copyright impossible, and any attempt to enforce it just hinders technological advancement, while still not solving the global enforceability of copyright.
Let's stop wasting our time on this concept, the laws around it, and the whole debate. Copyright is dead.
I'm arguing lets move on.
> Technology has made enforcing copyright impossible
Has it? I think not. Governments could require AI training companies operating in Western markets to respect robots.txt (with strict fines for violators), and nations that do not respect this should be cut off from the Internet anyway.
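The compliance half of that would be mechanically trivial; as a minimal sketch, a cooperating crawler could check robots.txt with nothing but the Python standard library (the user-agent string and URLs here are placeholders):

    # Check whether a crawler may fetch a page according to robots.txt.
    # "ExampleTrainingBot" and the example.com URLs are hypothetical.
    from urllib.robotparser import RobotFileParser

    rp = RobotFileParser()
    rp.set_url("https://example.com/robots.txt")
    rp.read()  # fetch and parse the site's robots.txt

    url = "https://example.com/some/article"
    if rp.can_fetch("ExampleTrainingBot", url):
        print("allowed to crawl:", url)
    else:
        print("robots.txt disallows:", url)

Whether anyone audits or fines the crawlers that skip a check like this is the actual enforcement question.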
The LLM race may be over, but the AI race surely isn't. My baby seems to have grown into a fully functioning intelligence without reading the entire content of the internet. AI is not equivalent to LLMs, silly, silly child.