This is a post from [4]Robin Sloan’s lab blog & notebook. You can [5]visit the blog’s homepage, or [6]learn more about me.

Is it okay?

February 11, 2025

Macbeth Consulting the Witches, 1825, Eugène Delacroix [8]

How do you make a language model? Goes like this: erect a trellis of code, then allow the real program to grow, its development guided by a grueling training process, fueled by reams of text, mostly scraped from the internet. Now. I want to take a moment to think together about a question with no remaining practical importance, but persistent moral urgency:

Is that okay?

The question doesn’t have any practical importance because the AI companies — and not only the companies, but the enthusiasts, all over the world — are going to keep doing what they’re doing, no matter what.

The question does still have moral urgency because, at its heart, it’s a question about the things people all share together: the hows and the whys of humanity’s common inheritance. There’s hardly anything bigger.

And, even if the companies and the enthusiasts rampage ahead, there are still plenty of us who have to make personal decisions about this stuff every day. You gotta take care of your own soul, and I’m writing this because I want to clarify mine.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

A few ground rules.

First, if you (you engineer, you AI acolyte!) think the answer is obviously “yes, it’s okay”, or if you (you journalist, you media executive!) think the answer is obviously “no, it’s not okay”, then I will suggest that you are not thinking with sufficient sensitivity and imagination about something truly new on Earth. Nothing here is obvious.

Second, I’d like to proceed by depriving each side of its best weapon.

On the side of “yes, it’s okay”, I will insist that the analogy to human learning is not admissible. “Don’t people read things, and learn from them, and produce new work?” Yes, but speed and scale always influence our judgments about safety and permissibility, and the speed and scale of machine learning is off the charts. No human, no matter how well-read, could ever field requests from a million other people, all at once, forever.

On the side of “no, it’s not okay”, I will set aside any arguments grounded in copyright law. Not because they are irrelevant, but because … well, I think modern copyright is flawed, so a victory on those grounds would be thin, a bit sad. Instead, I’ll defer to deeper precedents: the intuitions and aspirations that gave rise to copyright in the first place. To promote the Progress of Science and useful Arts, remember?

I hope partisans of both sides will agree this is a fair swap. Put down your weapons, and let’s think together.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

I want to go carefully, step by step — yet I want to do so with brevity. Language models produce so … many … WORDS, and they seem to coax just as many out of their critics. Logorrhea begets logorrhea. We can do better.

I’ll begin with my sense of what language models are doing. Here it is: language models collate and precipitate all the diverse reasons for writing, across a huge swath of human activity and aspiration. Start to enumerate those reasons: to inform, to persuade, to sell this stupid alarm clock, to dump the CUSTOMERS table into a CSV file … and you realize it’s a vast field of desire and action, impossible to hold in your head.

The language models have many heads.

In this formulation, language models are not merely trained on human writing. They are the writing: all those reasons, granted the ability to speak for themselves. I imagine the PyTorch code as a mech suit, with squishy language strapped in tight …

To make this work — you already know this, but I want to underscore it — only a truly rich trove of writing suffices. Train a language model on all of Shakespeare’s works and you won’t get anything useful, just a brittle Shakespeare imitator.

In fact, the only trove known to produce noteworthy capabilities is: the entire internet, or close enough. The whole extant commons of human writing. From here on out, for brevity, we’ll call it Everything.

This is what makes these language models new: there has never, in human history, been a way to operationalize Everything. There’s never been anything close.

Just as, above, I set copyright aside, I want also to set aside fair use and the public domain. Again, not because they are irrelevant, but because those intuitions and frameworks all assume we are talking about using some part of the commons — not all of it.

I mean: ALL of it!

If language models worked like cartoon villains, slurping up Everything and tainting it with techno-ooze, our judgment would be easy. But of course, digitization is trickier than that: the airy touch of the copy complicates the scenario.

The language model reads Everything, and leaves Everything untouched — yet suddenly this new thing exists, with strange and formidable powers.

Is that okay?

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

As we begin to feel our way across truly new terrain, we can inquire: how much of the value of these models comes from Everything? If the fraction was just one percent, or even ten, then we wouldn’t have much more to say.

But the fraction is, for sure, larger than that.

What goes into a language model? Data and compute.

For the foundation models like Claude, data means: Everything.

Compute combines two pursuits:

 1. software: the trellises and applications that support the development and deployment of these models, and

 2. hardware: the vast sultry data centers, stocked with chips, that give them room to run

There’s a lot of value in those pursuits; I don’t take either for granted, or the labor they require. The experience you get using a model like Claude depends on an ingenious scaffolding. [9]Truly! At the same time: I believe anyone who works on these models has to concede that the trellises and the chips, without data, are empty vessels. Inert.

Reasonable people can disagree about how the value breaks down. While I believe the relative value of Everything in this mix is something close to 90%, I’m willing to concede a 50/50 split.

And here is the important thing: there is no substitute.

You’ve probably heard about the race to generate novel training data, and all the interesting effects such data can have. It is sometimes lost in those discussions that these sophisticated new curricula can only be provided to a language model already trained on Everything. That training is what allows it to make sense of the new material.

Also, it is often the case — not always, but often — that the novel training data is generated by … a language model … which has itself been trained on … you guessed it.

It’s Everything, all the way down.

Would it be possible to commission a fresh body of work, Everything’s equal in scale and diversity, without any of the encumbrances of the commons? If you could do it, and you trained a clean-room model on that writing alone, I concede that my question would be moot. (There would be other questions! Just not this one.) Certainly, with as much money as the AI companies have now, you’d expect they might try. We know they are already paying to produce new content, lots of it, across all sorts of business and technical domains.

But this still wouldn’t match the depth and richness of Everything. I have a hypothesis, which naturally might be wrong: that it is precisely the naivete of Everything, the fact that its writing was actually produced for all those different reasons, that makes it so valuable. Composing a fake corporate email, knowing it will be used to train a language model, you’re not doing nothing, but you’re not doing the same thing as the real email-writer. Your document doesn’t have the same … what? The same grain. The same umami.

Maybe one of these companies will spend ten billion dollars to commission a whole new internet’s worth of text and prove me wrong. However, I think there are information-theoretic reasons to believe the results of such a project would disappoint them.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

So! Understanding that these models are reliant on Everything, and derive a large fraction of their value from it, one judgment becomes clear:

If their primary application is to produce writing and other media that crowds out human composition, human production: no, it’s not okay.

For me, this is intuitively, almost viscerally, obvious. Here is the ultimate act of pulling the ladder up behind you, a giant “fuck you” to every human who ever wanted to accomplish anything, who matched desire to action, in writing, part of Everything. Here is a technology founded in the commons, working to undermine it. Immanuel Kant would like a word.

Fine. But what if that isn’t the primary application? What if language models, by collating and precipitating all the diverse reasons for writing, become flexible general-purpose reasoners, and most of their “output” is never actually read by anyone, instead running silent like the electricity in your walls?

It’s possible that language models could go on broadening and deepening in this way, and eventually become valuable [10]aids to science and technology, [11]to medicine and more.

This is tricky — it’s so, so tricky — because the claim is both (1) true, and (2) convenient. One wishes it wasn’t so convenient. Can’t these companies simply promise, with every passing year, that AI super science is just around the corner … and meanwhile, wreck every creative industry, flood the internet with garbage, grow rich on the value of Everything? Let us cook—while culture fades into a sort of oatmeal sludge.

They can do that! They probably will. And the claim might still be true.

If super science is a possibility — if, say, Claude 13 can help deliver cures to a host of diseases — then, you know what? Yes, it is okay, all of it. I’m not sure what kind of person could insist that the maintenance of a media status quo trumps the eradication of, say, most cancers. Couldn’t be me. Fine, wreck the arts as we know them. We’ll invent new ones.

(I know that seems awfully consequentialist. Would I sacrifice anything, or everything, for super science? No. But art and media can find new forms. That’s what they do.)

Obviously, this scenario is especially appealing if the super science, like Everything at its foundation, flows out into the commons. It should.

So — is super science really on the menu? We don’t have any way of knowing; not yet. Things will be clearer in a few years, I think. There will either be real undeniable glimmers, reported by scientists putting language models to work, or there will still only be visions.

For my part, I think the chance of super science is below fifty percent, owing mostly to the friction of the real physical world, which the language models have, so far, avoided. But, I also think the chance is above ten percent, so, I remain curious.

It’s not unreasonable to find this wager suspicious, but if you do, I might ask: is there any possible-but-unproven technology that you think is worth pursuing even at the cost of itchy uncertainty in the present? If the answer is “yes, just not this one”: fair enough. If the answer is “no”: aha! I see you’ve answered the question at the top of this page for yourself already.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Where does this leave us?

I suppose it’s not surprising, in the end:

If an AI application delivers some profound public good, or even if it might, it’s probably okay that its value is rooted in this unprecedented operationalization of the commons.

If an AI application simply replicates Everything, it’s probably not okay.

I’ll sketch out my current opinions more specifically:

I think the image generation models, trained on the Everything of pictures, are: probably not okay. They don’t do anything except make more images. They pee in the pool.

I think the foundation models like Claude are: probably okay. If it seemed, a couple of years ago, that they were going to be used mainly to barf out text, that impression has faded. It’s clear their applications are diverse, and often have more to do with processes than end products.

The case of translation is compelling. If language models are, indeed, the Babel fish, they might justify the operationalization of the commons even without super science.

I think the case of code is especially clear, and, for me, basically settled. That’s both (1) because of where code sits in the creative process, as an intermediate product, the thing that makes the thing, and (2) because the commons of open-source code has carried the expectation of rich and surprising reuse for decades. I think this application has, in fact, already passed the threshold of “profound public good”: opening up programming to whole new groups of people.

But, again, it’s important to say: the code only works because of Everything. Take that data away, train a model using GitHub alone, and you’ll get a far less useful tool.

Maybe (it turns out) I’m less interested in litigating my foundational question and more interested in simply insisting on the overwhelming, irreplaceable contribution of this great central treasure: all of us, writing, for every conceivable reason; desire and action, impossible to hold in your head.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Did we make progress here? I think so. It’s possible my question, at the outset, seemed broad. In fact, it’s fairly narrow, about this core mechanism, the operationalization of the commons: whether I can live with it, or not.

One extreme: if these machines churn through all media, and then, in their deployment, blow away any prospect for a healthy market for human-made media, I’d say, no, that’s not what we want from technology, or from our future.

Another extreme: if these machines churn through all media, and then, in their deployment, discover several superconductors and cure all cancers, I’d say, okay … we’re good.

What if they do both? Well, it would be a bummer for media, but on balance I’d take it. There will always be ways for artists to get out ahead again. More on that in another post.

I also think there are some potential policy remedies that would even out the allocation of value here — although, these days, imagining interesting policy is a sort of fantastical entertainment. Even so, I’ll post about those later, too.

In this discussion, I set copyright and fair use aside. I should say, however, that I’m not at all interested in clearing the air for AI companies, legally. They’ve chosen to plunge ahead into new terrain — so let them enjoy the fog of war, Civ-style. Let them cook!

References:

[1] https://www.robinsloan.com/lab/
[2] https://www.robinsloan.com/about/
[3] https://www.robinsloan.com/moonbound/
[4] https://www.robinsloan.com/
[5] https://www.robinsloan.com/lab/
[6] https://www.robinsloan.com/about/
[7] https://www.robinsloan.com/lab/is-it-okay/
[8] https://www.clevelandart.org/art/1962.109?utm_source=Robin_Sloan_sent_me
[9] https://www.youtube.com/watch?v=ugvHCXCOmm4#t=9780
[10] https://research.google/blog/accelerating-scientific-breakthroughs-with-an-ai-co-scientist/?utm_source=Robin_Sloan_sent_me
[11] https://darioamodei.com/machines-of-loving-grace?utm_source=Robin_Sloan_sent_me
[12] https://www.robinsloan.com/lab/
[13] https://www.robinsloan.com/about?utm_source=Robin_Sloan_sent_me
[16] https://www.robinsloan.com/colophon/