1023 lines
49 KiB
Plaintext
1023 lines
49 KiB
Plaintext
[1]Out of the Software Crisis Bird flying logo [2]Newsletter [3]Book [4]AI Book
|
||
[5]Archive [6]Author
|
||
|
||
Modern software quality, or why I think using language models for programming
|
||
is a bad idea
|
||
|
||
By Baldur Bjarnason,
|
||
May 30th, 2023
|
||
|
||
This essay is based on a talk I gave at [7]Hakkavélin, a hackerspace in
|
||
Reykjavík. I had a wonderful time presenting to a lovely crowd, full of
|
||
inquisitive and critically-minded people. Their questions and the discussion
|
||
afterwards led to a number of improvements and clarifications as I turned my
|
||
notes into this letter. This resulted in a substantial expansion of this essay.
|
||
Many of the expanded points, such as the ones surrounding language model
|
||
security, come directly from these discussions.
|
||
|
||
Many thanks to all of those who attended. The references for the presentation
|
||
are also the references for this essay, which you can find all the way down in
|
||
the footnotes section.
|
||
|
||
The best way to support this newsletter or my blog is to buy one of my books,
|
||
[8]The Intelligence Illusion: a practical guide to the business risks of
|
||
Generative AI or [9]Out of the Software Crisis. Or, you can buy them both [10]
|
||
as a bundle.
|
||
|
||
The software industry is very bad at software
|
||
|
||
Here’s a true story. Names withheld to protect the innocent.
|
||
|
||
A chain of stores here in Iceland recently upgraded their point-of-sale
|
||
terminals to use new software.
|
||
|
||
Disaster, obviously, ensued. The barcode scanner stopped working properly,
|
||
leading customer to be either overcharged or undercharged. Everything was
|
||
extremely slow. The terminals started to lock up regularly. The new invoice
|
||
printer sucked. A process that had been working smoothly was now harder and
|
||
took more time.
|
||
|
||
The store, where my “informant” is a manager, deals with a lot of businesses,
|
||
many of them stores. When they explain to their customers why everything is
|
||
taking so long, their answer is generally the same:
|
||
|
||
“Ah, software upgrade. The same happened to us when we upgraded our terminals.”
|
||
|
||
This is the norm.
|
||
|
||
The new software is worse in every way than what it’s replacing. Despite having
|
||
a more cluttered UI, it seems to have omitted a bunch of important features.
|
||
Despite being new and “optimised”, it’s considerably slower than what it’s
|
||
replacing.
|
||
|
||
This is also the norm.
|
||
|
||
Switching costs are, more often than not, massive for business software, and
|
||
purchases are not decided by anybody who actually uses it. The quality of the
|
||
software disconnects from sales performance very quickly in a growing software
|
||
company. The company ends up “owning” the customer and no longer has any
|
||
incentive to improve the software. In fact, because adding features is a key
|
||
marketing and sales tactic, the software development cycle becomes an act of
|
||
intentional, controlled deterioration.
|
||
|
||
Enormous engineering resources go into finding new ways to minimise the
|
||
deterioration—witness Microsoft’s “ribbon menu”, a widget invented entirely to
|
||
manage the feature escalation mandated by marketing.
|
||
|
||
This is the norm.
|
||
|
||
This has always been the norm, from the early days of software.
|
||
|
||
The software industry is bad at software. Great at shipping features and
|
||
selling software. Bad at the software itself.
|
||
|
||
Why I started researching “AI” for programming
|
||
|
||
In most sectors of the software industry, sales performance and product quality
|
||
are disconnected.
|
||
|
||
By its nature software has enormous margins which further cushion it from the
|
||
effect of delivering bad products.
|
||
|
||
The objective impact of poor software quality on the bottom lines of companies
|
||
like Microsoft, Google, Apple, Facebook, or the retail side of Amazon is a
|
||
rounding error. The rest only need to deliver usable early versions, but once
|
||
you have an established customer base and an experienced sales team, you can
|
||
coast for a long, long time without improving your product in any meaningful
|
||
way.
|
||
|
||
You only need to show change. Improvements don’t sell, it’s freshness that
|
||
moves product. It’s like store tomatoes. Needs to look good and be fresh.
|
||
They’re only going to taste it after they’ve paid, so who cares about the
|
||
actual quality.
|
||
|
||
Uptime reliability is the only quality measurement with a real impact on ad
|
||
revenue or the success of enterprise contracts, so that’s the only quality
|
||
measurement that ultimately matters to them.
|
||
|
||
Bugs, shoddy UX, poor accessibility—even when accessibility is required by
|
||
law—are non-factors in modern software management, especially at larger
|
||
software companies.
|
||
|
||
The rest of us in the industry then copy their practices, and we mostly get
|
||
away with it. Our margins may not be as enormous as Google’s, but they are
|
||
still quite good compared to non-software industries.
|
||
|
||
We have an industry that’s largely disconnected from the consequences of making
|
||
bad products, which means that we have a lot of successful but bad products.
|
||
|
||
The software crisis
|
||
|
||
Research bears this out. I pointed out in my 2021 essay [11]Software Crisis 2.0
|
||
that very few non-trivial software projects are successful, even when your
|
||
benchmarks are fundamentally conservative and short term.
|
||
|
||
For example, the following table is from [12]a 2015 report by the Standish
|
||
Group on their long term study in software project success:
|
||
|
||
SUCCESSFUL CHALLENGED FAILED TOTAL
|
||
Grand 6% 51% 43% 100%
|
||
Large 11% 59% 30% 100%
|
||
Medium 12% 62% 26% 100%
|
||
Moderate 24% 64% 12% 100%
|
||
Small 61% 32% 7% 100%
|
||
|
||
The Chaos Report 2015 resolution by project size
|
||
|
||
This is based on data that’s collected and anonymised from a number of
|
||
organisations in a variety of industries. You’ll note that very few projects
|
||
outright succeed. Most of them go over budget or don’t deliver the
|
||
functionality they were supposed to. A frightening number of large projects
|
||
outright fail to ship anything usable.
|
||
|
||
In my book [13]Out of the Software Crisis, I expanded on this by pointing out
|
||
that there are many classes and types of bugs and defects that we don’t measure
|
||
at all, many of them catastrophic, which means that these estimates are
|
||
conservative. Software project failure is substantially higher than commonly
|
||
estimated, and success if much rarer than the numbers would indicate.
|
||
|
||
The true percentage of large software projects that are genuinely successful in
|
||
the long term—that don’t have any catastrophic bugs, don’t suffer from UX
|
||
deterioration, don’t end up having core issues that degrade their business
|
||
value—is probably closer to 1–3%.
|
||
|
||
The management crisis
|
||
|
||
We also have a management crisis.
|
||
|
||
The methods of top-down-control taught to managers are counterproductive for
|
||
software development.
|
||
|
||
• Managers think design is about decoration when it’s the key to making
|
||
software that generates value.
|
||
• Trying to prevent projects that are likely to fail is harmful for your
|
||
career, even if the potential failure is wide-ranging and potentially
|
||
catastrophic.
|
||
• When projects fail, it’s the critics who tried to prevent disaster who are
|
||
blamed, not the people who ran it into the ground.
|
||
• Supporting a project that is guaranteed to fail is likely to benefit your
|
||
career, establish you as a “team player”, and protects you from harmful
|
||
consequences when the project crashes.
|
||
• Teams and staff management in the software industry commonly ignores every
|
||
innovation and discovery in organisational psychology, management, and
|
||
systems-thinking since the early sixties and operate mostly on management
|
||
ideas that Henry Ford considered outdated in the 1920s.
|
||
|
||
We are a mismanaged industry that habitually fails to deliver usable software
|
||
that actually solves the problems it’s supposed to.
|
||
|
||
Thus, [14]Weinberg’s Law:
|
||
|
||
If builders built buildings the way programmers wrote programs, then the
|
||
first woodpecker that came along would destroy civilization.
|
||
|
||
It’s into this environment that “AI” software development tools appear.
|
||
|
||
The punditry presented it as a revolutionary improvement in how we make
|
||
software. It’s supposed to fix everything.
|
||
|
||
—This time the silver bullet will work!
|
||
|
||
Because, of course, we have had such a great track record with [15]silver
|
||
bullets.
|
||
|
||
So, I had to dive into it, research it, and figure out how it really worked. I
|
||
needed to understand how generative AI works, as a system. I haven’t researched
|
||
any single topic to this degree since I finished my PhD in 2006.
|
||
|
||
This research led me to write my book [16]The Intelligence Illusion: a
|
||
practical guide to the business risks of Generative AI. In it, I take a broader
|
||
view and go over the risks I discovered that come with business use of
|
||
generative AI.
|
||
|
||
But, ultimately, all that work was to answer the one question that I was
|
||
ultimately interested in:
|
||
|
||
Is generative AI good or bad for software development?
|
||
|
||
To even have a hope of answering this, we first need to define our terms,
|
||
because the conclusion is likely to vary a lot depending on how you define “AI”
|
||
or even "software development.
|
||
|
||
A theory of software development as an inclusive system
|
||
|
||
Software development is the entire system of creating, delivering, and using a
|
||
software project, from idea to end-user.
|
||
|
||
That includes the entire process on the development side—the idea, planning,
|
||
management, design, collaboration, programming, testing, prototyping—as well as
|
||
the value created by the system when it has been shipped and is being used.
|
||
|
||
My model is that of [17]theory-building. From my essay on theory-building,
|
||
which itself is an excerpt from [18]Out of the Software Crisis:
|
||
|
||
Beyond that, software is a theory. It’s a theory about a particular
|
||
solution to a problem. Like the proverbial garden, it is composed of a
|
||
microscopic ecosystem of artefacts, each of whom has to be treated like a
|
||
living thing. The gardener develops a sense of how the parts connect and
|
||
affect each other, what makes them thrive, what kills them off, and how you
|
||
prompt them to grow. The software project and its programmers are an
|
||
indivisible and organic entity that our industry treats like a toy model
|
||
made of easily replaceable lego blocks. They believe a software project and
|
||
its developers can be broken apart and reassembled without dying.
|
||
|
||
What keeps the software alive are the programmers who have an accurate
|
||
mental model (theory) of how it is built and works. That mental model can
|
||
only be learned by having worked on the project while it grew or by working
|
||
alongside somebody who did, who can help you absorb the theory. Replace
|
||
enough of the programmers, and their mental models become disconnected from
|
||
the reality of the code, and the code dies. That dead code can only be
|
||
replaced by new code that has been ‘grown’ by the current programmers.
|
||
|
||
Design and user research is an integral part of the mental model the programmer
|
||
needs to build, because none of the software components ultimately make sense
|
||
without the end-user.
|
||
|
||
But, design is also vital because it is, to reuse Donald G. Reinertsen’s
|
||
definition from Managing the Design Factory (p. 11), design is economically
|
||
useful information that generally only becomes useful information through
|
||
validation of some sort. Otherwise it’s just a guess.
|
||
|
||
The economic part usually comes from the end-user in some way.
|
||
|
||
This systemic view is inclusive by design as you can’t accurately measure the
|
||
productivity or quality of a software project unless you look at it end to end,
|
||
from idea to end-user.
|
||
|
||
• If it doesn’t work for the end-user, then it’s a failure.
|
||
• If the management is dysfunctional, then the entire system is
|
||
dysfunctional.
|
||
• If you keep starting projects based on unworkable ideas, then your
|
||
programmer productivity doesn’t matter.
|
||
|
||
Lines of code isn’t software development. Working software, productively used,
|
||
understood by the developers, is software development.
|
||
|
||
A high-level crash course in language models
|
||
|
||
Language models, small or large, are today either used as autocomplete copilots
|
||
or as chatbots. Some of these language model tools would be used by the
|
||
developer, some by the manager or other staff.
|
||
|
||
I’m treating generative media and image models as a separate topic, even when
|
||
they’re used by people in the software industry to generate icons, graphics, or
|
||
even UIs. They matter as well, but don’t have the same direct impact on
|
||
software quality.
|
||
|
||
To understand the role these systems could play in software development, we
|
||
need a little bit more detail on what language models are, how they are made,
|
||
and how they work.
|
||
|
||
Most modern machine learning models are layered networks of parameters, each
|
||
representing its connection to its neighbouring parameters. In a modern
|
||
transformer-based language model most of these parameters are floating point
|
||
numbers—weights—that describe the connection. Positive numbers are an
|
||
excitatory connection. Negative numbers are inhibitory.
|
||
|
||
These models are built by feeding data through a tokeniser that breaks text
|
||
into tokens—often one word per token—that are ultimately fed into an algorithm.
|
||
That algorithm constructs the network, node by node, layer by layer, based on
|
||
the relationships it calculates between the tokens/words. This is done in
|
||
several runs and, usually, the developer of the model will evaluate after each
|
||
run that the model is progressing in the right direction, with some doing more
|
||
thorough evaluation at specific checkpoints.
|
||
|
||
The network is, in a very fundamental way, a mathematical derivation of the
|
||
language in the data.
|
||
|
||
A language model is constructed from the data. The transformer code regulates
|
||
and guides the process, but the distributions within the data set are what
|
||
defines the network.
|
||
|
||
This process takes time—both collecting and managing the data set and the build
|
||
process itself—which inevitably introduces a cut-off point for the data set.
|
||
For OpenAI and Anthropic, that cut-off point is in 2021. For Google’s PaLM2
|
||
it’s early 2023.
|
||
|
||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||
|
||
Aside: not a brain
|
||
|
||
This is very, very different from how a biological neural network interacts
|
||
with data. A biological brain is modified by input and data—its environment—but
|
||
its construction is derived from nutrition, its chemical environment, and
|
||
genetics.
|
||
|
||
The data set, conversely, is a deep and fundamental part of the language model.
|
||
The algorithm’s code provides the process while the weights themselves are
|
||
derived from the data, and the model itself is dead and static during input and
|
||
output.
|
||
|
||
The construction process of a neural network is called “training”, which is yet
|
||
another incredibly inaccurate term used by the industry.
|
||
|
||
• A pregnant mother isn’t “training” the fetus.
|
||
• A language model isn’t “trained” from the data, but constructed.
|
||
|
||
This is nonsense.
|
||
|
||
But this is the term that the AI industry uses, so we’re stuck with it.
|
||
|
||
A language model is a mathematical model built as a derivation of its training
|
||
data. There is no actual training, only construction.
|
||
|
||
This is also why it’s inaccurate to say that these systems are inspired by
|
||
their training data. Even though genes and nutrition make an artist’s mind they
|
||
are not in what any reasonable person would call “their inspiration”. Even when
|
||
they are sought out for study and genuine inspiration, it’s our representations
|
||
of our understanding of the genes that are the true source of inspiration.
|
||
Nobody sticks their hand in a gelatinous puddle of DNA and spontaneously gets
|
||
inspired by the data it encodes.
|
||
|
||
Training data are construction materials for a language models. A language
|
||
model can never be inspired. It is itself a cultural artefact derived from
|
||
other cultural artefacts.
|
||
|
||
The machine learning process is loosely based on decades-old grossly simplified
|
||
models of how brains work.
|
||
|
||
A biological neuron is a complex system in its own right—one of the more
|
||
complex cells in an animal’s body. In a living brain, a biological neuron will
|
||
use electricity, multiple different classes of neurotransmitters, and timing to
|
||
accomplish its function in ways that we still don’t fully understand. It even
|
||
has its own [19]built-in engine for chemical energy.
|
||
|
||
The brain as a whole is composed of not just a massive neural network, but also
|
||
layers of hormonal chemical networks that dynamically modify its function, both
|
||
granularly and as a whole.
|
||
|
||
The digital neuron—a single signed floating point number—is to a biological
|
||
neuron what a flat-head screwdriver is to a Tesla.
|
||
|
||
They both contain metal and that’s about the extent of their similarity.
|
||
|
||
The human brain contains roughly 100 billion neuron cells, a layered chemical
|
||
network, and a cerebrovascular system that all integrate as a whole to create a
|
||
functioning, self-aware system capable of general reasoning and autonomous
|
||
behaviour. This system is multiple orders of magnitude more complex than even
|
||
the largest language model to date, both in terms of individual neuron
|
||
structure, and taken as a whole.
|
||
|
||
It’s important to remember this so that we don’t fall for marketing claims that
|
||
constantly imply that these tools are fully functioning assistants.
|
||
|
||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||
|
||
The prompt
|
||
|
||
After all of this, we have a data set which can be used to generate text in
|
||
response to prompts.
|
||
|
||
Prompts such as:
|
||
|
||
Who was the first man on the moon?
|
||
|
||
The input phrase, or prompt, has no structure beyond the linguistic. It’s just
|
||
a blob of text. You can’t give the model commands or parameters separately from
|
||
other input. Because of this, if your model lets a third party enter text, an
|
||
attacker will always be able to bypass whatever restrictions you put on it.
|
||
Control prompts or prefixes will be discovered and countermanded. Delimiters
|
||
don’t work. Fine-tuning the model only limits the harm, but doesn’t prevent it.
|
||
|
||
This is called a prompt injection and what it means is that model input can’t
|
||
be secured. You have to assume that anybody that can send text to the model has
|
||
full access to it.
|
||
|
||
Language models need to be treated like an unsecured client and only very
|
||
carefully integrated into other systems.
|
||
|
||
The response
|
||
|
||
What you’re likely to get back from that prompt would be something like:
|
||
|
||
On July 20, 1969, Neil Armstrong became the first human to step on the
|
||
moon.
|
||
|
||
This is NASA’s own phrasing. Most answers on the web are likely to be
|
||
variations on this, so the answer from a language model is likely to be so too.
|
||
|
||
• The moon landing happens to be a fact, but the language model only knows it
|
||
as a text.
|
||
|
||
The prompt we provided is strongly associated in the training data set with
|
||
other sentences that are all variations of NASA’s phrasing of the answer. The
|
||
model won’t answer with just “Neil Armstrong” because it isn’t actually
|
||
answering the question, it’s responding with the text that correlates with the
|
||
question. It doesn’t “know” anything.
|
||
|
||
• The language model is fabricating a mathematically plausible response,
|
||
based on word distributions in the training data.
|
||
• There are no facts in a language model or its output. Only memorised text.
|
||
|
||
It only fabricates. It’s all “hallucinations” all the way down.
|
||
|
||
Occasionally those fabrications correlate with facts, but that is a
|
||
mathematical quirk resulting from the fact that, on average, what people write
|
||
roughly correlates with their understanding of a factual reality, which in turn
|
||
roughly correlates with a factual reality.
|
||
|
||
A knowledge system?
|
||
|
||
To be able to answer that question and pass as a knowledge system, the model
|
||
needs to memorise the answer, or at least parts of the phrase.
|
||
|
||
Because “AI” vendors are performing a sleight-of-hand here and presenting
|
||
statistical language synthesis engines as knowledge retrieval systems, their
|
||
focus in training and testing is on “facts” and minimising “falsehoods”. The
|
||
model has no notion of either, as it’s entirely a language model, so the only
|
||
way to square this circle is for the model to memorise it all.
|
||
|
||
• To be able to answer a question factually, not “hallucinate”, and pass as a
|
||
knowledge system, the model needs to memorise the answer.
|
||
• The model doesn’t know facts, only text.
|
||
• If you want a fact from it, the model will need to memorise text that
|
||
correlates with that fact.
|
||
|
||
“Dr. AI”?
|
||
|
||
Vendors then compound this by using human exams as benchmarks for reasoning
|
||
performance. The problem is that bar exams, medical exams, and diagnosis tests
|
||
are specifically designed to mostly test rote memorisation. That’s what they’re
|
||
for.
|
||
|
||
The human brain is bad at rote memorisation and generally it only happens with
|
||
intensive work and practice. If you want to design a test that’s specifically
|
||
intended to verify that somebody has spent a large amount of time studying a
|
||
subject, you test for rote memorisation.
|
||
|
||
Many other benchmarks they use, such as those related to programming languages
|
||
also require memorisation, otherwise the systems would just constantly make up
|
||
APIs.
|
||
|
||
• Vendors use human exams as benchmarks.
|
||
• These are specifically designed to test rote memorisation, because that’s
|
||
hard for humans.
|
||
• Programming benchmarks also require memorisation. Otherwise, you’d only get
|
||
pseudocode.
|
||
|
||
Between the tailoring of these systems for knowledge retrieval, and the use of
|
||
rote memorisation exams and code generation as benchmarks, the tech industry
|
||
has created systems where memorisation is a core part of how they function. In
|
||
all research to date, memorisation has been key to language model performance
|
||
in a range of benchmarks.^[20][1]
|
||
|
||
If you’re familiar with storytelling devices, this here would be a [21]
|
||
Chekhov’s gun. Observe! The gun is above the mantelpiece:
|
||
|
||
👉🏻👉🏻 memorisation!
|
||
|
||
Make a note of it, because those finger guns are going to be fired later.
|
||
|
||
Biases
|
||
|
||
Beyond question and answer, these systems are great at generating the averagely
|
||
plausible text for a given prompt. In prose, current system output smells
|
||
vaguely of sweaty-but-quiet LinkedIn desperation and over-enthusiastic social
|
||
media. The general style will vary, but it’s always going to be the most
|
||
plausible style and response based on the training data.
|
||
|
||
One consequence of how these systems are made is that they are constantly
|
||
backwards-facing. Where brains are focused on the present, often to their
|
||
detriment, “AI” models are built using historical data.
|
||
|
||
The training data encompasses thousands of diverse voices, styles, structures,
|
||
and tones, but some word distributions will be more common in the set than
|
||
others and those will end up dominating the output. As a result, language
|
||
models tend to lean towards the “racist grandpa who has learned to speak fluent
|
||
LinkedIn” end of the spectrum.^[22][2]
|
||
|
||
This has implications for a whole host of use cases:
|
||
|
||
• Generated text is going to skew conservative in content and marketing copy
|
||
in structure and vocabulary. (Bigoted, prejudiced, but polite and
|
||
inoffensively phrased.)
|
||
• Even when the cut-off date for the data set is recent, it’s still going to
|
||
skew historical because what’s new is also comparatively smaller than the
|
||
old.
|
||
• Language models will always skew towards the more common, middling,
|
||
mediocre, and predictable.
|
||
• Because most of these models are trained on the web, much of which is
|
||
unhinged, violent, pornographic, and abusive, some of that language will be
|
||
represented in the output.
|
||
|
||
Modify, summarise, and “reason”
|
||
|
||
The superpower that these systems provide is conversion or modification. They
|
||
can, generally, take text and convert it to another style or structure. Take
|
||
this note and turn it into a formal prose, and it will! That’s amazing. I don’t
|
||
think that’s a trillion-dollar industry, but it’s a neat feature that will
|
||
definitely be useful.
|
||
|
||
They can summarise text too, but that’s much less reliable than you’d expect.
|
||
It unsurprisingly works best with text that already provides its own summary,
|
||
such as a newspaper article (first paragraphs always summarise the story),
|
||
academic paper (the abstract), or corporate writing (executive summary).
|
||
Anything that’s a mix of styles, voices, or has an unusual structure won’t work
|
||
as well.
|
||
|
||
What little reasoning they do is entirely based on finding through correlation
|
||
and re-enacting prior textual descriptions of reasoning. They fail utterly when
|
||
confronted with adversarial or novel examples. They also fail if you rephrase
|
||
the question so that it no longer correlates with the phrasing in the data set.
|
||
^[23][3]
|
||
|
||
So, not actual reasoning. “Reasoning”, if you will. In other “AI” model genres
|
||
these correlations are often called “shortcuts”, which feels apt.
|
||
|
||
To summarise:
|
||
|
||
• Language models are a mathematical expression of the training data set.
|
||
• Have very little in common with human brains.
|
||
• Rely on inputs that can’t be secured.
|
||
• Lie. Everything they output is a fabrication.
|
||
• Memorise heavily.
|
||
• Great for modifying text. No sarcasm. Genuinely good at this.
|
||
• Occasionally useful for summarisation if you don’t mind being lied to
|
||
regularly.
|
||
• Don’t actually reason.
|
||
|
||
Why I believe “AI” for programming is a bad idea
|
||
|
||
If you recall from the start of this essay, I began my research into machine
|
||
learning and language models because I was curious to see if they could help
|
||
fix or improve the mess that is modern software development.
|
||
|
||
There was reason to be hopeful. Programming languages are more uniform and
|
||
structured than prose, so it’s not too unreasonable to expect that they might
|
||
lend themselves to language models. Programming language output can often be
|
||
tested directly, which might help with the evaluation of each training run.
|
||
|
||
Training a language model on code also seems to benefit the model. Models that
|
||
include substantial code in their data set tend to be better at correlative
|
||
“reasoning” (to a point, still not actual reasoning), which makes sense since
|
||
code is all about representing structured logic in text.
|
||
|
||
But, there is an inherent [24]Catch 22 to any attempt at fixing software
|
||
industry dysfunction with more software. The structure of the industry depends
|
||
entirely on variables that everybody pretends are proxies for end user value,
|
||
but generally aren’t. This will always tend to sabotage our efforts at
|
||
industrial self-improvement.
|
||
|
||
The more I studied language models as a technology the more flaws I found until
|
||
it became clear to me that odds are that the overall effect on software
|
||
development will be harmful. The problem starts with the models themselves.
|
||
|
||
1. Language models can’t be secured
|
||
|
||
This first issue has less to do with the use of language models for software
|
||
development and more to do with their use in software products, which is likely
|
||
to be a priority for many software companies over the next few years.
|
||
|
||
Prompt injections are not a solved problem. OpenAI has come up with a few
|
||
“solutions” in the past, but none of them actually worked. Everybody expects
|
||
this to be fixed, but nobody has a clue how.
|
||
|
||
Language models are fundamentally based on the idea that you give it text as
|
||
input and get text as output. It’s entirely possible that the only way to
|
||
completely fix this is to invent a completely new kind of language model and
|
||
spend a few years training it from scratch.
|
||
|
||
• A language model needs to be treated like an unsecured client. It’s about
|
||
as secure as a web page form. It’s vulnerable to a new generation of
|
||
injection vulnerabilities, both direct and indirect, that we still don’t
|
||
quite understand.^[25][4]
|
||
|
||
The training data set itself is also a security hazard. I’ve gone into this in
|
||
more detail elsewhere^[26][5], but the short version is that training data set
|
||
is vulnerable to keyword manipulation, both in terms of altering sentiment and
|
||
censorship.
|
||
|
||
Again, fully defending against this kind of attack would seem to require
|
||
inventing a completely new kind of language model.
|
||
|
||
Neither of these issues affect the use of language models for software
|
||
development, but it does affect our work because we’re the ones who will be
|
||
expected to integrate these systems into existing websites and products.
|
||
|
||
2. It encourages the worst of our management and development practices
|
||
|
||
A language model will never question, push back, doubt, hesitate, or waver.
|
||
|
||
Your managers are going to use it to flesh out and describe unworkable ideas,
|
||
and it won’t complain. The resulting spec won’t have any bearing with reality.
|
||
|
||
People on your team will do “user research” by asking a language model, which
|
||
it will do even though the resulting research will be fiction and entirely
|
||
useless.
|
||
|
||
It’ll let you implement the worst ideas ever in your code without protest. Ask
|
||
a copilot “how can I roll my own cryptography?” and it’ll regurgitate a
|
||
half-baked expression of sha1 in PHP for you.
|
||
|
||
Think of all the times you’ve had an idea for an approach, looked up how to do
|
||
it on the web, and found out that, no, this was a really bad idea? I have a
|
||
couple of those every week when I’m in the middle of a project.
|
||
|
||
Language models don’t deliver productivity improvements. They increase the
|
||
volume, unchecked by reason.
|
||
|
||
A core aspect of the theory-building model of software development is code that
|
||
developers don’t understand is a liability. It means your mental model of the
|
||
software is inaccurate which will lead you to create bugs as you modify it or
|
||
add other components that interact with pieces you don’t understand.
|
||
|
||
Language model tools for software development are specifically designed to
|
||
create large volumes of code that the programmer doesn’t understand. They are
|
||
liability engines for all but the most experienced developer. You can’t solve
|
||
this problem by having the “AI” understand the codebase and how its various
|
||
components interact with each other because a language model isn’t a mind. It
|
||
can’t have a mental model of anything. It only works through correlation.
|
||
|
||
These tools will indeed make you go faster, but it’s going to be accelerating
|
||
in the wrong direction. That is objectively worse than just standing still.
|
||
|
||
3. Its User Interfaces do not work, and we haven’t found interfaces that do
|
||
work
|
||
|
||
Human factors studies, the field responsible for designing cockpits and the
|
||
like, discovered that humans suffer from an automation bias.
|
||
|
||
What it means is that when you have cognitive automation—something that helps
|
||
you think less—you inevitably think less. That means that you are less critical
|
||
of the output than if you were doing it yourself. That’s potentially
|
||
catastrophic when the output is code, especially since the quality of the
|
||
generated code is, understandably considering how the system works, broadly on
|
||
the level of a novice developer.^[27][6]
|
||
|
||
Copilots and chatbots—exacerbated by anthropomorphism—seem to trigger our
|
||
automation biases.
|
||
|
||
Microsoft themselves have said that 40% of GitHub Copilot’s output is committed
|
||
unchanged.^[28][7]
|
||
|
||
Let’s not get into the question of how we, as an industry, put ourselves in the
|
||
position where Microsoft can follow a line of code from their language model,
|
||
through your text editor, and into your supposedly decentralised version
|
||
control system.
|
||
|
||
People overwhelmingly seem to trust the output of a language model.
|
||
|
||
If it runs without errors, it must be fine.
|
||
|
||
But that’s never the case. We all know this. We’ve all seen running code turn
|
||
out to be buggy as hell. But something in our mind switches off when we use
|
||
tools for cognitive automation.
|
||
|
||
4. It’s biased towards the stale and popular
|
||
|
||
The biases inherent in these language models are bad enough when it comes to
|
||
prose, but they become a functional problem in code.
|
||
|
||
• Its JS code will lean towards React and node, most of it several versions
|
||
old, and away from the less popular corners of the JS ecosystem.
|
||
• The code is, inevitably, more likely to be built around CommonJS modules
|
||
instead of the modern ESM modules.
|
||
• It won’t know much about Deno or Cloudflare Workers.
|
||
• It’ll always prefer older APIs over new. Most of these models won’t know
|
||
about any API or module released after 2021. This is going to be an issue
|
||
for languages such as Swift.
|
||
• New platforms and languages don’t exist to it.
|
||
• Existing data will outweigh deprecations and security issues.
|
||
• Popular but obsolete or outdated open source projects will always win out
|
||
over the up-to-date equivalent.
|
||
|
||
These systems live in the popular past, like the middle-aged man who doesn’t
|
||
realise he isn’t the popular kid at school any more. Everything he thinks is
|
||
cool is actually very much not cool. More the other thing.
|
||
|
||
This is an issue for software because our industry is entirely structured
|
||
around constant change. Software security hinges on it. All of our practices
|
||
are based on constant march towards the new and fancy. We go from framework to
|
||
framework to try and find the magic solution that will solve everything. In
|
||
some cases language models might help push back against that, but it’ll also
|
||
push back against all the very many changes that are necessary because the old
|
||
stuff turned out to be broken.
|
||
|
||
• The software industry is built on change.
|
||
• Language models are built on a static past.
|
||
|
||
5. No matter how the lawsuits go, this threatens the existence of free and open
|
||
source software
|
||
|
||
Many AI vendors are mired in lawsuits.^[29][8]
|
||
|
||
These lawsuits all concentrate on the relationship between the training data
|
||
set and the model and they do so from a variety of angles. Some are based on
|
||
contract and licensing law. Others are claiming that the models violate fair
|
||
use. It’s hard to predict how they will go. They might not all go the same way,
|
||
as laws will vary across industries and jurisdictions.
|
||
|
||
No matter the result, we’re likely to be facing a major decline in the free and
|
||
open source ecosystem.
|
||
|
||
1. All of these models are trained on open source code without payment or even
|
||
acknowledgement, which is a major disincentive for contributors and
|
||
maintainers. That large corporations might benefit from your code is a
|
||
fixture of open source, but they do occasionally give back to the
|
||
community.
|
||
2. Language models—built on open source code—commonly replace that code.
|
||
Instead of importing a module to do a thing, you prompt your Copilot. The
|
||
code generated is almost certainly based on the open source module, at
|
||
least partially, but it has been laundered through the language model,
|
||
disconnecting the programmer from the community, recognition, and what
|
||
little reward there was.
|
||
|
||
Language models demotivate maintainers and drain away both resources and users.
|
||
What you’re likely to be left with are those who are building core
|
||
infrastructure or end-user software out of principle. The “free software” side
|
||
of the community is more likely to survive than the rest. The Linux kernel,
|
||
Gnome, KDE—that sort of thing.
|
||
|
||
The “open source” ecosystem, especially that surrounding the web and node, is
|
||
likely to be hit the hardest. The more driven the open source project was by
|
||
its proximity to either an employed contributor or actively dependent business,
|
||
the bigger the impact from a shift to language models will be.
|
||
|
||
This is a serious problem for the software industry as arguably much of the
|
||
economic value the industry has provided over the past decade comes from
|
||
strip-mining open source and free software.
|
||
|
||
6. Licence contamination
|
||
|
||
Microsoft and Google don’t train their language models on their own code.
|
||
GitHub’s Copilot isn’t trained on code from Microsoft’s office suite, even
|
||
though many of its products are likely to be some of the largest React Native
|
||
projects in existence. There aren’t many C++ code bases as big as Windows.
|
||
Google’s repository is probably one of the biggest collection of python and
|
||
java code you can find.
|
||
|
||
They don’t seem to use it for training, but instead train on collections of
|
||
open source code that contain both permissive and copyleft licences.
|
||
|
||
Copyleft licences, if used, force you to release your own project under their
|
||
licence. Many of them, even non-copyleft, have patent clauses, which is poison
|
||
for quite a few employers. Even permissive licences require attribution, and
|
||
you can absolutely get sued if you’re caught copying open source code without
|
||
attribution.
|
||
|
||
Remember our Chekhov’s gun?
|
||
|
||
👉🏻👉🏻 memorisation!
|
||
|
||
Well, 👉🏻👉🏻 pewpew!!!
|
||
|
||
Turns out blindly copying open source code is problematic. Whodathunkit?
|
||
|
||
These models all memorise a lot, and they tend to copy what they memorise into
|
||
their output. [30]GitHub’s own numbers peg verbatim copies of code that’s at
|
||
least 150 characters at 1%^[31][9], which is roughly the same, in terms of
|
||
verbatim copying, as what you seem to get in other language models.
|
||
|
||
For context, that means that if you use a language model for development, a
|
||
copilot or chatbot, three or four times a day, you’re going to get a verbatim
|
||
copy of open source code injected into your project about once a month. If
|
||
every team member uses one, then multiply that by the size of the team.
|
||
|
||
GitHub’s Copilot has a feature that lets you block verbatim copies. This
|
||
obviously requires both a check, which slows the result down, and it will throw
|
||
out a bunch of useful results, making the language model less useful. It’s
|
||
already not as useful as it’s made out to be and pretty darn slow so many
|
||
people are going to turn off the “please don’t plagiarise” checkbox.
|
||
|
||
But even GitHub’s checks are insufficient. The keyword there is “verbatim”,
|
||
because language models have a tendency to rephrase their output. If GitHub
|
||
Copilot copies a GPLed implementation of an algorithm into your project but
|
||
changes all the variable names, Copilot won’t detect it, it’ll still be
|
||
plagiarism and the copied code is still under the GPL. This isn’t unlikely as
|
||
this is how language models work. Memorisation and then copying with light
|
||
rephrasing is what they do.
|
||
|
||
Training the system only on permissively licensed code doesn’t solve the
|
||
problem. It won’t force your project to adopt an MIT licence or anything like
|
||
that, but you can still be sued if it’s discovered.
|
||
|
||
This would seem to give Microsoft and GitHub a good reason not to train on the
|
||
Office code base, for example. If they did, there’s a good chance that a prompt
|
||
to generate DOCX parsing code might “generate” a verbatim copy of the DOCX
|
||
parsing code from Microsoft Word.
|
||
|
||
And they can’t have that, can they? This would both undercut their own
|
||
strategic advantage, and it would break the illusion that these systems are
|
||
generating novel code from scratch.
|
||
|
||
This should make it clear that what they’re actually doing is strip-mine the
|
||
free and open source software ecosystem.
|
||
|
||
How much of a problem is this?
|
||
|
||
—It won’t matter. I won’t get caught.
|
||
|
||
You personally won’t get caught, but your employer might, and Intellectual
|
||
Property scans or similar code audits tend to come up at the absolute worst
|
||
moments in the history of any given organisation:
|
||
|
||
• During due diligence for an acquisition. Could cost the company and
|
||
managers a fortune.
|
||
• In discovery for an unrelated lawsuit. Again, could cost the company a
|
||
fortune.
|
||
• During hacks and other security incidents. Could. Cost. A. Fortune.
|
||
|
||
“AI” vendors won’t take any responsibility for this risk. I doubt your business
|
||
insurance covers “automated language model plagiarism” lawsuits.
|
||
|
||
Language models for software development are a lawsuit waiting to happen.
|
||
|
||
Unless they are completely reinvented from scratch, language model code
|
||
generators are, in my opinion, unsuitable for anything except for prototypes
|
||
and throwaway projects.
|
||
|
||
So, obviously, everybody’s going to use them
|
||
|
||
• All the potentially bad stuff happens later. Unlikely to affect your
|
||
bonuses or employment.
|
||
• It’ll be years before the first licence contamination lawsuits happen.
|
||
• Most employees will be long gone before anybody realises just how much of a
|
||
bad idea it was.
|
||
• But you’ll still get that nice “AI” bump in the stock market.
|
||
|
||
What all of these problems have in common is that their impact is delayed and
|
||
most of them will only appear in the form of increased frequency of bugs and
|
||
other defects and general project chaos.
|
||
|
||
The biggest issue, licence contamination, will likely take years before it
|
||
starts to hit the industry, and is likely to be mitigated by virtue of the fact
|
||
that many of the heaviest users of “AI”-generated code will have folded due to
|
||
general mismanagement long before anybody cares enough to check their code.
|
||
|
||
If you were ever wondering if we, as an industry, were capable of coming up
|
||
with a systemic issue to rival the Y2K bug in scale and stupidity? Well, here
|
||
you go.
|
||
|
||
You can start using a language model, get the stock market bump, present the
|
||
short term increase in volume as productivity, and be long gone before anybody
|
||
connects the dots between language model use and the jump in defects.
|
||
|
||
Even if you purposefully tried to come up with a technology that played
|
||
directly into and magnified the software industry’s dysfunctions you wouldn’t
|
||
be able to come up with anything as perfectly imperfect as these language
|
||
models.
|
||
|
||
It’s nonsense without consequence.
|
||
|
||
Counterproductive novelty that you can indulge in without harming your career.
|
||
|
||
It might even do your career some good. Show that you’re embracing the future.
|
||
|
||
But…
|
||
|
||
The best is yet to come
|
||
|
||
In a few years’ time, once the effects of the “AI” bubble finally dissipates…
|
||
|
||
Somebody’s going to get paid to fix the crap it left behind.
|
||
|
||
The best way to support this newsletter or my blog is to buy one of my books,
|
||
[32]The Intelligence Illusion: a practical guide to the business risks of
|
||
Generative AI or [33]Out of the Software Crisis. Or, you can buy them both [34]
|
||
as a bundle.
|
||
|
||
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
|
||
|
||
1. There’s quite a bit of papers that either highlight the tendency to
|
||
memorise or demonstrate a strong relationship between that tendency and
|
||
eventual performance.
|
||
|
||
□ [35]An Empirical Study of Memorization in NLP (Zheng & Jiang, ACL 2022)
|
||
□ [36]Does learning require memorization? a short tale about a long tail.
|
||
(Feldman, 2020)
|
||
□ [37]When is memorization of irrelevant training data necessary for
|
||
high-accuracy learning? (Brown et al. 2021)
|
||
□ [38]What Neural Networks Memorize and Why: Discovering the Long Tail
|
||
via Influence Estimation (Feldman & Zhang, 2020)
|
||
□ [39]Question and Answer Test-Train Overlap in Open-Domain Question
|
||
Answering Datasets (Lewis et al., EACL 2021)
|
||
□ [40]Quantifying Memorization Across Neural Language Models (Carlini et
|
||
al. 2022)
|
||
□ [41]On Training Sample Memorization: Lessons from Benchmarking
|
||
Generative Modeling with a Large-scale Competition (Bai et al. 2021)
|
||
[42]↩︎
|
||
2. See the [43]Bias & Safety card at [44]needtoknow.fyi for references. [45]↩︎
|
||
|
||
3. See the [46]Shortcut “Reasoning” card at [47]needtoknow.fyi for references.
|
||
[48]↩︎
|
||
|
||
4. Simon Willison has been covering this issue [49]in a series of blog posts.
|
||
[50]↩︎
|
||
|
||
5.
|
||
□ [51]The poisoning of ChatGPT
|
||
□ [52]Google Bard is a glorious reinvention of black-hat SEO spam and
|
||
keyword-stuffing
|
||
[53]↩︎
|
||
6. See, for example:
|
||
|
||
□ [54]Asleep at the Keyboard? Assessing the Security of GitHub Copilot’s
|
||
Code Contributions (Hammond Pearce et al., December 2021)
|
||
□ [55]Do Users Write More Insecure Code with AI Assistants? (Neil Perry
|
||
et al., December 2022)
|
||
[56]↩︎
|
||
7. This came out [57]during an investor event and was presented as evidence of
|
||
the high quality of Copilot’s output. [58]↩︎
|
||
|
||
8.
|
||
□ [59]Getty Images v. Stability AI - Complaint
|
||
□ [60]Getty Images is suing the creators of AI art tool Stable Diffusion
|
||
for scraping its content
|
||
□ [61]The Wave of AI Lawsuits Have Begun
|
||
□ [62]Copyright lawsuits pose a serious threat to generative AI
|
||
□ [63]GitHub Copilot litigation
|
||
□ [64]Stable Diffusion litigation
|
||
[65]↩︎
|
||
9. Archived link of the [66]GitHub Copilot feature page. [67]↩︎
|
||
|
||
Join the Newsletter
|
||
|
||
Subscribe to the Out of the Software Crisis newsletter to get my weekly (at
|
||
least) essays on how to avoid or get out of software development crises.
|
||
|
||
Join now and get a free PDF of three bonus essays from Out of the Software
|
||
Crisis.
|
||
|
||
[68][ ]
|
||
Subscribe
|
||
|
||
We respect your privacy.
|
||
|
||
Unsubscribe at any time.
|
||
|
||
[70]Mastodon [71]Twitter [72]GitHub [73]Feed
|
||
|
||
References:
|
||
|
||
[1] https://softwarecrisis.dev/
|
||
[2] https://softwarecrisis.dev/
|
||
[3] https://softwarecrisis.baldurbjarnason.com/
|
||
[4] https://illusion.baldurbjarnason.com/
|
||
[5] https://softwarecrisis.dev/archive/
|
||
[6] https://softwarecrisis.dev/author/
|
||
[7] https://www.hakkavelin.is/
|
||
[8] https://illusion.baldurbjarnason.com/
|
||
[9] https://softwarecrisis.baldurbjarnason.com/
|
||
[10] https://baldurbjarnason.lemonsqueezy.com/checkout/buy/cfc2f2c6-34af-436f-91c1-cb2e47283c40
|
||
[11] https://www.baldurbjarnason.com/2021/software-crisis-2/
|
||
[12] https://standishgroup.com/sample_research_files/CHAOSReport2015-Final.pdf
|
||
[13] https://softwarecrisis.baldurbjarnason.com/
|
||
[14] https://quoteinvestigator.com/2019/09/19/woodpecker/
|
||
[15] http://worrydream.com/refs/Brooks-NoSilverBullet.pdf
|
||
[16] https://illusion.baldurbjarnason.com/
|
||
[17] https://www.baldurbjarnason.com/2022/theory-building/
|
||
[18] https://softwarecrisis.baldurbjarnason.com/
|
||
[19] https://en.wikipedia.org/wiki/Mitochondrion
|
||
[20] https://softwarecrisis.dev/letters/ai-and-software-quality/#fn1
|
||
[21] https://en.wikipedia.org/wiki/Chekhov's_gun
|
||
[22] https://softwarecrisis.dev/letters/ai-and-software-quality/#fn2
|
||
[23] https://softwarecrisis.dev/letters/ai-and-software-quality/#fn3
|
||
[24] https://en.wikipedia.org/wiki/Catch-22_(logic)
|
||
[25] https://softwarecrisis.dev/letters/ai-and-software-quality/#fn4
|
||
[26] https://softwarecrisis.dev/letters/ai-and-software-quality/#fn5
|
||
[27] https://softwarecrisis.dev/letters/ai-and-software-quality/#fn6
|
||
[28] https://softwarecrisis.dev/letters/ai-and-software-quality/#fn7
|
||
[29] https://softwarecrisis.dev/letters/ai-and-software-quality/#fn8
|
||
[30] https://archive.ph/2023.01.11-224507/https://github.com/features/copilot#selection-19063.298-19063.462:~:text=Our%20latest%20internal%20research%20shows%20that%20about%201%25%20of%20the%20time%2C%20a%20suggestion%20may%20contain%20some%20code%20snippets%20longer%20than%20~150%20characters%20that%20matches%20the%20training%20set.
|
||
[31] https://softwarecrisis.dev/letters/ai-and-software-quality/#fn9
|
||
[32] https://illusion.baldurbjarnason.com/
|
||
[33] https://softwarecrisis.baldurbjarnason.com/
|
||
[34] https://baldurbjarnason.lemonsqueezy.com/checkout/buy/cfc2f2c6-34af-436f-91c1-cb2e47283c40
|
||
[35] https://aclanthology.org/2022.acl-long.434
|
||
[36] https://doi.org/10.1145/3357713.3384290
|
||
[37] https://doi.org/10.1145/3406325.3451131
|
||
[38] https://papers.nips.cc/paper/2020/hash/1e14bfe2714193e7af5abc64ecbd6b46-Abstract.html
|
||
[39] https://aclanthology.org/2021.eacl-main.86
|
||
[40] https://arxiv.org/abs/2202.07646
|
||
[41] https://dl.acm.org/doi/10.1145/3447548.3467198
|
||
[42] https://softwarecrisis.dev/letters/ai-and-software-quality/#fnref1
|
||
[43] https://needtoknow.fyi/card/bias/
|
||
[44] https://needtoknow.fyi/
|
||
[45] https://softwarecrisis.dev/letters/ai-and-software-quality/#fnref2
|
||
[46] https://needtoknow.fyi/card/shortcut-reasoning/
|
||
[47] https://needtoknow.fyi/
|
||
[48] https://softwarecrisis.dev/letters/ai-and-software-quality/#fnref3
|
||
[49] https://simonwillison.net/series/prompt-injection/
|
||
[50] https://softwarecrisis.dev/letters/ai-and-software-quality/#fnref4
|
||
[51] https://softwarecrisis.dev/letters/the-poisoning-of-chatgpt/
|
||
[52] https://softwarecrisis.dev/letters/google-bard-seo/
|
||
[53] https://softwarecrisis.dev/letters/ai-and-software-quality/#fnref5
|
||
[54] https://doi.org/10.48550/arXiv.2108.09293
|
||
[55] https://doi.org/10.48550/arXiv.2211.03622
|
||
[56] https://softwarecrisis.dev/letters/ai-and-software-quality/#fnref6
|
||
[57] https://www.microsoft.com/en-us/Investor/events/FY-2023/Morgan-Stanley-TMT-Conference#:~:text=Scott%20Guthrie%3A%20I%20think%20you%27re,is%20now%20AI%2Dgenerated%20and%20unmodified
|
||
[58] https://softwarecrisis.dev/letters/ai-and-software-quality/#fnref7
|
||
[59] https://copyrightlately.com/pdfviewer/getty-images-v-stability-ai-complaint/?auto_viewer=true#page=&zoom=auto&pagemode=none
|
||
[60] https://www.theverge.com/2023/1/17/23558516/ai-art-copyright-stable-diffusion-getty-images-lawsuit
|
||
[61] https://www.plagiarismtoday.com/2023/01/17/the-wave-of-ai-lawsuits-have-begun/
|
||
[62] https://www.understandingai.org/p/copyright-lawsuits-pose-a-serious
|
||
[63] https://githubcopilotlitigation.com/
|
||
[64] https://stablediffusionlitigation.com/
|
||
[65] https://softwarecrisis.dev/letters/ai-and-software-quality/#fnref8
|
||
[66] https://archive.ph/2023.01.11-224507/https://github.com/features/copilot#selection-19063.298-19063.462:~:text=Our%20latest%20internal%20research%20shows%20that%20about%201%25%20of%20the%20time%2C%20a%20suggestion%20may%20contain%20some%20code%20snippets%20longer%20than%20~150%20characters%20that%20matches%20the%20training%20set.
|
||
[67] https://softwarecrisis.dev/letters/ai-and-software-quality/#fnref9
|
||
[70] https://toot.cafe/@baldur
|
||
[71] https://twitter.com/fakebaldur
|
||
[72] https://github.com/baldurbjarnason
|
||
[73] https://softwarecrisis.dev/feed.xml
|