Making Software Last Forever
This commit is contained in:
@@ -47,6 +47,10 @@ references:
|
|||||||
url: https://aaronmbushnell.com/keep-your-stack-short/
|
url: https://aaronmbushnell.com/keep-your-stack-short/
|
||||||
date: 2023-10-25T14:21:17Z
|
date: 2023-10-25T14:21:17Z
|
||||||
file: aaronmbushnell-com-x5e6nn.txt
|
file: aaronmbushnell-com-x5e6nn.txt
|
||||||
|
- title: "Dan Stroot · Blog · Making Software Last Forever"
|
||||||
|
url: https://www.danstroot.com/posts/2023-05-25-making_software_last_forever
|
||||||
|
date: 2023-11-13T04:44:27Z
|
||||||
|
file: www-danstroot-com-wwjfi6.txt
|
||||||
---
|
---
|
||||||
|
|
||||||
### Thoughts on priorities in software development
|
### Thoughts on priorities in software development
|
||||||
@@ -71,6 +75,7 @@ references:
|
|||||||
* [There's still no silver bullet][9]
|
* [There's still no silver bullet][9]
|
||||||
* [No one actually wants simplicity][10]
|
* [No one actually wants simplicity][10]
|
||||||
* [Keep your stack short][11]
|
* [Keep your stack short][11]
|
||||||
|
* [Making Software Last Forever][12]
|
||||||
|
|
||||||
[3]: https://grugbrain.dev/
|
[3]: https://grugbrain.dev/
|
||||||
[4]: https://world.hey.com/dhh/even-amazon-can-t-make-sense-of-serverless-or-microservices-59625580
|
[4]: https://world.hey.com/dhh/even-amazon-can-t-make-sense-of-serverless-or-microservices-59625580
|
||||||
@@ -81,3 +86,4 @@ references:
|
|||||||
[9]: https://changelog.com/posts/still-no-silver-bullet
|
[9]: https://changelog.com/posts/still-no-silver-bullet
|
||||||
[10]: https://lukeplant.me.uk/blog/posts/no-one-actually-wants-simplicity/
|
[10]: https://lukeplant.me.uk/blog/posts/no-one-actually-wants-simplicity/
|
||||||
[11]: https://aaronmbushnell.com/keep-your-stack-short/
|
[11]: https://aaronmbushnell.com/keep-your-stack-short/
|
||||||
|
[12]: https://www.danstroot.com/posts/2023-05-25-making_software_last_forever
|
||||||
|
|||||||
716
static/archive/www-danstroot-com-wwjfi6.txt
Normal file
716
static/archive/www-danstroot-com-wwjfi6.txt
Normal file
@@ -0,0 +1,716 @@
|
|||||||
|
#[1]Blog Posts RSS
|
||||||
|
|
||||||
|
IFRAME: [2]https://www.googletagmanager.com/ns.html?id=GTM-55JC288
|
||||||
|
|
||||||
|
[3]Dan Stroot
|
||||||
|
[4]Home[5]About[6]Archive[7]Snippets[8]Uses
|
||||||
|
|
||||||
|
(BUTTON) Toggle Menu
|
||||||
|
|
||||||
|
Making Software Last Forever
|
||||||
|
|
||||||
|
Hero image for Making Software Last Forever
|
||||||
|
27 min read
|
||||||
|
Dan Stroot
|
||||||
|
Dan Stroot
|
||||||
|
May 25, 2023
|
||||||
|
|
||||||
|
How many of us have bought a new home because our prior home was not
|
||||||
|
quite meeting our needs? Maybe we needed an extra bedroom, or wanted a
|
||||||
|
bigger backyard? Now, as a thought experiment, assume you couldn't sell
|
||||||
|
your existing home. If you bought a new home, you'd have to "retire" or
|
||||||
|
"decommission" your prior home (and your investment in it). Does that
|
||||||
|
change your thinking?
|
||||||
|
|
||||||
|
Further, imagine you had a team of five people maintaining your prior
|
||||||
|
home, improving it, and keeping it updated, for the last ten years.
|
||||||
|
You'd have a cumulative investment of 50 person/years in your existing
|
||||||
|
home (5 people x 10 years) just in maintenance, on top of the initial
|
||||||
|
investment. If each person was paid the equivalent of a software
|
||||||
|
developer (we'll use $200k to include benefits, office space,
|
||||||
|
leadership, etc.) you'd have an investment just in labor of $10 million
|
||||||
|
dollars (50 person/years x $200,000). Would you walk away from that
|
||||||
|
investment?
|
||||||
|
|
||||||
|
When companies decide to re-write or replace an existing software
|
||||||
|
application, they are making a similar decision. Existing software is
|
||||||
|
"retired" or "decommissioned" (along with its cumulative investment).
|
||||||
|
Yet the belief that new code is always better than old is patently
|
||||||
|
absurd. Old code has weathered and withstood the test of time. It has
|
||||||
|
been battle-tested. You know it's failure modes. Bugs have been found,
|
||||||
|
and more importantly, fixed.
|
||||||
|
|
||||||
|
Joel Spolsky (of Fog Creek Software and Stack Overflow) describes
|
||||||
|
system re-writes in "[9]Things You Should Never Do, Part I" as “the
|
||||||
|
single worst strategic mistake that any software company can make.”
|
||||||
|
|
||||||
|
Continuing our home analogy, recent price increases for construction
|
||||||
|
materials like lumber, drywall, and wiring (and frankly everything
|
||||||
|
else) should, according to Economics 101, cause us to treat our current
|
||||||
|
homes more dearly. Similarly, price increases for quality software
|
||||||
|
engineers should force companies to treat existing software more
|
||||||
|
dearly.
|
||||||
|
|
||||||
|
Lots of current software started out as C software from the 1980s.
|
||||||
|
Engineers don't often write software with portability as a goal at the
|
||||||
|
beginning, but once something is relatively portable, it tends to stay
|
||||||
|
that way. Code that was well designed and written often migrated from
|
||||||
|
mini-computers to i386, from i386 to amd64, and now ARM and arch64,
|
||||||
|
with a minimum of redesign or effort. You can take large, complicated
|
||||||
|
programs from the 1980s written in C, and compile/run them on a modern
|
||||||
|
Linux computer - even when the modern computer is running architectures
|
||||||
|
which hadn't even been dreamt of when the software was originally
|
||||||
|
written.
|
||||||
|
|
||||||
|
Why can't software last forever? It's not made of wood, concrete, or
|
||||||
|
steel. It doesn't "wear out", rot, weather, or rust. A working
|
||||||
|
algorithm is a working algorithm. Technology doesn’t need to be
|
||||||
|
beautiful, or impress other people, to be effective. Aren't
|
||||||
|
technologists ultimately in the business of producing cost effective
|
||||||
|
technology?
|
||||||
|
|
||||||
|
I am going to attempt to convince you that maintaining your existing
|
||||||
|
systems is one the most cost-effective technology investments you can
|
||||||
|
make.
|
||||||
|
|
||||||
|
The World's Oldest Software Systems
|
||||||
|
|
||||||
|
In 1958, the United States Department of Defense launched a new
|
||||||
|
computer-based contract management system called "Mechanization of
|
||||||
|
Contract Administration Services", or MOCAS (pronounced “MOH-cass”). In
|
||||||
|
2015, [10]MIT Technology Review stated that MOCAS was the oldest
|
||||||
|
computer program in continuous use they could verify. At that time
|
||||||
|
MOCAS managed about $1.3 trillion in government obligations and 340,000
|
||||||
|
contracts.
|
||||||
|
|
||||||
|
According to the [11]Guinness Book of World Records, the oldest
|
||||||
|
software system in use today is either the [12]SABRE Airline
|
||||||
|
Reservation System (introduced in 1960), or the IRS Individual Master
|
||||||
|
File (IMF) and Business Master File (BMF) systems introduced in
|
||||||
|
1962–63.
|
||||||
|
|
||||||
|
SABRE went online in 1960. It had cost $40 million to develop and
|
||||||
|
install (about $400 million in 2022 dollars). The system took over all
|
||||||
|
American Airlines booking functions in 1964, and the system was
|
||||||
|
expanded to provide access to external travel agents in 1976.
|
||||||
|
|
||||||
|
What is the secret to the long lifespan of these systems? Shouldn't
|
||||||
|
companies with long-lived products (annuities, life insurance, etc.)
|
||||||
|
study these examples? After all, they need systems to support products
|
||||||
|
that last most of a human lifespan. However, shouldn't all companies
|
||||||
|
want to their investments in software to last as long as possible?
|
||||||
|
|
||||||
|
Maintenance is About Making Something Last
|
||||||
|
|
||||||
|
We spoke of SABRE above, and we know that airlines recognize the value
|
||||||
|
of maintenance. Commercial aircraft are inspected at least once every
|
||||||
|
two days. Engines, hydraulics, environmental, and electrical systems
|
||||||
|
all have additional maintenance schedules. A "heavy" maintenance
|
||||||
|
inspection occurs once every few years. This process maintains the
|
||||||
|
aircraft's service life over decades.
|
||||||
|
|
||||||
|
On average, an aircraft is operable for about 30 years before it must
|
||||||
|
be retired. A Boeing 747 can endure 35,000 pressurization cycles —
|
||||||
|
roughly 135,000 to 165,000 flight hours — before metal fatigue sets in.
|
||||||
|
However, most older airframes are retired for fuel-efficiency reasons,
|
||||||
|
not because they're worn out.
|
||||||
|
|
||||||
|
Even stuctures made of grass can last indefinitely. [13]Inca rope
|
||||||
|
bridges were simple suspension bridges constructed by the Inca Empire.
|
||||||
|
The bridges were an integral part of the Inca road system were
|
||||||
|
constructed using ichu grass.
|
||||||
|
|
||||||
|
Inca Rope Bridge
|
||||||
|
|
||||||
|
Even though they were made of grass, these bridges were maintained with
|
||||||
|
such regularity and attention they lasted centuries. The bridge's
|
||||||
|
strength and reliability came from the fact that each cable was
|
||||||
|
replaced every June.
|
||||||
|
|
||||||
|
The goal of maintenance is catching problems before they happen. That’s
|
||||||
|
the difference between maintenance and repair. Repair is about fixing
|
||||||
|
something that’s already broken. Maintenance is about making something
|
||||||
|
last.
|
||||||
|
|
||||||
|
Unfortunately, Maintenance is Chronically Undervalued
|
||||||
|
|
||||||
|
Maintenance is one of the easiest things to cut when budgets get tight.
|
||||||
|
Some legacy software systems have decades of underinvestment in
|
||||||
|
maintenance. This leads up to the inevitable "we have to replace it"
|
||||||
|
discussion - which somehow always sounds more persuasive (even though
|
||||||
|
it’s more expensive and riskier) than arguing to invest in system
|
||||||
|
rehabilitation and deferred system maintenance.
|
||||||
|
|
||||||
|
Executives generally can't refuse "repair" work because the system is
|
||||||
|
broken and must be fixed. However, maintenance is a tougher sell. It’s
|
||||||
|
not strictly necessary — or at least it doesn’t seem to be until things
|
||||||
|
start falling apart. It is so easy to divert maintenance budget into a
|
||||||
|
halo project that gets an executive noticed (and possibly promoted)
|
||||||
|
before the long-term effects of underinvestment in maintenance become
|
||||||
|
visible. Even worse, the executive is also admired for reducing the
|
||||||
|
costs of maintenance and switching costs from "run" to "grow" - while
|
||||||
|
they are torpedoing the company under the waterline.
|
||||||
|
|
||||||
|
The other challenge is conflating enhancement work with maintenance
|
||||||
|
work. Imagine you have $1,000 and you want to add a sunroof to your
|
||||||
|
car, but you also need new tires (which coincidentally also cost
|
||||||
|
$1,000). You have to replace the tires every so often, but a sunroof is
|
||||||
|
"forever" right? If you spend the money on the sunroof the tires could
|
||||||
|
get replaced next month, or maybe the month after - they'll last a
|
||||||
|
couple more months, won't they?
|
||||||
|
|
||||||
|
With software, users can't see "the bald tires" - they only thing they
|
||||||
|
see, or experience (and value), are new features and capabilities.
|
||||||
|
Pressure is always present to cut costs and to add new features. The
|
||||||
|
result is budget always swings away from maintenance work towards
|
||||||
|
enhancements.
|
||||||
|
|
||||||
|
Finally, maintenance work is typically an operational cost, yet
|
||||||
|
building a new system, or a significant new feature, can often be
|
||||||
|
capitalized - making the future costs someone else's problem.
|
||||||
|
|
||||||
|
Risks of Replacing Software Systems
|
||||||
|
|
||||||
|
It's usually not the design or the age of a system that causes it to
|
||||||
|
fail but rather neglect. People fail to maintain software systems
|
||||||
|
because they are not given the time, incentives, or resources to
|
||||||
|
maintain them.
|
||||||
|
|
||||||
|
"Most of the systems I work on rescuing are not badly built. They
|
||||||
|
are badly maintained."
|
||||||
|
— Marianne Bellotti, Kill it With Fire
|
||||||
|
|
||||||
|
Once a system degrades it is an enormous challenge to fund deferred
|
||||||
|
maintenance (or "technical debt"). No one plans for it, no one wants to
|
||||||
|
pay for it, and no engineer wants to do it. Initiatives to restore
|
||||||
|
operational excellence, much the way one would fix up an old house,
|
||||||
|
tend to have few volunteers among engineering teams. No one gets
|
||||||
|
noticed doing maintenance. No one ever gets promoted because of
|
||||||
|
maintenance.
|
||||||
|
|
||||||
|
It should be clear why engineers prefer to re-write a system rather
|
||||||
|
than maintain it. They get to "write a new story" rather than edit
|
||||||
|
someone else's. They will attempt to convince a senior executive to
|
||||||
|
fund a project to replace a problematic system by describing all the
|
||||||
|
new features and capabilities that could be added as well as how "bad"
|
||||||
|
the existing, unmaintained, system has become. Further, they will get
|
||||||
|
to use modern technology that makes them much more valuable in the
|
||||||
|
market.
|
||||||
|
|
||||||
|
Incentives aside, engineering teams tend to gravitate toward system
|
||||||
|
rewrites because they incorrectly think of old systems as specs. They
|
||||||
|
assume that since an old system works, the functional risks have been
|
||||||
|
eliminated. They can focus on adding more features to the new system or
|
||||||
|
make changes to the underlying architecture without worry. Either they
|
||||||
|
do not perceive the ambiguity these changes introduce, or they see such
|
||||||
|
ambiguity positively, imagining only gains in performance and the
|
||||||
|
potential for innovation.
|
||||||
|
|
||||||
|
Why not authorize that multimillion-dollar replacement if the engineers
|
||||||
|
convince management the existing system is doomed? Eventually a
|
||||||
|
"replacement" project will be funded (typically at a much higher
|
||||||
|
expenditure than rehabilitating the existing system). Even if the
|
||||||
|
executives are not listening to the engineers, they will be listening
|
||||||
|
to external consultants telling them they are falling behind.
|
||||||
|
|
||||||
|
What do you do with the old system while you’re building the new one?
|
||||||
|
Most organizations put the old system on “life support” and give it
|
||||||
|
only the resources for patches and fixes necessary to keep it running.
|
||||||
|
This reduces maintenance even further and becomes a self-fulfilling
|
||||||
|
prophecy that the existing system will eventually fail.
|
||||||
|
|
||||||
|
Who gets to work on the new system, and who takes on the maintenance
|
||||||
|
tasks of the old system? If the old system is written in older
|
||||||
|
technology that the company is actively abandoning, the team
|
||||||
|
maintaining the old system is essentially sitting around waiting to be
|
||||||
|
fired. And don’t kid yourself, they know it. If the people maintaining
|
||||||
|
the old system are not participating in the creation of the new system,
|
||||||
|
you should expect that they are also looking for new jobs. If they
|
||||||
|
leave before your new system is operational, you lose both their
|
||||||
|
expertise and their institutional knowledge.
|
||||||
|
|
||||||
|
If the new project falls behind schedule (and it almost certainly
|
||||||
|
will), the existing system continues to degrade, and knowledge
|
||||||
|
continues to walk out the door. If the new project fails and is
|
||||||
|
subsequently canceled, the gap between the legacy system and
|
||||||
|
operational excellence has widened significantly in the meantime.
|
||||||
|
|
||||||
|
This explains why executives are loathe to cancel system replacement
|
||||||
|
projects even when they are obviously years behind schedule and failing
|
||||||
|
to live up to expectations. Stopping the replacement project seems
|
||||||
|
impossible because the legacy system is now so degraded that restoring
|
||||||
|
it to operational excellence seems impossible. Plus, politically
|
||||||
|
canceling a marquee project can be career suicide for the sponsoring
|
||||||
|
executive(s). Much better to do "deep dives" and "assessments" on why
|
||||||
|
the project is failing and soldier on than cancel it.
|
||||||
|
|
||||||
|
The interim state is not pretty. The company now has two systems to
|
||||||
|
operate, much higher costs and new risks.
|
||||||
|
* The new system will have high costs, limited functionality, new and
|
||||||
|
unique errors/issues, and lower volumes (so the "per unit cost" of
|
||||||
|
the new system will be quite high).
|
||||||
|
* The older system will still be running most of the business, and
|
||||||
|
usually all of the complex business, while having lost its best
|
||||||
|
engineers and subject matter experts. Its maintenance budget will
|
||||||
|
have been whittled down to nothing to redirect spending to
|
||||||
|
implement (save?) the new system. This system will be in grave
|
||||||
|
danger to significant system failure (which proponents of the new
|
||||||
|
system will use to justify the investment in the new system, not
|
||||||
|
admitting to a self-fulfilling prophecy).
|
||||||
|
|
||||||
|
Neither system will exhibit operational excellence, and both put the
|
||||||
|
organization at significant risk in addition to the higher costs and
|
||||||
|
complexity of running two systems.
|
||||||
|
|
||||||
|
Maintaining Software to Last Forever
|
||||||
|
|
||||||
|
As I discussed in [14]How Software Learns, software adapts over time -
|
||||||
|
as it is continually refined and reshaped by maintenance and
|
||||||
|
enhancements. Maintenance is crucial to software's lifespan and
|
||||||
|
business relevance/value. When software systems are first developed,
|
||||||
|
they are based on a prediction of the future - a prediction of the
|
||||||
|
future that we know is wrong even as we make it. No set of requirements
|
||||||
|
have ever been perfect. However, all new systems become "less wrong" as
|
||||||
|
time, experience, and knowledge are continually added (e.g.,
|
||||||
|
maintenance).
|
||||||
|
|
||||||
|
Futureproofing means constantly rethinking and iterating on the
|
||||||
|
existing system. We know from both research and experience that
|
||||||
|
iterating and maintaining existing solutions is a much more likely, and
|
||||||
|
less expensive, way to improve software's lifespan and functionality.
|
||||||
|
|
||||||
|
Before choosing to replace a system that needs deferred maintenance
|
||||||
|
remember it’s the lack of maintenance that create the impression that
|
||||||
|
failure is inevitable, and pushes otherwise rational engineers and
|
||||||
|
executives toward rewrites or replacements. What mechanisms will
|
||||||
|
prevent lack of maintenance from eventually dooming the brand-new
|
||||||
|
system? Has the true root problem been addressed?
|
||||||
|
|
||||||
|
Robust maintenance practices could preserve software for decades, but
|
||||||
|
first maintenance must be valued, funded, and applied. To maintain
|
||||||
|
software properly we have to consider:
|
||||||
|
1. How do you measure the overall health of a system?
|
||||||
|
2. How do you define and manage maintenance work?
|
||||||
|
3. How do you define a reasonable maintenance budget? How can you
|
||||||
|
protect that budget?
|
||||||
|
4. How do you motivate engineers to perform maintenance?
|
||||||
|
|
||||||
|
1. How do you measure the overall health of a system?
|
||||||
|
|
||||||
|
Objective measures
|
||||||
|
|
||||||
|
1. Maintenance Backlog — If you added up all the open work requests,
|
||||||
|
including work the software engineers deem necessary to eliminate
|
||||||
|
technical debt, what is the total amount of effort? Now, divide
|
||||||
|
that by the team capacity. For example, imagine you have a total
|
||||||
|
amount of work of 560 days, and you have one person assigned to
|
||||||
|
support the system - they work approximately 200 days annually. The
|
||||||
|
backlog in days in 560, but in time it is 2.8 years (560 days / 200
|
||||||
|
days/year = 2.8 years). What is a reasonable amount of backlog
|
||||||
|
time?
|
||||||
|
2. System Reliability/Downtime — If you added up all the time the
|
||||||
|
system is down in a given period, what is the total amount? What is
|
||||||
|
the user or customer impact of that downtime? Conversely, what
|
||||||
|
would reducing that downtime be worth? What is the relationship of
|
||||||
|
maintenance and downtime? In other words, does the system need to
|
||||||
|
be taken down to maintain it (planned maintenance)? Does planned
|
||||||
|
maintenance reduce unplanned downtime?
|
||||||
|
3. Capacity/Performance Constraints — Is the existing hitting capacity
|
||||||
|
constraints that will prevent future growth of the business? How
|
||||||
|
unpredictable are the system capacity demands? What is the customer
|
||||||
|
experience when the system capacity is breached? What is
|
||||||
|
relationship between hardware and software that constrains the
|
||||||
|
system? Is the software performant? Can hardware solve the problem?
|
||||||
|
|
||||||
|
Subjective measures
|
||||||
|
|
||||||
|
1. User Satisfaction: User satisfaction includes both how happy your
|
||||||
|
employees are with the applications and/or how well those
|
||||||
|
applications meet your customer's needs. Many times I have found
|
||||||
|
the technology team and the business users arguing over "bug" vs.
|
||||||
|
"enhancement". It is a way of assigning blame. "Bug" means its
|
||||||
|
engineering's fault, "enhancement" means it was a missed
|
||||||
|
requirement. When emotions run hot it means that the maintenance
|
||||||
|
budget is insufficient. I always tell everyone they are both just
|
||||||
|
maintenance and the only important decision is which to prioritize
|
||||||
|
and fix first.
|
||||||
|
2. “Shadow IT” — If you used applications in the past that didn’t meet
|
||||||
|
employees’ needs, and didn’t have a good governance plan to address
|
||||||
|
problems, you may have noticed employees found other solutions on
|
||||||
|
their own. This is an indication of underfunded maintenance.
|
||||||
|
3. Adaptable Architecture — "The cloud", API-based integration, and
|
||||||
|
unlocking your data are no longer “nice to haves.” Your
|
||||||
|
architecture needs to adapt. If these are challenges, then the
|
||||||
|
architecture must be addressed.
|
||||||
|
4. Governance — Healthy application architecture isn’t just about
|
||||||
|
technology—it’s also about having well-documented and
|
||||||
|
well-understood governance documents that guide technology
|
||||||
|
investments for your organization. Good governance helps create
|
||||||
|
adaptable architecture and avoid “shadow IT” applications.
|
||||||
|
|
||||||
|
2. How do you define maintenance work?
|
||||||
|
|
||||||
|
There are four general types of software maintenance. The first two
|
||||||
|
types take up the majority of most organizations' maintenance budget,
|
||||||
|
and may not even be considered maintenance - however, all four types
|
||||||
|
must be funded adequately for software to remain healthy. If you can't
|
||||||
|
fully address types three and four your maintenance budget is
|
||||||
|
inadequate.
|
||||||
|
|
||||||
|
1. Corrective Software Maintenance (more accurately called "repair")
|
||||||
|
|
||||||
|
Corrective software maintenance is necessary when something goes wrong
|
||||||
|
in a piece of software including faults and errors. These can have a
|
||||||
|
widespread impact on the functionality of the software in general and
|
||||||
|
therefore must be addressed as quickly as possible. However, it is
|
||||||
|
important to consider repair work separate from the other types of
|
||||||
|
maintenance because repair work must get done. Note: this is generally
|
||||||
|
the only type of work that happens when a system is put on "life
|
||||||
|
support".
|
||||||
|
|
||||||
|
2. Perfective Software Maintenance (more accurately called "enhancements")
|
||||||
|
|
||||||
|
Once software is released and is being used new issues and ideas come
|
||||||
|
to the surface. Users will think up new features or requirements that
|
||||||
|
they would like to see. Perfective software maintenance aims to adjust
|
||||||
|
software by adding new features as necessary (and removing features
|
||||||
|
that are irrelevant or not effective). This process keeps software
|
||||||
|
relevant as the market, and user needs, evolve. It there is funding
|
||||||
|
beyond "life support" it usually is spent here.
|
||||||
|
|
||||||
|
3. Preventative Software Maintenance (true maintenance is catching problems
|
||||||
|
before they happen.)
|
||||||
|
|
||||||
|
Preventative software maintenance is looking into the future so that
|
||||||
|
your software can keep working as desired for as long as possible. This
|
||||||
|
includes making necessary changes, upgrades, and adaptations.
|
||||||
|
Preventative software maintenance may address small issues which at the
|
||||||
|
given time may lack significance but may turn into larger problems in
|
||||||
|
the future. These are called latent faults which need to be detected
|
||||||
|
and corrected to make sure that they won’t turn into effective faults.
|
||||||
|
This type of maintenance is generally underfunded.
|
||||||
|
|
||||||
|
4. Adaptive Software Maintenance (true maintenance adapts to changes)
|
||||||
|
|
||||||
|
Adaptive software maintenance is responding to the changing technology
|
||||||
|
landscape, as well as new company policies and rules regarding your
|
||||||
|
software. These include operating system changes, using cloud
|
||||||
|
technology, security policies, hardware changes, etc. When these
|
||||||
|
changes are performed, your software (and possibly architecture) must
|
||||||
|
adapt to properly meet new requirements and meet current security and
|
||||||
|
other policies.
|
||||||
|
|
||||||
|
3. How do you define a reasonable maintenance budget? How can you protect
|
||||||
|
that budget?
|
||||||
|
|
||||||
|
In the case of the Inca rope bridges what was the cost of maintenance
|
||||||
|
annually? Let's assume some of the build work was site preparation and
|
||||||
|
building the stone anchors on each side, but most of the work was
|
||||||
|
constructing the bridge itself. Since the bridge was entirely replaced
|
||||||
|
each year, the maintenance costs could be as much as 80% of the initial
|
||||||
|
build effort, every year.
|
||||||
|
|
||||||
|
Comparing to "software as a service" (SaaS) vendors is difficult
|
||||||
|
because they have shifted to a subscription model that bundles
|
||||||
|
infrastructure, enhancements, and ongoing maintenance. Prior to SaaS
|
||||||
|
subscription-based pricing one would typically buy a perpetual license
|
||||||
|
plus maintenance at ~20-30% annual cost of the license to obtain
|
||||||
|
support and updates.
|
||||||
|
|
||||||
|
Side note: Now that the SaaS annual costs are commingled, some
|
||||||
|
enterprises fall into the trap that “building it is cheaper because we
|
||||||
|
pay up front but then it will cost less in the long run” assuming the
|
||||||
|
"long run" almost always underprices infrastructure and assumes near
|
||||||
|
zero maintenance cost. In the case of a brand-new, internally designed
|
||||||
|
and developed software system - one that is well architected, well
|
||||||
|
designed, well built, and meets all reliability, scalability, and
|
||||||
|
performance needs (i.e., fantasy software) it's conceivable that there
|
||||||
|
is no maintenance necessary for some period of time - but very
|
||||||
|
unlikely.
|
||||||
|
|
||||||
|
So, maintenance costs can have a very wide range. A general rule of
|
||||||
|
thumb is 20-30% of the initial build cost will be required for ongoing
|
||||||
|
maintenance work annually. However, maintenance costs usually start off
|
||||||
|
lower and increase over time. They are also unpredictable costs that
|
||||||
|
are hard to budget.
|
||||||
|
|
||||||
|
The challenges should be obvious. First, budgets in large organizations
|
||||||
|
tend be last year's budget plus 2-3%. If you start with a maintenance
|
||||||
|
budget of zero on a new system, how do you ever get to the point of a
|
||||||
|
healthy maintenance budget in the future? Second, maintenance costs are
|
||||||
|
unpredictable, and organizations hate unpredictable costs. It's
|
||||||
|
impossible to say when the next new hardware, or storage, or
|
||||||
|
programming construct will occur, or when the existing system will hit
|
||||||
|
a performance or scalability inflection point.
|
||||||
|
|
||||||
|
This is like buying a brand-new car. The maintenance costs are
|
||||||
|
negligible in the first couple years, until they start to creep up.
|
||||||
|
Then things start to need maintenance, replacement, or repair. As the
|
||||||
|
car ages the maintenance costs continue to increase until at some point
|
||||||
|
it makes economic sense to buy another new car. Except none of us wait
|
||||||
|
that long. Most of us buy new cars before our old one is completely
|
||||||
|
worn out. As a counter-example, in Cuba some cars have been maintained
|
||||||
|
meticulously for 30-40 years and run better than new.
|
||||||
|
|
||||||
|
Protecting your maintenance budget - creating a "maintenance fund"
|
||||||
|
|
||||||
|
We know that maintenance cost increase over time, and the costs of
|
||||||
|
proper maintenance are unpredictable. In addition, there is some amount
|
||||||
|
of management discretion that can be applied. When your house needs a
|
||||||
|
new roof it's reasonable to defer it through summer, but it probably
|
||||||
|
needs to be done before winter.
|
||||||
|
|
||||||
|
Since business require predictability of costs, unpredictable
|
||||||
|
maintenance costs are easy to defer. "We didn't budget for that; we'll
|
||||||
|
have to put it in next year's budget." Except of course in the budget
|
||||||
|
process it will compete with other projects and enhancement work, where
|
||||||
|
it's again likely to be deprioritized.
|
||||||
|
|
||||||
|
What's the solution? Could it be possible to create some type of
|
||||||
|
maintenance fund where a predictable amount is budgeted each year, and
|
||||||
|
then spent "unpredictably" when/as needed? Could this also be a
|
||||||
|
solution to preventing executives from diverting maintenance budget
|
||||||
|
into pet projects by protecting this maintenance fund in some fashion?
|
||||||
|
|
||||||
|
4. How do you motivate software engineers to perform maintenance?
|
||||||
|
|
||||||
|
There is a Chinese proverb about a discussion between a king and a
|
||||||
|
famous doctor. The well-known doctor explains to the king that his
|
||||||
|
brother (who is also a doctor) is superior at medicine, but he is
|
||||||
|
unknown because he always successfully treats small illnesses,
|
||||||
|
preventing them from evolving into more serious or terminal ones. So,
|
||||||
|
people say "Oh he is a fine doctor, but he only treats minor
|
||||||
|
illnesses". It's true: [15]Nobody Ever Gets Credit for Fixing Problems
|
||||||
|
that Never Happened.
|
||||||
|
|
||||||
|
To most software engineers, legacy systems seem like torturous dead-end
|
||||||
|
work, but the reality is systems that are not important get turned off.
|
||||||
|
Working on "estate" systems means working on some of the most critical
|
||||||
|
systems that exist — computers that govern millions of people’s lives
|
||||||
|
in enumerable ways. This is not the work of technical janitors, but
|
||||||
|
battlefield surgeons.
|
||||||
|
|
||||||
|
Engineering loves new technology. It gains the engineers attention and
|
||||||
|
industry marketability. [16]Boring technology on the other hand is
|
||||||
|
great for the company. The engineering cost is lower, and the skills
|
||||||
|
are easier to obtain and keep, because these engineers are not being
|
||||||
|
pulled out of your organization for double their salary by Amazon or
|
||||||
|
Google.
|
||||||
|
|
||||||
|
Well-designed, high-functioning software that is easy to understand
|
||||||
|
usually blends in. Simple solutions do not do much to enhance one’s
|
||||||
|
personal brand. Therefore, when an organization provides limited
|
||||||
|
pathways to promotion for software engineers, they tend to make
|
||||||
|
technical decisions that emphasize their individual contribution and
|
||||||
|
technical prowess. You have to be very careful to reward what you want
|
||||||
|
from your engineering team.
|
||||||
|
|
||||||
|
What earns them the acknowledgment of their peers? What gets people
|
||||||
|
seen is what they will ultimately prioritize, even if those behaviors
|
||||||
|
are in open conflict with the official direction they receive from
|
||||||
|
management. In most organizations shipping new code gets attention,
|
||||||
|
while technical debt accrues silently in the background.
|
||||||
|
|
||||||
|
The specific form of acknowledgment also matters a lot. Positive
|
||||||
|
reinforcement in the form of social recognition tends to be a more
|
||||||
|
effective motivator than the traditional incentive structure of
|
||||||
|
promotions, raises, and bonuses. Behavioral economist Dan Ariely
|
||||||
|
attributes this to the difference between social markets and
|
||||||
|
traditional monetary-based markets. Social markets are governed by
|
||||||
|
social norms (read: peer pressure and social capital), and they often
|
||||||
|
inspire people to work harder and longer than much more expensive
|
||||||
|
incentives that represent the traditional work-for-pay exchange. In
|
||||||
|
other words, people will work really hard for positive reinforcement
|
||||||
|
from their peers.
|
||||||
|
|
||||||
|
Legacy System Modernization
|
||||||
|
|
||||||
|
Unmaintained software will certainly die at some point. Due to factors
|
||||||
|
discussed above, software does not always receive the proper amount of
|
||||||
|
maintenance to remain healthy. Eventually a larger modernization effort
|
||||||
|
may become necessary to restore a system to operational and functional
|
||||||
|
excellence.
|
||||||
|
|
||||||
|
Legacy modernization projects start off feeling easy. The organization
|
||||||
|
once had a reliable working system and kept it running for years. All
|
||||||
|
the modernizing team should need to do is simply reshape it using
|
||||||
|
better technology, better architecture, the benefit of hindsight, and
|
||||||
|
improved tooling. It should be simple. But, because people do not see
|
||||||
|
the hidden technical challenges they are about to uncover, they also
|
||||||
|
assume the work will be boring. There’s little glory to be had
|
||||||
|
re-implementing a solved problem.
|
||||||
|
|
||||||
|
Modernization projects are also typically the ones organizations just
|
||||||
|
want to get out of the way, so they launch into them unprepared for the
|
||||||
|
time and resource commitments they require. Modernization projects take
|
||||||
|
months, if not years of work. Keeping a team of engineers focused,
|
||||||
|
inspired, and motivated from beginning to end is difficult. Keeping
|
||||||
|
their senior leadership prepared to invest in what is, in effect,
|
||||||
|
something they already have is a huge challenge. Creating momentum and
|
||||||
|
sustaining it are where most modernization projects fail.
|
||||||
|
|
||||||
|
The hard part about legacy modernization is the "system around the
|
||||||
|
system". The organization, its communication structures, its politics,
|
||||||
|
and its incentives are all intertwined with the technical product in
|
||||||
|
such a way that to improve the product, you must do it by turning the
|
||||||
|
gears of this other, complex, undocumented system. Pay attention to
|
||||||
|
politics and culture. Technology is at most only 50% of the legacy
|
||||||
|
problem, ways of working, organization structure and
|
||||||
|
leadership/sponsorship are just as important to success.
|
||||||
|
|
||||||
|
To do this, you need to overcome people’s natural skepticism and get
|
||||||
|
them to buy in. The important word in the phrase "proof of concept" is
|
||||||
|
proof. You need to prove to people that success is possible and worth
|
||||||
|
doing. It can't be just an MVP, because [17]MVPs are dangerous.. A red
|
||||||
|
flag is raised when companies talk about the phases of their
|
||||||
|
modernization plans in terms of which technologies they are going to
|
||||||
|
use rather than what value they will add.
|
||||||
|
|
||||||
|
For all that people talk about COBOL dying off, it is good at certain
|
||||||
|
tasks. The problem with most old COBOL systems is that they were
|
||||||
|
designed at a time when COBOL was the only option. Start by sorting
|
||||||
|
which parts of the system are in COBOL because COBOL is good at
|
||||||
|
performing that task, and which parts are in COBOL because there were
|
||||||
|
no other technologies available. Once you have that mapping, start by
|
||||||
|
pulling the latter off into separate services that are written and
|
||||||
|
designed using the technology we would choose for that task today.
|
||||||
|
|
||||||
|
Going through the exercise of understanding what functionality is fit
|
||||||
|
for use for specific languages/technologies not only gives engineers a
|
||||||
|
way to keep building their skillsets but also is an opportunity to pair
|
||||||
|
with other engineers who have different/complimentary skills. This
|
||||||
|
exchange also has the benefit of diffusing the understanding of the
|
||||||
|
system to a broader group of people without needing to solely rely on
|
||||||
|
documentation (which never exists).
|
||||||
|
|
||||||
|
Counterintuitively, SLAs/SLOs are valuable because they provide a
|
||||||
|
"failure budget". When organizations stop aiming for perfection and
|
||||||
|
accept that all systems will occasionally fail, they stop letting their
|
||||||
|
technology rot for fear of change. In most cases, mean time to recovery
|
||||||
|
(MTTR) is a more useful statistic to push than reliability. MTTR tracks
|
||||||
|
how long it takes the organization to recover from failure. Resilience
|
||||||
|
in engineering is all about recovering stronger from failure. That
|
||||||
|
means better monitoring, better documentation, and better processes for
|
||||||
|
restoring services, but you can’t improve any of that if you don’t
|
||||||
|
occasionally fail.
|
||||||
|
|
||||||
|
Although a system that constantly breaks, or that breaks in unexpected
|
||||||
|
ways without warning, will lose its users’ trust, the reverse isn’t
|
||||||
|
necessarily true. A system that never breaks doesn’t necessarily
|
||||||
|
inspire high degrees of trust - and its maintenance budget is even
|
||||||
|
easier to cut.
|
||||||
|
|
||||||
|
People take systems that are too reliable for granted. Italian
|
||||||
|
researchers Cristiano Castelfranchi and Rino Falcone have been
|
||||||
|
advancing a general model of trust that postulates trust naturally
|
||||||
|
degrades over time, regardless of whether any action has been taken to
|
||||||
|
violate that trust. Under Castelfranchi and Falcone’s model,
|
||||||
|
maintaining trust doesn’t mean establishing a perfect record; it means
|
||||||
|
continuing to rack up observations of resilience. If a piece of
|
||||||
|
technology is so reliable it has been completely forgotten, it is not
|
||||||
|
creating those regular observations. Through no fault of the
|
||||||
|
technology, the user’s trust in it slowly deteriorates.
|
||||||
|
|
||||||
|
When both observability and testing are lacking on your legacy system,
|
||||||
|
observability comes first. Tests tell you only what shouldn’t fail;
|
||||||
|
monitoring tells you what is failing. Don’t forget: a perfect record
|
||||||
|
will always be broken, but resilience is an accomplishment that lasts.
|
||||||
|
Modern engineering teams use stats like service level objectives, error
|
||||||
|
budgets, and mean time to recovery to move the emphasis away from
|
||||||
|
avoiding failure and toward recovering quickly.
|
||||||
|
|
||||||
|
Summary
|
||||||
|
|
||||||
|
Maintenance mostly happens out of sight, mysteriously. If we notice it,
|
||||||
|
it’s a nuisance. When road crews block off sections of highway to fix
|
||||||
|
cracks or potholes, we treat it as an obstruction, not a vital and
|
||||||
|
necessary process. This is especially true in the public sector: it’s
|
||||||
|
almost impossible to get governmental action on, or voter interest in,
|
||||||
|
spending on preventive maintenance, yet governments make seemly
|
||||||
|
unlimited funds available once we have a disaster. We are okay spending
|
||||||
|
a massive amount of money to fix a problem, but consistently resist
|
||||||
|
spending a much smaller amount of money to prevent it; as a business
|
||||||
|
strategy this makes no sense.
|
||||||
|
|
||||||
|
The [18]Open Mainframe Project estimates that there about 250 billion
|
||||||
|
lines of COBOL code running today in the world economy, and nearly all
|
||||||
|
COBOL code contains critical business logic. Companies should maintain
|
||||||
|
that software and make it last as long as possible.
|
||||||
|
|
||||||
|
References
|
||||||
|
|
||||||
|
* [19]Things You Should Never Do, Part I
|
||||||
|
* [20]Patterns of Legacy Displacement
|
||||||
|
* [21]Kill It with Fire: Manage Aging Computer Systems (and Future
|
||||||
|
Proof Modern Ones)
|
||||||
|
* [22]Building software to last forever
|
||||||
|
* [23]The Disappearing Art Of Maintenance
|
||||||
|
* [24]Inca rope bridge
|
||||||
|
* [25]How Often Do Commercial Airplanes Need Maintenance?
|
||||||
|
* [26]Nobody Ever Gets Credit for Fixing Problems that Never Happened
|
||||||
|
* [27]Boring Technology Club
|
||||||
|
* [28]Open Mainframe Project 2021 Annual Report
|
||||||
|
* [29]How Popular is COBOL?
|
||||||
|
__________________________________________________________________
|
||||||
|
|
||||||
|
Image Credit: Bill Gates, CEO of Microsoft, holds Windows 1.0 floppy
|
||||||
|
discs.
|
||||||
|
|
||||||
|
(Photo by Deborah Feingold/Corbis via Getty Images) This was the
|
||||||
|
release of Windows 1.0. The beginning. Computers evolve. The underlying
|
||||||
|
hardware, CPU, memory, and storage evolves. The operating system
|
||||||
|
evolves. Of course, the software we use must evolve as well.
|
||||||
|
|
||||||
|
Sharing is Caring
|
||||||
|
|
||||||
|
(BUTTON) (BUTTON) (BUTTON)
|
||||||
|
|
||||||
|
[30]Edit this page
|
||||||
|
|
||||||
|
Dan Stroot · Blog
|
||||||
|
I love building things. Made in California. Family man, technologist
|
||||||
|
and Hacker News aficionado. Eternally curious.
|
||||||
|
[31]Join me on Twitter.[32]Join me on LinkedIn.[33]Join me on GitHub.
|
||||||
|
Crafted with ♥️ in California. © 2023, [34]Dan Stroot
|
||||||
|
|
||||||
|
References
|
||||||
|
|
||||||
|
Visible links:
|
||||||
|
1. https://www.danstroot.com/feed.xml
|
||||||
|
2. https://www.googletagmanager.com/ns.html?id=GTM-55JC288
|
||||||
|
3. file:///
|
||||||
|
4. file:///
|
||||||
|
5. file:///about
|
||||||
|
6. file:///archive
|
||||||
|
7. file:///snippets
|
||||||
|
8. file:///uses
|
||||||
|
9. https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/
|
||||||
|
10. https://www.technologyreview.com/2015/08/06/166822/what-is-the-oldest-computer-program-still-in-use/
|
||||||
|
11. https://www.guinnessworldrecords.com/world-records/636196-oldest-software-system-in-continuous-use
|
||||||
|
12. https://en.wikipedia.org/wiki/Sabre_(travel_reservation_system)
|
||||||
|
13. https://en.wikipedia.org/wiki/Inca_rope_bridge
|
||||||
|
14. file:///posts/2022-06-05-how-software-learns
|
||||||
|
15. https://web.mit.edu/nelsonr/www/Repenning=Sterman_CMR_su01_.pdf
|
||||||
|
16. https://engineering.atspotify.com/2013/02/in-praise-of-boring-technology/
|
||||||
|
17. file:///posts/2021-12-27-dangerous-mvps
|
||||||
|
18. https://www.openmainframeproject.org/
|
||||||
|
19. https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/
|
||||||
|
20. https://martinfowler.com/articles/patterns-legacy-displacement/
|
||||||
|
21. https://www.amazon.com/Kill-Fire-Manage-Computer-Systems/dp/1718501188
|
||||||
|
22. https://herman.bearblog.dev/building-software-to-last-forever/
|
||||||
|
23. https://www.noemamag.com/the-disappearing-art-of-maintenance/
|
||||||
|
24. https://en.wikipedia.org/wiki/Inca_rope_bridge
|
||||||
|
25. https://monroeaerospace.com/blog/how-often-do-commercial-airplanes-need-maintenance/#:~:text=Commercial airplanes require frequent maintenance,inspection once every few years.
|
||||||
|
26. https://web.mit.edu/nelsonr/www/Repenning=Sterman_CMR_su01_.pdf
|
||||||
|
27. https://boringtechnology.club/
|
||||||
|
28. https://www.openmainframeproject.org/wp-content/uploads/sites/11/2022/04/OMP_Annual_Report_2021_040622.pdf
|
||||||
|
29. https://news.ycombinator.com/item?id=33999718
|
||||||
|
30. https://github.com/dstroot/blog-next-13/blob/master/content/posts/2023-05-25-making_software_last_forever.mdx
|
||||||
|
31. https://twitter.com/danstroot
|
||||||
|
32. https://www.linkedin.com/in/danstroot
|
||||||
|
33. https://github.com/dstroot/blog-next
|
||||||
|
34. file:///analytics
|
||||||
|
|
||||||
|
Hidden links:
|
||||||
|
36. file://localhost/search
|
||||||
|
37. file://localhost/about
|
||||||
Reference in New Issue
Block a user