davideisinger.com/static/archive/www-danstroot-com-wwjfi6.txt

   #[1]Blog Posts RSS

   IFRAME: [2]https://www.googletagmanager.com/ns.html?id=GTM-55JC288

   [3]Dan Stroot
   [4]Home[5]About[6]Archive[7]Snippets[8]Uses[9]Quotes

   (BUTTON) Toggle Menu

Making Software Last Forever

   Hero image for Making Software Last Forever
   27 min read
   Dan Stroot
   Dan Stroot
   May 25, 2023

   How many of us have bought a new home because our prior home was not
   quite meeting our needs? Maybe we needed an extra bedroom, or wanted a
   bigger backyard? Now, as a thought experiment, assume you couldn't sell
   your existing home. If you bought a new home, you'd have to "retire" or
   "decommission" your prior home (and your investment in it). Does that
   change your thinking?

   Further, imagine you had a team of five people maintaining your prior
   home, improving it, and keeping it updated, for the last ten years.
   You'd have a cumulative investment of 50 person/years in your existing
   home (5 people x 10 years) just in maintenance, on top of the initial
   investment. If each person was paid the equivalent of a software
   developer (we'll use $200k to include benefits, office space,
   leadership, etc.) you'd have an investment just in labor of $10 million
   dollars (50 person/years x $200,000). Would you walk away from that
   investment?

   When companies decide to re-write or replace an existing software
   application, they are making a similar decision. Existing software is
   "retired" or "decommissioned" (along with its cumulative investment).
   Yet the belief that new code is always better than old is patently
   absurd. Old code has weathered and withstood the test of time. It has
   been battle-tested. You know it's failure modes. Bugs have been found,
   and more importantly, fixed.

   Joel Spolsky (of Fog Creek Software and Stack Overflow) describes
   system re-writes in "[10]Things You Should Never Do, Part I" as “the
   single worst strategic mistake that any software company can make.”

   Continuing our home analogy, recent price increases for construction
   materials like lumber, drywall, and wiring (and frankly everything
   else) should, according to Economics 101, cause us to treat our current
   homes more dearly. Similarly, price increases for quality software
   engineers should force companies to treat existing software more
   dearly.

   Lots of current software started out as C software from the 1980s.
   Engineers don't often write software with portability as a goal at the
   beginning, but once something is relatively portable, it tends to stay
   that way. Code that was well designed and written often migrated from
   mini-computers to i386, from i386 to amd64, and now ARM and arch64,
   with a minimum of redesign or effort. You can take large, complicated
   programs from the 1980s written in C, and compile/run them on a modern
   Linux computer - even when the modern computer is running architectures
   which hadn't even been dreamt of when the software was originally
   written.

   Why can't software last forever? It's not made of wood, concrete, or
   steel. It doesn't "wear out", rot, weather, or rust. A working
   algorithm is a working algorithm. Technology doesn’t need to be
   beautiful, or impress other people, to be effective. Aren't
   technologists ultimately in the business of producing cost effective
   technology?

   I am going to attempt to convince you that maintaining your existing
   systems is one the most cost-effective technology investments you can
   make.

The World's Oldest Software Systems

   In 1958, the United States Department of Defense launched a new
   computer-based contract management system called "Mechanization of
   Contract Administration Services", or MOCAS (pronounced “MOH-cass”). In
   2015, [11]MIT Technology Review stated that MOCAS was the oldest
   computer program in continuous use they could verify. At that time
   MOCAS managed about $1.3 trillion in government obligations and 340,000
   contracts.

   According to the [12]Guinness Book of World Records, the oldest
   software system in use today is either the [13]SABRE Airline
   Reservation System (introduced in 1960), or the IRS Individual Master
   File (IMF) and Business Master File (BMF) systems introduced in
   1962–63.

   SABRE went online in 1960. It had cost $40 million to develop and
   install (about $400 million in 2022 dollars). The system took over all
   American Airlines booking functions in 1964, and the system was
   expanded to provide access to external travel agents in 1976.

   What is the secret to the long lifespan of these systems? Shouldn't
   companies with long-lived products (annuities, life insurance, etc.)
   study these examples? After all, they need systems to support products
   that last most of a human lifespan. However, shouldn't all companies
   want to their investments in software to last as long as possible?

Maintenance is About Making Something Last

   We spoke of SABRE above, and we know that airlines recognize the value
   of maintenance. Commercial aircraft are inspected at least once every
   two days. Engines, hydraulics, environmental, and electrical systems
   all have additional maintenance schedules. A "heavy" maintenance
   inspection occurs once every few years. This process maintains the
   aircraft's service life over decades.

   On average, an aircraft is operable for about 30 years before it must
   be retired. A Boeing 747 can endure 35,000 pressurization cycles —
   roughly 135,000 to 165,000 flight hours — before metal fatigue sets in.
   However, most older airframes are retired for fuel-efficiency reasons,
   not because they're worn out.

   Even stuctures made of grass can last indefinitely. [14]Inca rope
   bridges were simple suspension bridges constructed by the Inca Empire.
   The bridges were an integral part of the Inca road system were
   constructed using ichu grass.

   Inca Rope Bridge

   Even though they were made of grass, these bridges were maintained with
   such regularity and attention they lasted centuries. The bridge's
   strength and reliability came from the fact that each cable was
   replaced every June.

   The goal of maintenance is catching problems before they happen. That’s
   the difference between maintenance and repair. Repair is about fixing
   something that’s already broken. Maintenance is about making something
   last.

Unfortunately, Maintenance is Chronically Undervalued

   Maintenance is one of the easiest things to cut when budgets get tight.
   Some legacy software systems have decades of underinvestment in
   maintenance. This leads up to the inevitable "we have to replace it"
   discussion - which somehow always sounds more persuasive (even though
   it’s more expensive and riskier) than arguing to invest in system
   rehabilitation and deferred system maintenance.

   Executives generally can't refuse "repair" work because the system is
   broken and must be fixed. However, maintenance is a tougher sell. It’s
   not strictly necessary — or at least it doesn’t seem to be until things
   start falling apart. It is so easy to divert maintenance budget into a
   halo project that gets an executive noticed (and possibly promoted)
   before the long-term effects of underinvestment in maintenance become
   visible. Even worse, the executive is also admired for reducing the
   costs of maintenance and switching costs from "run" to "grow" - while
   they are torpedoing the company under the waterline.

   The other challenge is conflating enhancement work with maintenance
   work. Imagine you have $1,000 and you want to add a sunroof to your
   car, but you also need new tires (which coincidentally also cost
   $1,000). You have to replace the tires every so often, but a sunroof is
   "forever" right? If you spend the money on the sunroof the tires could
   get replaced next month, or maybe the month after - they'll last a
   couple more months, won't they?

   With software, users can't see "the bald tires" - they only thing they
   see, or experience (and value), are new features and capabilities.
   Pressure is always present to cut costs and to add new features. The
   result is budget always swings away from maintenance work towards
   enhancements.

   Finally, maintenance work is typically an operational cost, yet
   building a new system, or a significant new feature, can often be
   capitalized - making the future costs someone else's problem.

Risks of Replacing Software Systems

   It's usually not the design or the age of a system that causes it to
   fail but rather neglect. People fail to maintain software systems
   because they are not given the time, incentives, or resources to
   maintain them.

     "Most of the systems I work on rescuing are not badly built. They
     are badly maintained."
     — Marianne Bellotti, Kill it With Fire

   Once a system degrades it is an enormous challenge to fund deferred
   maintenance (or "technical debt"). No one plans for it, no one wants to
   pay for it, and no engineer wants to do it. Initiatives to restore
   operational excellence, much the way one would fix up an old house,
   tend to have few volunteers among engineering teams. No one gets
   noticed doing maintenance. No one ever gets promoted because of
   maintenance.

   It should be clear why engineers prefer to re-write a system rather
   than maintain it. They get to "write a new story" rather than edit
   someone else's. They will attempt to convince a senior executive to
   fund a project to replace a problematic system by describing all the
   new features and capabilities that could be added as well as how "bad"
   the existing, unmaintained, system has become. Further, they will get
   to use modern technology that makes them much more valuable in the
   market.

   Incentives aside, engineering teams tend to gravitate toward system
   rewrites because they incorrectly think of old systems as specs. They
   assume that since an old system works, the functional risks have been
   eliminated. They can focus on adding more features to the new system or
   make changes to the underlying architecture without worry. Either they
   do not perceive the ambiguity these changes introduce, or they see such
   ambiguity positively, imagining only gains in performance and the
   potential for innovation.

   Why not authorize that multimillion-dollar replacement if the engineers
   convince management the existing system is doomed? Eventually a
   "replacement" project will be funded (typically at a much higher
   expenditure than rehabilitating the existing system). Even if the
   executives are not listening to the engineers, they will be listening
   to external consultants telling them they are falling behind.

   What do you do with the old system while you’re building the new one?
   Most organizations put the old system on “life support” and give it
   only the resources for patches and fixes necessary to keep it running.
   This reduces maintenance even further and becomes a self-fulfilling
   prophecy that the existing system will eventually fail.

   Who gets to work on the new system, and who takes on the maintenance
   tasks of the old system? If the old system is written in older
   technology that the company is actively abandoning, the team
   maintaining the old system is essentially sitting around waiting to be
   fired. And don’t kid yourself, they know it. If the people maintaining
   the old system are not participating in the creation of the new system,
   you should expect that they are also looking for new jobs. If they
   leave before your new system is operational, you lose both their
   expertise and their institutional knowledge.

   If the new project falls behind schedule (and it almost certainly
   will), the existing system continues to degrade, and knowledge
   continues to walk out the door. If the new project fails and is
   subsequently canceled, the gap between the legacy system and
   operational excellence has widened significantly in the meantime.

   This explains why executives are loathe to cancel system replacement
   projects even when they are obviously years behind schedule and failing
   to live up to expectations. Stopping the replacement project seems
   impossible because the legacy system is now so degraded that restoring
   it to operational excellence seems impossible. Plus, politically
   canceling a marquee project can be career suicide for the sponsoring
   executive(s). Much better to do "deep dives" and "assessments" on why
   the project is failing and soldier on than cancel it.

   The interim state is not pretty. The company now has two systems to
   operate, much higher costs and new risks.
     * The new system will have high costs, limited functionality, new and
       unique errors/issues, and lower volumes (so the "per unit cost" of
       the new system will be quite high).
     * The older system will still be running most of the business, and
       usually all of the complex business, while having lost its best
       engineers and subject matter experts. Its maintenance budget will
       have been whittled down to nothing to redirect spending to
       implement (save?) the new system. This system will be in grave
       danger to significant system failure (which proponents of the new
       system will use to justify the investment in the new system, not
       admitting to a self-fulfilling prophecy).

   Neither system will exhibit operational excellence, and both put the
   organization at significant risk in addition to the higher costs and
   complexity of running two systems.

Maintaining Software to Last Forever

   As I discussed in [15]How Software Learns, software adapts over time -
   as it is continually refined and reshaped by maintenance and
   enhancements. Maintenance is crucial to software's lifespan and
   business relevance/value. When software systems are first developed,
   they are based on a prediction of the future - a prediction of the
   future that we know is wrong even as we make it. No set of requirements
   have ever been perfect. However, all new systems become "less wrong" as
   time, experience, and knowledge are continually added (e.g.,
   maintenance).

   Futureproofing means constantly rethinking and iterating on the
   existing system. We know from both research and experience that
   iterating and maintaining existing solutions is a much more likely, and
   less expensive, way to improve software's lifespan and functionality.

   Before choosing to replace a system that needs deferred maintenance
   remember it’s the lack of maintenance that create the impression that
   failure is inevitable, and pushes otherwise rational engineers and
   executives toward rewrites or replacements. What mechanisms will
   prevent lack of maintenance from eventually dooming the brand-new
   system? Has the true root problem been addressed?

   Robust maintenance practices could preserve software for decades, but
   first maintenance must be valued, funded, and applied. To maintain
   software properly we have to consider:
    1. How do you measure the overall health of a system?
    2. How do you define and manage maintenance work?
    3. How do you define a reasonable maintenance budget? How can you
       protect that budget?
    4. How do you motivate engineers to perform maintenance?

1. How do you measure the overall health of a system?

Objective measures

    1. Maintenance Backlog — If you added up all the open work requests,
       including work the software engineers deem necessary to eliminate
       technical debt, what is the total amount of effort? Now, divide
       that by the team capacity. For example, imagine you have a total
       amount of work of 560 days, and you have one person assigned to
       support the system - they work approximately 200 days annually. The
       backlog in days in 560, but in time it is 2.8 years (560 days / 200
       days/year = 2.8 years). What is a reasonable amount of backlog
       time?
    2. System Reliability/Downtime — If you added up all the time the
       system is down in a given period, what is the total amount? What is
       the user or customer impact of that downtime? Conversely, what
       would reducing that downtime be worth? What is the relationship of
       maintenance and downtime? In other words, does the system need to
       be taken down to maintain it (planned maintenance)? Does planned
       maintenance reduce unplanned downtime?
    3. Capacity/Performance Constraints — Is the existing hitting capacity
       constraints that will prevent future growth of the business? How
       unpredictable are the system capacity demands? What is the customer
       experience when the system capacity is breached? What is
       relationship between hardware and software that constrains the
       system? Is the software performant? Can hardware solve the problem?

Subjective measures

    1. User Satisfaction: User satisfaction includes both how happy your
       employees are with the applications and/or how well those
       applications meet your customer's needs. Many times I have found
       the technology team and the business users arguing over "bug" vs.
       "enhancement". It is a way of assigning blame. "Bug" means its
       engineering's fault, "enhancement" means it was a missed
       requirement. When emotions run hot it means that the maintenance
       budget is insufficient. I always tell everyone they are both just
       maintenance and the only important decision is which to prioritize
       and fix first.
    2. “Shadow IT” — If you used applications in the past that didn’t meet
       employees’ needs, and didn’t have a good governance plan to address
       problems, you may have noticed employees found other solutions on
       their own. This is an indication of underfunded maintenance.
    3. Adaptable Architecture — "The cloud", API-based integration, and
       unlocking your data are no longer “nice to haves.” Your
       architecture needs to adapt. If these are challenges, then the
       architecture must be addressed.
    4. Governance — Healthy application architecture isn’t just about
       technology—it’s also about having well-documented and
       well-understood governance documents that guide technology
       investments for your organization. Good governance helps create
       adaptable architecture and avoid “shadow IT” applications.

2. How do you define maintenance work?

   There are four general types of software maintenance. The first two
   types take up the majority of most organizations' maintenance budget,
   and may not even be considered maintenance - however, all four types
   must be funded adequately for software to remain healthy. If you can't
   fully address types three and four your maintenance budget is
   inadequate.

1. Corrective Software Maintenance (more accurately called "repair")

   Corrective software maintenance is necessary when something goes wrong
   in a piece of software including faults and errors. These can have a
   widespread impact on the functionality of the software in general and
   therefore must be addressed as quickly as possible. However, it is
   important to consider repair work separate from the other types of
   maintenance because repair work must get done. Note: this is generally
   the only type of work that happens when a system is put on "life
   support".

2. Perfective Software Maintenance (more accurately called "enhancements")

   Once software is released and is being used new issues and ideas come
   to the surface. Users will think up new features or requirements that
   they would like to see. Perfective software maintenance aims to adjust
   software by adding new features as necessary (and removing features
   that are irrelevant or not effective). This process keeps software
   relevant as the market, and user needs, evolve. It there is funding
   beyond "life support" it usually is spent here.

3. Preventative Software Maintenance (true maintenance is catching problems
before they happen.)

   Preventative software maintenance is looking into the future so that
   your software can keep working as desired for as long as possible. This
   includes making necessary changes, upgrades, and adaptations.
   Preventative software maintenance may address small issues which at the
   given time may lack significance but may turn into larger problems in
   the future. These are called latent faults which need to be detected
   and corrected to make sure that they won’t turn into effective faults.
   This type of maintenance is generally underfunded.

4. Adaptive Software Maintenance (true maintenance adapts to changes)

   Adaptive software maintenance is responding to the changing technology
   landscape, as well as new company policies and rules regarding your
   software. These include operating system changes, using cloud
   technology, security policies, hardware changes, etc. When these
   changes are performed, your software (and possibly architecture) must
   adapt to properly meet new requirements and meet current security and
   other policies.

3. How do you define a reasonable maintenance budget? How can you protect
that budget?

   In the case of the Inca rope bridges what was the cost of maintenance
   annually? Let's assume some of the build work was site preparation and
   building the stone anchors on each side, but most of the work was
   constructing the bridge itself. Since the bridge was entirely replaced
   each year, the maintenance costs could be as much as 80% of the initial
   build effort, every year.

   Comparing to "software as a service" (SaaS) vendors is difficult
   because they have shifted to a subscription model that bundles
   infrastructure, enhancements, and ongoing maintenance. Prior to SaaS
   subscription-based pricing one would typically buy a perpetual license
   plus maintenance at ~20-30% annual cost of the license to obtain
   support and updates.

   Side note: Now that the SaaS annual costs are commingled, some
   enterprises fall into the trap that “building it is cheaper because we
   pay up front but then it will cost less in the long run” assuming the
   "long run" almost always underprices infrastructure and assumes near
   zero maintenance cost. In the case of a brand-new, internally designed
   and developed software system - one that is well architected, well
   designed, well built, and meets all reliability, scalability, and
   performance needs (i.e., fantasy software) it's conceivable that there
   is no maintenance necessary for some period of time - but very
   unlikely.

   So, maintenance costs can have a very wide range. A general rule of
   thumb is 20-30% of the initial build cost will be required for ongoing
   maintenance work annually. However, maintenance costs usually start off
   lower and increase over time. They are also unpredictable costs that
   are hard to budget.

   The challenges should be obvious. First, budgets in large organizations
   tend be last year's budget plus 2-3%. If you start with a maintenance
   budget of zero on a new system, how do you ever get to the point of a
   healthy maintenance budget in the future? Second, maintenance costs are
   unpredictable, and organizations hate unpredictable costs. It's
   impossible to say when the next new hardware, or storage, or
   programming construct will occur, or when the existing system will hit
   a performance or scalability inflection point.

   This is like buying a brand-new car. The maintenance costs are
   negligible in the first couple years, until they start to creep up.
   Then things start to need maintenance, replacement, or repair. As the
   car ages the maintenance costs continue to increase until at some point
   it makes economic sense to buy another new car. Except none of us wait
   that long. Most of us buy new cars before our old one is completely
   worn out. As a counter-example, in Cuba some cars have been maintained
   meticulously for 30-40 years and run better than new.

Protecting your maintenance budget - creating a "maintenance fund"

   We know that maintenance cost increase over time, and the costs of
   proper maintenance are unpredictable. In addition, there is some amount
   of management discretion that can be applied. When your house needs a
   new roof it's reasonable to defer it through summer, but it probably
   needs to be done before winter.

   Since business require predictability of costs, unpredictable
   maintenance costs are easy to defer. "We didn't budget for that; we'll
   have to put it in next year's budget." Except of course in the budget
   process it will compete with other projects and enhancement work, where
   it's again likely to be deprioritized.

   What's the solution? Could it be possible to create some type of
   maintenance fund where a predictable amount is budgeted each year, and
   then spent "unpredictably" when/as needed? Could this also be a
   solution to preventing executives from diverting maintenance budget
   into pet projects by protecting this maintenance fund in some fashion?

4. How do you motivate software engineers to perform maintenance?

   There is a Chinese proverb about a discussion between a king and a
   famous doctor. The well-known doctor explains to the king that his
   brother (who is also a doctor) is superior at medicine, but he is
   unknown because he always successfully treats small illnesses,
   preventing them from evolving into more serious or terminal ones. So,
   people say "Oh he is a fine doctor, but he only treats minor
   illnesses". It's true: [16]Nobody Ever Gets Credit for Fixing Problems
   that Never Happened.

   To most software engineers, legacy systems seem like torturous dead-end
   work, but the reality is systems that are not important get turned off.
   Working on "estate" systems means working on some of the most critical
   systems that exist — computers that govern millions of people’s lives
   in enumerable ways. This is not the work of technical janitors, but
   battlefield surgeons.

   Engineering loves new technology. It gains the engineers attention and
   industry marketability. [17]Boring technology on the other hand is
   great for the company. The engineering cost is lower, and the skills
   are easier to obtain and keep, because these engineers are not being
   pulled out of your organization for double their salary by Amazon or
   Google.

   Well-designed, high-functioning software that is easy to understand
   usually blends in. Simple solutions do not do much to enhance one’s
   personal brand. Therefore, when an organization provides limited
   pathways to promotion for software engineers, they tend to make
   technical decisions that emphasize their individual contribution and
   technical prowess. You have to be very careful to reward what you want
   from your engineering team.

   What earns them the acknowledgment of their peers? What gets people
   seen is what they will ultimately prioritize, even if those behaviors
   are in open conflict with the official direction they receive from
   management. In most organizations shipping new code gets attention,
   while technical debt accrues silently in the background.

   The specific form of acknowledgment also matters a lot. Positive
   reinforcement in the form of social recognition tends to be a more
   effective motivator than the traditional incentive structure of
   promotions, raises, and bonuses. Behavioral economist Dan Ariely
   attributes this to the difference between social markets and
   traditional monetary-based markets. Social markets are governed by
   social norms (read: peer pressure and social capital), and they often
   inspire people to work harder and longer than much more expensive
   incentives that represent the traditional work-for-pay exchange. In
   other words, people will work really hard for positive reinforcement
   from their peers.

Legacy System Modernization

   Unmaintained software will certainly die at some point. Due to factors
   discussed above, software does not always receive the proper amount of
   maintenance to remain healthy. Eventually a larger modernization effort
   may become necessary to restore a system to operational and functional
   excellence.

   Legacy modernization projects start off feeling easy. The organization
   once had a reliable working system and kept it running for years. All
   the modernizing team should need to do is simply reshape it using
   better technology, better architecture, the benefit of hindsight, and
   improved tooling. It should be simple. But, because people do not see
   the hidden technical challenges they are about to uncover, they also
   assume the work will be boring. There’s little glory to be had
   re-implementing a solved problem.

   Modernization projects are also typically the ones organizations just
   want to get out of the way, so they launch into them unprepared for the
   time and resource commitments they require. Modernization projects take
   months, if not years of work. Keeping a team of engineers focused,
   inspired, and motivated from beginning to end is difficult. Keeping
   their senior leadership prepared to invest in what is, in effect,
   something they already have is a huge challenge. Creating momentum and
   sustaining it are where most modernization projects fail.

   The hard part about legacy modernization is the "system around the
   system". The organization, its communication structures, its politics,
   and its incentives are all intertwined with the technical product in
   such a way that to improve the product, you must do it by turning the
   gears of this other, complex, undocumented system. Pay attention to
   politics and culture. Technology is at most only 50% of the legacy
   problem, ways of working, organization structure and
   leadership/sponsorship are just as important to success.

   To do this, you need to overcome people’s natural skepticism and get
   them to buy in. The important word in the phrase "proof of concept" is
   proof. You need to prove to people that success is possible and worth
   doing. It can't be just an MVP, because [18]MVPs are dangerous.. A red
   flag is raised when companies talk about the phases of their
   modernization plans in terms of which technologies they are going to
   use rather than what value they will add.

   For all that people talk about COBOL dying off, it is good at certain
   tasks. The problem with most old COBOL systems is that they were
   designed at a time when COBOL was the only option. Start by sorting
   which parts of the system are in COBOL because COBOL is good at
   performing that task, and which parts are in COBOL because there were
   no other technologies available. Once you have that mapping, start by
   pulling the latter off into separate services that are written and
   designed using the technology we would choose for that task today.

   Going through the exercise of understanding what functionality is fit
   for use for specific languages/technologies not only gives engineers a
   way to keep building their skillsets but also is an opportunity to pair
   with other engineers who have different/complimentary skills. This
   exchange also has the benefit of diffusing the understanding of the
   system to a broader group of people without needing to solely rely on
   documentation (which never exists).

   Counterintuitively, SLAs/SLOs are valuable because they provide a
   "failure budget". When organizations stop aiming for perfection and
   accept that all systems will occasionally fail, they stop letting their
   technology rot for fear of change. In most cases, mean time to recovery
   (MTTR) is a more useful statistic to push than reliability. MTTR tracks
   how long it takes the organization to recover from failure. Resilience
   in engineering is all about recovering stronger from failure. That
   means better monitoring, better documentation, and better processes for
   restoring services, but you can’t improve any of that if you don’t
   occasionally fail.

   Although a system that constantly breaks, or that breaks in unexpected
   ways without warning, will lose its users’ trust, the reverse isn’t
   necessarily true. A system that never breaks doesn’t necessarily
   inspire high degrees of trust - and its maintenance budget is even
   easier to cut.

   People take systems that are too reliable for granted. Italian
   researchers Cristiano Castelfranchi and Rino Falcone have been
   advancing a general model of trust that postulates trust naturally
   degrades over time, regardless of whether any action has been taken to
   violate that trust. Under Castelfranchi and Falcone’s model,
   maintaining trust doesn’t mean establishing a perfect record; it means
   continuing to rack up observations of resilience. If a piece of
   technology is so reliable it has been completely forgotten, it is not
   creating those regular observations. Through no fault of the
   technology, the user’s trust in it slowly deteriorates.

   When both observability and testing are lacking on your legacy system,
   observability comes first. Tests tell you only what shouldn’t fail;
   monitoring tells you what is failing. Don’t forget: a perfect record
   will always be broken, but resilience is an accomplishment that lasts.
   Modern engineering teams use stats like service level objectives, error
   budgets, and mean time to recovery to move the emphasis away from
   avoiding failure and toward recovering quickly.

Summary

   Maintenance mostly happens out of sight, mysteriously. If we notice it,
   it’s a nuisance. When road crews block off sections of highway to fix
   cracks or potholes, we treat it as an obstruction, not a vital and
   necessary process. This is especially true in the public sector: it’s
   almost impossible to get governmental action on, or voter interest in,
   spending on preventive maintenance, yet governments make seemly
   unlimited funds available once we have a disaster. We are okay spending
   a massive amount of money to fix a problem, but consistently resist
   spending a much smaller amount of money to prevent it; as a business
   strategy this makes no sense.

   The [19]Open Mainframe Project estimates that there about 250 billion
   lines of COBOL code running today in the world economy, and nearly all
   COBOL code contains critical business logic. Companies should maintain
   that software and make it last as long as possible.

References

     * [20]Things You Should Never Do, Part I
     * [21]Patterns of Legacy Displacement
     * [22]Kill It with Fire: Manage Aging Computer Systems (and Future
       Proof Modern Ones)
     * [23]Building software to last forever
     * [24]The Disappearing Art Of Maintenance
     * [25]Inca rope bridge
     * [26]How Often Do Commercial Airplanes Need Maintenance?
     * [27]Nobody Ever Gets Credit for Fixing Problems that Never Happened
     * [28]Boring Technology Club
     * [29]Open Mainframe Project 2021 Annual Report
     * [30]How Popular is COBOL?
     __________________________________________________________________

   Image Credit: Bill Gates, CEO of Microsoft, holds Windows 1.0 floppy
   discs.

   (Photo by Deborah Feingold/Corbis via Getty Images) This was the
   release of Windows 1.0. The beginning. Computers evolve. The underlying
   hardware, CPU, memory, and storage evolves. The operating system
   evolves. Of course, the software we use must evolve as well.

Sharing is Caring

   (BUTTON) (BUTTON) (BUTTON)

   [31]Edit this page

   Dan Stroot · Blog
   I love building things. Made in California. Family man, technologist
   and Hacker News aficionado. Eternally curious.
   [32]Join me on Twitter.[33]Join me on LinkedIn.[34]Join me on GitHub.
   Crafted with ♥️ in California. © 2024, [35]Dan Stroot

References

   Visible links:
   1. https://www.danstroot.com/feed.xml
   2. https://www.googletagmanager.com/ns.html?id=GTM-55JC288
   3. https://www.danstroot.com/
   4. https://www.danstroot.com/
   5. https://www.danstroot.com/about
   6. https://www.danstroot.com/archive
   7. https://www.danstroot.com/snippets
   8. https://www.danstroot.com/uses
   9. https://www.danstroot.com/quotes
  10. https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/
  11. https://www.technologyreview.com/2015/08/06/166822/what-is-the-oldest-computer-program-still-in-use/
  12. https://www.guinnessworldrecords.com/world-records/636196-oldest-software-system-in-continuous-use
  13. https://en.wikipedia.org/wiki/Sabre_(travel_reservation_system)
  14. https://en.wikipedia.org/wiki/Inca_rope_bridge
  15. https://www.danstroot.com/posts/2022-06-05-how-software-learns
  16. https://web.mit.edu/nelsonr/www/Repenning=Sterman_CMR_su01_.pdf
  17. https://engineering.atspotify.com/2013/02/in-praise-of-boring-technology/
  18. https://www.danstroot.com/posts/2021-12-27-dangerous-mvps
  19. https://www.openmainframeproject.org/
  20. https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/
  21. https://martinfowler.com/articles/patterns-legacy-displacement/
  22. https://www.amazon.com/Kill-Fire-Manage-Computer-Systems/dp/1718501188
  23. https://herman.bearblog.dev/building-software-to-last-forever/
  24. https://www.noemamag.com/the-disappearing-art-of-maintenance/
  25. https://en.wikipedia.org/wiki/Inca_rope_bridge
  26. https://monroeaerospace.com/blog/how-often-do-commercial-airplanes-need-maintenance/#:~:text=Commercial airplanes require frequent maintenance,inspection once every few years.
  27. https://web.mit.edu/nelsonr/www/Repenning=Sterman_CMR_su01_.pdf
  28. https://boringtechnology.club/
  29. https://www.openmainframeproject.org/wp-content/uploads/sites/11/2022/04/OMP_Annual_Report_2021_040622.pdf
  30. https://news.ycombinator.com/item?id=33999718
  31. https://github.com/dstroot/blog-next-13/blob/master/content/posts/2023-05-25-making_software_last_forever.mdx
  32. https://twitter.com/danstroot
  33. https://www.linkedin.com/in/danstroot
  34. https://github.com/dstroot/blog-next
  35. https://www.danstroot.com/analytics

   Hidden links:
  37. https://www.danstroot.com/search
  38. https://www.danstroot.com/about