From 14c4b8ef1a7613726dba7247da72d3cf2aa65520 Mon Sep 17 00:00:00 2001 From: David Eisinger Date: Sun, 12 Nov 2023 23:45:25 -0500 Subject: [PATCH] Making Software Last Forever --- content/notes/first-it-must-work/index.md | 8 +- static/archive/www-danstroot-com-wwjfi6.txt | 716 ++++++++++++++++++++ 2 files changed, 723 insertions(+), 1 deletion(-) create mode 100644 static/archive/www-danstroot-com-wwjfi6.txt diff --git a/content/notes/first-it-must-work/index.md b/content/notes/first-it-must-work/index.md index 80c542b..540712a 100644 --- a/content/notes/first-it-must-work/index.md +++ b/content/notes/first-it-must-work/index.md @@ -47,6 +47,10 @@ references: url: https://aaronmbushnell.com/keep-your-stack-short/ date: 2023-10-25T14:21:17Z file: aaronmbushnell-com-x5e6nn.txt +- title: "Dan Stroot · Blog · Making Software Last Forever" + url: https://www.danstroot.com/posts/2023-05-25-making_software_last_forever + date: 2023-11-13T04:44:27Z + file: www-danstroot-com-wwjfi6.txt --- ### Thoughts on priorities in software development @@ -71,6 +75,7 @@ references: * [There's still no silver bullet][9] * [No one actually wants simplicity][10] * [Keep your stack short][11] +* [Making Software Last Forever][12] [3]: https://grugbrain.dev/ [4]: https://world.hey.com/dhh/even-amazon-can-t-make-sense-of-serverless-or-microservices-59625580 @@ -80,4 +85,5 @@ references: [8]: https://mcfunley.com/choose-boring-technology [9]: https://changelog.com/posts/still-no-silver-bullet [10]: https://lukeplant.me.uk/blog/posts/no-one-actually-wants-simplicity/ -[11]: https://aaronmbushnell.com/keep-your-stack-short/ \ No newline at end of file +[11]: https://aaronmbushnell.com/keep-your-stack-short/ +[12]: https://www.danstroot.com/posts/2023-05-25-making_software_last_forever diff --git a/static/archive/www-danstroot-com-wwjfi6.txt b/static/archive/www-danstroot-com-wwjfi6.txt new file mode 100644 index 0000000..92344f0 --- /dev/null +++ b/static/archive/www-danstroot-com-wwjfi6.txt @@ -0,0 +1,716 @@ + #[1]Blog Posts RSS + + IFRAME: [2]https://www.googletagmanager.com/ns.html?id=GTM-55JC288 + + [3]Dan Stroot + [4]Home[5]About[6]Archive[7]Snippets[8]Uses + + (BUTTON) Toggle Menu + +Making Software Last Forever + + Hero image for Making Software Last Forever + 27 min read + Dan Stroot + Dan Stroot + May 25, 2023 + + How many of us have bought a new home because our prior home was not + quite meeting our needs? Maybe we needed an extra bedroom, or wanted a + bigger backyard? Now, as a thought experiment, assume you couldn't sell + your existing home. If you bought a new home, you'd have to "retire" or + "decommission" your prior home (and your investment in it). Does that + change your thinking? + + Further, imagine you had a team of five people maintaining your prior + home, improving it, and keeping it updated, for the last ten years. + You'd have a cumulative investment of 50 person/years in your existing + home (5 people x 10 years) just in maintenance, on top of the initial + investment. If each person was paid the equivalent of a software + developer (we'll use $200k to include benefits, office space, + leadership, etc.) you'd have an investment just in labor of $10 million + dollars (50 person/years x $200,000). Would you walk away from that + investment? + + When companies decide to re-write or replace an existing software + application, they are making a similar decision. Existing software is + "retired" or "decommissioned" (along with its cumulative investment). + Yet the belief that new code is always better than old is patently + absurd. Old code has weathered and withstood the test of time. It has + been battle-tested. You know it's failure modes. Bugs have been found, + and more importantly, fixed. + + Joel Spolsky (of Fog Creek Software and Stack Overflow) describes + system re-writes in "[9]Things You Should Never Do, Part I" as “the + single worst strategic mistake that any software company can make.” + + Continuing our home analogy, recent price increases for construction + materials like lumber, drywall, and wiring (and frankly everything + else) should, according to Economics 101, cause us to treat our current + homes more dearly. Similarly, price increases for quality software + engineers should force companies to treat existing software more + dearly. + + Lots of current software started out as C software from the 1980s. + Engineers don't often write software with portability as a goal at the + beginning, but once something is relatively portable, it tends to stay + that way. Code that was well designed and written often migrated from + mini-computers to i386, from i386 to amd64, and now ARM and arch64, + with a minimum of redesign or effort. You can take large, complicated + programs from the 1980s written in C, and compile/run them on a modern + Linux computer - even when the modern computer is running architectures + which hadn't even been dreamt of when the software was originally + written. + + Why can't software last forever? It's not made of wood, concrete, or + steel. It doesn't "wear out", rot, weather, or rust. A working + algorithm is a working algorithm. Technology doesn’t need to be + beautiful, or impress other people, to be effective. Aren't + technologists ultimately in the business of producing cost effective + technology? + + I am going to attempt to convince you that maintaining your existing + systems is one the most cost-effective technology investments you can + make. + +The World's Oldest Software Systems + + In 1958, the United States Department of Defense launched a new + computer-based contract management system called "Mechanization of + Contract Administration Services", or MOCAS (pronounced “MOH-cass”). In + 2015, [10]MIT Technology Review stated that MOCAS was the oldest + computer program in continuous use they could verify. At that time + MOCAS managed about $1.3 trillion in government obligations and 340,000 + contracts. + + According to the [11]Guinness Book of World Records, the oldest + software system in use today is either the [12]SABRE Airline + Reservation System (introduced in 1960), or the IRS Individual Master + File (IMF) and Business Master File (BMF) systems introduced in + 1962–63. + + SABRE went online in 1960. It had cost $40 million to develop and + install (about $400 million in 2022 dollars). The system took over all + American Airlines booking functions in 1964, and the system was + expanded to provide access to external travel agents in 1976. + + What is the secret to the long lifespan of these systems? Shouldn't + companies with long-lived products (annuities, life insurance, etc.) + study these examples? After all, they need systems to support products + that last most of a human lifespan. However, shouldn't all companies + want to their investments in software to last as long as possible? + +Maintenance is About Making Something Last + + We spoke of SABRE above, and we know that airlines recognize the value + of maintenance. Commercial aircraft are inspected at least once every + two days. Engines, hydraulics, environmental, and electrical systems + all have additional maintenance schedules. A "heavy" maintenance + inspection occurs once every few years. This process maintains the + aircraft's service life over decades. + + On average, an aircraft is operable for about 30 years before it must + be retired. A Boeing 747 can endure 35,000 pressurization cycles — + roughly 135,000 to 165,000 flight hours — before metal fatigue sets in. + However, most older airframes are retired for fuel-efficiency reasons, + not because they're worn out. + + Even stuctures made of grass can last indefinitely. [13]Inca rope + bridges were simple suspension bridges constructed by the Inca Empire. + The bridges were an integral part of the Inca road system were + constructed using ichu grass. + + Inca Rope Bridge + + Even though they were made of grass, these bridges were maintained with + such regularity and attention they lasted centuries. The bridge's + strength and reliability came from the fact that each cable was + replaced every June. + + The goal of maintenance is catching problems before they happen. That’s + the difference between maintenance and repair. Repair is about fixing + something that’s already broken. Maintenance is about making something + last. + +Unfortunately, Maintenance is Chronically Undervalued + + Maintenance is one of the easiest things to cut when budgets get tight. + Some legacy software systems have decades of underinvestment in + maintenance. This leads up to the inevitable "we have to replace it" + discussion - which somehow always sounds more persuasive (even though + it’s more expensive and riskier) than arguing to invest in system + rehabilitation and deferred system maintenance. + + Executives generally can't refuse "repair" work because the system is + broken and must be fixed. However, maintenance is a tougher sell. It’s + not strictly necessary — or at least it doesn’t seem to be until things + start falling apart. It is so easy to divert maintenance budget into a + halo project that gets an executive noticed (and possibly promoted) + before the long-term effects of underinvestment in maintenance become + visible. Even worse, the executive is also admired for reducing the + costs of maintenance and switching costs from "run" to "grow" - while + they are torpedoing the company under the waterline. + + The other challenge is conflating enhancement work with maintenance + work. Imagine you have $1,000 and you want to add a sunroof to your + car, but you also need new tires (which coincidentally also cost + $1,000). You have to replace the tires every so often, but a sunroof is + "forever" right? If you spend the money on the sunroof the tires could + get replaced next month, or maybe the month after - they'll last a + couple more months, won't they? + + With software, users can't see "the bald tires" - they only thing they + see, or experience (and value), are new features and capabilities. + Pressure is always present to cut costs and to add new features. The + result is budget always swings away from maintenance work towards + enhancements. + + Finally, maintenance work is typically an operational cost, yet + building a new system, or a significant new feature, can often be + capitalized - making the future costs someone else's problem. + +Risks of Replacing Software Systems + + It's usually not the design or the age of a system that causes it to + fail but rather neglect. People fail to maintain software systems + because they are not given the time, incentives, or resources to + maintain them. + + "Most of the systems I work on rescuing are not badly built. They + are badly maintained." + — Marianne Bellotti, Kill it With Fire + + Once a system degrades it is an enormous challenge to fund deferred + maintenance (or "technical debt"). No one plans for it, no one wants to + pay for it, and no engineer wants to do it. Initiatives to restore + operational excellence, much the way one would fix up an old house, + tend to have few volunteers among engineering teams. No one gets + noticed doing maintenance. No one ever gets promoted because of + maintenance. + + It should be clear why engineers prefer to re-write a system rather + than maintain it. They get to "write a new story" rather than edit + someone else's. They will attempt to convince a senior executive to + fund a project to replace a problematic system by describing all the + new features and capabilities that could be added as well as how "bad" + the existing, unmaintained, system has become. Further, they will get + to use modern technology that makes them much more valuable in the + market. + + Incentives aside, engineering teams tend to gravitate toward system + rewrites because they incorrectly think of old systems as specs. They + assume that since an old system works, the functional risks have been + eliminated. They can focus on adding more features to the new system or + make changes to the underlying architecture without worry. Either they + do not perceive the ambiguity these changes introduce, or they see such + ambiguity positively, imagining only gains in performance and the + potential for innovation. + + Why not authorize that multimillion-dollar replacement if the engineers + convince management the existing system is doomed? Eventually a + "replacement" project will be funded (typically at a much higher + expenditure than rehabilitating the existing system). Even if the + executives are not listening to the engineers, they will be listening + to external consultants telling them they are falling behind. + + What do you do with the old system while you’re building the new one? + Most organizations put the old system on “life support” and give it + only the resources for patches and fixes necessary to keep it running. + This reduces maintenance even further and becomes a self-fulfilling + prophecy that the existing system will eventually fail. + + Who gets to work on the new system, and who takes on the maintenance + tasks of the old system? If the old system is written in older + technology that the company is actively abandoning, the team + maintaining the old system is essentially sitting around waiting to be + fired. And don’t kid yourself, they know it. If the people maintaining + the old system are not participating in the creation of the new system, + you should expect that they are also looking for new jobs. If they + leave before your new system is operational, you lose both their + expertise and their institutional knowledge. + + If the new project falls behind schedule (and it almost certainly + will), the existing system continues to degrade, and knowledge + continues to walk out the door. If the new project fails and is + subsequently canceled, the gap between the legacy system and + operational excellence has widened significantly in the meantime. + + This explains why executives are loathe to cancel system replacement + projects even when they are obviously years behind schedule and failing + to live up to expectations. Stopping the replacement project seems + impossible because the legacy system is now so degraded that restoring + it to operational excellence seems impossible. Plus, politically + canceling a marquee project can be career suicide for the sponsoring + executive(s). Much better to do "deep dives" and "assessments" on why + the project is failing and soldier on than cancel it. + + The interim state is not pretty. The company now has two systems to + operate, much higher costs and new risks. + * The new system will have high costs, limited functionality, new and + unique errors/issues, and lower volumes (so the "per unit cost" of + the new system will be quite high). + * The older system will still be running most of the business, and + usually all of the complex business, while having lost its best + engineers and subject matter experts. Its maintenance budget will + have been whittled down to nothing to redirect spending to + implement (save?) the new system. This system will be in grave + danger to significant system failure (which proponents of the new + system will use to justify the investment in the new system, not + admitting to a self-fulfilling prophecy). + + Neither system will exhibit operational excellence, and both put the + organization at significant risk in addition to the higher costs and + complexity of running two systems. + +Maintaining Software to Last Forever + + As I discussed in [14]How Software Learns, software adapts over time - + as it is continually refined and reshaped by maintenance and + enhancements. Maintenance is crucial to software's lifespan and + business relevance/value. When software systems are first developed, + they are based on a prediction of the future - a prediction of the + future that we know is wrong even as we make it. No set of requirements + have ever been perfect. However, all new systems become "less wrong" as + time, experience, and knowledge are continually added (e.g., + maintenance). + + Futureproofing means constantly rethinking and iterating on the + existing system. We know from both research and experience that + iterating and maintaining existing solutions is a much more likely, and + less expensive, way to improve software's lifespan and functionality. + + Before choosing to replace a system that needs deferred maintenance + remember it’s the lack of maintenance that create the impression that + failure is inevitable, and pushes otherwise rational engineers and + executives toward rewrites or replacements. What mechanisms will + prevent lack of maintenance from eventually dooming the brand-new + system? Has the true root problem been addressed? + + Robust maintenance practices could preserve software for decades, but + first maintenance must be valued, funded, and applied. To maintain + software properly we have to consider: + 1. How do you measure the overall health of a system? + 2. How do you define and manage maintenance work? + 3. How do you define a reasonable maintenance budget? How can you + protect that budget? + 4. How do you motivate engineers to perform maintenance? + +1. How do you measure the overall health of a system? + +Objective measures + + 1. Maintenance Backlog — If you added up all the open work requests, + including work the software engineers deem necessary to eliminate + technical debt, what is the total amount of effort? Now, divide + that by the team capacity. For example, imagine you have a total + amount of work of 560 days, and you have one person assigned to + support the system - they work approximately 200 days annually. The + backlog in days in 560, but in time it is 2.8 years (560 days / 200 + days/year = 2.8 years). What is a reasonable amount of backlog + time? + 2. System Reliability/Downtime — If you added up all the time the + system is down in a given period, what is the total amount? What is + the user or customer impact of that downtime? Conversely, what + would reducing that downtime be worth? What is the relationship of + maintenance and downtime? In other words, does the system need to + be taken down to maintain it (planned maintenance)? Does planned + maintenance reduce unplanned downtime? + 3. Capacity/Performance Constraints — Is the existing hitting capacity + constraints that will prevent future growth of the business? How + unpredictable are the system capacity demands? What is the customer + experience when the system capacity is breached? What is + relationship between hardware and software that constrains the + system? Is the software performant? Can hardware solve the problem? + +Subjective measures + + 1. User Satisfaction: User satisfaction includes both how happy your + employees are with the applications and/or how well those + applications meet your customer's needs. Many times I have found + the technology team and the business users arguing over "bug" vs. + "enhancement". It is a way of assigning blame. "Bug" means its + engineering's fault, "enhancement" means it was a missed + requirement. When emotions run hot it means that the maintenance + budget is insufficient. I always tell everyone they are both just + maintenance and the only important decision is which to prioritize + and fix first. + 2. “Shadow IT” — If you used applications in the past that didn’t meet + employees’ needs, and didn’t have a good governance plan to address + problems, you may have noticed employees found other solutions on + their own. This is an indication of underfunded maintenance. + 3. Adaptable Architecture — "The cloud", API-based integration, and + unlocking your data are no longer “nice to haves.” Your + architecture needs to adapt. If these are challenges, then the + architecture must be addressed. + 4. Governance — Healthy application architecture isn’t just about + technology—it’s also about having well-documented and + well-understood governance documents that guide technology + investments for your organization. Good governance helps create + adaptable architecture and avoid “shadow IT” applications. + +2. How do you define maintenance work? + + There are four general types of software maintenance. The first two + types take up the majority of most organizations' maintenance budget, + and may not even be considered maintenance - however, all four types + must be funded adequately for software to remain healthy. If you can't + fully address types three and four your maintenance budget is + inadequate. + +1. Corrective Software Maintenance (more accurately called "repair") + + Corrective software maintenance is necessary when something goes wrong + in a piece of software including faults and errors. These can have a + widespread impact on the functionality of the software in general and + therefore must be addressed as quickly as possible. However, it is + important to consider repair work separate from the other types of + maintenance because repair work must get done. Note: this is generally + the only type of work that happens when a system is put on "life + support". + +2. Perfective Software Maintenance (more accurately called "enhancements") + + Once software is released and is being used new issues and ideas come + to the surface. Users will think up new features or requirements that + they would like to see. Perfective software maintenance aims to adjust + software by adding new features as necessary (and removing features + that are irrelevant or not effective). This process keeps software + relevant as the market, and user needs, evolve. It there is funding + beyond "life support" it usually is spent here. + +3. Preventative Software Maintenance (true maintenance is catching problems +before they happen.) + + Preventative software maintenance is looking into the future so that + your software can keep working as desired for as long as possible. This + includes making necessary changes, upgrades, and adaptations. + Preventative software maintenance may address small issues which at the + given time may lack significance but may turn into larger problems in + the future. These are called latent faults which need to be detected + and corrected to make sure that they won’t turn into effective faults. + This type of maintenance is generally underfunded. + +4. Adaptive Software Maintenance (true maintenance adapts to changes) + + Adaptive software maintenance is responding to the changing technology + landscape, as well as new company policies and rules regarding your + software. These include operating system changes, using cloud + technology, security policies, hardware changes, etc. When these + changes are performed, your software (and possibly architecture) must + adapt to properly meet new requirements and meet current security and + other policies. + +3. How do you define a reasonable maintenance budget? How can you protect +that budget? + + In the case of the Inca rope bridges what was the cost of maintenance + annually? Let's assume some of the build work was site preparation and + building the stone anchors on each side, but most of the work was + constructing the bridge itself. Since the bridge was entirely replaced + each year, the maintenance costs could be as much as 80% of the initial + build effort, every year. + + Comparing to "software as a service" (SaaS) vendors is difficult + because they have shifted to a subscription model that bundles + infrastructure, enhancements, and ongoing maintenance. Prior to SaaS + subscription-based pricing one would typically buy a perpetual license + plus maintenance at ~20-30% annual cost of the license to obtain + support and updates. + + Side note: Now that the SaaS annual costs are commingled, some + enterprises fall into the trap that “building it is cheaper because we + pay up front but then it will cost less in the long run” assuming the + "long run" almost always underprices infrastructure and assumes near + zero maintenance cost. In the case of a brand-new, internally designed + and developed software system - one that is well architected, well + designed, well built, and meets all reliability, scalability, and + performance needs (i.e., fantasy software) it's conceivable that there + is no maintenance necessary for some period of time - but very + unlikely. + + So, maintenance costs can have a very wide range. A general rule of + thumb is 20-30% of the initial build cost will be required for ongoing + maintenance work annually. However, maintenance costs usually start off + lower and increase over time. They are also unpredictable costs that + are hard to budget. + + The challenges should be obvious. First, budgets in large organizations + tend be last year's budget plus 2-3%. If you start with a maintenance + budget of zero on a new system, how do you ever get to the point of a + healthy maintenance budget in the future? Second, maintenance costs are + unpredictable, and organizations hate unpredictable costs. It's + impossible to say when the next new hardware, or storage, or + programming construct will occur, or when the existing system will hit + a performance or scalability inflection point. + + This is like buying a brand-new car. The maintenance costs are + negligible in the first couple years, until they start to creep up. + Then things start to need maintenance, replacement, or repair. As the + car ages the maintenance costs continue to increase until at some point + it makes economic sense to buy another new car. Except none of us wait + that long. Most of us buy new cars before our old one is completely + worn out. As a counter-example, in Cuba some cars have been maintained + meticulously for 30-40 years and run better than new. + +Protecting your maintenance budget - creating a "maintenance fund" + + We know that maintenance cost increase over time, and the costs of + proper maintenance are unpredictable. In addition, there is some amount + of management discretion that can be applied. When your house needs a + new roof it's reasonable to defer it through summer, but it probably + needs to be done before winter. + + Since business require predictability of costs, unpredictable + maintenance costs are easy to defer. "We didn't budget for that; we'll + have to put it in next year's budget." Except of course in the budget + process it will compete with other projects and enhancement work, where + it's again likely to be deprioritized. + + What's the solution? Could it be possible to create some type of + maintenance fund where a predictable amount is budgeted each year, and + then spent "unpredictably" when/as needed? Could this also be a + solution to preventing executives from diverting maintenance budget + into pet projects by protecting this maintenance fund in some fashion? + +4. How do you motivate software engineers to perform maintenance? + + There is a Chinese proverb about a discussion between a king and a + famous doctor. The well-known doctor explains to the king that his + brother (who is also a doctor) is superior at medicine, but he is + unknown because he always successfully treats small illnesses, + preventing them from evolving into more serious or terminal ones. So, + people say "Oh he is a fine doctor, but he only treats minor + illnesses". It's true: [15]Nobody Ever Gets Credit for Fixing Problems + that Never Happened. + + To most software engineers, legacy systems seem like torturous dead-end + work, but the reality is systems that are not important get turned off. + Working on "estate" systems means working on some of the most critical + systems that exist — computers that govern millions of people’s lives + in enumerable ways. This is not the work of technical janitors, but + battlefield surgeons. + + Engineering loves new technology. It gains the engineers attention and + industry marketability. [16]Boring technology on the other hand is + great for the company. The engineering cost is lower, and the skills + are easier to obtain and keep, because these engineers are not being + pulled out of your organization for double their salary by Amazon or + Google. + + Well-designed, high-functioning software that is easy to understand + usually blends in. Simple solutions do not do much to enhance one’s + personal brand. Therefore, when an organization provides limited + pathways to promotion for software engineers, they tend to make + technical decisions that emphasize their individual contribution and + technical prowess. You have to be very careful to reward what you want + from your engineering team. + + What earns them the acknowledgment of their peers? What gets people + seen is what they will ultimately prioritize, even if those behaviors + are in open conflict with the official direction they receive from + management. In most organizations shipping new code gets attention, + while technical debt accrues silently in the background. + + The specific form of acknowledgment also matters a lot. Positive + reinforcement in the form of social recognition tends to be a more + effective motivator than the traditional incentive structure of + promotions, raises, and bonuses. Behavioral economist Dan Ariely + attributes this to the difference between social markets and + traditional monetary-based markets. Social markets are governed by + social norms (read: peer pressure and social capital), and they often + inspire people to work harder and longer than much more expensive + incentives that represent the traditional work-for-pay exchange. In + other words, people will work really hard for positive reinforcement + from their peers. + +Legacy System Modernization + + Unmaintained software will certainly die at some point. Due to factors + discussed above, software does not always receive the proper amount of + maintenance to remain healthy. Eventually a larger modernization effort + may become necessary to restore a system to operational and functional + excellence. + + Legacy modernization projects start off feeling easy. The organization + once had a reliable working system and kept it running for years. All + the modernizing team should need to do is simply reshape it using + better technology, better architecture, the benefit of hindsight, and + improved tooling. It should be simple. But, because people do not see + the hidden technical challenges they are about to uncover, they also + assume the work will be boring. There’s little glory to be had + re-implementing a solved problem. + + Modernization projects are also typically the ones organizations just + want to get out of the way, so they launch into them unprepared for the + time and resource commitments they require. Modernization projects take + months, if not years of work. Keeping a team of engineers focused, + inspired, and motivated from beginning to end is difficult. Keeping + their senior leadership prepared to invest in what is, in effect, + something they already have is a huge challenge. Creating momentum and + sustaining it are where most modernization projects fail. + + The hard part about legacy modernization is the "system around the + system". The organization, its communication structures, its politics, + and its incentives are all intertwined with the technical product in + such a way that to improve the product, you must do it by turning the + gears of this other, complex, undocumented system. Pay attention to + politics and culture. Technology is at most only 50% of the legacy + problem, ways of working, organization structure and + leadership/sponsorship are just as important to success. + + To do this, you need to overcome people’s natural skepticism and get + them to buy in. The important word in the phrase "proof of concept" is + proof. You need to prove to people that success is possible and worth + doing. It can't be just an MVP, because [17]MVPs are dangerous.. A red + flag is raised when companies talk about the phases of their + modernization plans in terms of which technologies they are going to + use rather than what value they will add. + + For all that people talk about COBOL dying off, it is good at certain + tasks. The problem with most old COBOL systems is that they were + designed at a time when COBOL was the only option. Start by sorting + which parts of the system are in COBOL because COBOL is good at + performing that task, and which parts are in COBOL because there were + no other technologies available. Once you have that mapping, start by + pulling the latter off into separate services that are written and + designed using the technology we would choose for that task today. + + Going through the exercise of understanding what functionality is fit + for use for specific languages/technologies not only gives engineers a + way to keep building their skillsets but also is an opportunity to pair + with other engineers who have different/complimentary skills. This + exchange also has the benefit of diffusing the understanding of the + system to a broader group of people without needing to solely rely on + documentation (which never exists). + + Counterintuitively, SLAs/SLOs are valuable because they provide a + "failure budget". When organizations stop aiming for perfection and + accept that all systems will occasionally fail, they stop letting their + technology rot for fear of change. In most cases, mean time to recovery + (MTTR) is a more useful statistic to push than reliability. MTTR tracks + how long it takes the organization to recover from failure. Resilience + in engineering is all about recovering stronger from failure. That + means better monitoring, better documentation, and better processes for + restoring services, but you can’t improve any of that if you don’t + occasionally fail. + + Although a system that constantly breaks, or that breaks in unexpected + ways without warning, will lose its users’ trust, the reverse isn’t + necessarily true. A system that never breaks doesn’t necessarily + inspire high degrees of trust - and its maintenance budget is even + easier to cut. + + People take systems that are too reliable for granted. Italian + researchers Cristiano Castelfranchi and Rino Falcone have been + advancing a general model of trust that postulates trust naturally + degrades over time, regardless of whether any action has been taken to + violate that trust. Under Castelfranchi and Falcone’s model, + maintaining trust doesn’t mean establishing a perfect record; it means + continuing to rack up observations of resilience. If a piece of + technology is so reliable it has been completely forgotten, it is not + creating those regular observations. Through no fault of the + technology, the user’s trust in it slowly deteriorates. + + When both observability and testing are lacking on your legacy system, + observability comes first. Tests tell you only what shouldn’t fail; + monitoring tells you what is failing. Don’t forget: a perfect record + will always be broken, but resilience is an accomplishment that lasts. + Modern engineering teams use stats like service level objectives, error + budgets, and mean time to recovery to move the emphasis away from + avoiding failure and toward recovering quickly. + +Summary + + Maintenance mostly happens out of sight, mysteriously. If we notice it, + it’s a nuisance. When road crews block off sections of highway to fix + cracks or potholes, we treat it as an obstruction, not a vital and + necessary process. This is especially true in the public sector: it’s + almost impossible to get governmental action on, or voter interest in, + spending on preventive maintenance, yet governments make seemly + unlimited funds available once we have a disaster. We are okay spending + a massive amount of money to fix a problem, but consistently resist + spending a much smaller amount of money to prevent it; as a business + strategy this makes no sense. + + The [18]Open Mainframe Project estimates that there about 250 billion + lines of COBOL code running today in the world economy, and nearly all + COBOL code contains critical business logic. Companies should maintain + that software and make it last as long as possible. + +References + + * [19]Things You Should Never Do, Part I + * [20]Patterns of Legacy Displacement + * [21]Kill It with Fire: Manage Aging Computer Systems (and Future + Proof Modern Ones) + * [22]Building software to last forever + * [23]The Disappearing Art Of Maintenance + * [24]Inca rope bridge + * [25]How Often Do Commercial Airplanes Need Maintenance? + * [26]Nobody Ever Gets Credit for Fixing Problems that Never Happened + * [27]Boring Technology Club + * [28]Open Mainframe Project 2021 Annual Report + * [29]How Popular is COBOL? + __________________________________________________________________ + + Image Credit: Bill Gates, CEO of Microsoft, holds Windows 1.0 floppy + discs. + + (Photo by Deborah Feingold/Corbis via Getty Images) This was the + release of Windows 1.0. The beginning. Computers evolve. The underlying + hardware, CPU, memory, and storage evolves. The operating system + evolves. Of course, the software we use must evolve as well. + +Sharing is Caring + + (BUTTON) (BUTTON) (BUTTON) + + [30]Edit this page + + Dan Stroot · Blog + I love building things. Made in California. Family man, technologist + and Hacker News aficionado. Eternally curious. + [31]Join me on Twitter.[32]Join me on LinkedIn.[33]Join me on GitHub. + Crafted with ♥️ in California. © 2023, [34]Dan Stroot + +References + + Visible links: + 1. https://www.danstroot.com/feed.xml + 2. https://www.googletagmanager.com/ns.html?id=GTM-55JC288 + 3. file:/// + 4. file:/// + 5. file:///about + 6. file:///archive + 7. file:///snippets + 8. file:///uses + 9. https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/ + 10. https://www.technologyreview.com/2015/08/06/166822/what-is-the-oldest-computer-program-still-in-use/ + 11. https://www.guinnessworldrecords.com/world-records/636196-oldest-software-system-in-continuous-use + 12. https://en.wikipedia.org/wiki/Sabre_(travel_reservation_system) + 13. https://en.wikipedia.org/wiki/Inca_rope_bridge + 14. file:///posts/2022-06-05-how-software-learns + 15. https://web.mit.edu/nelsonr/www/Repenning=Sterman_CMR_su01_.pdf + 16. https://engineering.atspotify.com/2013/02/in-praise-of-boring-technology/ + 17. file:///posts/2021-12-27-dangerous-mvps + 18. https://www.openmainframeproject.org/ + 19. https://www.joelonsoftware.com/2000/04/06/things-you-should-never-do-part-i/ + 20. https://martinfowler.com/articles/patterns-legacy-displacement/ + 21. https://www.amazon.com/Kill-Fire-Manage-Computer-Systems/dp/1718501188 + 22. https://herman.bearblog.dev/building-software-to-last-forever/ + 23. https://www.noemamag.com/the-disappearing-art-of-maintenance/ + 24. https://en.wikipedia.org/wiki/Inca_rope_bridge + 25. https://monroeaerospace.com/blog/how-often-do-commercial-airplanes-need-maintenance/#:~:text=Commercial airplanes require frequent maintenance,inspection once every few years. + 26. https://web.mit.edu/nelsonr/www/Repenning=Sterman_CMR_su01_.pdf + 27. https://boringtechnology.club/ + 28. https://www.openmainframeproject.org/wp-content/uploads/sites/11/2022/04/OMP_Annual_Report_2021_040622.pdf + 29. https://news.ycombinator.com/item?id=33999718 + 30. https://github.com/dstroot/blog-next-13/blob/master/content/posts/2023-05-25-making_software_last_forever.mdx + 31. https://twitter.com/danstroot + 32. https://www.linkedin.com/in/danstroot + 33. https://github.com/dstroot/blog-next + 34. file:///analytics + + Hidden links: + 36. file://localhost/search + 37. file://localhost/about