From 4b5e7ba111f2aeda188cbbb27fcbd9a5552fb271 Mon Sep 17 00:00:00 2001
From: David Eisinger
Date: Thu, 30 Nov 2023 09:52:42 -0500
Subject: [PATCH] Add 'good tests' post

---
 .../maintenance-matters-good-tests/index.md | 227 ++++++++++++++++++
 1 file changed, 227 insertions(+)
 create mode 100644 content/elsewhere/maintenance-matters-good-tests/index.md

diff --git a/content/elsewhere/maintenance-matters-good-tests/index.md b/content/elsewhere/maintenance-matters-good-tests/index.md
new file mode 100644
index 0000000..e818aaa
--- /dev/null
+++ b/content/elsewhere/maintenance-matters-good-tests/index.md
@@ -0,0 +1,227 @@
---
title: "Maintenance Matters: Good Tests"
date: 2023-11-29T09:41:18-05:00
draft: false
canonical_url: https://www.viget.com/articles/maintenance-matters-good-tests/
references:
- title: "A year of Rails - macwright.com"
  url: https://macwright.com/2021/02/18/a-year-of-rails.html
  date: 2023-07-03T02:52:03Z
  file: macwright-com-o4dndf.txt
---

*This article is part of a series focusing on how developers can center
and streamline software maintenance. The other articles in the
Maintenance Matters series are: [Continuous
Integration](/elsewhere/maintenance-matters-continuous-integration/),
[Code
Coverage](https://www.viget.com/articles/maintenance-matters-code-coverage/),
[Documentation](https://www.viget.com/articles/maintenance-matters-documentation/),
[Default
Formatting](https://www.viget.com/articles/maintenance-matters-default-formatting/),
[Building Helpful
Logs](https://www.viget.com/articles/maintenance-matters-helpful-logs/),
[Timely
Upgrades](https://www.viget.com/articles/maintenance-matters-timely-upgrades/),
and [Code
Reviews](https://www.viget.com/articles/maintenance-matters-code-reviews/).*

In this latest entry in our [Maintenance
Matters](https://www.viget.com/articles/maintenance-matters/) series, I
want to talk about automated testing. Annie said it well in her intro
post:

> There is a lot to say about testing, but from a maintainer's
> perspective, let's define good tests as tests that prevent
> regressions. Unit tests should have clear expectations and fail when
> behavior changes, so a developer can either update the expectations or
> fix their code. Feature tests should pass when features work and break
> when features break.

This is a topic better suited to a book than a blog post (and indeed
[there are
many](https://bookshop.org/search?keywords=software+testing)), but I do
think there are a few high-level concepts that are important to
internalize in order to build robust, long-lasting software --- I hope
to cover them here.

My first exposure to automated testing was with Ruby on Rails. Since
then, I've written production software in many different languages, but
nothing matches the Rails testing story. Tom MacWright said it well in
["A year of
Rails"](https://macwright.com/2021/02/18/a-year-of-rails.html):

> Testing fully-server-rendered applications, on the other hand, is
> amazing. A vanilla testing setup with Rails & RSpec can give you fast,
> stable, concise, and actually-useful test coverage. You can actually
> assert for behavior and navigate through an application like a user
> would. These tests are solving a simpler problem - making requests and
> parsing responses, without the need for a full browser or headless
> browser, without multiple kinds of state to track.
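
To make that concrete, here's a minimal sketch of the kind of test he's
describing -- a vanilla Rails/RSpec request spec. (The `Post` model, its
attributes, and the `posts_path` route are hypothetical stand-ins for
whatever your app actually has.)

```ruby
# spec/requests/posts_spec.rb -- hypothetical Post model and routes
require "rails_helper"

RSpec.describe "Posts", type: :request do
  it "lists published posts" do
    # Set up state directly in the database...
    published = Post.create!(title: "Testing Rails", published: true)
    draft = Post.create!(title: "Unfinished thoughts", published: false)

    # ...make a plain HTTP request, no browser required...
    get posts_path

    # ...and assert on what a user would actually see.
    expect(response).to have_http_status(:ok)
    expect(response.body).to include(published.title)
    expect(response.body).not_to include(draft.title)
  end
end
```

No mocks, no JavaScript driver, no waiting on a headless browser -- just
a request and a response you can assert against.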
Part of why Rails testing is so good, I think, is that it's baked into
the framework: run `rails generate` to create a new model or controller
and the relevant test files are created alongside it. This helped
establish a community focus on testing, which led to a robust
third-party ecosystem around it. Additionally, Ruby is such a flexible
language that automated testing is really the only viable way to ensure
things are working as expected.

This post isn't about Rails testing specifically, but I wanted to be
clear on my perspective before we really dive in. And with that out of
the way, here's what we'll cover:

1. [Why Test?](#why-test)
2. [Types of Tests](#types-of-tests)
3. [Network Calls](#network-calls)
4. [Flaky Tests](#flaky-tests)
5. [Slow Tests](#slow-tests)
6. [App Code vs. Test Code](#app-code-vs-test-code)

------------------------------------------------------------------------

### Why Test?

The single most important reason to make automated testing part of your
development process is that it **gives you confidence to make changes**.
This gets more and more important over time. With a reliable test suite
in place, you can refactor code, change functionality, and make upgrades
with reasonable certainty that you haven't broken anything. Without good
tests ... good luck.

Secondarily, testing:

- helps during the development process (testable code is correlated
  with well-factored code, and it's a good way to review your work
  before you ship it off);
- provides a guide to code reviewers; and
- serves as a kind of documentation (though not a particularly concise
  one, and not a replacement for proper written docs).

### Types of Tests

I write two main kinds of tests, which I call **unit tests** and
**integration tests**, though my definitions differ slightly from the
original meanings.

- **Unit tests** call application code directly -- instantiate an
  object, call a method on it, make assertions about the result. I
  don't particularly care what the object under test does in the
  course of doing its work -- calling off to other objects, performing
  I/O, etc. (this is where I differ from the official definition).
- **Integration tests** exercise the entire system end-to-end, using a
  framework like [Capybara](https://teamcapybara.github.io/capybara/)
  or [Playwright](https://playwright.dev/). We sometimes refer to
  these as "feature" tests in our codebases.

End-to-end, black-box integration tests are absolutely critical and can
cover most of your application's functionality by themselves. But it
often makes sense to wrap complex logic in a module, test that directly
(this is where [test-driven
development](https://en.wikipedia.org/wiki/Test-driven_development) can
come into play), and then write a simple integration test to ensure that
the module is getting called correctly. I avoid [mocking and
stubbing](https://en.wikipedia.org/wiki/Mock_object) if at all possible
-- again, "tests should pass when features work and break when features
break" -- and really only reach for it when it's the only option to hit
100% [code
coverage](https://www.viget.com/articles/maintenance-matters-code-coverage/).
In all cases, each test case should run against an empty database to
avoid ordering issues.

### Network Calls

One important exception to the "avoid mocking" rule is third-party APIs:
your test suite should be entirely self-contained and shouldn't call out
to outside services. We use
[webmock](https://github.com/bblimke/webmock#real-requests-to-network-can-be-allowed-or-disabled)
in our Ruby apps to block access to the wider web entirely.
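
The setup is tiny -- a minimal sketch, assuming RSpec and the standard
spec-helper layout:

```ruby
# spec/spec_helper.rb
require "webmock/rspec"

# Any real HTTP call now raises WebMock::NetConnectNotAllowedError;
# localhost stays reachable for Capybara-driven feature specs.
WebMock.disable_net_connect!(allow_localhost: true)
```

From there, any request the app legitimately needs to make during a test
has to be explicitly stubbed with `stub_request`, so nothing slips
through to the network unnoticed.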
Some providers offer mock services that produce API-conformant responses
you can test against
(e.g., [stripe-mock](https://github.com/stripe/stripe-mock)). If that's
not an option, you can use something like
[VCR](https://github.com/vcr/vcr), which stores network responses as
files and returns cached values on subsequent calls. Beware, though: VCR
works impressively in small doses, but you can lose a lot of time
re-recording "cassettes" as the APIs you depend on change.

Rather than leaning on VCR, I've adopted the following approach:

1. Wrap the API integration in a standalone object/module
2. Create a second stub module with the same interface for use in tests
3. Create a [JSON Schema](https://json-schema.org/) that defines the
   acceptable API responses
4. Use that schema to validate what comes back from your API modules
   (both the real one and the stub)

If the responses coming from the real API ever fail to match the schema,
that indicates that your app and your tests have fallen out of sync, and
you need to update both.

### Flaky Tests

Flaky tests (tests that fail intermittently, or only under certain
conditions) are bad. They eat up a lot of development time, especially
as build times increase, so it's important to stay on top of them and
squash them as they arise. A single test that fails one time in five
maybe doesn't seem so bad -- it's easier to rerun the build than to
spend time tracking the failure down. But five tests like that and the
build fails two-thirds of the time (each test passes 80% of the time, so
the suite passes 0.8^5 ≈ 33% of the time).

Some frameworks have libraries that will retry a failing test a set
number of times before giving up
(e.g., [rspec-retry](https://github.com/NoRedInk/rspec-retry),
[pytest-rerunfailures](https://pypi.org/project/pytest-rerunfailures/)).
These can be helpful, but they're a bandage, not a cure.

### Slow Tests

The speed of your test suite is a much lower priority than the
performance of your application. All else being equal, faster is better,
but a slow test suite that fully exercises your application is vastly
preferable to a fast one that doesn't. Time spent performance-tuning
your tests can generally be better spent on other things. That said, it
*is* worth periodically looking for low-hanging speed-ups -- if
parallelizing your test runs cuts the build time in half, that's worth a
few hours' investment.

During local development, I'll often run a subset of tests, either by
invoking a test file or specific test case directly, or by using a
wildcard pattern[^1] to run all the relevant tests. Combining that with
running the full suite in
[CI](/elsewhere/maintenance-matters-continuous-integration/)
provides a good balance of flow and rigor. At some point, if your test
suite is getting so slow that it's meaningfully impacting your team's
work, that's probably a sign that your app has gotten too large and
needs to be broken up into multiple discrete services.

### App Code vs. Test Code

Tests are code, but they're not application code, and the way you
approach them should be slightly different. Some (or even a lot of)
repetition is OK; don't be too quick to refactor. Ideally, someone can
get a sense of what a test is doing by looking at a single screen of
code, rather than jumping around between early setup, shared examples,
complex factories with side effects, etc.
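
As a sketch of the difference -- the `Invoice` model and its methods
here are made up -- I'd rather read the self-contained version, even
when a shared factory or `let` block could shave off a few lines:

```ruby
# Hypothetical Invoice model; all the setup lives in the test itself
it "flags overdue invoices as delinquent" do
  overdue = Invoice.create!(total: 100_00, due_on: 2.weeks.ago)
  current = Invoice.create!(total: 100_00, due_on: 2.weeks.from_now)

  Invoice.flag_delinquent!

  expect(overdue.reload).to be_delinquent
  expect(current.reload).not_to be_delinquent
end
```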
I think of a test case sort of like a page in a book. I don't expect to
be able to open any random page in any random book and immediately grasp
the material, but assuming I'm otherwise familiar with the book's
content, I should be able to look at a single page and have a pretty
good sense of what's going on. A book that frequently required me to
jump to multiple other pages to understand a concept would not be a very
good book, and a test that spreads its setup across multiple other files
is not a very good test.

------------------------------------------------------------------------

Automated testing is a (perhaps **the**) critical component of
sustainable software development. It's not a replacement for human
testing, but with a reliable automated test suite in place, your testers
can focus on what's changed and not worry about regressions in other
parts of the system. It really doesn't add much time to the development
process (provided you know what you're doing), and any increase in
velocity you gain by forgoing testing is quickly erased by time spent
fixing bugs.

[^1]: For example, if I'm working on the part of the system that deals with sending email, I'll run all the tests with `mail` in the filename: `rspec spec/{models,features,lib}/**/*mail*`.