---
title: "Maintenance Matters: Good Tests"
date: 2023-11-29T09:41:18-05:00
draft: false
canonical_url: https://www.viget.com/articles/maintenance-matters-good-tests/
references:
- title: "A year of Rails - macwright.com"
url: https://macwright.com/2021/02/18/a-year-of-rails.html
date: 2023-07-03T02:52:03Z
file: macwright-com-o4dndf.txt
---
*This article is part of a series focusing on how developers can center
and streamline software maintenance. The other articles in the
Maintenance Matters series are: [Continuous
Integration](/elsewhere/maintenance-matters-continuous-integration/),
[Code
Coverage](https://www.viget.com/articles/maintenance-matters-code-coverage/),
[Documentation](https://www.viget.com/articles/maintenance-matters-documentation/),
[Default
Formatting](https://www.viget.com/articles/maintenance-matters-default-formatting/), [Building
Helpful
Logs](https://www.viget.com/articles/maintenance-matters-helpful-logs/),
[Timely
Upgrades](https://www.viget.com/articles/maintenance-matters-timely-upgrades/),
and [Code
Reviews](https://www.viget.com/articles/maintenance-matters-code-reviews/).*
In this latest entry in our [Maintenance
Matters](https://www.viget.com/articles/maintenance-matters/) series, I
want to talk about automated testing. Annie said it well in her intro
post:
> There is a lot to say about testing, but from a maintainer's
> perspective, let's define good tests as tests that prevent
> regressions. Unit tests should have clear expectations and fail when
> behavior changes, so a developer can either update the expectations or
> fix their code. Feature tests should pass when features work and break
> when features break.
This is a topic better suited to a book than a blog post (and indeed
[there are
many](https://bookshop.org/search?keywords=software+testing)), but I do
think there are a few high-level concepts that are important to
internalize in order to build robust, long-lasting software --- I hope
to cover them here.
My first exposure to automated testing was with Ruby on Rails. Since
then, I've written production software in many different languages, but
nothing matches the Rails testing story. Tom MacWright said it well in
["A year of
Rails"](https://macwright.com/2021/02/18/a-year-of-rails.html):
> Testing fully-server-rendered applications, on the other hand, is
> amazing. A vanilla testing setup with Rails & RSpec can give you fast,
> stable, concise, and actually-useful test coverage. You can actually
> assert for behavior and navigate through an application like a user
> would. These tests are solving a simpler problem - making requests and
> parsing responses, without the need for a full browser or headless
> browser, without multiple kinds of state to track.
Partly, I think Rails testing is so good because it's baked into the
framework: run `rails generate` to create a new model or controller and
the relevant test files are generated automatically. This helped
establish a community focus on testing, which led to a robust
third-party ecosystem around it. Additionally, Ruby is such a flexible,
dynamically typed language -- there's no compiler to catch your mistakes
-- that automated testing is really the only viable way to ensure
things are working as expected.
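For example, with [rspec-rails](https://github.com/rspec/rspec-rails)
in your bundle, `rails generate model Widget` scaffolds a matching spec
alongside the model -- roughly this (the exact output varies by version):

```ruby
# spec/models/widget_spec.rb (generated)
require "rails_helper"

RSpec.describe Widget, type: :model do
  pending "add some examples to (or delete) #{__FILE__}"
end
```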
This post isn't about Rails testing specifically, but I wanted to be
clear on my perspective before we really dive in. And with that out of
the way, here's what we'll cover:
1. [Why Test?](#why-test)
2. [Types of Tests](#types-of-tests)
3. [Network Calls](#network-calls)
4. [Flaky Tests](#flaky-tests)
5. [Slow Tests](#slow-tests)
6. [App Code vs. Test Code](#app-code-vs-test-code)
------------------------------------------------------------------------
### Why Test?
The single most important reason to make automated testing part of your
development process is that it **gives you confidence to make changes**.
This gets more and more important over time. With a reliable test suite
in place, you can refactor code, change functionality, and make upgrades
with reasonable certainty that you haven't broken anything. Without good
tests ... good luck.
Secondarily, testing:
- helps during the development process (testable code is correlated
with well-factored code, and it's a good way to review your work
before you ship it off);
- provides a guide to code reviewers; and
- serves as a kind of documentation (though not a particularly concise
one, and not as a replacement for proper written docs).
### Types of Tests
I write two main kinds of tests, which I call **unit tests** and
**integration tests**, though my definitions differ slightly from the
original meanings.
- **Unit tests** call application code directly -- instantiate an
object, call a method on it, make assertions about the result. I
don't particularly care what the object under test does in the
course of doing its work -- calling off to other objects, performing
I/O, etc. (this is where I differ from the official definition).
- **Integration tests** test the entire system end-to-end, using a
framework like [Capybara](https://teamcapybara.github.io/capybara/)
or [Playwright](https://playwright.dev/). We sometimes refer to
these as "feature" tests in our codebases. (Both flavors are
sketched below.)
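A minimal sketch of each, in RSpec (the `Order` model and the checkout
flow are hypothetical stand-ins):

```ruby
require "rails_helper"

# Unit test: call the code directly and assert on the result.
RSpec.describe Order do
  it "totals its line items" do
    order = Order.new(line_items: [LineItem.new(price: 5), LineItem.new(price: 7)])

    expect(order.total).to eq(12)
  end
end

# Integration ("feature") test: drive the full stack the way a user
# would, via Capybara.
RSpec.feature "Checkout" do
  scenario "placing an order" do
    visit "/cart"
    click_button "Check out"

    expect(page).to have_content("Thanks for your order!")
  end
end
```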
End-to-end, black-box integration tests are absolutely critical and can
cover most of your application's functionality by themselves. But it
often makes sense to wrap complex logic in a module, test that directly
(this is where [test-driven
development](https://en.wikipedia.org/wiki/Test-driven_development) can
come into play), and then write a simple integration test to ensure that
the module is getting called correctly. I avoid [mocking and
stubbing](https://en.wikipedia.org/wiki/Mock_object) if at all possible
-- again, "tests should pass when features work and break when features
break" -- and really only reach for it when it's the only option to hit
100% [code
coverage](https://www.viget.com/articles/maintenance-matters-code-coverage/).
In all cases, each test case should run against an empty database to
avoid ordering issues.
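In a Rails app, the usual way to get that clean slate is to run each
example inside a database transaction that's rolled back afterward -- a
minimal sketch of the rspec-rails configuration:

```ruby
# spec/rails_helper.rb
RSpec.configure do |config|
  # Wrap every example in a transaction and roll it back when the
  # example finishes, so no test sees another test's data.
  config.use_transactional_fixtures = true
end
```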
### Network Calls
One important exception to the "avoid mocking" rule is third-party APIs:
your test suite should be entirely self-contained and shouldn't call out
to outside services. We use
[webmock](https://github.com/bblimke/webmock#real-requests-to-network-can-be-allowed-or-disabled)
in our Ruby apps to block access to the wider web entirely. Some
providers offer mock services that provide API-conformant responses you
can test against
(e.g., [stripe-mock](https://github.com/stripe/stripe-mock)). If that's
not an option, you can use something like
[VCR](https://github.com/vcr/vcr), which stores network responses as
files and returns cached values on subsequent calls. Beware, though: VCR
works impressively in small doses, but re-recording stale "cassettes"
can eat a lot of time as your app and the API evolve.
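Blocking outbound requests takes one line of configuration -- a sketch
of the webmock setup:

```ruby
# spec/spec_helper.rb
require "webmock/rspec"

# Raise an error on any real HTTP call during tests; localhost stays
# open so Capybara can still talk to a local browser driver.
WebMock.disable_net_connect!(allow_localhost: true)
```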
Rather than leaning on VCR, I've adopted the following approach:
1. Wrap the API integration into a standalone object/module
2. Create a second stub module with the same interface for use in tests
3. Create a [JSON Schema](https://json-schema.org/) that defines the
acceptable API responses
4. Use that schema to validate what comes back from your API modules
(both the real one and the stub)
If ever the responses coming from the real API fail to match the schema,
that indicates that your app and your tests have fallen out of sync, and
you need to update both.
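Here's a compressed sketch of all four steps. `WeatherApi`, its fake,
and the schema itself are hypothetical stand-ins; the validation uses
the [json_schemer](https://github.com/davishmcclurg/json_schemer) gem:

```ruby
require "json"
require "json_schemer"
require "net/http"

# 3. A JSON Schema defining what an acceptable response looks like.
FORECAST_SCHEMA = JSONSchemer.schema({
  "type" => "object",
  "required" => ["temp", "conditions"],
  "properties" => {
    "temp" => { "type" => "number" },
    "conditions" => { "type" => "string" },
  },
})

# 1. The real integration, wrapped in a standalone module.
module WeatherApi
  def self.forecast(zip)
    response = JSON.parse(Net::HTTP.get(URI("https://weather.example.com/#{zip}")))
    validate!(response)
  end

  # 4. Validate every response, real or fake, against the schema.
  def self.validate!(response)
    raise "Response doesn't match schema" unless FORECAST_SCHEMA.valid?(response)

    response
  end
end

# 2. A stub with the same interface, for use in tests. It runs through
# the same validation, so it can't silently drift from the real thing.
module FakeWeatherApi
  def self.forecast(_zip)
    WeatherApi.validate!({ "temp" => 72.0, "conditions" => "sunny" })
  end
end
```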
### Flaky Tests
Flaky tests (tests that fail intermittently, or only fail under certain
conditions) are bad. They eat up a lot of development time, especially
as build times increase. It's important to stay on top of them and
squash them as they arise. A single test that fails one time in five
maybe doesn't seem so bad -- easier to rerun the build than to spend
time tracking it down. But five tests like that mean the build fails
two-thirds of the time: each passes 80% of runs, so all five pass
together just 0.8^5 ≈ 33% of the time.
Some frameworks have libraries that will retry a failing test a set
number of times before giving up
(e.g., [rspec-retry](https://github.com/NoRedInk/rspec-retry),
[pytest-rerunfailures](https://pypi.org/project/pytest-rerunfailures/)).
These can be helpful, but they're a bandage, not a cure.
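For reference, here's what that looks like with rspec-retry -- a sketch
of its documented setup, retrying JS-driven specs (often the flakiest)
up to three times:

```ruby
# spec/spec_helper.rb
require "rspec/retry"

RSpec.configure do |config|
  config.verbose_retry = true # log a message on each retry

  config.around(:each, :js) do |example|
    example.run_with_retry retry: 3
  end
end
```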
### Slow Tests
The speed of your test suite is a much lower priority than the
performance of your application. All else being equal, faster is better,
but a slow test suite that fully exercises your application is vastly
preferable to a fast one that doesn't. Time spent performance-tuning
your tests can generally be better spent on other things. That said, it
*is* worth periodically looking for low-hanging speed-ups -- if
parallelizing your test runs cuts the build time in half, that's worth a
few hours' time investment.
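If you're on Minitest, Rails ships with a parallel runner -- a one-line
sketch (RSpec users might reach for the
[parallel_tests](https://github.com/grosser/parallel_tests) gem instead):

```ruby
# test/test_helper.rb
class ActiveSupport::TestCase
  # Fork one worker per CPU core, each with its own test database.
  parallelize(workers: :number_of_processors)
end
```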
During local development, I'll often run a subset of tests, either by
invoking a test file or specific test case directly, or by using a
wildcard pattern[^1] to run all the relevant tests. Combining that with
running the full suite in
[CI](/elsewhere/maintenance-matters-continuous-integration/)
provides a good balance of flow and rigor. At some point, if your test
suite is getting so slow that it's meaningfully impacting your team's
work, it's probably a sign that your app has gotten too large and needs
to be broken up into multiple discrete services.
### App Code vs. Test Code
Tests are code, but they're not application code, and the way you
approach them should be slightly different. Some (or even a lot of)
repetition is OK; don't be too quick to refactor. Ideally, someone can
get a sense of what a test is doing by looking at a single screen of
code, as opposed to jumping around between early setup, shared examples,
complex factories with side-effects, etc.
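Here's a sketch of what I mean -- every input is visible in the example
itself (`Invoice` is a hypothetical model; the time helpers come from
ActiveSupport):

```ruby
RSpec.describe Invoice do
  include ActiveSupport::Testing::TimeHelpers

  it "is overdue once the due date has passed" do
    # All of the setup lives right here, not in a distant before block.
    invoice = Invoice.new(due_on: Date.new(2023, 1, 15), paid: false)

    travel_to Date.new(2023, 2, 1) do
      expect(invoice).to be_overdue
    end
  end
end
```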
I think of a test case sort of like a page in a book. I don't expect to
be able to open any random page in any random book and immediately grasp
the material, but assuming I'm otherwise familiar with the book's
content, I should be able to look at a single page and have a pretty
good sense of what's going on. A book that frequently required me to
jump to multiple other pages to understand a concept would not be a very
good book, and a test that spreads its setup across multiple other files
is not a very good test.
------------------------------------------------------------------------
Automated testing is a (perhaps **the**) critical component of
sustainable software development. It's not a replacement for human
testing, but with a reliable automated test suite in place, your testers
can focus on what's changed and not worry about regressions in other
parts of the system. It really doesn't add much time to the development
process (provided you know what you're doing), and any increase in
velocity you gain by forgoing testing is quickly erased by time spent
fixing bugs.
[^1]: For example, if I'm working on the part of the system that deals with sending email, I'll run all the tests with `mail` in the filename with `rspec spec/{models,features,lib}/**/*mail*`.