---
title: "Maintenance Matters: Good Tests"
date: 2023-11-29T09:41:18-05:00
draft: false
canonical_url: https://www.viget.com/articles/maintenance-matters-good-tests/
references:
- title: "A year of Rails - macwright.com"
  url: https://macwright.com/2021/02/18/a-year-of-rails.html
  date: 2023-07-03T02:52:03Z
  file: macwright-com-o4dndf.txt
---

*This article is part of a series focusing on how developers can center and streamline software maintenance. The other articles in the Maintenance Matters series are: [Continuous Integration](/elsewhere/maintenance-matters-continuous-integration/), [Code Coverage](https://www.viget.com/articles/maintenance-matters-code-coverage/), [Documentation](https://www.viget.com/articles/maintenance-matters-documentation/), [Default Formatting](https://www.viget.com/articles/maintenance-matters-default-formatting/), [Building Helpful Logs](https://www.viget.com/articles/maintenance-matters-helpful-logs/), [Timely Upgrades](https://www.viget.com/articles/maintenance-matters-timely-upgrades/), and [Code Reviews](https://www.viget.com/articles/maintenance-matters-code-reviews/).*

In this latest entry to our [Maintenance Matters](https://www.viget.com/articles/maintenance-matters/) series, I want to talk about automated testing. Annie said it well in her intro post:

> There is a lot to say about testing, but from a maintainer's perspective, let's define good tests as tests that prevent regressions. Unit tests should have clear expectations and fail when behavior changes, so a developer can either update the expectations or fix their code. Feature tests should pass when features work and break when features break.

This is a topic better suited to a book than a blog post (and indeed [there are many](https://bookshop.org/search?keywords=software+testing)), but I do think there are a few high-level concepts that are important to internalize in order to build robust, long-lasting software --- I hope to cover them here.

My first exposure to automated testing was with Ruby on Rails. Since then, I've written production software in many different languages, but nothing matches the Rails testing story. Tom MacWright said it well in ["A year of Rails"](https://macwright.com/2021/02/18/a-year-of-rails.html):

> Testing fully-server-rendered applications, on the other hand, is amazing. A vanilla testing setup with Rails & RSpec can give you fast, stable, concise, and actually-useful test coverage. You can actually assert for behavior and navigate through an application like a user would. These tests are solving a simpler problem - making requests and parsing responses, without the need for a full browser or headless browser, without multiple kinds of state to track.

Partly, I think Rails testing is so good because it's baked into the framework: run `rails generate` to create a new model or controller and the relevant test files are generated automatically. This helped establish a community focus on testing, which led to a robust third-party ecosystem around it. Additionally, Ruby is such a flexible language that automated testing is really the only viable way to ensure things are working as expected.

This post isn't about Rails testing specifically, but I wanted to be clear on my perspective before we really dive in. And with that out of the way, here's what we'll cover:

1. [Why Test?](#why-test)
2. [Types of Tests](#types-of-tests)
3. [Network Calls](#network-calls)
4. [Flaky Tests](#flaky-tests)
5. [Slow Tests](#slow-tests)
6. [App Code vs. Test Code](#app-code-vstest-code)

------------------------------------------------------------------------

### Why Test?

The single most important reason to make automated testing part of your development process is that it **gives you confidence to make changes**. This gets more and more important over time. With a reliable test suite in place, you can refactor code, change functionality, and make upgrades with reasonable certainty that you haven't broken anything. Without good tests ... good luck.

Secondarily, testing:

- helps during the development process (testable code is correlated with well-factored code, and it's a good way to review your work before you ship it off);
- provides a guide to code reviewers; and
- serves as a kind of documentation (though not a particularly concise one, and not as a replacement for proper written docs).

### Types of Tests

I write two main kinds of tests, which I call **unit tests** and **integration tests**, though my definitions differ slightly from the original meanings; a minimal sketch of each follows the list.

- **Unit tests** call application code directly -- instantiate an object, call a method on it, make assertions about the result. I don't particularly care what the object under test does in the course of doing its work -- calling off to other objects, performing I/O, etc. (this is where I differ from the official definition).
- **Integration tests** test the entire system end-to-end, using a framework like [Capybara](https://teamcapybara.github.io/capybara/) or [Playwright](https://playwright.dev/). We sometimes refer to these as "feature" tests in our codebases.
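
Here's that sketch -- two hypothetical specs, assuming an RSpec + Capybara setup (the `PriceCalculator` model, routes, and page copy are illustrative, not from the original post):

```ruby
# spec/models/price_calculator_spec.rb -- a unit test: call the object directly.
require "rails_helper"

RSpec.describe PriceCalculator do
  it "applies a percentage discount to the subtotal" do
    calculator = PriceCalculator.new(subtotal: 100, discount_percent: 10)

    expect(calculator.total).to eq(90)
  end
end

# spec/features/checkout_spec.rb -- a feature test: drive the app like a user.
require "rails_helper"

RSpec.describe "Checkout", type: :feature do
  it "shows the discounted total after applying a coupon" do
    visit "/cart"
    fill_in "Coupon code", with: "SAVE10"
    click_button "Apply"

    expect(page).to have_content("Total: $90.00")
  end
end
```

The unit spec exercises the object directly and doesn't care how it does its work; the feature spec only cares that the page behaves the way a user would see it.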

End-to-end, black-box integration tests are absolutely critical and can cover most of your application's functionality by themselves. But it often makes sense to wrap complex logic in a module, test that directly (this is where [test-driven development](https://en.wikipedia.org/wiki/Test-driven_development) can come into play), and then write a simple integration test to ensure that the module is getting called correctly. I avoid [mocking and stubbing](https://en.wikipedia.org/wiki/Mock_object) if at all possible -- again, "tests should pass when features work and break when features break" -- and really only reach for it when it's the only option to hit 100% [code coverage](https://www.viget.com/articles/maintenance-matters-code-coverage/). In all cases, each test case should run against an empty database to avoid ordering issues.
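
In a Rails/RSpec project, the usual way to get that clean slate is to wrap each example in a database transaction that's rolled back when it finishes -- a sketch of the relevant configuration, assuming rspec-rails defaults:

```ruby
# spec/rails_helper.rb (excerpt)
RSpec.configure do |config|
  # Wrap each example in a transaction and roll it back afterwards,
  # so no example can see data left behind by another.
  config.use_transactional_fixtures = true

  # Run examples in random order to surface any remaining ordering dependencies.
  config.order = :random
end
```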

### Network Calls

One important exception to the "avoid mocking" rule is third-party APIs: your test suite should be entirely self-contained and shouldn't call out to outside services. We use [webmock](https://github.com/bblimke/webmock#real-requests-to-network-can-be-allowed-or-disabled) in our Ruby apps to block access to the wider web entirely. Some providers offer mock services that provide API-conformant responses you can test against (e.g., [stripe-mock](https://github.com/stripe/stripe-mock)). If that's not an option, you can use something like [VCR](https://github.com/vcr/vcr), which stores network responses as files and returns cached values on subsequent calls. Beware, though: VCR works impressively in small doses, but as the suite ages you can lose a lot of time re-recording "cassettes".
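
Whichever of those tools you use for specific APIs, the blanket webmock block is typically a one-liner in the test setup -- a sketch, assuming webmock's RSpec integration:

```ruby
# spec/rails_helper.rb (excerpt)
require "webmock/rspec"

# Fail any example that tries to reach the outside network;
# local connections (e.g., a Capybara-driven browser) stay allowed.
WebMock.disable_net_connect!(allow_localhost: true)
```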

Rather than leaning on VCR, I've instead adopted the following approach (sketched in code below):

1. Wrap the API integration into a standalone object/module
2. Create a second stub module with the same interface for use in tests
3. Create a [JSON Schema](https://json-schema.org/) that defines the acceptable API responses
4. Use that schema to validate what comes back from your API modules (both the real one and the stub)
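
Here's roughly what those four steps look like in Ruby. The weather API, module names, and schema are hypothetical, and the validation uses the `json-schema` gem as one possible validator; treat this as a sketch of the shape rather than a drop-in implementation:

```ruby
require "net/http"
require "json"
require "json-schema" # one possible validator gem

# 1. The real integration, wrapped in one standalone module.
module WeatherApi
  def self.current_conditions(city)
    response = Net::HTTP.get(URI("https://api.example.com/weather?city=#{city}"))
    JSON.parse(response)
  end
end

# 2. A stub with the same interface, used when running tests.
module WeatherApiStub
  def self.current_conditions(_city)
    { "temperature" => 72, "conditions" => "sunny" }
  end
end

# 3. A JSON Schema describing what an acceptable response looks like.
WEATHER_SCHEMA = {
  "type" => "object",
  "required" => %w[temperature conditions],
  "properties" => {
    "temperature" => { "type" => "number" },
    "conditions" => { "type" => "string" }
  }
}.freeze

# 4. Validate responses from both modules against the same schema
#    (validate! raises if the data doesn't conform), so the stub
#    can't silently drift away from the real API.
JSON::Validator.validate!(WEATHER_SCHEMA, WeatherApiStub.current_conditions("Denver"))
```

The everyday suite only ever talks to the stub; how you exercise the real module against the schema (an opt-in test, a scheduled job, or validation in the production code path) is a separate choice, but both implementations answer to the same contract.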

If ever the responses coming from the real API fail to match the schema, that indicates that your app and your tests have fallen out of sync, and you need to update both.

### Flaky Tests

Flaky tests (tests that fail intermittently, or only fail under certain conditions) are bad. They eat up a lot of development time, especially as build times increase. It's important to stay on top of them and squash them as they arise. A single test that fails one time in five maybe doesn't seem so bad, and it's easier to rerun the build than spend time tracking it down. But five tests like that mean the build is failing two-thirds of the time (each passes 80% of the time, so the chance they all pass is 0.8^5 ≈ 33%).

Some frameworks have libraries that will retry a failing test a set number of times before giving up (e.g., [rspec-retry](https://github.com/NoRedInk/rspec-retry), [pytest-rerunfailures](https://pypi.org/project/pytest-rerunfailures/)). These can be helpful, but they're a bandage, not a cure.
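
If you do lean on one of these as a stopgap, the setup is small. A sketch, assuming rspec-retry and retrying only JavaScript-driven feature specs (a common flakiness hotspot):

```ruby
# spec/spec_helper.rb (excerpt)
require "rspec/retry"

RSpec.configure do |config|
  # Log each retry so flaky examples stay visible instead of silently passing.
  config.verbose_retry = true
  config.display_try_failure_messages = true

  # Only retry browser-driven specs; unit tests should never need it.
  config.around(:each, :js) do |example|
    example.run_with_retry retry: 3
  end
end
```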

### Slow Tests

The speed of your test suite is a much lower priority than the performance of your application. All else being equal, faster is better, but a slow test suite that fully exercises your application is vastly preferable to a fast one that doesn't. Time spent performance-tuning your tests can generally be better spent on other things. That said, it *is* worth periodically looking for low-hanging speed-ups -- if parallelizing your test runs cuts the build time in half, that's worth a few hours' time investment.
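
One cheap way to find those low-hanging speed-ups is RSpec's built-in profiling, which reports the slowest examples after a run -- a sketch of the configuration (the count of ten is arbitrary, and `rspec --profile` does the same thing ad hoc):

```ruby
# spec/spec_helper.rb (excerpt)
RSpec.configure do |config|
  # After each run, print the 10 slowest examples and example groups.
  config.profile_examples = 10
end
```

For the parallelization itself, the parallel_tests gem is one common route in Rails apps.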

During local development, I'll often run a subset of tests, either by invoking a test file or specific test case directly, or by using a wildcard pattern[^1] to run all the relevant tests. Combining that with running the full suite in [CI](/elsewhere/maintenance-matters-continuous-integration/) provides a good balance of flow and rigor. At some point, if your test suite is getting so slow that it's meaningfully impacting your team's work, it's probably a sign that your app has gotten too large and needs to be broken up into multiple discrete services.

### App Code vs. Test Code

Tests are code, but they're not application code, and the way you approach them should be slightly different. Some (or even a lot of) repetition is OK; don't be too quick to refactor. Ideally, someone can get a sense of what a test is doing by looking at a single screen of code, as opposed to jumping around between early setup, shared examples, complex factories with side-effects, etc.
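
As an illustration -- the models, route helper, and copy here are hypothetical -- this is the kind of spec that reads on a single screen, even though spelling out the setup inline means some repetition across tests:

```ruby
# Everything the reader needs is on screen: the data, the action, the assertion.
RSpec.describe "Order discounts", type: :feature do
  it "applies free shipping to orders over $100" do
    user = User.create!(email: "shopper@example.com")
    order = Order.create!(user: user, subtotal: 120)

    visit order_path(order)

    expect(page).to have_content("Shipping: Free")
  end
end
```

The same example built from nested `let`s, shared contexts, and factory callbacks would be shorter on the page, but the reader would have to reconstruct the setup from several other files.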

I think of a test case sort of like a page in a book. I don't expect to be able to open any random page in any random book and immediately grasp the material, but assuming I'm otherwise familiar with the book's content, I should be able to look at a single page and have a pretty good sense of what's going on. A book that frequently required me to jump to multiple other pages to understand a concept would not be a very good book, and a test that spreads its setup across multiple other files is not a very good test.

------------------------------------------------------------------------

Automated testing is a (perhaps **the**) critical component of sustainable software development. It's not a replacement for human testing, but with a reliable automated test suite in place, your testers can focus on what's changed and not worry about regressions in other parts of the system. It really doesn't add much time to the development process (provided you know what you're doing), and any increase in velocity you gain by forgoing testing is quickly erased by time spent fixing bugs.

[^1]: For example, if I'm working on the part of the system that deals with sending email, I'll run all the tests with `mail` in the filename with `rspec spec/{models,features,lib}/**/*mail*`.