---
title: "Maintenance Matters: Good Tests"
date: 2023-11-29T09:41:18-05:00
draft: false
canonical_url: https://www.viget.com/articles/maintenance-matters-good-tests/
references:
  - title: "A year of Rails - macwright.com"
    url: https://macwright.com/2021/02/18/a-year-of-rails.html
    date: 2023-07-03T02:52:03Z
    file: macwright-com-o4dndf.txt
---

*This article is part of a series focusing on how developers can center and streamline software maintenance. The other articles in the Maintenance Matters series are: [Continuous Integration](/elsewhere/maintenance-matters-continuous-integration/), [Code Coverage](https://www.viget.com/articles/maintenance-matters-code-coverage/), [Documentation](https://www.viget.com/articles/maintenance-matters-documentation/), [Default Formatting](https://www.viget.com/articles/maintenance-matters-default-formatting/), [Building Helpful Logs](https://www.viget.com/articles/maintenance-matters-helpful-logs/), [Timely Upgrades](https://www.viget.com/articles/maintenance-matters-timely-upgrades/), and [Code Reviews](https://www.viget.com/articles/maintenance-matters-code-reviews/).*

In this latest entry to our [Maintenance Matters](https://www.viget.com/articles/maintenance-matters/) series, I want to talk about automated testing. Annie said it well in her intro post:

> There is a lot to say about testing, but from a maintainer's perspective, let's define good tests as tests that prevent regressions. Unit tests should have clear expectations and fail when behavior changes, so a developer can either update the expectations or fix their code. Feature tests should pass when features work and break when features break.

This is a topic better suited to a book than a blog post (and indeed [there are many](https://bookshop.org/search?keywords=software+testing)), but I do think there are a few high-level concepts that are important to internalize in order to build robust, long-lasting software -- I hope to cover them here.

My first exposure to automated testing was with Ruby on Rails. Since then, I've written production software in many different languages, but nothing matches the Rails testing story. Tom MacWright said it well in ["A year of Rails"](https://macwright.com/2021/02/18/a-year-of-rails.html):

> Testing fully-server-rendered applications, on the other hand, is amazing. A vanilla testing setup with Rails & RSpec can give you fast, stable, concise, and actually-useful test coverage. You can actually assert for behavior and navigate through an application like a user would. These tests are solving a simpler problem - making requests and parsing responses, without the need for a full browser or headless browser, without multiple kinds of state to track.

Partly, I think Rails testing is so good because it's baked into the framework: run `rails generate` to create a new model or controller and the relevant test files are generated automatically. This helped establish a community focus on testing, which led to a robust third-party ecosystem around it. Additionally, Ruby is such a flexible language that automated testing is really the only viable way to ensure things are working as expected.

This post isn't about Rails testing specifically, but I wanted to be clear on my perspective before we really dive in. And with that out of the way, here's what we'll cover:

1. [Why Test?](#why-test)
2. [Types of Tests](#types-of-tests)
3. [Network Calls](#network-calls)
4. [Flaky Tests](#flaky-tests)
5. [Slow Tests](#slow-tests)
6. [App Code vs. Test Code](#app-code-vs-test-code)

------------------------------------------------------------------------

### Why Test?

The single most important reason to make automated testing part of your development process is that it **gives you confidence to make changes**. This gets more and more important over time. With a reliable test suite in place, you can refactor code, change functionality, and make upgrades with reasonable certainty that you haven't broken anything. Without good tests ... good luck.

Secondarily, testing:

- helps during the development process (testable code is correlated with well-factored code, and it's a good way to review your work before you ship it off);
- provides a guide to code reviewers; and
- serves as a kind of documentation (though not a particularly concise one, and not as a replacement for proper written docs).

### Types of Tests

I write two main kinds of tests, which I call **unit tests** and **integration tests**, though my definitions differ slightly from the original meanings.

- **Unit tests** call application code directly -- instantiate an object, call a method on it, make assertions about the result. I don't particularly care what the object under test does in the course of doing its work -- calling off to other objects, performing I/O, etc. (this is where I differ from the official definition).
- **Integration tests** test the entire system end-to-end, using a framework like [Capybara](https://teamcapybara.github.io/capybara/) or [Playwright](https://playwright.dev/). We sometimes refer to these as "feature" tests in our codebases. (A sketch of both styles follows below.)
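
To make the distinction concrete, here's a minimal sketch of each style with RSpec and Capybara; the `Order` model, `LineItem`, and the checkout copy are hypothetical, invented for illustration:

```ruby
# Unit test: call application code directly and assert on the result.
# Order and LineItem are hypothetical names for this example.
RSpec.describe Order do
  it "computes a total from its line items" do
    order = Order.new(line_items: [LineItem.new(price: 10), LineItem.new(price: 5)])

    expect(order.total).to eq(15)
  end
end

# Feature test: drive the whole stack through the UI with Capybara.
RSpec.describe "Checkout", type: :feature do
  it "lets a user complete a purchase" do
    visit "/cart"
    click_button "Check out"

    expect(page).to have_content("Thanks for your order!")
  end
end
```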

End-to-end, black-box integration tests are absolutely critical and can cover most of your application's functionality by themselves. But it often makes sense to wrap complex logic in a module, test that directly (this is where [test-driven development](https://en.wikipedia.org/wiki/Test-driven_development) can come into play), and then write a simple integration test to ensure that the module is getting called correctly. I avoid [mocking and stubbing](https://en.wikipedia.org/wiki/Mock_object) if at all possible -- again, "tests should pass when features work and break when features break" -- and really only reach for it when it's the only option to hit 100% [code coverage](https://www.viget.com/articles/maintenance-matters-code-coverage/). In all cases, each test case should run against an empty database to avoid ordering issues.
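
With rspec-rails, the usual way to guarantee that clean slate is transactional tests -- a minimal sketch of the configuration:

```ruby
# rails_helper.rb -- give every example an empty database.
RSpec.configure do |config|
  # Wrap each example in a database transaction that is rolled back
  # when it finishes, so no test can see another test's data.
  config.use_transactional_fixtures = true

  # Run examples in a random order to surface any lingering
  # order-dependence.
  config.order = :random
end
```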

### Network Calls

One important exception to the "avoid mocking" rule is third-party APIs: your test suite should be entirely self-contained and shouldn't call out to outside services. We use [webmock](https://github.com/bblimke/webmock#real-requests-to-network-can-be-allowed-or-disabled) in our Ruby apps to block access to the wider web entirely. Some providers offer mock services that provide API-conformant responses you can test against (e.g., [stripe-mock](https://github.com/stripe/stripe-mock)). If that's not an option, you can use something like [VCR](https://github.com/vcr/vcr), which stores network responses as files and returns cached values on subsequent calls. Beware, though: VCR works impressively in small doses, but as a suite grows you can lose a lot of time re-recording "cassettes".
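
To make the webmock setup concrete, here's roughly what that blocking looks like in a spec helper:

```ruby
# spec_helper.rb -- fail loudly on any real network call.
require "webmock/rspec"

# Raise on any outbound HTTP request; keep localhost open so
# Capybara can still talk to its browser driver.
WebMock.disable_net_connect!(allow_localhost: true)
```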

Rather than leaning on VCR, I've instead adopted the following approach:

1. Wrap the API integration in a standalone object/module
2. Create a second stub module with the same interface for use in tests
3. Create a [JSON Schema](https://json-schema.org/) that defines the acceptable API responses
4. Use that schema to validate what comes back from your API modules (both the real one and the stub)

If the responses coming from the real API ever fail to match the schema, that indicates that your app and your tests have fallen out of sync, and you need to update both.
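
Here's a minimal sketch of that approach, assuming a hypothetical weather API; `WeatherClient`, `FakeWeatherClient`, and the endpoint URL are invented for illustration, and the schema validation uses the json_schemer gem (one of several options):

```ruby
require "json_schemer"
require "net/http"
require "json"

# The JSON Schema defining acceptable API responses.
FORECAST_SCHEMA = JSONSchemer.schema({
  "type" => "object",
  "required" => ["temperature", "conditions"],
  "properties" => {
    "temperature" => { "type" => "number" },
    "conditions" => { "type" => "string" }
  }
})

# The real client, used in production. (Hypothetical endpoint.)
class WeatherClient
  def forecast(zip)
    JSON.parse(Net::HTTP.get(URI("https://api.example.com/forecast?zip=#{zip}")))
  end
end

# The stub, used in tests -- same interface, canned data.
class FakeWeatherClient
  def forecast(_zip)
    { "temperature" => 72.0, "conditions" => "sunny" }
  end
end

# Both implementations are validated against the same schema, so a
# drifting API surfaces as a schema failure, not a silently-wrong stub.
def validated_forecast(client, zip)
  client.forecast(zip).tap do |data|
    raise "API response does not match schema" unless FORECAST_SCHEMA.valid?(data)
  end
end
```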

### Flaky Tests

Flaky tests (tests that fail intermittently, or only fail under certain conditions) are bad. They eat up a lot of development time, especially as build times increase. It's important to stay on top of them and squash them as they arise. A single test that fails one time in five maybe doesn't seem so bad, and it's easier to rerun the build than spend time tracking it down. But five tests like that mean the build is failing two-thirds of the time (each passes with probability 0.8, and 0.8^5 ≈ 0.33).

Some frameworks have libraries that will retry a failing test a set number of times before giving up (e.g., [rspec-retry](https://github.com/NoRedInk/rspec-retry), [pytest-rerunfailures](https://pypi.org/project/pytest-rerunfailures/)). These can be helpful, but they're a bandage, not a cure.
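
For reference, a minimal rspec-retry setup (per the gem's README) looks something like this -- a sketch, not an endorsement of papering over flakiness permanently:

```ruby
# spec_helper.rb
require "rspec/retry"

RSpec.configure do |config|
  config.verbose_retry = true                  # log each retry attempt
  config.display_try_failure_messages = true   # show why earlier tries failed

  # Retry only browser-driven feature specs, which flake most often.
  config.around :each, type: :feature do |example|
    example.run_with_retry retry: 3
  end
end
```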

### Slow Tests

The speed of your test suite is a much lower priority than the performance of your application. All else being equal, faster is better, but a slow test suite that fully exercises your application is vastly preferable to a fast one that doesn't. Time spent performance-tuning your tests can generally be better spent on other things. That said, it *is* worth periodically looking for low-hanging speed-ups -- if parallelizing your test runs cuts the build time in half, that's worth a few hours' time investment.
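
As one example, Rails 6+ can parallelize its default Minitest suite out of the box (RSpec users typically reach for a gem like [parallel_tests](https://github.com/grosser/parallel_tests)); a minimal sketch:

```ruby
# test/test_helper.rb -- Rails' built-in parallel test execution.
class ActiveSupport::TestCase
  # Fork one worker per processor; each worker gets its own database.
  parallelize(workers: :number_of_processors)
end
```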

During local development, I'll often run a subset of tests, either by invoking a test file or specific test case directly, or by using a wildcard pattern[^1] to run all the relevant tests. Combining that with running the full suite in [CI](/elsewhere/maintenance-matters-continuous-integration/) provides a good balance of flow and rigor. At some point, if your test suite is getting so slow that it's meaningfully impacting your team's work, it's probably a sign that your app has gotten too large and needs to be broken up into multiple discrete services.

### App Code vs. Test Code

Tests are code, but they're not application code, and the way you approach them should be slightly different. Some (or even a lot of) repetition is OK; don't be too quick to refactor. Ideally, someone can get a sense of what a test is doing by looking at a single screen of code, as opposed to jumping around between early setup, shared examples, complex factories with side-effects, etc.
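
As an example of that single-screen ideal, here's a hypothetical spec (the `Order` discount behavior is invented) where all the setup a reader needs is inline, rather than hidden in `before` blocks or nested shared contexts:

```ruby
RSpec.describe Order do
  describe "#total" do
    it "applies a percentage discount to the subtotal" do
      # Everything this example depends on is visible right here.
      order = Order.new(subtotal: 100, discount_percent: 10)

      expect(order.total).to eq(90)
    end
  end
end
```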

I think of a test case sort of like a page in a book. I don't expect to be able to open any random page in any random book and immediately grasp the material, but assuming I'm otherwise familiar with the book's content, I should be able to look at a single page and have a pretty good sense of what's going on. A book that frequently required me to jump to multiple other pages to understand a concept would not be a very good book, and a test that spreads its setup across multiple other files is not a very good test.

------------------------------------------------------------------------

Automated testing is a (perhaps **the**) critical component of sustainable software development. It's not a replacement for human testing, but with a reliable automated test suite in place, your testers can focus on what's changed and not worry about regressions in other parts of the system. It really doesn't add much time to the development process (provided you know what you're doing), and any increase in velocity you gain by forgoing testing is quickly erased by time spent fixing bugs.

[^1]: For example, if I'm working on the part of the system that deals with sending email, I'll run all the tests with `mail` in the filename via `rspec spec/{models,features,lib}/**/*mail*`.