finish spelling post

David Eisinger
2024-11-20 18:05:39 -05:00
parent 5b2a13daf1
commit db9cfa7ecd


@@ -1,20 +1,27 @@
---
title: "Spellcheck Your Hugo Site With CSpell"
date: 2024-11-20T18:03:32-05:00
draft: false
tags:
- meta
references:
- title: "The Static Site Paradox | Loris Cro's Blog"
  url: https://kristoff.it/blog/static-site-paradox/
  date: 2024-10-31T03:33:40Z
  file: kristoff-it-edtlns.txt
---
I edit these posts pretty carefully before publishing, but I inevitably find a misspelling or two after the fact. In the spirit of continuous improvement, I decided to see what kind of automated solutions are out there for spellchecking Markdown files, and found [CSpell][1]. It works well, but its default configuration flagged a ton of false positives that I had to scroll past to find the actual errors.
[1]: https://cspell.org/
<!--more-->
Fortunately, it's quite configurable, and I've gotten it to where it only flags actual misspelled words. Here's how.
### 1. Install CSpell
Assuming a modern version of Node.js (>= 18), you can use [npx][2] to download and run CSpell in a single command. Quote the glob so CSpell expands it itself; Bash's default globbing treats `**` like `*` and won't recurse into nested directories:
```sh
npx cspell "content/**/*.md"
@@ -22,7 +29,7 @@ npx cspell content/**/*.md
You'll see a ton of spelling errors -- ignore them for now.
[2]: https://docs.npmjs.com/cli/v10/commands/npx
### 2. Add config file
@@ -40,13 +47,15 @@ Next, let's create a basic config file. In the root of your site, put the follow
### 3. Add additional languages
My site (especially the stuff in [/elsewhere][3] that I've mirrored from my company's website) has code snippets that the English dictionary doesn't recognize. Fortunately, CSpell ships with a bunch of [additional dictionaries][4]. Adding `"ruby"`, `"golang"`, and `"java"` to the `"dictionaries"` array clears out a bunch of those false positives.
[3]: /elsewhere
[4]: https://github.com/streetsidesoftware/cspell-dicts/tree/main/dictionaries
### 4. Ignore front matter
This step may or may not apply to your site, so feel free to skip it, but I see a lot of false positives in the [front matter][5] of my posts, mostly in the lists of [references][6]. To ignore the front matter section entirely, add the following to your config file (credit to [this helpful GitHub comment][7]):
```json
"patterns": [
  {
@@ -64,11 +73,16 @@ My site (especially the stuff in [/elsewhere][2] that I've mirrored from my comp
]
```
Note that you'll no longer catch misspellings in post titles, so it might make sense to use a more targeted regular expression.
[5]: https://gohugo.io/content-management/front-matter/
[6]: https://git.sr.ht/~dce/davideisinger.com/tree/main/item/content/journal/dispatch-21-november-2024/index.md?view-source#L7-11
[7]: https://github.com/streetsidesoftware/cspell/discussions/3456#discussioncomment-3438647
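If you'd rather keep checking titles, one option is to ignore only the `references` block instead of the whole front matter. The pattern below is a hypothetical sketch, and the `front_matter_references` name is my own. It assumes YAML front matter with `references:` as the last key, as in this post, and skips everything from that line through the closing `---`. To use it, reference it in `"ignoreRegExpList"` in place of `"front_matter"`.

```json
"patterns": [
  {
    "name": "front_matter_references",
    "pattern": "/^references:$(\\s|\\S)*?^---$/gm"
  }
]
```

If `references` isn't the last key, any keys after it get skipped too. TOML (`+++`) front matter would also need its own variant.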
### 5. Ignore proper nouns
I also see a lot of proper nouns being flagged as misspellings, so I decided to just ignore any word that begins with a capital letter. Create a new entry in the `"patterns"` array:
```json
{
  "name": "proper_nouns",
@@ -76,6 +90,8 @@ My site (especially the stuff in [/elsewhere][2] that I've mirrored from my comp
}
```
That's any non-word character (or an underscore), followed by a capital letter, followed by one or more non-space characters. I'm sure that's not perfect, but it's good enough for my content. Add the new pattern to the `"ignoreRegExpList"`:
```json
"languageSettings": [
  {
@@ -90,12 +106,18 @@ My site (especially the stuff in [/elsewhere][2] that I've mirrored from my comp
### 6. Fix spelling
Now comes the hard part: run CSpell again (`npx cspell "content/**/*.md"`), look at all the misspellings it finds, and fix the ones that are actual mistakes. Computers can't help us here, friend.
### 7. Create a custom dictionary
Now we'll add all the unrecognized words to a custom dictionary so that CSpell will stop flagging them. First, create the list of words:
```sh
npx cspell --words-only --unique "content/**/*.md" | sort > .dictionary
```
Then add a new `"dictionaryDefinitions"` array in your config file:
```json
"dictionaryDefinitions": [
  {
@@ -106,68 +128,23 @@ npx cspell --words-only --unique content/**/*.md >> .dictionary
],
```
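For reference, here's the complete entry as it appears in the final config. `name` is how you'll refer to this dictionary in the `"dictionaries"` array, and `path` points at the `.dictionary` file generated above. As I read the CSpell docs, `addWords: true` marks this as a dictionary that editor integrations (such as the VS Code extension) may add new words to. Double-check that against the current schema before relying on it.

```json
"dictionaryDefinitions": [
  {
    "name": "exceptions",
    "path": ".dictionary",
    "addWords": true
  }
],
```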
Finally, add `"exceptions"` to the `"dictionaries"` array:
```json
"dictionaries": [
  "english",
  "ruby",
  "golang",
  "java",
  "exceptions"
]
```
At this point, CSpell should find zero misspellings. To add new exceptions to the list in the future, you can run:
```sh
npx cspell --words-only --unique "content/**/*.md" >> .dictionary
sort -o .dictionary .dictionary
```
### 8. Add to build pipeline
With all this set up, it's dead simple to add spellchecking to your build pipeline so you never publish misspellings. As long as your job runner has `npx` available, you can run the same `npx cspell "content/**/*.md"` command you've been running locally as a build step. [Here's where I do it.][8]
[8]: https://git.sr.ht/~dce/davideisinger.com/tree/main/item/.build.yml#L23-24
---
[Here's the final `.cspell.json` config file.][9]
```json
{
  "$schema": "https://raw.githubusercontent.com/streetsidesoftware/cspell/main/cspell.schema.json",
  "version": "0.2",
  "dictionaryDefinitions": [
    {
      "name": "exceptions",
      "path": ".dictionary",
      "addWords": true
    }
  ],
  "dictionaries": [
    "english",
    "ruby",
    "golang",
    "java",
    "exceptions"
  ],
  "patterns": [
    {
      "name": "front_matter",
      "pattern": "/^(-{3}|[+]{3})$(\\s|\\S)*?^\\1$/gm"
    },
    {
      "name": "proper_nouns",
      "pattern": "/[\\W_][A-Z][\\S]+/g"
    }
  ],
  "languageSettings": [
    {
      "languageId": "markdown",
      "ignoreRegExpList": [
        "front_matter",
        "proper_nouns"
      ]
    }
  ]
}
```
I'm super happy with this setup -- it's already catching misspellings as I write these words. I'm reminded of [a post][10] I read a few weeks ago about an irony: website publishing has become good and simple for technical people but remains complex for the less technically inclined. Imagine trying to accomplish the same thing in a typical CMS -- [it would not work well, if it worked at all][11].
[9]: https://git.sr.ht/~dce/davideisinger.com/tree/main/item/.cspell.json
[10]: https://kristoff.it/blog/static-site-paradox/
[11]: https://wordpress.org/support/topic/garbage-170/