finish spelling post
@@ -1,20 +1,27 @@
|
|||||||
---
|
---
|
||||||
title: "Spellcheck Your Hugo Site With CSpell"
|
title: "Spellcheck Your Hugo Site With CSpell"
|
||||||
date: 2024-11-20T09:49:51-05:00
|
date: 2024-11-20T18:03:32-05:00
|
||||||
draft: false
|
draft: false
|
||||||
tags:
|
tags:
|
||||||
- meta
|
- meta
|
||||||
|
references:
|
||||||
|
- title: "The Static Site Paradox | Loris Cro's Blog"
|
||||||
|
url: https://kristoff.it/blog/static-site-paradox/
|
||||||
|
date: 2024-10-31T03:33:40Z
|
||||||
|
file: kristoff-it-edtlns.txt
|
||||||
---
|
---
|
||||||
|
|
||||||
Bla bla bla
|
I edit these posts pretty carefully before publishing, but I inevitably find a misspelling or two after the fact. In the spirit of continuous improvement, I decided to see what kind of automated solutions are out there for spellchecking Markdown files, and found [CSpell][1]. It works well, but its default configuration found a ton of false positives that I had to scroll past to find the actual errors.
|
||||||
|
|
||||||
[5]: https://cspell.org/
|
[1]: https://cspell.org/
|
||||||
|
|
||||||
<!--more-->
|
<!--more-->
|
||||||
|
|
||||||
|
Fortunately, it's quite configurable, and I've gotten it to where it only flags actual misspelled words. Here's how.
|
||||||
|
|
||||||
### 1. Install CSpell
|
### 1. Install CSpell
|
||||||
|
|
||||||
Assuming a modern version of Node.js (>= 18), you can use [npx][1] to download and run CSpell in a single command:
|
Assuming a modern version of Node.js (>= 18), you can use [npx][2] to download and run CSpell in a single command:
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
npx cspell content/**/*.md
|
npx cspell content/**/*.md
|
||||||
@@ -22,7 +29,7 @@ npx cspell content/**/*.md
|
|||||||
|
|
||||||
You'll see a ton of spelling errors -- ignore them for now.
|
You'll see a ton of spelling errors -- ignore them for now.
|
||||||
|
|
||||||
[1]: https://docs.npmjs.com/cli/v10/commands/npx
|
[2]: https://docs.npmjs.com/cli/v10/commands/npx
|
||||||
|
|
||||||
### 2. Add config file
|
### 2. Add config file
|
||||||
|
|
||||||
@@ -40,13 +47,15 @@ Next, let's create a basic config file. In the root of your site, put the follow
|
|||||||
|
|
||||||
### 3. Add additional languages
|
### 3. Add additional languages
|
||||||
|
|
||||||
My site (especially the stuff in [/elsewhere][2] that I've mirrored from my company's website) has code snippets that the English dictionary doesn't recognize. Fortunately, CSpell ships with a bunch of [additional dictionaries][3]. Adding `"ruby"`, `"golang"`, and `"java"` to the `"dictionaries"` array makes a bunch of misspellings go away.
|
My site (especially the stuff in [/elsewhere][3] that I've mirrored from my company's website) has code snippets that the English dictionary doesn't recognize. Fortunately, CSpell ships with a bunch of [additional dictionaries][4]. Adding `"ruby"`, `"golang"`, and `"java"` to the `"dictionaries"` array makes a bunch of misspellings go away.
|
||||||
|
|
||||||
[2]: /elsewhere
|
[3]: /elsewhere
|
||||||
[3]: https://github.com/streetsidesoftware/cspell-dicts/tree/main/dictionaries
|
[4]: https://github.com/streetsidesoftware/cspell-dicts/tree/main/dictionaries
|
||||||
|
|
||||||
### 4. Ignore front matter
|
### 4. Ignore front matter
|
||||||
|
|
||||||
|
This first one may or may not apply to your site, so feel free to ignore, but I see a lot of false positives in the [front matter][5] of my posts, mostly around the lists of [references][6]. To ignore the front matter section entirely, add the following to your config file (credit to [this helpful GitHub comment][7]):
|
||||||
|
|
||||||
```json
|
```json
|
||||||
"patterns": [
|
"patterns": [
|
||||||
{
|
{
|
||||||
@@ -64,11 +73,16 @@ My site (especially the stuff in [/elsewhere][2] that I've mirrored from my comp
|
|||||||
]
|
]
|
||||||
```
|
```
|
||||||
|
|
||||||
[6]: https://gohugo.io/content-management/front-matter/
|
Note that you'll no longer catch misspellings in post titles, so it might make sense to use a more targeted regular expression.
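To see what the `front_matter` pattern actually swallows, here is a minimal sketch that runs the same regular expression in Python. CSpell evaluates the pattern as a JavaScript regex; this assumes Python's `re` module with `re.MULTILINE` behaves the same way for this input. The sample post is invented for illustration.

```python
import re

# Python translation of the "front_matter" pattern from the config.
# CSpell itself runs it as a JavaScript regex with the g and m flags.
FRONT_MATTER = re.compile(r"^(-{3}|[+]{3})$(\s|\S)*?^\1$", re.MULTILINE)

# Hypothetical post, made up for this example (note the typo in the title).
post = """---
title: "Spellchek Your Site"
tags:
- meta
---

Body text that CSpell still checks.
"""

match = FRONT_MATTER.search(post)
ignored = match.group(0)       # the span the pattern covers, delimiters included
checked = post[match.end():]   # everything after the front matter

print(ignored)
print(checked)
```

The misspelled title sits inside `ignored`, which is the trade-off the note above describes: everything between the delimiters is skipped, titles included.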
|
||||||
|
|
||||||
|
[5]: https://gohugo.io/content-management/front-matter/
|
||||||
|
[6]: https://git.sr.ht/~dce/davideisinger.com/tree/main/item/content/journal/dispatch-21-november-2024/index.md?view-source#L7-11
|
||||||
[7]: https://github.com/streetsidesoftware/cspell/discussions/3456#discussioncomment-3438647
|
[7]: https://github.com/streetsidesoftware/cspell/discussions/3456#discussioncomment-3438647
|
||||||
|
|
||||||
### 5. Ignore proper nouns
|
### 5. Ignore proper nouns
|
||||||
|
|
||||||
|
I also see a lot of proper nouns being flagged as misspellings, so I decided to just ignore any word that begins with a capital letter. Create a new entry in the `"patterns"` array:
|
||||||
|
|
||||||
```json
|
```json
|
||||||
{
|
{
|
||||||
"name": "proper_nouns",
|
"name": "proper_nouns",
|
||||||
@@ -76,6 +90,8 @@ My site (especially the stuff in [/elsewhere][2] that I've mirrored from my comp
|
|||||||
}
|
}
|
||||||
```
|
```
|
||||||
|
|
||||||
|
That's any non-word character (or an underscore), followed by a capital letter, followed by one or more non-space characters. I'm sure that's not perfect, but it's good enough for my content. Add the new pattern to the `"ignoreRegExpList"`:
|
||||||
|
|
||||||
```json
|
```json
|
||||||
"languageSettings": [
|
"languageSettings": [
|
||||||
{
|
{
|
||||||
@@ -90,12 +106,18 @@ My site (especially the stuff in [/elsewhere][2] that I've mirrored from my comp
|
|||||||
|
|
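As a quick check of the `proper_nouns` pattern from step 5, the sketch below runs it in Python against a made-up sentence. It assumes Python's `re` treats `\W` and `\S` the same way CSpell's JavaScript engine does for plain ASCII text like this.

```python
import re

# Python translation of the "proper_nouns" pattern from the config.
PROPER_NOUNS = re.compile(r"[\W_][A-Z][\S]+")

# Hypothetical sentence, made up for this example.
sentence = "My site uses Hugo, and CSpell checks it."

skipped = PROPER_NOUNS.findall(sentence)
print(skipped)  # [' Hugo,', ' CSpell']
```

Two quirks show up here: "My" is not matched because nothing precedes it at the start of the text, and the trailing comma after "Hugo" is swallowed because the pattern runs to the next whitespace.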
||||||
### 6. Fix spelling
|
### 6. Fix spelling
|
||||||
|
|
||||||
### 7. Create custom dictionary
|
Now comes the hard part: run CSpell again (`npx cspell content/**/*.md`), look at all the misspellings it finds, and fix all the ones you consider to be valid. Computers can't help us here, friend.
|
||||||
|
|
||||||
|
### 7. Create a custom dictionary
|
||||||
|
|
||||||
|
Now we'll add all the unrecognized words to a custom dictionary so that CSpell will stop flagging them. First, create the list of words:
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
npx cspell --words-only --unique content/**/*.md >> .dictionary
|
npx cspell --words-only --unique content/**/*.md | sort > .dictionary
|
||||||
```
|
```
|
||||||
|
|
||||||
|
Then add a new `"dictionaryDefinitions"` array in your config file:
|
||||||
|
|
||||||
```json
|
```json
|
||||||
"dictionaryDefinitions": [
|
"dictionaryDefinitions": [
|
||||||
{
|
{
|
||||||
@@ -106,68 +128,23 @@ npx cspell --words-only --unique content/**/*.md >> .dictionary
|
|||||||
],
|
],
|
||||||
```
|
```
|
||||||
|
|
||||||
```json
|
Finally, add `"exceptions"` to the `"dictionaries"` array. At this point, CSpell should find zero misspellings. To add new exceptions to the list in the future, you can run:
|
||||||
"dictionaries": [
|
|
||||||
"english",
|
|
||||||
"ruby",
|
|
||||||
"golang",
|
|
||||||
"exceptions"
|
|
||||||
]
|
|
||||||
```
|
|
||||||
|
|
||||||
```sh
|
```sh
|
||||||
npx cspell --words-only --unique content/**/*.md >> .dictionary
|
npx cspell --words-only --unique content/**/*.md >> .dictionary
|
||||||
sort -o .dictionary .dictionary
|
sort -o .dictionary .dictionary
|
||||||
```
|
```
|
||||||
|
|
||||||
|
|
||||||
### 8. Add to build pipeline
|
### 8. Add to build pipeline
|
||||||
|
|
||||||
|
With all this stuff set up, it's dead simple to add spellchecking to the build pipeline to ensure you never publish misspellings. As long as your job runner has `npx` available, you can just run the same `npx cspell content/**/*.md` command you've been running locally in a build step. [Here's where I do it.][8]
|
||||||
|
|
||||||
[8]: https://git.sr.ht/~dce/davideisinger.com/tree/main/item/.build.yml#L23-24
|
[8]: https://git.sr.ht/~dce/davideisinger.com/tree/main/item/.build.yml#L23-24
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
[Here's the final `.cspell.json` config file.][4]
|
[Here's the final `.cspell.json` config file.][9] I'm super happy with this setup -- it's already catching misspellings in the process of writing these words. I'm reminded of [a post][10] I read a few weeks ago, about the irony of how good and simple website publishing has become for technical people, and how complex it is for the less technically inclined. Imagine trying to accomplish this same functionality in a typical CMS -- [it would not work well, if it worked at all][11].
|
||||||
|
|
||||||
[4]: https://git.sr.ht/~dce/davideisinger.com/tree/main/item/.cspell.json
|
[9]: https://git.sr.ht/~dce/davideisinger.com/tree/main/item/.cspell.json
|
||||||
|
[10]: https://kristoff.it/blog/static-site-paradox/
|
||||||
---
|
[11]: https://wordpress.org/support/topic/garbage-170/
|
||||||
|
|
||||||
```json
|
|
||||||
{
|
|
||||||
"$schema": "https://raw.githubusercontent.com/streetsidesoftware/cspell/main/cspell.schema.json",
|
|
||||||
"version": "0.2",
|
|
||||||
"dictionaryDefinitions": [
|
|
||||||
{
|
|
||||||
"name": "exceptions",
|
|
||||||
"path": ".dictionary",
|
|
||||||
"addWords": true
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"dictionaries": [
|
|
||||||
"english",
|
|
||||||
"ruby",
|
|
||||||
"golang",
|
|
||||||
"exceptions"
|
|
||||||
],
|
|
||||||
"patterns": [
|
|
||||||
{
|
|
||||||
"name": "front_matter",
|
|
||||||
"pattern": "/^(-{3}|[+]{3})$(\\s|\\S)*?^\\1$/gm"
|
|
||||||
},
|
|
||||||
{
|
|
||||||
"name": "proper_nouns",
|
|
||||||
"pattern": "/[\\W_][A-Z][\\S]+/g"
|
|
||||||
}
|
|
||||||
],
|
|
||||||
"languageSettings": [
|
|
||||||
{
|
|
||||||
"languageId": "markdown",
|
|
||||||
"ignoreRegExpList": [
|
|
||||||
"front_matter",
|
|
||||||
"proper_nouns"
|
|
||||||
]
|
|
||||||
}
|
|
||||||
]
|
|
||||||
}
|
|
||||||
```
|
|
||||||
|
|||||||