224 lines
8.2 KiB
Markdown
224 lines
8.2 KiB
Markdown
---
|
|
title: "Pandoc: A Tool I Use and Like"
|
|
date: 2022-05-25T00:00:00+00:00
|
|
draft: false
|
|
canonical_url: https://www.viget.com/articles/pandoc-a-tool-i-use-and-like/
|
|
---
|
|
|
|
Today I want to talk to you about one of my favorite command-line tools,
|
|
[Pandoc](https://pandoc.org/). From the project website:
|
|
|
|
> If you need to convert files from one markup format into another,
|
|
> pandoc is your swiss-army knife.
|
|
|
|
I spend a lot of time writing, and I love [Vim](https://www.vim.org/),
|
|
[Markdown](https://daringfireball.net/projects/markdown/), and the
|
|
command line (and avoid browser-based WYSIWYG editors when I can), so
|
|
that's where a lot of my Pandoc use comes in, but it has a ton of
|
|
utility outside of that -- really, anywhere you need to move between
|
|
different text-based formats, Pandoc can probably help. A few examples
|
|
from recent memory:
|
|
|
|
### Markdown ➞ Craft Blog Post
|
|
|
|
This website you're reading presently uses [Craft
|
|
CMS](https://craftcms.com/), a flexible and powerful content management
|
|
system that doesn't perfectly match my writing
|
|
process[^1]. Rather
|
|
than composing directly in Craft, I prefer to write locally, pipe the
|
|
output through Pandoc, and put the resulting HTML into a text block in
|
|
the CMS. This gets me a few things I really like:
|
|
|
|
- Curly quotes in place of straight ones and en-dashes in place of
|
|
`--` (from the [`smart`
|
|
extension](https://pandoc.org/MANUAL.html#extension-smart))
|
|
- [Daring
|
|
Fireball-style](https://daringfireball.net/2005/07/footnotes)
|
|
footnotes with return links
|
|
|
|
By default, Pandoc uses [Pandoc
|
|
Markdown](https://garrettgman.github.io/rmarkdown/authoring_pandoc_markdown.html)
|
|
when converting Markdown docs to other formats, an "extended and
|
|
slightly revised version" of the original syntax, which is how footnotes
|
|
and a bunch of other things work.
|
|
|
|
### Markdown ➞ Rich Text (Basecamp)
|
|
|
|
I also sometimes find myself writing decently long
|
|
[Basecamp](https://basecamp.com/) posts. Basecamp 3 has a fine WYSIWYG
|
|
editor (🪦 Textile), but again, I'd rather be in Vim. Pasting HTML into
|
|
Basecamp doesn't work (just shows the code verbatim), but I've found
|
|
that if I convert my Markdown notes to HTML and open the HTML in a
|
|
browser, I can copy and paste that directly into Basecamp with good
|
|
results. Leveraging MacOS' `open` command, this one-liner does the
|
|
trick[^2]:
|
|
|
|
```sh
|
|
cat [filename.md] \
|
|
| pandoc -t html \
|
|
> /tmp/output.html \
|
|
&& open /tmp/output.html \
|
|
&& read -n 1 \
|
|
&& rm /tmp/output.html
|
|
```
|
|
|
|
This will convert the contents to HTML, save that to a file, open the
|
|
file in a browser, wait for the user to hit enter, and the remove the
|
|
file. Without that `read -n 1`, it'll remove the file before the browser
|
|
has a chance to open it.
|
|
|
|
### HTML ➞ Text
|
|
|
|
We built an app for one of our clients that takes in news articles (in
|
|
HTML) via an API and sends them as emails to *their* clients (think big
|
|
brands) if certain criteria are met. Recently, we were making
|
|
improvements to the plain text version of the emails, and we noticed
|
|
that some of the articles were coming in without any linebreaks in the
|
|
content. When we removed the HTML (via Rails' [`strip_tags`
|
|
helper](https://apidock.com/rails/ActionView/Helpers/SanitizeHelper/strip_tags)),
|
|
the resulting content was all on one line, which wasn't very readable.
|
|
So imagine an article like this:
|
|
|
|
```html
|
|
<h1>Headline</h1> <p>A paragraph.</p> <ul><li>List item #1</li> <li>List item #2</li></ul>
|
|
````
|
|
|
|
Our initial approach (with `strip_tags`) gives us this:
|
|
|
|
```
|
|
Headline A paragraph. List item #1 List item #2
|
|
```
|
|
|
|
Not great! But fortunately, some bright fellow had the idea to pull in
|
|
Pandoc, and some even brighter person packaged up some [Ruby
|
|
bindings](https://github.com/xwmx/pandoc-ruby) for it. Taking that same
|
|
content and running it through `PandocRuby.html(content).to_plain` gives
|
|
us:
|
|
|
|
```
|
|
Headline
|
|
|
|
A paragraph.
|
|
|
|
- List item #1
|
|
- List item #2
|
|
```
|
|
|
|
Much better, and though you can't tell from this basic example, Pandoc
|
|
does a great job with spacing and wrapping to generate really
|
|
nice-looking plain text from HTML.
|
|
|
|
### HTML Element ➞ Text
|
|
|
|
A few months ago, we were doing Pointless Weekend and needed a domain
|
|
for our
|
|
[Thrillr](https://www.viget.com/articles/plan-a-killer-party-with-thrillr/)
|
|
app. A few of us were looking through lists of fun top-level domains,
|
|
but we realized that AWS Route 53 only supports a limited set of them.
|
|
In order to get everyone the actual list, I needed a way to get all the
|
|
content out of an HTML `<select>` element, and you'll never guess what I
|
|
did (unless you guessed "use Pandoc"). In Firefox:
|
|
|
|
- Right click the select element, then click "Inspect"
|
|
- Find the `<select>` in the DOM view that pops up
|
|
- Right click it, then go to "Copy", then "Inner HTML"
|
|
- You'll now have all of the `<option>` elements on your clipboard
|
|
- In your terminal, run `pbpaste | pandoc -t plain`
|
|
|
|
The result is something like this:
|
|
|
|
```
|
|
.ac - $76.00
|
|
.academy - $12.00
|
|
.accountants - $94.00
|
|
.agency - $19.00
|
|
.apartments - $47.00
|
|
.associates - $29.00
|
|
.au - $15.00
|
|
.auction - $29.00
|
|
...
|
|
```
|
|
|
|
### Preview Mermaid/Markdown (`--standalone`)
|
|
|
|
A different client recently asked for an architecture diagram of a
|
|
complex system that [Andrew](https://www.viget.com/about/team/athomas/)
|
|
and I were working on, and we opted to use
|
|
[Mermaid](https://mermaid-js.github.io/mermaid/#/) (which is rad BTW) to
|
|
create sequence diagrams to illustrate all of the interactions. Both
|
|
GitHub and GitLab support Mermaid natively, which is really neat, but we
|
|
wanted a way to quickly iterate on our diagrams without having to push
|
|
changes to the remote repo.
|
|
|
|
We devised a simple build chain ([demo version available
|
|
here](https://github.com/dce/mermaid-js-demo)) that watches for changes
|
|
to a Markdown file, converts the Mermaid blocks to SVG, and then uses
|
|
Pandoc to take the resulting document and convert it to a styled HTML
|
|
page using the `--standalone` option ([here's the key
|
|
line](https://github.com/dce/mermaid-js-demo/blob/main/bin/build#L7=)).
|
|
Then we could simply make our changes and refresh the page to see our
|
|
progress.
|
|
|
|
### Generate a PDF
|
|
|
|
Finally, and this is not something I need to do very often, but Pandoc
|
|
also includes several ways to create PDF documents. The simplest (IMO)
|
|
is to install `wkhtmltopdf`, then instruct Pandoc to convert its input
|
|
to HTML but use `.pdf` in the output filename, so something like:
|
|
|
|
```sh
|
|
echo "# Hello\n\nIs it me you're looking for?" \
|
|
| pandoc -t html -o hello.pdf
|
|
```
|
|
|
|
[The result is quite nice.](hello.pdf)
|
|
|
|
------------------------------------------------------------------------
|
|
|
|
I think that's about all I have to say about Pandoc for today. A couple
|
|
final thoughts:
|
|
|
|
- Pandoc is incredibly powerful -- I've really only scratched the
|
|
surface here. Look at the [man page](https://manpages.org/pandoc) to
|
|
get a sense of everything it can do.
|
|
- Pandoc is written in Haskell, and [the
|
|
source](https://github.com/jgm/pandoc/blob/master/src/Text/Pandoc/Readers/Markdown.hs)
|
|
is pretty fun to look through if you're a certain kind of person.
|
|
|
|
So install Pandoc with your package manager of choice and give it a
|
|
shot. I think you'll find it unexpectedly useful.
|
|
|
|
*[Swiss army knife icons created by smalllikeart -
|
|
Flaticon](https://www.flaticon.com/free-icons/swiss-army-knife "swiss army knife icons")*
|
|
|
|
[^1]: My writing process is (generally):
|
|
|
|
1. Write down an idea in my notebook
|
|
2. Gradually add a series of bullet points (this can sometimes take
|
|
awhile)
|
|
3. Once I feel like I have a solid outline, copy that into a
|
|
Markdown file
|
|
4. Start collecting links (in the `[1]:` footnote style)
|
|
5. Write a intro
|
|
6. Convert the bullet points to headers, edit + rearrange
|
|
7. Fill in all the sections, write jokes, etc.
|
|
8. Write a conclusion
|
|
9. Create a Gist, get feedback from the team
|
|
10. Convert Markdown to HTML, copy to clipboard
|
|
(`cat [file] | pandoc -t html | pbcopy`)
|
|
11. Create a new post in Craft, add a text section, flip to code
|
|
view, paste clipboard contents
|
|
12. Fill in the rest of the post metadata
|
|
13. 🚢
|
|
|
|
[^2]: I've actually got this wired up as a Vim command in `.vimrc`:
|
|
|
|
```vim
|
|
command Mdpreview ! cat %
|
|
\ | pandoc -t html
|
|
\ > /tmp/output.html
|
|
\ && open /tmp/output.html
|
|
\ && read -n 1
|
|
\ && rm /tmp/output.html
|
|
```
|