---
title: "Pandoc: A Tool I Use and Like"
date: 2022-05-25T00:00:00+00:00
draft: false
canonical_url: https://www.viget.com/articles/pandoc-a-tool-i-use-and-like/
---

Today I want to talk to you about one of my favorite command-line tools, [Pandoc](https://pandoc.org/). From the project website:

> If you need to convert files from one markup format into another,
> pandoc is your swiss-army knife.

I spend a lot of time writing, and I love [Vim](https://www.vim.org/), [Markdown](https://daringfireball.net/projects/markdown/), and the command line (and avoid browser-based WYSIWYG editors when I can), so that's where a lot of my Pandoc use comes in, but it has a ton of utility outside of that -- really, anywhere you need to move between different text-based formats, Pandoc can probably help. A few examples from recent memory:

### Markdown ➞ Craft Blog Post

This website you're reading presently uses [Craft CMS](https://craftcms.com/), a flexible and powerful content management system that doesn't perfectly match my writing process[^1]. Rather than composing directly in Craft, I prefer to write locally, pipe the output through Pandoc, and put the resulting HTML into a text block in the CMS. This gets me a few things I really like:

- Curly quotes in place of straight ones and en-dashes in place of `--` (from the [`smart` extension](https://pandoc.org/MANUAL.html#extension-smart))
- [Daring Fireball-style](https://daringfireball.net/2005/07/footnotes) footnotes with return links

By default, Pandoc uses [Pandoc Markdown](https://garrettgman.github.io/rmarkdown/authoring_pandoc_markdown.html) when converting Markdown docs to other formats, an "extended and slightly revised version" of the original syntax, which is how footnotes and a bunch of other things work.

### Markdown ➞ Rich Text (Basecamp)

I also sometimes find myself writing decently long [Basecamp](https://basecamp.com/) posts. Basecamp 3 has a fine WYSIWYG editor (🪦 Textile), but again, I'd rather be in Vim.
Pasting HTML into Basecamp doesn't work (just shows the code verbatim), but I've found that if I convert my Markdown notes to HTML and open the HTML in a browser, I can copy and paste that directly into Basecamp with good results. Leveraging macOS's `open` command, this one-liner does the trick[^2]:

```sh
cat [filename.md] \
  | pandoc -t html \
  > /tmp/output.html \
  && open /tmp/output.html \
  && read -n 1 \
  && rm /tmp/output.html
```

This will convert the contents to HTML, save that to a file, open the file in a browser, wait for the user to hit enter, and then remove the file. Without that `read -n 1`, it'll remove the file before the browser has a chance to open it.

### HTML ➞ Text

We built an app for one of our clients that takes in news articles (in HTML) via an API and sends them as emails to *their* clients (think big brands) if certain criteria are met. Recently, we were making improvements to the plain text version of the emails, and we noticed that some of the articles were coming in without any linebreaks in the content. When we removed the HTML (via Rails' [`strip_tags` helper](https://apidock.com/rails/ActionView/Helpers/SanitizeHelper/strip_tags)), the resulting content was all on one line, which wasn't very readable. So imagine an article like this:

```html

<h1>Headline</h1><p>A paragraph.</p><ul><li>List item #1</li><li>List item #2</li></ul>

```

Our initial approach (with `strip_tags`) gives us this:

```
Headline A paragraph. List item #1 List item #2
```

Not great! But fortunately, some bright fellow had the idea to pull in Pandoc, and some even brighter person packaged up some [Ruby bindings](https://github.com/xwmx/pandoc-ruby) for it. Taking that same content and running it through `PandocRuby.html(content).to_plain` gives us:

```
Headline

A paragraph.

- List item #1
- List item #2
```

Much better, and though you can't tell from this basic example, Pandoc does a great job with spacing and wrapping to generate really nice-looking plain text from HTML.

### HTML Element ➞ Text

A few months ago, we were doing Pointless Weekend and needed a domain for our [Thrillr](https://www.viget.com/articles/plan-a-killer-party-with-thrillr/) app. A few of us were looking through lists of fun top-level domains, but we realized that AWS Route 53 only supports a limited set of them. In order to get everyone the actual list, I needed a way to get all the content out of an HTML `` in the DOM view that pops up

- Right click it, then go to "Copy", then "Inner HTML"
- You'll now have all of the `