[1]Piotr Migdał | [2]Blog | [3]Projects | [4]Publications | [5]Resume

If it is worth keeping, save it in Markdown

17 Feb 2025 | by Piotr Migdał

• [6]r/DataHoarder thread
• [7]r/ObsidianMD thread
• [8]Hacker News front page

One of Stanisław Lem's stories, [9]The Memoirs Found in a Bathtub, begins with
a strange phenomenon that turns all written materials into dust. While this is
science fiction, something similar happens in our digital world.

[10]Digital memento mori

If you publish something online, sooner or later, it will vanish.^[11]1

In the best-case scenario, a link changes during website restructuring. More
commonly, the content is lost. The only hope is that someone saved it from
oblivion in the [12]Internet Archive Wayback Machine.

Walled gardens requiring login are even worse - when they go down, everything
within them vanishes forever. If you haven't saved it yourself, it's gone.
Moreover, any service (free or paid) may restrict access to content at any time
- either completely or practically, by making it impossible to find what you're
looking for. The content you posted on Twitter a few years ago is now on X, and
in a few years it might be available only after login, behind a paid
subscription, or not at all.

Even self-hosting isn't foolproof - your content can vanish when you forget to
pay for hosting or after a server crash. And even if your data survives,
accessing it can be tricky: WordPress blogs store posts in databases that
server updates can break. I learned this lesson when my PHP photo gallery went
down - thankfully, I had kept all photos as simple JPGs organized by date.

The only reliable solution is to store content in formats that can be opened
without specialized software - formats that will remain accessible for decades
to come.

[galadriel-]
Galadriel in the opening scene of "The Lord of the Rings" ([13]video, [14]transcript).

[15]Why things are worth saving

There are many motivations for preserving content, ranging from a digital "non
omnis moriar" through practical arguments, to archiving as a goal in itself^[16]2.

For me, the key reasons are:

• I want to keep and own things I wrote - they are parts of me, my history,
  my lived experience
• I want to have everything in one place and easily searchable
• I want to use it with AI tools (looking for similar notes, summarizing,
  using as context)
• I want to be able to reuse or share things however I want (email, blog
  post, ebook, anything)

[17]Plaintext

As a data scientist, [18]I turn things into vectors.
As an unabashed archivist, I turn things into Markdown.

The most durable solution would be carving things in stone - it would last for
millennia. But that's hardly practical, and it wouldn't make things easily
searchable or shareable.

The second best option is plaintext files with UTF-8 encoding and Markdown
formatting^[19]3. As long as computers exist, we'll be able to read plaintext
files with ease.

Markdown files are essentially plaintext with some extra syntax for common
elements like sections, bullet points, and links. The format deliberately
avoids precise control over display details like font selection^[20]4.
Following [21]the rule of least power, I consider this limitation a feature.
For contrast, consider PDF - a format so powerful that [22]it can run Doom.

For personal notes, I use [23]Obsidian, a note-taking app I love and use daily.
While it's a powerful tool with great plugins, what keeps me loyal is its
simplicity - it stores everything in plain files. The lack of a proprietary
format moat is precisely what makes it so compelling.

For blogging, most [24]static site generators embrace Markdown. This very blog
post is written in Markdown^[25]5. Using the same markup for note-taking and
publishing makes sharing smooth.

[26]How I do it

I dream of automatically converting everything I write or encounter into
Markdown. The reality is messier - there's a constant tension between my
autistic urge to archive everything and my ADHD that makes maintaining such
systems challenging.

So I take a pragmatic approach - when I find content worth keeping, I copy it
to a Markdown file, adding frontmatter with its publication date, source, and
relevant tags:

[sauna-post]

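To make the frontmatter step concrete, here is a minimal Python sketch. The field names (title, date, source, tags) and both helper functions are illustrative assumptions, not the exact layout from the screenshot above; adjust them to whatever your notes app expects.

```python
import json
from datetime import date
from pathlib import Path


def to_markdown_note(body, title, published, source, tags):
    """Return note text: YAML-style frontmatter followed by the body.

    The field names are an assumption for illustration only.
    """
    lines = [
        "---",
        # json.dumps yields a double-quoted string, which YAML also accepts.
        f"title: {json.dumps(title, ensure_ascii=False)}",
        f"date: {published.isoformat()}",
        f"source: {source}",
        "tags: [" + ", ".join(tags) + "]",
        "---",
        "",
        body.strip(),
        "",
    ]
    return "\n".join(lines)


def save_note(folder, slug, text):
    """Write the note as a UTF-8 .md file and return its path."""
    path = Path(folder) / f"{slug}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

Calling save_note("notes", "sauna-event", to_markdown_note(...)) leaves a plain file that any editor, grep, or Git can handle.
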
I particularly save things I post that might be useful later. Conference talk
abstracts, sauna event descriptions, technical explanations - in the future,
they're much easier to find and reuse.

When I catch myself searching for old content (like a Facebook post I want to
share or reread), I save it immediately. If I discover a blog post has
vanished, I retrieve it from the Wayback Machine and preserve it. When
forwarding an email with a detailed explanation - you guessed it, I save it.

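For the Wayback Machine step, the lookup can be scripted. The sketch below assumes the Internet Archive's public availability endpoint and its JSON shape (an "archived_snapshots" object with a "closest" entry); the function names are mine, and I do this by hand more often than not.

```python
import json
import urllib.parse
import urllib.request

# Assumed endpoint: the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the lookup URL; timestamp is an optional YYYYMMDD string."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urllib.parse.urlencode(params)


def closest_snapshot(response):
    """Return the snapshot URL from a decoded API response, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def find_snapshot(page_url, timestamp=None):
    """Ask the API for the snapshot closest to timestamp (needs network)."""
    request_url = availability_url(page_url, timestamp)
    with urllib.request.urlopen(request_url, timeout=30) as reply:
        return closest_snapshot(json.load(reply))
```

If find_snapshot returns a URL, open it, copy the text into a Markdown file, and record both the original and the archived address as the source.
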
Content worth searching for once is content worth preserving forever.

Worried about saving too much? Well, disk storage is cheap - and for text
files, it's practically free.

[27]Tools that help

Sometimes manual copying suffices. For trickier formatting, AI tools are
invaluable - being trained on Markdown, they excel at processing and extracting
content. You can use them to convert online text or parse PDFs (like slides),
as shown in [28]Ingesting Millions of PDFs and why Gemini 2.0 Changes
Everything.

For some sources, I've created semi-automated solutions. For instance, I wrote
a [29]Python script to convert my Kindle highlights and notes into Markdown.

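The linked gist is the real thing; what follows is only a rough sketch of the idea, not that script. It assumes the usual "My Clippings.txt" layout (a title line, a metadata line, the clipping text, then a line of ten equals signs), which may differ between devices and firmware versions.

```python
from collections import defaultdict

# Assumed entry separator used in "My Clippings.txt".
SEPARATOR = "=========="


def parse_clippings(text):
    """Group clipping texts by their book title line."""
    books = defaultdict(list)
    for entry in text.split(SEPARATOR):
        lines = [line.strip() for line in entry.strip().splitlines()]
        lines = [line for line in lines if line]
        if len(lines) < 3:
            continue  # bookmarks and empty entries carry no text
        title, _metadata, *content = lines
        # The file often starts with a byte order mark; drop it.
        books[title.lstrip("\ufeff")].append(" ".join(content))
    return dict(books)


def to_markdown(books):
    """Render one section per book, one bullet per clipping."""
    parts = []
    for title, clippings in books.items():
        parts.append(f"## {title}")
        parts.append("")
        parts.extend(f"- {clip}" for clip in clippings)
        parts.append("")
    return "\n".join(parts)
```

Read the file with encoding="utf-8", pass its text to parse_clippings, and write the result of to_markdown wherever your notes live.
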
Many tools exist to help with format conversion. The most versatile is
[30]pandoc, which can convert between dozens of formats - from Word documents to
LaTeX, and everything in between.

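If you convert files in bulk, pandoc is easy to drive from a script. This is a minimal sketch that assumes pandoc is installed and on PATH; the function names and the choice of "gfm" (GitHub-Flavored Markdown) as the target format are my own picks for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(source, target_format="gfm"):
    """Build the pandoc call that writes <source stem>.md next to the input."""
    source = Path(source)
    output = source.with_suffix(".md")
    # pandoc infers the input format from the file extension.
    return ["pandoc", str(source), "--to", target_format, "--output", str(output)]


def convert_to_markdown(source, target_format="gfm"):
    """Run pandoc and return the path of the Markdown file it wrote."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    command = pandoc_command(source, target_format)
    subprocess.run(command, check=True)  # raises if pandoc exits with an error
    return Path(command[-1])
```

For example, convert_to_markdown("talk-abstract.docx") should leave talk-abstract.md beside the original, which you can then tidy by hand.
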
The community has also created specialized tools for specific platforms. You
can find tools for converting [31]Medium posts to Markdown (either from export
or [32]directly by URL), [33]archiving Reddit threads, and many other use
cases.

Since we're dealing with lightweight text files, there are many options for
backing them up. Git is particularly well-suited for version-controlling and
syncing this content.

Additionally, I periodically download my data from each service I use. Even if
it's a mix of JSON, XML, HTML, CSV and other formats, I have it. Even if at a
given moment I have no time to process it into Markdown, at least the data is
there.

[34]Next steps

I would love to have a comprehensive tool for exporting everything - especially
from social media. Both the posts that resonated with many people and those
that hold personal significance deserve preservation.

While Facebook offers limited data export capabilities, they're incomplete.
Most notably, there's no way to preserve entire discussion threads - often the
most valuable part of a post.

And you - what content do you find yourself searching for? What have you
archived, and what do you wish you had saved?

Discuss this post on [35]Hacker News, [36]Mastodon, [37]Reddit, or [38]LinkedIn.

[39]Footnotes

1. [40]Link rot can be addressed using services like [41]Perma.cc - though
   they too could eventually disappear. Studies show that for legal documents,
   half of links die within 5 years. My focus here is on preserving and
   searching personal content. [42]↩
2. For both practical reasons and hoarding for its own sake, I have gathered
   over 14k links in [43]Pinboard. Yes, I have downloaded the data as JSON.
   [44]↩
3. I don't claim Markdown is the only solution. There are valid reasons to use
   other formats. My focus is on plaintext in UTF-8. If you prefer other
   markup languages (like reStructuredText, AsciiDoc, Org-Mode) or just plain
   text without formatting - the principles still apply. In some cases the
   original format works - e.g. if it is JSON or code. [45]↩
4. Consider HTML (Hypertext Markup Language) as a counterexample. It was meant
   to enrich text with semantics, but now serves primarily as a tool for
   building UIs. While this evolution brought many benefits, typical end-user
   HTML is no longer suitable for pure content storage. At the same time, if
   you can use simple HTML with actual semantic <strong> and <em> tags, go for
   it. But it's often a slippery slope - from "just add a few colors," through
   "add tables," to creating a full-fledged app. [46]↩
5. This blog uses [47]Nuxt 3 Content (source:
   [48]github.com/stared/stared.github.io). It follows my previous versions in
   [49]Jekyll and [50]Gridsome. Thanks to Markdown, migration between platforms
   has been seamless - see [51]New blog - moving from Medium to Gridsome. For
   the latest migration from Gridsome to Nuxt 3 Content, [52]Cursor IDE was a
   great help. [53]Astro is another static site generator gaining significant
   traction. [54]↩

See also cosine-similar posts

• 0.617 [55]New blog - moving from Medium to Gridsome
• 0.604 [56]How I learned to stop worrying and love the types & tests
• 0.598 [57]AI won’t make artists redundant - thanks to information theory
• 0.591 [58]ADHD tech stack: auto time tracking
• 0.589 [59]The first post: why Jekyll?

By [60]Piotr Migdał, a curious being, doctor of sorcery. See [61]my other blog
posts.

Keep in the loop with the [62]RSS feed or join the [63]newsletter.

References:

[1] https://p.migdal.pl/
[2] https://p.migdal.pl/blog
[3] https://p.migdal.pl/projects
[4] https://p.migdal.pl/publications
[5] https://p.migdal.pl/resume
[6] https://www.reddit.com/r/DataHoarder/comments/1is1wbn/if_it_is_worth_keeping_save_it_in_markdown/
[7] https://www.reddit.com/r/ObsidianMD/comments/1is1snu/if_it_is_worth_keeping_save_it_in_markdown/
[8] https://news.ycombinator.com/item?id=43137616
[9] https://en.wikipedia.org/wiki/Memoirs_Found_in_a_Bathtub
[10] https://p.migdal.pl/blog/2025/02/markdown-saves/#digital-memento-mori
[11] https://p.migdal.pl/blog/2025/02/markdown-saves#user-content-fn-link-rot
[12] https://web.archive.org/
[13] https://www.youtube.com/watch?v=qj139dE7tFI
[14] https://www.tk421.net/lotr/film/fotr/01.html
[15] https://p.migdal.pl/blog/2025/02/markdown-saves/#why-things-are-worth-saving
[16] https://p.migdal.pl/blog/2025/02/markdown-saves#user-content-fn-pinboard
[17] https://p.migdal.pl/blog/2025/02/markdown-saves/#plaintext
[18] https://p.migdal.pl/blog/2025/01/dont-use-cosine-similarity
[19] https://p.migdal.pl/blog/2025/02/markdown-saves#user-content-fn-plaintext
[20] https://p.migdal.pl/blog/2025/02/markdown-saves#user-content-fn-html
[21] https://en.wikipedia.org/wiki/Rule_of_least_power
[22] https://www.reddit.com/r/itrunsdoom/comments/1i02c6b/doom_in_a_pdf_file/
[23] https://obsidian.md/
[24] https://jamstack.org/generators/
[25] https://p.migdal.pl/blog/2025/02/markdown-saves#user-content-fn-blog
[26] https://p.migdal.pl/blog/2025/02/markdown-saves/#how-i-do-it
[27] https://p.migdal.pl/blog/2025/02/markdown-saves/#tools-that-help
[28] https://www.sergey.fyi/articles/gemini-flash-2
[29] https://gist.github.com/stared/ce732ef27d97d559b34d7e294481f1b0
[30] https://github.com/jgm/pandoc
[31] https://github.com/gautamdhameja/medium-2-md
[32] https://medium2md.nabilmansour.com/
[33] https://farnots.github.io/RedditToMarkdown/
[34] https://p.migdal.pl/blog/2025/02/markdown-saves/#next-steps
[35] https://news.ycombinator.com/item?id=43137616
[36] https://mathstodon.xyz/@pmigdal/114021315189570737
[37] https://www.reddit.com/r/DataHoarder/comments/1is1wbn/if_it_is_worth_keeping_save_it_in_markdown/
[38] https://www.linkedin.com/posts/piotrmigdal_if-it-is-worth-keeping-save-it-in-markdown-activity-7299139148634841089-_Xe3
[39] https://p.migdal.pl/blog/2025/02/markdown-saves/#footnote-label
[40] https://en.wikipedia.org/wiki/Link_rot
[41] https://perma.cc/
[42] https://p.migdal.pl/blog/2025/02/markdown-saves#user-content-fnref-link-rot
[43] https://pinboard.in/
[44] https://p.migdal.pl/blog/2025/02/markdown-saves#user-content-fnref-pinboard
[45] https://p.migdal.pl/blog/2025/02/markdown-saves#user-content-fnref-plaintext
[46] https://p.migdal.pl/blog/2025/02/markdown-saves#user-content-fnref-html
[47] https://content.nuxt.com/
[48] https://github.com/stared/stared.github.io
[49] https://jekyllrb.com/
[50] https://gridsome.org/
[51] https://p.migdal.pl/blog/2022/12/medium-to-markdown
[52] https://www.cursor.com/
[53] https://astro.build/
[54] https://p.migdal.pl/blog/2025/02/markdown-saves#user-content-fnref-blog
[55] https://p.migdal.pl/blog/2022/12/medium-to-markdown
[56] https://p.migdal.pl/blog/2020/03/types-tests-typescript
[57] https://p.migdal.pl/blog/2023/02/ai-artists-information-theory
[58] https://p.migdal.pl/blog/2020/05/adhd-tech-stack-auto-time-tracking
[59] https://p.migdal.pl/blog/2015/12/first-post
[60] https://p.migdal.pl/
[61] https://p.migdal.pl/blog
[62] https://p.migdal.pl/feed.xml
[63] https://eepurl.com/bVJlgL