mdrenum post
This commit is contained in:
286
static/archive/leancrew-com-7l5uqs.txt
Normal file
286
static/archive/leancrew-com-7l5uqs.txt
Normal file
@@ -0,0 +1,286 @@
|
||||
#[1]RSS Feed for ANIAT [2]JSON Feed for ANIAT
|
||||
|
||||
[snowman-200.jpg]
|
||||
|
||||
[3]And now it’s all this
|
||||
|
||||
I just said what I said and it was wrong
|
||||
Or was taken wrong
|
||||
|
||||
[4]Next post [5]Previous post
|
||||
|
||||
[6]Tidying Markdown reference links
|
||||
|
||||
September 17, 2012 at 9:15 PM by Dr. Drang
|
||||
|
||||
Oscar Wilde—who would have been great on Twitter—[7]said “I couldn’t
|
||||
help it. I can resist everything except temptation.” That’s my excuse
|
||||
for this post.
|
||||
|
||||
Several days ago I got an email from a reader, asking if I knew of a
|
||||
script that would tidy up [8]Markdown reference links in a document.
|
||||
She wanted them reordered and renumbered at the end of the document to
|
||||
match the order in which they appear in the body of the text. I didn’t
|
||||
know of one^[9]1 and suggested she write it herself and let me know
|
||||
when it’s done. I’ve been getting progress reports, but her script
|
||||
isn’t finished yet.
|
||||
|
||||
There’s certainly no need to tidy the links up that way. Markdown
|
||||
doesn’t care what order the reference links appear in or the labels
|
||||
that are assigned to them. I’ve written dozens of posts in which the
|
||||
order of the references at the end of the Markdown source were way off
|
||||
from the order of the links in body. But…
|
||||
|
||||
But there is an attraction to putting everything in apple pie order,
|
||||
even when no one but me will ever see it. Last night I succumbed and
|
||||
wrote a script to tidy up the links. Sorry, Phaedra.
|
||||
|
||||
Here’s an example of a short Markdown document with out-of-order
|
||||
reference links:
|
||||
Species and their hybrids, How simply are these facts! How
|
||||
strange that the pollen of each But we may thus have
|
||||
[succeeded][2] in selecting so many exceptions to this rule.
|
||||
but the species would not all the same species living on the
|
||||
White Mountains, in the arctic regions of that large island.
|
||||
The exceptions which are now large, and triumphant, and
|
||||
which are known to every naturalist: scarcely a single
|
||||
[character][4] in the descendants of the Glacial period,
|
||||
would have been of use to the plants, have been accumulated
|
||||
and if, in both regions.
|
||||
|
||||
Supposed to be extinct and unknown, form. We have seen that
|
||||
it yields readily, when subjected as [under confinement][3],
|
||||
to new and improved varieties will have been much
|
||||
compressed, we may assume that the species, which are
|
||||
already present in the ordinary spines serve as a prehensile
|
||||
or snapping apparatus. Thus every gradation, from animals
|
||||
with true lungs are descended from a marsupial form), "and
|
||||
if so, there can be followed by which viscid matter, such as
|
||||
that of making [slaves][1]. Let it be remembered that
|
||||
selection may be extended--to the stigma of.
|
||||
|
||||
[1]: http://daringfireball.net/markdown/
|
||||
[2]: http://www.google.com/
|
||||
[3]: http://docs.python.org/library/index.html
|
||||
[4]: http://www.kungfugrippe.com/
|
||||
|
||||
Note that the references are numbered 1, 2, 3, 4 at the bottom of the
|
||||
document, but that they appear in the body in the order 2, 4, 3, 1. The
|
||||
purpose of the script is to change the document to
|
||||
Species and their hybrids, How simply are these facts! How
|
||||
strange that the pollen of each But we may thus have
|
||||
[succeeded][1] in selecting so many exceptions to this rule.
|
||||
but the species would not all the same species living on the
|
||||
White Mountains, in the arctic regions of that large island.
|
||||
The exceptions which are now large, and triumphant, and
|
||||
which are known to every naturalist: scarcely a single
|
||||
[character][2] in the descendants of the Glacial period,
|
||||
would have been of use to the plants, have been accumulated
|
||||
and if, in both regions.
|
||||
|
||||
Supposed to be extinct and unknown, form. We have seen that
|
||||
it yields readily, when subjected as [under confinement][3],
|
||||
to new and improved varieties will have been much
|
||||
compressed, we may assume that the species, which are
|
||||
already present in the ordinary spines serve as a prehensile
|
||||
or snapping apparatus. Thus every gradation, from animals
|
||||
with true lungs are descended from a marsupial form), "and
|
||||
if so, there can be followed by which viscid matter, such as
|
||||
that of making [slaves][4]. Let it be remembered that
|
||||
selection may be extended--to the stigma of.
|
||||
|
||||
|
||||
[1]: http://www.google.com/
|
||||
[2]: http://docs.python.org/library/index.html
|
||||
[3]: http://www.kungfugrippe.com/
|
||||
[4]: http://daringfireball.net/markdown/
|
||||
|
||||
Now the links are numbered 1, 2, 3, 4 in both the text and the end
|
||||
references. The HTML produced when this document is run through a
|
||||
Markdown processor will be the same as the previous one—the links will
|
||||
still go to the right places—but the Markdown source looks better.
|
||||
|
||||
Here’s the script that does it:
|
||||
python:
|
||||
1: #!/usr/bin/python
|
||||
2:
|
||||
3: import sys
|
||||
4: import re
|
||||
5:
|
||||
6: '''Read a Markdown file via standard input and tidy its
|
||||
7: reference links. The reference links will be numbered in
|
||||
8: the order they appear in the text and placed at the bottom
|
||||
9: of the file.'''
|
||||
10:
|
||||
11: # The regex for finding reference links in the text. Don't find
|
||||
12: # footnotes by mistake.
|
||||
13: link = re.compile(r'\[([^\]]+)\]\[([^^\]]+)\]')
|
||||
14:
|
||||
15: # The regex for finding the label. Again, don't find footnotes
|
||||
16: # by mistake.
|
||||
17: label = re.compile(r'^\[([^^\]]+)\]:\s+(.+)$', re.MULTILINE)
|
||||
18:
|
||||
19: def refrepl(m):
|
||||
20: 'Rewrite reference links with the reordered link numbers.'
|
||||
21: return '[%s][%d]' % (m.group(1), order.index(m.group(2)) + 1)
|
||||
22:
|
||||
23: # Read in the file and find all the links and references.
|
||||
24: text = sys.stdin.read()
|
||||
25: links = link.findall(text)
|
||||
26: labels = dict(label.findall(text))
|
||||
27:
|
||||
28: # Determine the order of the links in the text. If a link is used
|
||||
29: # more than once, its order is its first position.
|
||||
30: order = []
|
||||
31: for i in links:
|
||||
32: if order.count(i[1]) == 0:
|
||||
33: order.append(i[1])
|
||||
34:
|
||||
35: # Make a list of the references in order of appearance.
|
||||
36: newlabels = [ '[%d]: %s' % (i + 1, labels[j]) for (i, j) in enumerate(order
|
||||
) ]
|
||||
37:
|
||||
38: # Remove the old references and put the new ones at the end of the text.
|
||||
39: text = label.sub('', text).rstrip() + '\n'*3 + '\n'.join(newlabels)
|
||||
40:
|
||||
41: # Rewrite the links with the new reference numbers.
|
||||
42: text = link.sub(refrepl, text)
|
||||
43:
|
||||
44: print text
|
||||
|
||||
The regular expressions in Lines 13 and 17 are fairly easy to
|
||||
understand. The first one looks for the links in the body of the text
|
||||
and the second looks for the labels.
|
||||
|
||||
The key to the script are the four data structures: links, labels,
|
||||
order, and newlabels. For our example document, links is the list of
|
||||
tuples
|
||||
[('succeeded', '2'),
|
||||
('single character', '4'),
|
||||
('under confinement', '3'),
|
||||
('slaves', '1')]
|
||||
|
||||
labels is the dictionary
|
||||
{'1': 'http://daringfireball.net/markdown/',
|
||||
'3': 'http://docs.python.org/library/index.html',
|
||||
'2': 'http://www.google.com/',
|
||||
'4': 'http://www.kungfugrippe.com/'}
|
||||
|
||||
order is the list
|
||||
['2', '4', '3', '1']
|
||||
|
||||
and newlabels is the list of strings
|
||||
['[1]: http://www.google.com/',
|
||||
'[2]: http://docs.python.org/library/index.html',
|
||||
'[3]: http://www.kungfugrippe.com/',
|
||||
'[4]: http://daringfireball.net/markdown/']
|
||||
|
||||
links and labels are built via the regex findall method in Lines 25-26.
|
||||
links is the direct output of the method and maintains the order in
|
||||
which the links appear in the text. labels is that same output, but
|
||||
converted to a dictionary. Its order, which we don’t care about, is
|
||||
lost in the conversion, but it can be used to easily access the URL
|
||||
from the link label.
|
||||
|
||||
order is the order in which the link labels first appear in the text.
|
||||
The if statement in Line 32 ensures that repeated links don’t overwrite
|
||||
each other.
|
||||
|
||||
newlabels is built from labels and order in Line 36. It’s the list of
|
||||
labels after the renumbering. Line 39 deletes the original label lines
|
||||
and puts the new ones at the end of the document.
|
||||
|
||||
Finally, Line 42 replaces all the link labels in the body of the text
|
||||
with the new values. Rather than a replacement string, it uses a simple
|
||||
replacement function defined in Lines 19-21 to do so.
|
||||
|
||||
Barring any bugs I haven’t found yet, this script (or filter) will work
|
||||
on any Markdown document and can be used either directly from the
|
||||
command line or through whatever system your text editor uses to call
|
||||
external scripts. I have it stored in BBEdit’s Text Filters folder
|
||||
under the name “Tidy Markdown Reference Links.py,” so I can call it
|
||||
from the Text ‣ Apply Text Filter submenu.
|
||||
|
||||
I should mention that although this script is fairly compact and
|
||||
simple, it didn’t spring from my head fully formed. There were starts
|
||||
and stops as I figured out which data structures were needed and how
|
||||
they could be built. Each little subsection of the script was tested as
|
||||
I went along. The order list was originally a list of tuples; it wasn’t
|
||||
until I had a working version of the entire script that I realized that
|
||||
it could be simplified down to a list of link labels. That change
|
||||
shortened the script by five lines or so and, more importantly,
|
||||
clarified its logic.
|
||||
|
||||
Despite these improvements, the script is hardly foolproof. The
|
||||
Markdown source of this very post confuses the hell out it. Not only
|
||||
does it think there are links in the sample document (which you’d
|
||||
probably guess), it also thinks the [%s][%d] in Line 21 of the script
|
||||
is a link (and the one in this sentence, too). And why wouldn’t it? To
|
||||
distinguish between real links and things that look like links in
|
||||
embedded source code, the script would have to be able to parse
|
||||
Markdown, not just match a couple of short regular expressions. This is
|
||||
a variant on what Hamish Sanderson said in the comments on [10]an
|
||||
earlier post.
|
||||
|
||||
At the moment, I’m not willing to sacrifice the simplicity of the Tidy
|
||||
script to get it to handle weird posts like this one. But if I find
|
||||
that it fails often with the kind of input I commonly give it, I’ll
|
||||
have to revisit that decision.
|
||||
|
||||
As Wilde also said, “Experience is the name everyone gives to their
|
||||
mistakes.”
|
||||
__________________________________________________________________
|
||||
|
||||
1. I didn’t think [11]Seth Brown’s formd did that, but [12]this tweet
|
||||
from Brett Terpsta says I was wrong about that. [13]↩
|
||||
|
||||
[14]Next post [15]Previous post
|
||||
|
||||
Site search
|
||||
|
||||
____________________ Go!
|
||||
|
||||
Meta
|
||||
|
||||
* drdrang at leancrew
|
||||
* [16]Blog archive
|
||||
* [17]RSS feed
|
||||
* [18]JSON feed
|
||||
* [19]Mastodon
|
||||
* [20]GitHub repositories
|
||||
|
||||
Recent posts
|
||||
|
||||
Credits
|
||||
|
||||
[21]Powered by MathJax
|
||||
|
||||
This work is licensed under a [22]Creative Commons Attribution-Share
|
||||
Alike 3.0 Unported License.
|
||||
|
||||
© 2005–2023, Dr. Drang
|
||||
|
||||
References
|
||||
|
||||
1. https://leancrew.com/all-this/feed/
|
||||
2. https://leancrew.com/all-this/feed.json
|
||||
3. https://leancrew.com/all-this/
|
||||
4. https://leancrew.com/all-this/2012/09/some-kind-of-druid-dudes-lifting-the-veil/
|
||||
5. https://leancrew.com/all-this/2012/09/implementing-pubsubhubbub/
|
||||
6. https://leancrew.com/all-this/2012/09/tidying-markdown-reference-links/
|
||||
7. http://www.gutenberg.org/dirs/etext97/lwfan10h.htm
|
||||
8. http://daringfireball.net/projects/markdown/syntax#link
|
||||
9. file:///var/folders/q9/qlz2w5251kzdfgn0np7z2s4c0000gn/T/L97479-6275TMP.html#fn:formd
|
||||
10. http://www.leancrew.com/all-this/2012/09/applescript-syntax-highlighting-finally/
|
||||
11. http://www.drbunsen.org/formd-a-markdown-formatting-tool.html
|
||||
12. https://twitter.com/ttscoff/status/247398632377184256
|
||||
13. file:///var/folders/q9/qlz2w5251kzdfgn0np7z2s4c0000gn/T/L97479-6275TMP.html#fnref:formd
|
||||
14. https://leancrew.com/all-this/2012/09/some-kind-of-druid-dudes-lifting-the-veil/
|
||||
15. https://leancrew.com/all-this/2012/09/implementing-pubsubhubbub/
|
||||
16. https://leancrew.com/all-this/archive/
|
||||
17. https://leancrew.com/all-this/feed/
|
||||
18. https://leancrew.com/all-this/feed.json
|
||||
19. https://fosstodon.org/@drdrang
|
||||
20. http://github.com/drdrang
|
||||
21. http://www.mathjax.org/
|
||||
22. http://creativecommons.org/licenses/by-sa/3.0/
|
||||
Reference in New Issue
Block a user