[3]And now it’s all this
I just said what I said and it was wrong
Or was taken wrong

[6]Tidying Markdown reference links
September 17, 2012 at 9:15 PM by Dr. Drang

Oscar Wilde—who would have been great on Twitter—[7]said “I couldn’t
help it. I can resist everything except temptation.” That’s my excuse
for this post.

Several days ago I got an email from a reader, asking if I knew of a
script that would tidy up [8]Markdown reference links in a document.
She wanted them reordered and renumbered at the end of the document to
match the order in which they appear in the body of the text. I didn’t
know of one^[9]1 and suggested she write it herself and let me know
when it’s done. I’ve been getting progress reports, but her script
isn’t finished yet.

There’s certainly no need to tidy the links up that way. Markdown
doesn’t care what order the reference links appear in or the labels
that are assigned to them. I’ve written dozens of posts in which the
order of the references at the end of the Markdown source was way off
from the order of the links in the body. But…

But there is an attraction to putting everything in apple pie order,
even when no one but me will ever see it. Last night I succumbed and
wrote a script to tidy up the links. Sorry, Phaedra.

Here’s an example of a short Markdown document with out-of-order
reference links:

    Species and their hybrids, How simply are these facts! How
    strange that the pollen of each But we may thus have
    [succeeded][2] in selecting so many exceptions to this rule.
    ...
    [3]: http://docs.python.org/library/index.html
    [4]: http://www.kungfugrippe.com/

Note that the references are numbered 1, 2, 3, 4 at the bottom of the
document, but that they appear in the body in the order 2, 4, 3, 1. The
purpose of the script is to change the document to

    Species and their hybrids, How simply are these facts! How
    strange that the pollen of each But we may thus have
    [succeeded][1] in selecting so many exceptions to this rule.
    ...
    [3]: http://www.kungfugrippe.com/
    [4]: http://daringfireball.net/markdown/

Now the links are numbered 1, 2, 3, 4 in both the text and the end
references. The HTML produced when this document is run through a
Markdown processor will be the same as the previous one—the links will
still go to the right places—but the Markdown source looks better.

Here’s the script that does it:

python:

     1: #!/usr/bin/python
     2:
    ...
    33: order.append(i[1])
    34:
    35: # Make a list of the references in order of appearance.
    36: newlabels = [ '[%d]: %s' % (i + 1, labels[j]) for (i, j) in enumerate(order) ]
    37:
    38: # Remove the old references and put the new ones at the end of the text.
    39: text = label.sub('', text).rstrip() + '\n'*3 + '\n'.join(newlabels)
    ...
    43:
    44: print text

The regular expressions in Lines 13 and 17 are fairly easy to
understand. The first one looks for the links in the body of the text
and the second looks for the labels.

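Lines 13 and 17 fall in the part of the script not reproduced above, so the two patterns below are only a plausible guess at their shape, not the script's actual expressions. The sketch assumes a body link is written [text][label], a reference definition starts its own line as [label]: URL, and labels beginning with ^ (footnotes) should be left alone.

```python
import re

# Hypothetical reconstruction, not the script's actual Lines 13 and 17.
# A reference link in the body: [link text][label]
link = re.compile(r'\[([^\]]+)\]\[([^^\]]+)\]')

# A reference definition at the start of a line: [label]: URL
# The class [^^\]] rejects '^', so footnote labels like [^1] are skipped.
label = re.compile(r'^\[([^^\]]+)\]:\s+(.+)$', re.MULTILINE)

sample = "We may have [succeeded][2] here.\n\n[2]: http://www.google.com/\n"

print(link.findall(sample))   # body links as (text, label) tuples
print(label.findall(sample))  # definitions as (label, URL) tuples
```

Requiring a second pair of brackets right after the link text keeps the first pattern off the definition lines, and anchoring the second pattern to the start of a line (re.MULTILINE) keeps it off links in running text.
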
The keys to the script are the four data structures: links, labels,
order, and newlabels. For our example document, links is the list of
tuples

    [('succeeded', '2'),
     ('single character', '4'),
     ('under confinement', '3'),
     ('slaves', '1')]

labels is the dictionary

    {'1': 'http://daringfireball.net/markdown/',
     '3': 'http://docs.python.org/library/index.html',
     '2': 'http://www.google.com/',
     '4': 'http://www.kungfugrippe.com/'}

order is the list

    ['2', '4', '3', '1']

and newlabels is the list of strings

    ['[1]: http://www.google.com/',
     '[2]: http://docs.python.org/library/index.html',
     '[3]: http://www.kungfugrippe.com/',
     '[4]: http://daringfireball.net/markdown/']

links and labels are built via the regex findall method in Lines 25-26.
links is the direct output of the method and maintains the order in
which the links appear in the text. labels is that same output, but
converted to a dictionary. Its order, which we don’t care about, is
lost in the conversion, but it can be used to easily access the URL
from the link label.

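The difference between links and labels shows up on a small made-up document in which one link is used twice. The two patterns are a guess at the script's Lines 13 and 17, which aren't shown above.

```python
import re

# Hypothetical patterns standing in for the script's Lines 13 and 17.
link = re.compile(r'\[([^\]]+)\]\[([^^\]]+)\]')
label = re.compile(r'^\[([^^\]]+)\]:\s+(.+)$', re.MULTILINE)

# A made-up document; the link labeled 2 is used twice.
text = """Read [the spec][2] and [the library][3].
Reread [the spec][2], then visit [the home page][1].

[1]: http://example.com/home
[2]: http://example.com/spec
[3]: http://example.com/lib
"""

# findall keeps body order, repeats included.
links = link.findall(text)

# dict() turns the (label, URL) pairs into a lookup table keyed by label.
labels = dict(label.findall(text))

print(links)
print(labels['2'])  # URL for label 2
```

Since Python 3.7 a dict keeps its insertion order, so labels no longer comes out scrambled, but nothing in the script depends on that order anyway.
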
order is the order in which the link labels first appear in the text.
The if statement in Line 32 ensures that a label used by more than one
link is recorded only once, at its first appearance.

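The first-appearance rule can be checked in isolation. This sketch uses made-up links in which label 2 appears twice, and it tests membership with "not in"; that may or may not be how Line 32 is written, since the line isn't shown above.

```python
# Body links as (text, label) tuples, as findall would return them.
# The data are made up; label '2' is used twice.
links = [('the spec', '2'), ('the library', '3'),
         ('the spec', '2'), ('the home page', '1')]

order = []
for text, lab in links:
    # Append a label only the first time it turns up, so a repeated
    # link doesn't claim a second position.
    if lab not in order:
        order.append(lab)

print(order)  # ['2', '3', '1']
```
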
newlabels is built from labels and order in Line 36. It’s the list of
labels after the renumbering. Line 39 deletes the original label lines
and puts the new ones at the end of the document.

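Rebuilding the reference block can be tried on a made-up document. The definition pattern is a guess at Line 17, which isn't shown above, and order is filled in by hand with the labels' first-appearance order; the last two statements follow Lines 36 and 39 as quoted in the script excerpt.

```python
import re

# Hypothetical definition pattern standing in for the script's Line 17.
label = re.compile(r'^\[([^^\]]+)\]:\s+(.+)$', re.MULTILINE)

text = """Read [the spec][2] and [the library][3].
Reread [the spec][2], then visit [the home page][1].

[1]: http://example.com/home
[2]: http://example.com/spec
[3]: http://example.com/lib
"""

labels = dict(label.findall(text))
order = ['2', '3', '1']  # labels in order of first appearance

# As in Line 36: position i in order becomes reference number i + 1.
newlabels = ['[%d]: %s' % (i + 1, labels[j]) for (i, j) in enumerate(order)]

# As in Line 39: blank out the old definitions, trim the whitespace they
# leave behind, and append the new block after two blank lines.
text = label.sub('', text).rstrip() + '\n' * 3 + '\n'.join(newlabels)

print(text)
```

The body still carries the old labels [2], [3], and [1] at this point; renumbering the links themselves is the separate job of Line 42.
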
Finally, Line 42 replaces all the link labels in the body of the text
with the new values. Rather than a replacement string, it uses a simple
replacement function defined in Lines 19-21 to do so.

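Giving sub a function instead of a replacement string lets each replacement depend on what was matched. A sketch of that pattern, assuming a [text][label] link form and a made-up order list; renumber_link is a hypothetical name, since Lines 19-21 aren't shown above.

```python
import re

# Hypothetical link pattern standing in for the script's Line 13.
link = re.compile(r'\[([^\]]+)\]\[([^^\]]+)\]')

# Labels in order of first appearance (made-up data).
order = ['2', '3', '1']


def renumber_link(m):
    """Rebuild one [text][label] link with its label's new number.

    The new number is the label's position in order, counting from 1.
    """
    return '[%s][%d]' % (m.group(1), order.index(m.group(2)) + 1)


body = "Read [the spec][2] and [the library][3]. Visit [the home page][1]."

# sub calls renumber_link once per match and splices in its return value.
print(link.sub(renumber_link, body))
```

order.index rescans the list for every link, which is harmless at blog-post scale; a dict from old label to new number would avoid the rescans.
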
Barring any bugs I haven’t found yet, this script (or filter) will work
on any Markdown document and can be used either directly from the
command line or through whatever system your text editor uses to call
external scripts. I have it stored in BBEdit’s Text Filters folder
under the name “Tidy Markdown Reference Links.py,” so I can call it
from the Text ‣ Apply Text Filter submenu.

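Because the full script isn't reproduced above, here is a sketch of the whole filter assembled from the steps the post describes. The regular expressions are guesses at Lines 13 and 17, the function names are hypothetical, and one behavior differs on purpose: Line 36 as shown would raise a KeyError for a link whose label has no definition, while this sketch leaves such links alone.

```python
import io
import re
import sys

# Hypothetical patterns; the script's own Lines 13 and 17 aren't shown.
LINK = re.compile(r'\[([^\]]+)\]\[([^^\]]+)\]')
LABEL = re.compile(r'^\[([^^\]]+)\]:\s+(.+)$', re.MULTILINE)


def tidy(text):
    """Renumber reference links by first appearance and rebuild the definitions."""
    links = LINK.findall(text)
    labels = dict(LABEL.findall(text))

    # Defined labels in order of first appearance in the body.
    order = []
    for _, lab in links:
        if lab in labels and lab not in order:
            order.append(lab)

    # Old label -> new number.
    number = {lab: i + 1 for i, lab in enumerate(order)}

    def renumber(m):
        n = number.get(m.group(2))
        # Links with no matching definition are left exactly as they were.
        return m.group(0) if n is None else '[%s][%d]' % (m.group(1), n)

    newlabels = ['[%d]: %s' % (number[lab], labels[lab]) for lab in order]
    text = LABEL.sub('', text).rstrip() + '\n' * 3 + '\n'.join(newlabels)
    return LINK.sub(renumber, text)


def filter_stream(infile, outfile):
    """Read a whole document from infile and write the tidied text to outfile."""
    outfile.write(tidy(infile.read()) + '\n')


# Demonstration on a made-up document. As a command-line filter, the
# script would end with filter_stream(sys.stdin, sys.stdout) instead.
doc = ("Read [the spec][2] and [the library][3].\n"
       "Reread [the spec][2], then visit [the home page][1].\n\n"
       "[1]: http://example.com/home\n"
       "[2]: http://example.com/spec\n"
       "[3]: http://example.com/lib\n")
filter_stream(io.StringIO(doc), sys.stdout)
```

Definitions that no link uses are dropped, just as they would be by a newlabels list built from order alone.
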
I should mention that although this script is fairly compact and
simple, it didn’t spring from my head fully formed. There were starts
and stops as I figured out which data structures were needed and how
they could be built. Each little subsection of the script was tested as
I went along. The order list was originally a list of tuples; it wasn’t
until I had a working version of the entire script that I realized that
it could be simplified down to a list of link labels. That change
shortened the script by five lines or so and, more importantly,
clarified its logic.

Despite these improvements, the script is hardly foolproof. The
Markdown source of this very post confuses the hell out of it. Not only
does it think there are links in the sample document (which you’d
probably guess), it also thinks the [%s][%d] in Line 21 of the script
is a link (and the one in this sentence, too). And why wouldn’t it? To
distinguish between real links and things that look like links in
embedded source code, the script would have to be able to parse
Markdown, not just match a couple of short regular expressions. This is
a variant on what Hamish Sanderson said in the comments on [10]an
earlier post.

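The confusion is easy to reproduce. A sketch, assuming a link pattern of the [text][label] form (the script's actual Line 13 isn't shown above), run on a made-up bit of Markdown source that contains an indented code block:

```python
import re

# Hypothetical link pattern in the spirit of the script's Line 13.
link = re.compile(r'\[([^\]]+)\]\[([^^\]]+)\]')

# Markdown source that merely shows Python code containing '[%s][%d]'.
# The four-space indent makes it a code block to a Markdown processor,
# but the regular expression has no notion of code blocks.
source = "Here is the replacement:\n\n    return '[%s][%d]' % (a, b)\n"

print(link.findall(source))  # [('%s', '%d')]: a false positive
```

A Markdown processor would render that indented line as code with no link in it; only something that tracks code blocks and code spans could tell the two cases apart.
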
At the moment, I’m not willing to sacrifice the simplicity of the Tidy
script to get it to handle weird posts like this one. But if I find
that it fails often with the kind of input I commonly give it, I’ll
have to revisit that decision.

As Wilde also said, “Experience is the name everyone gives to their
mistakes.”

__________________________________________________________________

1. I didn’t think [11]Seth Brown’s formd did that, but [12]this tweet
from Brett Terpstra says I was wrong about that. [13]↩

Meta

* drdrang at leancrew
* [16]Blog archive
* [17]RSS feed
* [18]JSON feed
* [19]Mastodon
* [20]GitHub repositories

Credits

[21]Powered by MathJax

This work is licensed under a [22]Creative Commons Attribution-Share
Alike 3.0 Unported License.

© 2005–2023, Dr. Drang

References

 1. https://leancrew.com/all-this/feed/
 2. https://leancrew.com/all-this/feed.json
 3. https://leancrew.com/all-this/
 4. https://leancrew.com/all-this/2012/09/some-kind-of-druid-dudes-lifting-the-veil/
 5. https://leancrew.com/all-this/2012/09/implementing-pubsubhubbub/
 6. https://leancrew.com/all-this/2012/09/tidying-markdown-reference-links/
 7. http://www.gutenberg.org/dirs/etext97/lwfan10h.htm
 8. http://daringfireball.net/projects/markdown/syntax#link
 9. https://leancrew.com/all-this/2012/09/tidying-markdown-reference-links/#fn:formd
10. http://www.leancrew.com/all-this/2012/09/applescript-syntax-highlighting-finally/
11. http://www.drbunsen.org/formd-a-markdown-formatting-tool.html
12. https://twitter.com/ttscoff/status/247398632377184256
13. https://leancrew.com/all-this/2012/09/tidying-markdown-reference-links/#fnref:formd
14. https://leancrew.com/all-this/2012/09/some-kind-of-druid-dudes-lifting-the-veil/
15. https://leancrew.com/all-this/2012/09/implementing-pubsubhubbub/
16. https://leancrew.com/all-this/archive/
17. https://leancrew.com/all-this/feed/
18. https://leancrew.com/all-this/feed.json
19. https://fosstodon.org/@drdrang
20. http://github.com/drdrang
21. http://www.mathjax.org/
22. http://creativecommons.org/licenses/by-sa/3.0/