Use w3m for archiving

This commit is contained in:
David Eisinger
2024-01-17 12:04:56 -05:00
parent c5f0c6161a
commit ae64f3eb0a
80 changed files with 28830 additions and 29811 deletions

@@ -1,42 +1,39 @@
#[1]RSS Feed for ANIAT [2]JSON Feed for ANIAT
[snowman-20]
[snowman-200.jpg]
[3]And now it’s all this
[1]And now it’s all this
I just said what I said and it was wrong
Or was taken wrong
[4]Next post [5]Previous post
[2]Next post [3]Previous post
[6]Tidying Markdown reference links
[4]Tidying Markdown reference links
September 17, 2012 at 9:15 PM by Dr. Drang
September 17, 2012 at 9:15 PM by Dr. Drang
Oscar Wilde, who would have been great on Twitter, [7]said “I couldn’t
help it. I can resist everything except temptation.” That’s my excuse
for this post.
Oscar Wilde, who would have been great on Twitter, [5]said “I couldn’t help it.
I can resist everything except temptation.” That’s my excuse for this post.
Several days ago I got an email from a reader, asking if I knew of a
script that would tidy up [8]Markdown reference links in a document.
She wanted them reordered and renumbered at the end of the document to
match the order in which they appear in the body of the text. I didn’t
know of one^[9]1 and suggested she write it herself and let me know
when it’s done. I’ve been getting progress reports, but her script
isn’t finished yet.
Several days ago I got an email from a reader, asking if I knew of a script
that would tidy up [6]Markdown reference links in a document. She wanted them
reordered and renumbered at the end of the document to match the order in which
they appear in the body of the text. I didn’t know of one^[7]1 and suggested
she write it herself and let me know when it’s done. I’ve been getting progress
reports, but her script isn’t finished yet.
There’s certainly no need to tidy the links up that way. Markdown
doesn’t care what order the reference links appear in or the labels
that are assigned to them. I’ve written dozens of posts in which the
order of the references at the end of the Markdown source was way off
from the order of the links in the body. But…
There’s certainly no need to tidy the links up that way. Markdown doesn’t care
what order the reference links appear in or the labels that are assigned to
them. I’ve written dozens of posts in which the order of the references at the
end of the Markdown source was way off from the order of the links in the body.
But…
But there is an attraction to putting everything in apple pie order,
even when no one but me will ever see it. Last night I succumbed and
wrote a script to tidy up the links. Sorry, Phaedra.
But there is an attraction to putting everything in apple pie order, even when
no one but me will ever see it. Last night I succumbed and wrote a script to
tidy up the links. Sorry, Phaedra.
Here’s an example of a short Markdown document with out-of-order reference
links:
Here’s an example of a short Markdown document with out-of-order
reference links:
Species and their hybrids, How simply are these facts! How
strange that the pollen of each But we may thus have
[succeeded][2] in selecting so many exceptions to this rule.
@@ -64,9 +61,10 @@ selection may be extended--to the stigma of.
[3]: http://docs.python.org/library/index.html
[4]: http://www.kungfugrippe.com/
Note that the references are numbered 1, 2, 3, 4 at the bottom of the
document, but that they appear in the body in the order 2, 4, 3, 1. The
purpose of the script is to change the document to
Note that the references are numbered 1, 2, 3, 4 at the bottom of the document,
but that they appear in the body in the order 2, 4, 3, 1. The purpose of the
script is to change the document to
Species and their hybrids, How simply are these facts! How
strange that the pollen of each But we may thus have
[succeeded][1] in selecting so many exceptions to this rule.
@@ -95,12 +93,13 @@ selection may be extended--to the stigma of.
[3]: http://www.kungfugrippe.com/
[4]: http://daringfireball.net/markdown/
Now the links are numbered 1, 2, 3, 4 in both the text and the end
references. The HTML produced when this document is run through a
Markdown processor will be the same as the previous one—the links will
still go to the right places—but the Markdown source looks better.
Now the links are numbered 1, 2, 3, 4 in both the text and the end references.
The HTML produced when this document is run through a Markdown processor will
be the same as the previous one—the links will still go to the right places—but
the Markdown source looks better.
Here’s the script that does it:
Here’s the script that does it:
python:
1: #!/usr/bin/python
2:
@@ -137,8 +136,7 @@ python:
33: order.append(i[1])
34:
35: # Make a list of the references in order of appearance.
36: newlabels = [ '[%d]: %s' % (i + 1, labels[j]) for (i, j) in enumerate(order
) ]
36: newlabels = [ '[%d]: %s' % (i + 1, labels[j]) for (i, j) in enumerate(order) ]
37:
38: # Remove the old references and put the new ones at the end of the text.
39: text = label.sub('', text).rstrip() + '\n'*3 + '\n'.join(newlabels)
@@ -148,139 +146,135 @@ python:
43:
44: print text
The regular expressions in Lines 13 and 17 are fairly easy to
understand. The first one looks for the links in the body of the text
and the second looks for the labels.
The regular expressions in Lines 13 and 17 are fairly easy to understand. The
first one looks for the links in the body of the text and the second looks for
the labels.
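Lines 13 and 17 fall outside the hunks shown here, so the patterns below are only a guess at their shape: a minimal Python 3 sketch, assuming a link looks like [text][label] and a label definition looks like [label]: URL at the start of a line. The names link and label are borrowed from the script; the exact expressions and the sample text are assumptions.

```python
import re

# Assumed patterns; the post's own Lines 13 and 17 are not visible in this diff.
# A reference-style link in the body: [link text][label]
link = re.compile(r'\[([^\]]+)\]\[([^\]]+)\]')
# A reference definition on its own line: [label]: URL
label = re.compile(r'^\[([^\]]+)\]:[ \t]+(\S+)[ \t]*$', re.MULTILINE)

sample = """We may thus have [succeeded][2] in selecting a [single character][4].

[2]: http://www.google.com/
[4]: http://www.kungfugrippe.com/
"""

found_links = link.findall(sample)    # (text, label) pairs, in body order
found_labels = label.findall(sample)  # (label, URL) pairs
print(found_links)
print(found_labels)
```

Because each pattern has two groups, findall returns a list of 2-tuples, which is the shape the rest of the script relies on.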
The keys to the script are the four data structures: links, labels, order, and
newlabels. For our example document, links is the list of tuples
The keys to the script are the four data structures: links, labels,
order, and newlabels. For our example document, links is the list of
tuples
[('succeeded', '2'),
('single character', '4'),
('under confinement', '3'),
('slaves', '1')]
labels is the dictionary
labels is the dictionary
{'1': 'http://daringfireball.net/markdown/',
'3': 'http://docs.python.org/library/index.html',
'2': 'http://www.google.com/',
'4': 'http://www.kungfugrippe.com/'}
order is the list
order is the list
['2', '4', '3', '1']
and newlabels is the list of strings
and newlabels is the list of strings
['[1]: http://www.google.com/',
'[2]: http://docs.python.org/library/index.html',
'[3]: http://www.kungfugrippe.com/',
'[4]: http://daringfireball.net/markdown/']
links and labels are built via the regex findall method in Lines 25-26.
links is the direct output of the method and maintains the order in
which the links appear in the text. labels is that same output, but
converted to a dictionary. Its order, which we don’t care about, is
lost in the conversion, but it can be used to easily access the URL
from the link label.
links and labels are built via the regex findall method in Lines 25-26. links
is the direct output of the method and maintains the order in which the links
appear in the text. labels is that same output, but converted to a dictionary.
Its order, which we don’t care about, is lost in the conversion, but it can be
used to easily access the URL from the link label.
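The conversion from findall output to a dictionary can be sketched in a couple of lines. This is Python 3 rather than the Python 2 of the script, and the pairs list is typed in by hand from the example's labels instead of being produced by a regex.

```python
# (label, URL) pairs in the shape findall would return them; copied by hand
# from the example's labels rather than matched from a document.
pairs = [('1', 'http://daringfireball.net/markdown/'),
         ('2', 'http://www.google.com/'),
         ('3', 'http://docs.python.org/library/index.html'),
         ('4', 'http://www.kungfugrippe.com/')]

# dict() accepts any sequence of 2-tuples and uses the first item as the key.
labels = dict(pairs)

print(labels['3'])
```

In Python 3.7 and later a dictionary happens to keep insertion order, but the script never depends on that; it only looks URLs up by label.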
order is the order in which the link labels first appear in the text.
The if statement in Line 32 ensures that repeated links don’t overwrite
each other.
order is the order in which the link labels first appear in the text. The if
statement in Line 32 ensures that repeated links don’t overwrite each other.
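A rough Python 3 sketch of that first-appearance logic follows. The first four tuples are the example's links; the fifth is a made-up repeat, added only to show the membership test doing its job.

```python
links = [('succeeded', '2'),
         ('single character', '4'),
         ('under confinement', '3'),
         ('slaves', '1'),
         ('succeeded again', '2')]   # hypothetical repeated link, not in the post

order = []
for _text, lab in links:
    # Record a label only the first time it is seen.
    if lab not in order:
        order.append(lab)

print(order)
```

The repeated '2' is skipped, so each label keeps the position of its first appearance.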
newlabels is built from labels and order in Line 36. It’s the list of
labels after the renumbering. Line 39 deletes the original label lines
and puts the new ones at the end of the document.
newlabels is built from labels and order in Line 36. It’s the list of labels
after the renumbering. Line 39 deletes the original label lines and puts the
new ones at the end of the document.
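Lines 36 and 39 can be tried on their own with a small stand-in document. In this Python 3 sketch the label pattern is an assumption (the script's own regex isn't shown in this diff), and the two example.com URLs are hypothetical.

```python
import re

# Assumed definition pattern; the script's own regex is not visible here.
label = re.compile(r'^\[([^\]]+)\]:[ \t]+(\S+)[ \t]*$', re.MULTILINE)

# Hypothetical document: links appear as 2 then 1, definitions are listed 1, 2.
text = """See [the second site][2] and then [the first site][1].

[1]: http://example.com/first
[2]: http://example.com/second
"""

labels = dict(label.findall(text))
order = ['2', '1']   # labels in order of first appearance in the body

# Same expressions as Lines 36 and 39 of the script.
newlabels = ['[%d]: %s' % (i + 1, labels[j]) for i, j in enumerate(order)]
tidied = label.sub('', text).rstrip() + '\n' * 3 + '\n'.join(newlabels)
print(tidied)
```

At this point only the definitions have been renumbered; the body still says [2] and [1] until the substitution in Line 42 runs.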
Finally, Line 42 replaces all the link labels in the body of the text
with the new values. Rather than a replacement string, it uses a simple
replacement function defined in Lines 19-21 to do so.
Finally, Line 42 replaces all the link labels in the body of the text with the
new values. Rather than a replacement string, it uses a simple replacement
function defined in Lines 19-21 to do so.
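Passing a function to sub looks roughly like this in Python 3. The function name refrepl and the link pattern are assumptions, since Lines 19-21 aren't shown in this diff; the '[%s][%d]' format string is the one the post itself quotes from Line 21.

```python
import re

# Assumed link pattern; the script's own regex is not visible here.
link = re.compile(r'\[([^\]]+)\]\[([^\]]+)\]')
order = ['2', '4', '3', '1']   # labels in order of first appearance

def refrepl(m):
    """Rewrite one [text][old label] match as [text][new number]."""
    # The new number is the old label's position in order, counted from 1.
    return '[%s][%d]' % (m.group(1), order.index(m.group(2)) + 1)

body = ("We [succeeded][2] with a [single character][4], "
        "[under confinement][3], as [slaves][1].")
renumbered = link.sub(refrepl, body)
print(renumbered)
```

sub calls the function once per match and splices in whatever string it returns, which is why a fixed replacement string wouldn't be enough here: the new number depends on which old label was matched.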
Barring any bugs I haven’t found yet, this script (or filter) will work
on any Markdown document and can be used either directly from the
command line or through whatever system your text editor uses to call
external scripts. I have it stored in BBEdit’s Text Filters folder
under the name “Tidy Markdown Reference Links.py,” so I can call it
from the Text ‣ Apply Text Filter submenu.
Barring any bugs I haven’t found yet, this script (or filter) will work on any
Markdown document and can be used either directly from the command line or
through whatever system your text editor uses to call external scripts. I have
it stored in BBEdit’s Text Filters folder under the name “Tidy Markdown
Reference Links.py,” so I can call it from the Text ‣ Apply Text Filter
submenu.
I should mention that although this script is fairly compact and
simple, it didn’t spring from my head fully formed. There were starts
and stops as I figured out which data structures were needed and how
they could be built. Each little subsection of the script was tested as
I went along. The order list was originally a list of tuples; it wasn’t
until I had a working version of the entire script that I realized that
it could be simplified down to a list of link labels. That change
shortened the script by five lines or so and, more importantly,
clarified its logic.
I should mention that although this script is fairly compact and simple, it
didn’t spring from my head fully formed. There were starts and stops as I
figured out which data structures were needed and how they could be built. Each
little subsection of the script was tested as I went along. The order list was
originally a list of tuples; it wasn’t until I had a working version of the
entire script that I realized that it could be simplified down to a list of
link labels. That change shortened the script by five lines or so and, more
importantly, clarified its logic.
Despite these improvements, the script is hardly foolproof. The
Markdown source of this very post confuses the hell out of it. Not only
does it think there are links in the sample document (which you’d
probably guess), it also thinks the [%s][%d] in Line 21 of the script
is a link (and the one in this sentence, too). And why wouldn’t it? To
distinguish between real links and things that look like links in
embedded source code, the script would have to be able to parse
Markdown, not just match a couple of short regular expressions. This is
a variant on what Hamish Sanderson said in the comments on [10]an
earlier post.
Despite these improvements, the script is hardly foolproof. The Markdown source
of this very post confuses the hell out of it. Not only does it think there are
links in the sample document (which you’d probably guess), it also thinks the
[%s][%d] in Line 21 of the script is a link (and the one in this sentence,
too). And why wouldn’t it? To distinguish between real links and things that
look like links in embedded source code, the script would have to be able to
parse Markdown, not just match a couple of short regular expressions. This is a
variant on what Hamish Sanderson said in the comments on [8]an earlier post.
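The false positive is easy to reproduce. This Python 3 sketch uses an assumed link pattern (the script's own isn't shown in this diff) and a made-up two-line source: one real link and one indented line of code containing the format string.

```python
import re

# Assumed link pattern; the script's own regex is not visible here.
link = re.compile(r'\[([^\]]+)\]\[([^\]]+)\]')

# Hypothetical Markdown source: a real link followed by an indented code line.
source = ("A real [link][1] in a sentence.\n"
          "\n"
          "    return '[%s][%d]' % (m.group(1), n)\n")

matches = link.findall(source)
print(matches)
```

The regex has no notion of code blocks, so the format string comes back as a second "link" with the label %d.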
At the moment, I’m not willing to sacrifice the simplicity of the Tidy
script to get it to handle weird posts like this one. But if I find
that it fails often with the kind of input I commonly give it, I’ll
have to revisit that decision.
At the moment, I’m not willing to sacrifice the simplicity of the Tidy script
to get it to handle weird posts like this one. But if I find that it fails
often with the kind of input I commonly give it, I’ll have to revisit that
decision.
As Wilde also said, “Experience is the name everyone gives to their
mistakes.”
__________________________________________________________________
As Wilde also said, “Experience is the name everyone gives to their mistakes.”
1. I didn’t think [11]Seth Brown’s formd did that, but [12]this tweet
from Brett Terpstra says I was wrong about that. [13]↩
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[14]Next post [15]Previous post
1. I didn’t think [9]Seth Brown’s formd did that, but [10]this tweet from
Brett Terpstra says I was wrong about that. [11]↩
[12]Next post [13]Previous post
Site search
____________________ Go!
[21][ ] [22][Go!]
Meta
* drdrang at leancrew
* [16]Blog archive
* [17]RSS feed
* [18]JSON feed
* [19]Mastodon
* [20]GitHub repositories
drdrang at leancrew
• [23]Blog archive
• [24]RSS feed
• [25]JSON feed
• [26]Mastodon
[27]GitHub repositories
Recent posts
Credits
[21]Powered by MathJax
[28] Powered by MathJax
This work is licensed under a [22]Creative Commons Attribution-Share
Alike 3.0 Unported License.
This work is licensed under a [29]Creative Commons Attribution-Share Alike 3.0
Unported License.
© 2005–2023, Dr. Drang
© 2005–2023, Dr. Drang
References
1. https://leancrew.com/all-this/feed/
2. https://leancrew.com/all-this/feed.json
3. https://leancrew.com/all-this/
4. https://leancrew.com/all-this/2012/09/some-kind-of-druid-dudes-lifting-the-veil/
5. https://leancrew.com/all-this/2012/09/implementing-pubsubhubbub/
6. https://leancrew.com/all-this/2012/09/tidying-markdown-reference-links/
7. http://www.gutenberg.org/dirs/etext97/lwfan10h.htm
8. http://daringfireball.net/projects/markdown/syntax#link
9. https://leancrew.com/all-this/2012/09/tidying-markdown-reference-links/#fn:formd
10. http://www.leancrew.com/all-this/2012/09/applescript-syntax-highlighting-finally/
11. http://www.drbunsen.org/formd-a-markdown-formatting-tool.html
12. https://twitter.com/ttscoff/status/247398632377184256
13. https://leancrew.com/all-this/2012/09/tidying-markdown-reference-links/#fnref:formd
14. https://leancrew.com/all-this/2012/09/some-kind-of-druid-dudes-lifting-the-veil/
15. https://leancrew.com/all-this/2012/09/implementing-pubsubhubbub/
16. https://leancrew.com/all-this/archive/
17. https://leancrew.com/all-this/feed/
18. https://leancrew.com/all-this/feed.json
19. https://fosstodon.org/@drdrang
20. http://github.com/drdrang
21. http://www.mathjax.org/
22. http://creativecommons.org/licenses/by-sa/3.0/
References:
[1] https://leancrew.com/all-this/
[2] https://leancrew.com/all-this/2012/09/some-kind-of-druid-dudes-lifting-the-veil/
[3] https://leancrew.com/all-this/2012/09/implementing-pubsubhubbub/
[4] https://leancrew.com/all-this/2012/09/tidying-markdown-reference-links/
[5] http://www.gutenberg.org/dirs/etext97/lwfan10h.htm
[6] http://daringfireball.net/projects/markdown/syntax#link
[7] https://leancrew.com/all-this/2012/09/tidying-markdown-reference-links/#fn:formd
[8] http://www.leancrew.com/all-this/2012/09/applescript-syntax-highlighting-finally/
[9] http://www.drbunsen.org/formd-a-markdown-formatting-tool.html
[10] https://twitter.com/ttscoff/status/247398632377184256
[11] https://leancrew.com/all-this/2012/09/tidying-markdown-reference-links/#fnref:formd
[12] https://leancrew.com/all-this/2012/09/some-kind-of-druid-dudes-lifting-the-veil/
[13] https://leancrew.com/all-this/2012/09/implementing-pubsubhubbub/
[23] https://leancrew.com/all-this/archive/
[24] https://leancrew.com/all-this/feed/
[25] https://leancrew.com/all-this/feed.json
[26] https://fosstodon.org/@drdrang
[27] http://github.com/drdrang
[28] http://www.mathjax.org/
[29] http://creativecommons.org/licenses/by-sa/3.0/