I’ve been moving my website from a homegrown static system based on BBEdit and some Python scripts to one built on a homegrown Python page generator. The script takes some inspiration from ikiwiki and PyBlosxom, both of which use Markdown for the content. In the previous system, I wrote pages in BBEdit and then piped them through a Python script that ran py markdown and py SmartyPants on the text and glued the result into some HTML templates, which were then updated using BBEdit’s update function. At some point I also used PyBlosxom briefly, which had its own Markdown.
Comparison between several Markdown implementations
I tested three implementations. I like Python and the Unix command line, so two of them were Python ones (py markdown and py markdown2), while the third was what appears to be a very fast C implementation called Discount.
py markdown2 claims to be more standards compliant and faster than py markdown. The tests I ran against all three implementations produced some failures. However, on inspecting the output, all three seemed acceptable. py markdown2 tidies the output HTML a bit better, removing superfluous spaces.
All three support some extensions to Markdown. The two Python ones are configurable, while Discount appears to be fixed in this respect, though if you use the library form you have more control. I think that Discount does footnote processing. It is mentioned on the website, but I couldn’t get it to work with the command line binary; maybe it is accessible through the library interface. On the other hand, the term could just describe how links are referred to.
Discount was always faster. Much faster.
Between the two Python implementations, py markdown was a bit faster with basic syntax on the short tests I ran. Once you threw in an extension like footnotes, py markdown2 was faster. To be precise, it didn’t slow down compared to the basic syntax run, while py markdown got slower. We aren’t talking orders of magnitude here, just tens of milliseconds.
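A minimal sketch of that kind of timing comparison, using only the standard library. The function names `time_converter` and `compare` are made up for illustration; this is not the test setup used for the numbers above, just one way to get comparable per-call timings.

```python
import timeit


def time_converter(convert, text, repeat=5, number=20):
    """Return the best per-call time in milliseconds for convert(text)."""
    # timeit.repeat runs `number` calls per round and returns one total per round.
    timings = timeit.repeat(lambda: convert(text), repeat=repeat, number=number)
    # The minimum is the least noisy estimate; divide by `number` for a per-call figure.
    return min(timings) / number * 1000.0


def compare(converters, text):
    """Map each converter name to its best per-call time in milliseconds."""
    return {name: time_converter(fn, text) for name, fn in converters.items()}
```

Assuming both third-party packages are installed, the converters passed in could be `markdown.markdown` and `markdown2.markdown`, each wrapped in a small function that switches on its footnotes extension for the second run.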
Since I’m calling Markdown from a Python program, I decided not to use Discount. I’d also like to use SmartyPants, so that factors into my decision. Speed ultimately isn’t a huge factor. My script generates pages for static serving, so nothing is run on the fly. And since it only regenerates pages that have been modified, the extra milliseconds saved by using one implementation over another are minor.
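The "only pages that have been modified" check can be sketched with file modification times. This is an assumption about how such a check might work, not a description of the actual script; `needs_rebuild` and `stale_pages` are hypothetical names.

```python
import os


def needs_rebuild(source_path, output_path):
    """Return True if the output is missing or older than its source."""
    if not os.path.exists(output_path):
        return True
    return os.path.getmtime(source_path) > os.path.getmtime(output_path)


def stale_pages(pairs):
    """Filter (source, output) path pairs down to those that need regenerating."""
    return [(src, out) for src, out in pairs if needs_rebuild(src, out)]
```

With a check like this, the Markdown conversion only runs for the handful of pages that changed, which is why a few milliseconds per page hardly matters.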
To make a long story short, py markdown was not converting some Markdown markup (headings with links embedded in them) that was interspersed among some pretty complicated HTML tables. The page in question was the Kodak technical documents page and the markup in question is the last headline, “Other chemicals.” To be fair, I had problems reproducing this with test markup. py markdown2 didn’t have this problem.
However, with SmartyPants switched on, py markdown2 was butchering some images and links that had already been run through the Markdown part of the process. It was substituting HTML entities for single and double quotes inside tags. That’s a no-no. It’s a known issue for py markdown2, but I don’t see it getting fixed anytime soon.
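To show why quotes inside tags have to be left alone, here is a small illustration. It is not py markdown2's code, and the tag-splitting regex is a simplification (it would trip on a `>` inside an attribute value and does not skip `<pre>` or `<code>` blocks); both function names are invented for the example.

```python
import re

# Capturing group, so re.split keeps the tags: odd indices are tags, even are text.
TAG_RE = re.compile(r"(<[^>]*>)")


def naive_entities(html):
    """Replace every double quote, including those inside tags (the failure mode)."""
    return html.replace('"', "&quot;")


def curl_quotes_outside_tags(html):
    """Curl straight double quotes in text only; tags pass through untouched."""
    out = []
    for i, part in enumerate(TAG_RE.split(html)):
        if i % 2 == 1:
            out.append(part)  # a tag: keep attribute quotes exactly as they are
        else:
            part = re.sub(r'"(?=\w)', "&#8220;", part)  # quote before a word character opens
            part = part.replace('"', "&#8221;")         # any remaining quote closes
            out.append(part)
    return "".join(out)
```

Given `<a href="x.html">a "quoted" word</a>`, the naive version turns the attribute into `href=&quot;x.html&quot;`, which breaks the link, while the tag-aware version leaves the `href` alone and only curls the quotes around "quoted".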
After doinking around with it for a bit, I just decided to not use py markdown2’s SmartyPants extension, and instead just run my markdowned text through py SmartyPants. This gives me more control anyway and doesn’t seem to incur a huge speed hit.
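That two-step arrangement might look roughly like the sketch below. The `render` and `default_render` names are made up for illustration, and `default_render` assumes the current third-party `markdown2` and `smartypants` packages, whose entry points are `markdown2.markdown` and `smartypants.smartypants`; older SmartyPants ports spell the function differently.

```python
def render(text, to_html, educate):
    """Convert Markdown to HTML first, then apply typographic fixes as a separate pass."""
    return educate(to_html(text))


def default_render(text):
    """Markdown via markdown2, then a standalone SmartyPants pass.

    Assumption: the third-party `markdown2` and `smartypants` packages are installed.
    """
    import markdown2
    import smartypants

    return render(text, markdown2.markdown, smartypants.smartypants)
```

Keeping the two passes as plain callables makes it easy to swap either one out, or to drop the SmartyPants step for pages where it causes trouble.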