Brendan's Flavor of Markdown

2021-08-03

BFoM

Beware: this is long and in some areas a tad ranty.
Good luck!

Background

Originally, much of the content of this post was in the repo for the converter, but I decided that it was not a good place to have it.
It was too tangential to the repo itself, and I can link to this page if needed.

For quite some time I had been looking to get a hold of brendan.ie.
It had been registered years prior but had sat unused.
It took some detective work and cold calling a lot of Brendans, but about a year ago I finally succeeded.

I have previously used React.js to build functional sites, mostly skinned with Bootstrap, but lately with IBM's Carbon, which is clean looking.
However, for my own place I didn't need interactivity; I needed something that represented me.
I also wanted to create something using Rust.
Knowing that a Rust backend would be overkill for my own uses (not to mention I wanted to keep dependencies as low as possible), I didn't go this route.
I had previously created a CLI tool to download items from the Humble Bundle Trove, so I figured a similar tool would work well.
For these reasons I decided to create a markdown converter.

Is that a rabbit-hole I see dear Alice?

On the surface that seems like a reasonable idea.
Sure, isn't markdown used basically everywhere these days: GitHub/GitLab, Reddit, Discord?
There are some minor differences between the different places that use it, but it's all mostly the same, right?

So I started looking into existing implementations: Daring Fireball, GitHub, Reddit and Discord.
All seemed to be roughly the same, with a few minor differences between them.
I learnt quite a bit actually, like how amazing reference links are.

Well the deeper you go the worse it gets.
Discord for example has a different Markdown implementation for mobile and desktop.
On desktop, you can use both single and double backticks for inline code spans.
On mobile only a single backtick is permitted.

If a single platform has trouble staying consistent with itself, you can only imagine the differences across different implementations.

Then a friend pointed me to CommonMark, and that is the point where things went downhill.

Of course he is mad too.

In the beginning CommonMark seemed pretty good: well documented, plenty of examples, etc.
However, as I pressed on I could see it was lacking in certain areas.

From what I can gather, CommonMark took the existing implementations and used them as a starting point, in essence saying that they are all valid.
While this may have kept everyone happy, it did nothing to fix some underlying flaws:

One identifier being used in multiple blocks and spans.
Lazy continuation.
How HTML is handled.

Indented code blocks

In the original implementation, code blocks were identified by four spaces of indentation.
This works well enough until other blocks use indentation, namely lists.
Other blocks, such as headers, can also use indentation for styling.

Thankfully there is a solution to this: fenced code blocks.
However, CommonMark has adopted both formats.

Another example of one identifier being used multiple times is * being used for both strong and em tags.

Lazy Continuation

Lazy continuation is where you can start off a block with its identifier and, on the following lines, omit either the leading spaces or the identifier.

For example:


> Blockquote
Still the same blockquote


1. List item
Same list item
2. Next list item.

Not only does this hamper readability, it also makes markdown a lot more complex to parse.
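To illustrate the extra parsing burden, here is a minimal sketch in Python (chosen purely for brevity; it is not the converter's code). The function names are hypothetical, and the "lazy" rule is a heavily simplified stand-in for the real CommonMark rules: it only shows that a strict parser can classify each line on its own, while a lazy one has to carry state from earlier lines.

```python
def blockquote_lines_strict(lines):
    """Strict rule: a line belongs to a blockquote only if it starts with '>'."""
    return [line.lstrip().startswith(">") for line in lines]


def blockquote_lines_lazy(lines):
    """Simplified lazy rule: a non-blank line without '>' continues an open blockquote."""
    result = []
    in_quote = False
    for line in lines:
        stripped = line.strip()
        if stripped.startswith(">"):
            in_quote = True
        elif not stripped:
            in_quote = False  # a blank line ends the lazily continued paragraph
        # any other line inherits in_quote from the line before it
        result.append(in_quote)
    return result


sample = ["> Blockquote", "Still the same blockquote", "", "A new paragraph"]
print(blockquote_lines_strict(sample))
print(blockquote_lines_lazy(sample))
```

With the strict rule only the first line is quoted; with the lazy rule the second line is pulled into the quote as well, even though nothing on that line says so.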

Handling HTML

At this point I was in freefall down the rabbit-hole.

HTML is an interesting part of markdown, since markdown is eventually converted into HTML.

Daring Fireball (the original creator) handled it by permitting no markdown in HTML blocks but permitting it in HTML spans.
For the most part this is fine.

CommonMark took it a bit further by allowing markdown inside both HTML blocks and spans.
This is where I found an example of how not to do it.

Input:


<table><tr><td>
<pre>
**Hello**,

_world_.
</pre>
</td></tr></table>

CommonMark result:


<table><tr><td>
<pre>
**Hello**,
<p><em>world</em>.
</pre></p>
</td></tr></table>

This is an HTML block that contains preformatted text.
Because of the blank line in the middle, CommonMark deemed it the end of the HTML block and so began to apply markdown from there on.

This breaks the user's assumption that preformatted text will remain preformatted.
Secondly, the generated HTML actually breaks the HTML spec: there is a closing pre tag inside the paragraph but no opening one, and likewise the original opening pre is left unclosed.
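The nesting problem can be checked mechanically. The sketch below uses Python's standard `html.parser` module with a simple tag stack; `NestingChecker` and `is_well_nested` are hypothetical names, and the check is deliberately stricter than a browser (it ignores void elements and optional end tags), so treat it as an illustration rather than a validator.

```python
from html.parser import HTMLParser


class NestingChecker(HTMLParser):
    """Tracks open tags on a stack and flags any end tag that closes the wrong one."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.well_nested = True

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            self.well_nested = False  # end tag does not match the innermost open tag


def is_well_nested(html):
    checker = NestingChecker()
    checker.feed(html)
    checker.close()
    return checker.well_nested and not checker.stack


original = "<table><tr><td>\n<pre>\n**Hello**,\n\n_world_.\n</pre>\n</td></tr></table>"
converted = (
    "<table><tr><td>\n<pre>\n**Hello**,\n"
    "<p><em>world</em>.\n</pre></p>\n</td></tr></table>"
)
print(is_well_nested(original))
print(is_well_nested(converted))
```

The input passes the check, while the converted output fails it because the closing pre arrives while the p is still the innermost open element.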

The expected output would be exactly the same as the input.
Up to this point I was going to implement a stricter subset of CommonMark with some extensions for tables etc.
This behaviour is what broke me, as there is no way to reconcile with something so fundamentally twisted.

It was at this point I said: "I'll make my own version of Markdown, with Tables and Underlines!"

BFoM

So what do you do when the existing n standards are not good enough?
Well you need n+1 standards!

Creating a new standard after all this time has the advantage of hindsight, knowing what has worked and what has not.
This has given me the opportunity to fix, or at least patch up many of the issues.

Below you can find the highlights; full examples are in the BFoM repo itself.

Inline Spans

All inline spans now come in matched pairs.
Most are identical to existing implementations, but some have been changed.


//emphases//
**strong**
``code``
<<example.com>>
__underline__
~~strikethrough~~
>!spoiler!<
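Because every span is a matched pair, a first pass at converting them can be quite small. The following Python sketch is an assumption-laden illustration, not the BFoM converter: the `SPANS` table, the `convert_inline` name and the choice of HTML tags are all hypothetical, links and spoilers are left out, and there is no handling of nesting, escaping, or markers inside code spans.

```python
import re

# (opening marker, closing marker, HTML tag); the tag choices are assumptions
SPANS = [
    ("//", "//", "em"),
    ("**", "**", "strong"),
    ("``", "``", "code"),
    ("__", "__", "u"),
    ("~~", "~~", "s"),
]


def convert_inline(text):
    """Replace each matched pair of markers with the corresponding HTML tag."""
    for opener, closer, tag in SPANS:
        # Non-greedy match so each opener pairs with the nearest closer.
        pattern = re.escape(opener) + r"(.+?)" + re.escape(closer)
        text = re.sub(pattern, rf"<{tag}>\1</{tag}>", text)
    return text


print(convert_inline("//emphasis// and **strong**"))
print(convert_inline("__underline__ ~~strikethrough~~"))
```

An opener without its closer is simply left in the text untouched, which is one practical benefit of requiring pairs.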

Code Blocks

Only fenced code blocks are allowed; indented ones cause a lot of issues with parsing.
Fencing also allows the language to be specified, which enables the use of external libraries like highlight.js.

The fence is formed by a matched pair of three or more backticks.
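A rough Python sketch of fence matching follows. It assumes "matched pair" means the closing fence must have exactly the same number of backticks as the opening one; that reading, along with the `FENCE` pattern and the `extract_code_blocks` name, is an assumption for illustration only.

```python
import re

# Three or more backticks, optionally followed by a language name.
FENCE = re.compile(r"^(`{3,})\s*([\w+-]*)\s*$")


def extract_code_blocks(lines):
    """Return (language, code_lines) for every fenced block in the input."""
    blocks = []
    fence = None  # the backtick run that opened the current block, if any
    language = ""
    body = []
    for line in lines:
        if fence is None:
            match = FENCE.match(line)
            if match:
                fence, language, body = match.group(1), match.group(2), []
        elif line.strip() == fence:
            blocks.append((language, body))  # closing fence matches the opener
            fence = None
        else:
            body.append(line)
    return blocks


tick3 = "`" * 3
tick4 = "`" * 4
simple = ["text", tick3 + "rust", "fn main() {}", tick3, "more text"]
nested = [tick4, tick3, "inner", tick3, tick4]
print(extract_code_blocks(simple))
print(extract_code_blocks(nested))
```

Requiring an exact match means a longer fence can wrap a shorter one, which is handy when writing about markdown in markdown.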

Due to the removal of indented code blocks, headers and lists can be formatted better.

Headers

You can align an h1 header with an h6 header like so:


     # h1
    ## h2
   ### h3
  #### h4
 ##### h5
###### h6
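With indented code blocks gone, leading whitespace before a header no longer has to mean anything. A minimal Python sketch of that idea is below; the `HEADER` pattern and `parse_header` name are hypothetical, and it assumes any amount of leading whitespace is acceptable, which may not be the exact BFoM rule.

```python
import re

# Optional indentation, one to six '#', at least one space, then the title.
HEADER = re.compile(r"^\s*(#{1,6})\s+(.*\S)\s*$")


def parse_header(line):
    """Return the HTML for a header line, or None if the line is not a header."""
    match = HEADER.match(line)
    if match is None:
        return None
    level = len(match.group(1))
    return f"<h{level}>{match.group(2)}</h{level}>"


print(parse_header("     # h1"))
print(parse_header("###### h6"))
print(parse_header("####### seven"))
```

A run of seven or more # characters is rejected rather than treated as an h6.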

Lists

Due to the removal of lazy continuation, lists can be aligned like so:


    1. Item 1 Line 1
       Item 1 Line 2
99999. Item 99999 Line 1
       Item 99999 Line 2
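One way to read the example above is that a continuation line must be indented to the column where the item's content starts. The Python sketch below implements that reading; the rule itself, the `ITEM` pattern and the `parse_ordered_list` name are assumptions, and the sketch ignores nesting and unordered lists.

```python
import re

# Optional indentation, a number, a dot, then the item's first line.
ITEM = re.compile(r"^(\s*\d+\.\s+)(.*)$")


def parse_ordered_list(lines):
    """Group lines into items; continuation lines must align with the item content."""
    items = []
    content_col = None  # column where the current item's text begins
    for line in lines:
        match = ITEM.match(line)
        if match:
            content_col = len(match.group(1))
            items.append([match.group(2)])
        elif items and line.strip() and line.startswith(" " * content_col):
            items[-1].append(line[content_col:])
        else:
            # No lazy continuation: an unindented line is not part of the list.
            raise ValueError(f"not part of the list: {line!r}")
    return items


aligned = [
    "    1. Item 1 Line 1",
    "       Item 1 Line 2",
    "99999. Item 99999 Line 1",
    "       Item 99999 Line 2",
]
print(parse_ordered_list(aligned))
```

Because the marker may be indented, short and long numbers can be right-aligned so that all content shares one column.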

Blockquotes

Due to the removal of lazy continuation, blockquotes can correctly parse:


>>> foo
> bar
>> baz

Into:


<blockquote>
  <blockquote><blockquote><p>foo</p></blockquote></blockquote>
  <p>bar</p>
  <blockquote><p>baz</p></blockquote>
</blockquote>
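The conversion above can be sketched by treating the number of leading > characters as the nesting depth of each line. This Python version is a simplified illustration with a hypothetical `render_blockquotes` function: it puts every line in its own paragraph, does not handle spaces between markers, and emits the HTML without the indentation shown above.

```python
def render_blockquotes(lines):
    """Render lines whose leading '>' count gives the blockquote nesting depth."""
    html = []
    depth = 0
    for line in lines:
        stripped = line.lstrip()
        level = len(stripped) - len(stripped.lstrip(">"))
        text = stripped[level:].strip()
        # Open or close blockquotes until the nesting matches this line.
        while depth < level:
            html.append("<blockquote>")
            depth += 1
        while depth > level:
            html.append("</blockquote>")
            depth -= 1
        if text:
            html.append(f"<p>{text}</p>")
    # Close whatever is still open at the end of the input.
    html.extend(["</blockquote>"] * depth)
    return "".join(html)


quoted = [">>> foo", "> bar", ">> baz"]
print(render_blockquotes(quoted))
```

Since each line states its own depth, the parser never has to guess whether a shallower line still belongs to a deeper quote.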

Tables

Tables have been added and are nearly identical to GitHub's implementation.

HTML

No markdown is permitted inside HTML blocks or spans.
They are passed straight through (unless the converter wants to remove specific blocks).

Conclusion

Thank you for reading this far; future posts ought to be shorter and far less ranty.