Raw HTML is a low-level language, and it’s starting to bum me out. I’m working on a project that has me writing a large number of pages with relatively simple markup. (We’ll see the structure below.) In this post, I’m going to implement a Domain-Specific Markup Language (DSML), using PHP. It’ll use nouns relevant to my subject as I write it, and output good old HTML when it’s time to render to a user.
(It looks like I’m not the first person to use the name DSML, but I promise to keep this article a little less academic by adding code samples and not using the word “indeed.”)
What does low-level mean?
When I fire up my editor, my programming language knows nothing about my problem. One way to construct a program is to build up your language, to make it easier to express your problem in your problem’s language. When Ivan Jovanovic says programming languages are simply not powerful enough, he means it’s your job to make them more powerful, by shaping the primitives to fit your problem domain.
Likewise, HTML is full of wonderful primitives for marking up just about any written human knowledge. But I have a concrete problem that is a tiny, specialized subset of that knowledge, and so do you. Building a DSML is a way to build up HTML into the domain I’m working in. And, because HTML is the lingua franca of display, we’ll convert to it when it’s time to display.
Getting Concrete
I’m writing a course on MySQL administration. The largest volume of actual lines-typed-into-editors in the project is the content of the lessons. Every lesson is an HTML page, with one ordered list of steps, and every step has one or more tests that we perform to make sure the student did the work correctly.
Here’s how we used to write the lessons:
[Code listing, 18 lines, not preserved: a lesson written directly in HTML.]
This presents a few problems:
- Local neighborhoods can be difficult to navigate. What does the string </li></ol> tell you? What did I just close? Where am I?
- In what way is <pre class='cli'> superior to <cli>? What do I get in return for 12 more characters?
- PRE blocks make attractive indentation futile.
How does a DSML help?
Here’s what I want to write:
[Code listing, 19 lines, not preserved: a lesson written with custom tags.]
This is starting to look like XML, without accidentally becoming XHTML. The joy of my DSML is that I’m writing a language that knows what I mean, not that I want a new layer of quoting rules.
Rewriting Custom Tags into HTML
Let’s start with the easy work: converting the <steps> list and its <step> elements back into <ol> and <li>s. I’m using the QueryPath library. Its API is very similar to jQuery’s, and because I do the transform server-side, I can provide well-formed pages to clients without JavaScript (like search engine spiders).
[Code listing, 17 lines, not preserved: the QueryPath code that rewrites <steps> and <step> into <ol> and <li>.]
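As a minimal sketch of this kind of transform, here is a version that uses PHP’s built-in DOM extension rather than QueryPath, so it is not the QueryPath code itself. It assumes the lesson is a well-formed XML fragment; renameElements() and the sample lesson text are made up for illustration.

```php
<?php
// Sketch only: uses PHP's built-in DOMDocument, not QueryPath.
// renameElements() and the sample lesson are hypothetical.

/**
 * Replace every <$from> element with a <$to> element,
 * keeping its attributes and children.
 */
function renameElements(DOMDocument $doc, string $from, string $to): void
{
    // Copy the live node list to an array, because the loop mutates the tree.
    $nodes = iterator_to_array($doc->getElementsByTagName($from), false);
    foreach ($nodes as $old) {
        $new = $doc->createElement($to);
        foreach (iterator_to_array($old->attributes, false) as $attr) {
            $new->setAttribute($attr->nodeName, $attr->nodeValue);
        }
        while ($old->firstChild !== null) {
            $new->appendChild($old->firstChild); // appendChild moves the node
        }
        $old->parentNode->replaceChild($new, $old);
    }
}

$lesson = <<<'XML'
<div class="lesson">
  <steps>
    <step>Install the MySQL server package.</step>
    <step>Start the server and log in as root.</step>
  </steps>
</div>
XML;

$doc = new DOMDocument();
$doc->loadXML($lesson);
renameElements($doc, 'steps', 'ol');
renameElements($doc, 'step', 'li');

// Serialize only the root element, which skips the XML declaration.
$html = $doc->saveXML($doc->documentElement);
echo $html, "\n";
```

Because the rename is generic, adding another one-to-one tag mapping is a single extra call.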
Adding Application Logic and Error Checking
Of course, nobody’s perfect, so let’s add some rules to catch operator error:
[Code listing, 9 lines, not preserved: the error-checking rules.]
While I’m writing lessons, my warn() function adds bold red error messages to the top of the parsed document. In production, warn() will quietly log them.
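Here is a rough sketch of what such rules could look like, based on the lesson structure described earlier (one list of steps, at least one test per step). The <test> tag name, the $production flag, checkLesson(), and this particular warn() signature are all assumptions for illustration, and it uses the built-in DOM extension rather than QueryPath.

```php
<?php
// Sketch only: the <test> tag name, the $production flag, checkLesson(),
// and this warn() signature are hypothetical.

/** Collect a warning while authoring, or log it quietly in production. */
function warn(string $message, array &$warnings, bool $production = false): void
{
    if ($production) {
        error_log($message);
    } else {
        $warnings[] = $message;
    }
}

/** Check the structural rules: one <steps> list, and a test in every step. */
function checkLesson(DOMDocument $doc, bool $production = false): array
{
    $warnings = [];
    $lists = $doc->getElementsByTagName('steps');
    if ($lists->length !== 1) {
        warn("Expected exactly one <steps> list, found {$lists->length}.", $warnings, $production);
    }
    $number = 0;
    foreach ($doc->getElementsByTagName('step') as $step) {
        $number++;
        if ($step->getElementsByTagName('test')->length === 0) {
            warn("Step {$number} has no <test>.", $warnings, $production);
        }
    }
    return $warnings;
}

$doc = new DOMDocument();
$doc->loadXML(
    '<div><steps>'
    . '<step>Log in.<test>whoami</test></step>'
    . '<step>Log out.</step>'
    . '</steps></div>'
);
$warnings = checkLesson($doc);

// While authoring, show the warnings in bold red at the top of the page.
foreach ($warnings as $w) {
    echo '<p style="color: red; font-weight: bold">', htmlspecialchars($w), "</p>\n";
}
```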
Tags that are Smarter than HTML
Those were simple replacements; you could do them with a regular expression and some duct tape. Let’s make this <cli> tag fix my problems with HTML’s <pre>:
- I want to be able to indent the content for easier editing.
- I want the </pre> tag on its own line, without showing the student an empty line at the bottom of the code block.
In other words, I want it to work like this:
[Code listing, 4 lines, not preserved: the <pre> block as it should reach the browser.]
But let me edit it like this:
[Code listing, 5 lines, not preserved: the same block as an indented <cli> element.]
Here’s the code that does it:
[Code listing, 18 lines, not preserved: the <cli> conversion code.]
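A minimal sketch of one way to get this behavior: drop the blank first and last lines, strip the indentation shared by every line, and emit a <pre class="cli">. It uses the built-in DOM extension rather than QueryPath; dedent(), renderCli(), and the sample commands are made up for illustration.

```php
<?php
// Sketch only: dedent() and renderCli() are hypothetical names,
// and this uses DOMDocument rather than QueryPath.

/** Remove blank first/last lines and the indentation shared by all lines. */
function dedent(string $text): string
{
    $lines = explode("\n", str_replace("\r\n", "\n", $text));
    // Drop leading and trailing lines that are empty or whitespace-only.
    while ($lines && trim($lines[0]) === '') {
        array_shift($lines);
    }
    while ($lines && trim(end($lines)) === '') {
        array_pop($lines);
    }
    // Find the smallest indentation among the non-blank lines.
    $indent = null;
    foreach ($lines as $line) {
        if (trim($line) === '') {
            continue;
        }
        $width = strlen($line) - strlen(ltrim($line, " \t"));
        $indent = $indent === null ? $width : min($indent, $width);
    }
    $indent = $indent ?? 0;
    $out = [];
    foreach ($lines as $line) {
        $out[] = trim($line) === '' ? '' : substr($line, $indent);
    }
    return implode("\n", $out);
}

/** Replace each <cli> element with a dedented <pre class="cli">. */
function renderCli(DOMDocument $doc): void
{
    foreach (iterator_to_array($doc->getElementsByTagName('cli'), false) as $cli) {
        $pre = $doc->createElement('pre');
        $pre->setAttribute('class', 'cli');
        $pre->appendChild($doc->createTextNode(dedent($cli->textContent)));
        $cli->parentNode->replaceChild($pre, $cli);
    }
}

$source = <<<'XML'
<li>
    List the databases:
    <cli>
        mysql -u root -p
        SHOW DATABASES;
    </cli>
</li>
XML;

$doc = new DOMDocument();
$doc->loadXML($source);
renderCli($doc);
$html = $doc->saveXML($doc->documentElement);
echo $html, "\n";
```

Stripping only the shared indentation, rather than all leading whitespace, keeps any deliberate nesting inside the block intact.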
DSML my Users Care About
In lessons, when we introduce new terms, the student can hover over them to get a Bootstrap Popover that loads the definition from our glossary, dynamically. Here’s how that used to look in our code:
[Code listing, 3 lines, not preserved: the old glossary-term markup.]
Here’s how I want it to look:
[Code listing, 4 lines, not preserved: the glossary term written in the DSML.]
And here’s how we do it:
[Code listing, 20 lines, not preserved: the glossary-term conversion code.]
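As a rough sketch of the idea, assuming a <term> tag in the DSML and a glossary held in a plain PHP array: defined terms become an <abbr> that a popover script could hook into, and undefined terms are left as plain text and reported. The tag name, the glossary array, the class and data-term attribute, and renderTerms() are all assumptions for illustration, and the code uses the built-in DOM extension rather than QueryPath.

```php
<?php
// Sketch only: the <term> tag, the glossary array, the class and data-term
// attribute, and renderTerms() are all hypothetical.

/**
 * Turn each <term> into an <abbr> that client-side code can find.
 * Terms missing from the glossary are left as plain text and reported.
 *
 * @return string[] warnings for undefined terms
 */
function renderTerms(DOMDocument $doc, array $glossary): array
{
    $warnings = [];
    foreach (iterator_to_array($doc->getElementsByTagName('term'), false) as $term) {
        $word = trim($term->textContent);
        $key = strtolower($word);
        if (!isset($glossary[$key])) {
            // No definition yet: warn the author, show the student plain text.
            $warnings[] = "No glossary entry for \"{$word}\".";
            $replacement = $doc->createTextNode($term->textContent);
        } else {
            $replacement = $doc->createElement('abbr');
            $replacement->setAttribute('class', 'glossary-term');
            $replacement->setAttribute('data-term', $key);
            $replacement->appendChild($doc->createTextNode($term->textContent));
        }
        $term->parentNode->replaceChild($replacement, $term);
    }
    return $warnings;
}

// Sample glossary with one made-up entry.
$glossary = ['replication' => 'Copying data from one server to others.'];

$doc = new DOMDocument();
$doc->loadXML('<p>Set up <term>replication</term> before tuning the <term>buffer pool</term>.</p>');
$warnings = renderTerms($doc, $glossary);
$html = $doc->saveXML($doc->documentElement);
echo $html, "\n";
print_r($warnings);
```

Since the emitted element is created in one place, changing the presentation means editing that one branch rather than every lesson file.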
Now we get warnings about terms we haven’t written glossary entries for (and we don’t call attention to them, to avoid embarrassment in front of students). Tagging glossary entries is easier (so we’ll do it more). And we’re free to make dramatic changes to the way we present glossary terms without touching a zillion lesson files. For example:
- We could slipstream all the definitions into data attributes instead of fetching via AJAX.
- We could paste all the definitions as numbered footnotes on the page.
- We could switch the HTML we emit to the browser to use the <dfn> tag instead of <abbr>.
Now go forth, and HTML no more.
HTML is pretty great, but I wouldn’t want to write in it.
- A DSML can get you closer to your problem domain, not just in code, but in presentation.
- A DSML can free you to write content without bogging you down in implementation details.
- A DSML can even make it easier to develop and update features spread across content.
Got comments? Head over to Reddit