Programming Langauges and Compilers (CS 164) with Paul Hilfinger was the hardest class I took in college, bar none. I recommend it to anyone studying CS at Cal: the correspondence between formal grammars and state machines-as-parsers has been pretty personally impactful. State machine diagrams were an indispensible part of Hilfinger’s notes on those subjects, and drawing out a state machine diagram to tap out the state transitions is how I recall those notes now.
State machine diagrams are indespensible to State Machines via Jorge Luis Borges because, without all my mnemonic drawing and tapping, I wouldn’t have recognized the connection between text and theory. The diagrams are also an indespensable way to concretize the examples in that post. I’m not prepared to explain (non)determinism without converting a specific graph, and I don’t have the patience to squeeze all ambiguity out of mock-Borges prose.
Diagramming also presented an opportunity to go further than the CS 164 notes: rather than just presenting diagrams and counting on readers to understand the rules governing state transitions, I wanted the diagrams to demonstrate those rules by illustrating dynamic user input.
I did all of this with TikZ, a LaTeX extension for specifying diagrams (probably the same extension Hilfinger used). This might seem like an odd choice for interactive demos, but there’s a logic behind the bodge!
pandoc-blog
, the static site generator I use, is full of opinions. It uses Pandoc to convert Markdown and LaTeX source to HTML documents in order to
Avoid big dependencies.1
Prefer print-friendliness.
The primary motivation in using Pandoc was to be able to swap out converters: rendering a blog post as a PDF should be as simple as pandoc -s source.md -o out.pdf
. Avoid drawing content with Javascript: it shouldn’t take PhantomJS to produce a PDF.
Prefer semantic markup.
In practice, this mostly means not abusing HTML elements to do things they aren’t intended to do, but it’s also reason to prefer a diagram specified in the DOM over a raster image (when possible).
State Machines imposed particular requirements. All of my diagrams would have a great deal in common; they should represent states and transitions, with labels, in a consistent style. Given that similarity, I wanted a tool that’d let me define the state machines as minimally as possible. Each diagram definition should be easy to read and version.
Finally, I needed something I could make interactive: node appearances need to change according to user input.
Avoiding big dependencies and preferring print-friendliness meant eschewing Javascript libraries for drawing graphs, even though a JS-controlled canvas
element would probably serve well for interactivity.2
That interactivity requirement, along with the preferences for semantic markup and for concise and versionable definitions, rules out tools for drawing graphs that only output raster images.
HTML canvas
and svg
elements are natural fits for the technical requirements. They’re lightweight, reasonably print-friendly, and semantic (which, in turn, means they’re scriptable for interactivity). The trouble is SVG is just too flexible—I want to be quickly defining labeled states and edges rather than individually arranging SVG elements.
That’s where TikZ comes in: it’s concise.3 The problem with TikZ is that it’s a LaTeX extension meant for generating PostScript; I need it to generate SVG.
Defining a state machine diagram in TikZ is fairly straightforward:
These steps are just a matter of using styles and macros provided by the tikz-automata
TikZ library; for an in-depth explanation, check out Drawing Finite State Machines in LaTeX using tikz
. In practice, a TikZ state machine diagram is defined like this:
\begin{tikzpicture}
% 1. Define the states, with their positions, labels, and IDs.
\node[state, accepting, initial] (Rc) {$C$};
\node[state, below=of Rc] (Rb) {$B$};
\node[state, right=of Rc, yshift=-2cm] (Ra) {$A$};
% 2. Define the edges, with labels, between states.
\draw
(Rc) edge[bend right, left] node{1} (Rb)
(Rb) edge[bend right, right] node{0} (Rc)
(Rb) edge[bend right, below] node{0} (Ra)
(Ra) edge[loop right, right] node{1} (Ra)
(Ra) edge[bend right, above] node{1} (Rc);
\end{tikzpicture}
This syntax isn’t perfect—in particular, I found positioning nodes pretty fiddly—but it’s concise (each node and edge on its own line) and readable (as readable as the labels and IDs one chooses).
Normally this tikzpicture
would be embedded into a LaTeX document; how can it be turned into a standalone SVG? There’s an answer on the TeX Stack Exchange:
Use the standalone
TeX package to generate a PDF of just the TikZ figure, without any other document cruft.
Use pdf2svg
to convert that PDF to an SVG file. Miraculously—I have no idea how this works—that SVG file is reasonably human-readable, if verbose: each element in the PDF is converted to an SVG path
element.
Using standalone
is as simple as wrapping our tikzpicture
in a document with the standalone
class. Doing so (and adding some TikZ style directives to the preamble) gives us our final definition of our figure:
% This preamble is reused for every diagram.
\documentclass{standalone}
\usepackage{xcolor}
\usepackage{tikz}
\usetikzlibrary{automata, positioning, arrows}
\tikzstyle{defaults}=[
->,
>=stealth',
every state/.style={thick, fill=gray!10},
]
\begin{document}
\nopagecolor
% The same tikzpicture as before.
\begin{tikzpicture}
[defaults, node distance = 3cm]
\node[state, accepting, initial] (Rc) {$C$};
\node[state, below=of Rc] (Rb) {$B$};
\node[state, right=of Rc, yshift=-2cm] (Ra) {$A$};
\draw
(Rc) edge[bend right, left] node{1} (Rb)
(Rb) edge[bend right, right] node{0} (Rc)
(Rb) edge[bend right, below] node{0} (Ra)
(Ra) edge[loop right, right] node{1} (Ra)
(Ra) edge[bend right, above] node{1} (Rc);
\end{tikzpicture}
\end{document}
I wrote a short script, tikz2svg
, which handles the TikZ → PDF → SVG conversion. Running tikz2svg nfa-conversion-pre.tex
yields the SVG we’re after:
pandoc-blog
pandoc-blog
’s build processes are managed with make
: if one modifies a post’s Markdown source, the gen/%.html
target rebuilds that post’s generated HTML document.
Converting .tex
precursors to .svg
images manually is a hassle. This reconversion should happen automatically, the same way Markdown is reconverted to HTML: whenever a tex
precursor is modified, rebuild the corresponding SVG.
Doing this for State Machines involved adding one Makefile and modifying another. The relevant directories are structured like this:
Most of this is typical of pandoc-blog
:
blog/Makefile
(re)builds the full site.blog/posts
directory contains Markdown source files for each post; make
generates corresponding HTML files in blog/gen
.blog/img
directory holds static assets that can be referenced in post markup. Those assets are organized into subdirectories for tidiness: one for each post.What’s atypical is the inclusion of our precursor (nfa-conversion-pre.tex
) and its build product (nfa-conversion-pre.svg
) in blog/img/borges-automata
: generated content outside of blog/gen
!
blog/img/borges-automata/Makefile
builds all the diagrams in blog/img/borges-automata
:
SOURCES = $(wildcard *.tex)
TARGETS = $(patsubst %.tex,%.svg,$(SOURCES))
all: $(TARGETS)
%.svg: %.tex
tikz2svg $<
.PHONY: clean
clean:
rm *.svg
You can imagine any blog/img/*/Makefile
exposing the same interface. The main blog/Makefile
can delegate figure-generation to the img
subdirectory for each blog post; this one happens to be TikZ-specific, but for future blog posts I can write image-generating Makefiles
that encapsulate whatever other behavior I need.
We can update blog/Makefile
to run make
in each blog post’s subdirectory:
POSTS=$(shell find posts/*)
# OUT contains all names of static HTML targets corresponding to markdown files
# in the posts directory.
OUT=$(patsubst posts/%.md, gen/%.html, $(POSTS))
# IMAGEDIRS is a list of img/ subdirectories containing makefiles.
IMAGEDIRS=$(shell find img/*/Makefile | xargs -0 -n1 dirname)
all: $(OUT) $(IMAGEDIRS) index.html
# ... most targets unchanged from pandoc-blog/Makefile.
# Runs `make` in the target directory.
img/%: img/%/*.tex img/%/Makefile
$(MAKE) -C "$@"
@touch "$@"
Now the make watch
loop provided by blog/Makefile
automatically rebuilds the diagrams whenever we save a change to the TikZ source!
This isn’t a super novel way of composing Makefiles, but it’s a fun complication of pandoc-blog
. I didn’t originally expect to be generating images, and I’m pleased the root Makefile
isn’t sullied with blog-post-specific targets.
Diagrams are well and good, but you can’t build an interactive state machine demo without, well, implementing a state machine. Generically, this means turning a topology (all the various states and transitions between them) and initial state into some object that exposes an interface for feeding it input.
I do this with an Automaton
class, which I won’t discuss in depth. The key to an Automaton
’s connection to an SVG diagram is buried in the nodes
constructor argument: each state is defined with two callback functions, enter
and exit
, which are called when input causes the Automaton
to enter and leave that state, respectively. When it enters state A
, it calls A.enter()
; when it receives any subsequent input, it calls A.exit()
before proceeding.
To represent the Automaton
’s state in the diagram, A.enter()
should turn the SVG element corresponding to A
blue; A.exit()
should reverse the change. All we need is a reference to that SVG element.
The problem: those SVG elements were defined by pdf2svg
. Some of them have automatically-assigned IDs, but (for some reason) the circular paths demarcating states do not. I found the right paths with Chrome’s dev tools and added id
s manually to match the TikZ IDs:
<path
id="R0"
style="fill-rule:nonzero;fill:rgb(94.999695%,94.999695%,94.999695%);fill-opacity:1;stroke-width:0.79701;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;"
d="M 123.198937 42.518875 C 123.198937 49.397781 117.620812 54.972 110.745812 54.972 C 103.866906 54.972 98.292687 49.397781 98.292687 42.518875 C 98.292687 35.643875 103.866906 30.06575 110.745812 30.06575 C 117.620812 30.06575 123.198937 35.643875 123.198937 42.518875 Z M 123.198937 42.518875 "
transform="matrix(1,0,0,-1,53.137,83.722)"
/>
Which leaves us to define enter
and exit
:
const enter = () => {
document.getElementById("R0").style.fill = "cyan";
}
const exit = () => {
// The original fill is a light grey.
document.getElementById("R0").style.fill = "rgb(94.999695%,94.999695%,94.999695%)";
}
…and so on for each node in the graph; see the deterministic-binary-fsa.html source for a full example of a modified SVG and Automaton
configuration bundled into an HTML document.
The final demo—of a nondeterministic finite state machine handling input alongside its deterministic equivalent—introduces a last complication. There’s no structural issue with plopping two generated SVGs into an HTML document side-by-side, but there’s a semantic one: ID collisions.
SVG’s <use>
element lets you duplicate another element by its ID. This is a neat little optimization: if you have a complex path you need to use twice, better to define it once and reuse it by reference! pdf2svg
seizes on this optimization. The form of the digit 1
is only defined once; all the other 1
s are defined, in source, by <use>
elements referencing the original. The browser looks up the original element by its ID and re-renders it.
Remember how some of the pdf2svg
-generated elements have IDs? Those are of the form glyph0-0
, glyph0-1
, and so on, which means that every pdf2svg
-generated SVG has (and perhaps use
s) a glyph0-0
.
This wouldn’t be an issue but for the way we’re combining them: plopping the SVG source unceremoniously into an HTML document. When the browser looks up the DFA’s glyph0-0
by reference… it finds the NFA’s glyph0-0
instead. The first diagram is fine; the second one is a garbled mess of glyphs from the first.
The solution: prefix each SVG’s IDs. Find-and-replace is robust enough:
id="
with id="{prefix}
."#
with "#{prefix}
.I’ve since updated tikz2svg
to do this ID replacement automatically; now it prefixes generated SVG element IDs with the source filename. If you want to reuse filenames, insert a random component instead.
The Javascript animating this compound diagram is somewhat more complicated: it instantiates two Automaton
s, and feeds inputs to them simultaneously. See the nfa-conversion.html source for details.
Ideally, I’d like to have a script to convert a TikZ state machine definition directly to an encapsulated demo.
TikZ itself is an academic diagramming syntax; it offers a great deal of flexibility that must be excluded if the script is to extract a working state machine definition (that is, a topology).
The multi-step process for producing an SVG—renderinga PDF, then running that through pdf2svg
—means TikZ-specified data like state IDs are missing from the generated SVG. Remember: we re-added these IDs manually.
These issues are solved by avoiding TikZ, and instead building diagrams from some other domain-specific format. Javascript representations of the automaton topologies, like those configuring the interactive demos, would work. On the other hand, that means resigning TikZ’s advantages. Honestly? I don’t think I’ll bother!
I’d like to make some other incidental improvements (avoiding embedded iframe
s; referencing a single LaTeX header file between diagrams; things like that).
For now, I have what I need. Blogging with TikZ allowed me to avoid big dependencies, to keep my blog print-friendly, to keep its markup semantic, and to share interactive demos with the next generation terrorized by CS 164.
If my commitment to the ideological purity of my Pandoc-generated blog seems extreme, check out phiresky’s “Hosting SQLite databases on GitHub Pages”.
That blog’s written as Pandoc Markdown; the Pandoc artifact is postprocessed with React to generate charts and runnable code blocks (including queries of a statically-hosted SQL database). That’s commitment.
Relevant: Maciej Cegłowski’s evergreen slides on Chickenshit Minimalism. I’m proud to keep this site hard-to-read without any help from npm.↩︎
Graph-drawing libraries are smaller than I expected, but not so small. The entirety of State Machines (complete with its diagrams and demos) is about the same size as minified d3.js: 94.5 and 83.6 kB, respectively.↩︎
There was another draw to TikZ which I didn’t ever ultimately achieve: I figured that, for the static diagrams, I could define TikZ diagrams inline (in a LaTeX environment) in my post’s Markdown source. I still think this is possible, but it’d take writing a custom Pandoc extension to extract the TikZ picture, convert it to an SVG, and replace it with an HTML img
referencing that SVG.
I decided it’d take too much fiddling to make that Pandoc extension reusable (especially outside of pandoc-blog
), and that I might not actually have reason to reuse it.↩︎