Programming Langauges and Compilers (CS 164) with Paul Hilfinger was the hardest class I took in college, bar none. I recommend it to anyone studying CS at Cal: the correspondence between formal grammars and state machines-as-parsers has been pretty personally impactful. State machine diagrams were an indispensible part of Hilfinger’s notes on those subjects, and drawing out a state machine diagram to tap out the state transitions is how I recall those notes now.
State machine diagrams are indespensible to State Machines via Jorge Luis Borges because, without all my mnemonic drawing and tapping, I wouldn’t have recognized the connection between text and theory. The diagrams are also an indespensable way to concretize the examples in that post. I’m not prepared to explain (non)determinism without converting a specific graph, and I don’t have the patience to squeeze all ambiguity out of mock-Borges prose.
Diagramming also presented an opportunity to go further than the CS 164 notes: rather than just presenting diagrams and counting on readers to understand the rules governing state transitions, I wanted the diagrams to demonstrate those rules by illustrating dynamic user input.
I did all of this with TikZ, a LaTeX extension for specifying diagrams (probably the same extension Hilfinger used). This might seem like an odd choice for interactive demos, but there’s a logic behind the bodge!
pandoc-blog
,
the static site generator I use, is full of opinions. It uses Pandoc to
convert Markdown and LaTeX source to HTML documents in order to
Avoid big dependencies.1
Prefer print-friendliness.
The primary motivation in using
Pandoc was to be able to swap out converters: rendering a blog post as a
PDF should be as simple as pandoc -s source.md -o out.pdf
.
Avoid drawing content with Javascript: it shouldn’t take PhantomJS to
produce a PDF.
Prefer semantic markup.
In practice, this mostly means not
abusing HTML elements to do things they aren’t intended to do, but it’s
also reason to prefer a diagram specified in the DOM over a raster image
(when possible).
State Machines imposed particular requirements. All of my diagrams would have a great deal in common; they should represent states and transitions, with labels, in a consistent style. Given that similarity, I wanted a tool that’d let me define the state machines as minimally as possible. Each diagram definition should be easy to read and version.
Finally, I needed something I could make interactive: node appearances need to change according to user input.
Avoiding big dependencies and preferring print-friendliness meant
eschewing Javascript libraries for drawing graphs, even though a
JS-controlled canvas
element would probably serve well for
interactivity.2
That interactivity requirement, along with the preferences for semantic markup and for concise and versionable definitions, rules out tools for drawing graphs that only output raster images.
HTML canvas
and svg
elements are natural
fits for the technical requirements. They’re lightweight, reasonably
print-friendly, and semantic (which, in turn, means they’re scriptable
for interactivity). The trouble is SVG is just too flexible—I
want to be quickly defining labeled states and edges rather than
individually arranging SVG elements.
That’s where TikZ comes in: it’s concise.3 The problem with TikZ is that it’s a LaTeX extension meant for generating PostScript; I need it to generate SVG.
Defining a state machine diagram in TikZ is fairly straightforward:
These steps are just a matter of using styles and macros provided by
the tikz-automata
TikZ library; for an in-depth
explanation, check out Drawing
Finite State Machines in LaTeX using tikz
. In
practice, a TikZ state machine diagram is defined like this:
\begin{tikzpicture}
% 1. Define the states, with their positions, labels, and IDs.
\node[state, accepting, initial] (Rc) {$C$};
\node[state, below=of Rc] (Rb) {$B$};
\node[state, right=of Rc, yshift=-2cm] (Ra) {$A$};
% 2. Define the edges, with labels, between states.
\draw
(Rc) edge[bend right, left] node{1} (Rb)
(Rb) edge[bend right, right] node{0} (Rc)
(Rb) edge[bend right, below] node{0} (Ra)
(Ra) edge[loop right, right] node{1} (Ra)
(Ra) edge[bend right, above] node{1} (Rc);\end{tikzpicture}
This syntax isn’t perfect—in particular, I found positioning nodes pretty fiddly—but it’s concise (each node and edge on its own line) and readable (as readable as the labels and IDs one chooses).
Normally this tikzpicture
would be embedded into a LaTeX
document; how can it be turned into a standalone SVG? There’s an answer on the TeX
Stack Exchange:
Use the standalone
TeX
package to generate a PDF of just the TikZ figure, without any other
document cruft.
Use pdf2svg
to
convert that PDF to an SVG file. Miraculously—I have no idea how this
works—that SVG file is reasonably human-readable, if verbose: each
element in the PDF is converted to an SVG path
element.
Using standalone
is as simple as wrapping our
tikzpicture
in a document with the standalone
class. Doing so (and adding some TikZ style directives to the preamble)
gives us our final definition of our figure:
% This preamble is reused for every diagram.
\documentclass{standalone}
\usepackage{xcolor}
\usepackage{tikz}
\usetikzlibrary{automata, positioning, arrows}
\tikzstyle{defaults}=[
->,
>=stealth',
every state/.style={thick, fill=gray!10},
]
\begin{document}
\nopagecolor
% The same tikzpicture as before.
\begin{tikzpicture}
[defaults, node distance = 3cm]\node[state, accepting, initial] (Rc) {$C$};
\node[state, below=of Rc] (Rb) {$B$};
\node[state, right=of Rc, yshift=-2cm] (Ra) {$A$};
\draw
(Rc) edge[bend right, left] node{1} (Rb)
(Rb) edge[bend right, right] node{0} (Rc)
(Rb) edge[bend right, below] node{0} (Ra)
(Ra) edge[loop right, right] node{1} (Ra)
(Ra) edge[bend right, above] node{1} (Rc);\end{tikzpicture}
\end{document}
I wrote a short script, tikz2svg
,
which handles the TikZ → PDF → SVG conversion. Running
tikz2svg nfa-conversion-pre.tex
yields the SVG we’re
after:
pandoc-blog
pandoc-blog
’s build processes are managed
with make
: if one modifies a post’s Markdown source,
the gen/%.html
target rebuilds that post’s generated HTML
document.
Converting .tex
precursors to .svg
images
manually is a hassle. This reconversion should happen
automatically, the same way Markdown is reconverted to HTML: whenever a
tex
precursor is modified, rebuild the corresponding
SVG.
Doing this for State Machines involved adding one Makefile and modifying another. The relevant directories are structured like this:
Most of this is typical of pandoc-blog
:
blog/Makefile
(re)builds the full site.blog/posts
directory contains Markdown source files
for each post; make
generates corresponding HTML files in
blog/gen
.blog/img
directory holds static assets that can be
referenced in post markup. Those assets are organized into
subdirectories for tidiness: one for each post.What’s atypical is the inclusion of our precursor
(nfa-conversion-pre.tex
) and its build product
(nfa-conversion-pre.svg
) in
blog/img/borges-automata
: generated content outside of
blog/gen
!
blog/img/borges-automata/Makefile
builds all the
diagrams in blog/img/borges-automata
:
SOURCES = $(wildcard *.tex)
TARGETS = $(patsubst %.tex,%.svg,$(SOURCES))
all: $(TARGETS)
%.svg: %.tex
$<
tikz2svg
.PHONY: clean
clean:
rm *.svg
You can imagine any blog/img/*/Makefile
exposing the
same interface. The main blog/Makefile
can delegate
figure-generation to the img
subdirectory for each blog
post; this one happens to be TikZ-specific, but for future blog posts I
can write image-generating Makefiles
that encapsulate
whatever other behavior I need.
We can update blog/Makefile
to run make
in
each blog post’s subdirectory:
POSTS=$(shell find posts/*)
# OUT contains all names of static HTML targets corresponding to markdown files
# in the posts directory.
OUT=$(patsubst posts/%.md, gen/%.html, $(POSTS))
# IMAGEDIRS is a list of img/ subdirectories containing makefiles.
IMAGEDIRS=$(shell find img/*/Makefile | xargs -0 -n1 dirname)
all: $(OUT) $(IMAGEDIRS) index.html
# ... most targets unchanged from pandoc-blog/Makefile.
# Runs `make` in the target directory.
img/%: img/%/*.tex img/%/Makefile
$(MAKE) -C "$@"
@touch "$@"
Now the make watch
loop provided by
blog/Makefile
automatically rebuilds the diagrams whenever
we save a change to the TikZ source!
This isn’t a super novel way of composing Makefiles, but it’s a fun
complication of pandoc-blog
. I didn’t originally expect to
be generating images, and I’m pleased the root Makefile
isn’t sullied with blog-post-specific targets.
Diagrams are well and good, but you can’t build an interactive state machine demo without, well, implementing a state machine. Generically, this means turning a topology (all the various states and transitions between them) and initial state into some object that exposes an interface for feeding it input.
I do this with an Automaton
class, which I won’t discuss in depth. The key to an
Automaton
’s connection to an SVG diagram is buried in the
nodes
constructor argument: each state is defined with two
callback functions, enter
and exit
, which are
called when input causes the Automaton
to enter and leave
that state, respectively. When it enters state A
, it calls
A.enter()
; when it receives any subsequent input, it calls
A.exit()
before proceeding.
To represent the Automaton
’s state in the diagram,
A.enter()
should turn the SVG element corresponding to
A
blue; A.exit()
should reverse the change.
All we need is a reference to that SVG element.
The problem: those SVG elements were defined by pdf2svg
.
Some of them have automatically-assigned IDs, but (for some reason) the
circular paths demarcating states do not. I found the right paths with
Chrome’s dev tools and added id
s manually to match the TikZ
IDs:
path
< id="R0"
style="fill-rule:nonzero;fill:rgb(94.999695%,94.999695%,94.999695%);fill-opacity:1;stroke-width:0.79701;stroke-linecap:butt;stroke-linejoin:miter;stroke:rgb(0%,0%,0%);stroke-opacity:1;stroke-miterlimit:10;"
d="M 123.198937 42.518875 C 123.198937 49.397781 117.620812 54.972 110.745812 54.972 C 103.866906 54.972 98.292687 49.397781 98.292687 42.518875 C 98.292687 35.643875 103.866906 30.06575 110.745812 30.06575 C 117.620812 30.06575 123.198937 35.643875 123.198937 42.518875 Z M 123.198937 42.518875 "
transform="matrix(1,0,0,-1,53.137,83.722)"
/>
Which leaves us to define enter
and
exit
:
const enter = () => {
document.getElementById("R0").style.fill = "cyan";
}const exit = () => {
// The original fill is a light grey.
document.getElementById("R0").style.fill = "rgb(94.999695%,94.999695%,94.999695%)";
}
…and so on for each node in the graph; see the deterministic-binary-fsa.html
source for a full example of a modified SVG and Automaton
configuration bundled into an HTML document.
The final demo—of a nondeterministic finite state machine handling input alongside its deterministic equivalent—introduces a last complication. There’s no structural issue with plopping two generated SVGs into an HTML document side-by-side, but there’s a semantic one: ID collisions.
SVG’s <use>
element lets you duplicate another
element by its ID. This is a neat little optimization: if you have a
complex path you need to use twice, better to define it once and reuse
it by reference! pdf2svg
seizes on this optimization. The
form of the digit 1
is only defined once; all the other
1
s are defined, in source, by <use>
elements referencing the original. The browser looks up the original
element by its ID and re-renders it.
Remember how some of the pdf2svg
-generated elements have
IDs? Those are of the form glyph0-0
, glyph0-1
,
and so on, which means that every
pdf2svg
-generated SVG has (and perhaps use
s) a
glyph0-0
.
This wouldn’t be an issue but for the way we’re combining them:
plopping the SVG source unceremoniously into an HTML document. When the
browser looks up the DFA’s glyph0-0
by reference… it finds
the NFA’s glyph0-0
instead. The first diagram is
fine; the second one is a garbled mess of glyphs from the first.
The solution: prefix each SVG’s IDs. Find-and-replace is robust enough:
id="
with
id="{prefix}
."#
with
"#{prefix}
.I’ve since updated
tikz2svg
to do this ID replacement automatically; now it
prefixes generated SVG element IDs with the source filename. If you want
to reuse filenames, insert a random component instead.
The Javascript animating this compound diagram is somewhat more
complicated: it instantiates two Automaton
s, and feeds
inputs to them simultaneously. See the nfa-conversion.html
source for details.
Ideally, I’d like to have a script to convert a TikZ state machine definition directly to an encapsulated demo.
TikZ itself is an academic diagramming syntax; it offers a great deal of flexibility that must be excluded if the script is to extract a working state machine definition (that is, a topology).
The multi-step process for producing an SVG—renderinga PDF, then
running that through pdf2svg
—means TikZ-specified data like
state IDs are missing from the generated SVG. Remember: we re-added
these IDs manually.
These issues are solved by avoiding TikZ, and instead building diagrams from some other domain-specific format. Javascript representations of the automaton topologies, like those configuring the interactive demos, would work. On the other hand, that means resigning TikZ’s advantages. Honestly? I don’t think I’ll bother!
I’d like to make some other incidental improvements (avoiding
embedded iframe
s; referencing a single LaTeX header file
between diagrams; things like that).
For now, I have what I need. Blogging with TikZ allowed me to avoid big dependencies, to keep my blog print-friendly, to keep its markup semantic, and to share interactive demos with the next generation terrorized by CS 164.
If my commitment to the ideological purity of my Pandoc-generated blog seems extreme, check out phiresky’s “Hosting SQLite databases on GitHub Pages”.
That blog’s written as Pandoc Markdown; the Pandoc artifact is postprocessed with React to generate charts and runnable code blocks (including queries of a statically-hosted SQL database). That’s commitment.