« blog

Blogging with TikZ

Programming Langauges and Compilers (CS 164) with Paul Hilfinger was the hardest class I took in college, bar none. I recommend it to anyone studying CS at Cal: the correspondence between formal grammars and state machines-as-parsers has been pretty personally impactful. State machine diagrams were an indispensible part of Hilfinger’s notes on those subjects, and drawing out a state machine diagram to tap out the state transitions is how I recall those notes now.

State machine diagrams are indespensible to State Machines via Jorge Luis Borges because, without all my mnemonic drawing and tapping, I wouldn’t have recognized the connection between text and theory. The diagrams are also an indespensable way to concretize the examples in that post. I’m not prepared to explain (non)determinism without converting a specific graph, and I don’t have the patience to squeeze all ambiguity out of mock-Borges prose.

Diagramming also presented an opportunity to go further than the CS 164 notes: rather than just presenting diagrams and counting on readers to understand the rules governing state transitions, I wanted the diagrams to illustrate those rules by illustrating dynamic user input.

I did all of this with TikZ, a LaTeX extension for specifying diagrams (probably the same extension Hilfinger used). This might seem like an odd choice for interactive demos, but there’s a logic behind the bodge!

Arriving at TikZ

pandoc-blog, the static site generator I use, is full of opinions. It uses Pandoc to convert Markdown and LaTeX source to HTML documents in order to

State Machines imposed particular requirements. All of my diagrams would have a great deal in common; they should represent states and transitions, with labels, in a consistent style. Given that similarity, I wanted a tool that’d let me define the state machines as minimally as possible. Each diagram definition should be easy to read and version.

Finally, I needed something I could make interactive: node appearances need to change according to user input.

Avoiding big dependencies and preferring print-friendliness meant eschewing Javascript libraries for drawing graphs, even though a JS-controlled canvas element would probably serve well for interactivity.2

That interactivity requirement, along with the preferences for semantic markup and for concise and versionable definitions, rules out tools for drawing graphs that only output raster images.

HTML canvas and svg elements are natural fits for the technical requirements. They’re lightweight, reasonably print-friendly, and semantic (which, in turn, means they’re scriptable for interactivity). The trouble is SVG is just too flexible—I want to be quickly defining labeled states and edges rather than individually arranging SVG elements.

That’s where TikZ comes in: it’s concise.3 The problem with TikZ is that it’s a LaTeX extension meant for generating PostScript; I need it to generate SVG.


Converting TikZ to SVG

Defining a state machine diagram in TikZ is fairly straightforward:

  1. Define a set of states, with their positions, labels, and IDs.
  2. Define a set of edges, with labels, between states.

These steps are just a matter of using styles and macros provided by the tikz-automata TikZ library; for an in-depth explanation, check out Drawing Finite State Machines in LaTeX using tikz. In practice, a TikZ state machine diagram is defined like this:

  % 1. Define the states, with their positions, labels, and IDs.
  \node[state, accepting, initial] (Rc) {$C$};
  \node[state, below=of Rc] (Rb) {$B$};
  \node[state, right=of Rc, yshift=-2cm] (Ra) {$A$};
  % 2. Define the edges, with labels, between states.
    (Rc) edge[bend right, left] node{1} (Rb)
    (Rb) edge[bend right, right] node{0} (Rc)
    (Rb) edge[bend right, below] node{0} (Ra)
    (Ra) edge[loop right, right] node{1} (Ra)
    (Ra) edge[bend right, above] node{1} (Rc);

This syntax isn’t perfect—in particular, I found positioning nodes pretty fiddly—but it’s concise (each node and edge on its own line) and readable (as readable as the labels and IDs one chooses).

Normally this tikzpicture would be embedded into a LaTeX document; how can it be turned into a standalone SVG? There’s an answer on the TeX Stack Exchange:

  1. Use the standalone TeX package to generate a PDF of just the TikZ figure, without any other document cruft.

  2. Use pdf2svg to convert that PDF to an SVG file. Miraculously—I have no idea how this works—that SVG file is reasonably human-readable, if verbose: each element in the PDF is converted to an SVG path element.

Using standalone is as simple as wrapping our tikzpicture in a document with the standalone class. Doing so (and adding some TikZ style directives to the preamble) gives us our final definition of our figure:

% This preamble is reused for every diagram.
\usetikzlibrary{automata, positioning, arrows}
  every state/.style={thick, fill=gray!10},

  % The same tikzpicture as before.
    [defaults, node distance = 3cm]
    \node[state, accepting, initial] (Rc) {$C$};
    \node[state, below=of Rc] (Rb) {$B$};
    \node[state, right=of Rc, yshift=-2cm] (Ra) {$A$};
      (Rc) edge[bend right, left] node{1} (Rb)
      (Rb) edge[bend right, right] node{0} (Rc)
      (Rb) edge[bend right, below] node{0} (Ra)
      (Ra) edge[loop right, right] node{1} (Ra)
      (Ra) edge[bend right, above] node{1} (Rc);

I wrote a short script, tikz2svg, which handles the TikZ → PDF → SVG conversion. Running tikz2svg nfa-conversion-pre.tex yields the SVG we’re after:

nfa-conversion-pre.svg: the state machine diagram produced by tikz2svg nfa-conversion-pre.tex.

Building images in pandoc-blog

pandoc-blog’s build processes are managed with make: if one modifies a post’s Markdown source, the gen/%.html target rebuilds that post’s generated HTML document.

Converting .tex precursors to .svg images manually is a hassle. This reconversion should happen automatically, the same way Markdown is reconverted to HTML: whenever a tex precursor is modified, rebuild the corresponding SVG.

Doing this for State Machines involved adding one Makefile and modifying another. The relevant directories are structured like this:

Most of this is typical of pandoc-blog:

What’s atypical is the inclusion of our precursor (nfa-conversion-pre.tex) and its build product (nfa-conversion-pre.svg) in blog/img/borges-automata: generated content outside of blog/gen!

blog/img/borges-automata/Makefile builds all the diagrams in blog/img/borges-automata:

SOURCES = $(wildcard *.tex)
TARGETS = $(patsubst %.tex,%.svg,$(SOURCES))

all: $(TARGETS)

%.svg: %.tex
    tikz2svg $<

.PHONY: clean
    rm *.svg

You can imagine any blog/img/*/Makefile exposing the same interface. The main blog/Makefile can delegate figure-generation to the img subdirectory for each blog post; this one happens to be TikZ-specific, but for future blog posts I can write image-generating Makefiles that encapsulate whatever other behavior I need.

We can update blog/Makefile to run make in each blog post’s subdirectory:

POSTS=$(shell find posts/*)
# OUT contains all names of static HTML targets corresponding to markdown files
# in the posts directory.
OUT=$(patsubst posts/%.md, gen/%.html, $(POSTS))
# IMAGEDIRS is a list of img/ subdirectories containing makefiles.
IMAGEDIRS=$(shell find img/*/Makefile | xargs -0 -n1 dirname)

all: $(OUT) $(IMAGEDIRS) index.html

# ... most targets unchanged from pandoc-blog/Makefile.

# Runs `make` in the target directory.
img/%: img/%/*.tex img/%/Makefile
    $(MAKE) -C "$@"
    @touch "$@"

Now the make watch loop provided by blog/Makefile automatically rebuilds the diagrams whenever we save a change to the TikZ source!

This isn’t a super novel way of composing Makefiles, but it’s a fun complication of pandoc-blog. I didn’t originally expect to be generating images, and I’m pleased the root Makefile isn’t sullied with blog-post-specific targets.

Adding interactivity

Diagrams are well and good, but you can’t build an interactive state machine demo without, well, implementing a state machine. Generically, this means turning a topology (all the various states and transitions between them) and initial state into some object that exposes an interface for feeding it input.

I do this with an Automaton class, which I won’t discuss in depth. The key to an Automaton’s connection to an SVG diagram is buried in the nodes constructor argument: each state is defined with two callback functions, enter and exit, which are called when input causes the Automaton to enter and leave that state, respectively. When it enters state A, it calls A.enter(); when it receives any subsequent input, it calls A.exit() before proceeding.

To represent the Automaton’s state in the diagram, A.enter() should turn the SVG element corresponding to A blue; A.exit() should reverse the change. All we need is a reference to that SVG element.

The problem: those SVG elements were defined by pdf2svg. Some of them have automatically-assigned IDs, but (for some reason) the circular paths demarcating states do not. I found the right paths with Chrome’s dev tools and added ids manually to match the TikZ IDs:

  d="M 123.198937 42.518875 C 123.198937 49.397781 117.620812 54.972 110.745812 54.972 C 103.866906 54.972 98.292687 49.397781 98.292687 42.518875 C 98.292687 35.643875 103.866906 30.06575 110.745812 30.06575 C 117.620812 30.06575 123.198937 35.643875 123.198937 42.518875 Z M 123.198937 42.518875 "

Which leaves us to define enter and exit:

const enter = () => {
  document.getElementById("R0").style.fill = "cyan";
const exit = () => {
  // The original fill is a light grey.
  document.getElementById("R0").style.fill = "rgb(94.999695%,94.999695%,94.999695%)";

…and so on for each node in the graph; see the deterministic-binary-fsa.html source for a full example of a modified SVG and Automaton configuration bundled into an HTML document.

The final demo—of a nondeterministic finite state machine handling input alongside its deterministic equivalent—introduces a last complication. There’s no structural issue with plopping two generated SVGs into an HTML document side-by-side, but there’s a semantic one: ID collisions.

SVG’s <use> element lets you duplicate another element by its ID. This is a neat little optimization: if you have a complex path you need to use twice, better to define it once and reuse it by reference! pdf2svg seizes on this optimization. The form of the digit 1 is only defined once; all the other 1s are defined, in source, by <use> elements referencing the original. The browser looks up the original element by its ID and re-renders it.

Remember how some of the pdf2svg-generated elements have IDs? Those are of the form glyph0-0, glyph0-1, and so on, which means that every pdf2svg-generated SVG has (and perhaps uses) a glyph0-0.

This wouldn’t be an issue but for the way we’re combining them: plopping the SVG source unceremoniously into an HTML document. When the browser looks up the DFA’s glyph0-0 by reference… it finds the NFA’s glyph0-0 instead. The first diagram is fine; the second one is a garbled mess of glyphs from the first.

The solution: prefix each SVG’s IDs. Find-and-replace is robust enough:

  1. Replace all instances of id=" with id="{prefix}.
  2. Replace all instances of "# with "#{prefix}.

I’ve since updated tikz2svg to do this ID replacement automatically; now it prefixes generated SVG element IDs with the source filename. If you want to reuse filenames, insert a random component instead.

The Javascript animating this compound diagram is somewhat more complicated: it instantiates two Automatons, and feeds inputs to them simultaneously. See the nfa-conversion.html source for details.

Future work

Ideally, I’d like to have a script to convert a TikZ state machine definition directly to an encapsulated demo.

  1. TikZ itself is an academic diagramming syntax; it offers a great deal of flexibility that must be excluded if the script is to extract a working state machine definition (that is, a topology).

  2. The multi-step process for producing an SVG—renderinga PDF, then running that through pdf2svg—means TikZ-specified data like state IDs are missing from the generated SVG. Remember: we re-added these IDs manually.

These issues are solved by avoiding TikZ, and instead building diagrams from some other domain-specific format. Javascript representations of the automaton topologies, like those configuring the interactive demos, would work. On the other hand, that means resigning TikZ’s advantages. Honestly? I don’t think I’ll bother!

I’d like to make some other incidental improvements (avoiding embedded iframes; referencing a single LaTeX header file between diagrams; things like that).

For now, I have what I need. Blogging with TikZ allowed me to avoid big dependencies, to keep my blog print-friendly, to keep its markup semantic, and to share interactive demos with the next generation terrorized by CS 164.

If my commitment to the ideological purity of my Pandoc-generated blog seems extreme, check out phiresky’s “Hosting SQLite databases on GitHub Pages”.

That blog’s written as Pandoc Markdown; the Pandoc artifact is postprocessed with React to generate charts and runnable code blocks (including queries of a statically-hosted SQL database). That’s commitment.

  1. Relevant: Maciej Cegłowski’s evergreen slides on Chickenshit Minimalism. I’m proud to keep this site hard-to-read without any help from npm.↩︎

  2. Graph-drawing libraries are smaller than I expected, but not so small. The entirety of State Machines (complete with its diagrams and demos) is about the same size as minified d3.js: 94.5 and 83.6 kB, respectively.↩︎

  3. There was another draw to TikZ which I didn’t ever ultimately achieve: I figured that, for the static diagrams, I could define TikZ diagrams inline (in a LaTeX environment) in my post’s Markdown source. I still think this is possible, but it’d take writing a custom Pandoc extension to extract the TikZ picture, convert it to an SVG, and replace it with an HTML img referencing that SVG.

    I decided it’d take too much fiddling to make that Pandoc extension reusable (especially outside of pandoc-blog), and that I might not actually have reason to reuse it.↩︎