
Deriving Safe ID Types in Go

This is entirely my advice, but I realized it while contracting for Alpaca; special thanks to that team for their patience and good ideas.

Suppose you’re applying Safety Through Incompatibility in Widget Corp.’s widget fulfillment system. You want compile-time incompatible ID types for different collections, but you don’t have the luxury of starting from scratch: Widget Corp made the sensible decision to use random UUIDs for all their canonical identifiers years ago, and has thousands of those identifiers in the wild. What’s the lightest-touch way to bring that living system up to speed?

Whereas my post on “Safety Through Incompatibility” was largely language-agnostic, this one will focus on a language-specific example: switching to UUID-based typed IDs in Go, using Google’s uuid.UUID. There are several ways to do the same thing — or so it seems — and I found scant guidance online on how to choose between them!

Assumptions

You already use UUIDs as IDs. You generate IDs for new records, but also exchange records with the outside world: customers order widgets through a JSON API, and you record widget IDs in your logs for debugging and accounting. Thus, in addition to the intrinsic safety of type incompatibility (that a Widget ID be unusable where an Order ID is needed), you want three things from your ID type: it marshals to JSON as a plain UUID string, it unmarshals from that same JSON string, and it prints as that string in logs.

I capture these requirements in this file of tests.

Defining a new string type

If you’re using raw strings to represent your UUID IDs, switching to a new type defined by string is dead-simple: serialization works out of the box.

type ID string

Update the places you construct IDs, with something like widget.ID(uuid.New().String()), and boom, the widget fulfillment system has a dedicated widget ID type.

The downside is a different aspect of ID safety: not incompatibility, but validity. You might be really confident all your IDs are UUIDs, but (as for the raw string type you started with) there’s no type guarantee they always will be. Maybe you have a test that pretends "MY-PET-ID" is a valid UUID. Maybe you’ll get nonsense data over the wire. You can call uuid.Parse to validate your strings, but that’s a secondary process — you have to remember to do it — and one you can’t quite push to the system boundary.
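To make the validity gap concrete, here is a minimal sketch using only the standard library. The names WidgetID, OrderID, and decodeWidgetID are hypothetical, chosen for illustration; the point is only that a string-based ID accepts any string that arrives over the wire.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WidgetID is a hypothetical string-based ID, as in `type ID string`.
type WidgetID string

// OrderID is a second hypothetical ID type. Passing a WidgetID where an
// OrderID is expected fails to compile, so incompatibility is covered.
type OrderID string

// decodeWidgetID unmarshals a JSON string into a WidgetID. Nothing here
// checks that the string is a UUID: any JSON string is accepted.
func decodeWidgetID(data []byte) (WidgetID, error) {
	var id WidgetID
	err := json.Unmarshal(data, &id)
	return id, err
}

func main() {
	id, err := decodeWidgetID([]byte(`"MY-PET-ID"`))
	fmt.Println(id, err) // MY-PET-ID <nil>
}
```

Validation with uuid.Parse would have to be added by hand after every such decode, which is exactly the step that is easy to forget.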

It’s easier to call a spade a spade: just use uuid.UUID for UUIDs.

Embedding uuid.UUID

type ID struct { uuid.UUID }

This exemplifies a Go technique called Struct Embedding. ID is a struct — Go’s basic composite data type — with a single, anonymous component: a UUID.

As with the new string type, serialization just works. Unlike the string type, UUID validity is enforced at the system edge: json.Unmarshal will error if you provide a string that’s too long, or too short, or includes UUID-incompatible characters.

To tell you the truth, my problems with a struct-embedding ID like this are all nitpicky. For one, the underlying uuid.UUID is a public field; you can access it directly (e.g. widgetID.UUID). Strictly speaking, that kind of conversion back into a uuid.UUID is possible for all the approaches I describe here, but it’s a conversion to generally discourage: it’s a leaky abstraction, and it allows comparing incompatible UUID-derived IDs as if they were plain uuid.UUIDs. Why dangle a public field if you don’t want anyone to use it?

Moreover — and this is really a matter of taste — the construction ID{uuid.New()} just feels wrong. Are there more (unset) fields in the ID struct? Single-field struct embedding uses a nonatomic composite data type to represent a conceptually atomic identifier.
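The sketch below illustrates both halves of the embedding trade-off: promoted methods make serialization work, and the exported field lets callers compare across ID types. To stay runnable without external modules it uses a local UUID type as a stand-in for uuid.UUID (an assumption for illustration, with only a MarshalText method); WidgetID, OrderID, toJSON, sameUUID, and sample are hypothetical names.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// UUID is a stdlib-only stand-in for uuid.UUID (github.com/google/uuid),
// which is also a [16]byte with a MarshalText method.
type UUID [16]byte

// MarshalText renders the canonical 8-4-4-4-12 hex form.
func (u UUID) MarshalText() ([]byte, error) {
	h := hex.EncodeToString(u[:])
	return []byte(h[:8] + "-" + h[8:12] + "-" + h[12:16] + "-" + h[16:20] + "-" + h[20:]), nil
}

// WidgetID and OrderID each embed a UUID, so they pick up its methods.
type WidgetID struct{ UUID }
type OrderID struct{ UUID }

// toJSON returns v's JSON encoding, or the error text if marshaling fails.
func toJSON(v any) string {
	data, err := json.Marshal(v)
	if err != nil {
		return err.Error()
	}
	return string(data)
}

// sameUUID compares IDs of different types through the exported embedded field.
func sameUUID(w WidgetID, o OrderID) bool { return w.UUID == o.UUID }

// sample returns a fixed UUID value for the demonstration.
func sample() UUID {
	return UUID{0xf3, 0xf2, 0xe8, 0x50, 0xb5, 0xd4, 0x11, 0xef, 0xac, 0x7e, 0x96, 0x58, 0x4d, 0x52, 0x48, 0xb2}
}

func main() {
	w, o := WidgetID{sample()}, OrderID{sample()}
	fmt.Println(toJSON(w)) // "f3f2e850-b5d4-11ef-ac7e-96584d5248b2"
	// w == o does not compile (mismatched types), but the field sidesteps that:
	fmt.Println(sameUUID(w, o)) // true
}
```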

Defining a new uuid.UUID type

type ID uuid.UUID

Like the type ID string declaration considered earlier, a type definition creates a wholly new type, ID, with the same memory layout as uuid.UUID. In fact, uuid.UUID itself is defined through a construction like this: type UUID [16]byte, inside the uuid package. Unlike in a public struct-embedding, the underlying type isn’t exposed.

The main drawback here is the serialization. Out of the box, this thing serializes like a [16]byte: expect JSON like [243,242,232,80,181,212,17,239,172,126,150,88,77,82,72,178] instead of anything recognizable.
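The following sketch reproduces that behavior with the standard library alone. The local UUID type is a stand-in for uuid.UUID (an assumption so the example runs without external modules), and marshal is a hypothetical helper; the new defined type ID drops the stand-in's MarshalText method and falls back to plain array encoding.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// UUID is a stdlib-only stand-in for uuid.UUID (github.com/google/uuid),
// which is likewise defined as a [16]byte.
type UUID [16]byte

// MarshalText renders the canonical 8-4-4-4-12 hex form. encoding/json
// calls it (via encoding.TextMarshaler) when marshaling a UUID.
func (u UUID) MarshalText() ([]byte, error) {
	h := hex.EncodeToString(u[:])
	return []byte(h[:8] + "-" + h[8:12] + "-" + h[12:16] + "-" + h[16:20] + "-" + h[20:]), nil
}

// ID is a new defined type: same memory layout as UUID, none of its methods.
type ID UUID

// marshal returns v's JSON encoding, or the error text if marshaling fails.
func marshal(v any) string {
	data, err := json.Marshal(v)
	if err != nil {
		return err.Error()
	}
	return string(data)
}

func main() {
	u := UUID{0xf3, 0xf2, 0xe8, 0x50, 0xb5, 0xd4, 0x11, 0xef, 0xac, 0x7e, 0x96, 0x58, 0x4d, 0x52, 0x48, 0xb2}
	fmt.Println(marshal(u))     // "f3f2e850-b5d4-11ef-ac7e-96584d5248b2"
	fmt.Println(marshal(ID(u))) // [243,242,232,80,181,212,17,239,172,126,150,88,77,82,72,178]
}
```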

You solve this by explicitly implementing a few common single-method interfaces on the new type, essentially by delegating to the String and text-encoding methods uuid.UUID already provides.

func (id ID) String() string { ... }
Implements fmt.Stringer.
func (id ID) MarshalJSON() ([]byte, error) { ... }
Implements json.Marshaler.
func (id *ID) UnmarshalJSON(data []byte) error { ... }
Implements json.Unmarshaler.
This implementation may be a little tricky for new Go programmers: you have to convert the pointer id to a *uuid.UUID without dereferencing it. Here’s my implementation.
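For readers without the linked code at hand, here is one way the three methods might look. This is a sketch, not necessarily the author's implementation. So that it runs with only the standard library, it defines a local UUID stand-in that mimics the String, MarshalText, and UnmarshalText methods of uuid.UUID (with a simplified, more lenient parser); with the real package you would write uuid.UUID wherever UUID appears. The decode helper is hypothetical.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// UUID is a stdlib-only stand-in for uuid.UUID (github.com/google/uuid).
type UUID [16]byte

func (u UUID) String() string {
	h := hex.EncodeToString(u[:])
	return h[:8] + "-" + h[8:12] + "-" + h[12:16] + "-" + h[16:20] + "-" + h[20:]
}

func (u UUID) MarshalText() ([]byte, error) { return []byte(u.String()), nil }

// UnmarshalText is a simplified parser: it strips dashes and expects 32 hex digits.
func (u *UUID) UnmarshalText(data []byte) error {
	raw, err := hex.DecodeString(strings.ReplaceAll(string(data), "-", ""))
	if err != nil || len(raw) != len(u) {
		return errors.New("invalid UUID: " + string(data))
	}
	copy(u[:], raw)
	return nil
}

// ID is the defined type from the text; it inherits none of UUID's methods.
type ID UUID

// String implements fmt.Stringer by converting back to UUID.
func (id ID) String() string { return UUID(id).String() }

// MarshalJSON implements json.Marshaler by delegating to UUID's encoding.
func (id ID) MarshalJSON() ([]byte, error) { return json.Marshal(UUID(id)) }

// UnmarshalJSON implements json.Unmarshaler. The conversion (*UUID)(id)
// reinterprets the pointer without dereferencing it, so the decoded value
// is written directly into the ID that id points to.
func (id *ID) UnmarshalJSON(data []byte) error {
	return json.Unmarshal(data, (*UUID)(id))
}

// decode unmarshals a JSON document into an ID.
func decode(doc string) (ID, error) {
	var id ID
	err := json.Unmarshal([]byte(doc), &id)
	return id, err
}

func main() {
	id := ID{0xf3, 0xf2, 0xe8, 0x50, 0xb5, 0xd4, 0x11, 0xef, 0xac, 0x7e, 0x96, 0x58, 0x4d, 0x52, 0x48, 0xb2}
	data, _ := json.Marshal(id)
	fmt.Println(string(data)) // "f3f2e850-b5d4-11ef-ac7e-96584d5248b2"

	back, err := decode(string(data))
	fmt.Println(back == id, err) // true <nil>
	fmt.Println(id)              // f3f2e850-b5d4-11ef-ac7e-96584d5248b2

	_, err = decode(`"MY-PET-ID"`)
	fmt.Println(err != nil) // true
}
```

Delegating through json.Marshal and json.Unmarshal keeps quoting and error handling in the standard library rather than in hand-written string manipulation.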

These implementations take a little more effort on your part to make this approach pass the tests, and the implementations themselves are dense little typesystem tongue-twisters. Defining a new uuid.UUID type is less sloppy than struct embedding, but maybe we can find something less verbose?

Aliasing uuid.UUID

type ID = uuid.UUID

There’s a subtle difference between declaring a new type using another — as in the type ID uuid.UUID example above — and aliasing a type in Go. Syntactically, the difference is small: the = in the alias declaration. In practice, the difference is much bigger.

An aliased type (ID) preserves the receiver methods defined on the original type (uuid.UUID)! In this case, that includes the String, MarshalText, and UnmarshalText methods that Google ships; encoding/json falls back on the text methods, so JSON serialization comes along too. Not only does aliasing uuid.UUID give you built-in validation (unlike defining a new string type); it doesn’t expose a misleading inner field (unlike struct-embedding uuid.UUID); and it serializes as expected out of the box (unlike defining a new uuid.UUID type) because it automatically implements the necessary interfaces!

Where’s the catch? Spot it in this passage from the Go Blog’s intro to alias declarations:

In contrast to a regular type definition

type T T0

which declares a new type that is never identical to the type on the right-hand side of the declaration, an alias declaration

type A = T  // the "=" indicates an alias declaration

declares only a new name A for the type on the right-hand side: here, A and T denote the same and thus identical type T.

Alias declarations make it possible to provide a new name (in a new package!) for a given type while retaining type identity:

package pkg2

import "path/to/pkg1"

type T = pkg1.T

The type name has changed from pkg1.T to pkg2.T but values of type pkg2.T have the same type as variables of type pkg1.T.

In simpler terms, a type alias is not a new type — it’s entirely compatible with the old one, and with other aliases of the same original type. Remember the original mission: safety through incompatibility! If you can interchange your widget.IDs and order.IDs, you haven’t made anything safer.
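A short sketch makes the difference visible. The local UUID type stands in for uuid.UUID (an assumption so the example needs no external module), and the alias, type, and function names are hypothetical.

```go
package main

import "fmt"

// UUID stands in for uuid.UUID, which is also a [16]byte.
type UUID [16]byte

// Aliases: WidgetAlias and OrderAlias are just other names for UUID.
type (
	WidgetAlias = UUID
	OrderAlias  = UUID
)

// Defined types: WidgetID and OrderID are distinct from UUID and from each other.
type (
	WidgetID UUID
	OrderID  UUID
)

// shipOrderAlias asks for an order ID, but with aliases any UUID will do.
func shipOrderAlias(id OrderAlias) string { return fmt.Sprintf("%T", id) }

// shipOrder accepts only an OrderID.
func shipOrder(id OrderID) string { return fmt.Sprintf("%T", id) }

func main() {
	var widget WidgetAlias
	fmt.Println(shipOrderAlias(widget)) // compiles; prints main.UUID

	var order OrderID
	fmt.Println(shipOrder(order)) // prints main.OrderID

	// var w WidgetID
	// shipOrder(w) // does not compile: a WidgetID is not an OrderID
}
```

The first call is the bug the whole exercise is meant to prevent, and the compiler accepts it without complaint.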

Summary

I introduced these approaches naïvely as four equivalent options. Divide them instead into two groups: the first, a group of one, is built on fundamentally the wrong type (string); the other options, based on uuid.UUID, have more promise.

The type alias is fool’s gold: it’s syntactically minimal and it passes our tests, but it doesn’t prevent interoperability.

Prefer one of the remaining two. If you’re pressed for time or worried about introducing something that looks complex, the struct embedding is a minimal drop-in replacement for raw strings.

In a mature codebase, I prefer defining a new uuid.UUID type (type ID uuid.UUID) and implementing the serialization interfaces yourself. These are verbose, but you can copy-paste my implementations for starters. You might also take advantage of the custom implementations — to prefix your logged UUIDs with a type identifier, say, so widget f3f2e850-b5d4-11ef-ac7e-96584d5248b2 appears more unambiguously as Widget_f3f2e850-b5d4-11ef-ac7e-96584d5248b2.
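As a rough illustration of that prefixing idea, the sketch below overrides only String on a defined type. The local UUID type is a stdlib-only stand-in for uuid.UUID, and WidgetID and logLine are hypothetical names; JSON methods are omitted here and could keep emitting the bare UUID.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// UUID is a stdlib-only stand-in for uuid.UUID (github.com/google/uuid).
type UUID [16]byte

// String renders the canonical 8-4-4-4-12 hex form.
func (u UUID) String() string {
	h := hex.EncodeToString(u[:])
	return h[:8] + "-" + h[8:12] + "-" + h[12:16] + "-" + h[16:20] + "-" + h[20:]
}

// WidgetID is a defined type whose String method adds a type prefix.
type WidgetID UUID

// String implements fmt.Stringer with a "Widget_" prefix for logs.
func (id WidgetID) String() string { return "Widget_" + UUID(id).String() }

// logLine formats a log message; %v picks up the custom String method.
func logLine(id WidgetID) string { return fmt.Sprintf("shipping %v", id) }

func main() {
	id := WidgetID{0xf3, 0xf2, 0xe8, 0x50, 0xb5, 0xd4, 0x11, 0xef, 0xac, 0x7e, 0x96, 0x58, 0x4d, 0x52, 0x48, 0xb2}
	fmt.Println(logLine(id)) // shipping Widget_f3f2e850-b5d4-11ef-ac7e-96584d5248b2
}
```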

| Approach | Pros | Cons |
| --- | --- | --- |
| Defining a new string type: type ID string | Passes tests out of the box | No protections against non-UUID values |
| Embedding uuid.UUID: type ID struct { uuid.UUID } | Non-UUID values are impossible; passes tests out of the box | Exposes the embedded uuid.UUID implementation |
| Defining a new uuid.UUID type: type ID uuid.UUID | Non-UUID values are impossible | Must reimplement json.Marshaler, json.Unmarshaler, and fmt.Stringer |
| Aliasing uuid.UUID: type ID = uuid.UUID | Non-UUID values are impossible; passes tests out of the box | Dangerous compatibility! |

You can find implementations and tests for all the approaches described here on GitHub. The root testfile demonstrates each solution satisfies the serialization properties; swap widget imports to run the same tests against each. Two more parts to note: