This is entirely my advice, but I realized it while contracting for Alpaca. Special thanks to that team for their patience and good ideas.
Suppose you’re applying Safety Through Incompatibility in Widget Corp.’s widget fulfillment system. You want compile-time incompatible ID types for different collections, but you don’t have the luxury of starting from scratch: Widget Corp. made the sensible decision to use random UUIDs for all their canonical identifiers years ago, and has thousands of those identifiers in the wild. What’s the lightest-touch way to bring that living system up to speed?
Whereas my post on “Safety Through Incompatibility” was largely language-agnostic, this one focuses on a language-specific example: switching to UUID-based typed IDs in Go, using Google’s `uuid.UUID`. There are several ways to do the same thing (or so it seems), and I found scant guidance online on how to choose between them!
You already use UUIDs as IDs. You generate IDs for new records, but also exchange records with the outside world: customers order widgets through a JSON API, and you record widget IDs in your logs for debugging and accounting. Thus, in addition to the intrinsic safety of type incompatibility (that a `Widget` ID be unusable where an `Order` ID is needed), you want three things from your ID type:
1. In JSON, your IDs should `json.Marshal` to the standard UUID format: a hexadecimal JSON string like `"f3f2e850-b5d4-11ef-ac7e-96584d5248b2"`.
2. They should have the same format in your plaintext logs.
3. `json.Unmarshal` should extract IDs from the same format, e.g. when you parse an API request body.
I capture these requirements in this file of tests.
Defining a new `string` type

If you’re using raw `string`s to represent your UUID IDs, switching to a new type defined by `string` is dead simple: serialization works out of the box.
type ID string
Update the places you construct IDs (something like `widget.ID(uuid.New().String())`) and boom, the widget fulfillment system has a dedicated widget ID type.
The downside is a different aspect of ID safety: not incompatibility, but validity. You might be really confident all your IDs are UUIDs, but (as with the raw `string` type you started with) there’s no type guarantee they always will be. Maybe you have a test that pretends `"MY-PET-ID"` is a valid UUID. Maybe you’ll get nonsense data over the wire. You can call `uuid.Parse` to validate your strings, but that’s a secondary process (you have to remember to do it) and one you can’t quite push to the system boundary.
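Here is a minimal sketch of both halves of that trade-off, using only the standard library. The `Widget` struct and the `encode` and `parse` helpers are names I made up for illustration; they aren't part of any real codebase.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ID is a widget ID defined on top of string, as in the text.
type ID string

// Widget is a hypothetical record used only for this illustration.
type Widget struct {
	ID ID `json:"id"`
}

// encode marshals a widget to JSON, reporting any error inline.
func encode(w Widget) string {
	data, err := json.Marshal(w)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(data)
}

// parse decodes a request body. Nothing here checks the shape of the ID.
func parse(body string) (Widget, error) {
	var w Widget
	err := json.Unmarshal([]byte(body), &w)
	return w, err
}

func main() {
	// Serialization works out of the box.
	fmt.Println(encode(Widget{ID: "f3f2e850-b5d4-11ef-ac7e-96584d5248b2"}))
	// {"id":"f3f2e850-b5d4-11ef-ac7e-96584d5248b2"}

	// So does deserializing something that is not a UUID at all.
	w, err := parse(`{"id":"MY-PET-ID"}`)
	fmt.Println(w.ID, err)
	// MY-PET-ID <nil>
}
```

The second call is the problem in miniature: the type happily carries a value that was never a UUID, and nothing complains until some later code tries to use it as one.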
It’s easier to call a spade a spade: just use `uuid.UUID` for UUIDs.

Embedding `uuid.UUID`

type ID struct { uuid.UUID }
This exemplifies a Go technique called struct embedding. `ID` is a `struct` (Go’s basic composite data type) with a single, anonymous component: a UUID.
Like the new `string` type, serialization just works. Unlike the new `string` type, UUID validity is enforced at the system edge: `json.Unmarshal` will error if you provide a string that’s too long, or too short, or includes UUID-incompatible characters.
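To see that edge validation in action without pulling in a dependency, the sketch below swaps in a small stand-in `UUID` type for Google’s `uuid.UUID`. That stand-in, its strict parser, the `Widget` struct, and the `parse` helper are all assumptions of mine; the real library accepts a few more input formats. What matters is the mechanism: the embedded type’s text-encoding methods are promoted to `ID`, and `encoding/json` picks them up.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
	"strings"
)

// UUID is a stand-in for Google's uuid.UUID (an assumption made to keep this
// sketch dependency-free). It shares the [16]byte layout of the real type.
type UUID [16]byte

// String renders the canonical 8-4-4-4-12 hexadecimal form.
func (u UUID) String() string {
	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

// MarshalText lets encoding/json emit the UUID as a JSON string.
func (u UUID) MarshalText() ([]byte, error) { return []byte(u.String()), nil }

// UnmarshalText rejects anything that is not 36 characters of hyphenated hex.
func (u *UUID) UnmarshalText(data []byte) error {
	s := string(data)
	if len(s) != 36 || s[8] != '-' || s[13] != '-' || s[18] != '-' || s[23] != '-' {
		return fmt.Errorf("invalid UUID %q", s)
	}
	b, err := hex.DecodeString(strings.ReplaceAll(s, "-", ""))
	if err != nil || len(b) != 16 {
		return fmt.Errorf("invalid UUID %q", s)
	}
	copy(u[:], b)
	return nil
}

// ID embeds UUID, so UUID's methods are promoted to ID.
type ID struct{ UUID }

// Widget is a hypothetical record used only for this illustration.
type Widget struct {
	ID ID `json:"id"`
}

// parse decodes a request body into a Widget.
func parse(body string) (Widget, error) {
	var w Widget
	err := json.Unmarshal([]byte(body), &w)
	return w, err
}

func main() {
	w, err := parse(`{"id":"f3f2e850-b5d4-11ef-ac7e-96584d5248b2"}`)
	fmt.Println(w.ID, err) // the promoted String method prints the UUID form

	_, err = parse(`{"id":"MY-PET-ID"}`)
	fmt.Println(err != nil) // true: the bad ID is rejected while parsing
}
```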
To tell you the truth, my problems with a struct-embedding ID like this are all nitpicky. For one, the underlying `uuid.UUID` is a public field; you can access it (e.g. `widgetID.UUID`). Strictly speaking, that kind of conversion back into a `uuid.UUID` is possible for all the approaches I describe here, but it’s a conversion to generally discourage: it’s a leaky abstraction, and it allows comparing incompatible UUID-derived IDs as if they were plain `uuid.UUID`s. Why dangle a public field if you don’t want anyone to use it?
Moreover (and this is really a matter of taste), the construction `ID{uuid.New()}` just feels wrong. Are there more (unset) fields in the `ID` struct? Single-field struct embedding uses a nonatomic composite data type to represent a conceptually atomic identifier.
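A short sketch of the leak, again with a bare stand-in `UUID` array in place of Google’s type. The `WidgetID` and `OrderID` names and the `sameUnderlying` helper are hypothetical.

```go
package main

import "fmt"

// UUID is a stand-in with the same [16]byte layout as Google's uuid.UUID.
type UUID [16]byte

// Two struct-embedding ID types for two different collections.
type WidgetID struct{ UUID }
type OrderID struct{ UUID }

// sameUnderlying compares a widget ID with an order ID by reaching through
// the exported embedded field.
func sameUnderlying(w WidgetID, o OrderID) bool {
	// return w == o // would not compile: mismatched types WidgetID and OrderID
	return w.UUID == o.UUID
}

func main() {
	raw := UUID{0xf3, 0xf2, 0xe8, 0x50}
	fmt.Println(sameUnderlying(WidgetID{raw}, OrderID{raw})) // true
}
```

The direct comparison is rejected by the compiler, which is the safety you wanted. The comparison through `.UUID` is the escape hatch the public field leaves open.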
Defining a new `uuid.UUID` type

type ID uuid.UUID
Like the `type ID string` declaration considered earlier, type definition creates a wholly new type, `ID`, with the same memory layout as `uuid.UUID`. In fact, `uuid.UUID` itself is defined through a construction like this: `type UUID [16]byte`, in package `uuid`.

Unlike in a public struct-embedding, the underlying type isn’t exposed.
The main drawback here is the serialization. Out of the box, this thing serializes like a `[16]byte`: expect a JSON array of numbers like `[243,242,232,80,181,212,17,239,172,126,150,88,77,82,72,178]` instead of anything recognizable.
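You can reproduce that surprise with a few lines of standard-library Go. The `UUID` type below is my stand-in for Google’s `uuid.UUID`, and `encode` is a hypothetical helper. The point is only that a newly defined type keeps the layout of its source type but none of its methods.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// UUID is a stand-in for Google's uuid.UUID; only MarshalText matters here.
type UUID [16]byte

// MarshalText makes encoding/json emit a UUID as a hyphenated hex string.
func (u UUID) MarshalText() ([]byte, error) {
	s := fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
	return []byte(s), nil
}

// ID is a new defined type: same layout as UUID, but none of its methods.
type ID UUID

// encode marshals any value to JSON, reporting errors inline.
func encode(v any) string {
	data, err := json.Marshal(v)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(data)
}

func main() {
	raw := UUID{0xf3, 0xf2, 0xe8, 0x50, 0xb5, 0xd4, 0x11, 0xef,
		0xac, 0x7e, 0x96, 0x58, 0x4d, 0x52, 0x48, 0xb2}

	fmt.Println(encode(raw))
	// "f3f2e850-b5d4-11ef-ac7e-96584d5248b2"

	fmt.Println(encode(ID(raw)))
	// [243,242,232,80,181,212,17,239,172,126,150,88,77,82,72,178]
}
```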
You solve this by explicitly reimplementing some common single-method interfaces already implemented by `uuid.UUID`, essentially by invoking those UUID implementations.

- `func (id ID) String() string { ... }` implements `fmt.Stringer`.
- `func (id ID) MarshalJSON() ([]byte, error) { ... }` implements `json.Marshaler`.
- `func (id *ID) UnmarshalJSON(data []byte) error { ... }` implements `json.Unmarshaler`. It takes a pointer receiver because you can’t update `id` without dereferencing it.

Here’s my implementation.
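For a sense of what those three methods look like, here is one possible shape. This is not the linked implementation. It is a sketch that delegates to a stand-in `UUID` type of my own, so it compiles without dependencies. With Google’s package you would convert to `uuid.UUID` in the same places. The `decode` and `encode` helpers are hypothetical.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
	"strings"
)

// UUID is a stand-in for Google's uuid.UUID (an assumption for this sketch).
type UUID [16]byte

func (u UUID) String() string {
	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

func (u UUID) MarshalText() ([]byte, error) { return []byte(u.String()), nil }

func (u *UUID) UnmarshalText(data []byte) error {
	s := string(data)
	if len(s) != 36 || s[8] != '-' || s[13] != '-' || s[18] != '-' || s[23] != '-' {
		return fmt.Errorf("invalid UUID %q", s)
	}
	b, err := hex.DecodeString(strings.ReplaceAll(s, "-", ""))
	if err != nil || len(b) != 16 {
		return fmt.Errorf("invalid UUID %q", s)
	}
	copy(u[:], b)
	return nil
}

// ID is a new defined type, so it starts with no methods of its own.
type ID UUID

// String implements fmt.Stringer by converting back to the source type.
func (id ID) String() string { return UUID(id).String() }

// MarshalJSON implements json.Marshaler via the source type's text encoding.
func (id ID) MarshalJSON() ([]byte, error) { return json.Marshal(UUID(id)) }

// UnmarshalJSON implements json.Unmarshaler. The pointer conversion lets the
// source type's decoder write directly into the memory behind id.
func (id *ID) UnmarshalJSON(data []byte) error {
	return json.Unmarshal(data, (*UUID)(id))
}

// decode parses a JSON string into an ID.
func decode(s string) (ID, error) {
	var id ID
	err := json.Unmarshal([]byte(s), &id)
	return id, err
}

// encode marshals an ID to JSON, reporting errors inline.
func encode(id ID) string {
	data, err := json.Marshal(id)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(data)
}

func main() {
	id, err := decode(`"f3f2e850-b5d4-11ef-ac7e-96584d5248b2"`)
	fmt.Println(id, err)    // f3f2e850-b5d4-11ef-ac7e-96584d5248b2 <nil>
	fmt.Println(encode(id)) // "f3f2e850-b5d4-11ef-ac7e-96584d5248b2"
}
```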
These implementations take a little more effort on your part to make this approach pass the tests, and the implementations themselves are dense little typesystem tongue-twisters. Defining a new `uuid.UUID` type is less sloppy than struct embedding, but maybe we can find something less verbose?
Aliasing `uuid.UUID`

type ID = uuid.UUID
There’s a subtle difference between declaring a new type using another (as in the `type ID uuid.UUID` example above) and aliasing a type in Go. Syntactically, the difference is small: the `=` in the alias declaration. In practice, the difference is much bigger.
An aliased type (`ID`) preserves the receiver methods defined on the original type (`uuid.UUID`)! In this case, that includes the standard interface implementations Google ships: `MarshalText` and `UnmarshalText` (which `encoding/json` uses to read and write JSON strings) and `String`. Not only does aliasing `uuid.UUID` give you built-in validation (unlike defining a new `string` type); it doesn’t expose a misleading inner field (unlike struct-embedding `uuid.UUID`); and it serializes as expected out of the box (unlike defining a new `uuid.UUID` type) because it automatically implements the necessary interfaces!
Where’s the catch? Spot it in this passage from the Go Blog’s intro to alias declarations:

> In contrast to a regular type definition
>
>     type T T0
>
> which declares a new type that is never identical to the type on the right-hand side of the declaration, an alias declaration
>
>     type A = T // the "=" indicates an alias declaration
>
> declares only a new name `A` for the type on the right-hand side: here, `A` and `T` denote the same and thus identical type `T`.
>
> Alias declarations make it possible to provide a new name (in a new package!) for a given type while retaining type identity:
>
>     package pkg2
>
>     import "path/to/pkg1"
>
>     type T = pkg1.T
>
> The type name has changed from `pkg1.T` to `pkg2.T` but values of type `pkg2.T` have the same type as variables of type `pkg1.T`.
In simpler terms, a type alias is not a new type: it’s entirely compatible with the old one, and with other aliases of the same original type. Remember the original mission: safety through incompatibility! If you can interchange your `widget.ID`s and `order.ID`s, you haven’t made anything safer.
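The interchange is easy to demonstrate. In this sketch, `UUID` is a bare stand-in for Google’s type, and `WidgetID`, `OrderID`, and `shipWidget` are hypothetical names.

```go
package main

import "fmt"

// UUID is a stand-in with the same [16]byte layout as Google's uuid.UUID.
type UUID [16]byte

// Both aliases are just additional names for UUID itself.
type WidgetID = UUID
type OrderID = UUID

// shipWidget is a hypothetical function that should only accept widget IDs.
func shipWidget(id WidgetID) string {
	return fmt.Sprintf("shipping widget %x", id[:4])
}

func main() {
	var order OrderID = UUID{0xf3, 0xf2, 0xe8, 0x50}

	// This compiles: to the compiler, OrderID and WidgetID are the same type.
	fmt.Println(shipWidget(order)) // shipping widget f3f2e850
}
```

Drop the `=` from both declarations and the call to `shipWidget(order)` stops compiling, which is the behavior you were after in the first place.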
I introduced these approaches naïvely as four equivalent options. Divide them instead into two groups: the first, a group of one, is built on fundamentally the wrong type (`string`); the other options, based on `uuid.UUID`, have more promise.

The type alias is fool’s gold: it’s syntactically minimal and it passes our tests, but it doesn’t prevent interoperability.

Prefer one of the remaining two. If you’re pressed for time or worried about introducing something that looks complex, the struct embedding is a minimal drop-in replacement for raw `string`s.
In a mature codebase, I prefer defining a new `uuid.UUID` type (`type ID uuid.UUID`) and implementing the serialization interfaces yourself. These are verbose, but you can copy-paste my implementations for starters. You might also take advantage of the custom implementations: to prefix your logged UUIDs with a type identifier, say, so widget `f3f2e850-b5d4-11ef-ac7e-96584d5248b2` appears more unambiguously as `Widget_f3f2e850-b5d4-11ef-ac7e-96584d5248b2`.
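The prefixing idea might look something like this. It is a sketch under my own assumptions: `UUID` stands in for Google’s `uuid.UUID`, and whether the prefix should also apply to JSON output is a design choice this example leaves alone by changing only `String`.

```go
package main

import "fmt"

// UUID is a stand-in for Google's uuid.UUID (an assumption for this sketch).
type UUID [16]byte

// String renders the canonical 8-4-4-4-12 hexadecimal form.
func (u UUID) String() string {
	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

// ID is the widget ID type, defined on top of UUID.
type ID UUID

// String prefixes the UUID with the collection name for log output.
func (id ID) String() string { return "Widget_" + UUID(id).String() }

func main() {
	id := ID{0xf3, 0xf2, 0xe8, 0x50, 0xb5, 0xd4, 0x11, 0xef,
		0xac, 0x7e, 0x96, 0x58, 0x4d, 0x52, 0x48, 0xb2}

	// fmt picks up the custom String method for %v.
	fmt.Printf("fulfilled %v\n", id)
	// fulfilled Widget_f3f2e850-b5d4-11ef-ac7e-96584d5248b2
}
```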
| Approach | Pros | Cons |
|---|---|---|
| Defining a new `string` type: `type ID string` | Passes tests out of the box | No protections against non-UUID values |
| Embedding `uuid.UUID`: `type ID struct { uuid.UUID }` | Non-UUID values are impossible; passes tests out of the box | Exposes the embedded `uuid.UUID` implementation |
| Defining a new `uuid.UUID` type: `type ID uuid.UUID` | Non-UUID values are impossible | Must reimplement `json.Marshaler`, `json.Unmarshaler`, and `fmt.Stringer` |
| Aliasing `uuid.UUID`: `type ID = uuid.UUID` | Non-UUID values are impossible; passes tests out of the box | Dangerous compatibility! |
You can find implementations and tests for all the approaches described here on GitHub. The root testfile demonstrates each solution satisfies the serialization properties; swap `widget` imports to run the same tests against each. Two more parts to note:

- `uuid_alias_test.go` demonstrates the type-compatibility (and, therefore, the danger) of true type-aliased IDs.
- `pkg/generic_uuid_type` is a generic adaptation of the verbose `type ID uuid.UUID` approach I prefer. It implements the interfaces once (the generic `T` provides incompatibility) for any number of collections.
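If you’re curious what a generic adaptation could look like, here is one way to sketch the idea. It is not a copy of the package in the repository. `UUID` is my stand-in for Google’s type, and the `Widget` and `Order` marker types, the two aliases, and `parseWidgetID` are hypothetical.

```go
package main

import (
	"encoding/hex"
	"encoding/json"
	"fmt"
	"strings"
)

// UUID is a stand-in for Google's uuid.UUID (an assumption for this sketch).
type UUID [16]byte

func (u UUID) String() string {
	return fmt.Sprintf("%x-%x-%x-%x-%x", u[0:4], u[4:6], u[6:8], u[8:10], u[10:16])
}

func (u UUID) MarshalText() ([]byte, error) { return []byte(u.String()), nil }

func (u *UUID) UnmarshalText(data []byte) error {
	s := string(data)
	if len(s) != 36 || s[8] != '-' || s[13] != '-' || s[18] != '-' || s[23] != '-' {
		return fmt.Errorf("invalid UUID %q", s)
	}
	b, err := hex.DecodeString(strings.ReplaceAll(s, "-", ""))
	if err != nil || len(b) != 16 {
		return fmt.Errorf("invalid UUID %q", s)
	}
	copy(u[:], b)
	return nil
}

// ID is parameterized by a marker type T that is never stored. Different
// instantiations are different, mutually incompatible types.
type ID[T any] UUID

// The three interfaces are implemented once, for every instantiation.
func (id ID[T]) String() string               { return UUID(id).String() }
func (id ID[T]) MarshalJSON() ([]byte, error) { return json.Marshal(UUID(id)) }
func (id *ID[T]) UnmarshalJSON(data []byte) error {
	return json.Unmarshal(data, (*UUID)(id))
}

// Widget and Order are empty marker types, one per collection.
type Widget struct{}
type Order struct{}

// These aliases only shorten the names; the incompatibility comes from T.
type WidgetID = ID[Widget]
type OrderID = ID[Order]

// parseWidgetID decodes a JSON string into a widget ID.
func parseWidgetID(s string) (WidgetID, error) {
	var id WidgetID
	err := json.Unmarshal([]byte(s), &id)
	return id, err
}

func main() {
	w, err := parseWidgetID(`"f3f2e850-b5d4-11ef-ac7e-96584d5248b2"`)
	fmt.Println(w, err) // f3f2e850-b5d4-11ef-ac7e-96584d5248b2 <nil>

	// var o OrderID = w // would not compile: ID[Widget] is not ID[Order]
}
```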