Logger
I first drafted this post in December 2022. I stopped following progress in Elixir shortly after; maybe its logging has completely changed! That’s beside the point: this is a post about control and cognitive load, specifically with process-scoped global variables. Even if the examples go stale, the principle holds.
Consider this Elixir function:
def log_and_operate(trace_id) do
  Logger.metadata(trace_id: trace_id)
  Logger.info("Reached pre-operation")
  operate()
  Logger.info("Reached post-operation")
end
Setting a trace_id
in your structured logger should
contextualize one log entry by locating it in a stream of other log
entries with the same key. You include it because you want to fetch your
logs later, search for something like
trace_id:"correct-horse"
, and see logs like this:
{trace_id: "correct-horse", message: "Reached pre-operation"}
# ...logs from operate()
{trace_id: "correct-horse", message: "Reached post-operation"}
But, in Elixir1, operate()
has free
access to all the logger metadata, including the
trace_id
field doing the correlating! If it so chooses,
it’ll break your trace_id
connection between the pre- and
post-operation log lines.
def operate() do
  Logger.metadata(trace_id: "battery-staple")
  Logger.info("Changed the trace ID")
end
{trace_id: "correct-horse", message: "Reached pre-operation"}
{trace_id: "battery-staple", message: "Changed the trace ID"}
{trace_id: "battery-staple", message: "Reached post-operation"}
Of course, that’s a reductive example. In practice, your
trace_id
has to survive a huge tree of
conditionally-connected codepaths, any of which could trample
your precious log metadata. You have to read a lot of code to
feel confident that won’t happen.
Logger
metadata is a process-scoped variable, so it’s
intrinsically tied to what happens in that process, and in what
order (“control”2). Even if a function with
Logger
calls doesn’t receive any logger-specific arguments
— that is, it doesn’t explicitly depend on logging elsewhere — its
behavior can implicitly depend on all sequentially prior
operations in the same process. If you imagine the process’s execution
as the depth-first traversal of a tree of function calls, this means
knowing all the previously-visited vertices, not just the lexical
parents of the current vertex.
Elixir’s standard Logger
does what Peter Bourgon calls
magic:
it obscures a complex process-global data-interdependency.
Unfortunately, ruling out influence is an indispensable part of reading
code. When influence extends to a whole process, and when your program
is anything but strictly non-recursive, you can’t just treat code
lexically: you have to mentally model the process execution as you read
it.
One workaround, for particularly sensitive metadata keys, is to defer to the values already set on the logger:
# safe_set_trace_id is a dedicated setter for a single Logger metadata field.
def safe_set_trace_id(trace_id) do
  if !Logger.metadata()[:trace_id] do
    Logger.metadata(trace_id: trace_id)
  end
end

# protected_metadata and safe_set_metadata protect a static set of metadata
# keys.
@protected_metadata [:trace_id, :others]
def safe_set_metadata(opts) do
  opts
  |> Enum.each(fn {key, value} ->
    if !Enum.member?(@protected_metadata, key) || !Logger.metadata()[key] do
      Logger.metadata([{key, value}])
    end
  end)
end
This requires either sharing a canonical set of precious metadata
keys, then enforcing that every Logger
call checks the
list, or writing a module wrapping Logger
and forbidding
direct Logger
calls.
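To illustrate the same first-writer-wins policy outside of Elixir, here is a sketch in Go, the language this post turns to next. All names here are hypothetical, not any library's real API: a merge helper that refuses to overwrite protected keys that already hold a value.

```go
package main

import "fmt"

// protectedKeys mirrors @protected_metadata: once set, these keys win.
var protectedKeys = map[string]bool{"trace_id": true, "others": true}

// safeSetMetadata merges updates into metadata, skipping any protected key
// that already has a value: the first writer of a protected key wins.
func safeSetMetadata(metadata, updates map[string]string) {
	for key, value := range updates {
		if _, set := metadata[key]; set && protectedKeys[key] {
			continue
		}
		metadata[key] = value
	}
}

func main() {
	metadata := map[string]string{"trace_id": "correct-horse"}
	safeSetMetadata(metadata, map[string]string{
		"trace_id": "battery-staple", // protected and already set: ignored
		"step":     "post-operation", // unprotected: merged
	})
	fmt.Println(metadata["trace_id"], metadata["step"])
	// → correct-horse post-operation
}
```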
Alternatively, you can enforce that keys for correlation be unchanged by redundantly(?) re-setting them:
def log_and_operate(trace_id) do
  Logger.metadata(trace_id: trace_id)
  Logger.info("Reached pre-operation")
  operate()
  Logger.info("Reached post-operation", trace_id: trace_id)
end
The Go approach is to relay the updates through return values: an
updated logger or context.Context
, which callers can take
or leave selectively.
In Go convention, instead of having process-scoped logger metadata, loggers (with metadata) are objects. Of course, you commonly want to inherit some logger metadata; that’s why you see code building on a service-wide base logger:
func (s *Server) Operate() {
	s.Logger.With("key", "value")
}
Extracting a logger from a received context:
func Operate(ctx context.Context) {
	log.From(ctx).With("key", "value")
}
Or receiving a logger from the caller, then forking it — extending the received logger’s metadata in a local copy:
func Operate(logger *zap.SugaredLogger) {
	var value string
	logger = logger.With("key", value)
}
Even in Go, sometimes you do want to modify a logger for the caller scope. How do you do this without process-scoped magic? Return values that let the caller decide what to do with their own logger. Admittedly, this can feel a little clumsy — threading contextual data up and down the callstack resists later refactoring, and including an updated logger among return values feels even weirder. Regardless, at least the Go approach is explicit.
That’s one of my main complaints about Elixir. Every function call implicitly delegates — even just briefly — control of the caller’s logger. If you want to make any strong guarantees about the logging behavior of any given function, you have to know what every function it calls, transitively, does to that shared metadata.
And I don’t want to know that much! Excluding swaths of code from consideration is essential to being productive in a big, shared codebase (and especially for doing the kind of parachute-in spot debugging routinely required in a live product).
My point isn’t that you should pick Go3, but that you should spurn ‘magic,’ particularly the “comprehension-subverting magic of globals” (Bourgon), a point on which the Go community is pretty disciplined. Think less, get more done; better to be ugly and explicit than opaque.