« blog

‘Magic’ in Elixir’s Logger

I first drafted this post in December 2022. I stopped following progress in Elixir shortly after; maybe its logging has completely changed! That’s besides the point: this is a post about control and cognitive load, specifically with process-scoped global variables. Even if the examples go stale, the principle holds.

Consider this Elixir function:

def log_and_operate(trace_id) do
  Logger.metadata(trace_id: trace_id)
  Logger.info("Reached pre-operation")
  operate()
  Logger.info("Reached post-operation")
end

Setting a trace_id in your structured logger should contextualize one log entry by locating it in a stream of other log entries with the same key. You include it because you want to fetch your logs later, search for something like trace_id:"correct-horse", and see logs like this:

{trace_id: "correct-horse", message: "Reached pre-operation"}
# ...logs from operate()
{trace_id: "correct-horse", message: "Reached post-operation"}

But, in Elixir1, operate() has free access to all the logger metadata, including the trace_id field doing the correlating! If it so chooses, it’ll break your trace_id connection between the pre- and post-operation log lines.

def operate() do
  Logger.metadata(trace_id: "battery-staple")
  Logger.info("Changed the trace ID")
end
{trace_id: "correct-horse", message: "Reached pre-operation"}
{trace_id: "battery-staple", message: "Changed the trace ID"}
{trace_id: "battery-staple", message: "Reached post-operation"}

Of course, that’s a reductive example. In practice, your trace_id has to survive a huge tree of conditionally-connected codepaths, any of which could trample your precious log metadata. You have to read a lot of code to feel confident that won’t happen.

Logger metadata is a process-scoped variable, so it’s intrinsically tied to what happens in that process, and in what order (“control”2). Even if a function with Logger calls doesn’t receive any logger-specific arguments — that is, it doesn’t explicitly depend on logging elsewhere — its behavior can implicitly depend on all sequentially prior operations in the same process. If you imagine the process’s execution as the depth-first traversal of a tree of function calls, this means knowing all the previously-visited vertices, not just the lexical parents of the current vertex.

Elixir’s standard Logger does what Peter Bourgon calls magic: it obscures a complex process-global data-interdependency. Unfortunately, ruling out influence is an indispensable part of reading code. When influence extends to a whole process, and when your program is anything but strictly non-recursive, you can’t just treat code lexically: you have to mentally model the process execution as you read it.

Functional work-arounds

One work-around is to, for some particularly sensitive metadata keys, defer to those already on the logger:

# safe_set_trace_id is a dedicated setter for a single Logger metadata field.
def safe_set_trace_id(trace_id) do
  if !Logger.metadata(:trace_id) do
    Logger.metadata(trace_id: trace_id)
  end
end

# protected_metadata and safe_set_metadata protect a static set of metadata
# keys.
@protected_metadata [:trace_id, :others]
def safe_set_metadata(opts) do
  opts
  |> Enum.each(fn {key, value} ->
    if !Enum.member(@protected_metadata, key) || !Logger.metadata(key) do
      Logger.metadata(key, value)
  end)
end

This requires either sharing a canonical set of precious metadata keys, then enforcing that every Logger call checks the list, or writing a module wrapping Logger and forbidding direct Logger calls.

Alternatively, you can enforce that keys for correlation be unchanged by redundantly(?) re-setting them:

def log_and_operate(trace_id) do
  Logger.metadata(trace_id: trace_id)
  Logger.info("Reached pre-operation")
  operate()
  Logger.info("Reached post-operation", trace_id: trace_id)
end

Better: explicit sharing

The Go approach is to relay the updates through return values: an updated logger or context.Context, which callers can take or leave selectively.

In Go convention, instead of having process-scoped logger metadata, loggers (with metadata) are objects. Of course, you commonly want to inherit some logger metadata; that’s why you see code building on a service-wide base logger:

func (s *Server) Operate() {
  s.Logger.With("key", "value")
}

Extracting a logger from a received context:

func Operate(ctx context.Context) {
  log.From(ctx).With("key", "value")
}

Or receiving a logger from the caller, then forking it — extending the received logger’s metadata in a local copy:

func Operate(logger *zap.SugaredLogger) {
  var value string
  logger = logger.With("key", "value")
}

Even in Go, sometimes you do want to modify a logger for the caller scope. How do you do this without process-scoped magic? Return values that let the caller decide what to do with their own logger. Admittedly, this can feel a little clumsy — threading contextual data up and down the callstack resists later refactoring, and including an updated logger among return values feels even weirder. Regardless, at least the Go approach is explicit.

That’s one of my main complaints about Elixir. Every function call implicitly delegates — even just briefly — control of the caller’s logger. If you want to make any strong guarantees about the logging behavior of any given function, you have to know

  1. Everything that ran in the process before the function was called, if you want to make assumptions about the original state.
  2. Everything that runs between logging calls in that function, all the way down.

And I don’t want to know that much! Excluding swaths of code from consideration is essential to being productive in a big, shared codebase (and especially for doing the kind of parachute-in spot debugging routinely required in a live product).

My point isn’t that you should pick Go3, but that you should spurn ‘magic,’ particularly the “comprehension-subverting magic of globals” (Bourgon), a point on which the Go community is pretty disciplined. Think less, get more done; better to be ugly and explicit than opaque.