Misk Observability: Prometheus Metrics, Structured Logging, and Tracing

Series: Building Production Services with Misk — Part 22 of 24

Operating a service is the act of seeing inside it while it runs, and a service you can’t see inside is a service you’re operating by superstition. The good news is that observability in Misk isn’t something you bolt on at 3am during your first incident: a lot of it is already wired. Back in Part 7 we walked the interceptor chain, and two of those interceptors exist purely to watch. One times every action and records the result in a histogram; the other opens a tracing span around the request. You get request rate, latency, and traces for every web action without writing a line. This post is about the rest: the metrics you emit, the logs you structure, the noise you sample away, and the spans you nest by hand. We’ll use Misk’s Prometheus-backed metrics as the spine, then fold in logging, sampling, and tracing.

Metrics: a thin abstraction over Prometheus

Misk’s metrics interface is deliberately small. The README says it plainly: it’s “a partial abstraction over Prometheus metrics,” with two roles — producers (your application code, depending on misk-metrics) and backends (infrastructure code, and the only real one is Prometheus). You inject Metrics, you create instruments, you observe them. There’s no proprietary metric model — no Misk vocabulary to learn — sitting between you and the time series.

Here’s the shape of it, as it’s meant to be used:

import jakarta.inject.Inject // assumption: older Misk versions use javax.inject.Inject
import misk.metrics.v2.Metrics

class MyService @Inject constructor(private val metrics: Metrics) {
  private val operations = metrics.counter(
    name = "my_labeled_counter",
    help = "Counts operations by type and status",
    labelNames = listOf("type", "status"),
  )

  private val duration = metrics.histogram(
    name = "my_operation_duration",
    help = "Measures the duration of operations in milliseconds",
    labelNames = listOf("operation_type"),
  )

  fun performOperation(type: String) {
    val start = System.currentTimeMillis()
    var status = "failure"
    try {
      // do work
      status = "success"
    } finally {
      // Label values are positional and must match the order of labelNames.
      operations.labels(type, status).inc()
      duration.labels(type).observe((System.currentTimeMillis() - start).toDouble())
    }
  }
}

The three primitives are counter, gauge, and histogram, and the return types aren’t Misk wrappers: they’re the genuine io.prometheus.client.Counter, Gauge, and Histogram. That’s the whole point of “partial abstraction”: Misk gives you a clean metrics.counter(...) factory, then gets out of the way and hands you the real Prometheus object, labels and all. You call .labels(...).inc() or .labels(...).observe(...) exactly as the Prometheus Java client documents, because it is the Prometheus Java client.

There are a couple of less-obvious instruments worth knowing. peakGauge registers a gauge that resets to its initial value after each scrape — useful for “max concurrency seen since the last collection” without you having to track the high-water mark yourself. providedGauge registers a gauge whose value is pulled from a provider at collection time, which is the right tool when the number already lives somewhere (a queue depth, a cache size) and you don’t want to push updates on every change. And summary exists if you genuinely need client-side quantiles — but the kdoc is blunt that summaries “can be an order of magnitude more expensive than histograms in terms of CPU,” so reach for histogram unless you have a specific reason not to.
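
To make the peakGauge behavior concrete, here is a minimal stand-alone sketch of "keep the high-water mark, reset on collection." PeakTracker, record, and collectAndReset are hypothetical names chosen for illustration; this is not Misk’s implementation, only the semantics described above.

```kotlin
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for a peak gauge; NOT Misk's PeakGauge implementation.
class PeakTracker(private val initialValue: Long = 0L) {
  private val peak = AtomicLong(initialValue)

  // Record a sample; only the highest value seen since the last collection is kept.
  fun record(value: Long) {
    peak.updateAndGet { current -> maxOf(current, value) }
  }

  // Called at scrape time: return the peak and reset to the initial value.
  fun collectAndReset(): Long = peak.getAndSet(initialValue)
}
```

Record 3, 9, and 4 between two collections and the first collection reports 9; a second collection with no new samples reports the initial value again. The caller never tracks the maximum itself, which is the convenience the instrument is selling.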

The V1/V2 thing you will trip over

This is the migration that matters, and it’s a sharp edge, so let’s be honest about it. There are two Metrics interfaces in the tree: misk.metrics.Metrics (V1) and misk.metrics.v2.Metrics (V2). V1 is @Deprecated with a ReplaceWith pointing at V2. You want V2 — full stop.

The trap is the word histogram. In V1, the method called histogram(...) does not create a Prometheus histogram — it creates a Summary. The kdoc admits it directly: “For legacy reasons this function is called histogram(…) but it’s not backed by a histogram because of issues with the previous time series backend.” V2 fixes the naming — v2.histogram(...) returns a real Histogram — but that fix is not backwards compatible. From the V2 source:

misk.metrics.v2.Metrics is NOT backward compatible with misk.metrics.Metrics. This is because the metric type of the histogram(...) function has changed. … the dashboards and monitors based on the metric will break because the data type of the metric will have changed.

So you can’t just flip the import. The official migration (docs/migrations/misk-metrics.md) ships an OpenRewrite recipe that does two things: it renames your existing histogram(..) calls to legacyHistogram(..) (preserving the Summary, so your dashboards survive), then changes the type from misk.metrics.Metrics to misk.metrics.v2.Metrics. The net is that your old metrics keep their old (summary) shape under a new name, and any new histogram(...) calls get real histograms. If you want a metric to actually become a histogram, the kdoc is explicit: you have to give it a different name, because the time series database can’t reconcile the two data structures. Plan for a rename, not a flip.
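
Why can’t the backend reconcile the two? A rough sketch of the sample lines each type exposes for the same observations makes it visible. The function names here are made up for illustration, and the quantile math is a plain nearest-rank calculation rather than what the Prometheus client does internally; the only point is the shape of the output.

```kotlin
import kotlin.math.ceil

// Histogram shape: cumulative counts per upper bound ("le"), plus _sum and _count.
fun histogramLines(name: String, buckets: List<Double>, observations: List<Double>): List<String> {
  val bucketLines = buckets.map { le ->
    val count = observations.count { it <= le }
    "${name}_bucket{le=\"$le\"} $count"
  }
  return bucketLines + listOf(
    "${name}_bucket{le=\"+Inf\"} ${observations.size}",
    "${name}_sum ${observations.sum()}",
    "${name}_count ${observations.size}",
  )
}

// Summary shape: precomputed quantiles under the bare metric name, plus _sum and _count.
fun summaryLines(name: String, quantiles: List<Double>, observations: List<Double>): List<String> {
  require(observations.isNotEmpty()) { "need at least one observation" }
  val sorted = observations.sorted()
  val quantileLines = quantiles.map { q ->
    // Nearest-rank quantile, for illustration only.
    val rank = ceil(q * sorted.size).toInt().coerceIn(1, sorted.size)
    "${name}{quantile=\"$q\"} ${sorted[rank - 1]}"
  }
  return quantileLines + listOf(
    "${name}_sum ${sorted.sum()}",
    "${name}_count ${sorted.size}",
  )
}
```

Same name, same observations, two incompatible sets of series: one keyed by le on a _bucket suffix, the other keyed by quantile on the bare name. A dashboard query written against one finds nothing in the other, which is why the migration keeps the old shape under legacyHistogram and asks for a new name when you want real buckets.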

One thing to flag plainly: as of this writing the migration story is V1 → V2, and both are Prometheus-backed. There is no Micrometer-based metrics backend in Misk’s metrics modules — Micrometer’s MeterRegistry shows up in peripheral places like the executor-service factory, not as a metrics backend you’d choose. If you’ve heard “Misk is moving to Micrometer,” treat that as direction-of-travel, not something you wire today.

The backend: misk-prometheus

Producing metrics is half the story; exposing them is the other half. The misk-prometheus module is the backend. Installing the metrics module binds a CollectorRegistry and both Metrics interfaces as singletons, all backed by that one registry. The PrometheusHttpService then exposes a scrape endpoint, configured by PrometheusConfig:

data class PrometheusConfig(
  val hostname: String? = null,
  val http_port: Int = 9102,
  val max_age_in_seconds: Long? = null,
  val disable_default_summary_metrics: Boolean = false,
)

So metrics come out on port 9102 by default, on a separate listener from your application traffic — which is exactly what you want, because your Prometheus scraper shouldn’t be hitting the same port as your customers. disable_default_summary_metrics is the knob that lets you stop emitting the legacy summary version of a metric once its histogram counterpart exists, so you’re not paying for both.

Datadog, briefly

If your shop runs Datadog rather than a Prometheus server, you don’t throw any of this away. The standard pattern is to scrape the Prometheus 9102 endpoint with the Datadog Agent’s OpenMetrics/Prometheus check and forward from there — the producer code is identical, the abstraction is the same, only the collector changes. Where Datadog does show up first-class in Misk is tracing, via misk-datadog, which we’ll get to below. The takeaway: metrics are vendor-neutral by construction, and “switching to Datadog” is an infra decision, not a code rewrite.

Structured logging: tags, not string concatenation

misk-logging extends the SLF4J API, and its one strong opinion is that log context belongs in structured fields — not jammed into the message string. You get a logger the boring way:

import misk.logging.getLogger

class MyAction {
  companion object {
    private val logger = getLogger<MyAction>()
  }
}

getLogger<T>() is a reified helper returning a KLogger (from kotlin-logging). The interesting part is the logging extensions, which take tags — Pair<String, Any?> — and a lazy message:

logger.info("orderToken" to order.token, "amount" to order.amount) {
  "processing order"
}

Under the hood those tags are pushed into SLF4J’s MDC for the duration of the call and then restored, so each key/value lands in your JSON logs, and in your log aggregator, as a queryable field rather than a substring you have to regex out of the message later. The message is a lambda, so the cost of building it is only paid if the level is enabled.
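
The push-then-restore mechanics are easy to picture with a stand-in. The sketch below uses a plain thread-local map in place of SLF4J’s MDC so it runs with no dependencies; FakeMdc and withTagsSketch are hypothetical names, and the real extensions may differ in detail.

```kotlin
// Stand-in for SLF4J's MDC: a per-thread map of context fields. Illustration only.
object FakeMdc {
  private val context = ThreadLocal.withInitial { mutableMapOf<String, String>() }

  fun snapshot(): Map<String, String> = context.get().toMap()

  fun put(key: String, value: String) {
    context.get()[key] = value
  }

  fun restore(previous: Map<String, String>) {
    context.get().apply {
      clear()
      putAll(previous)
    }
  }
}

// Push tags for the duration of the block, then restore whatever was there before,
// even if the block throws.
fun <T> withTagsSketch(vararg tags: Pair<String, Any?>, block: () -> T): T {
  val previous = FakeMdc.snapshot()
  tags.forEach { (key, value) -> FakeMdc.put(key, value.toString()) }
  try {
    return block()
  } finally {
    FakeMdc.restore(previous)
  }
}
```

Anything that reads the context inside the block sees the tags as separate fields; once the block exits, the context is back to what it was. That restore step is also why an exception logged further up the stack would normally have lost the tags.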

The genuinely clever piece is withSmartTags. Tags added inside a withSmartTags { ... } block are remembered on a thread-local so that if an exception escapes the block, the framework’s exception logger can re-attach those tags to the error log — even though, by the time the exception is logged, the MDC context has unwound. The README’s worked example is the tell: an action sets processValue and contextToken, calls into a client that throws, and the “unexpected error dispatching to ServiceAction” log — which would normally have no context — comes out carrying both tags. Misk already wires this into ExceptionHandlingInterceptor for web actions and into the SQS job consumer, so your unhandled errors get the surrounding business context for free. For anything else that consumes events (Kafka, scheduled tasks, Temporal), the docs point you at those two as the pattern to copy.

Sampling: turn down the volume without going silent

A log line — or a metric — that fires on every request through a hot path is a great way to set fire to your logging bill and bury the signal. misk-sampling is the answer, and it’s tiny: a Sampler interface with one method, sample(): Boolean, plus a sampledCall { } convenience that runs a lambda only when sample() returns true.

Three implementations, all reachable from the companion:

Sampler.always()              // every time — mostly for tests
Sampler.percentage(5)         // ~5% of calls
Sampler.rateLimiting(10)      // at most 10 positive samples per second

PercentSampler is probabilistic: it rolls a number and compares it to your percentage. RateLimitingSampler is a token-bucket rate limiter that reads time from a Guava Ticker, so it caps rate rather than proportion, which is what you usually want for a chatty error path: “tell me up to 10 of these per second, drop the rest.” Where it clicks neatly into logging is the .sampled(...) extension:

private val logger = getLogger<MyClass>().sampled(Sampler.rateLimiting(1))

That gives you a logger that logs at most once per second no matter how hard the path is hammered. The README is emphatic that a sampled logger “MUST be instantiated statically, in a companion object or as a Singleton” — and that’s not pedantry. Build a fresh RateLimitingSampler per request and every request gets its own bucket, every bucket has a token, and you’ve sampled exactly nothing while feeling clever about it.
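
The per-request trap is easier to believe once you watch it happen. Below is a self-contained sketch with made-up names (SamplerSketch, PercentSketch, RateLimitSketch); the rate limiter uses a fixed one-second window, which is a simpler stand-in for a token bucket and not how Misk implements it. The clock is injectable so the behavior is deterministic.

```kotlin
import kotlin.random.Random

fun interface SamplerSketch {
  fun sample(): Boolean
}

// Probabilistic: true for roughly `percent` out of every 100 calls.
class PercentSketch(
  private val percent: Int,
  private val random: Random = Random.Default,
) : SamplerSketch {
  override fun sample(): Boolean = random.nextInt(100) < percent
}

// Rate-based: at most `perSecond` positive samples per one-second window of the clock.
class RateLimitSketch(
  private val perSecond: Int,
  private val nowMillis: () -> Long = System::currentTimeMillis,
) : SamplerSketch {
  private var windowStart = nowMillis()
  private var used = 0

  @Synchronized
  override fun sample(): Boolean {
    val now = nowMillis()
    if (now - windowStart >= 1000) {
      windowStart = now
      used = 0
    }
    if (used >= perSecond) return false
    used++
    return true
  }
}

// One shared sampler: the budget is spent once, later calls in the window are dropped.
fun sharedSamplerHits(calls: Int, clock: () -> Long): Int {
  val shared = RateLimitSketch(1, clock)
  return (1..calls).count { shared.sample() }
}

// A fresh sampler per call: every call starts with a full budget, so nothing is dropped.
fun perCallSamplerHits(calls: Int, clock: () -> Long): Int =
  (1..calls).count { RateLimitSketch(1, clock).sample() }
```

With the clock frozen, 100 calls through the shared sampler yield one positive sample, while 100 calls that each build their own sampler yield 100. Same class, same limit, opposite outcome; the only difference is where the instance lives.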

Tracing: wisp-tracing over OpenTracing

Tracing in Misk is built on OpenTracing, and wisp-tracing is a thin set of convenience functions over the OpenTracing Java API. The everyday call is tracer.trace("span-name") { ... }, which opens a span and scope, runs your block, and closes both — no try/finally to forget:

import wisp.tracing.trace

tracer.trace("load-order") {
  doSomething()
}

Nesting trace calls implicitly creates parent/child spans, so a child operation inside a request span just shows up under it. There’s traceWithSpan { span -> } when you need the span itself — to set typed tags or baggage — and traceWithNewRootSpan(...) when you want a span to stand on its own rather than hang off the current request. The one footgun the README flags is concurrency: scopes are not thread-safe, so if you hand work to another thread you must open a fresh scope on that thread with withNewScope(span) before touching the trace. Cross a thread boundary without it and your spans go missing or attach to the wrong parent.
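
The thread-boundary footgun comes down to the active scope being per-thread state. Here is a dependency-free sketch of that behavior; ActiveSpanSketch and its functions are hypothetical stand-ins, not the OpenTracing or wisp-tracing API, and a span is reduced to a plain string.

```kotlin
import java.util.concurrent.Callable
import java.util.concurrent.Executors

// Stand-in for a tracer's thread-local scope manager. Illustration only.
object ActiveSpanSketch {
  private val current = ThreadLocal<String?>()

  fun active(): String? = current.get()

  // Activate `span` on the calling thread for the duration of the block, then restore.
  fun <T> withScope(span: String, block: () -> T): T {
    val previous = current.get()
    current.set(span)
    try {
      return block()
    } finally {
      current.set(previous)
    }
  }
}

// Returns what a worker thread sees as the active span:
// first without re-activation, then with the captured span re-activated on that thread.
fun activeSpanOnWorker(): Pair<String?, String?> {
  val executor = Executors.newSingleThreadExecutor()
  try {
    return ActiveSpanSketch.withScope("load-order") {
      val captured = ActiveSpanSketch.active()!!
      val lost = executor.submit(Callable { ActiveSpanSketch.active() }).get()
      val carried = executor.submit(Callable {
        ActiveSpanSketch.withScope(captured) { ActiveSpanSketch.active() }
      }).get()
      lost to carried
    }
  } finally {
    executor.shutdown()
  }
}
```

The first task sees no active span at all, because the worker thread never had one. The second captures the span on the calling thread and opens a scope for it on the worker, which is the same move withNewScope(span) makes for you.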

The backend is where Datadog re-enters. misk-datadog’s DatadogTracingBackendModule binds OpenTracing’s Tracer to the Datadog global tracer:

bind<Tracer>().toInstance(io.opentracing.util.GlobalTracer.get())

The crucial caveat lives in the module’s own comment: the real DDTracer is installed by the dd-java-agent before your main() runs. No -javaagent:dd-java-agent.jar on the JVM, and GlobalTracer.get() returns a no-op — tracing silently does nothing. The module also wires an MDCScopeListener so that the active trace and span IDs land in your MDC, which is what lets your logs and traces cross-link in the Datadog UI.

Production notes & gotchas

v2.Metrics, always. V1 is deprecated, and its histogram(...) is a Summary in disguise. Use misk.metrics.v2.Metrics, run the OpenRewrite recipe to migrate, and remember the rename-not-flip rule for any metric you want to actually become a histogram.
A histogram needs buckets. Misk’s default buckets assume milliseconds and span 1ms to 1hr. Great for latency, wrong for almost anything else. If you’re measuring payload sizes or counts, pass your own buckets; otherwise your observations crowd into whichever few buckets happen to overlap their range and your percentiles are fiction.
Sampled loggers must be static. A per-request Sampler has a per-request bucket, which means no sampling at all. Companion object or singleton, every time.
No agent, no traces. Datadog tracing depends on dd-java-agent being attached at JVM start. If traces are mysteriously empty in one environment, check the launch command before you touch any code.
Cardinality is the silent killer. Every distinct combination of label values is a separate time series. Putting a user ID, order token, or raw URL in labelNames will quietly melt your metrics backend. Labels are for bounded sets — status, type, region — not identifiers.
MDC is thread-local, so async loses it. Structured tags and trace context ride the MDC, which is per-thread. Dispatch to a coroutine or executor and the context doesn’t follow — re-establish tags with withTags/withSmartTags and open a new tracing scope on the new thread.
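
The cardinality warning is just multiplication, and it helps to run the numbers once. A small sketch, with a hypothetical helper name and made-up label counts:

```kotlin
// Upper bound on time series for one metric: the product of distinct values per label.
// Math.multiplyExact throws on overflow instead of silently wrapping.
fun seriesUpperBound(distinctValuesPerLabel: Map<String, Long>): Long =
  distinctValuesPerLabel.values.fold(1L) { acc, n -> Math.multiplyExact(acc, n) }

// Illustrative numbers only: three bounded labels, then the same plus a user ID label.
val boundedLabels = mapOf("status" to 5L, "type" to 10L, "region" to 4L)
val withUserId = boundedLabels + ("user_id" to 1_000_000L)
```

The bounded set tops out at 200 series. Add a user ID with a million distinct values and the ceiling becomes 200,000,000. For a histogram, each of those label combinations is multiplied again by the number of buckets, which is how one innocent-looking label takes down a metrics backend.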

What’s next

We’ve covered seeing inside a running service. The companion skill is proving it behaves before it runs in front of customers — and a fake CollectorRegistry, a MockTracer, and assertions like collectorRegistry.summaryP99(...) are exactly the seams that make observable code testable. In Part 23: Misk Testing we’ll wire up @MiskTest, install fake fixtures, and turn “I think it works” into a green check.
