Misk Distributed Leases & Leader Election: Coordinating a Cluster


Series: Building Production Services with Misk — Part 20 of 24

You run three replicas of your service so a deploy or a dead node doesn’t take you down. Then someone adds a nightly reconciliation job, or a queue drainer, or a cache warmer — and now it runs three times, concurrently, racing itself. The fix is not “run one replica.” The fix is exactly one replica should do this thing at a time, while the other two stay ready to take over. That’s leader election, and a misk distributed lease is how you get it: a named, cluster-wide, time-bounded claim that — most of the time — exactly one instance holds. This post is about the Lease abstraction, the backends behind it, and the distributed-systems fine print that the word “lease” (rather than “lock”) is quietly warning you about.

What a lease actually is

A lease is a lock with an expiry. You ask the cluster for the lease named nightly-reconcile; if nobody holds it, you get it — but only until held_until, a timestamp some minutes in the future. You don’t hold it forever, and you don’t hold it until you explicitly release — you hold it until the clock runs out, and you’re expected to keep re-acquiring while you still want it.

That expiry is the entire point. A plain mutual-exclusion lock has a fatal failure mode in a distributed system: the holder dies — OOM, kill -9, network partition — and never releases. Now the resource is locked forever and your job never runs again. A lease sidesteps this by construction: the dead holder’s claim simply expires, and a survivor acquires it on the next attempt. Nobody has to detect the death or clean up; the timeout does it.

Leader election falls out of this for free. “The leader” is just “whoever currently holds the lease named leader.” Run the singleton work behind a lease check, have every replica try to acquire on a schedule, and the cluster self-organizes — one leader, automatic failover, no ZooKeeper to babysit. Misk’s contribution here isn’t a novel consensus algorithm — it’s a clean, swappable interface so your code doesn’t care whether the lease is backed by MySQL, by region weight, or by a fake in tests.

The Lease API

Two interfaces, both living in the wisp.lease package (Wisp is Misk’s framework-agnostic core; misk-lease adapts it to Guice). You inject a LeaseManager and ask it for leases by name:

interface LeaseManager {
  fun requestLease(name: String): Lease
  fun releaseAll() {}
}

requestLease registers interest in a named lease and hands back a Lease handle. It does not acquire anything yet — that’s deliberate, and the doc comment is explicit that registering interest gives the backend a chance to set up bookkeeping (or to decide, via consistent hashing, that this instance shouldn’t own the lease at all). The Lease itself is where the work happens:

interface Lease {
  val name: String

  fun shouldHold(): Boolean   // should THIS instance own it? (no network, no blocking)
  fun isHeld(): Boolean       // does this instance currently own it? (no network, no blocking)

  fun acquire(): Boolean      // try to take the lease; true if we now hold it
  fun release(): Boolean      // give it up; true if we held it and released
  fun release(lazy: Boolean): Boolean

  fun addListener(listener: StateChangeListener)

  interface StateChangeListener {
    fun afterAcquire(lease: Lease)
    fun beforeRelease(lease: Lease)
  }
}

Three things worth flagging, because they’re easy to get wrong:

  • acquire() returns a Boolean and you must check it. It is not “block until I’m the leader.” It’s “try right now; tell me if I won.” Ignoring the return value is the single most common lease bug — you proceed to do leader-only work whether or not you’re actually the leader.
  • isHeld() and shouldHold() promise not to make network calls or block. They’re cheap, local checks you can call on a hot path. acquire(), release(), and the older checkHeld() may hit the network — the Lease doc comment says so directly — so treat those as I/O.
  • checkHeld() and checkHeldElsewhere() are deprecated. They still work (the exemplar uses checkHeld()), but the interface now nudges you toward isHeld() precisely to avoid the surprise network call. New code should prefer isHeld().

There’s also an acquire(options: AcquireOptions) overload with a WaitMode (DoNotWait, WaitForLeaseDuration, WaitUpTo(timeout)). Be warned: the default acquire() is non-blocking, and backends that don’t support waiting will throw UnsupportedOperationException rather than silently degrade — unless you pass unsupportedWaitBehavior = FallbackToNonBlocking. The SQL backend, as we’ll see, does not implement real waiting.

For the common “do it if I’m the leader, then let go” pattern, there’s an AutoCloseable extension so you can lean on use {}:

leaseManager.acquireOrNull("nightly-reconcile")?.use { lease ->
  // we hold the lease in here; it's released when the block exits
  reconcile()
}

A worked example

The exemplar service ships two endpoints that are about as small as a lease demo can get — and they’re the honest baseline, so let’s read them. Acquiring:

@Singleton
class LeaseAcquireWebAction @Inject constructor(
  private val leaseManager: LeaseManager,
) : WebAction {
  @Get("/lease-acquire/{name}")
  @Unauthenticated
  @ResponseContentType(MediaTypes.TEXT_PLAIN_UTF8)
  fun acquire(@PathParam name: String): LeaseAcquireResponse {
    val lease = leaseManager.requestLease(name)
    val acquired = lease.acquire()
    return LeaseAcquireResponse(name = name, acquired = acquired)
  }

  data class LeaseAcquireResponse(val name: String, val acquired: Boolean)
}

And checking, from a different request (likely a different replica behind the load balancer):

@Singleton
class LeaseCheckWebAction @Inject constructor(
  private val leaseManager: LeaseManager,
) : WebAction {
  @Get("/lease-check/{name}")
  @Unauthenticated
  fun check(@PathParam name: String): LeaseCheckResponse {
    val lease = leaseManager.requestLease(name)
    val held = lease.checkHeld()
    return LeaseCheckResponse(name = name, held = held)
  }
}

The shape is the whole lesson: requestLease(name) then act on the handle. Now the part the toy example doesn’t show but production demands — you don’t acquire once, you acquire repeatedly, because the lease expires. Real leader-elected work looks more like a recurring task (Part 19’s misk-cron or a repeated task) that re-checks every interval:

// runs on every replica, on a schedule shorter than the lease duration
fun tick() {
  val lease = leaseManager.requestLease("nightly-reconcile")
  if (lease.acquire()) {
    runReconciliationBatch()   // only the current leader gets here
  }
}

Run that every minute on all three replicas with the SQL backend’s five-minute default lease, and exactly one replica keeps winning acquire() and doing the work; if it dies, within five minutes its lease expires and another replica’s acquire() starts returning true. That’s leader election with no extra moving parts.

The backends: SQL and cluster-aware

The LeaseManager interface is one thing; which implementation you bind decides what “distributed” means. Misk ships two production-shaped options, and they’re philosophically different.

misk-lease-mysqlSqlLeaseManager. A genuine first-come-first-serve distributed lease backed by a single MySQL row per lease, guarded by optimistic concurrency. The table is just (lease_name, version, held_until), and the acquire is an atomic conditional UPDATE:

-- acquire only if nobody bumped the version since we read it
UPDATE leases
SET held_until = :held_until, version = :version
WHERE lease_name = :lease_name AND version = :current_version;

The manager reads the current row, and if the lease is expired (now > held_until) it tries the UPDATE with the incremented version. Exactly one racing replica’s UPDATE affects a row; the losers see updatedRows == 0 and get a not-held handle back. New leases are an INSERT that races on the unique key. Release is a DELETE matching (lease_name, version), so you can only release the version you actually hold. It’s a clean, boring, correct design — a database row’s contention as a coordination primitive, nothing more exotic. The default lease duration is five minutes (Duration.ofSeconds(300)), and requestLease does the acquisition logic eagerly. Wiring it is one module, fed your JDBC config:

SqlLeaseModule(config.data_source_clusters)   // installed in ExemplarService

Note SqlLeaseModule is annotated @ExperimentalMiskApi — it works, the exemplar ships it, but the surface may shift. And its acquire(options) does not support real waiting: a WaitForLeaseDuration or WaitUpTo request throws UnsupportedOperationException unless you opt into non-blocking fallback.

misk-clusteringClusterAwareLeaseManager. This one is not first-come-first-serve at all, and the distinction matters enormously. Its Lease is essentially a no-op that returns true from acquire(), isHeld(), and shouldHold() if and only if the app is running in the active region — it just consults a ClusterWeightProvider and checks the weight is non-zero. Read its own doc comment: it’s “suitable for situations where lease injection is necessary but not functionally important, such as in Misk SQS.” It implements both LeaderLeaseManager and LoadBalancedLeaseManager marker interfaces, but it provides region-level gating, not single-instance mutual exclusion. Every replica in the active region will happily believe it holds the lease. If you wire this expecting one leader and get N-per-region, that’s not a bug — it’s the contract, and you read it wrong.

That’s the backdrop to misk-clustering proper: it models cluster membership. The Cluster interface exposes a Snapshot of readyMembers, your own self member, and a ClusterResourceMapper (hash-ring or rendezvous hashing) so resources can be deterministically assigned to instances — which is exactly the consistent-hashing mechanism shouldHold() was designed to express. The DynamoDB implementation (misk-clustering-dynamodb) is the membership registry: per its README, each instance writes a heartbeat row every ~30 seconds and reads back all rows fresh within ~60, using a DynamoDB TTL attribute (expires_at) to evict the dead. The table is tiny:

name = "misk-cluster-members"
kind = "dynamodbv2"
hash_key = "name"
ttl_attribute = "expires_at"

So the mental model: misk-clustering answers “who’s in the cluster right now and which resources are mine?”; misk-lease-mysql answers “let me atomically claim this named resource for the next few minutes.” They solve adjacent problems, and the right one depends on whether you need true single-holder exclusion (SQL) or region-active gating over a known membership (clustering).

For tests, FakeLeaseManager (in misk.clustering.fake.lease) gives you a lease that’s held by default, with markLeaseHeld(name) and markLeaseHeldElsewhere(name) to script either side of an election deterministically — no database, no clock games. There are FakeLeaderLeaseManager and FakeLoadBalancedLeaseManager variants tagged with the matching marker interfaces.

Going deeper: a lease is not a lock guarantee

Here is the uncomfortable truth the literature has shouted for years and the name lease is trying to tell you: a time-bounded lease cannot give you mutual exclusion across processes, only across honest, punctual processes. If your leader-only work has a correctness requirement that two instances must never run it simultaneously, a lease alone will not get you there. Reason it through:

  • Leases are time-bounded. You acquired nightly-reconcile until held_until. Your batch takes longer than you thought — a slow query, a GC pause, a paused container — the lease expires while you’re still working. Another replica’s acquire() now legitimately succeeds. For a window, two instances both believe they’re the leader and both run the work. The lease did exactly what it promised; your assumption that holding-then-doing-work is atomic was the mistake.
  • Clock skew is real and it’s load-bearing. held_until is a timestamp, and isHeld() compares it against some clock. The SQL backend computes expiry against a java.time.Clock and the holder checks expiry against its own clock. If two machines disagree about “now” by tens of seconds — which happens, NTP is not magic — one can think a lease is live while another thinks it’s expired. Every lease system inherits the clock-sync assumptions of its substrate.
  • A successful acquire() is a statement about the past. By the time the boolean comes back true, it describes the moment the UPDATE committed, not the moment your next line of code runs. The window between “I won the lease” and “I act on it” is where split-brain lives.

The honest framing: leases make duplicate work rare and self-healing, not impossible. That’s the right tool when the singleton work is idempotent or merely wasteful-if-doubled (cache warming, metrics rollups, draining a queue with at-least-once semantics anyway). It is the wrong tool, used alone, when double execution corrupts data. For that you need the work itself to be safe under concurrency — a fencing token, a conditional write, an idempotency key — so that the lease is an optimization that reduces contention, not the sole thing standing between you and a double-charge. Design the work to tolerate two leaders for a few seconds, and the lease’s weaknesses stop mattering.

Production notes & gotchas

  • Always check acquire()’s return value. It’s a Boolean, not a void “become leader” call. The framework cannot stop you from doing leader work after a false; only you can.
  • Re-acquire on a schedule shorter than the lease duration. With the SQL backend’s five-minute default, a job that runs once and assumes it’s leader forever is wrong twice over: it’ll lose leadership silently after expiry, and it won’t fail over promptly. Tick well inside the window.
  • Don’t confuse the cluster-aware lease with single-instance exclusion. ClusterAwareLease gates on active region, not one holder — every replica in the active region “holds” it. If you need exactly-one, you want the SQL backend (or another true FCFS implementation), full stop.
  • SqlLeaseModule is @ExperimentalMiskApi. Pin your Misk version and read the changelog before upgrading; the module surface is explicitly not frozen.
  • acquire(options) with a wait mode throws on the SQL backend. It doesn’t block-and-retry for you. If you want waiting semantics, pass UnsupportedWaitBehavior.FallbackToNonBlocking (and accept it’s just a non-blocking try) or build the retry loop yourself.
  • Keep isHeld()/shouldHold() on the hot path and acquire()/release() off it. The interface contract says the former don’t touch the network and the latter may. Calling acquire() in a tight loop hammers your MySQL row with contended UPDATEs.
  • Release explicitly on graceful shutdown. Expiry is the safety net for crashes, not an excuse to skip cleanup. releaseAll() exists on LeaseManager, and the framework’s own LeaseService releases on shutDown() — let a stopping replica hand off leadership in seconds instead of making the cluster wait out the full lease.

What’s next

Leases, like everything in Misk, are driven by config — the SQL backend needs a DataSourceClustersConfig, the Dynamo cluster needs a table name and heartbeat intervals, and all of it has to vary cleanly between local, staging, and production. In Part 21: Misk Configuration & Environments we’ll get into how Misk loads, layers, and validates configuration across environments — the unglamorous plumbing that everything else in this series quietly depends on.


Target keywords: misk distributed lease, misk leader election.

Comments