rota
Cert renewal for self-hosters. One Rust binary, CLI plus dashboard, pluggable CAs, registrars, and install targets.
Why this exists
Public TLS certificate lifetimes are getting shorter on a fixed schedule. CA/Browser Forum Ballot SC-081, adopted in April 2025, drops the maximum validity of publicly trusted certs from 398 days down to 47 days over four years:
| Effective | Max validity |
|---|---|
| 2026-03-15 | 200 days |
| 2027-03-15 | 100 days |
| 2029-03-15 | 47 days |
Apple championed the ballot. Their argument (Apple Platform Security) is that shorter validity reduces the damage window when a key gets compromised, encourages more automation, and reduces dependence on revocation infrastructure that has been historically unreliable.
Fair enough as far as it goes. Where it falls down for me:
- Revocation isn't broken so much as underfunded. OCSP stapling and CRLite are real and deployed.
- The historic CA failures (DigiNotar, Symantec, TrustCor) weren't validity failures. They were infrastructure compromises and policy failures. Shorter cert lifetimes wouldn't have helped.
- The cost falls hardest on small operators. Cloud customers absorb 47-day renewal automatically; air-gapped, embedded, IoT, and self-hosted setups pay the entire complexity tax. The rational response from a small operator is to hand DNS to a managed proxy and stop self-hosting. That's a power shift away from individuals running their own infrastructure.
I run my own stuff and I'd like to keep doing that. So I'm writing the tool I'd want.
What it does
- Watches your CA-issued certs and knows when they're close to expiry.
- Generates fresh CSRs against persistent private keys you control.
- Submits reissue or renewal requests to the CA over that CA's API.
- Completes domain-control validation by writing TXT records at your registrar (DNS-01) or dropping a token under .well-known/acme-challenge/ for an existing webserver to serve (HTTP-01).
- Installs issued certs where they need to land: Synology DSM, plain filesystem, nginx reload, HAProxy runtime API hot-swap, Kubernetes Secret.
- Logs every step.
- Surfaces a real-time dashboard.
- Alerts before failures, not after.
Optionally federates across multiple rotad instances: one node renews, peers pick up the cert from a shared SurrealDB and install locally.
Who this is for
Operators who run their own webservers, mail servers, dashboards, hobby boxes, homelabs. People who want renewal automation without surrendering DNS or HTTPS termination to a managed proxy. Anyone who'd rather drop a single Rust binary on a box than wrangle a Python toolchain just to keep certs fresh.
If you're already happy with certbot, this isn't a replacement. It's a different tradeoff for the operator who wants pluggable backends and built-in operational surface (audit log, dashboard, alerts, metrics, federation) in one process.
License
Getting started
Stand up a single-node rota that renews one cert against Let's Encrypt via DNS-01, with the cert installed to the local filesystem.
Install
cargo install rota
That builds two binaries: rotad (the daemon) and rota (the CLI client that talks to the daemon over a UNIX socket).
Minimal config
Drop this at /etc/rota/rota.yaml:
daemon:
  database_path: /var/lib/rota/rota.db
  listen_addr: 127.0.0.1:7878
  socket_path: /var/run/rota.sock
  check_interval_seconds: 3600
  renew_threshold_days: 30
acme:
  directory_url: https://acme-v02.api.letsencrypt.org/directory
  contact_email: ops@example.com
  account_credentials_file: /etc/rota/secrets/acme-account.json
cloudflare:
  api_token_file: /etc/rota/secrets/cloudflare.token
certs:
  - id: example-public
    description: example.com marketing site
    domains: [example.com, www.example.com]
    key_path: /var/lib/rota/keys/example.com.key
    ca:
      kind: acme
    dcv:
      kind: cloudflare
    install:
      kind: filesystem
      directory: /etc/ssl/example
Three things to provision before starting rotad:
- Cloudflare API token at /etc/rota/secrets/cloudflare.token. Scope: Zone.DNS:Edit on every zone rota will publish DCV records in. rota only supports tokens, not the legacy Global API Key.
- ACME account credentials file. The first run creates this automatically; just make sure the parent directory is writable.
- Private key directory (/var/lib/rota/keys/) with 0700 mode. The first run also generates the per-cert key automatically. rota reuses the same key on every renewal so cert-pinning operators don't break.
First run
rotad --config /etc/rota/rota.yaml
The daemon opens the audit DB (SQLite by default), connects to the configured CAs, registrars / DCV solvers, and install backends, then sweeps every cert on the configured check_interval_seconds. The first sweep happens after one full interval, not immediately, so the daemon doesn't hammer the CA on startup. Any cert whose installed copy is closer to notAfter than renew_threshold_days gets renewed.
Talking to the daemon
rota status
Prints a one-line summary per cert: id, domains, days until expiry, last renewal status. The same data is on the dashboard at http://127.0.0.1:7878/.
rota renew example-public
Force a renewal regardless of expiry. Useful when you've just rotated DNS and want to confirm the pipeline end to end.
rota log example-public
Print the most recent renewal's audit trail (CSR generated, CA submitted, DCV published, cert issued, cert installed, DCV removed).
Where to go next
- Architecture overview. Four trait surfaces and how they compose.
- Configuration reference. Every field of rota.yaml.
- Backends. What ships today and what's coming.
- Federation runbook. Running multiple rotad instances with shared state.
Architecture overview
rota is one Rust binary running as a daemon, with two thin clients sharing its state.
rota CLI      ──── socket ────▶   rotad (daemon)
Dashboard     ──── HTTP ──────▶   scheduler, audit, API
(htmx + SSR)       WS             HTTP + WS, SQLite or SurrealDB
                                        │
        ┌───────────────┬───────────────┼───────────────┐
        ▼               ▼               ▼               ▼
   CABackend       DcvBackend     InstallBackend   AlertBackend
   Namecheap       Namecheap      DSM              Email (SMTP)
   ACME            Cloudflare     Filesystem       Webhook
                   Webroot        nginx
                                  HAProxy
                                  Kubernetes
Daemon, CLI, and dashboard all build from one Cargo workspace: rotad serves the dashboard itself, and rota is the thin CLI client. No Node, no Deno, no npm.
Four trait surfaces
The renewal pipeline composes generically across vendors. Adding support for a new CA, DCV strategy, install target, or alert sink is one trait impl, not a fork of the renewal logic.
CABackend
Issues certificates from a Certificate Authority. Two methods:
- submit(domains, csr_pem, preferred_kinds). Submits a CSR. Returns one or more DcvChallenge values the caller must satisfy via the DCV backend. preferred_kinds lets the caller hint at DNS-01 vs HTTP-01; CAs that offer a choice walk the list and pick.
- await_issuance(domains). Polls until the cert is signed.
Today: NamecheapCa (traditional reissue, DNS-01 only) and AcmeCa (RFC 8555: Let's Encrypt, ZeroSSL with EAB, BuyPass, any directory that speaks the spec).
DcvBackend
Solves the CA's domain-control challenge.
- supported_kinds(). Which ChallengeKinds the backend can satisfy: Dns01, Http01.
- supports(challenge). Whether this specific challenge is satisfiable. Default impl matches against supported_kinds().
- publish(challenge). Make the response visible to the CA. Idempotent.
- remove(challenge). Clean up after issuance. Idempotent.
DcvChallenge is a tagged enum:
- Dns01 { record_name, record_value, ttl }. TXT record at record_name. Solvers: NamecheapDcv, CloudflareDcv.
- Http01 { domain, token, key_authorization }. key_authorization body served at http://<domain>/.well-known/acme-challenge/<token>. Solver: WebrootDcv (drops the file under a directory served by an existing webserver).
InstallBackend
Places the issued cert and chain where the system serving the domain can read them. Implementations may also trigger a service reload.
- install(cert, private_key_pem, domains). Land the artifacts.
- current_cert_pem(cert_id). Read back the installed leaf cert for the scheduler's days-until-expiry calculation. Default returns None; backends opt in.
Today: DsmInstall (Synology), FilesystemInstall (mode-600 key + mode-644 cert + chain + fullchain), NginxInstall (filesystem + reload subprocess), HaproxyInstall (filesystem + runtime API hot-swap), K8sSecretInstall (server-side-apply a kubernetes.io/tls Secret).
AlertBackend
Daemon-wide notification sinks. Every renewal failure fans out to every configured sink.
dispatch(event). Deliver. Errors are logged but never break the renewal pipeline.
Today: EmailAlert (lettre, STARTTLS / implicit TLS / plaintext) and WebhookAlert (generic JSON envelope POST).
Renewer pipeline
For one cert, one renewal:
- Load (or generate) the persistent private key from key_path.
- Generate a CSR against that key.
- ca.submit() returns DCV challenges.
- Pre-flight check: dcv.supports() for each challenge. Fast-fail if the configured solver can't handle what the CA returned.
- dcv.publish() every challenge.
- ca.await_issuance() waits for the CA to validate and sign.
- dcv.remove() cleans up. Runs unconditionally, even if issuance failed, so a partial run doesn't leave stray records.
- Persist the issued cert and chain to the audit store (for cluster cert distribution).
- install.install() writes locally.
- The audit log records every step.
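The submit, pre-flight, publish, await, and remove steps can be sketched as follows. This is a simplified, synchronous illustration and not rota's actual code: the trait names follow the text, but the exact signatures, the String error type, and the helper names publish_and_wait and issue_with_cleanup are assumptions made for the example. The real traits are presumably async.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum ChallengeKind { Dns01, Http01 }

#[derive(Clone, Debug, PartialEq)]
enum DcvChallenge {
    Dns01 { record_name: String, record_value: String, ttl: u32 },
    Http01 { domain: String, token: String, key_authorization: String },
}

impl DcvChallenge {
    fn kind(&self) -> ChallengeKind {
        match self {
            DcvChallenge::Dns01 { .. } => ChallengeKind::Dns01,
            DcvChallenge::Http01 { .. } => ChallengeKind::Http01,
        }
    }
}

// Hypothetical, simplified trait shapes (errors are plain Strings here).
trait CaBackend {
    fn submit(&mut self, domains: &[String], csr_pem: &str,
              preferred_kinds: &[ChallengeKind]) -> Result<Vec<DcvChallenge>, String>;
    fn await_issuance(&mut self, domains: &[String]) -> Result<String, String>;
}

trait DcvBackend {
    fn supported_kinds(&self) -> Vec<ChallengeKind>;
    // Default impl: match the challenge's kind against supported_kinds().
    fn supports(&self, challenge: &DcvChallenge) -> bool {
        self.supported_kinds().contains(&challenge.kind())
    }
    fn publish(&mut self, challenge: &DcvChallenge) -> Result<(), String>;
    fn remove(&mut self, challenge: &DcvChallenge) -> Result<(), String>;
}

// Publish every challenge, then wait for the CA to validate and sign.
fn publish_and_wait(ca: &mut dyn CaBackend, dcv: &mut dyn DcvBackend,
                    challenges: &[DcvChallenge], domains: &[String]) -> Result<String, String> {
    for challenge in challenges {
        dcv.publish(challenge)?;
    }
    ca.await_issuance(domains)
}

// Returns the issued cert PEM, or the first error. Cleanup runs on both paths.
fn issue_with_cleanup(ca: &mut dyn CaBackend, dcv: &mut dyn DcvBackend,
                      domains: &[String], csr_pem: &str) -> Result<String, String> {
    let preferred = dcv.supported_kinds();
    let challenges = ca.submit(domains, csr_pem, &preferred)?;

    // Pre-flight: fail fast before anything is published.
    if let Some(unsupported) = challenges.iter().find(|c| !dcv.supports(c)) {
        return Err(format!("DCV solver cannot satisfy {:?}", unsupported.kind()));
    }

    let outcome = publish_and_wait(ca, dcv, &challenges, domains);

    // Unconditional cleanup; remove() is idempotent, so removing a challenge
    // that was never published is harmless.
    for challenge in &challenges {
        if let Err(err) = dcv.remove(challenge) {
            eprintln!("DCV cleanup failed: {err}");
        }
    }
    outcome
}
```

Keeping the cleanup loop outside the fallible helper is what makes it unconditional: an early return from publish or await_issuance still falls through to remove.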
Scheduler
Ticks every check_interval_seconds. For each cert: read the install backend's current_cert_pem, parse notAfter, compare to renew_threshold_days, queue a renewal if due. A per-cert failure cooldown prevents a flaky CA from getting hammered every tick.
In a cluster, the scheduler's sweep is gated on cluster.is_leader(). Followers skip silently; the leader keeps doing the work.
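The per-cert decision on each tick can be sketched like this. It is an illustration under stated assumptions, not rota's implementation: the Decision enum, the decide function, and the cooldown parameter are hypothetical names, and treating a missing installed cert (current_cert_pem returning None) as due for renewal is an assumption the text does not confirm.

```rust
use std::time::{Duration, SystemTime};

#[derive(Debug, PartialEq)]
enum Decision { Renew, Skip, CoolingDown }

fn decide(
    now: SystemTime,
    not_after: Option<SystemTime>,   // parsed from the installed cert, if readable
    renew_threshold_days: u64,
    last_failure: Option<SystemTime>,
    cooldown: Duration,              // hypothetical per-cert failure cooldown
) -> Decision {
    let threshold = Duration::from_secs(renew_threshold_days * 86_400);

    let due = match not_after {
        // Assumption: nothing readable on disk means we should try to issue.
        None => true,
        Some(expiry) => match expiry.duration_since(now) {
            Ok(remaining) => remaining < threshold,
            Err(_) => true, // notAfter is already in the past
        },
    };
    if !due {
        return Decision::Skip;
    }

    // A recent failure suppresses the retry so a flaky CA isn't hit every tick.
    if let Some(failed_at) = last_failure {
        let elapsed = now.duration_since(failed_at).unwrap_or(Duration::ZERO);
        if elapsed < cooldown {
            return Decision::CoolingDown;
        }
    }
    Decision::Renew
}
```

With renew_threshold_days set to 30, a cert with 45 days left is skipped and one with 10 days left is queued, unless its last attempt failed within the cooldown window.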
Audit store
Every renewal opens a row, appends step events, and closes with a status. Two backends:
- SqliteAuditStore (default). Single-file SQLite, no external service. Good for single-node deployments.
- SurrealAuditStore. Connects to an existing SurrealDB. Required for cluster federation: the lock and cert distribution rows live in the same database.
Cluster
When cluster.enabled = true and audit is SurrealDB, the daemon runs a SurrealClusterCoordinator that holds a lock at cluster_lock:singleton with a TTL refresh. The leader's renewer pipeline writes successful issuances to issued_cert rows. An InstallSyncTask on every node polls the table and runs the local install backend with the operator-pre-provisioned private key when the audit cert is fresher than what's installed locally. Private keys are never distributed through the audit store.
See the federation runbook for the operator-side walkthrough.
Backends
What ships today, what's coming, and the design choices behind each.
CA backends
Namecheap (traditional reissue)
kind: namecheap with ssl_id: <numeric SSL id>. Uses Namecheap's namecheap.ssl.reissue and namecheap.ssl.getInfo flow. Activation is one-shot: rota only handles reissue within an existing SSL subscription. First-time activation requires a long list of admin-contact fields rota does not model in the config. Operators activate once by hand in the Namecheap dashboard; rota handles every renewal after that.
DNS-01 only. The Namecheap reissue API folds every SAN under one DCV record, so rota's multi-challenge trait sees a single-element vec.
ACME (RFC 8555)
kind: acme. Speaks Let's Encrypt, ZeroSSL (with External Account Binding), BuyPass, and any directory that follows the spec. Uses the instant-acme crate.
rota manages its own persistent ECDSA key per cert because operators rely on key continuity for cert pinning. The ACME submit path uses finalize_csr(csr_der) so the operator's key stays canonical across renewals.
The ACME backend walks the configured DCV solver's supported_kinds() to pick a challenge type per authorization. So dcv: { kind: webroot } automatically gets HTTP-01; dcv: { kind: cloudflare } automatically gets DNS-01.
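A minimal sketch of that selection, assuming the backend walks the solver's kinds in order and takes the first one the CA offers for the authorization. The function name pick_kind and the two-slice signature are hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum ChallengeKind { Dns01, Http01 }

// Walk the solver's supported kinds in order and return the first one the CA
// offers for this authorization; None means the pair cannot work together.
fn pick_kind(solver_supports: &[ChallengeKind], ca_offers: &[ChallengeKind]) -> Option<ChallengeKind> {
    solver_supports.iter().copied().find(|kind| ca_offers.contains(kind))
}
```

A webroot solver that reports only Http01 therefore never sees a DNS-01 challenge, even when the CA offers both.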
DCV backends
Namecheap DNS
kind: namecheap. DNS-01 via namecheap.domains.dns.{getHosts,setHosts}.
Watch out: Namecheap's setHosts is a full replacement of every record on the domain, not a per-record edit. Publishing one TXT therefore requires reading every existing record first, merging the new one in, and writing the merged set back. rota does this transparently.
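The read-merge-write step can be sketched on plain data. HostRecord and the two helper functions are hypothetical names for illustration; the sketch does not call the Namecheap API, it only shows how the record set sent to setHosts might be computed from the one getHosts returned.

```rust
#[derive(Clone, Debug, PartialEq)]
struct HostRecord {
    name: String,        // e.g. "@", "www", "_acme-challenge"
    record_type: String, // e.g. "A", "MX", "TXT"
    value: String,
    ttl: u32,
}

// Two records are "the same" for idempotency purposes if everything but the TTL matches.
fn same_record(a: &HostRecord, b: &HostRecord) -> bool {
    a.name == b.name && a.record_type == b.record_type && a.value == b.value
}

// Record set to write back when publishing: every existing record, plus the new
// one unless an identical record is already present.
fn with_record(existing: &[HostRecord], new: &HostRecord) -> Vec<HostRecord> {
    let mut merged = existing.to_vec();
    if !merged.iter().any(|r| same_record(r, new)) {
        merged.push(new.clone());
    }
    merged
}

// Record set to write back when cleaning up: everything except the challenge record.
fn without_record(existing: &[HostRecord], gone: &HostRecord) -> Vec<HostRecord> {
    existing.iter().filter(|r| !same_record(r, gone)).cloned().collect()
}
```

Because both helpers start from the full existing set, unrelated A and MX records survive the round trip, and repeating either operation leaves the set unchanged.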
Cloudflare DNS
kind: cloudflare. DNS-01 via Cloudflare's v4 API with Bearer-token auth. Token scopes: Zone.DNS:Edit on every zone rota will manage. rota does not support the legacy Global API Key.
Cloudflare's per-record edit API means rota doesn't have to read every record on the zone first. The flow: resolve the apex zone for the record name, look for an existing TXT match (idempotency), POST the record if absent. Removal mirrors the lookup-and-delete shape.
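The "resolve the apex zone" step can be sketched as a longest-suffix match. This is an assumption about one reasonable approach, not a description of rota's code: resolve_zone is a hypothetical helper, and the zone list is assumed to be lowercase names already fetched from the account.

```rust
// Pick the zone that owns `record_name`: the longest zone name that is either
// equal to the record name or a dot-separated suffix of it.
fn resolve_zone<'a>(record_name: &str, zones: &[&'a str]) -> Option<&'a str> {
    let name = record_name.trim_end_matches('.').to_ascii_lowercase();
    zones
        .iter()
        .copied()
        .filter(|zone| name == *zone || name.ends_with(&format!(".{zone}")))
        .max_by_key(|zone| zone.len())
}
```

Matching on ".zone" rather than a bare suffix keeps notexample.com from resolving to example.com, and preferring the longest match handles delegated sub-zones.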
Webroot (HTTP-01)
kind: webroot with directory: <document root>. rota writes the key authorization to <directory>/.well-known/acme-challenge/<token> (mode 644) and removes it after issuance. The operator's existing webserver (nginx, Caddy, Apache, anything that serves static files over HTTP on port 80) is responsible for actually exposing the directory.
Why webroot rather than a daemon-internal listener: most self-hosters already run a webserver on 80 and 443. Asking rota to bind 80 means coordinating port handoff (or running rota as root) for one purpose: serving a file of under a hundred bytes that the existing webserver could serve in its sleep.
Defensive against malformed challenge tokens: rota refuses path-shaped tokens (/, \, .., empty) so a misbehaving CA can't traverse out of the challenge directory.
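A minimal sketch of that check, assuming the rejected shapes are exactly the ones listed above. The function name challenge_path is hypothetical.

```rust
use std::path::{Path, PathBuf};

// Build the on-disk path for a challenge token, refusing anything path-shaped.
fn challenge_path(webroot: &Path, token: &str) -> Option<PathBuf> {
    let path_shaped = token.is_empty()
        || token.contains('/')
        || token.contains('\\')
        || token.contains("..");
    if path_shaped {
        return None;
    }
    Some(webroot.join(".well-known").join("acme-challenge").join(token))
}
```

RFC 8555 restricts tokens to the base64url alphabet, so a stricter variant could allowlist those characters instead of denylisting separators.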
Install backends
Synology DSM
kind: dsm with description: <DSM panel label>. Uses synowebapi to install the cert into DSM's certificate store. The cert id surfaces in the DSM Control Panel under the configured description.
Filesystem
kind: filesystem with directory: <path>. Lays the issued cert, chain, and private key down under predictable filenames so any service that reads disk-based PEM (nginx, HAProxy, Caddy, custom Rust + rustls) can pick them up.
Filenames mirror the certbot convention so existing reload scripts that grep for fullchain.pem and privkey.pem work unchanged. Writes are atomic per file: each artifact goes to a sibling .tmp, fsync, rename.
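The tmp, fsync, rename sequence looks roughly like this on Unix. It is a sketch, not rota's code: write_atomic is a hypothetical helper, and clearing a stale .tmp left behind by a crashed run is an assumption added so the requested mode always applies to a freshly created file.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::os::unix::fs::OpenOptionsExt;
use std::path::{Path, PathBuf};

// Write `contents` to `dest` so readers see either the old file or the new one,
// never a partial write. `mode` is e.g. 0o600 for keys, 0o644 for certs.
fn write_atomic(dest: &Path, contents: &[u8], mode: u32) -> io::Result<()> {
    // Sibling temp file: "<dest>.tmp" in the same directory, so the rename
    // stays on one filesystem.
    let mut tmp_name = dest.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = PathBuf::from(tmp_name);

    // Assumption: a leftover temp file from an interrupted run is safe to discard.
    let _ = fs::remove_file(&tmp);

    let mut file = OpenOptions::new()
        .write(true)
        .create_new(true)
        .mode(mode)
        .open(&tmp)?;
    file.write_all(contents)?;
    file.sync_all()?; // fsync the data before it becomes visible under `dest`

    fs::rename(&tmp, dest)
}
```

A stricter version might also fsync the parent directory after the rename so the new directory entry itself survives a power loss.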
nginx
kind: nginx with directory: <path> and optional reload_command: [<argv>]. Filesystem write plus an nginx reload subprocess.
Default reload is ["nginx", "-s", "reload"]. Operators on systemd typically override with ["systemctl", "reload", "nginx"] and a sudoers rule that keeps the daemon unprivileged. The reload runs without a shell wrapper, so argv entries are not interpreted: no globbing, no env interpolation. A non-zero exit surfaces as an Install error so the renewer records the failure on the audit log.
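Running the reload without a shell comes down to passing the argv straight to the process spawner. The sketch below uses only std::process::Command; run_reload and its String error are hypothetical stand-ins for whatever rota's Install error actually looks like.

```rust
use std::process::Command;

// Spawn argv[0] with argv[1..] as literal arguments. No shell is involved, so
// characters like `*`, `$VAR`, or `;` reach the program unmodified.
fn run_reload(argv: &[String]) -> Result<(), String> {
    let (program, args) = argv.split_first().ok_or("reload_command is empty")?;

    let status = Command::new(program)
        .args(args)
        .status()
        .map_err(|err| format!("failed to spawn {program}: {err}"))?;

    if status.success() {
        Ok(())
    } else {
        // A non-zero exit is reported to the caller rather than swallowed.
        Err(format!("reload command exited with {status}"))
    }
}
```

With the default config this would be called with ["nginx", "-s", "reload"]; the systemd override passes ["systemctl", "reload", "nginx"] the same way.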
HAProxy
kind: haproxy with directory:, socket_path:, and cert_storage_name:. Filesystem write plus HAProxy runtime API hot-swap.
The runtime API sequence:
set ssl cert <storage_name> <<EOL
<leaf + chain + key bundle>
EOL
commit ssl cert <storage_name>
No reload, no dropped TCP connections. HAProxy hands the new certificate to live SNI lookups on the next handshake. Requires HAProxy 2.1 or later (the release that added set ssl cert to the runtime API) with the admin socket exposed:
global
    stats socket /run/haproxy/admin.sock mode 660 level admin
Kubernetes Secret
kind: k8s_secret with namespace:, secret_name:, and optional kubeconfig_path:. Server-side applies a kubernetes.io/tls Secret. Drop-in for Ingress, Gateway, and any controller that consumes the standard TLS Secret shape.
Auth resolution:
- kubeconfig_path omitted: in-cluster ServiceAccount (run rotad as a Pod).
- kubeconfig_path set: load the named kubeconfig (run rotad outside the cluster).
Required RBAC on secrets in the target namespace: get, create, patch. Server-side apply with FieldManager "rota" so concurrent managers (cert-manager, helm, etc.) get clean conflict signaling rather than silent overwrites.
Alert backends
Email
kind: email. Lettre-backed SMTP. Both submission ports (587 STARTTLS, 465 SMTPS) are supported via tls:. Auth is username + password from a file the daemon reads at runtime; the password never sits in the parsed config tree.
Webhook
kind: webhook. POSTs a generic JSON envelope to a URL:
{"cert_id": "...", "kind": "renewal_failed", "message": "...", "timestamp": "RFC3339"}
Vendor-neutral on the wire. Slack-incoming, Discord, Microsoft Teams, and similar opinionated formats are out of scope: point a small relay (n8n, Pipedream, your own service) at this URL and translate. Keeps rota's wire format flat instead of growing a per-vendor bestiary.
Optional Bearer token auth from a file. Per-request timeout defaults to 10 seconds.
Roadmap
More CAs: Sectigo direct, GoDaddy. More DNS-01 solvers: Route 53, DigitalOcean, Porkbun. More install targets: a native HTTP-01 listener (instead of webroot), more reload integrations as operators surface needs.
Configuration reference
rota.yaml is the single source of truth for daemon settings, CAs, DCV solvers, install targets, alerts, and federation. The path defaults to /etc/rota/rota.yaml; override with --config <path> or the ROTA_CONFIG env var.
Top-level shape
daemon: {...} # daemon-wide settings
audit: {...} # optional; defaults to SQLite at daemon.database_path
namecheap: {...} # account-wide, required if any cert names namecheap
cloudflare: {...} # account-wide, required if any cert names cloudflare
acme: {...} # account-wide, required if any cert names acme
cluster: {...} # optional federation block
alerts: [...] # optional list of notification sinks
certs: [...] # required list of cert configs
daemon
daemon:
  database_path: /var/lib/rota/rota.db
  listen_addr: 127.0.0.1:7878
  socket_path: /var/run/rota.sock
  check_interval_seconds: 3600
  renew_threshold_days: 30
| Field | Default | Notes |
|---|---|---|
| database_path | /var/lib/rota/rota.db | SQLite audit DB. Auto-created mode 600. |
| listen_addr | 127.0.0.1:7878 | Dashboard HTTP listen. Bind behind a reverse proxy for external access. |
| socket_path | /var/run/rota.sock | UNIX socket the rota CLI talks to. |
| check_interval_seconds | 3600 | Scheduler sweep cadence. |
| renew_threshold_days | 30 | Renew when the installed cert's notAfter is closer than this. |
audit
Omit for SQLite at daemon.database_path (single-node default). For SurrealDB:
audit:
  kind: surrealdb
  endpoint: ws://surreal.internal:8000
  namespace: rota
  database: prod
  username: rota
  password_file: /etc/rota/secrets/surreal.password
endpoint accepts mem://, file://path, ws://, wss://, http://, https://. Embedded engines (mem://, file://) skip auth; remote engines need username and password_file.
Secrets and environment variables
Every *_file: field in this config can read from an environment variable instead of a file by setting the path to env:VAR_NAME. Every operator-set String field accepts ${VAR} interpolation against the process environment at config-load time. An unset referenced variable is a fatal startup error.
namecheap:
  api_key_file: env:NAMECHEAP_API_KEY # secret comes from env, not a file
  username: ${NAMECHEAP_USERNAME} # inline interpolation
  client_ip: ${NAMECHEAP_CLIENT_IP}
This pairs with any secret-injection mechanism that exports env vars to the daemon process. Common shapes:
- Doppler: run rotad as doppler run --project rota --config prd -- rotad --config /etc/rota/rota.yaml. Doppler injects NAMECHEAP_API_KEY, etc. into the child process.
- systemd LoadCredentialEncrypted=: drops decrypted material into $CREDENTIALS_DIRECTORY/<name>. Reference via env: if you also export the value through Environment=, or point *_file: directly at the credentials path.
- HashiCorp Vault Agent: writes a templated env file the systemd unit reads via EnvironmentFile=.
- Plain compose env_file: for hosts where Doppler is overkill.
Refusing to start when a referenced variable is unset means a misconfigured deploy fails loud at boot rather than silently calling a vendor API with an empty key.
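The ${VAR} substitution with fail-loud semantics can be sketched as below. It is an illustration, not rota's parser: interpolate is a hypothetical function, the lookup is passed in as a closure so the logic is testable without touching the real environment, and leaving an unterminated "${" as literal text is an assumption the text does not cover.

```rust
// Replace every ${NAME} in `input` using `lookup`; an unset name is an error.
fn interpolate<F>(input: &str, lookup: F) -> Result<String, String>
where
    F: Fn(&str) -> Option<String>,
{
    let mut out = String::with_capacity(input.len());
    let mut rest = input;

    while let Some(start) = rest.find("${") {
        let after = &rest[start + 2..];
        let end = match after.find('}') {
            Some(end) => end,
            None => break, // assumption: unterminated "${" is kept verbatim
        };
        let name = &after[..end];
        let value = lookup(name)
            .ok_or_else(|| format!("environment variable {name} is not set"))?;

        out.push_str(&rest[..start]);
        out.push_str(&value);
        rest = &after[end + 1..];
    }

    out.push_str(rest);
    Ok(out)
}
```

At config-load time the closure would be something like |name| std::env::var(name).ok(), and an Err would abort startup.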
CA accounts
namecheap
namecheap:
  api_key_file: /etc/rota/secrets/namecheap-api.key
  username: your-namecheap-username
  api_user: optional-sub-account-user # defaults to username
  client_ip: 192.0.2.1
client_ip must be on the account's whitelisted IPs in Namecheap, or the API rejects every call. The same credentials authenticate both the CA backend (reissue) and the DCV backend (DNS).
cloudflare
cloudflare:
  api_token_file: /etc/rota/secrets/cloudflare.token
Token scope: Zone.DNS:Edit on every zone rota manages. rota does not support the legacy Global API Key.
acme
acme:
  directory_url: https://acme-v02.api.letsencrypt.org/directory
  contact_email: ops@example.com
  account_credentials_file: /etc/rota/secrets/acme-account.json
  external_account_binding: # optional; ZeroSSL et al.
    kid: <CA-assigned key id>
    hmac_key_file: /etc/rota/secrets/zerossl.hmac
Common directory URLs:
- Let's Encrypt prod: https://acme-v02.api.letsencrypt.org/directory
- Let's Encrypt staging: https://acme-staging-v02.api.letsencrypt.org/directory
- ZeroSSL: https://acme.zerossl.com/v2/DV90
- BuyPass: https://api.buypass.com/acme/directory
account_credentials_file is created on first run; treat it like a private key (mode 0600).
cluster
Omit for single-node. To enable federation:
cluster:
  enabled: true
  node_id: host-a # unique per node
  lease_seconds: 60 # refresh cadence is lease/3 (20s here)
Requires audit.kind: surrealdb because the lock and cert blobs live in that database. See the federation runbook for end-to-end setup.
alerts
A list. Every event fans out to every entry, so operators can mix sinks:
alerts:
  - kind: email
    smtp_host: smtp.example.com
    smtp_port: 587
    tls: starttls # starttls (587), implicit (465), or none
    username: alerts@example.com
    password_file: /etc/rota/secrets/smtp.password
    from: rota@example.com
    to: [oncall@example.com]
  - kind: webhook
    url: https://hooks.example.com/incoming/abc
    bearer_token_file: /etc/rota/secrets/webhook.token # optional
    timeout_seconds: 10 # optional, default 10
certs
Each cert picks one CA, one DCV solver, one install target:
certs:
  - id: example-public # stable; used in logs, CLI, dashboard
    description: example.com marketing site
    domains: [example.com, www.example.com]
    key_path: /var/lib/rota/keys/example.com.key
    ca:
      kind: <namecheap | acme>
    dcv:
      kind: <namecheap | cloudflare | webroot>
    install:
      kind: <dsm | filesystem | nginx | haproxy | k8s_secret>
ca variants
ca: { kind: namecheap, ssl_id: 12345678 }
ca: { kind: acme }
dcv variants
dcv: { kind: namecheap }
dcv: { kind: cloudflare }
dcv: { kind: webroot, directory: /var/www/example }
install variants
install: { kind: dsm, description: My Public Site }
install: { kind: filesystem, directory: /etc/ssl/example }
install:
  kind: nginx
  directory: /etc/nginx/certs/example
  reload_command: [systemctl, reload, nginx] # optional, default [nginx, -s, reload]
install:
  kind: haproxy
  directory: /etc/haproxy/certs
  socket_path: /run/haproxy/admin.sock
  cert_storage_name: /etc/haproxy/certs/example.pem
install:
  kind: k8s_secret
  namespace: ingress-nginx
  secret_name: example-tls
  kubeconfig_path: /etc/rota/kubeconfig # optional, omit for in-cluster SA
Migration from earlier versions
v0.5 to v0.6
- rota.yaml: rename registrar: to dcv: on every cert. The kind values (namecheap, cloudflare) are unchanged; only the parent field name moves.
- New optional cluster: block enables multi-host federation.
- Wire protocol bumped from 1 to 2 (CertSummary.registrar_backend becomes dcv_backend). The rota CLI must upgrade alongside rotad; older clients hit a clean version-mismatch error rather than silent misparse.
Federation runbook
Multiple rotad instances pointing at the same SurrealDB elect a single leader to run the renewal scheduler. Followers stand by for failover and install cluster-distributed certs locally with operator-pre-provisioned private keys.
Why this exists
Two operator-side use cases:
- High-availability renewer. A single rotad is a single point of failure. If its host goes down within renew_threshold_days of a cert's notAfter and stays down, the cert lapses. A two-node cluster with leader election keeps renewals running through host failures.
- Multi-host install. A cert that fronts multiple machines (load balancers, service mesh ingress, redundant API servers) needs to land on each host. With federation, one node renews and every node installs locally.
Architecture
          ┌──────────────────────────┐
          │  SurrealDB (operator)    │
          │  namespace: rota         │
          │  database: prod          │
          │                          │
          │  cluster_lock:singleton  │
          │  issued_cert:<rows>      │
          │  renewal:<rows>          │
          │  renewal_event:<rows>    │
          └──────────────────────────┘
               ▲        ▲        ▲
               │        │        │
      ┌────────┘        │        └────────┐
      │                 │                 │
┌─────┴────┐      ┌─────┴────┐      ┌─────┴────┐
│ rotad A  │      │ rotad B  │      │ rotad C  │
│ leader   │      │ follower │      │ follower │
└──────────┘      └──────────┘      └──────────┘
 ↓ scheduler       ↓ install_sync    ↓ install_sync
 ↓ renewer         ↓ poll            ↓ poll
 ↓ install (local)
All nodes share one SurrealDB. (rota doesn't care whether that's a single instance or a SurrealDB cluster.) One node holds the lock at cluster_lock:singleton and runs the renewal scheduler; the others have their schedulers gated on is_leader() and skip silently.
The leader's renewer pipeline persists every successful issuance to the issued_cert table. Every node runs an InstallSyncTask that polls issued_cert and runs the local InstallBackend when the audit cert is fresher than what's installed locally. The leader runs one too, but its install_sync self-suppresses.
Trust model
The audit store carries cert PEM and chain PEM. Private keys are never written to the audit store.
Each cluster member's key_path private key is provisioned out-of-band: config-management, secrets manager, manual scp, whatever the operator already uses for sensitive material. The shared SurrealDB is in the trust boundary for cert metadata and renewal history but not for key material. If the database is compromised, an attacker can read which certs exist and when they were renewed; they cannot forge requests against the CA or impersonate any host.
Setup
1. Provision SurrealDB
Operators who already run SurrealDB can skip ahead. Otherwise, the simplest setup is a single surreal instance behind a reverse proxy on a stable host:
surreal start --user root --pass <root-password> file:///var/lib/surrealdb
Then create the namespace and database for rota:
surreal sql --user root --pass <root-password> --ns rota --db prod
> DEFINE NAMESPACE rota;
> DEFINE DATABASE prod;
2. Provision per-cert private keys on each node
Pick the key_path directory each node will use. Mode 0700. Place the same private key file on every cluster member that participates in installing this cert:
# On every node:
install -d -m 0700 /var/lib/rota/keys
install -m 0600 example.com.key /var/lib/rota/keys/example.com.key
The keys must be byte-identical across nodes. rota uses one key per cert (no per-node keys), so the single distributed cert matches the private key on every node that serves it.
3. Configure each node
Each rota.yaml is the same except for cluster.node_id:
daemon:
  database_path: /var/lib/rota/rota.db # local SQLite for local audit only; the shared audit lives in SurrealDB
  listen_addr: 127.0.0.1:7878
  socket_path: /var/run/rota.sock
  check_interval_seconds: 3600
  renew_threshold_days: 30
audit:
  kind: surrealdb
  endpoint: wss://surreal.internal:8000
  namespace: rota
  database: prod
  username: rota
  password_file: /etc/rota/secrets/surreal.password
cluster:
  enabled: true
  node_id: host-a # different per node: host-a, host-b, host-c
  lease_seconds: 60
# ... ca / dcv / alerts / certs blocks identical across nodes
4. Start each node
# host-a:
rotad --config /etc/rota/rota.yaml &
# host-b:
rotad --config /etc/rota/rota.yaml &
# host-c:
rotad --config /etc/rota/rota.yaml &
Whichever node wins the initial lock acquisition becomes leader. The others log cluster: still follower and stand by.
Verifying
Who's the leader?
# On every node:
rota status
Each node shows the same cert table (it's pulled from the shared audit). rotad's logs differentiate:
INFO cluster: acquired leader lock # leader
INFO cluster: still follower # followers
A direct query against SurrealDB:
SELECT * FROM cluster_lock:singleton;
returns the holder's node_id and lease expiry.
Did the cert distribute?
After a successful renewal:
SELECT * FROM issued_cert WHERE cert_id = 'example-public' ORDER BY issued_at DESC LIMIT 1;
shows the fresh cert blob. Each follower's install_sync task picks it up on its next poll (within one check_interval_seconds) and installs it locally; after that, last_renewal_status in each node's rota status output reflects the fresh cert.
Failover
When the leader dies (host crash, kernel OOM kill, network partition), its lease lapses after lease_seconds (default 60s). The next polling follower acquires the lock and becomes the new leader; renewals pick back up automatically. No operator intervention needed.
If a leader recovers from a transient failure and re-acquires the lock, no harm done: the record_issued_cert writes are append-only, and latest_issued_cert is monotonic by issued_at.
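The lease rule behind this failover can be sketched as a pure function. This is a simplified model, not the SurrealClusterCoordinator itself: the Lease struct, try_acquire, and the use of plain epoch seconds are assumptions for illustration, and a real implementation has to perform the check and the write as one atomic operation in the database.

```rust
#[derive(Clone, Debug, PartialEq)]
struct Lease {
    holder: String,  // node_id of the current leader
    expires_at: u64, // epoch seconds
}

// A node may take the lock if nobody holds it, if it already holds it
// (a refresh), or if the previous holder's lease has lapsed.
fn try_acquire(current: Option<&Lease>, node_id: &str, now: u64, lease_seconds: u64) -> Option<Lease> {
    let available = match current {
        None => true,
        Some(lease) => lease.holder == node_id || lease.expires_at <= now,
    };
    if available {
        Some(Lease { holder: node_id.to_string(), expires_at: now + lease_seconds })
    } else {
        None
    }
}

// Refresh cadence from the cluster config comment: lease/3.
fn refresh_interval(lease_seconds: u64) -> u64 {
    lease_seconds / 3
}
```

With lease_seconds at 60, the leader refreshes every 20 seconds, so it has to miss several consecutive refreshes before a follower's attempt succeeds.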
Failure modes worth knowing
SurrealDB unreachable from the leader. The lease loop logs lock-check failures and demotes defensively. Followers see no leader; on their next sweep one of them tries to acquire and may succeed (if their network sees SurrealDB) or also fail. Renewals pause until SurrealDB is reachable from at least one node.
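Private key drift across nodes. If the key file at key_path differs between nodes, a follower's install still succeeds locally, but the distributed cert won't match that node's private key, so the server on that node will reject the pair or fail TLS handshakes. Audit this on each node by comparing public keys: the output of openssl x509 -noout -pubkey -in <cert> should match openssl pkey -pubout -in <key>. A modulus comparison with openssl rsa only works for RSA keys, and rota's per-cert keys are ECDSA.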
Cert distribution lag. Followers poll on check_interval_seconds. With the default 1h, a follower can be up to 1h behind the leader's renewal. Tune the interval down if you need tighter sync (the cost is more SurrealDB traffic, but it's a single SELECT per cert per tick).
Rolling back to single-node
Set cluster.enabled: false (or remove the block entirely) on the surviving node and restart it. The leader lock will lapse; no other node tries to acquire. The audit store retains its history. Point the surviving node at SQLite instead of SurrealDB if you want to fully decouple.