How We Keep Sky Protocol Running with 100% Uptime

Sky Protocol (formerly MakerDAO) is the oldest and largest on-chain stablecoin issuer. TechOps Services, the team that built KeeperHub, has been running the protocol's blockchain automations since 2018.
Seven years in, we've never missed a single execution.
The numbers
- $9.5B in total value locked, protected around the clock
- 100% uptime since inception
- ~30% gas savings through smart estimation algorithms
- 7+ years of continuous operation
What makes blockchain automation hard
Running automations at Sky's scale isn't like running a cron job. Gas spikes, node outages, and chain congestion can cause standard bots to fail silently. When that failure means a missed liquidation or a stalled governance vote, the consequences are measured in millions.
The core challenges we face daily:
Network unpredictability. Ethereum gas prices can 10x in minutes. Node providers go down without warning. Mempool congestion delays transactions for hours. Standard tools don't handle any of this gracefully.
Around-the-clock monitoring. Smart contract events don't wait for business hours. A critical governance action might trigger at 3am on a Sunday, and it needs to execute immediately.
Operational complexity at scale. Dozens of keepers run simultaneously, handling governance votes, yield farming, treasury movements, and protocol health checks. Each has different triggers, gas requirements, and failure modes.
The gap we saw early on: off-the-shelf automation tools lacked blockchain-specific reliability guarantees. No nonce management. No transaction replacement strategies. No exponential backoff to keep retrying until a transaction lands on Ethereum.
So we built what was missing.
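
As a rough illustration of those missing pieces (not KeeperHub's actual code), here is a minimal Python sketch of a retry loop that reuses one nonce, raises the fee on each replacement, and waits with exponential backoff. The `send` callback, the `Tx` type, the 12.5% fee bump, and the delay values are all assumptions made for the example; only the limit of 10 attempts comes from this article.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Tx:
    nonce: int           # same nonce on every attempt, so a retry replaces the pending tx
    max_fee_gwei: float  # fee offered for this attempt


def backoff_delays(base_s: float, factor: float, attempts: int, cap_s: float) -> List[float]:
    """Exponential backoff: base_s * factor**i for each attempt, capped at cap_s."""
    return [min(base_s * factor ** i, cap_s) for i in range(attempts)]


def send_with_retries(
    send: Callable[[Tx], bool],  # hypothetical sender: returns True once the tx is confirmed
    nonce: int,
    start_fee_gwei: float,
    max_attempts: int = 10,
    fee_bump: float = 1.125,     # assumed multiplier applied to each replacement
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[Tx]:
    """Resend with the same nonce and a higher fee until confirmed or out of attempts."""
    delays = backoff_delays(base_s=1.0, factor=2.0, attempts=max_attempts, cap_s=60.0)
    fee = start_fee_gwei
    for attempt in range(max_attempts):
        tx = Tx(nonce=nonce, max_fee_gwei=fee)
        if send(tx):
            return tx
        if attempt < max_attempts - 1:
            sleep(delays[attempt])  # wait longer after each failure
            fee *= fee_bump         # raise the fee so the replacement can be accepted
    return None
```

Reusing the nonce is what keeps a retry from becoming a second, duplicate transaction: the network sees each attempt as a replacement for the one still pending.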
What we built
What began as embedded infrastructure work within the original Maker Foundation now runs the protocol's daily operations. When the Maker Foundation dissolved in 2021, we handled the infrastructure migration to MakerDAO, and later the migration to Sky.
Here's what the stack looks like today:
| Capability | What it does |
|---|---|
| Smart gas estimation | Starts at network average, escalates only when necessary. Avoids overpayment while guaranteeing execution |
| Intelligent retry logic | Pending transaction detection, nonce reuse, and exponential backoff across up to 10 attempts per operation |
| Multi-node resilience | Fallback node infrastructure keeps things connected during provider outages |
| 24/7 global support | Distributed DevOps team monitors and responds to incidents in real time |
| Code-optional config | Automations managed through structured configuration, not fragile custom scripts |
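
To make two of these rows concrete, the Python sketch below shows one way a fee could start at the network average and escalate only on demand, and one way a call could fall back to another node when a provider is down. It is an illustration under stated assumptions, not the production implementation: `fee_schedule`, `call_with_failover`, the escalation step, and the fee ceiling are hypothetical names and values.

```python
from typing import Callable, Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")


def fee_schedule(network_avg_gwei: float, step: float, ceiling_gwei: float) -> Iterator[float]:
    """Yield fees starting at the network average, multiplying by `step` up to a ceiling."""
    if step <= 1.0:
        raise ValueError("step must be greater than 1 so the fee actually escalates")
    fee = network_avg_gwei
    while fee < ceiling_gwei:
        yield fee
        fee *= step
    yield ceiling_gwei  # final offer: never pay more than the ceiling


def call_with_failover(providers: Sequence[Callable[[], T]]) -> T:
    """Try each node provider in order and return the first successful result."""
    last_error: Optional[Exception] = None
    for provider in providers:
        try:
            return provider()
        except ConnectionError as exc:
            last_error = exc  # remember the failure and move on to the next node
    raise ConnectionError("all providers failed") from last_error
```

Because `fee_schedule` is a generator, a caller only pulls the next, higher fee when the previous attempt did not go through, which is where the savings over a fixed high fee would come from.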
What this powers
The automations we run for Sky Protocol cover four main areas:
Governance automation. Timely execution of on-chain votes and parameter changes. Delayed governance actions in DeFi can block protocol upgrades or leave parameters stale during market shifts.
Yield farming. Operations maintained without interruption through volatile conditions and every major market event since 2018.
Treasury management. Secured fund movements across the protocol. When you're moving assets at this scale, there's no room for a stuck transaction.
Protocol health monitoring. 24/7 oversight of smart contract state. We've caught and resolved incidents before they hit the protocol.
From custom infrastructure to KeeperHub
Everything we learned running Sky Protocol's infrastructure for seven years is now in KeeperHub. The same gas estimation, retry logic, and monitoring that keep a $9.5B protocol running are available to anyone building on-chain.
You don't need to spend years building custom keeper infrastructure anymore. That's the whole point.


