How We Keep Sky Protocol Running with 100% Uptime

Sky Protocol and KeeperHub

Sky Protocol (formerly MakerDAO) is the oldest and largest on-chain stablecoin issuer. TechOps Services, the team that built KeeperHub, has been running the protocol's blockchain automations since 2018.

Seven years in, we've never missed a single execution.

The numbers

  • $9.5B in total value locked, protected around the clock
  • 100% uptime since inception
  • ~30% gas savings through smart estimation algorithms
  • 7+ years of continuous operation

What makes blockchain automation hard

Running automations at Sky's scale isn't like running a cron job. Gas spikes, node outages, and chain congestion can cause standard bots to fail silently. When that failure means a missed liquidation or a stalled governance vote, the consequences are measured in millions.

The core challenges we face daily:

Network unpredictability. Ethereum gas prices can 10x in minutes. Node providers go down without warning. Mempool congestion delays transactions for hours. Standard tools don't handle any of this gracefully.

Around-the-clock monitoring. Smart contract events don't wait for business hours. A critical governance action might trigger at 3am on a Sunday, and it needs to execute immediately.

Operational complexity at scale. Dozens of keepers run simultaneously, handling governance votes, yield farming, treasury movements, and protocol health checks. Each has different triggers, gas requirements, and failure modes.

The gap we saw early on: off-the-shelf automation tools lacked blockchain-specific reliability guarantees. No nonce management. No transaction replacement strategies. No exponential backoff for guaranteed execution on Ethereum.

So we built what was missing.

What we built

What began as embedded infrastructure work within the original Maker Foundation now runs the protocol's daily operations. When the Maker Foundation dissolved in 2021, we handled the infrastructure migration to MakerDAO, and later the migration from MakerDAO to Sky.

Here's what the stack looks like today:

| Capability | What it does |
| --- | --- |
| Smart gas estimation | Starts at network average, escalates only when necessary. Avoids overpayment while guaranteeing execution |
| Intelligent retry logic | Pending transaction detection, nonce reuse, exponential backoff across up to 10 attempts per operation |
| Multi-node resilience | Fallback node infrastructure keeps things connected during provider outages |
| 24/7 global support | Distributed DevOps team monitors and responds to incidents in real time |
| Code-optional config | Automations managed through structured configuration, not fragile custom scripts |
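
To make the gas estimation, retry, and multi-node ideas more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not KeeperHub's actual code: the names (`Attempt`, `plan_attempts`, `send_with_fallback`, `execute`), the 12.5% price bump, the 15-second base delay, and the caps are all hypothetical values chosen for the example. Only the "start at network average", "reuse the nonce", "exponential backoff", "up to 10 attempts", and "fall back to another node" behaviors come from the capabilities listed above.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

MAX_ATTEMPTS = 10  # matches the "up to 10 attempts per operation" figure


@dataclass(frozen=True)
class Attempt:
    number: int            # 1-based attempt index
    nonce: int             # reused every time, so a resend replaces the pending tx
    gas_price_gwei: float  # price offered on this attempt
    wait_seconds: float    # how long to wait before checking and retrying


def plan_attempts(nonce: int, network_avg_gwei: float, bump: float = 1.125,
                  base_delay: float = 15.0, max_delay: float = 600.0,
                  cap_gwei: float = 500.0) -> List[Attempt]:
    """Start at the network average, then raise price and wait on each retry."""
    plan = []
    for i in range(MAX_ATTEMPTS):
        price = min(network_avg_gwei * bump ** i, cap_gwei)  # assumed escalation rule
        delay = min(base_delay * 2 ** i, max_delay)          # exponential backoff, capped
        plan.append(Attempt(i + 1, nonce, round(price, 3), delay))
    return plan


# Hypothetical provider interface: submits the transaction, returns its hash,
# and raises ConnectionError when the node is unreachable.
Sender = Callable[[Attempt], str]


def send_with_fallback(providers: Sequence[Sender], attempt: Attempt) -> Optional[str]:
    """Try each node in order; return the first hash, or None if all are down."""
    for send in providers:
        try:
            return send(attempt)
        except ConnectionError:
            continue
    return None


def execute(providers: Sequence[Sender], plan: Sequence[Attempt],
            is_mined: Callable[[str], bool],
            sleep: Callable[[float], None] = time.sleep) -> Optional[Attempt]:
    """Walk the plan until any submitted transaction is mined."""
    sent = []
    for attempt in plan:
        tx_hash = send_with_fallback(providers, attempt)
        if tx_hash is not None:
            sent.append((tx_hash, attempt))
        sleep(attempt.wait_seconds)
        # An earlier, cheaper submission may have landed in the meantime.
        for seen_hash, seen_attempt in sent:
            if is_mined(seen_hash):
                return seen_attempt
    return None
```

Because every attempt carries the same nonce, a later, higher-priced submission replaces the pending one instead of queuing behind it. In a real keeper, the `Sender` callables would wrap an actual RPC client and `is_mined` would check for a transaction receipt; both are left abstract here so the retry and fallback logic can be read on its own.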

What this powers

The automations we run for Sky Protocol cover four main areas:

Governance automation. Timely execution of on-chain votes and parameter changes. Delayed governance actions in DeFi can block protocol upgrades or leave parameters stale during market shifts.

Yield farming. Maintained without interruption through volatile market conditions and every major market event since 2018.

Treasury management. Secured fund movements across the protocol. When you're moving assets at this scale, there's no room for a stuck transaction.

Protocol health monitoring. 24/7 oversight of smart contract state. We've caught and resolved incidents before they hit the protocol.

From custom infrastructure to KeeperHub

Everything we learned running Sky Protocol's infrastructure for seven years is now in KeeperHub. The same gas estimation, retry logic, and monitoring that keep a $9.5B protocol running are available to anyone building on-chain.

You don't need to spend years building custom keeper infrastructure anymore. That's the whole point.

Start building with KeeperHub