AI Automation Fleet — Governance as Code

Problem

The gap between an agent demo and a trustworthy production system isn’t model performance. It comes down to three things: agents that act without approval, decisions that can’t be audited, and costs with no ceiling. I wanted to build and run a system that addresses all three directly, even at the scale of a personal setup.

Approach

A single personal Mac serves as the always-on host, with three layers stacked on top of it.

Tool layer: the MCP daemon. 58 tools (mail, calendar, files, browser, shell, and more) are exposed over the Model Context Protocol. Any action that causes a change requires human approval under policy. Every call is written to an audit log, and every policy decision records a reason, so the system can answer “why was this blocked?”
Operations layer: the autonomous-loop registry. More than 150 recurring jobs (157 at last count), including mail triage, news collection, security scanning, and backup verification, are managed under a single registry. Each loop is fail-closed-loud: a failure is never silent, it surfaces on a notification channel. A new loop must pass a gate (a runbook plus a lint check) before it can be registered.
Control layer: circuit breakers and ceilings. Any action that posts or sends externally is bounded by rate ceilings, circuit breakers, a hard deny-list, and a kill switch. The whole system is designed to run on local models alone, with no frontier model required. Rather than set a cost ceiling, I fixed the cost structure itself at $0.
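
As an illustration of how the control layer's checks might compose, here is a minimal sketch in Python. It is not the system's actual code: the class name OutboundGuard, its fields, the default values, and the manual reset() are all assumptions made for the example. It checks the kill switch first, then the deny-list, then the circuit breaker, then the rate ceiling, and returns a reason with every refusal.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class OutboundGuard:
    """Hypothetical guard for actions that post or send externally."""
    max_per_window: int = 5          # rate ceiling (assumed value)
    window_seconds: float = 3600.0   # length of the ceiling window (assumed value)
    failure_threshold: int = 3       # consecutive failures before the breaker opens
    deny_list: frozenset = frozenset()
    kill_switch: bool = False
    _sent_at: List[float] = field(default_factory=list)
    _failures: int = 0

    def check(self, target: str, now: Optional[float] = None) -> Tuple[bool, str]:
        """Return (allowed, reason). An allowed call reserves a slot in the window."""
        now = time.monotonic() if now is None else now
        if self.kill_switch:
            return False, "kill switch engaged"
        if target in self.deny_list:
            return False, f"target {target!r} is on the deny-list"
        if self._failures >= self.failure_threshold:
            return False, "circuit breaker open"
        # Forget sends that have aged out of the window.
        self._sent_at = [t for t in self._sent_at if now - t < self.window_seconds]
        if len(self._sent_at) >= self.max_per_window:
            return False, "rate ceiling reached"
        self._sent_at.append(now)
        return True, "allowed"

    def record_result(self, ok: bool) -> None:
        """A success clears the failure count; a failure moves toward opening."""
        self._failures = 0 if ok else self._failures + 1

    def reset(self) -> None:
        """Assumed design: once open, the breaker stays open until a human resets it."""
        self._failures = 0
```

A caller would run check() before every outbound action and record_result() after it. Keeping the breaker open until a manual reset is one possible choice that favors controllability over availability.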

Results

58 tools and 150+ autonomous loops have been in continuous service for months on a single host, aside from the incidents recorded in the postmortems.
Metered spend: $0 — the combination of local LLMs (Ollama/MLX) and free tiers doesn’t just “save money,” it makes spending structurally impossible.
The system also keeps a record of its own mistakes. An outage caused by a bad restart and a loop that double-fired are both preserved in the audit log and postmortem docs, and both have changed the next round of design.
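
The loops counted above are described as gated at registration and fail-closed-loud at run time. A minimal Python sketch of that pattern follows. The names Loop, Registry, and notify, and the way the runbook and lint result are represented, are hypothetical and stand in for whatever the real registry uses.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Loop:
    name: str
    job: Callable[[], None]
    runbook: str        # hypothetical: runbook path or text
    lint_passed: bool   # hypothetical: outcome of a lint step run elsewhere


class Registry:
    def __init__(self, notify: Callable[[str], None]) -> None:
        self._loops: Dict[str, Loop] = {}
        self._notify = notify  # e.g. a function that posts to a notification channel

    def register(self, loop: Loop) -> None:
        """Gate: refuse any loop that lacks a runbook or has failed lint."""
        if not loop.runbook.strip():
            raise ValueError(f"{loop.name}: a runbook is required")
        if not loop.lint_passed:
            raise ValueError(f"{loop.name}: lint must pass before registration")
        if loop.name in self._loops:
            raise ValueError(f"{loop.name}: already registered")
        self._loops[loop.name] = loop

    def run(self, name: str) -> bool:
        """Run one loop. Any exception is reported loudly, never swallowed."""
        loop = self._loops[name]
        try:
            loop.job()
        except Exception as exc:
            self._notify(f"[FAILED] {name}: {exc!r}")
            return False
        return True
```

In this sketch the only way into the registry is through register(), so a loop without a runbook cannot be scheduled at all, and the only way to run a loop is through run(), so every failure reaches the notifier.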

Limitations and failure modes

This is a personal project. It doesn’t address the permission separation, multi-user, and compliance requirements of an organizational setting.
Because it’s a single-host architecture, if that Mac goes down, everything stops — a deliberate choice that prioritizes controllability over availability.
Approval gates create fatigue. Tuning the policy for which actions get auto-approved is never-ending work.
This is not, and is not meant to be, fully autonomous — a human’s after-the-fact veto is part of the design.
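
The approval policy discussed above, including the question of which actions get auto-approved, might be sketched as a small decision function that always returns a reason and writes each decision to an audit log. This is an assumption-laden illustration in Python: the tool names, the three verdicts, and the in-memory list standing in for the audit log are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass(frozen=True)
class Decision:
    tool: str
    verdict: str  # "auto_approve", "needs_approval", or "deny"
    reason: str   # the answer to "why was this blocked?"


# Hypothetical policy tables; the tool names are illustrative only.
DENIED = {"shell.rm_rf"}
READ_ONLY = {"calendar.list", "files.read"}
AUTO_APPROVED_WRITES = {"files.write_scratch"}


def decide(tool: str) -> Decision:
    """Classify a tool call. Anything unclassified defaults to human approval."""
    if tool in DENIED:
        return Decision(tool, "deny", "tool is on the hard deny-list")
    if tool in READ_ONLY:
        return Decision(tool, "auto_approve", "read-only, causes no change")
    if tool in AUTO_APPROVED_WRITES:
        return Decision(tool, "auto_approve", "write explicitly allow-listed by policy")
    return Decision(tool, "needs_approval",
                    "state-changing or unclassified tool, human approval required")


def audit(decision: Decision, log: List[str]) -> None:
    """Append one JSON line per decision (a real log would be an append-only file)."""
    log.append(json.dumps(asdict(decision), sort_keys=True))
```

Under this sketch, tuning for approval fatigue means moving a tool name into AUTO_APPROVED_WRITES, which leaves a reviewable diff, and the default branch keeps unknown tools behind a human.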

What I learned

Governance isn’t a safety feature bolted onto an agent afterward — it’s the ground floor of the architecture. Lay down approval, auditing, and ceilings first, and the automation built on top of them can afford to be bold. Get the order backwards, and you can’t trust any of it.