Your agents run in production.
Now govern them.
Open-source decision infrastructure for agentic pipelines. Guards, consensus policies, and audit trails — so the 2am page never happens.
What happens when your agent acts at 2am?
Your agentic pipeline is in production. It's making decisions, calling APIs, modifying data. Without governance, every autonomous action is a potential incident.
No audit trail
When an agent makes a bad call, you can't reconstruct what happened. No decision log, no vote history, no artifact trail.
No guard rails
Agents execute actions without policy checks. A misconfigured prompt can trigger irreversible changes in production.
No human escalation
There's no mechanism to page a human when confidence is low. The agent either acts or doesn't — no middle ground.
No observability
You can't monitor decision quality over time. No metrics on guard accuracy, no SLAs on AI outputs, no post-mortem data.
Why your pipeline needs a decision firewall
consensus.tools interposes between agent intent and system action. Every decision passes through policy-driven guards before it becomes output.
How decisions flow through your pipeline
Your agent proposes an action. Guards evaluate it. Personas vote. A policy resolves the verdict. If risk is high, a human gets paged. Every decision is logged.
Guard Evaluates the Action
Your agent calls the guard before executing. The guard inspects the action payload — what type of action, what files are touched, what risk signals exist — and each evaluator persona votes independently.
import { GuardHandler } from "@consensus-tools/guards";

// `storage` is assumed to be initialized elsewhere; its setup is not shown here.
const handler = new GuardHandler({ storage });

const result = await handler.evaluate({
  boardId: "prod-pipeline",
  action: {
    type: "code_merge",
    payload: {
      files: ["src/auth/login.ts", "src/auth/session.ts"],
      tests_passed: true,
      author: "agent-codex-7",
    },
  },
});
// result.decision: "REQUIRE_HUMAN"
// result.risk_score: 0.82
// result.reason: "Sensitive auth files touched"

Consensus Policy Resolves the Vote
Once multiple personas have evaluated the action, the policy engine aggregates their votes. Nine algorithms are available, from simple majority to reputation-weighted. Resolution is deterministic: the same inputs always produce the same output.
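To make the determinism claim concrete, here is a minimal sketch of one way a simple-majority rule with a quorum and a risk threshold could be written as a pure function. It is not the library's `computeDecision`. The names `resolveSimpleMajority`, `Vote`, `Policy`, and `expectedVoters` are hypothetical, and the per-vote number is assumed to be a risk estimate, which the source does not state.

```typescript
// Illustrative sketch only; names and semantics are assumptions, not the library API.
type Choice = "YES" | "NO" | "REWRITE";
type Decision = "ALLOW" | "BLOCK" | "REQUIRE_HUMAN" | "REWRITE";

interface Vote {
  persona: string;
  choice: Choice;
  risk: number; // assumed: this persona's risk estimate in [0, 1]
}

interface Policy {
  quorum: number;        // assumed: minimum fraction of expected personas that must vote
  riskThreshold: number; // assumed: any vote above this risk forces human review
}

interface Resolution {
  decision: Decision;
  tally: { yes: number; no: number; rewrite: number };
  quorumMet: boolean;
}

function resolveSimpleMajority(
  votes: readonly Vote[],
  policy: Policy,
  expectedVoters: number,
): Resolution {
  const tally = { yes: 0, no: 0, rewrite: 0 };
  for (const v of votes) {
    if (v.choice === "YES") tally.yes += 1;
    else if (v.choice === "NO") tally.no += 1;
    else tally.rewrite += 1;
  }

  const quorumMet =
    expectedVoters > 0 && votes.length / expectedVoters >= policy.quorum;
  const maxRisk = votes.reduce((max, v) => Math.max(max, v.risk), 0);

  let decision: Decision;
  if (!quorumMet || maxRisk > policy.riskThreshold) {
    decision = "REQUIRE_HUMAN"; // too few votes, or too risky to decide automatically
  } else if (tally.yes * 2 > votes.length) {
    decision = "ALLOW"; // strict majority voted YES
  } else if (tally.rewrite * 2 > votes.length) {
    decision = "REWRITE"; // strict majority asked for a rewrite
  } else {
    decision = "BLOCK"; // no majority to proceed, so fail closed
  }
  return { decision, tally, quorumMet };
}

// The three votes from the example that follows, read as risk estimates.
const exampleVotes: Vote[] = [
  { persona: "security", choice: "NO", risk: 0.9 },
  { persona: "compliance", choice: "YES", risk: 0.7 },
  { persona: "ops", choice: "YES", risk: 0.8 },
];
const exampleResult = resolveSimpleMajority(
  exampleVotes,
  { quorum: 0.6, riskThreshold: 0.7 },
  3,
);
// exampleResult.decision is "REQUIRE_HUMAN": the highest risk (0.9) exceeds 0.7.
```

Because the function reads nothing but its arguments, replaying the same votes and policy reproduces the same decision, which is what makes a logged decision checkable after the fact. The library's own call for this step follows.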
const { decision, tally, quorumMet } = computeDecision(
  votes,    // security: NO (0.9), compliance: YES (0.7), ops: YES (0.8)
  policy,   // { quorum: 0.6, riskThreshold: 0.7 }
  "hybrid", // weight by reputation
);
// decision: "REQUIRE_HUMAN" (risk above threshold)
// tally: { yes: 2, no: 1, rewrite: 0 }
// quorumMet: true

Human Gets Paged (or Action Executes)
The verdict determines what happens next. ALLOW executes the action immediately. BLOCK stops the action and logs the reason. REQUIRE_HUMAN pages your on-call engineer via Slack, Teams, or PagerDuty and waits for approval. REWRITE modifies the action and re-evaluates it.
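The routing described above can be sketched as an exhaustive switch over the four verdicts. This is an illustration, not the library's runner: `nextStep`, the `NextStep` union, and the `channel` parameter are hypothetical names.

```typescript
// Hypothetical sketch of verdict routing; not the consensus.tools runner.
type Verdict = "ALLOW" | "BLOCK" | "REQUIRE_HUMAN" | "REWRITE";

type NextStep =
  | { kind: "execute" }
  | { kind: "stop"; logReason: string }
  | { kind: "page_human"; channel: string }
  | { kind: "rewrite_and_reevaluate" };

// Maps a verdict to the single next step the pipeline should take.
function nextStep(verdict: Verdict, reason: string, channel: string): NextStep {
  switch (verdict) {
    case "ALLOW":
      return { kind: "execute" };
    case "BLOCK":
      return { kind: "stop", logReason: reason };
    case "REQUIRE_HUMAN":
      return { kind: "page_human", channel };
    case "REWRITE":
      return { kind: "rewrite_and_reevaluate" };
    default: {
      // Compile-time exhaustiveness check: fails to type-check if a verdict is unhandled.
      const unreachable: never = verdict;
      throw new Error(`unhandled verdict: ${String(unreachable)}`);
    }
  }
}
```

The `never` assignment in the default branch makes the compiler reject the function if a fifth verdict is ever added without a matching case.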
const workflow = await runner.createWorkflow("PR Merge Gate", {
  nodes: [
    { id: "pr", type: "trigger", config: { source: "github.pull_request" } },
    { id: "guard", type: "guard", config: { guardType: "code_merge" } },
    { id: "review", type: "hitl", config: { channel: "slack", timeout: "4h" } },
    { id: "merge", type: "action", config: { action: "github.merge_pr" } },
  ],
});
// If guard returns ALLOW → skip HITL → merge
// If guard returns REQUIRE_HUMAN → pause at HITL → wait for Slack approval
// If guard returns BLOCK → stop → log to audit trail

Everything Gets Logged
Every decision creates an auditable artifact. The append-only ledger records: what action was proposed, who voted, what the risk scores were, what the final decision was, and whether a human intervened. Queryable via MCP, CLI, or the local dashboard.
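As a rough illustration of the record described above, the sketch below models one ledger entry and an in-memory append-only store. The real ledger is SQLite-backed. `AuditEntry`, `AppendOnlyLedger`, and every field name here are assumptions made for the example, not the library's schema.

```typescript
// Hypothetical record shape based on the fields listed above.
interface AuditEntry {
  readonly seq: number;
  readonly actionType: string;
  readonly votes: ReadonlyArray<{ persona: string; choice: string }>;
  readonly riskScore: number;
  readonly decision: string;
  readonly humanIntervened: boolean;
}

class AppendOnlyLedger {
  private readonly entries: AuditEntry[] = [];

  // Stores a shallow-frozen copy with the next sequence number.
  // There is deliberately no update or delete method.
  append(entry: Omit<AuditEntry, "seq">): AuditEntry {
    const stored: AuditEntry = Object.freeze({
      ...entry,
      seq: this.entries.length + 1,
    });
    this.entries.push(stored);
    return stored;
  }

  // Returns a new array, so callers cannot reorder or truncate the ledger.
  search(predicate: (e: AuditEntry) => boolean): AuditEntry[] {
    return this.entries.filter(predicate);
  }
}

const ledger = new AppendOnlyLedger();
ledger.append({
  actionType: "code_merge",
  votes: [
    { persona: "security", choice: "NO" },
    { persona: "compliance", choice: "YES" },
    { persona: "ops", choice: "YES" },
  ],
  riskScore: 0.82,
  decision: "REQUIRE_HUMAN",
  humanIntervened: true,
});
ledger.append({
  actionType: "code_merge",
  votes: [{ persona: "ops", choice: "YES" }],
  riskScore: 0.2,
  decision: "ALLOW",
  humanIntervened: false,
});

// Same filter as the "code_merge AND risk_score > 0.7" query, as a predicate.
const risky = ledger.search(
  (e) => e.actionType === "code_merge" && e.riskScore > 0.7,
);
```

Exposing only `append` and `search` is what keeps the store append-only in this sketch. A persistent ledger would need the same guarantee enforced at the storage layer.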
// Query via MCP in Claude Code:
// > audit.search "code_merge AND risk_score > 0.7"
// Or via CLI:
// $ consensus-tools board get board_prod --format json
// Or programmatically:
const events = await storage.searchAudit({
  field: "action_type",
  value: "code_merge",
  fromDate: "2026-03-01",
});
// => 47 decisions, 3 blocked, 12 escalated to human

Try it now
Run the consensus-engineer skill in Claude Code. It analyzes your project, recommends guard integration, scaffolds the setup, and proves it works.
I have an agentic coding pipeline that auto-merges PRs. Set up a consensus-tools code_merge guard that evaluates PRs touching auth, payment, or infrastructure files. Use SUPERMAJORITY policy with 3 personas (security, compliance, ops). Block any PR that fails CI. Escalate to Slack #eng-oncall if risk score exceeds 0.7.
Works with your agent stack
consensus.tools drops into existing agentic systems as the decision firewall. Use your current framework, then gate high-impact actions behind policy-driven guards.
Built as a layered monorepo
16 packages across 5 tiers. Dependencies flow downward only. Use what you need — from a single guard to the full orchestration stack.
Tier 0: Foundations
Schemas, secrets, and shared types
Tier 1: Primitives
Guards, telemetry, evaluators, integrations
Tier 2: Engine
Job engine, ledger, 9 consensus algorithms
Tier 3: Orchestration
DAG workflows, runtime decision firewall
Tier 4: Interface
SDKs, CLI, MCP tools, dashboard
Dependencies flow downward only (Tier 0 → 4). Enforced by dependency-cruiser in CI.
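The layering rule can be expressed as a small check over the dependency graph. The sketch below is not dependency-cruiser configuration; it only illustrates the rule itself, assuming that "downward" means a package may import from its own tier or a lower-numbered one. The package names and tier assignments are hypothetical.

```typescript
type Tier = 0 | 1 | 2 | 3 | 4;

interface Violation {
  from: string;
  to: string;
}

// Returns every edge where a package imports from a higher-numbered tier.
// Assumption: same-tier imports are allowed.
function findTierViolations(
  tiers: Readonly<Record<string, Tier>>,
  dependencies: ReadonlyArray<readonly [string, string]>,
): Violation[] {
  const violations: Violation[] = [];
  for (const [from, to] of dependencies) {
    const fromTier = tiers[from];
    const toTier = tiers[to];
    if (fromTier === undefined || toTier === undefined) {
      throw new Error(`unknown package in edge ${from} -> ${to}`);
    }
    if (toTier > fromTier) violations.push({ from, to });
  }
  return violations;
}

// Hypothetical packages, one per tier.
const exampleTiers: Record<string, Tier> = {
  schemas: 0,
  guards: 1,
  engine: 2,
  workflows: 3,
  cli: 4,
};

const exampleViolations = findTierViolations(exampleTiers, [
  ["guards", "schemas"], // Tier 1 -> Tier 0: allowed
  ["cli", "engine"],     // Tier 4 -> Tier 2: allowed
  ["schemas", "guards"], // Tier 0 -> Tier 1: violation
]);
// exampleViolations: [{ from: "schemas", to: "guards" }]
```

A check like this fails fast on unknown packages rather than silently skipping them, so a newly added package cannot dodge the rule by being left out of the tier map.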
Built in the open
The consensus-tools CLI, guard packages, and agent skills are Apache 2.0-licensed and production-grade. Run entirely on your own infrastructure.
npx consensus-tools init
CLI & Guards
Install and run locally. Configure guard policies, define consensus rules, review decisions from the terminal. No account required.
consensus.config.ts
Policy Engine
Define guard policies as code. 9 consensus algorithms, configurable thresholds, custom guard types. Version-controlled governance.
local-board
SQLite Dashboard
Full observability with a local dashboard. Decision logs, guard metrics, audit trails. Append-only SQLite ledger.