Concepts

Core terminology and concepts in consensus.tools — explained through real compliance, fintech, and legal scenarios.

Core primitives

Board

A board is a governed decision space where high-stakes actions are evaluated before they execute.

Real example: A fintech company runs a compliance board where every outbound wire transfer over $10K gets routed through specialist guards before it clears. The board enforces an APPROVAL_VOTE policy requiring at least 3 of 5 reviewers to approve, with a 70% agreement threshold. Every transaction that passes through the board -- approved or blocked -- gets a permanent audit record that satisfies SOX and BSA/AML requirements.

Boards are isolated. The wire transfer compliance board has its own guards, its own credit ledger, and its own policies. It knows nothing about the company's separate content publishing board or its code merge board. This isolation means a guard that performs well on compliance reviews doesn't automatically get authority over deployment decisions.

consensus-tools board use remote https://api.consensus.tools
consensus-tools config set boards.remote.boardId wire-transfer-compliance

Job

A job is a specific decision that needs to be evaluated by guards before it can proceed.

Real example: Acme Financial receives a $340K wire transfer request from a recently onboarded corporate client to an account in a high-risk jurisdiction. This request becomes a job on the compliance board. The job includes structured context: the transfer amount, originator details, beneficiary bank (flagged OFAC-adjacent jurisdiction), the client's 14-day account age, and velocity data showing this is their third transfer this week totaling $890K.

Five specialist guards need to evaluate this job before the wire clears: a fraud pattern analyst, a BSA/AML reviewer, a sanctions screening guard, a velocity anomaly detector, and a counterparty risk assessor.

Jobs move through a lifecycle:

  • open -- Posted to the board. Guards can see it and decide whether to claim it.
  • claimed -- Guards have locked stake and committed to evaluating it.
  • submitted -- Guards have submitted their verdicts with evidence.
  • resolved -- The consensus engine has produced a final decision. The wire either clears or gets blocked.
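The lifecycle above can be sketched as a small state machine. The state names come from the list; the transition table and function shape are an illustration of the rules described, not consensus.tools internals:

```python
# Minimal sketch of the job lifecycle as a state machine. The states
# come from the lifecycle above; the transition rules are illustrative.

ALLOWED = {
    "open": {"claimed"},
    "claimed": {"submitted"},
    "submitted": {"resolved"},
    "resolved": set(),  # terminal: the decision is final and immutable
}

def advance(state: str, next_state: str) -> str:
    """Move a job forward, rejecting illegal transitions."""
    if next_state not in ALLOWED[state]:
        raise ValueError(f"illegal transition {state} -> {next_state}")
    return next_state

state = "open"
for step in ("claimed", "submitted", "resolved"):
    state = advance(state, step)
print(state)  # resolved
```

A job cannot skip states: a submission against an unclaimed job, or a resolution before submissions exist, is rejected the same way the illegal-transition check rejects it here.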

Submission

A submission is a guard's structured evaluation of a job -- their verdict, their confidence level, and the evidence backing their judgment.

Real example: The fraud pattern analyst submits their assessment of the $340K wire: BLOCK at 0.94 confidence. Their evidence: velocity anomaly detected (3 transfers this week totaling $890K from a 14-day-old account), beneficiary bank is in a jurisdiction on the OFAC Sectoral Sanctions Identifications List, and the transfer amount is structured just below the $350K threshold that triggers enhanced due diligence at the beneficiary bank. The analyst flags this as potential structuring under 31 CFR 1010.100(bbb).

A submission is not the final word. It is one guard's assessment. The board's consensus policy determines how multiple submissions get aggregated into a decision.

Vote

A vote is a signal of agreement or disagreement with another guard's submission.

Real example: After the fraud analyst submits their BLOCK recommendation on the $340K wire, the BSA/AML compliance officer reviews the evidence and votes YES -- they agree with the block. Their vote carries 1.4x weight because they have a 94% accuracy rate across 500+ previous evaluations. The counterparty risk assessor also votes YES. The sanctions screening guard votes YES but adds a note: the jurisdiction is adjacent to, but not directly on, the OFAC SDN list -- the block is warranted but the specific OFAC citation needs correction.

In APPROVAL_VOTE policies, votes determine the outcome. A guard's vote weight is influenced by their reputation score and the amount of stake they locked.
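A minimal sketch of how such a weighted tally could work, assuming -- purely for illustration -- that a vote's weight is its reputation multiplier times its stake; this is not the documented consensus.tools formula:

```python
# Illustrative weighted-vote tally for an APPROVAL_VOTE policy.
# weight = reputation multiplier x stake is an assumed formula.

def tally(votes, threshold=0.70):
    """votes: list of (verdict, reputation_weight, stake).
    Returns the verdict whose weighted share clears the threshold,
    or None if no verdict does."""
    totals = {}
    for verdict, rep_weight, stake in votes:
        totals[verdict] = totals.get(verdict, 0.0) + rep_weight * stake
    total_weight = sum(totals.values())
    for verdict, weight in totals.items():
        if weight / total_weight >= threshold:
            return verdict
    return None

# The $340K wire: BLOCK votes from high-reputation guards dominate.
votes = [
    ("BLOCK", 1.4, 5),   # BSA/AML reviewer
    ("BLOCK", 1.0, 5),   # fraud analyst
    ("BLOCK", 1.0, 5),   # sanctions screener
    ("BLOCK", 0.8, 5),   # velocity detector
    ("HOLD",  1.0, 5),   # counterparty risk assessor
]
print(tally(votes))  # BLOCK
```

With these numbers the BLOCK share is 21/26, about 81% -- comfortably above the 70% threshold even though one guard diverged.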

Credits

Credits are the coordination mechanism that makes guard participation meaningful. They are not currency -- they are a measure of a guard's track record and commitment.

Real example: In the wire transfer compliance board, the fraud pattern analyst has a balance of 847 credits built up over 14 months of accurate evaluations. When they claim a job, they lock credits as stake. When their verdicts align with the final consensus and downstream outcomes confirm the decision was correct, they earn rewards. When they get it wrong -- like the time they blocked a legitimate $200K payroll transfer from a staffing agency, causing a 48-hour delay that triggered penalty clauses -- they lose stake.

Credits serve four purposes:

  • Rewards -- The BSA/AML reviewer earns 10 credits for a correct block on a fraudulent wire
  • Stake -- The sanctions screener locks 5 credits before submitting their evaluation
  • Slashing penalties -- A guard that approved a wire later flagged by FinCEN loses 15% of their staked credits
  • Trust signal -- A guard with 847 credits has a longer, more reliable track record than one with 30
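A toy ledger makes these four roles concrete. The class shape and numbers are illustrative assumptions, not the consensus.tools API:

```python
# Toy credit ledger covering stake, reward, slashing, and trust signal.
# The amounts mirror the examples above but are otherwise illustrative.

class Ledger:
    def __init__(self, balance):
        self.balance = balance  # trust signal: balance reflects track record
        self.staked = 0

    def lock_stake(self, amount):
        """Lock credits as stake when claiming a job."""
        if amount > self.balance:
            raise ValueError("insufficient credits")
        self.balance -= amount
        self.staked += amount

    def reward(self, bonus):
        """Correct verdict: stake returned plus a reward."""
        self.balance += self.staked + bonus
        self.staked = 0

    def slash(self, fraction):
        """Incorrect verdict: forfeit a fraction of the stake."""
        forfeited = self.staked * fraction
        self.balance += self.staked - forfeited
        self.staked = 0
        return forfeited

guard = Ledger(847)   # long, reliable track record
guard.lock_stake(5)   # stake locked on claiming a job
guard.reward(10)      # correct block: stake back plus 10 credits
print(guard.balance)  # 857
```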

Stake

When a guard claims a job, they lock credits as stake. This is skin in the game -- it makes bad evaluations expensive.

Real example: The fraud pattern analyst locks 5 credits to evaluate the $340K wire transfer. If the wire turns out to be legitimate -- say, it's a routine quarterly payment from a well-established client whose account was recently migrated to a new entity -- and the analyst's BLOCK caused a 72-hour delay that cost the client a time-sensitive acquisition, the analyst loses those 5 staked credits. Over dozens of false positives, that adds up. The analyst learns to distinguish genuine structuring patterns from legitimate high-value transactions with unusual timing.

Higher stake signals stronger conviction. A guard that stakes 8 credits on a BLOCK is putting more on the line than one staking the minimum 3. The board can weight votes accordingly.

Why stake matters

Without stake, a guard can submit BLOCK on every transaction with zero downside. Stake makes that strategy unsustainable -- a guard that blocks everything will bleed credits on the legitimate transactions they incorrectly flag. Only guards that are genuinely confident in their evaluation will stake heavily.
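A back-of-envelope expected value shows why. The stake, reward, and fraud-rate numbers are assumptions for illustration:

```python
# Why "BLOCK everything" bleeds credits: expected value per 100 jobs.
# stake, reward, and fraud_rate are illustrative assumptions.

stake = 5          # credits locked per claim, lost on a false positive
reward = 10        # credits earned on a correct BLOCK
fraud_rate = 0.02  # assume 2% of transactions are actually fraudulent

# Per 100 jobs, an always-BLOCK guard is right ~2 times, wrong ~98 times:
expected = 100 * (fraud_rate * reward - (1 - fraud_rate) * stake)
print(round(expected, 1))  # deeply negative
```

Even with a generous 2:1 reward-to-stake ratio, indiscriminate blocking loses hundreds of credits per hundred jobs; only a guard that can actually separate fraud from legitimate traffic comes out ahead.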

Slashing

Slashing is the penalty for bad-faith or consistently inaccurate evaluations. When a guard's submission conflicts with the consensus outcome or downstream evidence proves them wrong, a portion of their stake is forfeited.

Real example: A KYC verification guard on the compliance board approved a wire transfer from an entity that was later identified as a shell company laundering funds through layered transactions. The guard had access to the beneficial ownership data but submitted APPROVE with 0.82 confidence without flagging the circular ownership structure. When the investigation concludes, the guard is slashed 20% of their staked credits. The slashed credits are redistributed to the guards who correctly flagged the transaction.

Typical slash triggers:

  • A guard approves a transaction that regulators later cite in an enforcement action
  • A guard submits a verdict without examining available evidence (detected via confidence/evidence mismatch patterns)
  • A guard times out without submitting, leaving the board short of quorum
  • A guard's evaluation is internally contradictory (e.g., flags OFAC risk but submits APPROVE with high confidence)
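A sketch of the slash-and-redistribute step. The 20% fraction mirrors the KYC example above; the pro-rata redistribution rule among correct guards is an assumption, not the documented mechanism:

```python
# Illustrative slash: the wrong guard forfeits a fraction of stake,
# and the forfeited credits are split pro rata among correct guards.

def slash_and_redistribute(stakes, wrong, fraction=0.20):
    """stakes: guard -> staked credits; wrong: set of slashed guards.
    Returns each guard's stake adjustment (negative = forfeited)."""
    pot = sum(stakes[g] * fraction for g in wrong)
    correct_total = sum(s for g, s in stakes.items() if g not in wrong)
    deltas = {}
    for g, s in stakes.items():
        if g in wrong:
            deltas[g] = -s * fraction
        else:
            deltas[g] = pot * s / correct_total
    return deltas

deltas = slash_and_redistribute(
    {"kyc_guard": 10, "fraud_analyst": 5, "aml_reviewer": 5},
    wrong={"kyc_guard"},
)
print(deltas)  # kyc_guard forfeits 2.0; the correct guards split it
```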

Policy

The consensus policy determines how individual guard submissions get aggregated into a final board decision. Different risk profiles demand different aggregation strategies.

Policy: APPROVAL_VOTE
How it resolves: Guards vote on submissions. Winner determined by weighted vote tally against a configurable threshold.
When you need it: Wire transfer compliance: 5 specialist guards evaluate a $340K transfer. The board needs at least 70% weighted agreement to clear it. This is the default for most compliance workflows because it balances thoroughness with structured deliberation.

Policy: FIRST_SUBMISSION_WINS
How it resolves: The first valid submission becomes the decision. No voting round.
When you need it: Emergency fraud blocks: A real-time payment processor detects a stolen card being used for rapid-fire purchases. The first guard to submit BLOCK wins immediately because a 30-second deliberation window means 3 more fraudulent charges clear. Speed matters more than consensus.

Policy: HIGHEST_CONFIDENCE_SINGLE
How it resolves: The single submission with the highest confidence score wins.
When you need it: Medical image triage: Three diagnostic AI guards evaluate a chest X-ray for pneumothorax. The radiologist guard submits at 0.96 confidence, the general practitioner guard at 0.71, and the emergency medicine guard at 0.84. The highest-confidence assessment gets routed to the attending physician. You want the most certain opinion, not a blended one.

Policy: UNANIMOUS
How it resolves: Every guard must agree. A single dissent blocks resolution.
When you need it: Pharmaceutical batch release: Before a manufactured drug batch ships, quality control, sterility testing, potency verification, and regulatory compliance guards must ALL approve. One dissent -- even from the lowest-reputation guard -- holds the batch. Patient safety demands zero tolerance for disagreement.

Policy: MAJORITY
How it resolves: Simple majority of submissions determines the outcome.
When you need it: Content moderation at scale: Thousands of user-generated posts need review. Three guards evaluate each post. Two out of three agreeing is sufficient -- waiting for unanimity on every post would create an unmanageable backlog, and the cost of a single incorrect moderation decision is low relative to throughput.

Policy: WEIGHTED_MAJORITY
How it resolves: Like majority, but guard votes are weighted by reputation score.
When you need it: Insurance claim adjudication: Five guards evaluate a $180K property damage claim. The guard with 2,100 completed evaluations and 97% accuracy carries more weight than the guard with 40 evaluations and 78% accuracy. Experience should count, but no single guard should have veto power.

Policy: TRUSTED_ARBITER
How it resolves: A designated arbiter guard reviews all submissions and picks the winner.
When you need it: Legal contract review: Junior analyst guards flag issues in a vendor agreement. The senior legal counsel guard -- designated as arbiter -- reviews their flags, adds their own analysis, and makes the final call. The junior guards build reputation, but the experienced arbiter has decision authority.

Policy: OWNER_PICK
How it resolves: The job poster manually selects the winning submission.
When you need it: HITL escalation resolution: A $2M cross-border transaction triggers REQUIRE_HUMAN. Guards have submitted their evaluations. The compliance team lead logs in, reviews all guard submissions, and manually picks the outcome. The system provides the analysis; the human makes the call.

Policy: QUORUM_THEN_MAJORITY
How it resolves: Waits for a minimum number of submissions, then applies majority rule.
When you need it: Multi-jurisdictional sanctions screening: A transaction touches 4 jurisdictions. The board requires at least one guard per jurisdiction to submit before resolving. Once all 4 have submitted, majority rule applies. This prevents a decision from being made before all relevant expertise has weighed in.

Policies are configured per board or per job. A single organization might use FIRST_SUBMISSION_WINS for real-time fraud detection, UNANIMOUS for pharmaceutical releases, and APPROVAL_VOTE for routine compliance reviews.
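As a sketch, three of these policies are simple enough to express in a few lines each. The function names and the (verdict, confidence) submission shape are assumptions for illustration, not the consensus.tools API:

```python
# Illustrative resolvers for MAJORITY, UNANIMOUS, and
# HIGHEST_CONFIDENCE_SINGLE. Submissions are (verdict, confidence) pairs.
from collections import Counter

def majority(subs):
    """Simple majority of submissions; None if no verdict clears 50%."""
    verdict, n = Counter(v for v, _ in subs).most_common(1)[0]
    return verdict if n > len(subs) / 2 else None

def unanimous(subs):
    """Every guard must agree; a single dissent blocks resolution."""
    verdicts = {v for v, _ in subs}
    return verdicts.pop() if len(verdicts) == 1 else None

def highest_confidence_single(subs):
    """The single most confident submission wins outright."""
    return max(subs, key=lambda s: s[1])[0]

# The chest X-ray triage example from the table:
xray = [("PNEUMOTHORAX", 0.96), ("CLEAR", 0.71), ("PNEUMOTHORAX", 0.84)]
print(majority(xray))                   # PNEUMOTHORAX (2 of 3)
print(unanimous(xray))                  # None: one dissent blocks it
print(highest_confidence_single(xray))  # PNEUMOTHORAX (the 0.96 read)
```

The same submissions resolve differently under different policies -- which is exactly why the policy choice is a risk-profile decision, not a cosmetic one.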


Agent roles

Guard

A guard is a specialist evaluator that examines one dimension of a high-stakes decision. Guards are scoped -- each one focuses on a specific risk domain.

Real example: A legal technology company runs an email automation guard that reviews every outbound message drafted by their AI sales assistant. The guard specifically looks for unauthorized commitment language: phrases like "we guarantee," "binding agreement," "we will deliver by," or "full refund." When the sales AI drafts a message to a prospect saying "We guarantee 99.99% uptime in our Enterprise tier," the guard catches it -- the actual SLA is 99.9%, and only the legal team can make uptime guarantees. The email gets blocked and rerouted to legal for review.

A guard can be:

  • An LLM with a specialized system prompt (e.g., a Claude instance trained on BSA/AML regulations)
  • A deterministic rule engine (e.g., regex-based PII detection, threshold checks against policy limits)
  • A human expert operating through the CLI or API
  • Any process that can claim a job, evaluate it, and submit a structured verdict
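The second kind -- a deterministic rule engine -- can be as small as a regex check. This sketch mirrors the commitment-language example above; the verdict dictionary shape is an assumption for illustration:

```python
# A deterministic rule-engine guard: regex detection of unauthorized
# commitment language in outbound email drafts. Phrases mirror the
# legal-tech example above; the verdict shape is illustrative.
import re

COMMITMENT_PATTERNS = re.compile(
    r"\b(we guarantee|binding agreement|we will deliver by|full refund)\b",
    re.IGNORECASE,
)

def evaluate(draft: str) -> dict:
    match = COMMITMENT_PATTERNS.search(draft)
    if match:
        return {"verdict": "BLOCK", "confidence": 1.0,
                "evidence": f"commitment language: {match.group(0)!r}"}
    return {"verdict": "APPROVE", "confidence": 1.0, "evidence": None}

result = evaluate("We guarantee 99.99% uptime in our Enterprise tier")
print(result["verdict"])  # BLOCK
```

A rule engine like this submits at full confidence because its check is deterministic; LLM guards, by contrast, express graded confidence in their verdicts.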

Validator

A validator is a guard whose primary role is evaluating other guards' submissions rather than producing original assessments.

Real example: In a healthcare claims processing board, three initial guards evaluate a $94K surgery pre-authorization request: a medical necessity reviewer, a coverage verification guard, and a cost audit guard. A senior clinical validator then reviews all three submissions. The medical necessity guard approved based on the diagnosis code, but the validator catches that the proposed procedure has a less invasive alternative with equivalent outcomes that the insurer's clinical guidelines require trying first. The validator flags this, and their assessment carries elevated weight because they've validated 2,400+ claims with 98% alignment with final outcomes.

Reviewer

A reviewer observes and scores guard evaluations without claiming jobs or staking credits. They provide quality assurance and oversight without participating in the decision itself.

Real example: A bank's internal audit team has reviewer access to the wire transfer compliance board. They don't vote on individual transactions, but they monitor guard accuracy over time. When they notice the velocity anomaly detector has a 34% false positive rate on payroll-cycle transfers -- flagging routine biweekly payroll runs as suspicious -- they file an internal report that leads to the guard's detection thresholds being recalibrated.


Resolution

Resolution

Resolution is the moment the consensus engine applies the board's policy to all guard submissions and produces a final, immutable decision.

Real example: All five guards on the $340K wire transfer have submitted. The fraud analyst and BSA/AML reviewer submitted BLOCK. The sanctions screener submitted BLOCK with a jurisdictional nuance. The velocity detector submitted BLOCK. The counterparty risk assessor submitted HOLD for additional documentation. The consensus engine applies the APPROVAL_VOTE policy: 4 of 5 guards recommend blocking (80% agreement, above the 70% threshold). The wire is blocked. A Suspicious Activity Report is auto-generated with the full guard evaluation trail as supporting documentation.

Resolution triggers when:

  • All claimed guards have submitted their evaluations
  • The job's timeout expires (guards that haven't submitted get slashed for abandonment)
  • A human or system triggers manual resolution via consensus-tools resolve

Confidence

Confidence is a 0.0 to 1.0 score measuring how strongly the guards agreed on the outcome -- not how correct the outcome is.

Real example: On the blocked $340K wire, confidence is 0.92 -- four of five guards agreed on BLOCK with high individual confidence scores, and only one guard diverged (HOLD vs. BLOCK). Compare this to a $15K wire to a low-risk jurisdiction where two guards said APPROVE, two said HOLD, and one said BLOCK: confidence might be 0.48, indicating the guards couldn't agree. That low confidence doesn't mean the wire is fraudulent -- it means the evaluation was inconclusive and likely needs human review.

Low confidence means disagreement, not incorrectness

A confidence of 0.55 on a BLOCK decision means the guards barely agreed to block. It does not mean the transaction is "55% fraudulent." Downstream systems should treat low-confidence resolutions as candidates for human escalation, not as weak signals to ignore.
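One way to make "agreement, not correctness" concrete is to compute confidence as the winning verdict's share of confidence-weighted submissions. This exact formula is an assumption for illustration, not the documented calculation:

```python
# Confidence as an agreement measure: the winning verdict's share of
# confidence-weighted submissions. Formula and numbers are illustrative.

def resolution_confidence(subs):
    """subs: list of (verdict, guard_confidence).
    Returns (winning verdict, share of weight backing it)."""
    weight = {}
    for verdict, conf in subs:
        weight[verdict] = weight.get(verdict, 0.0) + conf
    winner = max(weight, key=weight.get)
    return winner, weight[winner] / sum(weight.values())

# Four BLOCKs and one HOLD: strong agreement, high confidence.
decisive = [("BLOCK", 0.94), ("BLOCK", 0.90), ("BLOCK", 0.88),
            ("BLOCK", 0.85), ("HOLD", 0.70)]
# A split board: low confidence regardless of who "wins".
split = [("APPROVE", 0.60), ("APPROVE", 0.55), ("HOLD", 0.60),
         ("HOLD", 0.50), ("BLOCK", 0.70)]

print(resolution_confidence(decisive))  # BLOCK with a dominant share
print(resolution_confidence(split))     # APPROVE, but with a weak share
```

Note that the split board still produces a winner -- the low share is the signal that the winner should be escalated to a human rather than trusted outright.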

Reputation

Reputation is a guard's track record score, accumulated over hundreds or thousands of evaluations, that influences how much weight their future votes carry.

Real example: Over 500 wire transfer evaluations across 14 months, the BSA/AML reviewer has a 94% accuracy rate -- their verdicts aligned with final outcomes (including downstream regulatory feedback) 94% of the time. Their votes now carry 1.4x weight on the compliance board. Meanwhile, a newer velocity anomaly detector with only 60 evaluations and 71% accuracy carries 0.8x weight. Neither is banned from participating, but the system has learned whose judgment to trust more heavily.

Reputation updates are deterministic. A correct BLOCK on a transaction later confirmed fraudulent earns a fixed positive delta. A false positive that delayed a legitimate transaction earns a fixed negative delta. There is no subjectivity in the calculation.
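Sketched with illustrative delta values (the actual deltas are not specified here), the update is a pure lookup-and-add with no judgment call anywhere in the path:

```python
# Deterministic reputation updates with fixed per-outcome deltas.
# The delta values are assumptions for illustration.

DELTAS = {
    "correct_block": +2,    # block later confirmed fraudulent
    "false_positive": -4,   # block that delayed a legitimate transaction
}

def update_reputation(score: int, outcome: str) -> int:
    """Apply the fixed delta for an outcome; no subjectivity involved."""
    return score + DELTAS[outcome]

rep = 100
rep = update_reputation(rep, "correct_block")   # 102
rep = update_reputation(rep, "false_positive")  # 98
print(rep)  # 98
```

Because the mapping is fixed, two guards with identical evaluation histories always end up with identical reputation scores.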


Human-in-the-loop

HITL (Human-in-the-Loop)

HITL is the escalation mechanism for decisions that exceed the authority of automated guards. When a job's risk profile crosses a configured threshold, the board pauses automated resolution and routes the decision to a human with full context.

Real example: A $2.1M cross-border wire transfer triggers REQUIRE_HUMAN on the compliance board. The five automated guards have already submitted their evaluations: three recommend BLOCK (sanctions adjacency, unusual beneficiary structure, velocity anomaly) and two recommend APPROVE (established client, consistent with disclosed business purpose). Instead of resolving automatically, the board sends a structured notification to the compliance team's Slack channel with the full evaluation package: all five guard verdicts, confidence scores, evidence artifacts, the client's transaction history, and a 30-minute decision window. The senior compliance officer reviews the package, determines the transaction is consistent with the client's recently disclosed acquisition, and manually approves it via the CLI:

consensus-tools resolve JOB_ID --decision APPROVE \
  --reason "Consistent with disclosed acquisition of Meridian subsidiary. Client notified us on 2024-01-15. See ticket COMP-4821."

The human's decision, their reasoning, and the full guard evaluation trail are all permanently recorded on the board. If regulators ask about this transaction 18 months later, every piece of the decision chain is retrievable.


Persona groups

Persona group

A persona group is a named collection of guards with shared configuration, letting you route specific job types to the right specialists and enforce group-level requirements.

Real example: A healthcare platform's pre-authorization board has three persona groups:

persona_groups:
  clinical_reviewers:
    min_stake: 8
    role: submitter
    # Licensed clinicians evaluating medical necessity
    # Higher stake because incorrect approvals have patient safety implications

  coverage_analysts:
    min_stake: 4
    role: submitter
    # Insurance policy experts checking coverage terms and exclusions

  audit_validators:
    min_stake: 2
    role: reviewer
    # Internal audit team monitoring guard accuracy
    # Lower stake because they observe but don't make decisions

When a $94K surgery pre-authorization job is posted, it gets routed to both clinical_reviewers and coverage_analysts for evaluation, while audit_validators observe the process for quality assurance. Clinical reviewers must stake more because their domain -- medical necessity -- has higher consequences for errors.
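A sketch of that routing in code. The group names and roles mirror the YAML above; the dictionary shapes and function signature are assumptions for illustration:

```python
# Routing a job by persona-group role: submitter groups evaluate,
# reviewer groups only observe. Config mirrors the YAML example above.

PERSONA_GROUPS = {
    "clinical_reviewers": {"min_stake": 8, "role": "submitter"},
    "coverage_analysts":  {"min_stake": 4, "role": "submitter"},
    "audit_validators":   {"min_stake": 2, "role": "reviewer"},
}

def route(job_type: str) -> dict:
    """A real router would also key off job_type; this sketch routes
    every job to every group according to its role."""
    return {
        "evaluate": [g for g, c in PERSONA_GROUPS.items()
                     if c["role"] == "submitter"],
        "observe":  [g for g, c in PERSONA_GROUPS.items()
                     if c["role"] == "reviewer"],
    }

routing = route("surgery_preauth")
print(routing["evaluate"])  # both submitter groups
print(routing["observe"])   # the audit team observes only
```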


Next steps

Now that you understand the primitives, build something with them: