The research behind the practice.
Every claim Cohorte makes about trust, verification, sovereignty or agent governance traces back to a paper, a repository, or a free playbook. This is the index.
Why this page exists.
Most AI-governance consultancies appeared after the topic became fashionable. They sell decks. Cohorte's founder published the conformal-reliability-certification method for black-box AI systems, the 10,000-trial taxonomy of agent-exploitation surface, the orchestration architecture grounding the open-source stack, and the operating model behind the PwC AI Factory. The work is recorded here so a buyer can verify the practice rather than take it on faith. If a claim in a Cohorte briefing has no anchor on this page, the briefing says so explicitly.
Papers.
Four papers: three published as preprints on arXiv, one under review at Transactions on Machine Learning Research (TMLR). Each one is a load-bearing piece of the Trust & Governance briefings.
Black-Box Reliability Certification for AI Agents via Self-Consistency Sampling and Conformal Calibration
Given a black-box AI system and a task, at what confidence level can a practitioner trust the system's output? We answer with a reliability level, a single number per (system, task) pair, derived from self-consistency sampling and conformal calibration, that serves as a black-box deployment gate with exact, finite-sample, distribution-free guarantees. GPT-4.1 earns 94.6% on GSM8K and 96.8% on TruthfulQA; GPT-4.1-nano earns 89.8% on GSM8K and 66.5% on MMLU. Sequential stopping reduces API costs by approximately 50% without losing the guarantee.
Mapping the Exploitation Surface: A 10,000-Trial Taxonomy of What Makes LLM Agents Exploit Vulnerabilities
LLM agents with tool access can discover and exploit security vulnerabilities. The open question is which features of a system prompt trigger this behaviour. We test 37 prompt conditions across 12 psychological dimensions on 7 models in real Docker sandboxes, about 10,000 trials in total. Nine of twelve hypothesised attack dimensions produce zero exploitation. One dimension works: goal reframing (puzzle, CTF, easter egg). On Claude Sonnet 4, the puzzle framing triggers 38-40% exploitation despite an explicit safety instruction.
Context Kubernetes: An Orchestration Architecture for Enterprise Knowledge in Agentic AI Systems
Delivering the right knowledge, to the right agent, with the right permissions, at the right freshness, within the right cost envelope, across an entire organisation, is structurally analogous to the container-orchestration problem Kubernetes solved a decade ago. We introduce a declarative manifest, a reconciliation loop, and a three-tier permission model where agent authority is always a strict subset of human authority. The prototype (~7,000 lines, 92 tests) is evaluated across eight experiments. Without governance, agents serve phantom content in 26.5% of queries; governed routing eliminates phantom delivery. The three-tier model blocks attacks that RBAC does not.
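The strict-subset rule in the abstract above can be illustrated in a few lines. This is a minimal sketch, not the paper's implementation: it does not model the three tiers, which the abstract does not spell out, and the function name `delegate` and the permission strings are hypothetical.

```python
def delegate(human_perms: frozenset, requested: frozenset) -> frozenset:
    """Grant an agent only what its human principal holds, and never all of it."""
    # Drop anything the human does not hold.
    granted = requested & human_perms
    # "<" on Python sets is the proper-subset test: equal sets fail it.
    if not granted < human_perms:
        raise PermissionError(
            "agent authority must be a strict subset of human authority"
        )
    return granted


# Hypothetical permissions, for illustration only.
human = frozenset({"read:crm", "write:crm", "approve:payment"})
agent = delegate(human, frozenset({"read:crm", "delete:db"}))
# agent == frozenset({"read:crm"}); "delete:db" is dropped because the human lacks it.
```

A request for everything the human holds is rejected rather than trimmed, so the invariant fails loudly instead of silently. A human with no permissions cannot delegate at all under this sketch, since the empty set has no strict subset.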
Three Phases of Expert Routing: How Load Balance Evolves During Mixture-of-Experts Training
We model MoE token routing as a congestion game and track its effective congestion parameter across training. The trajectory reveals three phases: a surge phase where the router learns to balance load, a stabilisation phase where experts specialise under steady balance, and a relaxation phase where the router trades balance for quality. This non-monotone trajectory is invisible to post-hoc analysis of converged models. Studied across OLMoE-1B-7B (20 checkpoints) and OpenMoE-8B (6 checkpoints), with bootstrap confidence intervals on every estimate.
Try TrustGate.
The TrustGate paper is built on six experiments, five tasks and five models. The widget below runs on the actual calibration distributions from those runs. Pick a model and a task, slide the target confidence, watch the conformal cutoff m* move. There is no model call here: every number you see is published in the paper. Most buyers who get to this paragraph want a sense of "does this actually work?" before reading the proofs. This is that sense.
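For readers who want the mechanics behind the slider, here is a minimal sketch of how a conformal cutoff can be computed from calibration data. It uses standard split-conformal calibration on self-consistency rankings and may differ from the procedure in the paper or the trustgate repository. The function names `rank_of_truth` and `conformal_cutoff` and the toy scores are assumptions made for illustration.

```python
import math
from collections import Counter


def rank_of_truth(samples, truth):
    """1-based rank of the true answer among sampled answers, most frequent first.

    Returns (number of distinct answers + 1) if the truth was never sampled.
    """
    ordered = [answer for answer, _ in Counter(samples).most_common()]
    return ordered.index(truth) + 1 if truth in ordered else len(ordered) + 1


def conformal_cutoff(scores, confidence):
    """Split-conformal quantile: the k-th smallest score, k = ceil((n + 1) * confidence).

    Returns None when the calibration set is too small for the requested confidence.
    """
    n = len(scores)
    k = math.ceil((n + 1) * confidence)
    if k > n:
        return None
    return sorted(scores)[k - 1]


# Toy calibration set (not data from the paper): the true answer was the
# most frequent sample on 18 of 20 items and the second most frequent on 2.
calibration_scores = [1] * 18 + [2] * 2

m_star_80 = conformal_cutoff(calibration_scores, confidence=0.80)  # 1
m_star_90 = conformal_cutoff(calibration_scores, confidence=0.90)  # 2
```

Under the usual exchangeability assumption, keeping the top m* answers by sample frequency covers the true answer with at least the requested confidence on a new item. Raising the target confidence can only move the cutoff up, and a target the calibration set cannot support returns `None` instead of an unsupported number.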
Open-source stack.
The reference implementations behind the AI Operating System. Six repositories, each scoped to one job. Released as Cohorte AI on GitHub so they can be audited, forked, deployed and adopted independently of any commercial engagement.
trustgate
Reliability and trust layer for agent outputs. Self-consistency sampling, conformal calibration, the reliability-level method from the paper under review at TMLR.
guardrails
Policy, safety and compliance guardrails for agents. Input filtering, output validation, the patterns from briefing 03.
context-router
Smart routing of queries to the right context and data source. The retrieval layer with permission enforcement.
agent-monitor
Observability, logging and behavioural monitoring for agents in production. Drift detection, runtime signals, regulator-facing reporting.
agent-auth
Permissions, roles and authentication for agent actions. The three-tier permission model where agent authority is a strict subset of human authority.
context-kubernetes
The orchestration layer that ties the other five together. Declarative manifest, reconciliation loop, the reference architecture from the arXiv paper.
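The declarative-manifest and reconciliation-loop pattern named in the context-kubernetes entry can be sketched as one reconciliation pass. This is an illustration of the general pattern under stated assumptions, not the repository's API: `SourceSpec`, `reconcile`, the action names and the source names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceSpec:
    """One declared knowledge source and its freshness bound (hypothetical schema)."""
    name: str
    max_age_s: int


def reconcile(manifest, observed_age_s):
    """Compare declared state with observed state and return converging actions.

    manifest: list of SourceSpec (the desired state).
    observed_age_s: dict mapping each currently mounted source to its age in seconds.
    """
    actions = []
    declared = {spec.name for spec in manifest}
    for spec in manifest:
        age = observed_age_s.get(spec.name)
        if age is None:
            actions.append(("mount", spec.name))    # declared but absent
        elif age > spec.max_age_s:
            actions.append(("refresh", spec.name))  # present but stale
    for name in observed_age_s:
        if name not in declared:
            actions.append(("evict", name))         # present but never declared
    return actions


manifest = [SourceSpec("wiki", 3600), SourceSpec("crm", 60)]
observed = {"wiki": 100, "crm": 500, "old_share": 10}
plan = reconcile(manifest, observed)
# plan == [("refresh", "crm"), ("evict", "old_share")]
```

A real controller would run this pass repeatedly and execute the returned actions. The point of the pattern is that operators edit the manifest, and the loop works out the steps.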
Free playbooks.
Two long-form playbooks already published. They sit between the papers (technical) and the briefings (commercial). A buyer who wants to understand the operating model before getting on a call reads these.
The Enterprise Agentic Platform.
The four-layer reference architecture for putting agents into a serious organisation. Local agents on each employee's machine, a governance middleware that mediates every organisational call, a permission model where agent authority is a strict subset of human authority, and a reliability layer that proves the output. Ten chapters and four appendices, written for architects and CISOs.
From Autonomous Agents to Accountable Systems.
The argument behind the briefings. Why autonomy is the wrong target, why accountability is, and how to design a system whose outputs a human can sign for under audit. Pairs with briefing 02 (Verification & evaluation) and briefing 03 (Agent governance in production).
How this connects to the practice.
The research is the foundation. The practice is what we install in client teams.
The research record
Papers, repositories, playbooks. What grounds every claim. Citable, auditable, reproducible.
The five briefings
How the research applies to a buyer's stack. The Trust & Governance hub at /trust-and-governance.
The four programs
The Pilot, Team Bootcamp, Curriculum License, AI Readiness Program. Where the operating model gets installed in a team.
Honesty about the research.
Three of the four papers are preprints on arXiv, not peer-reviewed publications yet. The TrustGate paper is under review at TMLR, double-blind, not accepted. The two playbooks are practitioner-facing essays, free to read, not academic publications. We list them together because they are the work that grounds the practice. We do not call them more than they are.
Want to read the papers before the briefings ship?
The TrustGate paper is the load-bearing one. The exploitation-surface paper is the one CISOs ask about first. Email, and Charafeddine sends the preprints by reply.