Anatomy of an Enterprise Tool Library

Most teams reach the same wall on the same Friday afternoon. A handful of MCP servers worked fine in development. Now there are fifteen, half of them touch production, and nobody can answer the basic questions — who called what, with which credential, against which policy, and where is the log. The problem is not any single server. It is that there is no shared layer underneath them.

This post is about that layer. It describes how an Enterprise Tool Library is built: the six concerns it has to merge into one runtime, the fail-closed pipeline a tool call walks through, and the two ideas — DADL and Code Mode — that make the whole thing scale to dozens of backends without bloating the agent’s context. For the business framing in German, see the companion piece on dunkel.cloud (DE).

What an Enterprise Tool Library actually is

The term “MCP gateway” gets used for two very different things. One is a passthrough proxy that forwards tool calls to a single upstream server. The other is a governed execution layer that fronts every backend the organization owns. Only the second one is an Enterprise Tool Library.

The distinguishing property is not the number of backends. It is structural. A library has:

  • One authentication and authorization boundary. Every tool call — no matter which upstream system it ends up hitting — passes the same identity check and the same per-tool access decision. A single source of truth for “who is allowed to call what.”
  • One coherent audit line. Every call lands in the same store, with the same fields, queryable with one SQL query. Not “logs over here, MCP server logs over there, a CSV someone exported last quarter.”
  • One shared declarative description language. REST APIs, MCP servers, internal services — all surfaced through the same definition format, so policy, retry, pagination, and credential injection are runtime concerns instead of bespoke wrapper code.
  • A constant context footprint. Adding the twentieth backend does not double the tokens the agent has to carry. The cost of “more tools” is paid by the library, not by every prompt.

A single MCP server has none of these. A passthrough gateway has the first two at best, and only for one backend at a time. An Enterprise Tool Library is what you get when those four properties hold across the whole tool surface.

ToolMesh is our open-source implementation of that shape. The rest of this post walks through how the parts fit together.

ToolMesh organizes the work into six pillars. Each one solves a concrete problem, lives in a specific package, and has a single config touchpoint.

1. Any Backend — two ingress paths, one runtime

The library has to front anything the organization already runs. ToolMesh supports two ingress paths into the same execution pipeline:

  • MCP backends wrap existing MCP servers via HTTP or STDIO transport. If you already have a working server, you keep it; ToolMesh wraps it with the governance layer.
  • DADL backends describe REST APIs declaratively in YAML. No custom server code, no separate runtime — the description is the integration.

Both paths register through the same internal backend.Register() interface and produce tools that look identical to the model. The backend type becomes an implementation detail; the security guarantees do not depend on it. Configuration lives in a single file:

# config/backends.yaml
backends:
  - name: github
    type: rest
    file: github.dadl
  - name: cloudflare
    type: rest
    file: cloudflare.dadl
  - name: legacy-internal
    type: mcp
    transport: http
    url: http://internal-mcp:8080

Adding a backend is editing one file, not building a service.
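The shared registration path can be pictured with a small sketch. ToolMesh itself is written in Go and this post only names backend.Register(); everything below (the registerBackend and loadBackends names, the factory signature, the stub tool lists) is a hypothetical JavaScript illustration of the pattern, not the project's API.

```javascript
// Hypothetical sketch of a type-keyed backend registry (not ToolMesh's real code).
const factories = new Map();

// Each backend type registers one factory that turns a config entry into tools.
function registerBackend(type, factory) {
  if (factories.has(type)) throw new Error(`backend type already registered: ${type}`);
  factories.set(type, factory);
}

// Stub factories: a real one would parse the DADL file or connect to the MCP server.
registerBackend("rest", (cfg) => [{ name: `${cfg.name}_example`, source: cfg.file }]);
registerBackend("mcp", (cfg) => [{ name: `${cfg.name}_example`, source: cfg.url }]);

// Build one flat tool list from entries shaped like config/backends.yaml.
function loadBackends(entries) {
  return entries.flatMap((cfg) => {
    const factory = factories.get(cfg.type);
    // Fail closed: an unknown type aborts loading instead of being skipped.
    if (!factory) throw new Error(`unknown backend type: ${cfg.type}`);
    return factory(cfg);
  });
}

const tools = loadBackends([
  { name: "github", type: "rest", file: "github.dadl" },
  { name: "legacy-internal", type: "mcp", transport: "http", url: "http://internal-mcp:8080" },
]);
console.log(tools.map((t) => t.name));
```

Because both factories return the same tool shape, nothing downstream needs to know which ingress path a tool came from.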

2. Code Mode — flat tool surface, constant context

A naive MCP setup exposes every tool definition directly to the model. With one backend that is fine. With twenty it is ruinous: dozens of tool schemas, each with its own parameters and descriptions, all loaded into context before the agent has even read the user’s question. Tool metadata alone commonly runs to around 50,000 tokens in setups like this.

ToolMesh sidesteps this entirely. The model sees two meta-tools — discover_tools and execute_code — and a compact TypeScript interface describing the available functions. That interface is roughly 1,000 tokens regardless of whether one or twenty backends are attached. The agent calls discover_tools once to see the type signatures, then writes JavaScript against them:

const issues = await toolmesh.github_list_issues({
  owner: "DunkelCloud",
  repo: "ToolMesh",
  state: "open",
});
// Keep issues created within the last 7 days (864e5 ms is one day).
const recent = issues.filter(i => Date.now() - new Date(i.created_at) < 7 * 864e5);
return recent.map(i => ({ number: i.number, title: i.title }));

The execute_code runner parses that JavaScript with a Go AST walker, extracts each toolmesh.* call, and dispatches it through the normal pipeline. The model never executes arbitrary code against your infrastructure — it produces a script whose tool calls are statically extracted and authorized one by one. Code Mode is described in detail in /en/code-mode/.
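To make the "extract, then authorize one by one" step concrete, here is a deliberately simplified sketch. It swaps the Go AST walker for a regular-expression scan, which is far less robust (it would also match calls inside strings or comments), and the extractToolCalls and authorizeScript names and the cloudflare_list_zones tool are hypothetical.

```javascript
// Simplified stand-in for static call extraction: a regex scan, not an AST walk.
function extractToolCalls(script) {
  const pattern = /\btoolmesh\.([A-Za-z_][A-Za-z0-9_]*)\s*\(/g;
  const names = [];
  for (const match of script.matchAll(pattern)) {
    names.push(match[1]);
  }
  return names;
}

// Check every extracted call before anything runs; one denial rejects the script.
function authorizeScript(script, allowedTools) {
  const calls = extractToolCalls(script);
  const denied = calls.filter((name) => !allowedTools.has(name));
  return { calls, denied, allowed: denied.length === 0 };
}

const script = `
  const issues = await toolmesh.github_list_issues({ owner: "DunkelCloud", repo: "ToolMesh" });
  const zones = await toolmesh.cloudflare_list_zones({});
  return issues.length + zones.length;
`;
const verdict = authorizeScript(script, new Set(["github_list_issues"]));
console.log(verdict);
```

In this sketch the script is rejected as a whole because one of its two calls is not in the allowed set; whether the real runner rejects the whole script or only the denied call is not stated in this post.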

3. Credential Store — runtime injection, never in prompts

The concern most setups quietly ignore. A typical MCP configuration puts API keys in client configs, env vars on developer laptops, or — worst case — directly in the model’s context. None of those rotate well, none of those audit well, and any of them can leak into a transcript.

ToolMesh references credentials by name in the DADL file and resolves them server-side at execution time. The model sees the tool interface and the filtered response. It never sees the token. The credential layer is pluggable through credentials.Register():

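| Tier     | Backend                   | Use case                         |
|----------|---------------------------|----------------------------------|
| Embedded | Env vars (CREDENTIAL_*)   | Local development, single-tenant |
| Planned  | Infisical                 | Centralized secret management    |
| Planned  | HashiCorp Vault / OpenBao | Enterprise secret governance     |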

The embedded backend is the default and ships in the binary. The hosted backends are planned through the enterprise build tag — the registration pattern (inspired by Go’s database/sql drivers) is already in place, so adding a backend does not require touching the executor.
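A minimal sketch of server-side resolution for the embedded backend might look like the following. The mapping from a credential name to an upper-cased CREDENTIAL_ variable is an assumption made for illustration, and resolveCredential and buildAuthHeaders are hypothetical names.

```javascript
// Hypothetical resolver for the embedded env-var backend.
// Assumption: credential "github_token" maps to the variable CREDENTIAL_GITHUB_TOKEN.
function resolveCredential(name, env) {
  const key = "CREDENTIAL_" + name.toUpperCase().replace(/[^A-Z0-9]/g, "_");
  const value = env[key];
  // Fail closed: a missing or empty credential stops the call.
  if (typeof value !== "string" || value === "") {
    throw new Error(`credential not found: ${name}`);
  }
  return value;
}

// The secret is attached to the outbound request only; the tool arguments
// written by the model never contain it.
function buildAuthHeaders(auth, env) {
  if (auth.type !== "bearer") throw new Error(`unsupported auth type: ${auth.type}`);
  return { Authorization: `Bearer ${resolveCredential(auth.credential, env)}` };
}

const fakeEnv = { CREDENTIAL_GITHUB_TOKEN: "example-token" };
console.log(buildAuthHeaders({ type: "bearer", credential: "github_token" }, fakeEnv));
```

Passing the environment in as an argument keeps the sketch testable; a real deployment would read from the process environment or a secret manager.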

4. OpenFGA — ReBAC, honest about the default

Authorization in ToolMesh is relationship-based, powered by OpenFGA. The model is user → plan → tool: users belong to plans, plans have tool entitlements, and the executor asks OpenFGA whether the caller is allowed to invoke this specific tool before anything runs.

Two modes ship today:

| OPENFGA_MODE     | Behavior                                        |
|------------------|-------------------------------------------------|
| bypass (default) | No authorization checks. Dev-only.              |
| restrict         | OpenFGA enforced on every tool call. Production. |

The default is bypass so that docker compose up produces a working instance without forcing every contributor through OpenFGA setup. That is a deliberate developer-experience choice, not a security claim. Production deployments switch to restrict and run ./config/openfga/setup.sh to seed the model. Defaulting to bypass is honest about where the dev surface starts; the Authorization page documents the production checklist.
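The user → plan → tool check and the two modes can be sketched with an in-memory stand-in. This does not use the OpenFGA API; the users, plans, tool names and the isAllowed function are all hypothetical, and treating an unrecognized mode as a denial is an assumption of the sketch.

```javascript
// In-memory stand-in for the user → plan → tool relationship (not the OpenFGA API).
const memberships = new Map([
  ["alice", ["pro"]],
  ["bob", ["free"]],
]);
const entitlements = new Map([
  ["free", new Set(["github_list_issues"])],
  ["pro", new Set(["github_list_issues", "github_create_issue"])],
]);

function isAllowed(mode, user, tool) {
  if (mode === "bypass") return true;    // dev-only: no checks at all
  if (mode !== "restrict") return false; // unknown mode: fail closed (assumption)
  const plans = memberships.get(user) ?? [];
  // Allowed only if at least one of the user's plans carries the tool.
  return plans.some((plan) => entitlements.get(plan)?.has(tool) === true);
}

console.log(isAllowed("restrict", "alice", "github_create_issue"));
console.log(isAllowed("restrict", "bob", "github_create_issue"));
console.log(isAllowed("bypass", "bob", "github_create_issue"));
```

The last call shows why bypass is a development default only: the same request that restrict denies goes through unchecked.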

5. Output Gate — deterministic JS, with room to grow

Not every response should reach the model unchanged. Customer records contain PII. Internal systems return metadata the agent does not need. Error messages occasionally leak infrastructure details.

Today, ToolMesh ships Layer 1 of the gate: deterministic JavaScript policies executed in the embedded goja engine. A policy receives the response and the caller context and returns a filtered payload or a rejection:

// policies/redact-pii.js
export function onResponse(response, ctx) {
  if (ctx.user.callerClass === "trusted") return response;
  return redactEmails(redactPhones(response));
}

Policies are reviewable code committed to the repo, not black-box configuration. The gate sits on top of a pluggable evaluator registry (gate.RegisterEvaluator()) — additional layers are planned, including LLM-based content classification for compliance use cases. Today we ship one layer. We do not pretend to ship more. See /en/output-gate/.
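The policy above calls redactEmails and redactPhones without showing them. One way they could be written is sketched below; the helper bodies, the regular expressions and the placeholder strings are assumptions for illustration, and the patterns are intentionally simple rather than exhaustive. The export keyword is dropped here so the snippet runs as a plain script.

```javascript
// Hypothetical helpers; the patterns are deliberately simple and will miss edge cases.
const EMAIL = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
const PHONE = /\+?\d[\d\s().-]{7,}\d/g;

// Walk any JSON-like value and apply `fn` to every string inside it.
function mapStrings(value, fn) {
  if (typeof value === "string") return fn(value);
  if (Array.isArray(value)) return value.map((v) => mapStrings(v, fn));
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(Object.entries(value).map(([k, v]) => [k, mapStrings(v, fn)]));
  }
  return value;
}

function redactEmails(response) {
  return mapStrings(response, (s) => s.replace(EMAIL, "[redacted-email]"));
}

function redactPhones(response) {
  return mapStrings(response, (s) => s.replace(PHONE, "[redacted-phone]"));
}

// Same decision logic as the policy in the text.
function onResponse(response, ctx) {
  if (ctx.user.callerClass === "trusted") return response;
  return redactEmails(redactPhones(response));
}

const sample = {
  name: "Ada",
  contact: "ada@example.com, +49 30 1234567",
  notes: ["call +1 (555) 010-9999"],
};
console.log(onResponse(sample, { user: { callerClass: "standard" } }));
```

A loose phone pattern like this one can also hit long numeric identifiers, which is one reason policies of this kind benefit from being reviewed as code.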

6. Audit — slog and SQLite, SQL-queryable

Every tool call is recorded: who called it, which tool, with what parameters, what the result was, how long it took, and whether it succeeded. ToolMesh ships two audit backends — Go’s structured slog for log-style setups, and an append-only SQLite store for queryable compliance audits.

When someone asks “what did that agent do last Tuesday between 14:00 and 15:00?”, the answer is a SELECT query, not a forensic exercise across five log aggregators.
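The shape of that question can be sketched in memory. The record fields below (ts, caller, tool, status, latencyMs) and the SQL in the comment are hypothetical, since this post does not give the actual audit schema.

```javascript
// Hypothetical audit record shape; the real column names are not given in this post.
const auditLog = [];

// Append-only: records are copied, frozen, and only ever pushed.
function recordCall(entry) {
  auditLog.push(Object.freeze({ ...entry }));
}

// Roughly what a query like this would express against a hypothetical table:
//   SELECT * FROM audit WHERE caller = ? AND ts >= ? AND ts < ? ORDER BY ts
function callsBetween(caller, fromIso, toIso) {
  const from = Date.parse(fromIso);
  const to = Date.parse(toIso);
  return auditLog
    .filter((e) => e.caller === caller && Date.parse(e.ts) >= from && Date.parse(e.ts) < to)
    .sort((a, b) => Date.parse(a.ts) - Date.parse(b.ts));
}

recordCall({ ts: "2025-01-07T14:05:00Z", caller: "agent-7", tool: "github_list_issues", status: "ok", latencyMs: 120 });
recordCall({ ts: "2025-01-07T14:40:00Z", caller: "agent-7", tool: "github_create_issue", status: "denied", latencyMs: 3 });
recordCall({ ts: "2025-01-07T15:10:00Z", caller: "agent-7", tool: "github_list_issues", status: "ok", latencyMs: 98 });

const window = callsBetween("agent-7", "2025-01-07T14:00:00Z", "2025-01-07T15:00:00Z");
console.log(window.map((e) => `${e.ts} ${e.tool} ${e.status}`));
```

The half-open time window (inclusive start, exclusive end) avoids counting a call at exactly 15:00 in two adjacent hourly reports.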

The pillars matter, but their value comes from the order they run in. A single tool call walks this path:

┌─────────────────────────────────────────────────────────────────┐
│  Client (Claude Desktop / ChatGPT / CLI / hosted agent)         │
└──────────────────────────────┬──────────────────────────────────┘
                    ┌───────────────────────┐
                    │  1. Authentication    │  OAuth 2.1 PKCE / API key
                    │     → UserContext     │  → identity established
                    └──────────┬────────────┘
                    ┌───────────────────────┐
                    │  2. Authorization     │  OpenFGA: user × plan × tool
                    │     (fail closed)     │  → allowed or DENY
                    └──────────┬────────────┘
                    ┌───────────────────────┐
                    │  3. Credential        │  Resolved server-side,
                    │     Injection         │  never exposed to the model
                    └──────────┬────────────┘
                    ┌───────────────────────┐
                    │  4. Output Gate (in)  │  Optional pre-execution check
                    │     (fail closed)     │  → allowed or REJECT
                    └──────────┬────────────┘
                    ┌───────────────────────┐
                    │  5. Execute           │  MCP server or REST via DADL
                    │                       │  → raw response
                    └──────────┬────────────┘
                    ┌───────────────────────┐
                    │  6. Output Gate (out) │  PII redaction, shaping
                    │                       │  → filtered response
                    └──────────┬────────────┘
                    ┌───────────────────────┐
                    │  7. Audit             │  Append-only, queryable
                    │                       │  → SQLite / slog
                    └──────────┬────────────┘
                        Back to client

The pipeline is fail-closed: if AuthZ denies, nothing executes. If the gate rejects, nothing executes. If credential resolution fails, nothing executes. The default for every uncertainty is “do not run.” This is the inverse of how most ad-hoc setups behave, where a missing check silently means “proceed.”
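The fail-closed ordering can be sketched as a single function in which every stage either succeeds explicitly or aborts the call. The stage names and signatures are hypothetical, and writing an audit record for rejected calls as well is an assumption of this sketch rather than something the post specifies.

```javascript
// Hypothetical stage functions; each either returns a value or throws.
function runToolCall(request, stages) {
  const audit = { tool: request.tool, executed: false, outcome: "error" };
  try {
    const user = stages.authenticate(request);             // 1. identity
    if (stages.authorize(user, request.tool) !== true) {   // 2. anything but true denies
      throw new Error("authorization denied");
    }
    const secret = stages.resolveCredential(request.tool); // 3. throws if missing
    if (stages.gateIn(user, request) !== true) {           // 4. pre-execution check
      throw new Error("gate rejected request");
    }
    const raw = stages.execute(request, secret);           // 5. backend call
    audit.executed = true;
    const filtered = stages.gateOut(user, raw);            // 6. redaction, shaping
    audit.outcome = "ok";
    return filtered;
  } catch (err) {
    audit.outcome = `rejected: ${err.message}`;
    throw err;
  } finally {
    stages.audit(audit);                                   // 7. recorded either way (assumption)
  }
}

const log = [];
const stages = {
  authenticate: (req) => {
    if (req.apiKey !== "demo-key") throw new Error("unauthenticated");
    return { id: "alice" };
  },
  authorize: (user, tool) => tool === "github_list_issues",
  resolveCredential: () => "example-token",
  gateIn: () => true,
  execute: (req) => ({ tool: req.tool, items: 3 }),
  gateOut: (user, raw) => raw,
  audit: (entry) => log.push(entry),
};

console.log(runToolCall({ apiKey: "demo-key", tool: "github_list_issues" }, stages));
try {
  runToolCall({ apiKey: "demo-key", tool: "github_delete_repo" }, stages);
} catch (err) {
  console.log(err.message, log[log.length - 1]);
}
```

Comparing against `true` instead of checking for truthiness is the small detail that makes a stage fail closed: an undefined or malformed answer counts as a denial.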

CallerClass — the second axis nobody else has

Most gateways treat authorization as a single axis: who is the user? That is necessary but not sufficient. The same user, asking the same question, runs at a very different trust level when they are typing into a local Claude Desktop session versus when a hosted agent is making the call on their behalf as part of a scheduled workflow.

ToolMesh treats this as a first-class concept: CallerClass, attached to every request, with three values:

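| CallerClass | Typical origin                             | Typical filtering                            |
|-------------|--------------------------------------------|----------------------------------------------|
| trusted     | Local CLI, developer laptop                | Minimal: credentials redacted only           |
| standard    | Authenticated user via known client        | PII redaction, schema shaping                |
| untrusted   | Hosted agents, CI bots, third-party clients | Strict: no admin tools, aggressive redaction |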

CallerClass modulates two things at once: which tools are reachable at all (an untrusted caller can be blocked from admin-level tools even if the user’s plan would normally allow them) and what the Output Gate redacts before the response reaches the model. Same backend, same user, different trust envelope — enforced automatically.

This is the part that does not exist in passthrough gateways. They model the user. They do not model “who is asking on the user’s behalf.” For agent infrastructure, that second axis is where most of the real risk lives.
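How the two axes combine can be sketched as a small decision function. The policy values paraphrase the CallerClass table, but the exact structure (an adminTools flag, a list of redaction categories, the "internal-metadata" label) and the effectiveAccess name are hypothetical.

```javascript
// Hypothetical policy table; the values paraphrase the CallerClass table above.
const CLASS_POLICY = new Map([
  ["trusted", { adminTools: true, redact: ["credentials"] }],
  ["standard", { adminTools: true, redact: ["credentials", "pii"] }],
  ["untrusted", { adminTools: false, redact: ["credentials", "pii", "internal-metadata"] }],
]);

// Combine the user axis (plan entitlement) with the caller axis (CallerClass).
function effectiveAccess(planAllowsTool, toolIsAdmin, callerClass) {
  const policy = CLASS_POLICY.get(callerClass);
  if (!policy) return { reachable: false, redact: [] }; // unknown class: fail closed
  const reachable = planAllowsTool && (!toolIsAdmin || policy.adminTools);
  return { reachable, redact: reachable ? policy.redact : [] };
}

// Same user, same plan, same admin tool; only the caller differs.
console.log(effectiveAccess(true, true, "trusted"));
console.log(effectiveAccess(true, true, "untrusted"));
```

The plan check and the class check are combined with a logical AND, so CallerClass can only narrow what a plan grants and never widen it.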

Adding GitHub’s list_issues endpoint as a governed tool, in its entirety, is this:

# github.dadl
backend:
  name: github
  type: rest
  base_url: https://api.github.com
  auth:
    type: bearer
    credential: github_token # resolved at runtime, never in prompt
  defaults:
    errors:
      retry_on: [429, 502, 503]
      retry_strategy:
        max_retries: 3
        backoff: exponential
  tools:
    list_issues:
      method: GET
      path: /repos/{owner}/{repo}/issues
      access: read # → mapped to plan entitlements via OpenFGA
      description: "List issues in a repository"
      params:
        owner: { type: string, in: path, required: true }
        repo: { type: string, in: path, required: true }
        state: { type: string, in: query, default: open }
        per_page: { type: integer, in: query, default: 30 }
      pagination:
        strategy: link_header

Twenty-three lines. Drop the file into the registry, reference it from backends.yaml, restart. What you get for free:

  • Authentication — OAuth or API key on the inbound side, bearer token injection on the outbound side, both handled by the runtime.
  • Authorization. The access: read field slots the tool into the OpenFGA model; the request is denied before execution if the caller’s plan does not include it.
  • Credential isolation. The github_token credential is resolved from the credential store at execution time; the model never sees the value.
  • Retry and pagination — declared in the description, executed by the runtime.
  • Audit — every call lands in the audit store with caller, parameters, status, latency.
  • Output filtering — the gate runs on the response before the model sees it, applying whatever policy applies to the caller’s class.

None of this is wrapper code anyone wrote for GitHub specifically. It is what the runtime does for every DADL-described tool. Building the same governance around a hand-rolled MCP server is several hundred lines of boilerplate per backend — work most teams ship without and regret later.
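As an illustration of what "declared in the description, executed by the runtime" can mean for retry and pagination, the sketch below shows the three small decisions involved. The function names are hypothetical, and the 500 ms base delay is an assumption, since the DADL file only says "exponential".

```javascript
// Delay before retry number `attempt` (1-based) under exponential backoff.
// Assumption: 500 ms base delay, doubling each time.
function backoffDelayMs(attempt, baseMs = 500) {
  return baseMs * 2 ** (attempt - 1);
}

// Retry only for the declared status codes and only up to the declared limit.
function shouldRetry(status, attempt, retryOn, maxRetries) {
  return retryOn.includes(status) && attempt <= maxRetries;
}

// Extract the rel="next" URL from a Link header, or null on the last page.
// Simplification: assumes rel is the first parameter after the URL.
function nextPageUrl(linkHeader) {
  if (!linkHeader) return null;
  for (const part of linkHeader.split(",")) {
    const match = part.match(/<([^>]+)>\s*;\s*rel="?next"?/);
    if (match) return match[1];
  }
  return null;
}

const retryOn = [429, 502, 503];
console.log([1, 2, 3].map((n) => backoffDelayMs(n)));
console.log(shouldRetry(429, 1, retryOn, 3), shouldRetry(404, 1, retryOn, 3));
console.log(nextPageUrl('<https://api.example.test/items?page=2>; rel="next", <https://api.example.test/items?page=5>; rel="last"'));
```

A production runtime would typically also add jitter and honor a Retry-After header; both are left out here to keep the sketch short.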

The companion post Stop Rebuilding REST API Wrappers for MCP goes deeper on the DADL format and what disappears when you adopt it.
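As a rough picture of how a declarative tool entry can become an HTTP request, the sketch below turns the list_issues parameters into a URL. The buildRequest function and its behavior (how defaults and required parameters are handled) are assumptions for illustration, not the DADL runtime's actual logic.

```javascript
// Tool definition mirroring the list_issues entry above, as a plain object.
const listIssues = {
  method: "GET",
  path: "/repos/{owner}/{repo}/issues",
  params: {
    owner: { type: "string", in: "path", required: true },
    repo: { type: "string", in: "path", required: true },
    state: { type: "string", in: "query", default: "open" },
    per_page: { type: "integer", in: "query", default: 30 },
  },
};

// Hypothetical request builder: checks required params, applies defaults,
// and splits arguments between the path template and the query string.
function buildRequest(baseUrl, tool, args) {
  let path = tool.path;
  const query = new URLSearchParams();
  for (const [name, spec] of Object.entries(tool.params)) {
    const value = args[name] ?? spec.default;
    if (value === undefined) {
      if (spec.required) throw new Error(`missing required parameter: ${name}`);
      continue;
    }
    if (spec.in === "path") {
      path = path.replace(`{${name}}`, encodeURIComponent(String(value)));
    } else if (spec.in === "query") {
      query.set(name, String(value));
    }
  }
  const qs = query.toString();
  return { method: tool.method, url: baseUrl + path + (qs ? `?${qs}` : "") };
}

console.log(buildRequest("https://api.github.com", listIssues, { owner: "DunkelCloud", repo: "ToolMesh" }));
```

Because the parameter metadata drives the loop, the same function would serve any other tool described in the same shape, which is the sense in which no GitHub-specific wrapper code is needed.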

Most gateways scale linearly with the number of backends — every new server adds tool definitions that ride along in every prompt. Twenty backends, twenty schema bundles, fifty thousand tokens before the user has typed a word.

Code Mode breaks that scaling. The model sees discover_tools and execute_code, plus a compact TypeScript interface that lists what is available. Adding a backend grows the interface description, but not the per-request overhead — the agent looks up tools on demand instead of carrying them all the time.

In practice, a ToolMesh instance fronting the current public DADL registry (23 APIs and 2,982 individual tools across GitHub, Cloudflare, GitLab, Stripe, Hetzner Cloud, Linode, NetBox, DeepL and others) surfaces to the model at roughly the same context cost as a single hand-built MCP server with a dozen tools. That ratio is the entire point. Without it, the library shape does not work at all.

ToolMesh is early-stage. Some of the architecture is shipped, some is in motion. Being explicit about that is more useful than being aspirational.

Shipped today:

  • All six pillars wired through the fail-closed pipeline
  • Authentication (OAuth 2.1 PKCE, API keys, multi-user via users.yaml)
  • Authorization (OpenFGA, bypass / restrict modes)
  • Credential store with embedded env-var backend
  • Output Gate Layer 1 (deterministic goja JS policies)
  • Audit via slog and SQLite
  • MCP and DADL backends, registry-based
  • Code Mode with AST-extracted tool calls

Deliberate developer defaults — switch off for production:

  • OPENFGA_MODE=bypass (no authorization)
  • Debug logging on
  • Plain HTTP listener, with a reverse proxy assumed in front for TLS termination

These defaults make docker compose up work cleanly in development. They are documented as not-production. The Configuration page lists every variable and its production recommendation.

Planned, registered but not yet shipped:

  • Infisical and HashiCorp Vault credential backends (extension point exists)
  • Additional gate layers (LLM-based classification, compliance filters)
  • Expanded DADL registry coverage (community contributions welcome)

Calling these “planned” is the right level of confidence. The registration mechanics are in place; the work to wire them in is still ahead of us.

This article is the architectural reference. The detailed connector walkthroughs — how to populate NetBox from cloud APIs, how to put OPNsense behind a tool layer, how to integrate the HashiCorp stack — live in their own posts. The current installments:

If you operate infrastructure that would make a good worked example, the DADL registry is the place to start — most APIs that have OpenAPI specs can become DADL files in an afternoon.

The fastest path from this article to a running instance:

  1. Hosted demo. Try the Demo — a managed ToolMesh with a curated set of backends, no setup.
  2. Self-host quickstart. Clone the repo, docker compose up, point your MCP client at http://localhost:8080. The full walkthrough is in Getting Started.

An Enterprise Tool Library is not a library of tools. It is the layer underneath them that makes “any backend” a property of the runtime instead of a property of every wrapper. The six pillars, the fail-closed pipeline, and the CallerClass axis are what turn a pile of MCP servers into something an enterprise can actually operate.