MCP integration — Synthetic Data Generator

Developer documentation

Model Context Protocol (MCP)

MCP exposes DataGen as first-class tools inside MCP-aware hosts. We designed the tool surface for agentic use: an assistant can start from plain column names and still reach the quality of a carefully authored schema, while power users keep full control when they need it.

Under the hood, MCP maps cleanly to the same two modes as the REST API: AI generation (column names → inferred types) and schema generation (explicit field definitions). Your API key and tier limits apply the same way whether the host calls HTTP directly or invokes MCP tools.

Two MCP tool paths (mirroring the HTTP API)

Hosts discover tools by name. Prefer datagen_generate_ai when the conversation only has headers or a rough idea of a table; use datagen_validate + datagen_generate when someone pastes a schema or you need CSV/XML and tight type control.
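The routing rule in the previous paragraph fits in a small decision function. This is a hypothetical client-side helper, not part of DataGen or the MCP specification; `choose_tools` and its parameters are illustrative names, and the output format is left unset by default because this page does not state DataGen's default.

```python
from __future__ import annotations


# Hypothetical client-side helper (not part of DataGen or the MCP spec):
# restates the routing guidance and returns tool names in call order.
def choose_tools(has_schema: bool,
                 output_format: str | None = None,
                 need_tight_types: bool = False) -> list[str]:
    wants_csv_or_xml = (output_format or "").lower() in {"csv", "xml"}
    if has_schema or wants_csv_or_xml or need_tight_types:
        # Schema path: check the explicit field definitions, then generate.
        return ["datagen_validate", "datagen_generate"]
    # Agent path: only column names or a rough idea of the table are available.
    return ["datagen_generate_ai"]


print(choose_tools(has_schema=False))                        # ['datagen_generate_ai']
print(choose_tools(has_schema=False, output_format="CSV"))   # ['datagen_validate', 'datagen_generate']
```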

Agent path: datagen_infer_schema + datagen_generate_ai

The model can first call datagen_infer_schema to get a contract hash, then call datagen_generate_ai with expected_contract_hash, require_validate, and strict_contract set. Use datagen_generate_ai_stream instead when the host should show progressive status updates streamed over SSE.
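A minimal sketch of the contract-first sequence, assuming a hypothetical `call_tool(name, arguments)` hook that your MCP client exposes and that returns each tool's result as a dict. The key `contract_hash` in the datagen_infer_schema result is an assumption; check the actual result shape in your enablement materials.

```python
from typing import Any, Callable, Dict, List

# Hypothetical hook: invokes one MCP tool by name and returns its parsed result.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def generate_with_contract(call_tool: CallTool, field_names: List[str], count: int,
                           locale: str, domain_hint: str) -> Dict[str, Any]:
    shared = {"field_names": field_names, "locale": locale, "domain_hint": domain_hint}
    # Step 1: infer the schema and capture its contract hash.
    inferred = call_tool("datagen_infer_schema", shared)
    contract_hash = inferred["contract_hash"]  # ASSUMPTION: name of the result key
    # Step 2: generate rows locked to the inferred contract.
    return call_tool("datagen_generate_ai", {
        **shared,  # copies the shared arguments; `shared` itself is not modified
        "count": count,
        "expected_contract_hash": contract_hash,
        "require_validate": True,
        "strict_contract": True,
    })
```

Passing the hash back means generation is tied to the contract the agent just inspected, rather than to whatever types a second inference might produce.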

Example: a user says “I need fake rows for sku, warehouse_bin, qty_on_hand”, and the assistant calls one tool instead of hand-authoring types.

Schema path: datagen_generate

The model (or a human) supplies schema_fields with catalog type keys, the same JSON you would post to /api/v1/generate. Call datagen_validate first when you want the assistant to catch bad types before generation.

Example: “Generate 200 CSV rows exactly like this JSON schema” for a regression pack or data contract.

Illustrative tool arguments (conceptual)

# Agent path: contract-first
# 1. Infer the schema; this returns the contract hash used in step 2.
datagen_infer_schema(
  field_names=["invoice_id", "vat_trn", "amount_aed", "issued_at"],
  locale="en_AE",
  domain_hint="Gulf B2B invoices for QA"
)
# 2. Generate rows locked to that contract.
datagen_generate_ai(
  field_names=["invoice_id", "vat_trn", "amount_aed", "issued_at"],
  count=12,
  locale="en_AE",
  domain_hint="Gulf B2B invoices for QA",
  expected_contract_hash="hash_from_datagen_infer_schema",  # placeholder: the hash from step 1
  require_validate=true,   # MCP tool arguments are JSON, so booleans are true/false
  strict_contract=true
)

# Schema path: full control (truncated)
# 1. Validate the explicit field definitions before generating.
datagen_validate(schema_fields=[
  {"name": "invoice_id", "type": "uuid"},
  {"name": "amount_aed", "type": "amount"}
], count=12)
# 2. Generate with the same schema_fields.
datagen_generate(schema_fields=[...], count=12, output_format="csv")
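Per the MCP specification, a host sends each tool invocation as a JSON-RPC 2.0 `tools/call` request whose `params` carry the tool `name` and an `arguments` object. The sketch below shows what the schema-path calls above look like in that envelope. Your host builds these messages for you; `tools_call` is a hypothetical helper, the request ids are arbitrary, and the two-field schema simply repeats the fields shown in the datagen_validate example because the full schema there is truncated.

```python
import json
from typing import Any, Dict


def tools_call(request_id: int, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap one tool invocation in an MCP `tools/call` JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Same two fields as the datagen_validate example; a real schema lists every column.
schema_fields = [
    {"name": "invoice_id", "type": "uuid"},
    {"name": "amount_aed", "type": "amount"},
]

validate_req = tools_call(1, "datagen_validate", {"schema_fields": schema_fields, "count": 12})
generate_req = tools_call(2, "datagen_generate",
                          {"schema_fields": schema_fields, "count": 12, "output_format": "csv"})

print(json.dumps(validate_req, indent=2))
```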

MCP tools accept structured_errors=true, which asks the underlying API for application/vnd.agentic+json-style error hints; these are useful for self-healing agent loops. When your client supports it, send an Idempotency-Key on repeated datagen_generate_ai calls, such as retries.
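A sketch of that retry pattern, assuming a hypothetical `send(arguments, idempotency_key)` hook that your MCP client exposes for one datagen_generate_ai call. How the key actually travels (for example as an HTTP header on the underlying request) depends on your client, and `TransientError` stands in for whatever retryable error it raises. The point it illustrates: generate one key per logical request, reuse it on every retry, and set structured_errors so failures come back with machine-readable hints.

```python
import uuid
from typing import Any, Callable, Dict, Optional

# Hypothetical hook: performs one datagen_generate_ai call with the given
# arguments and Idempotency-Key, returning the result or raising on failure.
Send = Callable[[Dict[str, Any], str], Dict[str, Any]]


class TransientError(Exception):
    """Stand-in for whatever retryable error your MCP client raises."""


def generate_ai_with_retries(send: Send, arguments: Dict[str, Any],
                             max_attempts: int = 3) -> Dict[str, Any]:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    # One key per logical request, reused on every retry, so an
    # idempotency-aware server can treat the retries as the same request.
    key = str(uuid.uuid4())
    # Ask the API for structured, machine-readable error hints.
    args = {**arguments, "structured_errors": True}
    last_error: Optional[TransientError] = None
    for _ in range(max_attempts):
        try:
            return send(args, key)
        except TransientError as exc:
            last_error = exc
    assert last_error is not None  # the loop ran at least once
    raise last_error
```

A production loop would usually add backoff between attempts and read the structured hints to decide whether retrying makes sense at all.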


Availability

MCP access is a paid option, enabled when your organization requests it. Use the Feedback link on the home page, or your usual DataGen contact, to ask about enabling MCP for your team. Once access is approved, we send connection and setup steps suited to your environment; what follows here is a high-level summary, not a substitute for that package.

What you can achieve

Agent-first generation with contracts — datagen_infer_schema then datagen_generate_ai / datagen_generate_ai_stream for column-name requests plus optional contract locks.
Schema-backed generation — datagen_validate then datagen_generate for explicit types, locales, constraints, and richer output_format options.
Catalog and guardrails — datagen_capabilities, datagen_field_types, datagen_health, plus resources such as datagen://catalog/field-types so the model grounds prompts in what your tier allows.
Ready-made server instructions — The MCP server nudges hosts toward health → capabilities → validate → generate, mirroring how a careful engineer sequences the REST API.
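The health → capabilities → validate → generate order can be sketched as a single function, again assuming a hypothetical `call_tool(name, arguments)` hook. Two further assumptions, flagged in the comments: datagen_health and datagen_capabilities are called with no arguments, and datagen_validate reports problems in an `errors` list. Neither result shape is documented on this page.

```python
from typing import Any, Callable, Dict, List

# Hypothetical hook: invokes one MCP tool by name and returns its parsed result.
CallTool = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def careful_generate(call_tool: CallTool, schema_fields: List[Dict[str, str]],
                     count: int, output_format: str = "csv") -> Dict[str, Any]:
    # 1-2. Check the service and what this tier allows. ASSUMPTION: no arguments
    # are needed; results are not inspected because their shapes are not documented here.
    call_tool("datagen_health", {})
    call_tool("datagen_capabilities", {})
    # 3. Validate before spending a generation call.
    report = call_tool("datagen_validate", {"schema_fields": schema_fields, "count": count})
    errors = report.get("errors")  # ASSUMPTION: validation problems are listed under "errors"
    if errors:
        raise ValueError(f"schema rejected: {errors}")
    # 4. Generate with the same schema that passed validation.
    return call_tool("datagen_generate", {
        "schema_fields": schema_fields,
        "count": count,
        "output_format": output_format,
    })
```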

Same deployment as the API (hosted MCP)

On DataGen’s cloud deployment, the REST API and MCP share the same site and the same TLS certificate you already trust for the app. Integrations use the same kind of base URL as the API (for example https://datagen.gptlab.ae), and MCP’s entry path is /mcp/sse. Your host’s docs may call this “remote MCP”, an “SSE URL”, or something similar.

Send the same X-API-Key header you use for API calls on MCP requests, so tools and resources run under your tier and limits. There is no separate “MCP secret”: it is one product surface with two protocols. After you subscribe or receive keys from us, use that key with the hosted MCP entry point https://datagen.gptlab.ae/mcp/sse (or your contract’s hostname with the same /mcp/sse path).
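To make the "same key, same host" point concrete, here is a sketch that prepares (but does not send) a request to the hosted SSE entry point using Python's standard `urllib.request`. The helper name is hypothetical. In practice your MCP host or an MCP client library opens and manages the session; this only shows where the URL path and the X-API-Key header go.

```python
import urllib.request


def build_mcp_sse_request(base_url: str, api_key: str) -> urllib.request.Request:
    """Prepare (without sending) a GET for the hosted MCP SSE entry point."""
    url = base_url.rstrip("/") + "/mcp/sse"  # same host as the REST API
    return urllib.request.Request(
        url,
        headers={
            "X-API-Key": api_key,           # the same key your REST calls use
            "Accept": "text/event-stream",  # standard media type for SSE
        },
        method="GET",
    )


req = build_mcp_sse_request("https://datagen.gptlab.ae", "your-key")
print(req.full_url)  # https://datagen.gptlab.ae/mcp/sse
```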

How connection works (simple picture)

Your MCP host either opens a hosted MCP session to our HTTPS endpoint above, or—if your enablement pack includes it—starts a small local connector that still calls the same DataGen APIs over the network. Either way, generation, validation, and AI behavior follow the same rules as the browser app and /api/v1/....

What you need (at a glance)

MCP enabled for your organization — we confirm this when you request access.
An API key for your tier (issued or rotated as part of onboarding). For general API key help, see the API overview.
An MCP-capable host — whatever your organization uses (popular examples include coding assistants and desktop AI apps; other MCP clients work too).
The enablement steps we send you — exact commands, paths, and any prerequisites depend on your IT policy; they are not duplicated here so this page stays accurate when we update packaging.

How configuration looks

Hosted MCP: your product’s MCP settings may ask only for a server URL (for example https://datagen.gptlab.ae/mcp/sse) and a way to attach the X-API-Key header, the same pattern as attaching an API key to a REST client. Follow that product’s guide for remote or URL-based MCP; the paths stay on our domain.

Optional local connector: some teams use a small stdio process we provide, so the host launches a subprocess instead of connecting to a URL. That flow uses environment variables such as DATAGEN_API_BASE and DATAGEN_API_KEY to tell the connector which DataGen deployment and which key to use; the exact details live in your enablement pack so they stay accurate.

Illustrative JSON shape for hosts that register a subprocess (placeholders only):

{
  "mcpServers": {
    "your_label_here": {
      "command": "…",
      "args": ["…from your enablement pack…"],
      "env": {
        "DATAGEN_API_BASE": "https://datagen.gptlab.ae",
        "DATAGEN_API_KEY": "…your-key…"
      }
    }
  }
}
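When a host launches the connector as a subprocess, the values in the "env" block above reach the process as ordinary environment variables. The sketch below shows one way a connector might read them and fail fast if either is missing. The real connector ships with your enablement pack; `ConnectorConfig` and `load_connector_config` are hypothetical names used only for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ConnectorConfig:
    api_base: str
    api_key: str


def load_connector_config(env: Mapping[str, str] = os.environ) -> ConnectorConfig:
    """Read the deployment URL and API key that the host passes via the "env" block."""
    required = ("DATAGEN_API_BASE", "DATAGEN_API_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Fail fast at startup rather than on the first tool call.
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return ConnectorConfig(
        api_base=env["DATAGEN_API_BASE"].rstrip("/"),
        api_key=env["DATAGEN_API_KEY"],
    )
```

Checking at startup means a misconfigured host entry shows up as a clear launch error instead of a confusing failure in the middle of a conversation.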

After setup, DataGen tools appear however your host exposes MCP (chat, IDE, agents, automation, and other supported patterns). Use your enablement steps together with your host’s MCP documentation.

Relationship to the API

MCP is a companion to the HTTP API: same rules, same keys, same limits. For browsing endpoints interactively, use the API overview and the live docs linked there.
