From drag-and-drop to autonomous agents: how DataGen’s HTTP API, MCP, and x402 fit together
Synthetic Data Generator started as a product you drive in the browser: a schema builder, AI-assisted layouts, and exports you can trust for demos, QA, and staging. The next chapter is machine-native: the same quality of synthetic data, requested from code, agents, and IDEs—sometimes with no human account setup at all, thanks to HTTP 402 Payment Required and the experimental x402 payment flow.
TL;DR
- Humans keep using the web app at datagen.gptlab.ae.
- Developers & agents with a key call /api/v1/… with X-API-Key (free-tier key, no email required; see /api/v1/auth/free-key).
- Keyless agents can call generation endpoints without a key, receive 402 + payment instructions, pay a small USDC amount on Base (the exact network is deployment-configured), then retry with X-PAYMENT.
- Two generation modes: AI mode (column names only) vs schema mode (explicit types). Same product, two contracts.
- MCP exposes the same split as named tools for Cursor, Claude Desktop, and other MCP hosts (availability for organizations is described on the MCP page).
OLD way vs NEW way (same product, different driver)
Human-led: you open the site, design a schema or use AI mode in the UI, preview, and download. Powerful, visual, and explicit—still the best experience when a person is in the loop.
Agent-led: a script or LLM orchestrates discovery, validation, and generation over HTTP, or calls MCP tools, without opening the browser. This is where the capabilities and validate endpoints, the two generation modes, and (optionally) x402 come in.
Discovery: what “capabilities” means (not a saved-schema dump)
Agents should start with GET /api/v1/capabilities. That response is a tier-scoped manifest: which field types you may use, row limits, and other guardrails for the key (or environment) you are using.
It is not a catalog of every saved schema in your account—it is the machine-readable “rules of the road” so the agent does not propose impossible types or batch sizes.
For the full vocabulary, GET /api/v1/field-types lists the canonical type keys used in schema mode.
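A minimal sketch of how an agent might use the manifest to reject an impossible plan before calling anything else. The manifest keys allowed_field_types and max_rows are hypothetical placeholders; the real response shape is whatever your deployment's OpenAPI document describes.

```python
from typing import Any


def check_against_capabilities(
    manifest: dict[str, Any], field_types: list[str], count: int
) -> list[str]:
    """Return a list of problems; an empty list means the plan fits the tier.

    Assumes a hypothetical manifest shape:
    {"allowed_field_types": [...], "max_rows": int}
    """
    problems: list[str] = []
    allowed = set(manifest.get("allowed_field_types", []))
    for ft in field_types:
        if ft not in allowed:
            problems.append(f"field type not available on this tier: {ft}")
    max_rows = manifest.get("max_rows")
    if max_rows is not None and count > max_rows:
        problems.append(f"count {count} exceeds tier limit {max_rows}")
    return problems


# Illustrative manifest: these values are made up, not real tier limits.
manifest = {"allowed_field_types": ["name", "email", "integer"], "max_rows": 100}
print(check_against_capabilities(manifest, ["name", "iban"], 500))
# ['field type not available on this tier: iban', 'count 500 exceeds tier limit 100']
```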
Contract: validate before you spend tokens or money
POST /api/v1/validate is a dry run: send the same schema_fields shape you intend for /generate, and the API tells you whether the layout is acceptable. It is the right place to establish a data contract between agent and service before rows are materialized.
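A standard-library sketch of the validate-then-generate contract. It only builds the two requests without sending them. The base URL, the per-field keys (name, type), and the count key are assumptions for illustration; check OpenAPI for the real schema_fields shape.

```python
import json
import urllib.request
from typing import Optional

# Assumption: the API is served from the web app's host.
BASE_URL = "https://datagen.gptlab.ae"


def build_post(path: str, body: dict, api_key: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON POST to a DataGen route."""
    headers = {"Content-Type": "application/json"}
    if api_key is not None:
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


# Hypothetical payload: the same body goes to /validate first, then /generate.
payload = {"schema_fields": [{"name": "email", "type": "email"}], "count": 10}
validate_req = build_post("/api/v1/validate", payload, api_key="YOUR_KEY")
generate_req = build_post("/api/v1/generate", payload, api_key="YOUR_KEY")
# Send validate_req with urllib.request.urlopen(...) and only send
# generate_req once validation reports the layout as acceptable.
```

Reusing one payload object for both calls is the point: whatever validate accepted is exactly what generate receives.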
Two generation modes (the mental model)
We designed the API so agents can stay lazy about typing until they truly need control.
AI mode: POST /api/v1/generate-ai
Send field_names plus optional domain_hint, locale, and count. The service infers types and fills realistic rows—ideal when the caller only knows column headers or user intent.
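A small sketch of an AI-mode request body built from the fields named above. The helper name is hypothetical, and leaving out unset hints (rather than sending null) is our own choice, not documented API behavior.

```python
from typing import Optional


def ai_mode_body(
    field_names: list[str],
    count: int,
    domain_hint: Optional[str] = None,
    locale: Optional[str] = None,
) -> dict:
    """Body for POST /api/v1/generate-ai: column names only, types are inferred."""
    body: dict = {"field_names": field_names, "count": count}
    # Optional hints are omitted entirely rather than sent as null.
    if domain_hint is not None:
        body["domain_hint"] = domain_hint
    if locale is not None:
        body["locale"] = locale
    return body


body = ai_mode_body(["customer_name", "signup_date", "plan"], count=25, domain_hint="SaaS billing")
```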
Schema mode: POST /api/v1/generate
Send schema_fields with catalog type keys and optional constraints—ideal when QA, security, or compliance needs a fixed, reviewable layout.
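A sketch of how a fixed, reviewable layout might be expressed in client code. The SchemaField entry shape (name, type, constraints), the type keys, and the constraint keys are hypothetical; real type keys come from /api/v1/field-types. The duplicate-name check is a client-side guard of our own, not a documented API rule.

```python
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class SchemaField:
    # Hypothetical entry shape: column name, catalog type key, optional constraints.
    name: str
    type: str
    constraints: dict[str, Any] = field(default_factory=dict)


def schema_mode_body(fields: list[SchemaField]) -> dict:
    """Body for POST /api/v1/generate (the same body works for /api/v1/validate)."""
    names = [f.name for f in fields]
    # Client-side sanity check: refuse ambiguous layouts before any network call.
    if len(names) != len(set(names)):
        raise ValueError("duplicate column names in schema_fields")
    return {"schema_fields": [asdict(f) for f in fields]}


body = schema_mode_body([
    SchemaField("user_id", "integer", {"min": 1}),
    SchemaField("email", "email"),
])
```

Keeping the layout in a typed structure makes it easy to check into version control, which is what QA and compliance reviewers usually want.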
Settlement: HTTP 402 + x402 (keyless pay-per-call)
There are two legitimate ways to call generation endpoints:
- With an API key: obtain a free-tier key from /api/v1/auth/free-key (rate-limited per client IP), then send X-API-Key on each call. This is the best default for humans and long-lived integrations.
- Without a key (agent commerce): omit X-API-Key on POST /api/v1/generate, POST /api/v1/generate-ai, or POST /api/v1/generate-ai/stream. The API responds with HTTP 402 and a JSON payload describing accepted payment methods (USDC on Base in production-style configs; test deployments may use Base Sepolia, so check the network field your client receives).
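A sketch of the first step of the keyless flow: reading the 402 body and choosing a payment option your wallet can handle. It assumes the body follows the public x402 convention of an accepts array whose entries carry network and maxAmountRequired. The sample values are illustrative, and network identifiers such as base-sepolia should be confirmed against what your deployment actually returns.

```python
import json
from typing import Any, Optional


def pick_payment_option(body_402: str, supported_networks: list[str]) -> Optional[dict[str, Any]]:
    """Return the first accepted payment option on a network we can pay on."""
    doc = json.loads(body_402)
    for option in doc.get("accepts", []):
        if option.get("network") in supported_networks:
            return option
    return None  # nothing we can pay with: surface the 402 to the caller


# Illustrative 402 body (assumed shape, not copied from a live response).
sample_402 = json.dumps({
    "x402Version": 1,
    "accepts": [{"scheme": "exact", "network": "base-sepolia", "maxAmountRequired": "2000"}],
})
option = pick_payment_option(sample_402, ["base-sepolia"])
```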
How x402 maps to “about $0.002”
The 402 body includes a maxAmountRequired value expressed in the smallest USDC unit (USDC has 6 decimals, so 2000 units equal 0.002 USDC). For small batches, the first pricing tier corresponds to 0.002 USDC per generation request; larger count values move into higher tiers.
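The unit conversion is easy to get wrong with floats, so here is a minimal sketch using Decimal. The helper names are hypothetical; the only fact it relies on is that USDC uses 6 decimal places.

```python
from decimal import Decimal

USDC_DECIMALS = 6  # USDC uses 6 decimal places


def atomic_to_usdc(max_amount_required: str) -> Decimal:
    """Convert an atomic-unit amount string (as in maxAmountRequired) to USDC."""
    return Decimal(int(max_amount_required)) / (Decimal(10) ** USDC_DECIMALS)


def usdc_to_atomic(amount: str) -> int:
    """Convert a human-readable USDC amount to atomic units."""
    return int(Decimal(amount) * (Decimal(10) ** USDC_DECIMALS))


print(atomic_to_usdc("2000"))  # 0.002
```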
After the wallet pays, the client retries the same HTTP request with an X-PAYMENT header carrying the settlement proof. Think of it as pay-per-delivery, not a monthly subscription.
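A sketch of the retry step: clone the original request unchanged and attach the proof. Producing the proof itself is the wallet's job (under the public x402 convention it is typically an encoded payment payload) and is out of scope here; the URL and the placeholder proof string below are illustrative only.

```python
import urllib.request


def with_payment(original: urllib.request.Request, payment_proof: str) -> urllib.request.Request:
    """Clone a request (same URL, method, body, headers) and attach X-PAYMENT."""
    headers = dict(original.header_items())
    headers["X-PAYMENT"] = payment_proof
    return urllib.request.Request(
        original.full_url,
        data=original.data,
        headers=headers,
        method=original.get_method(),
    )


# Hypothetical first attempt that came back with HTTP 402.
first_try = urllib.request.Request(
    "https://datagen.gptlab.ae/api/v1/generate-ai",
    data=b'{"field_names": ["email"], "count": 5}',
    headers={"Content-Type": "application/json"},
    method="POST",
)
retry = with_payment(first_try, "<proof-from-wallet>")
```

Retrying the identical request matters because the payment is tied to that specific call, not to a session.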
Behind the scenes, verification uses a facilitator service (configurable per deployment). Redis may record that a payment hash was consumed so proofs are not replayed.
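A minimal in-memory sketch of the replay idea. The ReplayGuard class and hashing the proof with SHA-256 are illustrative choices, not DataGen's implementation. Across multiple processes, a Redis SET with the NX and EX options gives the same "first writer wins" check atomically, plus an expiry.

```python
import hashlib


class ReplayGuard:
    """In-memory stand-in for a Redis 'consumed payment' record."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def consume(self, payment_proof: str) -> bool:
        """Return True the first time a proof is seen, False on any replay."""
        digest = hashlib.sha256(payment_proof.encode("utf-8")).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True


guard = ReplayGuard()
```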
MCP: the same architecture as native tools
Model Context Protocol lets assistants and IDEs call DataGen without hand-writing curl for every task. The important idea is parity: MCP tools are thin, well-named wrappers over the same REST routes—so documentation you trust for HTTP stays true in MCP.
Prefer datagen_generate_ai when the model only has column names; prefer datagen_validate followed by datagen_generate when someone hands in a strict schema or you need richer export formats.
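The same rule of thumb, written as a tiny routing function an agent loop might use. The function and its flags are hypothetical; only the tool names come from this article.

```python
def plan_mcp_calls(has_strict_schema: bool, needs_rich_export: bool) -> list[str]:
    """Pick which DataGen MCP tools to call, in order."""
    if has_strict_schema or needs_rich_export:
        # Check the layout first, then generate against the same schema.
        return ["datagen_validate", "datagen_generate"]
    # Only column names are known: let the service infer types.
    return ["datagen_generate_ai"]
```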
Enterprise note
Hosted MCP may be offered as an organization enablement. If you are evaluating MCP for a team, read the MCP integration overview and use Feedback on the home page to reach us.
Agent-friendly errors (optional)
For clients that can self-correct, send:
Accept: application/json, application/vnd.agentic+json
on validate, generate, and AI endpoints. The API can return structured hints (message + correction guidance) instead of only opaque 4xx bodies.
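A sketch of a client that opts in and turns an error body into a hint an LLM can act on. The keys message and correction are hypothetical stand-ins for the structured fields; when the body is not JSON, the helper falls back to the raw text.

```python
import json

AGENTIC_ACCEPT = "application/json, application/vnd.agentic+json"


def agent_headers(api_key: str) -> dict[str, str]:
    """Headers that request agent-friendly error bodies."""
    return {"X-API-Key": api_key, "Accept": AGENTIC_ACCEPT}


def extract_hint(error_body: str) -> str:
    """Summarize a 4xx body as 'message | fix: correction' when possible."""
    try:
        doc = json.loads(error_body)
    except json.JSONDecodeError:
        return error_body  # opaque body: pass it through unchanged
    if not isinstance(doc, dict):
        return error_body
    message = doc.get("message", "")
    correction = doc.get("correction")
    if correction:
        return f"{message} | fix: {correction}"
    return message or error_body
```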
What we built with
The public stack includes FastAPI, Pydantic v2, Ollama for AI-assisted inference where configured, Qdrant and Redis for supporting services, n8n workflows for parts of the product surface, and the experimental x402 flow for keyless settlement on generation.
This article describes product behavior at a conceptual level. Always refer to OpenAPI and your deployment’s live responses for authoritative paths, headers, and pricing fields.