OpenTerms Specification
Machine-readable legal terms for the agentic web. Define what AI agents can and can't do on your service — in a format they can actually parse.
OpenTerms is a JSON-based protocol that lets services declare their terms of service, permissions, rate limits, and usage constraints in a structured format. Think of it as robots.txt for AI agent behavior.
Since v0.2.0: JSON-LD support (@context), policy identifiers (policy_id), data residency, permission scoping/frequency, and an extensions namespace.
New in v0.3.0: a discovery object with machine-readable pointers to MCP servers and OpenAPI specs, plus a formalized extensions.com.openterms.meta namespace for generator and provenance metadata.
For Agents
OpenTerms is designed to be queried by agents at runtime — before acting on a third-party domain. This section covers everything an agent needs: what to check, how to interpret results, and when to escalate.
Primary endpoint: GET https://openterms.com/api/v1/check?domain={domain}&action={action}
Seven canonical actions: read_content, scrape_data, api_access, create_account, make_purchases, post_content, allow_training.
What to check before acting
- Fetch permissions: call the check API with the target domain and the specific action you intend to take.
- Interpret the result: true means allowed, false means denied, null means not specified. Conditional objects mean allowed or denied only under stated conditions.
- Apply fail-closed logic for null and low confidence: if the result is null or confidence is low, treat as denied unless your policy explicitly permits proceeding with unknowns.
- Log the check: record domain, action, decision, confidence, and timestamp in your own internal logs.
- Escalate when necessary: for conditional results, low-confidence results, or high-stakes actions, surface the check result to a human rather than auto-deciding.
API example
The API checks whether a domain publishes an openterms.json file. During the alpha, most domains have not yet adopted the standard, so the typical response is no_openterms_json:
GET https://openterms.com/api/v1/check?domain=github.com&action=read_content
{
"success": true,
"domain": "github.com",
"action": "read_content",
"result": { "value": null, "confidence": null, "caveats": [] },
"permission": "not_specified",
"confidence": 0,
"error": "no_openterms_json",
"message": "No openterms.json found at github.com."
}
This is a correct, handled response — not an error. It means the domain has not published machine-readable permissions. Agents should treat this as unresolved and fall back to other information sources.
When a domain does publish openterms.json, the response includes structured permission data:
GET https://openterms.com/api/v1/check?domain=example.com&action=read_content
{
"success": true,
"domain": "example.com",
"action": "read_content",
"result": {
"value": true,
"confidence": "high",
"caveats": []
},
"permission": "allowed",
"confidence": 0.99,
"policy_excerpt": "Explicitly allowed.",
"checked_at": "2026-05-02T21:00:00.000Z"
}
Decision policy
Use this table to map any check result to an agent action:
| Result value | Confidence | Agent action | Notes |
|---|---|---|---|
| true | high / medium | Proceed | Permission explicitly granted. Log the check and continue. |
| true | low | Escalate | Result may be inaccurate. Escalate to human or use conservative default. |
| false | any | Block | Permission explicitly denied. Do not proceed. Log the denial. |
| null | any | Escalate or Block | Not specified in the domain's openterms.json. Fail closed: treat as denied unless your policy allows proceeding with unknowns. |
| conditional object | high / medium | Check conditions | If allowed: true with conditions, read the conditions field and only proceed if the conditions are met. If allowed: false, block. |
| conditional object | low | Escalate | Low confidence on conditional result. Escalate to human review. |
Fail-closed recommendation: When result.value is null or result.confidence is "low", treat the permission as denied by default. Proceeding on unknowns shifts responsibility to the agent operator and may violate terms that were simply not indexed.
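The decision table above can be sketched as a small pure function. This is a minimal illustration rather than reference code: the function name `decide` and the returned labels are hypothetical, and it resolves the "Escalate or Block" row for null by blocking, which is the fail-closed default.

```python
from typing import Any, Optional


def decide(value: Any, confidence: Optional[str]) -> str:
    """Map result.value and result.confidence to an agent action label.

    Returns 'proceed', 'block', 'escalate', or 'check_conditions'.
    The labels are illustrative and not defined by the OpenTerms spec.
    """
    if value is False:
        return "block"  # explicit denial, whatever the confidence
    if value is None:
        return "block"  # not specified: fail closed
    if confidence not in ("high", "medium"):
        return "escalate"  # low or missing confidence goes to a human
    if value is True:
        return "proceed"
    if isinstance(value, dict):
        # Conditional object: a denial blocks; a grant still needs its
        # conditions verified before the agent acts.
        return "check_conditions" if value.get("allowed") is True else "block"
    return "escalate"  # unrecognized shape: do not guess
```

An operator whose policy permits proceeding on unknowns would change only the `value is None` branch.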
Logging guidance
Log every permission check. Minimum fields:
- domain: the domain checked
- action: the permission key (e.g. scrape_data)
- decision: allowed, denied, or not_specified
- confidence: numeric score (0.0–1.0)
- checked_at: ISO 8601 timestamp from the response
- agent_id: identifier for the agent or run performing the check
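As a rough sketch of the minimum log fields listed above, the helper below builds one record from a check response. The function name `build_log_record` and the `agent-run-001` identifier are made up for illustration; the sample response is the example.com response shown earlier on this page.

```python
import json
from typing import Any, Dict


def build_log_record(response: Dict[str, Any], agent_id: str) -> Dict[str, Any]:
    """Collect the minimum log fields from a check API response."""
    return {
        "domain": response.get("domain"),
        "action": response.get("action"),
        # The 'permission' string mirrors result.value.
        "decision": response.get("permission", "not_specified"),
        "confidence": response.get("confidence"),
        # May be None: the no_openterms_json example carries no checked_at.
        "checked_at": response.get("checked_at"),
        "agent_id": agent_id,
    }


sample = {
    "success": True,
    "domain": "example.com",
    "action": "read_content",
    "result": {"value": True, "confidence": "high", "caveats": []},
    "permission": "allowed",
    "confidence": 0.99,
    "checked_at": "2026-05-02T21:00:00.000Z",
}
print(json.dumps(build_log_record(sample, "agent-run-001"), sort_keys=True))
```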
Human escalation guidance
Escalate to a human when:
- Result is null (not specified) and the action is high-stakes (purchases, account creation, content posting)
- Confidence is low
- Result is a conditional object and the conditions cannot be programmatically verified
- The domain is newly indexed or unvalidated (validation_status: "unvalidated")
Copy/Paste Agent Instructions
Copy this block directly into your agent's system prompt or instructions file. Customize the action list to match your use case.
Before performing any action on a third-party domain, check OpenTerms to verify
that the action is permitted by that domain's terms of service.
Check endpoint:
GET https://openterms.com/api/v1/check?domain={domain}&action={action}
Canonical actions to check:
read_content — read publicly available content
scrape_data — bulk-download or scrape data from the domain
api_access — access the domain's API programmatically
create_account — create a user account on the domain
make_purchases — make purchases or financial transactions
post_content — post, publish, or submit content
allow_training — use the domain's published content for model training
Decision policy:
result.value = true → allowed. Proceed. Log the check.
result.value = false → denied. Block. Do not proceed.
result.value = null → not specified. Fail closed. Treat as denied.
result.value = {object} → conditional. Read the conditions field.
Only proceed if allowed=true and conditions are met.
result.confidence = "low" → escalate to human regardless of value.
Logging (required for every check):
Log: domain, action, decision, confidence, checked_at, agent_id
Human escalation (required when):
- result.value is null for high-stakes actions (purchases, account creation, post_content)
- result.confidence is "low"
- result is a conditional object and conditions cannot be verified programmatically
- The domain is unvalidated
OpenTerms is one input to your decision — it does not replace legal review for
regulated use cases. When in doubt, escalate.
Common Agent Action Examples
Full check–interpret–decide flows for the most common agent operations.
Scrape a pricing page
import requests

# log_check, scrape_pricing_page, escalate_to_human, and PermissionDenied
# are application-defined helpers, not part of OpenTerms.

# 1. Check
resp = requests.get(
    "https://openterms.com/api/v1/check",
    params={"domain": "acme.com", "action": "scrape_data"},
    timeout=10,
)
check = resp.json()

# 2. Interpret
value = check["result"]["value"]
confidence = check["result"]["confidence"]

# 3. Decide
if value is True and confidence != "low":
    # Proceed
    log_check("acme.com", "scrape_data", "allowed")
    scrape_pricing_page("https://acme.com/pricing")
elif value is False:
    # Block
    log_check("acme.com", "scrape_data", "denied")
    raise PermissionDenied("scrape_data denied on acme.com")
else:
    # null or low confidence → escalate
    escalate_to_human("acme.com", "scrape_data", check)
Call an API
$ curl "https://openterms.com/api/v1/check?domain=stripe.com&action=api_access"
# Example outcome: result.value = true, result.confidence = "low"
# (api_access has 75% measured accuracy, below the 80% threshold)
# Decision: escalate; human review recommended
Create an account
# openterms_check is an application-defined wrapper around GET /api/v1/check
check = openterms_check("example.com", "create_account")
if check["result"]["value"] is None:
    # null = not specified: fail closed for account creation
    escalate_to_human("create_account on example.com: not specified in openterms.json")
elif check["result"]["value"] is True:
    create_account("example.com")
else:
    raise PermissionDenied("create_account denied")
Post content
check = openterms_check("forum.example.com", "post_content")
result = check["result"]
if isinstance(result["value"], dict):
    # Conditional: check the conditions
    if result["value"]["allowed"]:
        print(result["value"]["conditions"])  # e.g. "Must disclose AI authorship"
        escalate_for_human_confirmation()
    else:
        raise PermissionDenied("post_content denied under conditions")
elif result["value"] is True:
    post_content("forum.example.com", content)
else:
    raise PermissionDenied()
Use content for model training
check = openterms_check("dataset.example.com", "allow_training")
result = check["result"]
# allow_training has ~50% accuracy: always escalate regardless of value
if result["confidence"] == "low":
    escalate_to_human(
        "allow_training has low confidence accuracy: human review required"
    )
elif result["value"] is True:
    add_to_training_dataset("dataset.example.com")
elif result["value"] is False:
    raise PermissionDenied("allow_training denied")
else:
    # null: fail closed for training data
    raise PermissionDenied("allow_training not specified: fail closed")
Quick Start
Create an openterms.json file and host it at the root of your domain:
{
"$schema": "https://openterms.com/schema/openterms.schema.json",
"openterms_version": "0.3.0",
"service": {
"name": "Your Service",
"domain": "yourservice.com",
"tos_url": "https://yourservice.com/terms"
},
"permissions": {
"read_content": true,
"scrape_data": false,
"api_access": true,
"create_account": false,
"make_purchases": false,
"post_content": false,
"allow_training": null
},
"requires_consent": true,
"jurisdiction": "US",
"contact": "legal@yourservice.com",
"last_updated": "2025-06-01"
}
That's it. AI agents fetch https://yourservice.com/openterms.json before taking any action, just like crawlers check robots.txt.
Pro tip: Add the $schema field to get auto-completion and inline validation in VS Code, JetBrains, and other editors that support JSON Schema.
How It Works
- Services publish an openterms.json at their domain root (or any discoverable URL)
- Agents query the Permission Check API before taking actions; permissions are structured data, not legalese
- Agents act on the result: proceed, skip, or surface to a human depending on the returned permission value
Core Fields
| Field | Type | Status | Description |
|---|---|---|---|
| openterms_version | string | Required | Spec version (e.g. "0.3.0"). Semver format. |
| service | string or object | Required | Service info. Shorthand: "acme.com". Full: object with name, domain, tos_url, privacy_url, description, logo_url. |
| permissions | object | Required | What agents can do. See Permissions section. |
| $schema | string (URI) | Optional | Self-referencing schema URI. Enables editor auto-completion. |
| @context | string or object | New | JSON-LD context for semantic web / linked data interoperability. |
| policy_id | string | New | Globally unique identifier for this terms document. Used to reference a specific version of terms in external tooling. |
| requires_consent | boolean | Optional | Must the agent obtain explicit consent before acting? |
| jurisdiction | string or string[] | Optional | ISO 3166-1/2 jurisdiction code(s). E.g. "US-DE", ["US-CA", "EU"]. |
| contact | string or object | Optional | Legal contact. Shorthand: email string. Full: object with email, name, url. |
| last_updated | string (date) | Optional | ISO 8601 date when terms were last modified. |
| expires | string (date) | Optional | Date these terms expire. Agents should re-fetch after this date. |
| discovery | object | v0.3.0 | Machine-readable pointers to MCP servers and API specs. See Discovery section. |
| extensions | object | Optional | Namespace for custom or industry-specific fields. Use reverse-domain keys. See Extensions. |
Permissions
The permissions object defines what AI agents can do. Each value is either:
- true: allowed unconditionally
- false: denied
- Conditional object: allowed with conditions
Standard Permissions
| Permission | Description |
|---|---|
| read_content | Read publicly available content |
| scrape_data | Scrape or bulk-download data |
| api_access | Access the service's API programmatically |
| create_account | Create user accounts programmatically |
| make_purchases | Make purchases or financial transactions |
| post_content | Post, publish, or submit content |
| allow_training | Whether external parties may train AI/ML models on the site's published content (Semantics A). Specifically covers third-party use of site-owned content for model training. Does not cover the service's internal use of user data for its own model improvement (Semantics B, Privacy Policy scope), nor whether the site trains on data submitted by agents (Semantics C, addressed in v0.4.0 proposal). |
The active schema defines exactly seven canonical permission keys: read_content, scrape_data, api_access, create_account, make_purchases, post_content, allow_training. The schema uses additionalProperties: false — additional permission keys are not recognized by the current spec.
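To illustrate the closed key set, here is a minimal local check of a permissions object. It is a sketch, not a substitute for validating against the published JSON Schema: the function name `validate_permissions` is hypothetical, and accepting null as a value is an assumption based on the Quick Start example, which sets allow_training to null.

```python
from typing import Any, Dict, List

CANONICAL_ACTIONS = (
    "read_content", "scrape_data", "api_access", "create_account",
    "make_purchases", "post_content", "allow_training",
)


def validate_permissions(permissions: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems: List[str] = []
    for key, value in permissions.items():
        if key not in CANONICAL_ACTIONS:
            # Mirrors additionalProperties: false on the permissions object.
            problems.append(f"unknown permission key: {key}")
        elif isinstance(value, dict):
            if not isinstance(value.get("allowed"), bool):
                problems.append(f"{key}: conditional object needs a boolean 'allowed'")
        elif value is not None and not isinstance(value, bool):
            # Assumption: null is tolerated, as in the Quick Start example.
            problems.append(f"{key}: expected true, false, null, or an object")
    return problems
```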
Conditional Permission Object
{
"make_purchases": {
"allowed": true,
"conditions": "Max $500/day. Agent must be linked to verified human.",
"requires_auth": true,
"max_frequency": "50/day",
"scope": "authenticated"
}
}
| Field | Type | Description |
|---|---|---|
| allowed | boolean | Required. Whether the permission is granted. |
| conditions | string | Human-readable conditions or restrictions. |
| requires_auth | boolean | Whether this permission requires authentication. |
| max_frequency | string | Rate limit for this specific action. E.g. "10/hour", "100/day". (New) |
| scope | string | What data subset this applies to. E.g. "public", "authenticated", "premium". (New) |
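An agent that wants to enforce max_frequency needs to split the string into a count and a time unit. The parser below is a sketch under an assumption: the spec text shows only examples such as "10/hour" and "100/day", so the accepted unit list (minute, hour, day) and the name `parse_max_frequency` are illustrative guesses, not part of the schema.

```python
import re
from typing import Tuple

# Assumed grammar: "<positive integer>/<unit>". Only hour and day appear in
# the documented examples; minute is included here as a guess.
_FREQUENCY = re.compile(r"^(\d+)/(minute|hour|day)$")


def parse_max_frequency(text: str) -> Tuple[int, str]:
    """Split a max_frequency string such as '50/day' into (50, 'day')."""
    match = _FREQUENCY.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized max_frequency: {text!r}")
    return int(match.group(1)), match.group(2)
```

Raising on anything unrecognized keeps the agent from silently ignoring a limit it could not parse.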
Rate Limits
{
"rate_limits": {
"requests_per_minute": 60,
"requests_per_hour": 1000,
"requests_per_day": 10000,
"concurrent_sessions": 5
}
}
All fields are optional integers. concurrent_sessions (since v0.2.0) limits how many simultaneous agent connections are allowed.
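One conservative way for an agent to respect these limits is to pace requests evenly so that no window is exceeded. The sketch below computes that pacing interval. The helper name `min_interval_seconds` is hypothetical, and even pacing is a client-side design choice; the spec does not say whether bursts within a window are acceptable.

```python
from typing import Dict

# Window length in seconds for each documented rate-limit field.
_WINDOW_SECONDS = {
    "requests_per_minute": 60,
    "requests_per_hour": 3600,
    "requests_per_day": 86400,
}


def min_interval_seconds(rate_limits: Dict[str, int]) -> float:
    """Smallest gap between requests that satisfies every declared window."""
    intervals = [
        window / rate_limits[field]
        for field, window in _WINDOW_SECONDS.items()
        if rate_limits.get(field)  # skip absent or zero limits
    ]
    # The strictest window dictates the pace; no limits means no delay.
    return max(intervals, default=0.0)
```

For the example object above, the daily limit is the strictest: 86400 / 10000 gives one request every 8.64 seconds. concurrent_sessions is not covered here because it constrains parallelism rather than pacing.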
Data Handling
{
"data_handling": {
"stores_agent_data": true,
"shares_with_third_parties": false,
"retention_days": 90,
"gdpr_compliant": true,
"ccpa_compliant": true,
"hipaa_compliant": false,
"data_residency": ["US", "EU"]
}
}
| Field | Type | Description |
|---|---|---|
| stores_agent_data | boolean | Stores data about agent interactions? |
| shares_with_third_parties | boolean | Shares agent data with third parties? |
| retention_days | integer | Days data is retained. 0 = no retention. |
| gdpr_compliant | boolean | GDPR compliant? |
| ccpa_compliant | boolean | CCPA compliant? |
| hipaa_compliant | boolean | HIPAA compliant? (New) |
| data_residency | string or string[] | Where data is stored. ISO codes. (New) |
Authentication
{
"authentication": {
"required": true,
"methods": ["api_key", "oauth2"],
"registration_url": "https://acme.com/developers",
"docs_url": "https://docs.acme.com/auth"
}
}
Supported methods: api_key, oauth2, bearer_token, basic_auth, mTLS (new), none.
The docs_url field (since v0.2.0) links directly to your auth documentation for faster agent onboarding.
Verification
Since v0.2.0. Optional verification metadata — digital signatures, policy hashes, and JWKS endpoints — for service operators who want to publish a signed, versioned policy document.
{
"verification": {
"jwks_url": "https://acme.com/.well-known/jwks.json",
"signing_algorithm": "Ed25519",
"policy_hash": "a1b2c3d4e5f6..."
}
}
| Field | Type | Description |
|---|---|---|
| jwks_url | string (URI) | URL to your JWKS endpoint for verifying signed policy documents. |
| signing_algorithm | string | One of: Ed25519, RS256, ES256. |
| policy_hash | string | SHA-256 hash of the canonical terms document. 64 hex chars. |
The policy_id field in openterms.json provides a stable identifier for this terms document. It can be referenced externally by companion tooling, but is not required for the core permission check workflow.
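The sketch below shows one way a 64-character policy_hash could be computed. The canonical form is not defined in this section, so two assumptions are made here and should be checked against the actual schema: the document is serialized as compact JSON with sorted keys, and the verification block is excluded so the hash does not cover itself. The function name `policy_hash` is illustrative.

```python
import hashlib
import json
from typing import Any, Dict


def policy_hash(document: Dict[str, Any]) -> str:
    """SHA-256 hex digest over an assumed canonical form of the document."""
    # Assumption: the verification block is left out of the hashed content.
    body = {key: value for key, value in document.items() if key != "verification"}
    # Assumption: canonical form = sorted keys, no whitespace, UTF-8.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting keys makes the digest independent of the order in which fields were written, which is the property a stable hash needs.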
Extensions
The extensions object is a namespace for custom or industry-specific fields. Use reverse-domain notation to avoid conflicts:
{
"extensions": {
"health.hipaa.baa_required": true,
"health.hipaa.audit_log_url": "https://acme.com/api/audit",
"com.acme.internal_tier": "enterprise",
"org.fintech.pci_dss_level": 1
}
}
Extensions are free-form — any JSON value is accepted. This keeps the core schema stable while allowing domain-specific needs.
com.openterms.meta (v0.3.1)
extensions.com.openterms.meta is the first official OpenTerms namespace. It records provenance — how and where this file was created. This namespace is placed inside extensions rather than at root level because the root schema uses additionalProperties: false to ensure forward compatibility and strict validation.
Why not a top-level field? The root schema is intentionally closed. New root fields require a spec version bump and breaking changes. The extensions namespace lets tools add structured metadata without modifying the core spec contract.
{
"extensions": {
"com.openterms.meta": {
"source": "self",
"generator": "openterms.com/v0.3.0"
}
}
}
| Field | Type | Description |
|---|---|---|
| source | string | Origin of this file. Typically "self" (written by the domain owner) or "openterms.com" (auto-generated). |
| generator | string | Tool or service that generated this file, in domain/version format. E.g. "openterms.com/v0.3.0". |
| generated_at | string (datetime) | ISO 8601 timestamp of when this file was generated. E.g. "2025-06-01T12:00:00Z". |
The validator displays a note when com.openterms.meta is present, indicating the file was auto-generated and showing the generator version.
Discovery (v0.3.0)
The discovery object positions openterms.json as both the legal permissions layer and the technical discovery entry point for a domain's agent-facing infrastructure. It provides machine-readable signposts to existing technical resources — MCP servers, OpenAPI specs — that an agent can connect to directly.
Discovery does not describe what those servers do. It points to them. OpenTerms doesn't duplicate what MCP manifests or OpenAPI specs already define. It simply says: "these endpoints exist and are permitted."
| Field | Type | Description |
|---|---|---|
| mcp_servers | array | List of MCP (Model Context Protocol) server endpoints. Each entry has url, transport, and optional description. |
| api_specs | array | List of API specification documents. Each entry has url, type, and optional description. |
MCP Server entry fields
| Field | Required | Values | Description |
|---|---|---|---|
| url | Required | string (URI) | URL of the MCP server endpoint. |
| transport | Required | "sse", "stdio", or "streamable-http" | Transport protocol used by this MCP server. |
| description | Optional | string | Human-readable summary of what this server provides. |
API Spec entry fields
| Field | Required | Values | Description |
|---|---|---|---|
| url | Required | string (URI) | URL to the API specification document. |
| type | Required | "openapi_3", "swagger_2", or "graphql_schema" | The specification format. |
| description | Optional | string | Human-readable description of this API spec. |
Complete v0.3.0 Example
A full openterms.json showing both permissions and discovery populated:
{
"$schema": "https://openterms.com/schema/openterms.schema.json",
"openterms_version": "0.3.0",
"service": "acme-corp.com",
"permissions": {
"read_content": true,
"scrape_data": false,
"api_access": {
"allowed": true,
"requires_auth": true,
"max_frequency": "1000/hour"
}
},
"discovery": {
"mcp_servers": [
{
"url": "https://acme-corp.com/mcp/sse",
"transport": "sse",
"description": "Provides tools for checking order status and inventory."
}
],
"api_specs": [
{
"url": "https://api.acme-corp.com/v1/openapi.json",
"type": "openapi_3",
"description": "Full REST API for catalog and user management."
}
]
}
}
Discovery is a signpost, not a description layer. MCP servers already have manifests. OpenAPI specs already describe endpoints. The discovery field simply makes those resources findable via a single, standardized location — no duplicated documentation required.
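A consumer of the discovery object typically wants the endpoints for a transport it supports. The sketch below shows that lookup against an already-parsed openterms.json; the function name `mcp_server_urls` is hypothetical, and the sample document is a trimmed copy of the example above.

```python
from typing import Any, Dict, List

MCP_TRANSPORTS = {"sse", "stdio", "streamable-http"}


def mcp_server_urls(document: Dict[str, Any], transport: str) -> List[str]:
    """Return URLs of declared MCP servers that use the given transport."""
    if transport not in MCP_TRANSPORTS:
        raise ValueError(f"unknown transport: {transport!r}")
    servers = document.get("discovery", {}).get("mcp_servers", [])
    return [
        entry["url"]
        for entry in servers
        if entry.get("transport") == transport and "url" in entry
    ]


acme = {
    "openterms_version": "0.3.0",
    "service": "acme-corp.com",
    "discovery": {
        "mcp_servers": [
            {"url": "https://acme-corp.com/mcp/sse", "transport": "sse"}
        ]
    },
}
print(mcp_server_urls(acme, "sse"))
```

A document without a discovery object simply yields an empty list, since the field is optional.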
Examples
Complete, validated examples for common use cases:
| Use Case | File | Key Features |
|---|---|---|
| SaaS API | saas-api.json | Full API with OAuth, rate limits, conditional purchases, sandboxed code execution |
| E-Commerce | ecommerce.json | Purchase limits, product scraping with conditions, multi-jurisdiction |
| Social Platform | social-platform.json | AI disclosure requirements, DM opt-in, frequency limits per permission |
| Open/Public API | open-api.json | Minimal restrictions, high rate limits, no auth required |
| Healthcare (HIPAA) | healthcare.json | HIPAA-scope fields, BAA requirement, extensions namespace, mTLS auth |
Load any example directly in the Validator to explore it interactively.
Adoption Guide
Step 1: Create your openterms.json
Start with the Quick Start template. Add permissions that match your service's terms of service. Be explicit — false is better than omitting a permission.
Step 2: Host it
Place the file at https://yourdomain.com/openterms.json — the standard discovery path. Alternatively, reference it from your existing robots.txt:
# AI Agent Terms
OpenTerms: https://yourdomain.com/openterms.json
Step 3: Validate
Use the interactive validator or the programmatic API:
curl -X POST https://openterms.com/api/validate \
-H "Content-Type: application/json" \
-d '{"content": <your openterms.json>}'
Step 4: Keep it updated
Update last_updated whenever you change terms. Set expires to force agents to re-fetch periodically.
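On the agent side, honoring expires comes down to a date comparison against a cached copy. The following is a minimal sketch, assuming expires is a plain ISO 8601 calendar date as in the Core Fields table; the function name `needs_refetch` is illustrative.

```python
from datetime import date
from typing import Any, Dict, Optional


def needs_refetch(document: Dict[str, Any], today: Optional[date] = None) -> bool:
    """True when a cached openterms.json is past its expires date."""
    expires = document.get("expires")
    if not expires:
        return False  # no expiry declared; the caller's own cache policy applies
    current = today or date.today()
    # "Re-fetch after this date": the expiry day itself still counts as valid.
    return current > date.fromisoformat(expires)
```

Passing `today` explicitly keeps the check deterministic in tests.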
Framework Integrations
Add OpenTerms permission checks to any agent framework using a single HTTP call. All examples below use GET /api/v1/check directly — no SDK required.
During public alpha, most domains have not yet published an openterms.json file. The API returns result: "no_openterms_json" for those domains; this is the expected response, not an error. See the handling section below.
curl — one-liner check
The simplest integration: a single HTTP GET before any automated action.
$ curl -s "https://openterms.com/api/v1/check?domain=example.com&action=scrape_data"
{
  "success": true,
  "domain": "example.com",
  "action": "scrape_data",
  "result": "no_openterms_json",
  "checked_at": "2026-05-08T14:00:00.000Z"
}
During public alpha, result: "no_openterms_json" is the expected response for most domains. The domain simply hasn't published an openterms.json yet. See the README for current usage examples: openterms-py on GitHub.
Handling no_openterms_json
Your integration should handle three result states:
| Result | Meaning | Recommended action |
|---|---|---|
| allowed | Domain has openterms.json; action is permitted | Proceed |
| denied | Domain has openterms.json; action is not permitted | Skip or surface to user |
| no_openterms_json | Domain hasn't published openterms.json yet (expected during public alpha) | Fall back to your own default policy |
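The response examples on this page appear in two shapes: a flat one where result is a status string, and a structured one where result is an object and the status is carried in permission and error. The sketch below reduces either shape to the three states in the table above. It is a defensive illustration, not a statement of what the live API returns, and the name `normalize_status` is hypothetical.

```python
from typing import Any, Dict


def normalize_status(response: Dict[str, Any]) -> str:
    """Reduce a check response to 'allowed', 'denied', or 'no_openterms_json'."""
    result = response.get("result")
    # Flat shape: result is the status string itself.
    # Structured shape: result is an object and 'permission' holds the status.
    status = result if isinstance(result, str) else response.get("permission")
    # Anything unresolved (not_specified, missing fields) falls through to the
    # state where the caller applies its own default policy.
    return status if status in ("allowed", "denied") else "no_openterms_json"
```

Collapsing not_specified into the fallback state is a simplification; an integration that needs to distinguish "file exists but is silent" from "no file" should keep them apart.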
Python (requests)
Direct HTTP call — no third-party SDK needed:
import requests

def check_action(domain: str, action: str) -> str:
    """Returns 'allowed', 'denied', or 'no_openterms_json'."""
    resp = requests.get(
        "https://openterms.com/api/v1/check",
        params={"domain": domain, "action": action},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    return data.get("result", "no_openterms_json")

# Usage: works whether or not the domain has published openterms.json
status = check_action("example.com", "scrape_data")
if status == "denied":
    print("Action not permitted by site policy.")
elif status == "no_openterms_json":
    print("No openterms.json found: apply your default policy.")
else:
    print("Permitted.")
Node.js (fetch)
Works with Node 18+ native fetch or any HTTP client:
// Returns 'allowed', 'denied', or 'no_openterms_json'
async function checkAction(domain, action) {
  // URLSearchParams percent-encodes both values
  const params = new URLSearchParams({ domain, action });
  const url = `https://openterms.com/api/v1/check?${params}`;
  const res = await fetch(url);
  if (!res.ok) throw new Error(`OpenTerms check failed: ${res.status}`);
  const data = await res.json();
  return data.result ?? 'no_openterms_json';
}

// Usage (top-level await requires an ES module)
const status = await checkAction('example.com', 'scrape_data');
if (status === 'denied') {
  console.log('Action not permitted by site policy.');
} else if (status === 'no_openterms_json') {
  console.log('No openterms.json found: apply your default policy.');
}
LangChain / tool wrapper
Drop a permission check into any tool-calling framework as a pre-execution guard. The pattern below uses the HTTP API directly — substitute your framework's HTTP client:
import requests

# Framework-agnostic guard: call this before executing any agent action
def openterms_guard(domain: str, action: str) -> dict:
    resp = requests.get(
        "https://openterms.com/api/v1/check",
        params={"domain": domain, "action": action},
        timeout=10,
    ).json()
    result = resp.get("result", "no_openterms_json")
    # During public alpha, no_openterms_json is normal: domain hasn't published yet
    return {
        "permitted": result == "allowed",
        "result": result,
        "domain": domain,
        "action": action,
    }

# Wire into your framework: example shows a generic tool-call pattern
# See the README for current usage examples with specific frameworks.
CrewAI
crewai-openterms is an independent community package that provides CrewAI-compatible tools for the OpenTerms permission check API.
Install the package:
$ pip install crewai-openterms
args_schema requirement: CrewAI tools must declare a Pydantic args_schema so the framework can validate inputs before invoking the tool. crewai-openterms ships with schemas pre-defined — pass your domain and action arguments to the tool call and the schema validation is handled automatically.
from crewai import Agent, Task, Crew
from crewai_openterms import OpenTermsCheckTool

# Instantiate the permission-check tool
check_tool = OpenTermsCheckTool()

# Wire it into a CrewAI agent as a pre-action guard
agent = Agent(
    role="Web Research Agent",
    goal="Check site permissions before scraping",
    backstory="Respects publisher terms before acting.",
    tools=[check_tool],
    verbose=True,
)

task = Task(
    description="Check whether scrape_data is permitted for example.com",
    expected_output="Permission result from OpenTerms API",
    agent=agent,
)

# The tool calls GET https://openterms.com/api/v1/check internally
crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff(
    inputs={"domain": "example.com", "action": "scrape_data"}
)
PyPI: pypi.org/project/crewai-openterms — GitHub: github.com/jstibal/crewai-openterms
LangChain
langchain-openterms provides permission-aware tools for LangChain agents.
Install the package:
$ pip install langchain-openterms
# With openterms-py SDK (recommended; requires openterms-py>=0.3.1):
$ pip install "langchain-openterms[sdk]"
Three integration patterns ship in the package:
- OpenTermsGuard: wraps any LangChain tool; blocks the wrapped tool unless the check returns allow
- OpenTermsChecker: standalone tool an agent can call to check permissions
- OpenTermsCallbackHandler: passive observer; logs permission checks without blocking (monitoring only)
from langchain_openterms import OpenTermsGuard

# Wrap any LangChain tool; fail-closed by default.
# `search` must be an existing LangChain tool instance, for example:
# from langchain_community.tools import BraveSearch
# search = BraveSearch.from_api_key(api_key="...", search_kwargs={"count": 3})
guarded_search = OpenTermsGuard(
    tool=search,
    action="read_content",
)

result = guarded_search.invoke("https://example.com/pricing")
if "blocked" in result.lower():
    print("Cannot proceed:", result)
else:
    print("Allowed:", result)
PyPI: pypi.org/project/langchain-openterms — GitHub: github.com/jstibal/langchain-openterms
For a full guide covering all three packages, integration selection guidance, and the security model, see the SDK & Integrations page.
Permission Check API
The Permission Check API (GET /api/v1/check) returns whether a domain's openterms.json permits a specific agent action. Use it as a guard before performing any automated operation.
Endpoint
GET https://openterms.com/api/v1/check?domain=example.com&action=scrape_data
| Parameter | Type | Required | Description |
|---|---|---|---|
| domain | string | Required | Domain to check (e.g. stripe.com). URLs are stripped to hostname automatically. |
| action | string | Required | Action to check. Can be an exact permission key or a free-text action that will be semantically mapped. Examples: scrape_data, api_access, allow_training, scrape_pricing. |
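Both parameters travel in the query string, so they should be percent-encoded, especially a free-text action. The sketch below builds the request URL with the standard library. Reducing the input to a hostname on the client is optional, since the API is described as doing it automatically; it is done here only so that logs and cache keys stay consistent. The helper names `hostname_of` and `build_check_url` are hypothetical.

```python
from urllib.parse import urlencode, urlsplit

CHECK_ENDPOINT = "https://openterms.com/api/v1/check"


def hostname_of(domain_or_url: str) -> str:
    """Reduce 'https://stripe.com/pricing' or 'stripe.com' to 'stripe.com'."""
    # urlsplit only recognizes a host after '//', so add it for bare domains.
    candidate = domain_or_url if "//" in domain_or_url else "//" + domain_or_url
    host = urlsplit(candidate).hostname
    if not host:
        raise ValueError(f"no hostname in {domain_or_url!r}")
    return host


def build_check_url(domain_or_url: str, action: str) -> str:
    """Return the full check URL with both parameters percent-encoded."""
    query = urlencode({"domain": hostname_of(domain_or_url), "action": action})
    return f"{CHECK_ENDPOINT}?{query}"
```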
Response
A successful response:
{
"success": true,
"domain": "example.com",
"action": "scrape_data",
"result": {
"value": false,
"confidence": "medium",
"caveats": []
},
"permission": "denied",
"confidence": 0.99,
"policy_excerpt": "Explicitly denied.",
"openterms_version": "0.3.0",
"checked_at": "2026-04-24T09:00:00.000Z"
}
Response fields
| Field | Type | Description |
|---|---|---|
| result | object | The permission result with field-level confidence metadata. Contains: value (true = allowed, false = denied, null = not specified), confidence ("high", "medium", or "low"), and caveats (array of known failure mode strings, empty for high/medium fields). |
| permission | string | One of: allowed, denied, not_specified. Mirrors result.value as a string for convenience. |
| confidence | number | Match confidence score for this specific domain lookup (0.0–1.0). Exact permission key matches → 0.99. Semantic prefix matches → 0.7–0.8. Lower when the domain's openterms.json doesn't list the permission. Distinct from result.confidence, which is the static field-level accuracy tier. |
| policy_excerpt | string | Short human-readable explanation of the decision, drawn from the domain's openterms.json. |
| resolved_permission | string | Present when the action was semantically mapped (not an exact match). Shows the matched permission key. |
Per-field confidence levels
The result.confidence value reflects empirical accuracy from controlled measurement
(Haiku 4.5, Tests 7-18 baseline, 4-domain sample, external LLM judges).
Three tiers:
| Tier | Threshold | Meaning |
|---|---|---|
| high | 95%+ | Verified against independent external LLM judgment at 95% or higher accuracy. Safe to use for automated decision-making. |
| medium | 80–94% | Verified at 80-94% accuracy. Appropriate for automated decision-making with understanding that edge cases exist. Spot-checking recommended for high-stakes use cases. |
| low | <80% | Verified at below 80% accuracy. Should not be used as sole input for automated decisions. Human review recommended for high-stakes use cases. |
| Permission Field | Accuracy | Confidence | Caveats |
|---|---|---|---|
| read_content | 100% | high | None |
| post_content | 100% | high | None |
| scrape_data | 88% | medium | None |
| create_account | 88% | medium | None |
| make_purchases | 88% | medium | None |
| api_access | 75% | low | Below 80% accuracy threshold. Human review recommended for high-stakes decisions. |
| allow_training | ~50% | low | Explicit training prohibitions may be missed when stated as exclusive-channel restrictions (e.g., 'only via our API'). Known failure cases: instagram.com, deepgram.com. Extended disclosure: The allow_training field has approximately 50% accuracy in empirical testing. This field is particularly prone to false negatives: platforms that prohibit AI training may not be detected if their terms use indirect language (e.g., exclusive-channel restrictions rather than explicit training prohibitions). Known failure cases include instagram.com and deepgram.com. This field should not be used as the sole input for automated decisions. Human review is required. |
Note on confidence methodology: Confidence levels are based on empirical measurement against external LLM judges across a 4-domain sample (Tests 7-18). These are v1 accuracy estimates — they will be recalibrated when scale validation completes. A field's confidence level is constant across all domains; it reflects the generator's general accuracy for that field, not the quality of any one domain's openterms.json file.
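The tier thresholds can be written down as a small function and checked against the per-field table. This is only a restatement of the published thresholds; the function name `confidence_tier` is illustrative, and the accuracy figures are copied from the table above (with allow_training's "~50%" entered as 0.50).

```python
def confidence_tier(accuracy: float) -> str:
    """Map a measured accuracy in [0, 1] to 'high', 'medium', or 'low'."""
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must be between 0.0 and 1.0")
    if accuracy >= 0.95:
        return "high"
    if accuracy >= 0.80:
        return "medium"
    return "low"


# Accuracy figures from the per-field table.
MEASURED_ACCURACY = {
    "read_content": 1.00,
    "post_content": 1.00,
    "scrape_data": 0.88,
    "create_account": 0.88,
    "make_purchases": 0.88,
    "api_access": 0.75,
    "allow_training": 0.50,
}
FIELD_TIERS = {name: confidence_tier(acc) for name, acc in MEASURED_ACCURACY.items()}
```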
Code examples
curl — check if scraping is allowed
$ curl "https://openterms.com/api/v1/check?domain=github.com&action=scrape_data"
Python — permission check with caveat handling
import requests

def check_permission(domain: str, action: str):
    resp = requests.get(
        "https://openterms.com/api/v1/check",
        params={"domain": domain, "action": action},
        timeout=30
    )
    data = resp.json()
    result = data.get("result", {})

    # Always check for caveats warnings
    for caveat in result.get("caveats", []):
        print(f"⚠️ {data['action']} on {data['domain']}: {caveat}")

    # Log low-confidence fields for human review
    if result.get("confidence") == "low":
        print("❗ Manual verification recommended.")

    return result.get("value")  # True / False / None

# Usage
decision = check_permission("example.com", "allow_training")
Rate limits
Same limits as the public generator API: 100 requests/hour and 1,000 requests/day per IP address.
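A client can stay under these limits by tracking its own request timestamps. The sketch below is one way to do that with a sliding window. The class name `CheckBudget` and the sliding-window interpretation are assumptions, since the server's actual windowing (fixed or sliding) is not stated here.

```python
import time
from collections import deque
from typing import Callable, Deque


class CheckBudget:
    """Client-side guard for 100 requests/hour and 1,000 requests/day (hypothetical helper)."""

    def __init__(self, per_hour: int = 100, per_day: int = 1000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.per_hour = per_hour
        self.per_day = per_day
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.sent: Deque[float] = deque()  # timestamps of requests within the last day

    def try_acquire(self) -> bool:
        """Return True and record the request if both limits still allow it."""
        now = self.clock()
        # Forget requests that are at least 24 hours old.
        while self.sent and now - self.sent[0] >= 86400:
            self.sent.popleft()
        last_hour = sum(1 for t in self.sent if now - t < 3600)
        if last_hour >= self.per_hour or len(self.sent) >= self.per_day:
            return False
        self.sent.append(now)
        return True
```

Call `try_acquire()` before each check request and back off or queue the request when it returns `False`.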
Bulk Download
Download the entire registry as a single ZIP file — 500+ openterms.json entries, organized by validation status. Ideal for bootstrapping local policy enforcement, training datasets, or offline analysis.
- OTA-verified: cross-referenced against Open Terms Archive ground truth data, a third-party legal document archive.
Download the full dataset: openterms.com/registry/download (live ZIP, generated fresh from the registry on every request).
ZIP Structure
openterms-registry-seed/
├── validated/
│   ├── github-com.json
│   ├── stripe-com.json
│   └── ... (all validated entries)
├── unvalidated/
│   ├── example-com.json
│   └── ... (all unvalidated entries)
├── flagged/
│   └── ... (entries with data quality issues)
├── index.json   ← manifest: domain, category, validation_status, confidence
└── README.md    ← schema version, generated timestamp, usage
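Given that layout, the archive can be read into memory without unpacking it to disk. The following is a sketch using only the Python standard library. The function name `load_registry` and the returned shape are assumptions made for illustration.

```python
import io
import json
import zipfile
from typing import Dict, Tuple

STATUS_DIRS = ("validated", "unvalidated", "flagged")


def load_registry(zip_bytes: bytes) -> Dict[str, Tuple[str, dict]]:
    """Return {filename: (status_dir, parsed_json)} for every entry file in the ZIP."""
    entries: Dict[str, Tuple[str, dict]] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            # Entry files live at openterms-registry-seed/<status>/<file>.json;
            # index.json, README.md, and directory records are skipped.
            if len(parts) != 3 or not parts[2].endswith(".json"):
                continue
            if parts[1] not in STATUS_DIRS:
                continue
            entries[parts[2]] = (parts[1], json.loads(zf.read(name)))
    return entries
```

The status directory is kept alongside each parsed file so callers can, for example, ignore `flagged` entries when enforcing policy.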
index.json Schema
The index.json manifest lists every entry with metadata, making it easy to filter locally without parsing each file:
{
  "generated_at": "2026-04-16T14:00:00.000Z",
  "schema_version": "0.3.1",
  "total": 511,
  "counts": { "validated": 53, "unvalidated": 458, "flagged": 0 },
  "entries": [
    {
      "domain": "github.com",
      "filename": "github-com.json",
      "category": "Developer Tools",
      "validation_status": "validated",
      "confidence": 0.9
    }
  ]
}
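Because the manifest carries `validation_status` and `confidence` for each entry, filtering can happen before any per-domain file is opened. A minimal sketch follows. The function name `select_domains` and the rule of treating a missing confidence as 0.0 are assumptions.

```python
from typing import Any, Dict, List


def select_domains(index: Dict[str, Any], status: str = "validated",
                   min_confidence: float = 0.0) -> List[str]:
    """Return domains whose manifest entry matches the status and confidence floor."""
    return [
        entry["domain"]
        for entry in index.get("entries", [])
        if entry.get("validation_status") == status
        # Assumption: a missing or null confidence counts as 0.0.
        and (entry.get("confidence") or 0.0) >= min_confidence
    ]
```

Pass the parsed contents of index.json as `index`, for example the result of `json.load` on the file from the ZIP.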
Use Cases
- Local policy enforcement — embed the dataset in your agent runtime for zero-latency permission checks
- Offline environments — air-gapped or latency-sensitive deployments that can't call the live API
- Training data — structured ToS signals for fine-tuning AI models
- Policy snapshots — snapshot the registry state at a point in time
Open Receipt Specification — Companion Spec (External / Future context)
The Open Receipt Specification is an external companion spec, not part of the OpenTerms runtime. It describes a pattern for generating structured records when AI agents acknowledge policies before acting.
Current product: GET /api/v1/check. The Open Receipt Specification is referenced here as future/external context. No such infrastructure is deployed or required for current-product usage.
OpenTerms and the Open Receipt Specification describe complementary layers:
- OpenTerms — declares what agents are permitted to do (the policy)
- Open Receipt Specification — describes a record structure for when an agent acknowledged that policy (an external companion spec)
- The policy_id field in openterms.json is the linking identifier if you implement both
Internal Logging
What to log
When an agent calls the Permission Check API, consider logging the result in your own internal systems:
- Domain — the domain whose permissions were checked
- Action — the specific permission requested (e.g., scrape_data, api_access)
- Result — allowed, denied, or no_openterms_json
- Timestamp — ISO 8601, when the check was performed
- Source — whether the record came from the registry, a live fetch, or a cached entry
These internal log records document what your agent checked before acting. What records are required for your use case is a decision for your team.
Implementation
import requests
import json
from datetime import datetime, timezone

def check_and_log(domain: str, action: str) -> str:
    """Check permission and log the result. Returns result string."""
    resp = requests.get(
        "https://openterms.com/api/v1/check",
        params={"domain": domain, "action": action},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    result = data.get("result", "no_openterms_json")

    # Build an internal log record
    record = {
        "domain": domain,
        "action": action,
        "result": result,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }
    with open("permission_checks.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")

    return result

# Usage
result = check_and_log("example.com", "api_access")
if result == "denied":
    print("Action not permitted by site policy.")
elif result == "no_openterms_json":
    print("No openterms.json found — apply your default policy.")
OpenTerms provides machine-readable permission data as one input to agent operation and review processes. It does not provide legal advice.
Training Semantics: A, B, and C
The word "training" in AI terms of service is ambiguous. It can refer to three distinct scenarios with different legal and operational implications. OpenTerms separates them precisely.
Semantics A — Third-party training on site content
Definition: Can external parties (AI companies, crawlers, developers) use the site's published content to train machine learning models?
This is what allow_training captures in v0.3.1. The service is the content publisher; the question is whether it grants third parties a training license.
Protocol coverage: allow_training, fully addressed in v0.3.1.
Semantics B — Service internal use of user data
Definition: Does the service itself use user-submitted data — including profile data, usage patterns, messages, or transactions — to improve its own products, train internal models, or enhance recommendations?
This is a data processing relationship between the service and its users. It is intentionally out of scope for openterms.json. The permissions protocol governs what third parties (including AI agents) may do; it does not govern the service's own data practices. Semantics B belongs in Privacy Policy and DPA agreements, not in openterms.json.
Semantics C — Site training on agent-submitted data (v0.4.0 proposal)
Definition: Can the site train AI/ML models on data that an agent submits to it — including form inputs, API payloads, file uploads, chat messages, and structured data submissions?
This is the reverse direction from Semantics A. Instead of asking "can I train on your content?", the agent asks "will you train on my content when I submit it to you?" This is specifically relevant to enterprise agent deployments where agents submit proprietary or sensitive data to SaaS platforms.
Semantics C is a known gap in v0.3.1. A site may simultaneously prohibit third-party training on its content (allow_training: false) while reserving the right to train on submitted inputs. These are legally distinct and must not be conflated.
Proposed for v0.4.0: the allow_training_on_submissions field (see planning document).
Summary
| Semantic | Question | Protocol Coverage |
|---|---|---|
| A — Third-party on site content | Can external parties train on the site's published content? | allow_training — v0.3.1 ✓ |
| B — Service internal use | Does the service train on its users' data internally? | Privacy Policy scope — intentionally out of scope |
| C — Site on agent submissions | Does the site train on data agents submit to it? | allow_training_on_submissions — v0.4.0 proposal |
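To keep Semantics A and C from being conflated in agent code, the two fields can be read independently, with a missing field reported as unknown rather than inferred from the other. This is a sketch under two assumptions: permissions are available as a flat mapping from field name to value, and `allow_training_on_submissions` exists as proposed for v0.4.0. The function name and output keys are hypothetical.

```python
from typing import Any, Dict, Optional


def training_stance(permissions: Dict[str, Any]) -> Dict[str, Optional[bool]]:
    """Report Semantics A and C separately; anything not a literal boolean is unknown."""

    def as_flag(value: Any) -> Optional[bool]:
        # Conditional objects, null, and absent fields all stay None (unknown).
        return value if isinstance(value, bool) else None

    return {
        # Semantics A: may third parties train on the site's published content?
        "third_party_may_train_on_site_content": as_flag(permissions.get("allow_training")),
        # Semantics C (v0.4.0 proposal): may the site train on agent submissions?
        "site_may_train_on_submissions": as_flag(
            permissions.get("allow_training_on_submissions")
        ),
    }
```

A v0.3.1 file that sets only `allow_training: false` therefore yields `None` for submissions, which reflects the known gap instead of implying an answer.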
CI/CD Validation
Validate your openterms.json in CI/CD pipelines using the API endpoint:
- name: Validate openterms.json
  run: |
    RESULT=$(curl -s -X POST https://openterms.com/api/validate \
      -H "Content-Type: application/json" \
      -d "{\"content\": $(cat openterms.json)}")
    echo "$RESULT" | jq .
    VALID=$(echo "$RESULT" | jq -r '.valid')
    if [ "$VALID" != "true" ]; then
      echo "openterms.json validation failed!"
      exit 1
    fi
// package.json
{
  "scripts": {
    "validate:terms": "curl -sf -X POST https://openterms.com/api/validate -H 'Content-Type: application/json' -d '{\"content\":'\"$(cat openterms.json)\"'}' | jq -e '.valid'"
  }
}
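For pipelines that already run Python, the same check can be sketched with the standard library. Parsing the file locally first means malformed JSON fails before any request is sent. The helper names below are hypothetical, and the response is assumed to carry a boolean `valid` field, as the shell examples imply.

```python
import json
import urllib.request
from typing import Any, Dict

VALIDATE_URL = "https://openterms.com/api/validate"


def build_payload(document_text: str) -> bytes:
    """Wrap the parsed openterms.json under a 'content' key, as the curl call does."""
    # json.loads raises ValueError on malformed input, before any network traffic.
    return json.dumps({"content": json.loads(document_text)}).encode("utf-8")


def is_valid(response: Dict[str, Any]) -> bool:
    """Treat only a JSON true in 'valid' as passing (assumed response shape)."""
    return response.get("valid") is True


def validate_file(path: str = "openterms.json") -> bool:
    """POST the file to the validation endpoint; performs a network call when invoked."""
    with open(path, encoding="utf-8") as f:
        body = build_payload(f.read())
    req = urllib.request.Request(
        VALIDATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return is_valid(json.load(resp))
```

A CI step could call `validate_file()` and exit non-zero when it returns `False`.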