v0.3.0 — Current

OpenTerms Specification

Machine-readable legal terms for the agentic web. Define what AI agents can and can't do on your service — in a format they can actually parse.

OpenTerms is a JSON-based protocol that lets services declare their terms of service, permissions, rate limits, and usage constraints in a structured format. Think of it as robots.txt for AI agent behavior.

Since v0.2.0: JSON-LD support (@context), policy identifiers (policy_id), data residency, permission scoping/frequency, and an extensions namespace.

New in v0.3.0: discovery object — machine-readable pointers to MCP servers and OpenAPI specs. Plus formalized extensions.com.openterms.meta namespace for generator and provenance metadata.

For Agents

OpenTerms is designed to be queried by agents at runtime — before acting on a third-party domain. This section covers everything an agent needs: what to check, how to interpret results, and when to escalate.

Primary endpoint: GET https://openterms.com/api/v1/check?domain={domain}&action={action}

Seven canonical actions: read_content, scrape_data, api_access, create_account, make_purchases, post_content, allow_training.

What to check before acting

  1. Fetch permissions — call the check API with the target domain and the specific action you intend to take.
  2. Interpret the result: true means allowed, false means denied, null means not specified. Conditional objects mean allowed or denied only under stated conditions.
  3. Apply fail-closed logic for null and low confidence — if the result is null or confidence is low, treat as denied unless your policy explicitly permits proceeding with unknowns.
  4. Log the check — record domain, action, decision, confidence, and timestamp in your own internal logs.
  5. Escalate when necessary — for conditional results, low-confidence results, or high-stakes actions, surface the check result to a human rather than auto-deciding.

API example

The API checks whether a domain publishes an openterms.json file. During the alpha, most domains have not yet adopted the standard, so the typical response is no_openterms_json:

curl — domain without openterms.json (most domains today)
GET https://openterms.com/api/v1/check?domain=github.com&action=read_content

{
  "success": true,
  "domain": "github.com",
  "action": "read_content",
  "result": { "value": null, "confidence": null, "caveats": [] },
  "permission": "not_specified",
  "confidence": 0,
  "error": "no_openterms_json",
  "message": "No openterms.json found at github.com."
}

This is a correct, handled response — not an error. It means the domain has not published machine-readable permissions. Agents should treat this as unresolved and fall back to other information sources.

When a domain does publish openterms.json, the response includes structured permission data:

curl — domain with openterms.json (after adoption)
GET https://openterms.com/api/v1/check?domain=example.com&action=read_content

{
  "success": true,
  "domain": "example.com",
  "action": "read_content",
  "result": {
    "value": true,
    "confidence": "high",
    "caveats": []
  },
  "permission": "allowed",
  "confidence": 0.99,
  "policy_excerpt": "Explicitly allowed.",
  "checked_at": "2026-05-02T21:00:00.000Z"
}

Decision policy

Use this table to map any check result to an agent action:

Result value | Confidence | Agent action | Notes
true | high / medium | Proceed | Permission explicitly granted. Log the check and continue.
true | low | Escalate | Result may be inaccurate. Escalate to a human or use a conservative default.
false | any | Block | Permission explicitly denied. Do not proceed. Log the denial.
null | any | Escalate or Block | Not specified in the domain's openterms.json. Fail closed: treat as denied unless your policy allows proceeding with unknowns.
conditional object | high / medium | Check conditions | If allowed is true, read the conditions field and proceed only if the conditions are met. If allowed is false, block.
conditional object | low | Escalate | Low confidence on a conditional result. Escalate to human review.

Fail-closed recommendation: When result.value is null or result.confidence is "low", treat the permission as denied by default. Proceeding on unknowns shifts responsibility to the agent operator and may violate terms that were simply not indexed.
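
The decision table can be collapsed into one pure function. The sketch below is illustrative only: the name decide and the returned labels are hypothetical, not defined by OpenTerms, and it assumes the result object shape shown in the API examples (value, confidence).

```python
from typing import Any, Mapping

def decide(result: Mapping[str, Any]) -> str:
    """Map a check `result` object to an agent action label.

    Hypothetical helper: the labels 'proceed', 'block', 'escalate', and
    'check_conditions' are illustrative, not part of the spec.
    """
    value = result.get("value")
    trusted = result.get("confidence") in ("high", "medium")

    if value is False:
        return "block"              # denied at any confidence
    if value is None:
        return "escalate"           # not specified: never auto-proceed
    if isinstance(value, dict):     # conditional object
        if not trusted:
            return "escalate"
        return "check_conditions" if value.get("allowed") is True else "block"
    if value is True and trusted:
        return "proceed"
    return "escalate"               # true with low confidence, or unexpected input
```

A stricter operator policy could return "block" instead of "escalate" for null; the table allows either.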

Logging guidance

Log every permission check. Minimum fields:

  • domain — the domain checked
  • action — the permission key (e.g. scrape_data)
  • decision: allowed, denied, or not_specified
  • confidence — numeric score (0.0–1.0)
  • checked_at — ISO 8601 timestamp from the response
  • agent_id — identifier for the agent or run performing the check
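
As a minimal sketch, the fields above can be pulled out of a check response like this. The helper name build_log_record and the agent_id value "run-42" are hypothetical; the response keys are the ones shown in the API example, and any key the response omits is logged as None.

```python
import json
from typing import Any, Mapping

def build_log_record(check: Mapping[str, Any], agent_id: str) -> dict:
    """Extract the minimum log fields from a check response (hypothetical helper)."""
    return {
        "domain": check.get("domain"),
        "action": check.get("action"),
        # `permission` mirrors result.value as a string
        "decision": check.get("permission", "not_specified"),
        "confidence": check.get("confidence"),   # numeric score, 0.0-1.0
        "checked_at": check.get("checked_at"),   # ISO 8601 timestamp from the response
        "agent_id": agent_id,
    }

sample_response = {
    "domain": "example.com",
    "action": "read_content",
    "permission": "allowed",
    "confidence": 0.99,
    "checked_at": "2026-05-02T21:00:00.000Z",
}
# One JSON object per line is easy to ship to most log pipelines.
print(json.dumps(build_log_record(sample_response, agent_id="run-42")))
```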

Human escalation guidance

Escalate to a human when:

  • Result is null (not specified) and the action is high-stakes (purchases, account creation, content posting)
  • Confidence is low
  • Result is a conditional object and the conditions cannot be programmatically verified
  • The domain is newly indexed or unvalidated (validation_status: "unvalidated")
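
The four escalation triggers can be expressed as a single predicate. This is a sketch under stated assumptions: should_escalate is a hypothetical name, conditions_verifiable is a flag the caller must supply, and validation_status is passed in by the caller because this section does not say where that field appears in a response.

```python
from typing import Any, Mapping, Optional

# High-stakes actions named in the escalation guidance.
HIGH_STAKES = {"make_purchases", "create_account", "post_content"}

def should_escalate(
    action: str,
    result: Mapping[str, Any],
    validation_status: Optional[str] = None,   # assumption: supplied by the caller
    conditions_verifiable: bool = False,       # assumption: caller knows this
) -> bool:
    """Return True when any escalation trigger applies."""
    value = result.get("value")
    if value is None and action in HIGH_STAKES:
        return True                            # unspecified + high-stakes
    if result.get("confidence") == "low":
        return True                            # low field-level confidence
    if isinstance(value, dict) and not conditions_verifiable:
        return True                            # conditions need a human
    if validation_status == "unvalidated":
        return True                            # newly indexed or unvalidated domain
    return False
```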

Copy/Paste Agent Instructions

Copy this block directly into your agent's system prompt or instructions file. Customize the action list to match your use case.

agent-instructions.txt
Before performing any action on a third-party domain, check OpenTerms to verify 
that the action is permitted by that domain's terms of service.

Check endpoint:
  GET https://openterms.com/api/v1/check?domain={domain}&action={action}

Canonical actions to check:
  read_content     — read publicly available content
  scrape_data      — bulk-download or scrape data from the domain
  api_access       — access the domain's API programmatically
  create_account   — create a user account on the domain
  make_purchases   — make purchases or financial transactions
  post_content     — post, publish, or submit content
  allow_training   — use the domain's published content for model training

Decision policy:
  result.value = true             → allowed. Proceed. Log the check.
  result.value = false            → denied. Block. Do not proceed.
  result.value = null             → not specified. Fail closed. Treat as denied.
  result.value = {object}         → conditional. Read the conditions field.
                                    Only proceed if allowed=true and conditions are met.
  result.confidence = "low"       → escalate to human regardless of value.

Logging (required for every check):
  Log: domain, action, decision, confidence, checked_at, agent_id

Human escalation (required when):
  - result.value is null for high-stakes actions (purchases, account creation, post_content)
  - result.confidence is "low"
  - result is a conditional object and conditions cannot be verified programmatically
  - The domain is unvalidated

OpenTerms is one input to your decision — it does not replace legal review for 
regulated use cases. When in doubt, escalate.

Common Agent Action Examples

Full check–interpret–decide flows for the most common agent operations.

Scrape a pricing page

Python
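import requests

# log_check, scrape_pricing_page, escalate_to_human, and PermissionDenied
# are placeholders for your own implementations.

# 1. Check
resp = requests.get(
    "https://openterms.com/api/v1/check",
    params={"domain": "acme.com", "action": "scrape_data"},
    timeout=10,
)
resp.raise_for_status()
check = resp.json()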

# 2. Interpret
value = check["result"]["value"]
confidence = check["result"]["confidence"]

# 3. Decide
if value is True and confidence != "low":
    # Proceed
    log_check("acme.com", "scrape_data", "allowed")
    scrape_pricing_page("https://acme.com/pricing")
elif value is False:
    # Block
    log_check("acme.com", "scrape_data", "denied")
    raise PermissionDenied("scrape_data denied on acme.com")
else:
    # null or low confidence → escalate
    escalate_to_human("acme.com", "scrape_data", check)

Call an API

curl
$ curl "https://openterms.com/api/v1/check?domain=stripe.com&action=api_access"

# result.value = true, result.confidence = "low" (api_access field accuracy: 75%)
# Decision: escalate (api_access has <80% accuracy; human review recommended)

Create an account

Python
# openterms_check is a placeholder helper that calls the check API
# and returns the parsed JSON response.
check = openterms_check("example.com", "create_account")

if check["result"]["value"] is None:
    # null = not specified: fail closed for account creation
    escalate_to_human("example.com", "create_account", check)
elif check["result"]["value"] is True:
    create_account("example.com")
else:
    raise PermissionDenied("create_account denied")

Post content

Python
check = openterms_check("forum.example.com", "post_content")
result = check["result"]

if isinstance(result["value"], dict):
    # Conditional — check the conditions
    if result["value"]["allowed"]:
        print(result["value"]["conditions"])  # e.g. "Must disclose AI authorship"
        escalate_for_human_confirmation()
    else:
        raise PermissionDenied("post_content denied under conditions")
elif result["value"] is True:
    post_content("forum.example.com", content)
else:
    raise PermissionDenied()

Use content for model training

Python
check = openterms_check("dataset.example.com", "allow_training")
result = check["result"]

# allow_training has ~50% accuracy, so always escalate regardless of value
if result["confidence"] == "low":
    escalate_to_human("dataset.example.com", "allow_training", check)
elif result["value"] is True:
    add_to_training_dataset("dataset.example.com")
elif result["value"] is False:
    raise PermissionDenied("allow_training denied")
else:
    # null — fail closed for training data
    raise PermissionDenied("allow_training not specified — fail closed")

Quick Start

Create an openterms.json file and host it at the root of your domain:

openterms.json
{
  "$schema": "https://openterms.com/schema/openterms.schema.json",
  "openterms_version": "0.3.0",
  "service": {
    "name": "Your Service",
    "domain": "yourservice.com",
    "tos_url": "https://yourservice.com/terms"
  },
  "permissions": {
    "read_content": true,
    "scrape_data": false,
    "api_access": true,
    "create_account": false,
    "make_purchases": false,
    "post_content": false,
    "allow_training": null
  },
  "requires_consent": true,
  "jurisdiction": "US",
  "contact": "legal@yourservice.com",
  "last_updated": "2025-06-01"
}

That's it. AI agents fetch https://yourservice.com/openterms.json before taking any action, just like crawlers check robots.txt.

Pro tip: Add the $schema field to get auto-completion and inline validation in VS Code, JetBrains, and other editors that support JSON Schema.

How It Works

  1. Services publish an openterms.json at their domain root (or any discoverable URL)
  2. Agents query the Permission Check API before taking actions — permissions are structured data, not legalese
  3. Agents act on the result — proceed, skip, or surface to a human depending on the returned permission value

Core Fields

Field | Type | Status | Description
openterms_version | string | Required | Spec version (e.g. "0.3.0"). Semver format.
service | string or object | Required | Service info. Shorthand: "acme.com". Full: object with name, domain, tos_url, privacy_url, description, logo_url.
permissions | object | Required | What agents can do. See Permissions section.
$schema | string (URI) | Optional | Self-referencing schema URI. Enables editor auto-completion.
@context | string or object | New | JSON-LD context for semantic web / linked data interoperability.
policy_id | string | New | Globally unique identifier for this terms document. Used to reference a specific version of terms in external tooling.
requires_consent | boolean | Optional | Must the agent obtain explicit consent before acting?
jurisdiction | string or string[] | Optional | ISO 3166-1/2 jurisdiction code(s). E.g. "US-DE", ["US-CA", "EU"].
contact | string or object | Optional | Legal contact. Shorthand: email string. Full: object with email, name, url.
last_updated | string (date) | Optional | ISO 8601 date when terms were last modified.
expires | string (date) | Optional | Date these terms expire. Agents should re-fetch after this date.
discovery | object | v0.3.0 | Machine-readable pointers to MCP servers and API specs. See Discovery section.
extensions | object | Optional | Namespace for custom or industry-specific fields. Use reverse-domain keys. See Extensions.

Permissions

The permissions object defines what AI agents can do. Each value is either:

  • true — allowed unconditionally
  • false — denied
  • Conditional object — allowed with conditions

Standard Permissions

Permission | Description
read_content | Read publicly available content
scrape_data | Scrape or bulk-download data
api_access | Access the service's API programmatically
create_account | Create user accounts programmatically
make_purchases | Make purchases or financial transactions
post_content | Post, publish, or submit content
allow_training | Whether external parties may train AI/ML models on the site's published content (Semantics A). Specifically covers third-party use of site-owned content for model training. Does not cover the service's internal use of user data for its own model improvement (Semantics B, Privacy Policy scope), nor whether the site trains on data submitted by agents (Semantics C, addressed in the v0.4.0 proposal).

The active schema defines exactly seven canonical permission keys: read_content, scrape_data, api_access, create_account, make_purchases, post_content, allow_training. The schema uses additionalProperties: false — additional permission keys are not recognized by the current spec.
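
Because the permissions schema is closed, a publisher can catch unrecognized keys before running the full validator. A minimal sketch, assuming nothing beyond the seven keys listed above; the function name unknown_permission_keys is hypothetical.

```python
# The seven canonical permission keys from the spec text.
CANONICAL_PERMISSIONS = frozenset({
    "read_content", "scrape_data", "api_access", "create_account",
    "make_purchases", "post_content", "allow_training",
})

def unknown_permission_keys(permissions: dict) -> list:
    """Return keys that a closed schema (additionalProperties: false) would reject."""
    return sorted(set(permissions) - CANONICAL_PERMISSIONS)

# "run_code" is not a canonical key, so it is reported.
print(unknown_permission_keys({"read_content": True, "run_code": False}))
```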

Conditional Permission Object

{
  "make_purchases": {
    "allowed": true,
    "conditions": "Max $500/day. Agent must be linked to verified human.",
    "requires_auth": true,
    "max_frequency": "50/day",
    "scope": "authenticated"
  }
}

Field | Type | Description
allowed | boolean | Required. Whether the permission is granted.
conditions | string | Human-readable conditions or restrictions.
requires_auth | boolean | Whether this permission requires authentication.
max_frequency | string | Rate limit for this specific action. E.g. "10/hour", "100/day". (New)
scope | string | What data subset this applies to. E.g. "public", "authenticated", "premium". (New)
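
An agent that wants to enforce max_frequency needs it in numeric form. The sketch below assumes the grammar is count/unit, inferred only from the examples in this section ("10/hour", "100/day"); the spec text here does not enumerate the allowed units, so any alphabetic unit is accepted. Frequency and parse_max_frequency are hypothetical names.

```python
import re
from typing import NamedTuple

class Frequency(NamedTuple):
    count: int
    unit: str

# Assumed grammar: "<integer>/<unit>", e.g. "50/day".
_FREQUENCY = re.compile(r"^(\d+)/([a-z]+)$")

def parse_max_frequency(text: str) -> Frequency:
    """Split a max_frequency string into (count, unit); raise on anything else."""
    match = _FREQUENCY.match(text.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized max_frequency: {text!r}")
    return Frequency(int(match.group(1)), match.group(2))

print(parse_max_frequency("50/day"))
```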

Rate Limits

{
  "rate_limits": {
    "requests_per_minute": 60,
    "requests_per_hour": 1000,
    "requests_per_day": 10000,
    "concurrent_sessions": 5
  }
}

All fields are optional integers. concurrent_sessions (since v0.2.0) limits how many simultaneous agent connections are allowed.
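
When several windows are declared, the tightest one determines the sustainable request pace. As a worked example for the values above: 60 per minute allows one request every 1.0 s, 1000 per hour one every 3.6 s, and 10000 per day one every 8.64 s, so a steady pace of one request per 8.64 s satisfies all three. The helper below is a sketch (min_interval_seconds is a hypothetical name); it ignores concurrent_sessions and does not model bursts.

```python
from typing import Optional

# Length of each rate-limit window in seconds.
WINDOW_SECONDS = {
    "requests_per_minute": 60,
    "requests_per_hour": 3600,
    "requests_per_day": 86400,
}

def min_interval_seconds(rate_limits: dict) -> Optional[float]:
    """Smallest steady spacing between requests that respects every declared window."""
    intervals = [
        window / rate_limits[key]
        for key, window in WINDOW_SECONDS.items()
        if rate_limits.get(key)          # skip fields that are absent or zero
    ]
    return max(intervals) if intervals else None
```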

Data Handling

{
  "data_handling": {
    "stores_agent_data": true,
    "shares_with_third_parties": false,
    "retention_days": 90,
    "gdpr_compliant": true,
    "ccpa_compliant": true,
    "hipaa_compliant": false,
    "data_residency": ["US", "EU"]
  }
}

Field | Type | Description
stores_agent_data | boolean | Stores data about agent interactions?
shares_with_third_parties | boolean | Shares agent data with third parties?
retention_days | integer | Days data is retained. 0 = no retention.
gdpr_compliant | boolean | GDPR compliant?
ccpa_compliant | boolean | CCPA compliant?
hipaa_compliant | boolean | HIPAA compliant? (New)
data_residency | string or string[] | Where data is stored. ISO codes. (New)

Authentication

{
  "authentication": {
    "required": true,
    "methods": ["api_key", "oauth2"],
    "registration_url": "https://acme.com/developers",
    "docs_url": "https://docs.acme.com/auth"
  }
}

Supported methods: api_key, oauth2, bearer_token, basic_auth, mTLS (New), none.

The docs_url field (since v0.2.0) — link directly to your auth documentation for faster agent onboarding.

Verification

Since v0.2.0. Optional verification metadata — digital signatures, policy hashes, and JWKS endpoints — for service operators who want to publish a signed, versioned policy document.

{
  "verification": {
    "jwks_url": "https://acme.com/.well-known/jwks.json",
    "signing_algorithm": "Ed25519",
    "policy_hash": "a1b2c3d4e5f6..."
  }
}

Field | Type | Description
jwks_url | string (URI) | URL to your JWKS endpoint for verifying signed policy documents.
signing_algorithm | string | One of: Ed25519, RS256, ES256.
policy_hash | string | SHA-256 hash of the canonical terms document. 64 hex chars.

The policy_id field in openterms.json provides a stable identifier for this terms document. It can be referenced externally by companion tooling, but is not required for the core permission check workflow.
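
To illustrate the shape of policy_hash, the sketch below produces a 64-character SHA-256 hex digest. The canonicalization is an assumption, since this section does not define it: keys are sorted, separators are compact, the text is UTF-8 encoded, and the verification object itself is left out so the hash does not depend on its own value. Confirm the real canonical form before relying on a hash computed this way.

```python
import hashlib
import json

def policy_hash(document: dict) -> str:
    """SHA-256 hex digest of an ASSUMED canonical form of an openterms.json document."""
    # Assumption: exclude `verification`, which carries the hash itself.
    body = {k: v for k, v in document.items() if k != "verification"}
    # Assumption: canonical JSON = sorted keys, no insignificant whitespace, UTF-8.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

doc = {"openterms_version": "0.3.0", "service": "acme.com", "permissions": {"read_content": True}}
print(len(policy_hash(doc)))  # SHA-256 hex digests are always 64 characters
```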

Extensions

The extensions object is a namespace for custom or industry-specific fields. Use reverse-domain notation to avoid conflicts:

{
  "extensions": {
    "health.hipaa.baa_required": true,
    "health.hipaa.audit_log_url": "https://acme.com/api/audit",
    "com.acme.internal_tier": "enterprise",
    "org.fintech.pci_dss_level": 1
  }
}

Extensions are free-form — any JSON value is accepted. This keeps the core schema stable while allowing domain-specific needs.

com.openterms.meta (v0.3.0)

extensions.com.openterms.meta is the first official OpenTerms namespace. It records provenance — how and where this file was created. This namespace is placed inside extensions rather than at root level because the root schema uses additionalProperties: false to ensure forward compatibility and strict validation.

Why not a top-level field? The root schema is intentionally closed. New root fields require a spec version bump and breaking changes. The extensions namespace lets tools add structured metadata without modifying the core spec contract.

{
  "extensions": {
    "com.openterms.meta": {
      "source": "self",
      "generator": "openterms.com/v0.3.0"
    }
  }
}

Field | Type | Description
source | string | Origin of this file. Typically "self" (written by the domain owner) or "openterms.com" (auto-generated).
generator | string | Tool or service that generated this file, in domain/version format. E.g. "openterms.com/v0.3.0".
generated_at | string (datetime) | ISO 8601 timestamp of when this file was generated. E.g. "2025-06-01T12:00:00Z".

The validator displays a note when com.openterms.meta is present, indicating the file was auto-generated and showing the generator version.

Discovery (v0.3.0)

The discovery object positions openterms.json as both the legal permissions layer and the technical discovery entry point for a domain's agent-facing infrastructure. It provides machine-readable signposts to existing technical resources — MCP servers, OpenAPI specs — that an agent can connect to directly.

Discovery does not describe what those servers do. It points to them. OpenTerms doesn't duplicate what MCP manifests or OpenAPI specs already define. It simply says: "these endpoints exist and are permitted."

Field | Type | Description
mcp_servers | array | List of MCP (Model Context Protocol) server endpoints. Each entry has url, transport, and optional description.
api_specs | array | List of API specification documents. Each entry has url, type, and optional description.

MCP Server entry fields

Field | Required | Values | Description
url | Required | string (URI) | URL of the MCP server endpoint.
transport | Required | "sse", "stdio", or "streamable-http" | Transport protocol used by this MCP server.
description | Optional | string | Human-readable summary of what this server provides.

API Spec entry fields

Field | Required | Values | Description
url | Required | string (URI) | URL to the API specification document.
type | Required | "openapi_3", "swagger_2", or "graphql_schema" | The specification format.
description | Optional | string | Human-readable description of this API spec.

Complete v0.3.0 Example

A full openterms.json showing both permissions and discovery populated:

openterms.json (v0.3.0)
{
  "$schema": "https://openterms.com/schema/openterms.schema.json",
  "openterms_version": "0.3.0",
  "service": "acme-corp.com",
  "permissions": {
    "read_content": true,
    "scrape_data": false,
    "api_access": {
      "allowed": true,
      "requires_auth": true,
      "max_frequency": "1000/hour"
    }
  },
  "discovery": {
    "mcp_servers": [
      {
        "url": "https://acme-corp.com/mcp/sse",
        "transport": "sse",
        "description": "Provides tools for checking order status and inventory."
      }
    ],
    "api_specs": [
      {
        "url": "https://api.acme-corp.com/v1/openapi.json",
        "type": "openapi_3",
        "description": "Full REST API for catalog and user management."
      }
    ]
  }
}

Discovery is a signpost, not a description layer. MCP servers already have manifests. OpenAPI specs already describe endpoints. The discovery field simply makes those resources findable via a single, standardized location — no duplicated documentation required.
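
Before connecting to anything listed under discovery, an agent may want to confirm each entry has the required fields and a recognized transport or type. A minimal sketch using only the field names and values listed in the tables above; discovery_problems is a hypothetical helper, not part of any OpenTerms package.

```python
MCP_TRANSPORTS = {"sse", "stdio", "streamable-http"}
API_SPEC_TYPES = {"openapi_3", "swagger_2", "graphql_schema"}

def discovery_problems(discovery: dict) -> list:
    """Return human-readable problems; an empty list means every entry looks well-formed."""
    problems = []
    checks = (
        ("mcp_servers", "transport", MCP_TRANSPORTS),
        ("api_specs", "type", API_SPEC_TYPES),
    )
    for list_key, enum_key, allowed in checks:
        for i, entry in enumerate(discovery.get(list_key, [])):
            if not entry.get("url"):
                problems.append(f"{list_key}[{i}]: missing url")
            if entry.get(enum_key) not in allowed:
                problems.append(
                    f"{list_key}[{i}]: invalid {enum_key} {entry.get(enum_key)!r}"
                )
    return problems
```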

Examples

Complete, validated examples for common use cases:

Use Case | File | Key Features
SaaS API | saas-api.json | Full API with OAuth, rate limits, conditional purchases, sandboxed code execution
E-Commerce | ecommerce.json | Purchase limits, product scraping with conditions, multi-jurisdiction
Social Platform | social-platform.json | AI disclosure requirements, DM opt-in, frequency limits per permission
Open/Public API | open-api.json | Minimal restrictions, high rate limits, no auth required
Healthcare (HIPAA) | healthcare.json | HIPAA-scope fields, BAA requirement, extensions namespace, mTLS auth

Load any example directly in the Validator to explore it interactively.

Adoption Guide

Step 1: Create your openterms.json

Start with the Quick Start template. Add permissions that match your service's terms of service. Be explicit — false is better than omitting a permission.

Step 2: Host it

Place the file at https://yourdomain.com/openterms.json — the standard discovery path. Alternatively, reference it from your existing robots.txt:

robots.txt
# AI Agent Terms
OpenTerms: https://yourdomain.com/openterms.json
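
An agent that falls back to robots.txt could locate the directive as sketched below. This assumes the OpenTerms line follows ordinary robots.txt conventions (one "Key: value" pair per line, "#" starts a comment, key matched case-insensitively); those parsing rules are assumptions, and find_openterms_url is a hypothetical name.

```python
from typing import Optional

def find_openterms_url(robots_txt: str) -> Optional[str]:
    """Return the URL from the first `OpenTerms:` line in a robots.txt body, if any."""
    for raw_line in robots_txt.splitlines():
        line = raw_line.split("#", 1)[0].strip()   # drop comments and whitespace
        key, sep, value = line.partition(":")      # split at the first colon only
        if sep and key.strip().lower() == "openterms":
            return value.strip() or None
    return None

robots = "# AI Agent Terms\nOpenTerms: https://yourdomain.com/openterms.json\n"
print(find_openterms_url(robots))
```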

Step 3: Validate

Use the interactive validator or the programmatic API:

curl -X POST https://openterms.com/api/validate \
  -H "Content-Type: application/json" \
  -d '{"content": <your openterms.json>}'

Step 4: Keep it updated

Update last_updated whenever you change terms. Set expires to force agents to re-fetch periodically.
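
On the agent side, honoring expires amounts to a date comparison against a cached copy. The sketch below assumes expires uses the same YYYY-MM-DD form as last_updated in the Quick Start template; needs_refetch is a hypothetical name.

```python
from datetime import date
from typing import Optional

def needs_refetch(terms: dict, today: Optional[date] = None) -> bool:
    """True when the cached terms declare `expires` and that date has passed."""
    expires = terms.get("expires")
    if expires is None:
        return False                      # no expiry declared
    today = today or date.today()
    # Assumption: `expires` is an ISO 8601 calendar date such as "2025-06-01".
    return today > date.fromisoformat(expires)
```

Agents with no expires value to go on still need their own cache lifetime; this check only covers the declared case.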

Framework Integrations

Add OpenTerms permission checks to any agent framework using a single HTTP call. All examples below use GET /api/v1/check directly — no SDK required.

Public alpha note: During the public alpha, most domains don't yet have an openterms.json file. The API returns result: "no_openterms_json" for those domains — this is the expected response, not an error. See the handling section below.

curl — one-liner check

The simplest integration: a single HTTP GET before any automated action.

shell
$ curl -s "https://openterms.com/api/v1/check?domain=example.com&action=scrape_data"
{
  "success": true,
  "domain": "example.com",
  "action": "scrape_data",
  "result": "no_openterms_json",
  "checked_at": "2026-05-08T14:00:00.000Z"
}

During public alpha, result: "no_openterms_json" is the expected response for most domains. The domain simply hasn't published an openterms.json yet. See the README for current usage examples: openterms-py on GitHub.

Handling no_openterms_json

Your integration should handle three result states:

Result | Meaning | Recommended action
allowed | Domain has openterms.json; action is permitted | Proceed
denied | Domain has openterms.json; action is not permitted | Skip or surface to user
no_openterms_json | Domain hasn't published openterms.json yet (expected during public alpha) | Fall back to your own default policy

Python (requests)

Direct HTTP call — no third-party SDK needed:

agent_guard.py
import requests

def check_action(domain: str, action: str) -> str:
    """Returns 'allowed', 'denied', 'not_specified', or 'no_openterms_json'."""
    resp = requests.get(
        "https://openterms.com/api/v1/check",
        params={"domain": domain, "action": action},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    result = data.get("result", "no_openterms_json")
    if isinstance(result, dict):
        # Object-shaped result (see Permission Check API): the status string
        # is carried by "error" or "permission" instead.
        return data.get("error") or data.get("permission", "not_specified")
    return result

# Usage: works whether or not the domain has published openterms.json
status = check_action("example.com", "scrape_data")
if status == "allowed":
    print("Permitted.")
elif status == "denied":
    print("Action not permitted by site policy.")
else:
    # no_openterms_json, not_specified, or anything unexpected
    print("No explicit permission found. Apply your default policy.")

Node.js (fetch)

Works with Node 18+ native fetch or any HTTP client:

agentGuard.js
// Returns 'allowed', 'denied', or 'no_openterms_json'
async function checkAction(domain, action) {
  // URLSearchParams percent-encodes both values (action may be free text)
  const params = new URLSearchParams({ domain, action });
  const res = await fetch(`https://openterms.com/api/v1/check?${params}`);
  if (!res.ok) throw new Error(`OpenTerms check failed: ${res.status}`);
  const data = await res.json();
  return data.result ?? 'no_openterms_json';
}

// Usage
const status = await checkAction('example.com', 'scrape_data');
if (status === 'denied') {
  console.log('Action not permitted by site policy.');
} else if (status === 'no_openterms_json') {
  console.log('No openterms.json found — apply your default policy.');
}

LangChain / tool wrapper

Drop a permission check into any tool-calling framework as a pre-execution guard. The pattern below uses the HTTP API directly — substitute your framework's HTTP client:

openterms_tool.py
import requests

# Framework-agnostic guard — call this before executing any agent action
def openterms_guard(domain: str, action: str) -> dict:
    resp = requests.get(
        "https://openterms.com/api/v1/check",
        params={"domain": domain, "action": action},
        timeout=10,
    ).json()

    result = resp.get("result", "no_openterms_json")
    # During public alpha, no_openterms_json is normal — domain hasn't published yet
    return {
        "permitted": result == "allowed",
        "result": result,
        "domain": domain,
        "action": action,
    }

# Wire into your framework — example shows a generic tool-call pattern
# See the README for current usage examples with specific frameworks.

Framework function names: This section uses the direct HTTP API only. For framework-specific helper names (LangChain Tool class, CrewAI task wrappers, etc.), see the README for current usage examples.

CrewAI

crewai-openterms is an independent community package that provides CrewAI-compatible tools for the OpenTerms permission check API.

Install the package:

shell
$ pip install crewai-openterms

args_schema requirement: CrewAI tools must declare a Pydantic args_schema so the framework can validate inputs before invoking the tool. crewai-openterms ships with schemas pre-defined — pass your domain and action arguments to the tool call and the schema validation is handled automatically.

crewai_example.py
from crewai import Agent, Task, Crew
from crewai_openterms import OpenTermsCheckTool

# Instantiate the permission-check tool
check_tool = OpenTermsCheckTool()

# Wire it into a CrewAI agent as a pre-action guard
agent = Agent(
    role="Web Research Agent",
    goal="Check site permissions before scraping",
    backstory="Respects publisher terms before acting.",
    tools=[check_tool],
    verbose=True,
)

task = Task(
    description="Check whether scrape_data is permitted for example.com",
    expected_output="Permission result from OpenTerms API",
    agent=agent,
)

# The tool calls GET https://openterms.com/api/v1/check internally
crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff(
    inputs={"domain": "example.com", "action": "scrape_data"}
)

Registry records, not rulings: Returned permission values are registry records derived from machine-readable files. Treat them as one structured input to agent decision logic, not a substitute for review in regulated contexts.

PyPI: pypi.org/project/crewai-openterms — GitHub: github.com/jstibal/crewai-openterms

LangChain

langchain-openterms provides permission-aware tools for LangChain agents. Install the package:

shell
$ pip install langchain-openterms

# With openterms-py SDK (recommended — requires openterms-py>=0.3.1):
$ pip install "langchain-openterms[sdk]"

Three integration patterns ship in the package:

  • OpenTermsGuard — wraps any LangChain tool; blocks the wrapped tool unless the check returns allow
  • OpenTermsChecker — standalone tool an agent can call to check permissions
  • OpenTermsCallbackHandler — passive observer; logs permission checks without blocking (monitoring only)

langchain_guard_example.py
from langchain_openterms import OpenTermsGuard

# Wrap any LangChain tool (fail-closed by default).
# `search` must be a tool you have already constructed, for example:
# from langchain_community.tools import BraveSearch
# search = BraveSearch.from_api_key(api_key="...", search_kwargs={"count": 3})
guarded_search = OpenTermsGuard(
    tool=search,
    action="read_content",
)

result = guarded_search.invoke("https://example.com/pricing")
if "blocked" in result.lower():
    print("Cannot proceed:", result)
else:
    print("Allowed:", result)

PyPI: pypi.org/project/langchain-openterms — GitHub: github.com/jstibal/langchain-openterms

For a full guide covering all three packages, integration selection guidance, and the security model, see the SDK & Integrations page.

Permission Check API

The Permission Check API (GET /api/v1/check) returns whether a domain's openterms.json permits a specific agent action. Use it as a guard before performing any automated operation.

Endpoint

Request
GET https://openterms.com/api/v1/check?domain=example.com&action=scrape_data

Parameter | Type | Description
domain | string | Required. Domain to check (e.g. stripe.com). URLs are stripped to hostname automatically.
action | string | Required. Action to check. Can be an exact permission key or a free-text action that will be semantically mapped. Examples: scrape_data, api_access, allow_training, scrape_pricing.
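
Although the API strips URLs to a hostname on its side, a client can normalize first so that logs and cache keys use the same form. The sketch below is one plausible client-side equivalent, not a description of the server's behavior; to_hostname and build_check_url are hypothetical names.

```python
from urllib.parse import urlencode, urlsplit

CHECK_URL = "https://openterms.com/api/v1/check"

def to_hostname(domain_or_url: str) -> str:
    """Reduce 'https://Stripe.com/pricing' or 'stripe.com' to 'stripe.com'."""
    text = domain_or_url.strip()
    if "//" not in text:
        text = "//" + text            # lets urlsplit treat a bare domain as a netloc
    host = urlsplit(text).hostname    # lowercased, without port or credentials
    if not host:
        raise ValueError(f"no hostname in {domain_or_url!r}")
    return host

def build_check_url(domain_or_url: str, action: str) -> str:
    """Build the check URL with both parameters percent-encoded."""
    query = urlencode({"domain": to_hostname(domain_or_url), "action": action})
    return f"{CHECK_URL}?{query}"

print(build_check_url("https://stripe.com/pricing", "scrape_data"))
```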

Response

A successful response:

{
  "success": true,
  "domain": "example.com",
  "action": "scrape_data",
  "result": {
    "value": false,
    "confidence": "medium",
    "caveats": []
  },
  "permission": "denied",
  "confidence": 0.99,
  "policy_excerpt": "Explicitly denied.",
  "openterms_version": "0.3.0",
  "checked_at": "2026-04-24T09:00:00.000Z"
}

Response fields

Field | Type | Description
result | object | The permission result with field-level confidence metadata. Contains: value (true = allowed, false = denied, null = not specified), confidence ("high", "medium", or "low"), and caveats (array of known failure mode strings, empty for high/medium fields).
permission | string | One of: allowed, denied, not_specified. Mirrors result.value as a string for convenience.
confidence | number | Match confidence score for this specific domain lookup (0.0–1.0). Exact permission key matches → 0.99. Semantic prefix matches → 0.7–0.8. Lower when the domain's openterms.json doesn't list the permission. Distinct from result.confidence, which is the static field-level accuracy tier.
policy_excerpt | string | Short human-readable explanation of the decision, drawn from the domain's openterms.json.
resolved_permission | string | Present when the action was semantically mapped (not an exact match). Shows the matched permission key.

Per-field confidence levels

The result.confidence value reflects empirical accuracy from controlled measurement (Haiku 4.5, Tests 7-18 baseline, 4-domain sample, external LLM judges). Three tiers:

Tier | Threshold | Meaning
high | 95%+ | Verified against independent external LLM judgment at 95% or higher accuracy. Safe to use for automated decision-making.
medium | 80–94% | Verified at 80–94% accuracy. Appropriate for automated decision-making with the understanding that edge cases exist. Spot-checking recommended for high-stakes use cases.
low | <80% | Verified at below 80% accuracy. Should not be used as sole input for automated decisions. Human review recommended for high-stakes use cases.

Permission Field | Accuracy | Confidence | Caveats
read_content | 100% | high |
post_content | 100% | high |
scrape_data | 88% | medium |
create_account | 88% | medium |
make_purchases | 88% | medium |
api_access | 75% | low | Below 80% accuracy threshold. Human review recommended for high-stakes decisions.
allow_training | ~50% | low | Explicit training prohibitions may be missed when stated as exclusive-channel restrictions (e.g., 'only via our API'). Known failure cases: instagram.com, deepgram.com.

Extended disclosure: The allow_training field has approximately 50% accuracy in empirical testing. This field is particularly prone to false negatives — platforms that prohibit AI training may not be detected if their terms use indirect language (e.g., exclusive-channel restrictions rather than explicit training prohibitions). Known failure cases include instagram.com and deepgram.com. This field should not be used as the sole input for automated decisions. Human review is required.

Note on confidence methodology: Confidence levels are based on empirical measurement against external LLM judges across a 4-domain sample (Tests 7-18). These are v1 accuracy estimates — they will be recalibrated when scale validation completes. A field's confidence level is constant across all domains; it reflects the generator's general accuracy for that field, not the quality of any one domain's openterms.json file.
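
The tier thresholds and the per-field table above can be turned into a simple decision gate. This is a sketch under stated assumptions: the function names and the "proceed" / "deny" / "escalate" labels are hypothetical, accuracies between 94% and 95% are treated as medium, and the policy shown (deny on false or null, escalate on an allowed result for a low-confidence field) is one possible fail-closed reading of the guidance, not a rule defined by the spec.

```python
from typing import Optional

# Field-level tiers copied from the per-field table.
FIELD_TIERS = {
    "read_content": "high",
    "post_content": "high",
    "scrape_data": "medium",
    "create_account": "medium",
    "make_purchases": "medium",
    "api_access": "low",
    "allow_training": "low",
}


def tier_for_accuracy(accuracy: float) -> str:
    """Map a measured accuracy (0.0 to 1.0) to a tier using the documented thresholds."""
    if accuracy >= 0.95:
        return "high"
    if accuracy >= 0.80:
        return "medium"
    return "low"


def decide(action: str, value: Optional[bool]) -> str:
    """Return "proceed", "deny", or "escalate" (hypothetical labels)."""
    if value is not True:
        # False (denied) and None (not specified) both fail closed.
        return "deny"
    tier = FIELD_TIERS.get(action, "low")  # unknown fields are treated as low
    if tier == "low":
        # An "allowed" result on a low-confidence field goes to a human.
        return "escalate"
    return "proceed"


print(tier_for_accuracy(0.88))
print(decide("read_content", True))
print(decide("allow_training", True))
```

Because the tier is a property of the field rather than of the domain, the lookup table can live in client code and does not need to be refetched per request; it would need updating if the published tiers are recalibrated.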

Code examples

curl — check if scraping is allowed

$ curl "https://openterms.com/api/v1/check?domain=github.com&action=scrape_data"

Python — permission check with caveat handling

permission_check.py
import requests
from typing import Optional

def check_permission(domain: str, action: str) -> Optional[bool]:
    """Return True (allowed), False (denied), or None (not specified)."""
    resp = requests.get(
        "https://openterms.com/api/v1/check",
        params={"domain": domain, "action": action},
        timeout=30,
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    data = resp.json()
    result = data.get("result")
    if not isinstance(result, dict):
        # No structured result in the response: treat as not specified
        return None

    # Surface every caveat attached to this field
    for caveat in result.get("caveats", []):
        print(f"⚠️  {action} on {domain}: {caveat}")

    # Flag low-confidence fields for human review
    if result.get("confidence") == "low":
        print("❗ Manual verification recommended.")

    return result.get("value")  # True / False / None

# Usage
decision = check_permission("example.com", "allow_training")

Rate limits

Same limits as the public generator API: 100 requests/hour and 1,000 requests/day per IP address.
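
A client can stay under these limits by throttling itself before sending requests. The sketch below is a hypothetical client-side guard: the class name is illustrative, and it assumes sliding one-hour and 24-hour windows, which may differ from how the server actually counts requests.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard for the documented limits: 100/hour and 1,000/day."""

    def __init__(self, per_hour=100, per_day=1000, clock=time.monotonic):
        self.per_hour = per_hour
        self.per_day = per_day
        self.clock = clock    # injectable so the limiter can be tested without waiting
        self.sent = deque()   # timestamps of requests sent in the last 24 hours

    def try_acquire(self) -> bool:
        """Record a request and return True if both windows have room."""
        now = self.clock()
        # Drop timestamps that have aged out of the 24-hour window.
        while self.sent and now - self.sent[0] >= 86400:
            self.sent.popleft()
        in_last_hour = sum(1 for t in self.sent if now - t < 3600)
        if len(self.sent) >= self.per_day or in_last_hour >= self.per_hour:
            return False
        self.sent.append(now)
        return True


limiter = SlidingWindowLimiter()
if limiter.try_acquire():
    print("OK to call the check API")
else:
    print("Local limit reached; wait before retrying")
```

Since the limits are applied per IP address, agents that share an egress IP would need to share one limiter instance (or an equivalent external counter) for this guard to be meaningful.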

Bulk Download

Download the entire registry as a single ZIP file — 500+ openterms.json entries, organized by validation status. Ideal for bootstrapping local policy enforcement, training datasets, or offline analysis.

OTA-verified
Cross-referenced against Open Terms Archive ground truth data, a third-party legal document archive.

⬇ Download the full dataset
openterms.com/registry/download — live ZIP, generated fresh from the registry on every request.

ZIP Structure

openterms-registry-seed.zip
openterms-registry-seed/
├── validated/
│   ├── github-com.json
│   ├── stripe-com.json
│   └── ... (all validated entries)
├── unvalidated/
│   ├── example-com.json
│   └── ... (all unvalidated entries)
├── flagged/
│   └── ... (entries with data quality issues)
├── index.json     ← manifest: domain, category, validation_status, confidence
└── README.md      ← schema version, generated timestamp, usage

index.json Schema

The index.json manifest lists every entry with metadata, making it easy to filter locally without parsing each file:

index.json (excerpt)
{
  "generated_at": "2026-04-16T14:00:00.000Z",
  "schema_version": "0.3.1",
  "total": 511,
  "counts": { "validated": 53, "unvalidated": 458, "flagged": 0 },
  "entries": [
    {
      "domain": "github.com",
      "filename": "github-com.json",
      "category": "Developer Tools",
      "validation_status": "validated",
      "confidence": 0.9
    }
  ]
}
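
Filtering the manifest locally might look like the following. This is a minimal sketch: `select_entries` is a hypothetical helper, the first entry mirrors the excerpt above, and the second entry is invented purely to show the filter excluding something.

```python
def select_entries(index: dict, status: str = "validated", min_confidence: float = 0.0) -> list:
    """Return manifest entries with the given validation_status and minimum confidence."""
    return [
        entry
        for entry in index.get("entries", [])
        if entry.get("validation_status") == status
        and entry.get("confidence", 0.0) >= min_confidence
    ]


index = {
    "entries": [
        {
            "domain": "github.com",
            "filename": "github-com.json",
            "category": "Developer Tools",
            "validation_status": "validated",
            "confidence": 0.9,
        },
        # Hypothetical entry, included only to demonstrate filtering.
        {
            "domain": "example.com",
            "filename": "example-com.json",
            "category": "Other",
            "validation_status": "unvalidated",
            "confidence": 0.4,
        },
    ]
}

for entry in select_entries(index, min_confidence=0.8):
    print(entry["domain"], entry["filename"])
```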

Use Cases

  • Local policy enforcement — embed the dataset in your agent runtime for zero-latency permission checks
  • Offline environments — air-gapped or latency-sensitive deployments that can't call the live API
  • Training data — structured ToS signals for fine-tuning AI models
  • Policy snapshots — capture the registry state as it stood at a specific point in time
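
For the local and offline use cases, a lookup against the downloaded archive could be sketched as below. Assumptions are labeled in the comments: the top-level folder name is taken from the ZIP structure shown earlier, each entry is assumed to live in the folder named after its validation_status, and `load_policy` and `build_demo_archive` are hypothetical helpers. The demo archive contains placeholder data, not real registry content.

```python
import io
import json
import zipfile
from typing import Optional

ROOT = "openterms-registry-seed"  # top-level folder shown in the ZIP structure


def load_policy(archive: zipfile.ZipFile, domain: str) -> Optional[dict]:
    """Return the parsed openterms.json for a domain, or None if it is not indexed."""
    index = json.loads(archive.read(f"{ROOT}/index.json"))
    for entry in index.get("entries", []):
        if entry.get("domain") == domain:
            # Assumption: the folder name equals the entry's validation_status.
            path = f"{ROOT}/{entry['validation_status']}/{entry['filename']}"
            return json.loads(archive.read(path))
    return None


def build_demo_archive() -> zipfile.ZipFile:
    """Build a tiny in-memory stand-in for the real download (placeholder data)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        index = {"entries": [{
            "domain": "github.com",
            "filename": "github-com.json",
            "validation_status": "validated",
        }]}
        zf.writestr(f"{ROOT}/index.json", json.dumps(index))
        zf.writestr(f"{ROOT}/validated/github-com.json", json.dumps({"placeholder": True}))
    buf.seek(0)
    return zipfile.ZipFile(buf)


demo = build_demo_archive()
print(load_policy(demo, "github.com"))
print(load_policy(demo, "unknown.example"))
```

With a real download, `zipfile.ZipFile("openterms-registry-seed.zip")` would replace the demo archive. Returning None for unindexed domains leaves the fail-closed decision to the caller.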

Open Receipt Specification — Companion Spec (External / Future context)

The Open Receipt Specification is an external companion spec, not part of the OpenTerms runtime. It describes a pattern for generating structured records when AI agents acknowledge policies before acting.

Current-product scope: The OpenTerms public alpha provides machine-readable permission data via GET /api/v1/check. The Open Receipt Specification is referenced here as future/external context. No such infrastructure is deployed or required for current-product usage.

OpenTerms and the Open Receipt Specification describe complementary layers:

  1. OpenTerms — declares what agents are permitted to do (the policy)
  2. Open Receipt Specification — describes a record structure for when an agent acknowledged that policy (an external companion spec)
  3. The policy_id field in openterms.json is the linking identifier if you implement both

Internal Logging

What to log

When an agent calls the Permission Check API, consider logging the result in your own internal systems:

  • Domain — the domain whose permissions were checked
  • Action — the specific permission requested (e.g., scrape_data, api_access)
  • Result — allowed, denied, or no_openterms_json
  • Timestamp — ISO 8601, when the check was performed
  • Source — whether the record came from the registry, a live fetch, or a cached entry

These internal log records document what your agent checked before acting. What records are required for your use case is a decision for your team.

Implementation

Python — log permission checks to internal log
import requests
import json
from datetime import datetime, timezone

def check_and_log(domain: str, action: str) -> str:
    """Check permission and log the result. Returns result string."""
    resp = requests.get(
        "https://openterms.com/api/v1/check",
        params={"domain": domain, "action": action},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
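    # `result` is documented as an object; `permission` mirrors result.value as a string.
    raw = data.get("result", "no_openterms_json")
    result = data.get("permission", "not_specified") if isinstance(raw, dict) else raw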

    # Build an internal log record
    record = {
        "domain": domain,
        "action": action,
        "result": result,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }
    with open("permission_checks.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    return result

# Usage
result = check_and_log("example.com", "api_access")
if result == "denied":
    print("Action not permitted by site policy.")
elif result == "no_openterms_json":
    print("No openterms.json found — apply your default policy.")

OpenTerms provides machine-readable permission data as one input to agent operation and review processes. It does not provide legal advice.

Training Semantics: A, B, and C

The word "training" in AI terms of service is ambiguous. It can refer to three distinct scenarios with different legal and operational implications. OpenTerms separates them precisely.

Semantics A — Third-party training on site content

Definition: Can external parties (AI companies, crawlers, developers) use the site's published content to train machine learning models?

This is what allow_training captures in v0.3.1. The service is the content publisher; the question is whether it grants third parties a training license.

v0.3.1 coverage: allow_training — fully addressed.

Semantics B — Service internal use of user data

Definition: Does the service itself use user-submitted data — including profile data, usage patterns, messages, or transactions — to improve its own products, train internal models, or enhance recommendations?

This is a data processing relationship between the service and its users. It is intentionally out of scope for openterms.json. The permissions protocol governs what third parties (including AI agents) may do; it does not govern the service's own data practices. Semantics B belongs in Privacy Policies and data processing agreements (DPAs), not in openterms.json.

v0.3.1 coverage: Intentionally excluded. See service Privacy Policy.

Semantics C — Site training on agent-submitted data (proposed for v0.4.0)

Definition: Can the site train AI/ML models on data that an agent submits to it — including form inputs, API payloads, file uploads, chat messages, and structured data submissions?

This is the reverse direction from Semantics A. Instead of asking "can I train on your content?", the agent asks "will you train on my content when I submit it to you?" This is specifically relevant to enterprise agent deployments where agents submit proprietary or sensitive data to SaaS platforms.

Semantics C is a known gap in v0.3.1. A site may simultaneously prohibit third-party training on its content (allow_training: false) while reserving the right to train on submitted inputs. These are legally distinct and must not be conflated.

v0.4.0 proposal: allow_training_on_submissions field — see planning document.

Summary

| Semantic | Question | Protocol Coverage |
| --- | --- | --- |
| A — Third-party on site content | Can external parties train on the site's published content? | allow_training — v0.3.1 ✓ |
| B — Service internal use | Does the service train on its users' data internally? | Privacy Policy scope — intentionally out of scope |
| C — Site on agent submissions | Does the site train on data agents submit to it? | allow_training_on_submissions — v0.4.0 proposal |
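
The distinction between Semantics A and C can be made concrete with a small check. This sketch is illustrative only: `allow_training_on_submissions` is a v0.4.0 proposal rather than a published field, the flat `permissions` dict and both function names are hypothetical, and the sketch assumes that a value of false would mean the site does not train on submitted data.

```python
def training_exposure(permissions: dict) -> dict:
    """Report Semantics A and C separately; a missing field stays None (unknown)."""
    return {
        # Semantics A: may third parties train on the site's published content?
        "third_party_on_site_content": permissions.get("allow_training"),
        # Semantics C (proposed): may the site train on what agents submit?
        "site_on_agent_submissions": permissions.get("allow_training_on_submissions"),
    }


def safe_to_submit_sensitive_data(permissions: dict) -> bool:
    """Fail closed: only an explicit false on the Semantics C field counts as safe."""
    return permissions.get("allow_training_on_submissions") is False


# A site that prohibits third-party training says nothing about its own use of inputs.
site = {"allow_training": False}
print(training_exposure(site))
print(safe_to_submit_sensitive_data(site))
```

The second function never consults `allow_training`, which is the point: a prohibition under Semantics A carries no information about Semantics C.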

CI/CD Validation

Validate your openterms.json in CI/CD pipelines using the API endpoint:

GitHub Actions
- name: Validate openterms.json
  run: |
    RESULT=$(curl -s -X POST https://openterms.com/api/validate \
      -H "Content-Type: application/json" \
      -d "{\"content\": $(cat openterms.json)}")
    echo "$RESULT" | jq .
    VALID=$(echo "$RESULT" | jq -r '.valid')
    if [ "$VALID" != "true" ]; then
      echo "openterms.json validation failed!"
      exit 1
    fi

npm script
// package.json
{
  "scripts": {
    "validate:terms": "jq -c '{content: .}' openterms.json | curl -sf -X POST https://openterms.com/api/validate -H 'Content-Type: application/json' -d @- | jq -e '.valid'"
  }
}
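
The same validation call can be made from a Python build step. This is a sketch using only the standard library: the endpoint, the `{"content": ...}` payload shape, and the `valid` response field are taken from the examples above, while the function names are hypothetical and any other response fields are not assumed.

```python
import json
import urllib.request

VALIDATE_URL = "https://openterms.com/api/validate"


def build_payload(path: str) -> bytes:
    """Wrap the file's JSON in the {"content": ...} envelope used by the examples above."""
    with open(path, encoding="utf-8") as f:
        content = json.load(f)  # fails fast on malformed JSON, before any network call
    return json.dumps({"content": content}).encode("utf-8")


def validate(path: str, url: str = VALIDATE_URL) -> bool:
    """POST the file to the validation endpoint and return True only if valid is true."""
    request = urllib.request.Request(
        url,
        data=build_payload(path),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response).get("valid") is True


# Example use in a build script:
#   import sys
#   sys.exit(0 if validate("openterms.json") else 1)
```

Parsing the file locally first means a syntax error is reported by the build itself rather than as an opaque API failure.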