v0.3.0 — Current

OpenTerms Specification

Machine-readable legal terms for the agentic web. Define what AI agents can and can't do on your service — in a format they can actually parse.

OpenTerms is a JSON-based protocol that lets services declare their terms of service, permissions, rate limits, and usage constraints in a structured format. Think of it as robots.txt for AI agent behavior.

Since v0.2.0: JSON-LD support (@context), policy identifiers (policy_id), data residency, permission scoping/frequency, and an extensions namespace.

New in v0.3.0: the discovery object, which provides machine-readable pointers to MCP servers and OpenAPI specs, and the formalized extensions.com.openterms.meta namespace for generator and provenance metadata.

For Agents

OpenTerms is designed to be queried by agents at runtime — before acting on a third-party domain. This section covers everything an agent needs: what to check, how to interpret results, and when to escalate.

Primary endpoint: GET https://openterms.com/api/v1/check?domain={domain}&action={action}

Seven canonical actions: read_content, scrape_data, api_access, create_account, make_purchases, post_content, allow_training.

What to check before acting

Fetch permissions — call the check API with the target domain and the specific action you intend to take.
Interpret the result — true means allowed, false means denied, null means not specified. Conditional objects mean allowed or denied only under stated conditions.
Apply fail-closed logic for null and low confidence — if the result is null or confidence is low, treat as denied unless your policy explicitly permits proceeding with unknowns.
Log the check — record domain, action, decision, confidence, and timestamp in your own internal logs.
Escalate when necessary — for conditional results, low-confidence results, or high-stakes actions, surface the check result to a human rather than auto-deciding.

API example

The API checks whether a domain publishes an openterms.json file. During the alpha, most domains have not yet adopted the standard, so the typical response is no_openterms_json:

curl — domain without openterms.json (most domains today)

GET https://openterms.com/api/v1/check?domain=github.com&action=read_content

{
  "success": true,
  "domain": "github.com",
  "action": "read_content",
  "result": { "value": null, "confidence": null, "caveats": [] },
  "permission": "not_specified",
  "confidence": 0,
  "error": "no_openterms_json",
  "message": "No openterms.json found at github.com."
}

This is a correct, handled response — not an error. It means the domain has not published machine-readable permissions. Agents should treat this as unresolved and fall back to other information sources.

When a domain does publish openterms.json, the response includes structured permission data:

curl — domain with openterms.json (after adoption)

GET https://openterms.com/api/v1/check?domain=example.com&action=read_content

{
  "success": true,
  "domain": "example.com",
  "action": "read_content",
  "result": {
    "value": true,
    "confidence": "high",
    "caveats": []
  },
  "permission": "allowed",
  "confidence": 0.99,
  "policy_excerpt": "Explicitly allowed.",
  "checked_at": "2026-05-02T21:00:00.000Z"
}

Decision policy

Use this table to map any check result to an agent action:

Result value	Confidence	Agent action	Notes
`true`	high / medium	Proceed	Permission explicitly granted. Log the check and continue.
`true`	low	Escalate	Result may be inaccurate. Escalate to human or use conservative default.
`false`	any	Block	Permission explicitly denied. Do not proceed. Log the denial.
`null`	any	Escalate or Block	Not specified in the domain's openterms.json. Fail closed: treat as denied unless your policy allows proceeding with unknowns.
conditional object	high / medium	Check conditions	`allowed: true` with conditions — read the `conditions` field. Only proceed if conditions are met. `allowed: false` — block.
conditional object	low	Escalate	Low confidence on conditional result. Escalate to human review.

Fail-closed recommendation: When result.value is null or result.confidence is "low", treat the permission as denied by default. Proceeding on unknowns shifts responsibility to the agent operator and may violate terms that were simply not indexed.
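
The decision table can be expressed as a small pure function. The sketch below is one possible reading of the table, not an official client: the function name `decide`, the returned action strings, and the `null_action` parameter are illustrative assumptions.

```python
from typing import Any, Optional

def decide(value: Any, tier: Optional[str], null_action: str = "block") -> str:
    """Map result.value and result.confidence to an agent action.

    The action names returned here are hypothetical labels for this sketch.
    """
    if value is False:
        return "block"            # denied at any confidence
    if value is None:
        return null_action        # fail closed unless the caller overrides
    if tier not in ("high", "medium"):
        return "escalate"         # low or missing confidence
    if value is True:
        return "proceed"
    if isinstance(value, dict):
        # Conditional object: allowed=false blocks, allowed=true still
        # requires the caller to verify the stated conditions.
        return "check_conditions" if value.get("allowed") is True else "block"
    return "escalate"             # unexpected shape: leave it to a human

print(decide(True, "high"))
print(decide({"allowed": True, "conditions": "Must disclose AI authorship"}, "low"))
```

Passing `null_action="escalate"` models an operator policy that routes unknowns to a human instead of blocking outright.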

Logging guidance

Log every permission check. Minimum fields:

domain — the domain checked
action — the permission key (e.g. scrape_data)
decision — allowed, denied, or not_specified
confidence — numeric score (0.0–1.0)
checked_at — ISO 8601 timestamp from the response
agent_id — identifier for the agent or run performing the check
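
A minimal sketch of assembling such a record from a check response follows. The helper name `build_log_record` and the `run-42` identifier are hypothetical. The fallback to local UTC time is an assumption added because the no_openterms_json sample response shown earlier carries no checked_at field.

```python
import json
from datetime import datetime, timezone

def build_log_record(check: dict, agent_id: str) -> dict:
    """Extract the minimum logging fields from a check API response."""
    return {
        "domain": check.get("domain"),
        "action": check.get("action"),
        "decision": check.get("permission", "not_specified"),
        "confidence": check.get("confidence"),
        # Assumption: use local UTC time when the response has no checked_at
        "checked_at": check.get("checked_at")
        or datetime.now(timezone.utc).isoformat(),
        "agent_id": agent_id,
    }

# Sample response copied from the API example above
sample = {
    "success": True,
    "domain": "example.com",
    "action": "read_content",
    "result": {"value": True, "confidence": "high", "caveats": []},
    "permission": "allowed",
    "confidence": 0.99,
    "policy_excerpt": "Explicitly allowed.",
    "checked_at": "2026-05-02T21:00:00.000Z",
}

record = build_log_record(sample, agent_id="run-42")
print(json.dumps(record))
```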

Human escalation guidance

Escalate to a human when:

Result is null (not specified) and the action is high-stakes (purchases, account creation, content posting)
Confidence is low
Result is a conditional object and the conditions cannot be programmatically verified
The domain is newly indexed or unvalidated (validation_status: "unvalidated")

Copy/Paste Agent Instructions

Copy this block directly into your agent's system prompt or instructions file. Customize the action list to match your use case.

agent-instructions.txt

Before performing any action on a third-party domain, check OpenTerms to verify 
that the action is permitted by that domain's terms of service.

Check endpoint:
  GET https://openterms.com/api/v1/check?domain={domain}&action={action}

Canonical actions to check:
  read_content     — read publicly available content
  scrape_data      — bulk-download or scrape data from the domain
  api_access       — access the domain's API programmatically
  create_account   — create a user account on the domain
  make_purchases   — make purchases or financial transactions
  post_content     — post, publish, or submit content
  allow_training   — use the domain's published content for model training

Decision policy:
  result.value = true             → allowed. Proceed. Log the check.
  result.value = false            → denied. Block. Do not proceed.
  result.value = null             → not specified. Fail closed. Treat as denied.
  result.value = {object}         → conditional. Read the conditions field.
                                    Only proceed if allowed=true and conditions are met.
  result.confidence = "low"       → escalate to human regardless of value.

Logging (required for every check):
  Log: domain, action, decision, confidence, checked_at, agent_id

Human escalation (required when):
  - result.value is null for high-stakes actions (purchases, account creation, post_content)
  - result.confidence is "low"
  - result is a conditional object and conditions cannot be verified programmatically
  - The domain is unvalidated

OpenTerms is one input to your decision — it does not replace legal review for 
regulated use cases. When in doubt, escalate.

Common Agent Action Examples

Full check–interpret–decide flows for the most common agent operations.

Scrape a pricing page

Python

import requests

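# log_check, scrape_pricing_page, escalate_to_human, and PermissionDenied
# are application-defined helpers, not part of OpenTerms.

# 1. Check
resp = requests.get(
    "https://openterms.com/api/v1/check",
    params={"domain": "acme.com", "action": "scrape_data"},
    timeout=10,
)
resp.raise_for_status()
check = resp.json()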

# 2. Interpret
value = check["result"]["value"]
confidence = check["result"]["confidence"]

# 3. Decide
if value is True and confidence != "low":
    # Proceed
    log_check("acme.com", "scrape_data", "allowed")
    scrape_pricing_page("https://acme.com/pricing")
elif value is False:
    # Block
    log_check("acme.com", "scrape_data", "denied")
    raise PermissionDenied("scrape_data denied on acme.com")
else:
    # null, conditional object, or low confidence → escalate
    escalate_to_human("acme.com", "scrape_data", check)

Call an API

curl

$ curl "https://openterms.com/api/v1/check?domain=stripe.com&action=api_access"

# result.value = true, result.confidence = "low"
# Decision: escalate (api_access has 75% measured accuracy, below the 80% threshold, so human review is recommended)

Create an account

Python

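# openterms_check is an application-defined helper that wraps
# GET /api/v1/check and returns the parsed JSON response.
check = openterms_check("example.com", "create_account")
value = check["result"]["value"]

if value is None:
    # null = not specified: fail closed for account creation
    escalate_to_human("create_account on example.com: not specified in openterms.json")
elif value is True:
    create_account("example.com")
elif value is False:
    raise PermissionDenied("create_account denied")
else:
    # Conditional object: the conditions need review before acting
    escalate_to_human("create_account on example.com: conditional, review conditions")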

Post content

Python

check = openterms_check("forum.example.com", "post_content")
result = check["result"]

if isinstance(result["value"], dict):
    # Conditional — check the conditions
    if result["value"]["allowed"]:
        print(result["value"]["conditions"])  # e.g. "Must disclose AI authorship"
        escalate_for_human_confirmation()
    else:
        raise PermissionDenied("post_content denied under conditions")
elif result["value"] is True:
    post_content("forum.example.com", content)
else:
    raise PermissionDenied()

Use content for model training

Python

check = openterms_check("dataset.example.com", "allow_training")
result = check["result"]

# allow_training has ~50% accuracy — always escalate regardless of value
if result["confidence"] == "low":
    escalate_to_human(
        "allow_training has low confidence accuracy — human review required"
    )
elif result["value"] is True:
    add_to_training_dataset("dataset.example.com")
elif result["value"] is False:
    raise PermissionDenied("allow_training denied")
else:
    # null — fail closed for training data
    raise PermissionDenied("allow_training not specified — fail closed")

Quick Start

Create an openterms.json file and host it at the root of your domain:

openterms.json

{
  "$schema": "https://openterms.com/schema/openterms.schema.json",
  "openterms_version": "0.3.0",
  "service": {
    "name": "Your Service",
    "domain": "yourservice.com",
    "tos_url": "https://yourservice.com/terms"
  },
  "permissions": {
    "read_content": true,
    "scrape_data": false,
    "api_access": true,
    "create_account": false,
    "make_purchases": false,
    "post_content": false,
    "allow_training": null
  },
  "requires_consent": true,
  "jurisdiction": "US",
  "contact": "legal@yourservice.com",
  "last_updated": "2025-06-01"
}

That's it. AI agents fetch https://yourservice.com/openterms.json before taking any action, just like crawlers check robots.txt.

Pro tip: Add the $schema field to get auto-completion and inline validation in VS Code, JetBrains, and other editors that support JSON Schema.

How It Works

Services publish an openterms.json at their domain root (or any discoverable URL)
Agents query the Permission Check API before taking actions — permissions are structured data, not legalese
Agents act on the result — proceed, skip, or surface to a human depending on the returned permission value

Core Fields

Field	Type	Status	Description
`openterms_version`	`string`	Required	Spec version (e.g. `"0.3.0"`). Semver format.
`service`	`string \| object`	Required	Service info. Shorthand: `"acme.com"`. Full: object with `name`, `domain`, `tos_url`, `privacy_url`, `description`, `logo_url`.
`permissions`	`object`	Required	What agents can do. See Permissions section.
`$schema`	`string (URI)`	Optional	Self-referencing schema URI. Enables editor auto-completion.
`@context`	`string \| object`	Since v0.2.0	JSON-LD context for semantic web / linked data interoperability.
`policy_id`	`string`	Since v0.2.0	Globally unique identifier for this terms document. Used to reference a specific version of terms in external tooling.
`requires_consent`	`boolean`	Optional	Must the agent obtain explicit consent before acting?
`jurisdiction`	`string \| string[]`	Optional	ISO 3166-1/2 jurisdiction code(s). E.g. `"US-DE"`, `["US-CA", "EU"]`.
`contact`	`string \| object`	Optional	Legal contact. Shorthand: email string. Full: object with `email`, `name`, `url`.
`last_updated`	`string (date)`	Optional	ISO 8601 date when terms were last modified.
`expires`	`string (date)`	Optional	Date these terms expire. Agents should re-fetch after this date.
`discovery`	`object`	v0.3.0	Machine-readable pointers to MCP servers and API specs. See Discovery section.
`extensions`	`object`	Optional	Namespace for custom or industry-specific fields. Use reverse-domain keys. See Extensions.
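
Several fields in this table accept a shorthand form. A consumer will often want to normalize them before use, and the sketch below shows one way to do that. The function names are illustrative, and mapping the `service` string to the `domain` key is an assumption based on the `"acme.com"` shorthand example.

```python
def normalize_service(service):
    """Expand the string shorthand into the object form."""
    if isinstance(service, str):
        # Assumption: the shorthand string is the service's domain
        return {"domain": service}
    return dict(service)

def normalize_contact(contact):
    """Expand an email-string shorthand into the object form."""
    if contact is None:
        return None
    if isinstance(contact, str):
        return {"email": contact}
    return dict(contact)

def normalize_jurisdiction(jurisdiction):
    """Always return a list of jurisdiction codes."""
    if jurisdiction is None:
        return []
    if isinstance(jurisdiction, str):
        return [jurisdiction]
    return list(jurisdiction)

print(normalize_service("acme.com"))
print(normalize_contact("legal@yourservice.com"))
print(normalize_jurisdiction("US-DE"))
```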

Permissions

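The permissions object defines what AI agents can do. Each value is one of the following (a key may also be set to null, as in the Quick Start example, which agents treat as not specified):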

true — allowed unconditionally
false — denied
Conditional object — allowed with conditions

Standard Permissions

Permission	Description
`read_content`	Read publicly available content
`scrape_data`	Scrape or bulk-download data
`api_access`	Access the service's API programmatically
`create_account`	Create user accounts programmatically
`make_purchases`	Make purchases or financial transactions
`post_content`	Post, publish, or submit content
`allow_training`	Whether external parties may train AI/ML models on the site's published content (Semantics A). Specifically covers third-party use of site-owned content for model training. Does not cover the service's internal use of user data for its own model improvement (Semantics B — Privacy Policy scope), nor whether the site trains on data submitted by agents (Semantics C — addressed in v0.4.0 proposal).

The active schema defines exactly seven canonical permission keys: read_content, scrape_data, api_access, create_account, make_purchases, post_content, allow_training. The schema uses additionalProperties: false — additional permission keys are not recognized by the current spec.
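
Because the key set is closed, a publisher can catch unrecognized keys before running full schema validation. The following is a minimal sketch. The helper name `unknown_permission_keys` and the `execute_code` key are hypothetical and used only for illustration.

```python
CANONICAL_PERMISSIONS = frozenset({
    "read_content",
    "scrape_data",
    "api_access",
    "create_account",
    "make_purchases",
    "post_content",
    "allow_training",
})

def unknown_permission_keys(permissions: dict) -> list:
    """Return the keys that are not among the seven canonical ones, sorted."""
    return sorted(set(permissions) - CANONICAL_PERMISSIONS)

# "execute_code" is a made-up key used to trigger the check
print(unknown_permission_keys({"read_content": True, "execute_code": True}))
```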

Conditional Permission Object

{
  "make_purchases": {
    "allowed": true,
    "conditions": "Max $500/day. Agent must be linked to verified human.",
    "requires_auth": true,
    "max_frequency": "50/day",
    "scope": "authenticated"
  }
}

Field	Type	Description
`allowed`	`boolean`	Required. Whether the permission is granted.
`conditions`	`string`	Human-readable conditions or restrictions.
`requires_auth`	`boolean`	Whether this permission requires authentication.
`max_frequency`	`string`	Rate limit for this specific action, e.g. `"10/hour"`, `"100/day"`. Since v0.2.0.
`scope`	`string`	What data subset this applies to, e.g. `"public"`, `"authenticated"`, `"premium"`. Since v0.2.0.
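
An agent that reads openterms.json directly has to handle both the boolean and the object form of a permission. The sketch below normalizes either into one shape. The function names are illustrative, and the `count/unit` grammar for `max_frequency` is inferred from the examples in this section rather than taken from a formal definition.

```python
import re
from typing import Tuple

def parse_max_frequency(text: str) -> Tuple[int, str]:
    """Split a value such as "50/day" into (50, "day")."""
    match = re.fullmatch(r"(\d+)/([a-z]+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized max_frequency: {text!r}")
    return int(match.group(1)), match.group(2)

def resolve_permission(permissions: dict, key: str) -> dict:
    """Normalize a boolean, conditional object, or missing key into one dict."""
    raw = permissions.get(key)
    if not isinstance(raw, dict):
        # Plain true/false, or None when the key is null or absent
        return {"allowed": raw, "conditions": None, "requires_auth": None,
                "max_frequency": None, "scope": None}
    frequency = raw.get("max_frequency")
    return {
        "allowed": raw.get("allowed"),
        "conditions": raw.get("conditions"),
        "requires_auth": raw.get("requires_auth"),
        "max_frequency": parse_max_frequency(frequency) if frequency else None,
        "scope": raw.get("scope"),
    }

example = {
    "make_purchases": {
        "allowed": True,
        "conditions": "Max $500/day. Agent must be linked to verified human.",
        "requires_auth": True,
        "max_frequency": "50/day",
        "scope": "authenticated",
    }
}
print(resolve_permission(example, "make_purchases"))
```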

Rate Limits

{
  "rate_limits": {
    "requests_per_minute": 60,
    "requests_per_hour": 1000,
    "requests_per_day": 10000,
    "concurrent_sessions": 5
  }
}

All fields are optional integers. concurrent_sessions (since v0.2.0) limits how many simultaneous agent connections are allowed.
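
One conservative way for an agent to honor all declared windows at once is to pace requests evenly at the slowest rate any window implies. This pacing rule is an illustrative assumption, not something the spec mandates, and `min_interval_seconds` is a hypothetical helper name.

```python
# Window length in seconds for each request-count field
WINDOW_SECONDS = {
    "requests_per_minute": 60,
    "requests_per_hour": 3600,
    "requests_per_day": 86400,
}

def min_interval_seconds(rate_limits: dict) -> float:
    """Smallest delay between requests that stays within every declared window."""
    intervals = [
        seconds / rate_limits[field]
        for field, seconds in WINDOW_SECONDS.items()
        if rate_limits.get(field)  # skip absent or zero limits
    ]
    return max(intervals, default=0.0)

limits = {
    "requests_per_minute": 60,
    "requests_per_hour": 1000,
    "requests_per_day": 10000,
    "concurrent_sessions": 5,
}
# Per minute: 1.0 s, per hour: 3.6 s, per day: 8.64 s, so the daily limit governs
print(min_interval_seconds(limits))
```

Even pacing gives up the short bursts that the per-minute limit would permit. An agent that needs bursts would track each window separately instead.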

Data Handling

{
  "data_handling": {
    "stores_agent_data": true,
    "shares_with_third_parties": false,
    "retention_days": 90,
    "gdpr_compliant": true,
    "ccpa_compliant": true,
    "hipaa_compliant": false,
    "data_residency": ["US", "EU"]
  }
}

Field	Type	Description
`stores_agent_data`	`boolean`	Stores data about agent interactions?
`shares_with_third_parties`	`boolean`	Shares agent data with third parties?
`retention_days`	`integer`	Days data is retained. 0 = no retention.
`gdpr_compliant`	`boolean`	GDPR compliant?
`ccpa_compliant`	`boolean`	CCPA compliant?
`hipaa_compliant`	`boolean`	HIPAA compliant? (New)
`data_residency`	`string \| string[]`	Where data is stored, as ISO codes. Since v0.2.0.

Authentication

{
  "authentication": {
    "required": true,
    "methods": ["api_key", "oauth2"],
    "registration_url": "https://acme.com/developers",
    "docs_url": "https://docs.acme.com/auth"
  }
}

Supported methods: api_key, oauth2, bearer_token, basic_auth, mTLS (new), none.

The docs_url field (since v0.2.0) — link directly to your auth documentation for faster agent onboarding.

Verification

Since v0.2.0. Optional verification metadata — digital signatures, policy hashes, and JWKS endpoints — for service operators who want to publish a signed, versioned policy document.

{
  "verification": {
    "jwks_url": "https://acme.com/.well-known/jwks.json",
    "signing_algorithm": "Ed25519",
    "policy_hash": "a1b2c3d4e5f6..."
  }
}

Field	Type	Description
`jwks_url`	`string (URI)`	URL to your JWKS endpoint for verifying signed policy documents.
`signing_algorithm`	`string`	One of: `Ed25519`, `RS256`, `ES256`.
`policy_hash`	`string`	SHA-256 hash of the canonical terms document. 64 hex chars.
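
The table does not define how the canonical terms document is serialized, so the following is only a sketch under stated assumptions: the document is serialized as compact JSON with sorted keys, UTF-8 encoded, and the `verification` block is left out so the hash does not cover itself. The function name `policy_hash` is illustrative. Check the schema or validator for the actual canonicalization rules before relying on this.

```python
import hashlib
import json

def policy_hash(document: dict) -> str:
    """SHA-256 over an assumed canonical form; returns 64 hex characters."""
    # Assumption: the verification block is excluded from the hashed content
    body = {key: value for key, value in document.items() if key != "verification"}
    # Assumption: canonical form = sorted keys, no insignificant whitespace, UTF-8
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

doc = {
    "openterms_version": "0.3.0",
    "service": "acme.com",
    "permissions": {"read_content": True},
}
digest = policy_hash(doc)
print(len(digest))
```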

The policy_id field in openterms.json provides a stable identifier for this terms document. It can be referenced externally by companion tooling, but is not required for the core permission check workflow.

Extensions

The extensions object is a namespace for custom or industry-specific fields. Use reverse-domain notation to avoid conflicts:

{
  "extensions": {
    "health.hipaa.baa_required": true,
    "health.hipaa.audit_log_url": "https://acme.com/api/audit",
    "com.acme.internal_tier": "enterprise",
    "org.fintech.pci_dss_level": 1
  }
}

Extensions are free-form — any JSON value is accepted. This keeps the core schema stable while allowing domain-specific needs.

com.openterms.meta (v0.3.0)

extensions.com.openterms.meta is the first official OpenTerms namespace. It records provenance — how and where this file was created. This namespace is placed inside extensions rather than at root level because the root schema uses additionalProperties: false to ensure forward compatibility and strict validation.

Why not a top-level field? The root schema is intentionally closed. New root fields require a spec version bump and breaking changes. The extensions namespace lets tools add structured metadata without modifying the core spec contract.

{
  "extensions": {
    "com.openterms.meta": {
      "source": "self",
      "generator": "openterms.com/v0.3.0"
    }
  }
}

Field	Type	Description
`source`	`string`	Origin of this file. Typically `"self"` (written by the domain owner) or `"openterms.com"` (auto-generated).
`generator`	`string`	Tool or service that generated this file, in `domain/version` format. E.g. `"openterms.com/v0.3.0"`.
`generated_at`	`string (datetime)`	ISO 8601 timestamp of when this file was generated. E.g. `"2025-06-01T12:00:00Z"`.

The validator displays a note when com.openterms.meta is present, indicating the file was auto-generated and showing the generator version.

Discovery (v0.3.0)

The discovery object positions openterms.json as both the legal permissions layer and the technical discovery entry point for a domain's agent-facing infrastructure. It provides machine-readable signposts to existing technical resources — MCP servers, OpenAPI specs — that an agent can connect to directly.

Discovery does not describe what those servers do. It points to them. OpenTerms doesn't duplicate what MCP manifests or OpenAPI specs already define. It simply says: "these endpoints exist and are permitted."

Field	Type	Description
`mcp_servers`	`array`	List of MCP (Model Context Protocol) server endpoints. Each entry has `url`, `transport`, and optional `description`.
`api_specs`	`array`	List of API specification documents. Each entry has `url`, `type`, and optional `description`.

MCP Server entry fields

Field	Required	Values	Description
`url`	Required	`string (URI)`	URL of the MCP server endpoint.
`transport`	Required	`"sse" \| "stdio" \| "streamable-http"`	Transport protocol used by this MCP server.
`description`	Optional	`string`	Human-readable summary of what this server provides.

API Spec entry fields

Field	Required	Values	Description
`url`	Required	`string (URI)`	URL to the API specification document.
`type`	Required	`"openapi_3" \| "swagger_2" \| "graphql_schema"`	The specification format.
`description`	Optional	`string`	Human-readable description of this API spec.
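
Before connecting to anything listed under discovery, an agent can check each entry against the required fields and allowed values in the tables above. The sketch below does this in plain Python. The function name `discovery_errors` and the error message wording are illustrative.

```python
MCP_TRANSPORTS = {"sse", "stdio", "streamable-http"}
API_SPEC_TYPES = {"openapi_3", "swagger_2", "graphql_schema"}

def discovery_errors(discovery: dict) -> list:
    """Return a list of human-readable problems; empty when entries look valid."""
    errors = []
    for i, entry in enumerate(discovery.get("mcp_servers", [])):
        if not entry.get("url"):
            errors.append(f"mcp_servers[{i}]: missing url")
        if entry.get("transport") not in MCP_TRANSPORTS:
            errors.append(f"mcp_servers[{i}]: invalid transport {entry.get('transport')!r}")
    for i, entry in enumerate(discovery.get("api_specs", [])):
        if not entry.get("url"):
            errors.append(f"api_specs[{i}]: missing url")
        if entry.get("type") not in API_SPEC_TYPES:
            errors.append(f"api_specs[{i}]: invalid type {entry.get('type')!r}")
    return errors

discovery = {
    "mcp_servers": [{"url": "https://acme-corp.com/mcp/sse", "transport": "sse"}],
    "api_specs": [{"url": "https://api.acme-corp.com/v1/openapi.json", "type": "openapi_3"}],
}
print(discovery_errors(discovery))
```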

Complete v0.3.0 Example

A full openterms.json showing both permissions and discovery populated:

openterms.json (v0.3.0)

{
  "$schema": "https://openterms.com/schema/openterms.schema.json",
  "openterms_version": "0.3.0",
  "service": "acme-corp.com",
  "permissions": {
    "read_content": true,
    "scrape_data": false,
    "api_access": {
      "allowed": true,
      "requires_auth": true,
      "max_frequency": "1000/hour"
    }
  },
  "discovery": {
    "mcp_servers": [
      {
        "url": "https://acme-corp.com/mcp/sse",
        "transport": "sse",
        "description": "Provides tools for checking order status and inventory."
      }
    ],
    "api_specs": [
      {
        "url": "https://api.acme-corp.com/v1/openapi.json",
        "type": "openapi_3",
        "description": "Full REST API for catalog and user management."
      }
    ]
  }
}

Discovery is a signpost, not a description layer. MCP servers already have manifests. OpenAPI specs already describe endpoints. The discovery field simply makes those resources findable via a single, standardized location — no duplicated documentation required.

Examples

Complete, validated examples for common use cases:

Use Case	File	Key Features
SaaS API	`saas-api.json`	Full API with OAuth, rate limits, conditional purchases, sandboxed code execution
E-Commerce	`ecommerce.json`	Purchase limits, product scraping with conditions, multi-jurisdiction
Social Platform	`social-platform.json`	AI disclosure requirements, DM opt-in, frequency limits per permission
Open/Public API	`open-api.json`	Minimal restrictions, high rate limits, no auth required
Healthcare (HIPAA)	`healthcare.json`	HIPAA-scope fields, BAA requirement, extensions namespace, mTLS auth

Load any example directly in the Validator to explore it interactively.

Adoption Guide

Step 1: Create your openterms.json

Start with the Quick Start template. Add permissions that match your service's terms of service. Be explicit — false is better than omitting a permission.

Step 2: Host it

Place the file at https://yourdomain.com/openterms.json — the standard discovery path. Alternatively, reference it from your existing robots.txt:

robots.txt

# AI Agent Terms
OpenTerms: https://yourdomain.com/openterms.json
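
An agent that does not find the file at the standard path could look for this directive in robots.txt. The sketch below shows one way to extract it. The function name `find_openterms_url` is hypothetical, and matching the field name case-insensitively is an assumption borrowed from how robots.txt field names are usually treated.

```python
from typing import Optional

def find_openterms_url(robots_txt: str) -> Optional[str]:
    """Return the URL from the first OpenTerms: line, or None if absent."""
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, value = line.partition(":")  # split at the first colon only
        if sep and key.strip().lower() == "openterms":
            return value.strip() or None
    return None

robots = """User-agent: *
Disallow: /private/

# AI Agent Terms
OpenTerms: https://yourdomain.com/openterms.json
"""
print(find_openterms_url(robots))
```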

Step 3: Validate

Use the interactive validator or the programmatic API:

curl -X POST https://openterms.com/api/validate \
  -H "Content-Type: application/json" \
  -d '{"content": <your openterms.json>}'

Step 4: Keep it updated

Update last_updated whenever you change terms. Set expires to force agents to re-fetch periodically.
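
On the agent side, a cached copy can be checked against expires before it is reused. The following is a minimal sketch. The helper name `needs_refetch` is illustrative, and treating a missing expires as "no forced re-fetch" is an assumption, since the cache lifetime in that case is left to the agent.

```python
from datetime import date
from typing import Optional

def needs_refetch(document: dict, today: Optional[date] = None) -> bool:
    """True when the cached document is past its declared expires date."""
    expires = document.get("expires")
    if expires is None:
        return False  # assumption: no declared expiry, caller's cache policy applies
    today = today or date.today()
    # Keep only the YYYY-MM-DD part in case a full timestamp was published
    return today > date.fromisoformat(expires[:10])

cached = {"last_updated": "2025-06-01", "expires": "2025-12-31"}
print(needs_refetch(cached, today=date(2026, 1, 1)))
```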

Framework Integrations

Add OpenTerms permission checks to any agent framework using a single HTTP call. All examples below use GET /api/v1/check directly — no SDK required.

Public alpha note: During the public alpha, most domains don't yet have an openterms.json file. For those domains the API returns error: "no_openterms_json" with permission: "not_specified". This is the expected response, not a failure. See the handling section below.

curl — one-liner check

The simplest integration: a single HTTP GET before any automated action.

shell

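$ curl -s "https://openterms.com/api/v1/check?domain=example.com&action=scrape_data"
{
  "success": true,
  "domain": "example.com",
  "action": "scrape_data",
  "result": { "value": null, "confidence": null, "caveats": [] },
  "permission": "not_specified",
  "confidence": 0,
  "error": "no_openterms_json",
  "message": "No openterms.json found at example.com."
}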

During public alpha, error: "no_openterms_json" is the expected response for most domains. The domain simply hasn't published an openterms.json yet. See the README for current usage examples: openterms-py on GitHub.

Handling `no_openterms_json`

Your integration should handle three states, read from the `permission` and `error` fields of the response:

State	Meaning	Recommended action
`permission: "allowed"`	Domain has openterms.json; action is permitted	Proceed
`permission: "denied"`	Domain has openterms.json; action is not permitted	Skip or surface to user
`error: "no_openterms_json"`	Domain hasn't published openterms.json yet (expected during public alpha)	Fall back to your own default policy

Python (requests)

Direct HTTP call — no third-party SDK needed:

agent_guard.py

import requests

def check_action(domain: str, action: str) -> str:
    """Returns 'allowed', 'denied', 'not_specified', or 'no_openterms_json'."""
    resp = requests.get(
        "https://openterms.com/api/v1/check",
        params={"domain": domain, "action": action},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    # A missing openterms.json is reported in the top-level "error" field
    if data.get("error") == "no_openterms_json":
        return "no_openterms_json"
    # "permission" mirrors result.value as a string
    return data.get("permission", "not_specified")

# Usage: works whether or not the domain has published openterms.json
status = check_action("example.com", "scrape_data")
if status == "allowed":
    print("Permitted.")
elif status == "denied":
    print("Action not permitted by site policy.")
elif status == "no_openterms_json":
    print("No openterms.json found; apply your default policy.")
else:
    print("Permission not specified; apply your default policy.")

Node.js (fetch)

Works with Node 18+ native fetch or any HTTP client:

agentGuard.js

// Returns 'allowed', 'denied', 'not_specified', or 'no_openterms_json'
async function checkAction(domain, action) {
  // URLSearchParams percent-encodes the query values
  const params = new URLSearchParams({ domain, action });
  const res = await fetch(`https://openterms.com/api/v1/check?${params}`);
  if (!res.ok) throw new Error(`OpenTerms check failed: ${res.status}`);
  const data = await res.json();
  // A missing openterms.json is reported in the top-level "error" field
  if (data.error === 'no_openterms_json') return 'no_openterms_json';
  return data.permission ?? 'not_specified';
}

// Usage
const status = await checkAction('example.com', 'scrape_data');
if (status === 'denied') {
  console.log('Action not permitted by site policy.');
} else if (status === 'no_openterms_json') {
  console.log('No openterms.json found; apply your default policy.');
}

LangChain / tool wrapper

Drop a permission check into any tool-calling framework as a pre-execution guard. The pattern below uses the HTTP API directly — substitute your framework's HTTP client:

openterms_tool.py

import requests

# Framework-agnostic guard — call this before executing any agent action
def openterms_guard(domain: str, action: str) -> dict:
    resp = requests.get(
        "https://openterms.com/api/v1/check",
        params={"domain": domain, "action": action},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()

    # During public alpha, no_openterms_json is normal: the domain hasn't published yet
    if data.get("error") == "no_openterms_json":
        result = "no_openterms_json"
    else:
        result = data.get("permission", "not_specified")
    return {
        "permitted": result == "allowed",
        "result": result,
        "domain": domain,
        "action": action,
    }

# Wire into your framework — example shows a generic tool-call pattern
# See the README for current usage examples with specific frameworks.

Framework function names: This section uses the direct HTTP API only. For framework-specific helper names (LangChain Tool class, CrewAI task wrappers, etc.), see the README for current usage examples.

CrewAI

crewai-openterms is an independent community package that provides CrewAI-compatible tools for the OpenTerms permission check API.

Install the package:

shell

$ pip install crewai-openterms

args_schema requirement: CrewAI tools must declare a Pydantic args_schema so the framework can validate inputs before invoking the tool. crewai-openterms ships with schemas pre-defined — pass your domain and action arguments to the tool call and the schema validation is handled automatically.

crewai_example.py

from crewai import Agent, Task, Crew
from crewai_openterms import OpenTermsCheckTool

# Instantiate the permission-check tool
check_tool = OpenTermsCheckTool()

# Wire it into a CrewAI agent as a pre-action guard
agent = Agent(
    role="Web Research Agent",
    goal="Check site permissions before scraping",
    backstory="Respects publisher terms before acting.",
    tools=[check_tool],
    verbose=True,
)

task = Task(
    description="Check whether scrape_data is permitted for example.com",
    expected_output="Permission result from OpenTerms API",
    agent=agent,
)

# The tool calls GET https://openterms.com/api/v1/check internally
crew = Crew(agents=[agent], tasks=[task])
result = crew.kickoff(
    inputs={"domain": "example.com", "action": "scrape_data"}
)

Registry records, not rulings: Returned permission values are registry records derived from machine-readable files. Treat them as one structured input to agent decision logic, not a substitute for review in regulated contexts.

PyPI: pypi.org/project/crewai-openterms — GitHub: github.com/jstibal/crewai-openterms

LangChain

langchain-openterms provides permission-aware tools for LangChain agents. Install the package:

shell

$ pip install langchain-openterms

# With openterms-py SDK (recommended — requires openterms-py>=0.3.1):
$ pip install "langchain-openterms[sdk]"

Three integration patterns ship in the package:

OpenTermsGuard — wraps any LangChain tool; blocks the wrapped tool unless the check returns allow
OpenTermsChecker — standalone tool an agent can call to check permissions
OpenTermsCallbackHandler — passive observer; logs permission checks without blocking (monitoring only)

langchain_guard_example.py

from langchain_openterms import OpenTermsGuard

# Wrap any LangChain tool (fail-closed by default).
# `search` must be a LangChain tool you have already constructed, for example:
# from langchain_community.tools import BraveSearch
# search = BraveSearch.from_api_key(api_key="...", search_kwargs={"count": 3})
guarded_search = OpenTermsGuard(
    tool=search,
    action="read_content",
)

result = guarded_search.invoke("https://example.com/pricing")
if "blocked" in result.lower():
    print("Cannot proceed:", result)
else:
    print("Allowed:", result)

PyPI: pypi.org/project/langchain-openterms — GitHub: github.com/jstibal/langchain-openterms

For a full guide covering all three packages, integration selection guidance, and the security model, see the SDK & Integrations page.

Permission Check API

The Permission Check API (GET /api/v1/check) returns whether a domain's openterms.json permits a specific agent action. Use it as a guard before performing any automated operation.

Endpoint

Request

GET https://openterms.com/api/v1/check?domain=example.com&action=scrape_data

Parameter	Type		Description
`domain`	string	Required	Domain to check (e.g. `stripe.com`). URLs are stripped to hostname automatically.
`action`	string	Required	Action to check. Can be an exact permission key or a free-text action that will be semantically mapped. Examples: `scrape_data`, `api_access`, `allow_training`, `scrape_pricing`.

Response

A successful response:

{
  "success": true,
  "domain": "example.com",
  "action": "scrape_data",
  "result": {
    "value": false,
    "confidence": "medium",
    "caveats": []
  },
  "permission": "denied",
  "confidence": 0.99,
  "policy_excerpt": "Explicitly denied.",
  "openterms_version": "0.3.0",
  "checked_at": "2026-04-24T09:00:00.000Z"
}

Response fields

Field	Type	Description
`result`	`object`	The permission result with field-level confidence metadata. Contains: `value` (`true` = allowed, `false` = denied, `null` = not specified), `confidence` (`"high"`, `"medium"`, or `"low"`), and `caveats` (array of known failure mode strings, empty for high/medium fields).
`permission`	`string`	One of: `allowed`, `denied`, `not_specified`. Mirrors `result.value` as a string for convenience.
`confidence`	`number`	Match confidence score for this specific domain lookup (0.0–1.0). Exact permission key matches → 0.99. Semantic prefix matches → 0.7–0.8. Lower when the domain's openterms.json doesn't list the permission. Distinct from `result.confidence`, which is the static field-level accuracy tier.
`policy_excerpt`	`string`	Short human-readable explanation of the decision, drawn from the domain's openterms.json.
`resolved_permission`	`string`	Present when the action was semantically mapped (not an exact match). Shows the matched permission key.

Per-field confidence levels

The result.confidence value reflects empirical accuracy from controlled measurement (Haiku 4.5, Tests 7-18 baseline, 4-domain sample, external LLM judges). Three tiers:

Tier	Threshold	Meaning
high	95%+	Verified against independent external LLM judgment at 95% or higher accuracy. Safe to use for automated decision-making.
medium	80–94%	Verified at 80-94% accuracy. Appropriate for automated decision-making with understanding that edge cases exist. Spot-checking recommended for high-stakes use cases.
low	<80%	Verified at below 80% accuracy. Should not be used as sole input for automated decisions. Human review recommended for high-stakes use cases.

Permission Field	Accuracy	Confidence	Caveats
`read_content`	100%	high	—
`post_content`	100%	high	—
`scrape_data`	88%	medium	—
`create_account`	88%	medium	—
`make_purchases`	88%	medium	—
`api_access`	75%	low	Below 80% accuracy threshold. Human review recommended for high-stakes decisions.
`allow_training`	~50%	low	Explicit training prohibitions may be missed when stated as exclusive-channel restrictions (e.g., 'only via our API'). Known failure cases: instagram.com, deepgram.com.

Extended disclosure: The allow_training field has approximately 50% accuracy in empirical testing. It is particularly prone to false negatives: platforms that prohibit AI training may not be detected if their terms use indirect language (e.g., exclusive-channel restrictions rather than explicit training prohibitions). This field should not be used as the sole input for automated decisions. Human review is required.

Note on confidence methodology: Confidence levels are based on empirical measurement against external LLM judges across a 4-domain sample (Tests 7-18). These are v1 accuracy estimates — they will be recalibrated when scale validation completes. A field's confidence level is constant across all domains; it reflects the generator's general accuracy for that field, not the quality of any one domain's openterms.json file.
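
The tiers and the per-field table above can be folded into a small fail-closed gate. The following is a minimal sketch, not part of the OpenTerms API: FIELD_TIERS is copied from the table, and the helper names tier_for_accuracy and may_auto_decide are hypothetical. It assumes the caller has already extracted the permission value (True, False, None, or a conditional object) from the response.

```python
from typing import Any

# Field-level tiers copied from the table above (v1 estimates, subject to recalibration).
FIELD_TIERS = {
    "read_content": "high",
    "post_content": "high",
    "scrape_data": "medium",
    "create_account": "medium",
    "make_purchases": "medium",
    "api_access": "low",
    "allow_training": "low",
}


def tier_for_accuracy(accuracy: float) -> str:
    """Map a measured accuracy (0.0-1.0) to the three tier thresholds."""
    if accuracy >= 0.95:
        return "high"
    if accuracy >= 0.80:
        return "medium"
    return "low"


def may_auto_decide(action: str, value: Any) -> bool:
    """Fail closed: proceed automatically only on an explicit True from a non-low field."""
    tier = FIELD_TIERS.get(action, "low")  # unknown actions are treated as low (assumption)
    return value is True and tier != "low"
```

Anything that returns False here (a denial, a null, a conditional object, or any low-tier field) would go to human review or to your own default policy.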

Code examples

curl — check if scraping is allowed

$ curl "https://openterms.com/api/v1/check?domain=github.com&action=scrape_data"

Python — permission check with caveat handling

permission_check.py

import requests

def check_permission(domain: str, action: str):
    """Return the permission value (True / False / None) for an action on a domain."""
    resp = requests.get(
        "https://openterms.com/api/v1/check",
        params={"domain": domain, "action": action},
        timeout=30,
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    data = resp.json()
    result = data.get("result")
    if not isinstance(result, dict):
        result = {}  # defensive: treat a missing or non-object result as "not specified"

    # Always surface caveats attached to this field
    for caveat in result.get("caveats", []):
        print(f"⚠️  {action} on {domain}: {caveat}")

    # Flag low-confidence fields for human review
    if result.get("confidence") == "low":
        print("❗ Manual verification recommended.")

    return result.get("value")  # True / False / None

# Usage
decision = check_permission("example.com", "allow_training")

Rate limits

Same limits as the public generator API: 100 requests/hour and 1,000 requests/day per IP address.
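
A client can stay under these limits by throttling itself before calling the API. The sketch below is one possible approach, assuming sliding windows; the document does not say whether the server uses fixed or sliding windows, and the class name SlidingWindowLimiter is hypothetical. The clock is injectable so the behavior can be checked without waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Client-side guard for the documented 100/hour and 1,000/day limits."""

    def __init__(self, per_hour: int = 100, per_day: int = 1000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.per_hour = per_hour
        self.per_day = per_day
        self.clock = clock
        self.sent: Deque[float] = deque()  # timestamps of requests in the last 24 hours

    def try_acquire(self) -> bool:
        """Record a request and return True if both windows still have room."""
        now = self.clock()
        # Drop timestamps that have aged out of the daily window.
        while self.sent and now - self.sent[0] >= 86400:
            self.sent.popleft()
        in_last_hour = sum(1 for t in self.sent if now - t < 3600)
        if in_last_hour >= self.per_hour or len(self.sent) >= self.per_day:
            return False
        self.sent.append(now)
        return True
```

Call try_acquire() before each check request and back off when it returns False. Because the server counts per IP address, agents sharing an egress IP would need to share one limiter.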

Bulk Download

Download the entire registry as a single ZIP file — 500+ openterms.json entries, organized by validation status. Ideal for bootstrapping local policy enforcement, training datasets, or offline analysis.

OTA-verified: Cross-referenced against Open Terms Archive ground truth data, a third-party legal document archive.

⬇ Download the full dataset
openterms.com/registry/download — live ZIP, generated fresh from the registry on every request.

ZIP Structure

openterms-registry-seed.zip

openterms-registry-seed/
├── validated/
│   ├── github-com.json
│   ├── stripe-com.json
│   └── ... (all validated entries)
├── unvalidated/
│   ├── example-com.json
│   └── ... (all unvalidated entries)
├── flagged/
│   └── ... (entries with data quality issues)
├── index.json     ← manifest: domain, category, validation_status, confidence
└── README.md      ← schema version, generated timestamp, usage
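
Given the layout above, the archive can be read into memory with the standard library alone. This is a minimal sketch: load_registry is a hypothetical helper, and it assumes every entry sits exactly at root/status/file.json as shown in the tree. It does not interpret the contents of each entry.

```python
import io
import json
import zipfile
from typing import Dict

STATUS_DIRS = ("validated", "unvalidated", "flagged")


def load_registry(zip_bytes: bytes) -> Dict[str, Dict[str, dict]]:
    """Group parsed entries by status folder, keyed by file name."""
    registry: Dict[str, Dict[str, dict]] = {status: {} for status in STATUS_DIRS}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            parts = name.split("/")
            # Keep only <root>/<status>/<file>.json; this skips index.json,
            # README.md, and bare directory entries.
            if len(parts) == 3 and parts[1] in registry and parts[2].endswith(".json"):
                registry[parts[1]][parts[2]] = json.loads(zf.read(name))
    return registry
```

With the downloaded bytes in hand, load_registry(data)["validated"].get("github-com.json") would return the parsed entry for github.com, or None if it is absent.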

index.json Schema

The index.json manifest lists every entry with metadata, making it easy to filter locally without parsing each file:

index.json (excerpt)

{
  "generated_at": "2026-04-16T14:00:00.000Z",
  "schema_version": "0.3.1",
  "total": 511,
  "counts": { "validated": 53, "unvalidated": 458, "flagged": 0 },
  "entries": [
    {
      "domain": "github.com",
      "filename": "github-com.json",
      "category": "Developer Tools",
      "validation_status": "validated",
      "confidence": 0.9
    }
  ]
}
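
Filtering on the manifest might look like the sketch below. select_paths is a hypothetical helper; it assumes each entry carries the fields shown in the excerpt and that the folder name inside the ZIP equals the entry's validation_status value, which matches the excerpt but is not stated as a guarantee.

```python
from typing import List


def select_paths(index: dict, status: str = "validated",
                 min_confidence: float = 0.0) -> List[str]:
    """Return '<status>/<filename>' paths for manifest entries matching the filter."""
    return [
        f"{entry['validation_status']}/{entry['filename']}"
        for entry in index.get("entries", [])
        if entry.get("validation_status") == status
        and entry.get("confidence", 0.0) >= min_confidence
    ]
```

The returned paths are relative to the openterms-registry-seed/ root, so only the matching files need to be opened and parsed.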

Use Cases

Local policy enforcement — embed the dataset in your agent runtime for zero-latency permission checks
Offline environments — air-gapped or latency-sensitive deployments that can't call the live API
Training data — structured ToS signals for fine-tuning AI models
Policy snapshots — snapshot the registry state at a point in time

Open Receipt Specification — Companion Spec (External / Future context)

The Open Receipt Specification is an external companion spec, not part of the OpenTerms runtime. It describes a pattern for generating structured records when AI agents acknowledge policies before acting.

Current-product scope: The OpenTerms public alpha provides machine-readable permission data via GET /api/v1/check. The Open Receipt Specification is referenced here as future/external context. No such infrastructure is deployed or required for current-product usage.

OpenTerms and the Open Receipt Specification describe complementary layers:

OpenTerms — declares what agents are permitted to do (the policy)
Open Receipt Specification — describes a record structure for when an agent acknowledged that policy (an external companion spec)
The policy_id field in openterms.json is the linking identifier if you implement both

Internal Logging

What to log

When an agent calls the Permission Check API, consider logging the result in your own internal systems:

Domain — the domain whose permissions were checked
Action — the specific permission requested (e.g., scrape_data, api_access)
Result — allowed, denied, or no_openterms_json
Timestamp — ISO 8601, when the check was performed
Source — whether the record came from the registry, a live fetch, or a cached entry

These internal log records document what your agent checked before acting. What records are required for your use case is a decision for your team.

Implementation

Python — log permission checks to internal log

import requests
import json
from datetime import datetime, timezone

def check_and_log(domain: str, action: str) -> str:
    """Check permission and log the result. Returns result string."""
    resp = requests.get(
        "https://openterms.com/api/v1/check",
        params={"domain": domain, "action": action},
        timeout=10,
    )
    resp.raise_for_status()
    data = resp.json()
    result = data.get("result", "no_openterms_json")

    # Build an internal log record
    record = {
        "domain": domain,
        "action": action,
        "result": result,
        "checked_at": datetime.now(timezone.utc).isoformat(),
    }
    with open("permission_checks.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")
    return result

# Usage
result = check_and_log("example.com", "api_access")
if result == "denied":
    print("Action not permitted by site policy.")
elif result == "no_openterms_json":
    print("No openterms.json found — apply your default policy.")

OpenTerms provides machine-readable permission data as one input to agent operation and review processes. It does not provide legal advice.

Training Semantics: A, B, and C

The word "training" in AI terms of service is ambiguous. It can refer to three distinct scenarios with different legal and operational implications. OpenTerms separates them precisely.

Semantics A — Third-party training on site content

Definition: Can external parties (AI companies, crawlers, developers) use the site's published content to train machine learning models?

This is what allow_training captures in v0.3.1. The service is the content publisher; the question is whether it grants third parties a training license.

v0.3.1 coverage: allow_training — fully addressed.

Semantics B — Service internal use of user data

Definition: Does the service itself use user-submitted data — including profile data, usage patterns, messages, or transactions — to improve its own products, train internal models, or enhance recommendations?

This is a data processing relationship between the service and its users. It is intentionally out of scope for openterms.json. The permissions protocol governs what third parties (including AI agents) may do; it does not govern the service's own data practices. Semantics B belongs in Privacy Policy and DPA agreements, not in openterms.json.

v0.3.1 coverage: Intentionally excluded. See service Privacy Policy.

Semantics C — Site training on agent-submitted data (proposed for v0.4.0)

Definition: Can the site train AI/ML models on data that an agent submits to it — including form inputs, API payloads, file uploads, chat messages, and structured data submissions?

This is the reverse direction from Semantics A. Instead of asking "can I train on your content?", the agent asks "will you train on my content when I submit it to you?" This is specifically relevant to enterprise agent deployments where agents submit proprietary or sensitive data to SaaS platforms.

Semantics C is a known gap in v0.3.1. A site may simultaneously prohibit third-party training on its content (allow_training: false) while reserving the right to train on submitted inputs. These are legally distinct and must not be conflated.

v0.4.0 proposal: allow_training_on_submissions field — see planning document.

Summary

Semantic	Question	Protocol Coverage
A — Third-party on site content	Can external parties train on the site's published content?	`allow_training` — v0.3.1 ✓
B — Service internal use	Does the service train on its users' data internally?	Privacy Policy scope — intentionally out of scope
C — Site on agent submissions	Does the site train on data agents submit to it?	`allow_training_on_submissions` — v0.4.0 proposal
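
The distinction between Semantics A and C can be made concrete with a small sketch. It assumes the permissions section of an openterms.json has been parsed into a flat dict keyed by permission name, which is an assumption about the file layout. The allow_training_on_submissions key is only a v0.4.0 proposal, so in v0.3.1 files it will be absent and the Semantics C answer stays unknown. Both function names are hypothetical.

```python
def training_answers(permissions: dict) -> dict:
    """Answer Semantics A and C separately; None means 'not specified'."""
    return {
        # Semantics A: may third parties train on the site's published content?
        "third_party_may_train_on_site_content": permissions.get("allow_training"),
        # Semantics C: may the site train on what an agent submits? (proposed field)
        "site_may_train_on_agent_submissions": permissions.get("allow_training_on_submissions"),
    }


def safe_to_submit_sensitive_data(permissions: dict) -> bool:
    """Fail closed: only an explicit False on the Semantics C field counts as safe."""
    return permissions.get("allow_training_on_submissions") is False
```

Note that allow_training: false alone never makes safe_to_submit_sensitive_data return True, which reflects the rule that A and C must not be conflated.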

CI/CD Validation

Validate your openterms.json in CI/CD pipelines using the API endpoint:

GitHub Actions

- name: Validate openterms.json
  run: |
    RESULT=$(curl -s -X POST https://openterms.com/api/validate \
      -H "Content-Type: application/json" \
      -d "{\"content\": $(cat openterms.json)}")
    echo "$RESULT" | jq .
    VALID=$(echo "$RESULT" | jq -r '.valid')
    if [ "$VALID" != "true" ]; then
      echo "openterms.json validation failed!"
      exit 1
    fi

npm script

// package.json
{
  "scripts": {
    "validate:terms": "jq -n --slurpfile c openterms.json '{content: $c[0]}' | curl -sf -X POST https://openterms.com/api/validate -H 'Content-Type: application/json' -d @- | jq -e '.valid'"
  }
}
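
For pipelines that already run Python, the same check can be split into two small helpers. This is a sketch under the assumptions visible in the examples above: the endpoint accepts a JSON body of the form {"content": ...} and returns an object with a boolean valid field. The helper names are hypothetical, and the HTTP call itself is left to whatever client the pipeline uses.

```python
import json


def build_validate_payload(raw_text: str) -> bytes:
    """Parse locally first so malformed JSON fails before any network call."""
    content = json.loads(raw_text)  # raises json.JSONDecodeError on bad input
    return json.dumps({"content": content}).encode("utf-8")


def is_valid(response_text: str) -> bool:
    """Mirror the jq check: pass only when the response's 'valid' field is exactly true."""
    try:
        return json.loads(response_text).get("valid") is True
    except (json.JSONDecodeError, AttributeError):
        # Non-JSON bodies and non-object responses count as failures.
        return False
```

A CI step would POST the payload to https://openterms.com/api/validate with Content-Type: application/json and exit non-zero when is_valid returns False.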
