Vocabulary specification for backward-compatible JSON-LD 1.1 extensions targeting AI/ML data exchange
Status: Draft Specification v0.1.0
Date: 2026-02-12
Part of: JSON-LD Extensions for AI/ML (jsonld-ex)
This document specifies the security extensions for jsonld-ex. Three mechanisms — context integrity verification, context allowlists, and resource limits — defend JSON-LD processing against context injection, resource exhaustion, and unauthorized remote context loading.
The formal property definition for @integrity (IRI, type, format) is in the Vocabulary specification, §15. This document defines the threat model, processing algorithms, configuration semantics, and enforcement behavior.
JSON-LD’s power derives from remote contexts: a single @context URL maps compact terms to full IRIs, enabling interoperable data exchange. However, remote context loading introduces an attack surface. A compromised or malicious context can silently redefine the meaning of every term in a document, turning a valid financial transfer into a fraudulent one or a benign medical record into a harmful instruction.
JSON-LD 1.1 acknowledges these risks in its security considerations but provides no standardized defense mechanisms. Processors are left to implement ad hoc protections — or none at all. The jsonld-ex security extensions fill this gap with three composable, backward-compatible mechanisms that can be adopted incrementally.
The key words “MUST”, “MUST NOT”, “SHOULD”, and “MAY” in this document are to be interpreted as described in RFC 2119.
Integrity string: a string of the form algorithm-base64hash (e.g., sha256-n4bQgY...) that cryptographically identifies the expected content of a context.

This section describes the threats that the jsonld-ex security extensions are designed to mitigate. Each threat is paired with the mechanism that addresses it.
Threat: An attacker compromises the resolution of a context URL — through DNS poisoning, BGP hijacking, or man-in-the-middle interception — and serves a malicious context that redefines term mappings.
Example: A document declares @context: "https://payments.example.org/v1". The legitimate context maps "source" to schema:sender and "destination" to schema:recipient. An attacker-controlled context swaps these mappings. A processor using the poisoned context interprets the document with reversed sender and recipient, potentially misdirecting a financial transfer.
Mitigation: Context integrity verification (§3). The document author embeds a cryptographic hash of the expected context content. Any modification — including semantically targeted term swaps — changes the hash, causing the processor to reject the context.
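To illustrate why a targeted term swap is detectable, the following minimal sketch hashes the legitimate and the poisoned mapping from the example and compares the results. The helper name sha256_integrity and the serialization it uses (json.dumps with sorted keys) are assumptions made for illustration, not the normative procedure.

```python
import base64
import hashlib
import json


def sha256_integrity(context: dict) -> str:
    """Hypothetical helper: hash a context object as sha256-<base64hash>."""
    # Assumed serialization: sorted keys, default json.dumps separators, UTF-8.
    serialized = json.dumps(context, sort_keys=True).encode("utf-8")
    digest = hashlib.sha256(serialized).digest()
    return "sha256-" + base64.b64encode(digest).decode("ascii")


# The legitimate mapping and the attacker's swapped mapping from the example.
legitimate = {"source": "schema:sender", "destination": "schema:recipient"}
poisoned = {"source": "schema:recipient", "destination": "schema:sender"}

declared = sha256_integrity(legitimate)  # what the document author embeds
tampered = sha256_integrity(poisoned)    # what the attacker's context hashes to

# The swap changes the hash, so a verifying processor rejects the context.
print(declared != tampered)
```

Both contexts contain exactly the same keys and values in a different pairing, yet the digests differ, which is the property the mitigation relies on.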
Threat: An attacker crafts a document with extreme nesting depth — deeply nested @graph containers, arrays of arrays, or recursive object structures — to exhaust stack space or memory and crash the processor.
Example:
{"@graph": {"@graph": {"@graph": {"@graph": "... hundreds of levels deep ..."}}}}
Mitigation: Resource limits (§5). The max_graph_depth parameter bounds the nesting depth of any document. Documents exceeding the limit are rejected before full processing begins.
Threat: An attacker submits an extremely large document to exhaust memory or disk space during parsing. This is especially relevant for services that accept JSON-LD input from untrusted sources (e.g., APIs, data pipelines, IoT gateways).
Mitigation: Resource limits (§5). The max_document_size parameter bounds the byte size of the serialized document.
Threat: A context references another context, which references another, forming a long or circular chain. This can exhaust network connections, memory, or processing time.
Mitigation: Resource limits (§5). The max_context_depth parameter bounds the depth of transitive context references.
Threat: A document from an untrusted source references an attacker-controlled context URL. Loading that URL may leak information (via the HTTP request itself), introduce malicious term mappings, or expose the processor to server-side attacks (e.g., slow responses causing resource exhaustion).
Mitigation: Context allowlists (§4). The processor is configured to load only contexts from approved URLs or URL patterns.
Context integrity verification enables a document author to declare the expected cryptographic hash of a context document. A conforming processor MUST verify the hash before using the context, and MUST reject the context if verification fails.
An integrity string has the format:
algorithm-base64hash
Where:
- algorithm is one of the supported hash algorithms (see §3.2).
- A hyphen (-) separates the algorithm name from the hash.
- base64hash is the Base64-encoded (standard alphabet, with padding) digest of the context content.

Examples:
sha256-n4bQgYhMfWWaL+qgxVrQFaO/TxsrC4Is0V1sFbDwCgg=
sha384-OLBgp1GsljhM2TJ+sbHjaiH9txEUvgdDTAzHv2P24donTt6/529l+9Ua0vFImLlb
sha512-MJ7MSJwS1utMxA9QyQLytNDtd+5RGnx6m808qG1M2G+YndNbxf9JlnDaNCVbRbDP2DDoH2Bdz33FVC6TrpzXbw==
| Algorithm | Hash Length (bits) | Base64 Length (chars) |
|---|---|---|
| sha256 | 256 | 44 |
| sha384 | 384 | 64 |
| sha512 | 512 | 88 |
Processors MUST support all three algorithms. sha256 is RECOMMENDED as the default.
Processors MUST reject integrity strings that specify an unsupported algorithm.
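The format and algorithm rules above can be checked mechanically. The following is a minimal sketch rather than the reference implementation: the function name parse_integrity and the DIGEST_BYTES table are hypothetical, and the digest-length check is an additional sanity check derived from the table, not a stated requirement.

```python
import base64
import binascii

# Digest sizes in bytes for the three required algorithms (bits / 8).
DIGEST_BYTES = {"sha256": 32, "sha384": 48, "sha512": 64}


def parse_integrity(integrity: str) -> tuple[str, bytes]:
    """Split an integrity string into (algorithm, raw digest), validating both."""
    algorithm, separator, encoded = integrity.partition("-")
    if not separator:
        raise ValueError("integrity string must have the form algorithm-base64hash")
    if algorithm not in DIGEST_BYTES:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    try:
        # validate=True rejects characters outside the standard Base64 alphabet.
        digest = base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("hash is not valid Base64") from exc
    if len(digest) != DIGEST_BYTES[algorithm]:
        raise ValueError(f"{algorithm} digest must be {DIGEST_BYTES[algorithm]} bytes")
    return algorithm, digest


algorithm, digest = parse_integrity("sha256-n4bQgYhMfWWaL+qgxVrQFaO/TxsrC4Is0V1sFbDwCgg=")
print(algorithm, len(digest))
```

Because the standard Base64 alphabet contains no hyphen, splitting at the first hyphen is unambiguous for all three algorithm names.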
To compute the integrity hash of a context:
Input: A context value (string or JSON object) and an algorithm identifier.
Procedure:
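The context is serialized (a JSON object is serialized with its keys sorted), the serialized content is hashed with the selected algorithm, and the digest is Base64-encoded. The result is returned in the form algorithm-base64hash.

Example: Given the context {"name": "http://schema.org/name"}:

- Serialized form: {"name": "http://schema.org/name"} (already sorted).
- Result: sha256-<base64hash>.

A context reference with integrity verification uses an object with @id and @integrity: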
{
  "@context": {
    "@id": "https://schema.org/",
    "@integrity": "sha256-n4bQgYhMfWWaL+qgxVrQFaO/TxsrC4Is0V1sFbDwCgg="
  }
}
The @id identifies the context URL. The @integrity declares the expected hash.
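The hash computation for a context can be sketched as follows. This is an illustration under stated assumptions, not the reference implementation: the exact serialization (json.dumps with sorted keys and default separators, encoded as UTF-8) is inferred from the serialized form shown in the example and may differ from what jsonld_ex.security does.

```python
import base64
import hashlib
import json

SUPPORTED_ALGORITHMS = ("sha256", "sha384", "sha512")


def compute_integrity(context, algorithm: str = "sha256") -> str:
    """Return an integrity string (algorithm-base64hash) for a context value."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    if isinstance(context, str):
        serialized = context
    else:
        # Assumption: sorted keys make the hash independent of key order.
        serialized = json.dumps(context, sort_keys=True)
    digest = hashlib.new(algorithm, serialized.encode("utf-8")).digest()
    # b64encode uses the standard alphabet and always emits padding.
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


print(compute_integrity({"name": "http://schema.org/name"}))
```

Sorting keys means two objects that differ only in key order produce the same integrity string, while any change to a key or value produces a different one.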
When used in a context array, the integrity-verified context appears as an object element:
{
  "@context": [
    {
      "@id": "https://schema.org/",
      "@integrity": "sha256-n4bQgYhMfWWaL+qgxVrQFaO/TxsrC4Is0V1sFbDwCgg="
    },
    {
      "@id": "https://example.org/custom-context",
      "@integrity": "sha384-OLBgp1GsljhM2TJ+sbHjaiH9txEUvgdDTAzHv2P24donTt6/529l+9Ua0vFImLlb"
    }
  ]
}
Input: A context reference (object with @id and @integrity) and the retrieved context content.
Procedure:
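1. Split the declared integrity string into its algorithm and hash parts at the - character.
2. Check that the algorithm is supported (sha256, sha384, sha512). If not, raise an error.
3. Compute the hash of the retrieved context content with that algorithm, as described in §3.3.
4. Compare the computed hash with the declared hash. If they differ, verification fails.

When @integrity is declared on a context reference:

- A conforming processor MUST verify the retrieved context against the declared hash before using it, and MUST reject the context if verification fails.
- A processor that does not implement these extensions ignores the @integrity keyword.

The integrity_context convenience function produces a context reference with a computed integrity hash: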
Input: A context URL (string), the context content (string or JSON object), and an optional algorithm (default: sha256).
Output: A JSON object with @id and @integrity.
Example:
// Input: url="https://schema.org/", content=<schema.org context>, algorithm="sha256"
// Output:
{
  "@id": "https://schema.org/",
  "@integrity": "sha256-n4bQgYhMfWWaL+qgxVrQFaO/TxsrC4Is0V1sFbDwCgg="
}
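Verification and the integrity_context helper can be sketched together, since one produces what the other checks. The function names and argument order follow the reference implementation table; the bodies are a hedged illustration only. In particular, the serialization in the private helper _base64_digest and the constant-time comparison via hmac.compare_digest are assumptions, not behavior confirmed by the specification.

```python
import base64
import hashlib
import hmac
import json

SUPPORTED_ALGORITHMS = ("sha256", "sha384", "sha512")


def _base64_digest(content, algorithm: str) -> str:
    """Hypothetical helper: Base64 digest of a context (string or JSON object)."""
    # Assumed serialization: sorted keys, UTF-8.
    text = content if isinstance(content, str) else json.dumps(content, sort_keys=True)
    digest = hashlib.new(algorithm, text.encode("utf-8")).digest()
    return base64.b64encode(digest).decode("ascii")


def verify_integrity(content, declared: str) -> bool:
    """Return True if the retrieved content matches the declared integrity string."""
    algorithm, separator, expected = declared.partition("-")
    if not separator or algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"invalid or unsupported integrity string: {declared!r}")
    actual = _base64_digest(content, algorithm)
    # Constant-time comparison; a plain == would also give the correct result.
    return hmac.compare_digest(actual.encode("utf-8"), expected.encode("utf-8"))


def integrity_context(url: str, content, algorithm: str = "sha256") -> dict:
    """Build a context reference object with @id and a computed @integrity."""
    if algorithm not in SUPPORTED_ALGORITHMS:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return {"@id": url, "@integrity": f"{algorithm}-{_base64_digest(content, algorithm)}"}


context = {"name": "http://schema.org/name"}
reference = integrity_context("https://example.org/custom-context", context)
print(verify_integrity(context, reference["@integrity"]))
```

A processor following the fail-closed rule would treat a False result as a reason to reject the context rather than fall back to unverified use.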
Context allowlists restrict which remote context URLs a processor is permitted to load. This prevents documents from untrusted sources from introducing arbitrary remote contexts.
An allowlist configuration is a JSON object with the following properties:
| Property | Type | Default | Description |
|---|---|---|---|
| allowed | Array of strings | [] | Exact context URLs that are permitted |
| patterns | Array of strings | [] | URL patterns with wildcards |
| block_remote_contexts | Boolean | false | If true, reject ALL remote contexts |
Example configuration:
{
  "allowed": [
    "https://schema.org/",
    "https://w3id.org/security/v2"
  ],
  "patterns": [
    "https://example.org/contexts/*"
  ],
  "block_remote_contexts": false
}
Patterns use two wildcard characters:
| Wildcard | Meaning | Example |
|---|---|---|
| * | Matches zero or more characters | https://example.org/* matches https://example.org/v1, https://example.org/v2/context |
| ? | Matches exactly one character | https://example.org/v? matches https://example.org/v1, https://example.org/v2 but not https://example.org/v10 |
Patterns are matched against the entire URL. A pattern https://example.org/* does NOT match https://other.org/.
All other characters in a pattern are matched literally. Characters that are special in regular expressions (for example ".", "+", and parentheses) are escaped in the pattern, so they match only themselves in the URL.
Input: A context URL (string) and an allowlist configuration (object).
Procedure:
1. If block_remote_contexts is true, return denied.
2. If the URL appears in the allowed array (exact string match), return permitted.
3. For each pattern in the patterns array:
   a. Convert the pattern to a regular expression by escaping all regex-special characters, then replacing each escaped \* with .* and each escaped \? with a single . (any one character).
   b. Anchor the regex with ^ and $.
   c. If the URL matches the regex, return permitted.
4. If the allowed array is non-empty OR the patterns array is non-empty (i.e., an allowlist is actively configured), return denied. The URL was not matched by any rule.
5. If both allowed and patterns are empty and block_remote_contexts is false, return permitted. No allowlist is configured, so all URLs are allowed by default.

When no allowlist is configured (empty allowed, empty patterns, block_remote_contexts is false), the processor permits all remote contexts. This preserves backward compatibility with standard JSON-LD 1.1 processing.
Allowlist enforcement is opt-in: it activates only when at least one property is set to a non-default value, that is, when allowed or patterns is non-empty or block_remote_contexts is true.
Setting block_remote_contexts to true rejects all remote context URLs, regardless of the contents of allowed and patterns. This is the most restrictive configuration, suitable for environments where all contexts are embedded inline or pre-loaded.
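The allowlist matching rules can be sketched in a few lines. The function name is_context_allowed matches the reference implementation table, but the body and the private helper _pattern_to_regex are an illustration only. The sketch uses fullmatch in place of explicit ^ and $ anchors, which is a deliberate substitution: it avoids $ matching just before a trailing newline.

```python
import re


def _pattern_to_regex(pattern: str) -> re.Pattern:
    """Hypothetical helper: turn a wildcard pattern into a compiled regex."""
    escaped = re.escape(pattern)  # every regex-special character becomes literal
    # Re-enable only the two wildcards: * (zero or more) and ? (exactly one).
    return re.compile(escaped.replace(r"\*", ".*").replace(r"\?", "."))


def is_context_allowed(url: str, config: dict) -> bool:
    """Return True if the context URL is permitted by the allowlist config."""
    allowed = config.get("allowed", [])
    patterns = config.get("patterns", [])
    if config.get("block_remote_contexts", False):
        return False  # block-all mode overrides allowed and patterns
    if url in allowed:
        return True  # exact string match
    for pattern in patterns:
        # fullmatch requires the whole URL to match, like ^...$ anchoring.
        if _pattern_to_regex(pattern).fullmatch(url):
            return True
    # Nothing matched: denied if an allowlist is configured, permitted otherwise.
    return not (allowed or patterns)


config = {
    "allowed": ["https://schema.org/", "https://w3id.org/security/v2"],
    "patterns": ["https://example.org/contexts/*"],
    "block_remote_contexts": False,
}
print(is_context_allowed("https://example.org/contexts/v1", config))
print(is_context_allowed("https://other.org/", config))
```

With the example configuration, the first call is permitted by the pattern and the second is denied because an allowlist is configured and no rule matches.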
Resource limits bound the size and complexity of JSON-LD documents before full processing begins. They defend against resource exhaustion attacks from oversized or deeply nested input.
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_context_depth | Integer | 10 | Maximum depth of transitive context chains (context A references B which references C, etc.) |
| max_graph_depth | Integer | 100 | Maximum nesting depth of the document structure (nested objects, arrays, @graph containers) |
| max_document_size | Integer (bytes) | 10,485,760 (10 MB) | Maximum byte size of the serialized document |
| max_expansion_time | Integer (seconds) | 30 | Maximum wall-clock time for expansion processing |
The defaults are designed to accommodate the vast majority of legitimate JSON-LD documents while providing meaningful protection:
- max_context_depth = 10: Real-world context chains rarely exceed 3–4 levels. A limit of 10 provides generous headroom while preventing unbounded chains.
- max_graph_depth = 100: Legitimate documents occasionally reach 20–30 levels of nesting (e.g., deeply nested organizational hierarchies). A limit of 100 provides ample margin.
- max_document_size = 10 MB: The schema.org context is approximately 1.5 MB. A 10 MB limit accommodates large but legitimate documents while preventing memory exhaustion from multi-gigabyte payloads.
- max_expansion_time = 30 seconds: Complex expansion with multiple remote context fetches may take several seconds. A 30-second limit prevents indefinite hangs while allowing for network latency.

Processors SHOULD use these defaults when no explicit limits are provided. Processors MAY allow callers to override any or all parameters.
Input: A document (string, dict, or list) and an optional limits configuration (object with any subset of the four parameters).
Procedure:
1. If the byte size of the serialized document exceeds max_document_size, reject the document with a size error.
2. If the nesting depth of the document exceeds max_graph_depth, reject the document with a depth error.

Note: max_context_depth and max_expansion_time are enforced during context resolution and expansion respectively, not during the initial document validation step. The enforcement algorithm above covers the pre-processing checks (max_document_size and max_graph_depth).
The nesting depth of a JSON structure is measured recursively:
Input: A JSON value and a current depth counter (initially 0).
Procedure:
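1. If the value is null, a string, a number, or a boolean, return the current depth.
2. If the value is an object, recurse into each of its values with current depth + 1. Return the maximum depth found.
3. If the value is an array, recurse into each of its elements with current depth + 1. Return the maximum depth found.

Example: The document {"a": {"b": {"c": 1}}} has depth 3. The document {"a": [{"b": 1}, {"c": 2}]} also has depth 3 (the outer object adds one level, the array adds one, and each element object adds one more).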
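The depth measurement can be sketched as below. The procedure is stated recursively, but this sketch swaps in an explicit stack so that the measurement itself cannot hit Python's recursion limit on hostile input; it computes the same maximum. The function name measure_depth is hypothetical, and treating an empty object or array as contributing only its own depth is an assumption.

```python
def measure_depth(value) -> int:
    """Return the maximum nesting depth of a parsed JSON value."""
    deepest = 0
    stack = [(value, 0)]  # (node, depth at which the node sits)
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        if isinstance(node, dict):
            # Each object level places its values one level deeper.
            stack.extend((child, depth + 1) for child in node.values())
        elif isinstance(node, list):
            # Each array level places its elements one level deeper.
            stack.extend((child, depth + 1) for child in node)
        # Scalars (None, str, int, float, bool) contribute only their own depth.
    return deepest


print(measure_depth({"a": {"b": {"c": 1}}}))
```

For the first example document, the scalar 1 sits three levels below the root, so the sketch reports a depth of 3.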
When a resource limit is exceeded, the error SHOULD include:
- The limit that was exceeded (e.g., max_document_size, max_graph_depth).
- The measured value and the configured limit.

Example error messages:
Document size 15728640 exceeds limit 10485760
Document depth 150 exceeds limit 100
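The pre-processing checks and the error messages above can be sketched together. The function name enforce_resource_limits follows the reference implementation table; everything else is an assumption made for illustration, including the ResourceLimitError type, the _nesting_depth helper, merging caller limits over the defaults, and measuring size as the UTF-8 length of the JSON text.

```python
import json

DEFAULT_LIMITS = {
    "max_context_depth": 10,
    "max_graph_depth": 100,
    "max_document_size": 10 * 1024 * 1024,  # 10,485,760 bytes
    "max_expansion_time": 30,
}


class ResourceLimitError(ValueError):
    """Hypothetical error type raised when a resource limit is exceeded."""


def _nesting_depth(value) -> int:
    """Iterative depth measurement; objects and arrays each add one level."""
    deepest, stack = 0, [(value, 0)]
    while stack:
        node, depth = stack.pop()
        deepest = max(deepest, depth)
        if isinstance(node, dict):
            stack.extend((child, depth + 1) for child in node.values())
        elif isinstance(node, list):
            stack.extend((child, depth + 1) for child in node)
    return deepest


def enforce_resource_limits(document, limits=None) -> None:
    """Raise ResourceLimitError if the document is too large or too deep."""
    effective = {**DEFAULT_LIMITS, **(limits or {})}
    try:
        # The size check runs first because it is the cheaper of the two.
        text = document if isinstance(document, str) else json.dumps(document)
        size = len(text.encode("utf-8"))
        if size > effective["max_document_size"]:
            raise ResourceLimitError(
                f"Document size {size} exceeds limit {effective['max_document_size']}"
            )
        parsed = json.loads(text) if isinstance(document, str) else document
    except RecursionError as exc:
        # The json module itself gives up on extremely deep nesting.
        raise ResourceLimitError("Document nesting is too deep to parse") from exc
    depth = _nesting_depth(parsed)
    if depth > effective["max_graph_depth"]:
        raise ResourceLimitError(
            f"Document depth {depth} exceeds limit {effective['max_graph_depth']}"
        )


enforce_resource_limits({"@graph": [{"name": "ok"}]})  # within the default limits
```

Only max_document_size and max_graph_depth are consulted here; the other two parameters are carried in the defaults so that a single configuration object can be passed on to later stages.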
When all three security mechanisms are in use, they are applied in an order that minimizes unnecessary work: cheap checks (allowlist, size) run before expensive ones (hashing, depth traversal, expansion). For a context reference that declares @integrity, the retrieved content is verified against the @integrity hash. If verification fails, reject the context.
Each mechanism is independently optional. A processor MAY implement any combination of the three.

A standard JSON-LD 1.1 processor that does not implement the jsonld-ex security extensions will:

- @integrity: Treat the context reference object {"@id": "...", "@integrity": "..."} as a context identified by the @id value. The @integrity property is ignored. The context is loaded and used without verification. This is semantically correct but not security-hardened: the document is processed normally, just without the tamper-detection guarantee.

The security extensions degrade gracefully:

- Documents that use @integrity are valid JSON-LD with or without integrity verification.

Adopting the security extensions requires no changes to existing JSON-LD documents. Authors can add @integrity to context references incrementally, one context at a time. Processor operators can enable allowlists and resource limits independently.
Subresource Integrity (W3C Recommendation, 2016) defines integrity verification for web resources loaded via HTML <script> and <link> elements. The jsonld-ex @integrity mechanism is directly inspired by SRI:
| Aspect | SRI | jsonld-ex @integrity |
|---|---|---|
| Scope | HTML subresources (<script>, <link>) | JSON-LD context references |
| Format | algorithm-base64hash | algorithm-base64hash (identical) |
| Algorithms | sha256, sha384, sha512 | sha256, sha384, sha512 (identical) |
| Multiple hashes | Supported (space-separated) | Single hash per context reference |
| Failure behavior | Block resource loading | Reject context (fail-closed) |
The deliberate alignment with SRI’s format means that tools and libraries for computing SRI hashes can be reused for jsonld-ex integrity strings. The only difference is the application context: SRI protects browser subresources; jsonld-ex @integrity protects JSON-LD contexts.
Content Security Policy restricts which sources a browser may load resources from. The jsonld-ex context allowlist serves an analogous role for JSON-LD processors:
| Aspect | CSP | jsonld-ex Allowlists |
|---|---|---|
| Scope | Browser resource loading | JSON-LD context loading |
| Granularity | Per-resource-type (script-src, style-src, etc.) | Context URLs only |
| Wildcards | * in domain position | * and ? in URL patterns |
| Block-all mode | default-src 'none' | block_remote_contexts: true |
| Configuration | HTTP header or <meta> tag | Processor configuration object |
The JSON-LD 1.1 specification (§5, Security Considerations) identifies the risks of remote context loading and recommends that implementations “provide mechanisms to limit” context loading. The jsonld-ex security extensions provide concrete, standardized implementations of these recommendations.
The reference implementation is in the jsonld_ex.security module of the jsonld-ex Python package.
| Function | Spec Section |
|---|---|
| compute_integrity(context, algorithm) | §3.3 |
| verify_integrity(context, declared) | §3.5 |
| integrity_context(url, content, algorithm) | §3.7 |
| is_context_allowed(url, config) | §4.3 |
| enforce_resource_limits(document, limits) | §5.3 |