Trainspot API

Overview

The Trainspot API provides a scalable solution for AI builders to verify the rights and policies regarding the usage of content for AI training. It serves as an authoritative source for determining whether rights holders allow their works to be used for AI training.

The API returns results based on (a) what has been registered on Trainspot, and (b) other publicly available declarations of AI usage policy. Currently, robots.txt is supported as such a declaration, even for sites that have not been registered on Trainspot by their owners.

The API supports policy lookup for all content types that can be registered on Trainspot: websites, YouTube channels, books, and GitHub repositories. It will be extended to include additional content types over time.

AI builders can see whether the owner of a given piece of content has declared it to be open for AI usage, blocked, or available for commercial licensing. Content with no set policy will be identified as such.

By using the Trainspot API, AI developers can ensure compliance with content rights policies efficiently and at scale. The streamlined design of the API allows for robust interaction with content listings, ensuring clarity and legal use of materials for AI training purposes.

Key Features

  • Rights Verification: Rights holders specify policies, which determine whether content is freely available or restricted. The API verifies these policies for AI builders, helping them understand whether they can use a specific piece of content.
  • Highly Scalable and Available: The API is designed to support thousands of simultaneous clients and offers a highly available infrastructure.
  • JSONL Payload Format: API requests are made in JSONL (application/jsonl) format with UTF-8 encoding, and all queries are structured as JSON objects.

Usage

Connecting to the API

  • Base URL: The API is accessed at https://api.trainspot.ai/policy.
  • POST Requests Only: All interactions with the API are handled through POST requests. Any other request methods will be rejected.
  • Headers: Optional request headers can be used to optimize performance:
    • TrainSpot-Default-Query-Type: Specifies a default type for the query. One of website, youtube, book, or resource.
    • TrainSpot-Result-Order: Specifies whether query results should be returned in the order they were sent or as soon as they are processed. If specified, the value must be one of ordered or unordered.
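The connection details above can be sketched in Python with the standard library. This is only an illustration: the helper name make_request is hypothetical, and it assumes the endpoint also accepts HTTP/1.1 (which is what urllib speaks), something this document does not confirm. The request is built but not sent.

```python
import urllib.request

API_URL = "https://api.trainspot.ai/policy"

def make_request(payload, default_type=None, result_order=None):
    """Build (but do not send) a POST request carrying a JSONL payload.

    make_request is a hypothetical helper, not part of any Trainspot SDK.
    """
    headers = {"Content-Type": "application/jsonl; charset=utf-8"}
    if default_type is not None:
        # Optional header: default type applied to queries without a t field.
        headers["TrainSpot-Default-Query-Type"] = default_type
    if result_order is not None:
        if result_order not in ("ordered", "unordered"):
            raise ValueError("result_order must be 'ordered' or 'unordered'")
        headers["TrainSpot-Result-Order"] = result_order
    return urllib.request.Request(
        API_URL, data=payload, headers=headers, method="POST"
    )

req = make_request(
    b'{"r":"cnn.com","t":"website","i":"123"}\n',
    default_type="website",
    result_order="unordered",
)

# Sending requires network access, so it is left commented out here:
# with urllib.request.urlopen(req, timeout=60) as resp:
#     body = resp.read()
```

Note that urllib normalizes the capitalization of header names when storing them; HTTP header names are case-insensitive, so this does not change the request's meaning.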

Request Format

  • Payload Format: Requests are expected in the form of JSONL (JSON Lines), where each line is a separate query.
  • Required Fields: Each query must include at least the r field (reference to the work).
  • Optional Fields: A query may include a t field (query type) and an i field (query identifier). For a book query, an a field (author) may also be provided when the r field specifies the title of the book.
  • Query Type Field Values: The t field can be one of the following values: website, youtube, book, or resource. If the type is not specified, the value of the TrainSpot-Default-Query-Type header will be used. If that header is not set, the type is treated as unspecified and falls back to the resource type. Queries without a specified type may take longer to process.
  • Query Identifier Field Values: The i field is an optional value provided by the caller. It can be used to track a query and match it to its reply. Specifying this field together with a TrainSpot-Result-Order header value of unordered allows the API to return replies in the order in which they are computed. Because some queries take longer to process than others, this is useful for receiving replies to other queries sooner and keeping a pipeline moving.

Example Query Structure:

{
  "r": "reference_to_work",
  "t": "query_type",
  "i": "unique_identifier"
}
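Since each query must occupy its own line in the JSONL payload, the pretty-printed structure above would be serialized compactly before sending. A minimal Python sketch follows; the helper name build_payload is hypothetical, the allowed field set is simply the four fields named in this document, and the trailing newline after the last query is an assumption.

```python
import json

# Field names mentioned in this document; the real API may accept others.
ALLOWED_FIELDS = {"r", "t", "i", "a"}

def build_payload(queries):
    """Serialize query dicts into a UTF-8 JSONL body, one object per line."""
    lines = []
    for q in queries:
        if "r" not in q:
            raise ValueError("each query needs an 'r' field")
        unknown = set(q) - ALLOWED_FIELDS
        if unknown:
            raise ValueError(f"unknown field(s): {sorted(unknown)}")
        # json.dumps escapes newlines inside strings, so each query
        # always stays on a single line.
        lines.append(json.dumps(q, ensure_ascii=False, separators=(",", ":")))
    return ("\n".join(lines) + "\n").encode("utf-8")

payload = build_payload([
    {"r": "cnn.com", "t": "website", "i": "1"},
    {"r": "Moby-Dick", "t": "book", "a": "Herman Melville", "i": "2"},
])
```

Validating field names on the client side is a design choice here: it surfaces mistakes locally instead of relying on the server to reject the query.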

API Responses

  • The API will return a single JSONL response with a reply JSON object for each query, indicating whether the resource is available for use.
  • The reply includes a p field with one of three possible values: BLOCKED, OPEN, or UNSPECIFIED.
  • If available, the API may also return an l field containing the fully qualified URL of the listing.
  • An i field is included in the reply to match the query identifier provided in the request. If no identifier was provided, this field will contain a server-generated identifier.

Example Response Structure:

{
  "i": "unique_identifier",
  "p": "OPEN",
  "l": "https://www.trainspot.ai/listing/11111111-2222-4444-8888-000000000000"
}
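A reply stream in this shape can be indexed by its i field so that replies are matched to queries regardless of arrival order. The sketch below is illustrative only: parse_replies is a hypothetical helper and the sample body is made up to match the fields described above. Matching by identifier is only meaningful when the caller supplied its own i values, since otherwise the identifier is generated by the server.

```python
import json

def parse_replies(body):
    """Parse a JSONL response body (bytes) into a dict keyed by the i field."""
    replies = {}
    # bytes.splitlines() splits only on ASCII line breaks, so non-ASCII
    # characters inside JSON strings cannot break a reply apart.
    for line in body.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        reply = json.loads(line)
        replies[reply["i"]] = reply
    return replies

# Made-up sample body for illustration.
sample = (
    b'{"i":"1","p":"OPEN","l":"https://www.trainspot.ai/listing/'
    b'11111111-2222-4444-8888-000000000000"}\n'
    b'{"i":"2","p":"UNSPECIFIED"}\n'
)

replies = parse_replies(sample)
for query_id, reply in replies.items():
    # The l field is optional, so fall back to a placeholder.
    print(query_id, reply["p"], reply.get("l", "(no listing)"))
```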

Query Example

POST /policy HTTP/2
Content-Type: application/jsonl; charset=utf-8

{"r": "cnn.com", "t": "website", "i": "123"}

Performance Considerations

  • Timeout: Each request session has a 60-second idle timeout, meaning the server will terminate the connection if no valid queries are received within that time.
  • Query Length: Individual queries can be up to 32,768 bytes in length.
  • Unordered Results: By using unordered results, it's possible to get replies for other queries while waiting for longer queries to complete. This enables a more efficient pipeline.
  • Query Type: Specifying the query type can help the API process the query more efficiently. If the type is not specified, the API will default to the resource type, which may take longer.
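Because an over-long query ends the session (see Error Handling), it can be worth checking the query length limit before sending. The following Python sketch is one way to do that; oversized is a hypothetical helper, and counting the newline toward the 32,768-byte limit is a conservative assumption, since the document does not say whether the line terminator is included.

```python
import json

MAX_QUERY_BYTES = 32_768  # per-query limit stated above

def oversized(queries):
    """Return the i value (or list index) of each query that is too long."""
    too_long = []
    for index, q in enumerate(queries):
        line = json.dumps(q, ensure_ascii=False).encode("utf-8")
        # +1 counts the newline; this is a conservative assumption.
        if len(line) + 1 > MAX_QUERY_BYTES:
            too_long.append(q.get("i", index))
    return too_long

batch = [
    {"r": "cnn.com", "t": "website", "i": "ok"},
    {"r": "x" * 40_000, "t": "resource", "i": "big"},
]
flagged = oversized(batch)
```

The length is measured on the UTF-8 encoded bytes rather than on characters, because the limit is expressed in bytes and non-ASCII characters take more than one byte each.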

Error Handling

  • HTTP 405 - Method Not Allowed: Non-POST requests will result in this error.
  • HTTP 415 - Unsupported Media Type: Requests not using application/jsonl; charset=utf-8 as the Content-Type will be rejected.
  • Session Termination: Queries that are too long, malformed, or contain invalid field names will result in the immediate termination of the session.
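Client-side handling of these cases might look like the Python sketch below. Both helpers are hypothetical. Treating any 2xx status as success is an assumption, as is the idea that a terminated session shows up as queries that never receive a reply; the document does not spell out either behavior.

```python
def classify_status(status):
    """Map an HTTP status code to a short description of what went wrong."""
    if status == 405:
        return "method not allowed: the API only accepts POST"
    if status == 415:
        return ("unsupported media type: send "
                "Content-Type: application/jsonl; charset=utf-8")
    if 200 <= status < 300:
        return "ok"  # assumption: success is reported with a 2xx status
    return "unexpected status"

def unanswered(sent_ids, replies):
    """Return the sent identifiers that have no matching reply.

    If the session was terminated early, the queries listed here are
    candidates for inspection and resending.
    """
    answered = {reply.get("i") for reply in replies}
    return [query_id for query_id in sent_ids if query_id not in answered]

missing = unanswered(
    ["1", "2", "3"],
    [{"i": "1", "p": "OPEN"}, {"i": "3", "p": "BLOCKED"}],
)
```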
