How it Works
The Problem
Trainspot is the first highly scalable marketplace for business-ready AI training data, and solves both:
The Opportunity
Eliminate friction for both developers and rights holders
The demand for high-quality, on-topic, and legally compliant training data is skyrocketing as AI becomes ubiquitous. Until now, access to high-quality commercial datasets has been restricted to multimillion-dollar deals between giant technology companies, brokered over months by armies of lawyers.
Trainspot makes it easy for product developers and researchers to search for and acquire data from the entire web, including specialized content and datasets, directly from their owners. As a result, developers and businesses instantly get the data they need, while creators and content owners prosper in the rapidly growing AI economy.
Solve the problem at internet scale
Individual creators and content owners, such as photographers, writers, YouTubers, and developers, are vulnerable to data theft and exploitation. On Trainspot, owners can sell, donate, or block the use of their data. This gives data and content owners the control they need to protect their rights and profit from their intellectual property.
Listing data on Trainspot is free.
We are a global community imagining and building the AI future where everyone is welcome. Join us.
FAQ: Content Owners
- What kind of content can be registered on Trainspot?
- When you register content on Trainspot, you create what’s called a listing. You can create a listing by registering content representing any form of human creativity or knowledge that can be expressed in digital form.
- You can register these types of listings:
- A website
- A YouTube Channel
- A book
- A GitHub repo
- A collection, which can include any of the above listing types
- For example, listing a YouTube channel starts with authenticating with the Google account associated with a channel, then going through a few steps to choose a policy:
- Commercial to make your listing immediately available for commercial licensing to AI developers
- Block to signal to AI developers that your content is not available for AI usage, or
- Open in the case where you are making your content available for AI use without limitation.
- After topics and descriptions are added, the listing is available via the Trainspot marketplace and the Trainspot registry for AI developers to query the listing’s policy via API.
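The three policies above can be modeled as a small decision rule. The following is a minimal sketch, not the Trainspot API: the names `Policy` and `ai_use_permitted` are hypothetical, and the rule assumes only what this page states (Open content is free for AI use, Commercial content requires a license, Block content is off limits).

```python
from enum import Enum


class Policy(Enum):
    """Hypothetical model of the three listing policies described above."""
    COMMERCIAL = "commercial"
    BLOCK = "block"
    OPEN = "open"


def ai_use_permitted(policy: Policy, has_active_license: bool = False) -> bool:
    """Return True if AI use is permitted under the given policy.

    Assumption: a Commercial listing permits AI use only while the
    developer holds an active license for it.
    """
    if policy is Policy.OPEN:
        return True
    if policy is Policy.COMMERCIAL:
        return has_active_license
    # Policy.BLOCK: the owner has signaled that AI use is prohibited.
    return False


print(ai_use_permitted(Policy.OPEN))
print(ai_use_permitted(Policy.COMMERCIAL, has_active_license=True))
print(ai_use_permitted(Policy.BLOCK, has_active_license=True))
```

A developer's pipeline could apply a check like this to each listing policy it looks up before adding content to a dataset.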
- Does Trainspot host my content?
- In most cases, a listing represents content or data already available on the internet, so that content or data is not hosted on Trainspot.
- However, there are cases where the listing represents content or data that is not fully available on the internet already, as in the case of a book, or a company’s private dataset.
- In these cases where the content or data being licensed is not available on the internet, Trainspot will host the necessary files and provide them to the licensee.
- How does Trainspot know that content can only be registered by its rightful owners?
- Trainspot uses a variety of techniques to determine if a given user has the rights to content that they’re registering.
- For websites, users are required to write a file to their website or make an addition to their domain settings to demonstrate that they have control over the website in question.
- For YouTube channels, we require that users authenticate with the Google account that holds their channel.
- For books, we require that users provide us with the author(s), title, and (optionally) ISBNs for each book they’re registering. Book listings are manually approved by Trainspot, and we ask that users register their author website (if they have one) and/or authenticate with their X/Twitter account to assist us with the process.
- For GitHub repositories, we require that users authenticate with GitHub and select a repository that they own, and that is not a fork.
- In all cases, we require that users legally confirm that they have the rights to register their content by agreeing to our Content Provider Agreement. In addition, we require that users listing commercial content sign up with Stripe in order to be paid, and in doing so verify their actual identity.
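The website check described above (writing a file to the site to show control of it) is a common pattern. The following is a minimal sketch of the idea, not Trainspot's actual implementation: the token format and the function names `new_verification_token` and `file_proves_control` are hypothetical, and fetching the file over HTTP is left out.

```python
import hmac
import secrets


def new_verification_token() -> str:
    """Generate an unguessable token for the owner to publish on their site."""
    return secrets.token_urlsafe(32)


def file_proves_control(expected_token: str, fetched_body: str) -> bool:
    """Compare the body fetched from the site against the issued token.

    Surrounding whitespace is ignored, and the comparison is constant-time
    so that it does not leak how much of the token matched.
    """
    return hmac.compare_digest(
        expected_token.encode("utf-8"),
        fetched_body.strip().encode("utf-8"),
    )


token = new_verification_token()
print(file_proves_control(token, token + "\n"))
print(file_proves_control(token, "some-other-value"))
```

Only someone who can write to the website can publish the issued token there, which is why a match is treated as evidence of control.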
- How do I get paid when my content is licensed?
- Trainspot uses Stripe for payments. Content owners who list content for commercial licensing need to register with Stripe in order to be paid. This is done within the Payment Settings screen on your account page.
- Once a listing is registered as commercial, and payment settings have been enabled for the account, Stripe will distribute payments to your bank account within 48 hours of a licensee’s purchase.
- In the case of a Buy Now price, this will be automatic. In the case of Make Offer, the transaction will be considered complete when either the buyer or seller agrees to the offer made by the other party.
- While it is not required in order to receive offers for your content, Trainspot suggests enabling Payment Settings so as to not introduce delays in the buying process, which could lead to lost sales. In addition, setting a Buy Now price requires that Payment Settings have been enabled for your account.
- What does Trainspot charge? How does it make money?
- Listing content on Trainspot is free. We charge a platform fee of 15% on transactions, which covers transaction fees and our own costs. If you sell a license for your content for $1,000, you will receive $850 in payment.
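The fee arithmetic above can be checked with a short calculation. This is a sketch under one assumption that this page does not specify: the fee is rounded to the nearest cent, with halves rounded up. The name `seller_payout` is illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

PLATFORM_FEE_RATE = Decimal("0.15")  # 15% platform fee, as stated above


def seller_payout(sale_price: str) -> Decimal:
    """Return the amount the content owner receives after the platform fee.

    Assumption: the fee is rounded to the nearest cent (half up).
    """
    price = Decimal(sale_price)
    fee = (price * PLATFORM_FEE_RATE).quantize(
        Decimal("0.01"), rounding=ROUND_HALF_UP
    )
    return price - fee


print(seller_payout("1000"))  # 850.00
```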
- What rights do licensees get?
- When a buyer licenses content on Trainspot, they are purchasing the right to use that content for AI use for one year.
- “AI use” includes model training, fine-tuning, and use as grounding data in conjunction with techniques such as RAG (retrieval-augmented generation). Licensees do not obtain the rights for non-AI uses: reproduction, redistribution and other rights do not constitute AI use.
- Content owners can choose to offer tiered pricing based on the number of end users who will be allowed to access the AI developer’s product(s) for the term of the license. We offer three tiers:
- Up to 10,000 users
- Up to 10,000,000 users
- Unlimited users
- Please read the Content License Agreement for the full set of rights granted.
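As an illustration of the three tiers above, a developer could pick the smallest tier that covers their expected end-user count. This is a minimal sketch: the `TIERS` table mirrors the list above, while the function name and the "smallest tier that fits" rule are assumptions.

```python
from typing import List, Optional, Tuple

# (tier label, maximum end users); None means no cap.
TIERS: List[Tuple[str, Optional[int]]] = [
    ("Up to 10,000 users", 10_000),
    ("Up to 10,000,000 users", 10_000_000),
    ("Unlimited users", None),
]


def smallest_tier_for(end_users: int) -> str:
    """Return the label of the smallest tier that covers `end_users`."""
    if end_users < 0:
        raise ValueError("end_users must be non-negative")
    for label, cap in TIERS:
        if cap is None or end_users <= cap:
            return label
    raise AssertionError("unreachable: the last tier has no cap")


print(smallest_tier_for(2_500))
print(smallest_tier_for(10_001))
print(smallest_tier_for(50_000_000))
```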
- What does it mean when I set my AI usage policy to “block”? Does Trainspot block unauthorized AI access?
- The block policy in Trainspot is either inferred from robots.txt on a website or directly expressed as a policy choice in Trainspot. The purpose of setting a block policy when registering a listing is to advertise, clearly and publicly, your intent to prohibit AI usage of your content.
- A block policy acts as a “no trespassing” sign that’s visible both to people searching the Trainspot Marketplace and to callers of the Trainspot API. AI developers looking for a convenient, centralized way to determine AI usage policies can use the Trainspot API for this purpose.
- When you register a website on Trainspot, we offer you an updated robots.txt that reflects your “block” policy, and AI crawlers that respect robots.txt will not scrape your website. However, not all AI crawlers can be blocked via robots.txt, and it should not be considered to be an airtight defense against AI crawlers. Companies such as Cloudflare offer a free service to block these AI scrapers more effectively via their proxy network.
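To illustrate how a crawler that respects robots.txt would interpret a block rule, the sketch below uses Python's standard `urllib.robotparser`. The robots.txt content and the crawler name `ExampleAIBot` are hypothetical. The actual file Trainspot generates is not shown on this page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one AI crawler, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under `robots_txt`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "ExampleAIBot", "https://example.com/post"))
print(is_allowed(ROBOTS_TXT, "OtherBot", "https://example.com/post"))
```

This check runs on the crawler's side, which is why robots.txt only stops crawlers that choose to honor it.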
FAQ: AI Developers
- How do I purchase a license on Trainspot?
- Purchasing a license on Trainspot is as easy as buying something on eBay or Amazon. You can pay with a credit card and other familiar payment methods.
- How does “make offer” work?
- Make Offer is a powerful feature of the Trainspot Marketplace. If a Buy Now price is not listed, or if you want to make an offer below the listed price, Make Offer is a notification-based mechanism to offer a price at which you would license the listing.
- Your offer may be accepted, at which point your purchase can be completed.
- However, it may also be rejected, countered, or potentially not acknowledged. Make Offer is designed to remove friction in the buying process, to give both buyers and sellers more price transparency, and to capture transactions that might not occur without this structured, direct buyer-to-seller price negotiation.
- What can I do with “open” content listed on Trainspot?
- “Open” listings on Trainspot are open and free for AI use, not necessarily for other purposes. “AI use” in Trainspot means that content can be used for:
- Model training
- Fine-tuning
- Grounding data, in conjunction with RAG or similar techniques
- An AI developer can use content in open listings for AI purposes, without limits on the number of users who can see them, and without a requirement of attribution.
- “AI use” does not include redistribution, copying, or other purposes beyond the typical modes in which data is used in AI products.
- There are two types of open listings available on Trainspot.
- First, there are open listings that Trainspot has listed, for content that is either public domain in the US or licensed under a Creative Commons CC0 Universal license or an equivalent “no rights reserved” license, which permits any use, without limitation, globally.
- Second, there are open listings registered by their owners on Trainspot. Rights holders can list their own content as open for AI use while retaining other rights if they so desire. In these cases, the listing policy must be consistent with the rights and terms of use granted on the original content — that is, the license or terms of use published with or accompanying the content itself must allow for AI usage as defined above, without a limitation on scale of use or a requirement of attribution.
- We hope that the combination of commercial, blocked, and open listings provides a one-stop shop for developers seeking the most comprehensive source of content and data for their needs.
- I’d like to license a large collection of data. Does Trainspot allow that?
- The Trainspot Marketplace is composed of individual listings that have been registered by the individual rights holders.
- This is very powerful when combined with the ability to create collections of listings. Collections enable very large sets of individually registered listings to be licensed with a single transaction.
- For example, a given author may create an individual listing for each of 10 books, but they may also group all 10 books into a single collection, which will also be available to license with one aggregate Buy Now transaction. In this case, the collection only contains individual listings owned by a single rights holder.
- A powerful feature of the Trainspot marketplace enables the Trainspot community to curate lists or saved searches as collections which may be public or private.
- A public collection may have a one-click Buy Now price if every listing in it has a Buy Now price (the collection price is the aggregate of the individual listing prices). Buyers can also use Make Offer to settle on a price for all or part of the collection.
- As the Trainspot marketplace grows, we believe AI model developers will be able to conveniently license very large-scale collections that have been authenticated and registered by a very large community of rights holders.
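The collection pricing rule described above can be sketched as follows. This is an illustration only: the function name is hypothetical, and representing a listing without a Buy Now price as `None` is an assumption.

```python
from decimal import Decimal
from typing import Optional, Sequence


def collection_buy_now_price(
    listing_prices: Sequence[Optional[Decimal]],
) -> Optional[Decimal]:
    """Return the aggregate Buy Now price for a collection.

    Each entry is a listing's Buy Now price, or None if the listing has
    none. The collection gets a Buy Now price only if every listing has
    one; otherwise the result is None and Make Offer would be used.
    """
    if not listing_prices or any(p is None for p in listing_prices):
        return None
    return sum(listing_prices, Decimal("0"))


print(collection_buy_now_price([Decimal("100"), Decimal("250.50")]))  # 350.50
print(collection_buy_now_price([Decimal("100"), None]))  # None
```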
- If I’ve trained a model during the term of my license, can I continue to use that model after my license expires?
- Content licensed for AI use on Trainspot can be used only during the period of time in which the license is active. Trained models or other derivative artifacts created using the licensed content can only be used while the license is active.
- I don’t see the content I need. Can Trainspot help me find it?
- If you can’t find the content you’re looking for, Trainspot is now developing a new feature that will allow you to request bids on the content you need. Content creators and rights holders will be able to respond to your request with content they already own, or content they can create to fit your requirements. We’ll have more to say when the feature is ready to be used.
- I know of some open content that should be listed on Trainspot. How can I get it listed?
- An Open listing on Trainspot represents content or data that is available via the open internet and offered by the rights holder without limitation under a license such as Creative Commons CC0 or a similar “no rights reserved” license.
- We offer Open listings as part of our mission to provide a clean, well-lighted place for developers to find any content, whether available commercially or for free. Members of the Trainspot community may suggest content for Open listings even if they are not the rights holder, so long as the content or data is available via the internet and meets the licensing requirements outlined above.