What is /robots.txt?

The robots.txt standard, introduced in 1994, is one of the web's longest-running success stories in consent-based access control. It allows website owners to manage how their content is accessed by web crawlers through a simple text file that specifies which parts of the site are off-limits. This empowers both individuals and businesses to protect sensitive data, manage server load, and shape their online presence.
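For example, a minimal robots.txt file, served from the site root, tells every crawler to stay out of one directory while leaving the rest of the site open (the path here is purely illustrative):

    # Served from https://example.com/robots.txt
    User-agent: *        # applies to every crawler
    Disallow: /private/  # keep crawlers out of /private/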

Over time, this technical convention (formally, the Robots Exclusion Protocol) has acquired real legal weight, with courts treating it as evidence of consent boundaries in automated data collection. That principle of consent resonates deeply with DTOM.

A Powerful Precedent

Although the protocol itself is voluntary, courts have repeatedly treated robots.txt and similar access controls as meaningful expressions of a site owner's consent boundaries. Two examples are:

  • (2017–2022) hiQ Labs, Inc. v. LinkedIn Corp. The Ninth Circuit initially held that scraping publicly accessible data likely does not violate the Computer Fraud and Abuse Act, but the litigation ultimately ended with a finding that hiQ's scraping breached LinkedIn's User Agreement, affirming that platforms can set and enforce the terms under which automated access occurs.

  • (2007) Perfect 10, Inc. v. Amazon.com, Inc. (with Google as a co-defendant). In this dispute over Google's image search, the courts recognized the role of crawler-control mechanisms such as robots.txt in managing how copyrighted material surfaces in search engines; the related Field v. Google decision (2006) went further, treating a site owner's choice not to deploy robots.txt as implied consent to caching.

In the past few years, robots.txt has also been used specifically to opt out of AI training. Most major AI companies, including OpenAI, Google, Apple, and Anthropic, publish the user-agent tokens their crawlers honor and state that they respect robots.txt, as sketched below.
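As an illustration, the following entries block several documented AI-training crawlers site-wide. The user-agent tokens shown (GPTBot, Google-Extended, Applebot-Extended, ClaudeBot) are the ones these vendors have published as of this writing; token names change, so verify against each vendor's current documentation before deploying:

    # robots.txt: opt out of AI training across the whole site
    User-agent: GPTBot             # OpenAI
    Disallow: /

    User-agent: Google-Extended    # Google (AI training control)
    Disallow: /

    User-agent: Applebot-Extended  # Apple (AI training control)
    Disallow: /

    User-agent: ClaudeBot          # Anthropic
    Disallow: /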

DTOM: Built for Content

robots.txt was built for websites in an era when web search engines were just getting started, and it primarily gives site operators finer control over how their pages are crawled and indexed. For AI training, robots.txt is therefore only useful at the website level: if your content appears on a site without a suitable robots.txt configuration, it has no protection there.

Building on the principles of robots.txt, the DTOM Declaration is instead applied at the content level. This places control directly in the hands of the content's creators: wherever the content is published, the DTOM Declaration travels with it. To achieve this, DTOM uses a mix of watermarking techniques and robust metadata tagging to create a universally recognizable “opt-out” signal.
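To make the metadata half of that idea concrete, here is a minimal sketch of content-level tagging using PNG text chunks via Python's Pillow library. The dtom:opt-out key and its value are purely hypothetical placeholders, not DTOM's actual schema, and the watermarking component is not shown:

    from PIL import Image
    from PIL.PngImagePlugin import PngInfo

    # Build a text chunk carrying a (hypothetical) opt-out declaration.
    meta = PngInfo()
    meta.add_text("dtom:opt-out", "train=false")  # placeholder key and value

    # Re-save the image with the declaration embedded, so the tag
    # travels with the file wherever it is republished.
    img = Image.open("artwork.png")
    img.save("artwork-tagged.png", pnginfo=meta)

Plain text chunks like this are easy to strip during re-encoding, which is presumably why DTOM pairs metadata tagging with watermarking rather than relying on either alone.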