Vocabulary For Expressing AI Substitutive Usage

Rationale

Existing mechanisms for expressing preferences, including those under consideration by the AI Preferences WG, do not address a concern that has been strongly articulated: the practice of using digital assets as input to AI models to generate outputs that substitute for, or undermine the value of, the original assets. This gap leaves a broad group of stakeholders (including creators, journalists and publishers) without a means to express a preference regarding a type of use that is already having a material adverse impact on their rights. Developers and deployers of AI systems are likewise left without a clear, standardized preference signal for such uses, which results in "blunt" approaches to gathering such content that expose them to legal risk. This proposal defines a tailored preference category to address this specific need, improve visibility across the board and support continued broad access to information and content.

The use of digital assets for inference "in real time" is widespread as a means of improving the accuracy and contextual relevance of outputs, for example through techniques such as Retrieval-Augmented Generation (RAG). The flip side of that value is that such outputs tend to substitute for or dilute the value of the original asset, which decreases user engagement with it. This harms revenue opportunities and undermines the ability of the owner or distributor of the original asset to connect directly with their intended audience. For example, the use of journalistic material to create AI-generated summaries has resulted in a substantial reduction of internet traffic to online publications. In the longer term, this jeopardises the sustainability of those enterprises and the underlying incentives to create and publish such material.

To mitigate this, some are moving content behind paywalls and deploying other means of limiting open access, which diminishes access to information and content. Should incentives to create diminish, AI innovation will also suffer, as less high-quality content will be available on which to build a distribution funnel. This would also undermine the sustainability and verifiability of news and information services relied upon by the public and government institutions. Where the AI model or platform takes on the role of information gatekeeper and shaper, connections between the public and original sources can be severed (or warped). This undermines the ability and willingness of internet users to verify that what they are reading, hearing or watching matches the original source(s), allowing factual misrepresentations to propagate unchecked.

Creators have also justifiably expressed the need for a preference that addresses the use of their assets to create derivative works "in the style of" the original assets. Creators are harmed by the unfettered use of their works as inputs to AI models to create outputs that dilute the market for their works by adopting distinctive elements and styles established by the creators themselves. This also harms their moral rights and their interests in protecting the integrity of their works and ensuring attribution.

Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 (RFC 2119, RFC 8174) when, and only when, they appear in all capitals, as shown here. For the purposes of this document, the following terms are used:

Post-training (inference-time): Uses of an AI/ML model that occur after the model has been trained and frozen, typically when generating outputs in response to inputs at runtime.
Retrieval-Augmented Generation (RAG): A technique where external content is retrieved at query time and supplied to a model to condition the generated output. This document references RAG as a common mechanism by which substitutive outputs may be produced.
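To make the RAG pattern concrete, the following Python sketch shows where a deployer could honour an AI Substitutive Use preference: at retrieval time, before any asset text is supplied to the model. It is a minimal illustration under stated assumptions, not a normative implementation. The Asset record, its substitutive_ok flag, the unstated_allowed policy switch and the word-overlap retriever are all hypothetical, and this document does not specify how the preference is attached to or read from an asset.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Asset:
    url: str
    text: str
    # Hypothetical stand-in for the owner's AI Substitutive Use preference:
    # True = allowed, False = disallowed, None = no preference stated.
    substitutive_ok: Optional[bool]


def permitted(asset: Asset, unstated_allowed: bool) -> bool:
    """Decide whether an asset may be supplied to the model at all."""
    if asset.substitutive_ok is None:
        return unstated_allowed  # deployer policy for unstated preferences (assumption)
    return asset.substitutive_ok


def retrieve(query: str, corpus: list[Asset], k: int,
             unstated_allowed: bool) -> list[Asset]:
    """Toy retriever: rank permitted assets by words shared with the query."""
    words = set(query.lower().split())
    scored = []
    for asset in corpus:
        if not permitted(asset, unstated_allowed):
            continue  # filter before ranking so the text never reaches the prompt
        score = len(words & set(asset.text.lower().split()))
        if score > 0:
            scored.append((score, asset))
    # Sort on the score only; the sort is stable, so ties keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [asset for _, asset in scored[:k]]


def build_prompt(query: str, context: list[Asset]) -> str:
    """The 'augmented' step: condition the model on the retrieved text."""
    blocks = "\n\n".join(f"[{a.url}]\n{a.text}" for a in context)
    return f"Context:\n{blocks}\n\nQuestion: {query}"


if __name__ == "__main__":
    corpus = [
        Asset("https://news.example/a", "central bank raises interest rates", False),
        Asset("https://blog.example/b", "why interest rates matter", True),
        Asset("https://wiki.example/c", "history of interest rates", None),
    ]
    hits = retrieve("interest rates today", corpus, k=3, unstated_allowed=False)
    print([a.url for a in hits])  # prints ['https://blog.example/b']
    print(build_prompt("interest rates today", hits))
```

Filtering before ranking, rather than after, means a disallowed asset can never displace a permitted one from the top k, and its text never reaches the prompt from which the substitutive output would be generated.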

Vocabulary Definition

AI Substitutive Use Category (New)

The act of using one or more assets as input to a trained AI/ML model (as opposed to training the model) in a way that results in an output which incorporates, summarizes, aggregates or reproduces the assets, including their stylistic elements; provided, however, that this category does not cover the use of a lawfully acquired digital asset where that use is carried out directly by an end user (as opposed to a search application or bot) as input to a trained model to create a summary of that digital asset.

The use of assets for AI Substitutive Use is a proper subset of Automated Processing usage. This category is distinct from AI Training or Generative AI Training, as it addresses uses that occur after a model has been trained, during inference. It is also distinct from Search, which covers uses that direct users back to the original asset. AI Substitutive Use, by contrast, describes outputs that replace, reduce the utility of, or make the source asset redundant to users by summarizing, reproducing or restyling its contents. Consistent with that objective, this category does not apply where end users summarize digital assets they have already acquired, outside the context of search or of retrieving such assets from online locations in summarized form.
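The definition above can be read as a small decision procedure followed by a lookup in the category hierarchy. The Python sketch below is one hypothetical reading, not part of the vocabulary: the Output labels, the Use fields, the category keys and the resolve function are illustrative names, and the inheritance rule (the most specific stated preference wins, otherwise the nearest ancestor's applies) is an assumption rather than something this document specifies. Only the parent link from AI Substitutive Use to Automated Processing comes from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Output(Enum):
    """What the model output does with the asset (hypothetical labels)."""
    LINK_TO_SOURCE = "link"      # search-style: directs the user back to the asset
    SUMMARY = "summary"
    AGGREGATION = "aggregation"
    REPRODUCTION = "reproduction"
    STYLISTIC = "stylistic"      # adopts the asset's stylistic elements


@dataclass(frozen=True)
class Use:
    """Hypothetical description of one inference-time use of an asset."""
    output: Output
    by_end_user: bool        # directly by an end user, not a search app or bot
    lawfully_acquired: bool


def is_substitutive(use: Use) -> bool:
    """Apply the AI Substitutive Use definition to a single use.

    Training uses are out of scope by construction: Use only models
    inputs to an already-trained model.
    """
    if use.output is Output.LINK_TO_SOURCE:
        return False  # falls under Search, not this category
    if use.by_end_user and use.lawfully_acquired and use.output is Output.SUMMARY:
        return False  # the end-user summary carve-out
    return True


# Only the AI Substitutive Use -> Automated Processing link is stated in this
# document; the category keys themselves are hypothetical identifiers.
PARENT: dict[str, Optional[str]] = {
    "ai-substitutive-use": "automated-processing",
    "automated-processing": None,
}


def resolve(prefs: dict[str, bool], category: str) -> Optional[bool]:
    """Assumed rule: the most specific stated preference wins; otherwise
    inherit from the nearest ancestor; None if nothing is stated."""
    node: Optional[str] = category
    while node is not None:
        if node in prefs:
            return prefs[node]
        node = PARENT[node]
    return None
```

Under this reading, a deployer would call is_substitutive first and consult resolve only when it returns True; a stated "no" for Automated Processing would then also cover AI Substitutive Use unless a more specific preference overrides it. Note that the carve-out applies only to summaries: an end user reproducing or restyling an asset still falls within the category.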

IANA Considerations

This document has no IANA actions.