<?xml version="1.0" encoding="UTF-8"?>
<rfc category="exp" consensus="true" docName="draft-silver-aipref-vocab-substitutive-00" ipr="trust200902" sortRefs="true" submissionType="IETF" symRefs="true" tocInclude="true" version="3" xmlns:xi="http://www.w3.org/2001/XInclude">
  <front>
    <title abbrev="aipref-autoctl">Vocabulary For Expressing AI Substitutive Usage</title>
    <seriesInfo name="Internet-Draft" value="draft-silver-aipref-vocab-substitutive-00"/>
    <author fullname="Bradley Silver">
      <organization>Advance</organization>
      <address>
        <postal>
          <country>United States of America</country>
        </postal>
        <email>bsilver@advance.com</email>
      </address>
    </author>
    <date year="2025" month="September" day="8"/>
    <area>Applications</area>
    <workgroup>AI Preferences</workgroup>
    <keyword>AI Preferences</keyword>
    <keyword>Opt-Out</keyword>
    <keyword>Vocabulary</keyword>
    <abstract>
      <t>
        This Internet-Draft proposes a category named "AI Substitutive Use" that would enable parties to express a preference
        regarding how digital assets are used by automated processing systems. It focuses on post-training (inference-time) uses that are likely to
        result in AI-generated outputs that substitute for the original asset.
        The category is proposed to nest within the larger category of Automated Processing,
        currently envisaged in the working group draft <xref target="AIPREF-VOCAB"/> (21 July 2025).
      </t>
    </abstract>
    <note removeInRFC="true">
      <name>About This Document</name>
      <t>The latest revision of this draft, along with status information for this document, can be found at <eref target="https://datatracker.ietf.org/doc/draft-silver-aipref-vocab-substitutive/"/>.</t>
      <t>Discussion of this document takes place on the
      AI Preferences Working Group mailing list (<eref target="mailto:ai-control@ietf.org"/>),
      which is archived at <eref target="https://mailarchive.ietf.org/arch/browse/ai-control/"/>.
      Subscribe at <eref target="https://www.ietf.org/mailman/listinfo/ai-control/"/>.</t>
    </note>
  </front>

  <middle>
    <section anchor="rationale">
      <name>Rationale</name>
      <t>
        Existing mechanisms for expressing preferences, including those under consideration by the AI Preferences WG, do not address
        concerns that have been strongly articulated about the practice of using digital assets as input to AI models to generate
        outputs that substitute for or undermine the value of the original assets.  This gap leaves a broad group of stakeholders
        (including creators, journalists and publishers) without a means to express a preference regarding a type of use that is
        already having a material adverse impact on their rights.  Developers and deployers of AI systems are likewise left without a
        clear, standardized preference signal regarding such uses, which leads to "blunt" approaches to gathering such content
        and exposes them to legal risk.  This proposal defines a tailored preference category to address this specific need,
        improve visibility for all parties, and support continued broad access to information and content.
      </t>
      <t>
        The use of digital assets for inferencing "in real time" is widespread as a means of improving the accuracy and contextual
        relevance of outputs, for example through techniques such as Retrieval-Augmented Generation (RAG) <xref target="RAG2020"/>. The flip side of
        that value is that such outputs tend to substitute for or dilute the value of the original asset, which decreases
        user engagement with the original asset.  This harms revenue opportunities and undermines the ability of the owner or
        distributor of the original asset to connect directly with their intended audience. For example, the use of journalistic
        material to create AI-generated summaries has resulted in a substantial reduction of internet traffic to online
        publications. In the longer term, this jeopardizes the sustainability of those enterprises and the underlying incentives
        to create and publish such material. To mitigate this, some are moving content behind paywalls and deploying other means of
        limiting open access, which diminishes access to information and content.
      </t>
      <t>
        Should incentives to create diminish, AI innovation will also suffer, because less high-quality content will be available on which to build
        a distribution funnel.  This would also undermine the sustainability and verifiability of news and information services
        relied upon by the public and government institutions.  Where the AI model or platform takes on the role of information
        gatekeeper and shaper, connections between the public and original sources can be severed (or warped).  This undermines
        the ability and willingness of internet users to verify that what they are reading, hearing or watching matches the original
        source(s), allowing factual misrepresentations to propagate and go unchecked.
      </t>
      <t>
        Creators have also justifiably expressed the need for a preference that addresses the use of their assets to create
        derivative works "in the style of" such original assets.  Creators are harmed by the unfettered use of their works as
        inputs to AI models to create outputs that dilute the market for their works by adopting distinctive elements and styles
        established by the creators themselves.  Such use also harms their moral rights and their interests in protecting the integrity of
        their works and ensuring attribution.
      </t>


    </section>

    <section anchor="conventions-and-definitions">
      <name>Conventions and Definitions</name>
      <t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL
      NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
      "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
      described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they
      appear in all capitals, as shown here.</t>
      <t>
        For the purposes of this document, the following terms are used:
      </t>
      <ul spacing="normal">
        <li>
          <t>
            <strong>Post-training (inference-time)</strong>: Uses of an AI/ML model that occur after the model has been trained and frozen, typically
            when generating outputs in response to inputs at runtime.
          </t>
        </li>
        <li>
          <t>
            <strong>Retrieval-Augmented Generation (RAG)</strong>: A technique where external content is retrieved at query time and supplied to a
            model to condition the generated output. This document references RAG as a common mechanism by which substitutive outputs may be produced
            <xref target="RAG2020"/>.
          </t>
        </li>
      </ul>
    </section>

    <section anchor="vocabulary-definition">
      <name>Vocabulary Definition</name>
     
      <section anchor="ai-substitutive-use">
        <name>AI Substitutive Use Category (New)</name>
        <t>
          The act of using one or more assets as input to a trained AI/ML model (as opposed to the training of the model), where that use results
          in an output that incorporates, summarizes, aggregates or reproduces the assets, including stylistic elements thereof;
          provided, however, that this category does not cover the use of a lawfully acquired digital asset as input to a trained model
          to create a summary of that asset, where the use is carried out directly by an end user (as opposed to a search application or bot).
        </t>
        <t>
          The use of assets for AI Substitutive Use is a proper subset of Automated Processing usage <xref target="AIPREF-VOCAB"/>.
        </t>
        <t>
          This category is distinct from AI Training and Generative AI Training, as it addresses uses that occur after a model has been trained,
          during inference. It is also distinct from Search, which covers uses that direct users back to the original asset. Substitutive Use,
          by contrast, describes outputs that replace the source asset, reduce its utility, or make it redundant to users by summarizing, reproducing,
          or restyling its contents.
        </t>
        <t>
          Consistent with that objective, this category would not apply where end users are summarizing digital assets that they have already acquired,
          outside the context of search or of retrieving such assets from online locations in summarized form.
        </t>
      </section>
      
    </section>

    <section anchor="iana-considerations">
      <name>IANA Considerations</name>
      <t>This document has no IANA actions.</t>
    </section>
  </middle>

  <back>
    <references anchor="normative">
      <name>Normative References</name>
      <reference anchor="RFC2119">
        <front>
          <title>Key words for use in RFCs to Indicate Requirement Levels</title>
          <author initials="S." surname="Bradner" fullname="Scott Bradner">
            <organization>Harvard University</organization>
          </author>
          <date year="1997" month="March" />
        </front>
        <seriesInfo name="BCP" value="14" />
        <seriesInfo name="RFC" value="2119" />
        <seriesInfo name="DOI" value="10.17487/RFC2119" />
      </reference>
      <reference anchor="RFC8174">
        <front>
          <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
          <author initials="B." surname="Leiba" fullname="Barry Leiba">
            <organization>Huawei Technologies</organization>
          </author>
          <date year="2017" month="May" />
        </front>
        <seriesInfo name="BCP" value="14" />
        <seriesInfo name="RFC" value="8174" />
        <seriesInfo name="DOI" value="10.17487/RFC8174" />
      </reference>
      <reference anchor="RFC9309" target="https://www.rfc-editor.org/info/rfc9309" quoteTitle="true" derivedAnchor="RFC9309">
        <front>
          <title>Robots Exclusion Protocol</title>
          <author initials="M." surname="Koster" fullname="Martijn Koster"/>
          <author initials="G." surname="Illyes" fullname="Gary Illyes"/>
          <author initials="H." surname="Zeller" fullname="Henner Zeller"/>
          <author initials="L." surname="Sassman" fullname="Lizzi Sassman"/>
          <date year="2022" month="September" />
        </front>
        <seriesInfo name="RFC" value="9309" />
        <seriesInfo name="DOI" value="10.17487/RFC9309" />
      </reference>
    </references>
    <references anchor="informative">
      <name>Informative References</name>
      <reference anchor="RAG2020" target="https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html">
        <front>
          <title>Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks</title>
          <author>
            <organization>Facebook AI Research</organization>
          </author>
          <author>
            <organization>University College London</organization>
          </author>
          <author>
            <organization>New York University</organization>
          </author>
          <date year="2020"/>
        </front>
        <seriesInfo name="NeurIPS" value="2020"/>
        <format type="HTML" target="https://proceedings.neurips.cc/paper/2020/hash/6b493230205f780e1bc26945df7481e5-Abstract.html"/>
      </reference>
      <reference anchor="AIPREF-VOCAB" target="https://datatracker.ietf.org/doc/draft-ietf-aipref-vocab/">
        <front>
          <title>AI Preferences Vocabulary</title>
          <author>
            <organization>IETF AI Preferences Working Group</organization>
          </author>
          <date year="2025" month="July" day="21"/>
        </front>
        <seriesInfo name="Internet-Draft" value="draft-ietf-aipref-vocab" />
      </reference>
    </references>
  </back>
</rfc>