<?xml version="1.0"?>

<?rfc sortrefs="yes"?>
<?rfc subcompact="no"?>
<?rfc symrefs="yes"?>
<?rfc toc="yes"?>
<?rfc tocdepth="3"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>

<rfc
   xmlns:xi="http://www.w3.org/2001/XInclude"
   category="bcp"
   consensus="true"
   submissionType="IETF"
   docName="draft-spaghetti-sidrops-avoid-rpki-state-in-bgp-00"
   ipr="trust200902">
<!-- For consideration: updates="7115" -->
  
<front>

<!-- sources:
https://mailarchive.ietf.org/arch/msg/sidrops/YV1WfoxQNiwfOjtKIY1d6YJjRxM/
-->
    
  <title abbrev="Avoid RPKI State in BGP">Guidance to Avoid Carrying RPKI Validation States in Transitive BGP Path Attributes</title>

  <author fullname="Job Snijders" initials="J." surname="Snijders">
    <organization>Fastly</organization>
    <address>
      <postal>
        <street />
        <city>Amsterdam</city>
        <code />
        <country>Netherlands</country>
      </postal>
      <email>job@fastly.com</email>
    </address>
  </author>

  <author fullname="Tobias Fiebig" initials="T." surname="Fiebig">
    <organization abbrev="MPI-INF">Max-Planck-Institut fuer Informatik</organization>
    <address>
      <postal>
        <street>Campus E14</street>
        <city>Saarbruecken</city>
        <code>66123</code>
        <country>Germany</country>
      </postal>
      <phone>+49 681 9325 3527</phone>
      <email>tfiebig@mpi-inf.mpg.de</email>
    </address>
  </author>
  
  <author fullname="Massimiliano Stucchi" initials="M. A." surname="Stucchi">
    <organization>AS58280.net</organization>
    <address>
      <postal>
        <city>Bruettisellen</city>
        <country>Switzerland</country>
      </postal>
      <email>max@stucchi.ch</email>
    </address>
  </author>
  <date />

  <abstract>
    <t>
      This document provides guidance to avoid carrying Resource Public Key Infrastructure (RPKI) derived Validation States in Transitive Border Gateway Protocol (BGP) Path Attributes.
      Annotating routes with attributes signalling validation state may flood needless BGP UPDATE messages through the global Internet routing system, when, for example, Route Origin Authorizations are issued, revoked, or RPKI-To-Router sessions are terminated.
    </t>
    <t>
      Operators <bcp14>SHOULD</bcp14> ensure Validation States are not signalled in transitive BGP Path Attributes.
      Specifically, Operators <bcp14>SHOULD NOT</bcp14> group BGP routes by their Prefix Origin Validation state into distinct BGP Communities.
    </t>
  </abstract>
  
</front>

<middle>
  
  <section title="Introduction">

    <t>
      The Resource Public Key Infrastructure (RPKI) <xref target="RFC6480"/> allows for validating received routes, e.g., for their Route Origin Validation (ROV) state.
      Some operators and vendors suggest to use distinct BGP Communities <xref target="RFC1997"/> <xref target="RFC8092"/> to annotate received routes with their validations state.
      The claim is that this practice is useful, as validation state can be signalled, e.g., to iBGP speakers, without requirering each iBGP speaker to perform their own route origin validation.
    </t>
    <t>
      However, annotating a route with a transitive attribute means that a BGP update message has to be send to each neighbor if such an attribute changes.
      This means that when, for example, Route Origin Authorizations <xref target="RFC6482"/> are issued, revoked, or RPKI-To-Router <xref target="RFC8210"/> sessions are terminated, a BGP UPDATE message will be sent for a route that was previously annotated with a BGP Community.
      Furthermore, given that BGP Communities are a transitive attribute, this BGP UPDATE will have to propagate through the whole default free zone (DFZ).
    </t>
    <t>
      Hence, this document provides guidance to avoid carrying Resource Public Key Infrastructure (RPKI) <xref target="RFC6480"/> derived Validation States in Transitive Border Gateway Protocol (BGP) Path Attributes <xref target="RFC4271" section="5"/>.
      Specifically, Operators are <bcp14>SHOULD NOT</bcp14> group BGP routes by their Prefix Origin Validation state <xref target="RFC6811"/> into distinct BGP Communities <xref target="RFC1997"/> <xref target="RFC8092"/>.
      Not using BGP Communities to signal RPKI validation state prevent needless BGP UPDATE messages from being flooded through the global Internet routing system.
    </t>

    <section title="Requirements Language">

      <t>
        The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 <xref format="default" pageno="false" target="RFC2119"/> <xref format="default" pageno="false" target="RFC8174"/> when, and only when, they appear in all capitals, as shown here.
      </t>

    </section>

  </section>

  <section title="Scope" anchor="scope">
    <t>
      This document discusses signalling of RPKI validation state to BGP neighbors using transitive BGP attributes.
      At the time of writing, this pertains to the use of BGP Communities <xref target="RFC1997"/> <xref target="RFC8092"/> to signal RPKI ROV using ROAs.
      Note that this includes all operator specific BGP Communities to signal validation state, as well as any current or future documented well-known BGP Communities marking validation state, as, e.g., described for extended BGP Communities in <xref target="RFC8097"/>.
    <t>
    </t>
      However, beyond that, this document also applies to all current and future transitive BGP attributes that may be implicitly or explicitly used to signal validation state to neighbors.
      Similarly, it applies to all future validation mechanics of RPKI, e.g., ASPA <xref target="I-D.ietf-sidrops-aspa-profile"/> and any other future validation mechanic build upon the RPKI.
    </t>
  </section>

  <section title="Risks of Signaling Validation State With Transitive Attributes" anchor="signalling-risks">
    <t>
      This section outlines the risks of signalling RPKI Validation State using BGP Communities.
      While the current description is specific to BGP communities, the observations hold similar for all transitive attributes that may be added to a route.
      Furthermore, we will present data on the measured current impact of BGP Communities being used to signal RPKI Validation state.
    </t>
    <section title="Triggers for Large-Scale Validation Changes" anchor="outage">
      <t>
        Here, we describe examples for how a large amount of RPKI ROV changes may occur in a short time, leading to a large amount of BGP Updates being send.
      </t>
      <section title="ROA Issuance" anchor="outage-roa-issuance">
        <t>
          Large-Scale ROA issuance should be a comparatively rare event for individual networks.
          However, several cases exist where issuance by individual operators or (malicious) coordinated issuance of ROAs by multiple operators may lead to a high churn triggering a continuous flow of BGP Update messages caused by operators using transitive BGP attributes to signal RPKI validation state.
        </t>
        <t>
          Specifically:
        </t>
        <ul spacing="normal">
          <li>
            When one large operator newly starts issuing ROAs for their netblocks, possibly by issuing one ROA with a long maxLength covering a large number of prefixes.
            This may also occur when incorrectly migrating to minimally covering ROAs <xref target="RFC9319"/>, i.e., when the previous ROA is first revoked (see <xref target="outage-roa-revocation"/>) and the new ROAs are only issued after this revocation has been propagated, e.g., due to an operational error or bug in the issuance pipeline used by the operator.
          </li>
          <li>
            When multiple smaller operators coordinate to issue new ROAs at the same time.
          </li>
          <li>
            When a CA has been unavailable or unable to publish for some time, but then publishes all updates at once, or--as unlikely as it is--if a key-rollover encounters issues.
          </li>
        </ul>
      </section>
      <section title="ROA Revocation" anchor="outage-roa-revocation">
        <t>
          Large-Scale ROA revocation should be a comparatively rare event for individual networks.
          However, several cases exist where revocations by individual operators or (malicious) coordinated revocation of ROAs by multiple operators may lead to a high churn triggering a continuous flow of BGP Update messages caused by operators using transitive BGP attributes to signal RPKI validation state.
        </t>
        <t>
          Specifically:
        </t>
        <ul spacing="normal">
          <li>
            When one large operator revokes all ROAs for their netblocks at once, for example, when migrating to minimally covering ROAs <xref target="RFC9319"/>, or when revoking one ROA with a long maxLength covering a large number of prefixes.
          </li>
          <li>
            When multiple smaller operators coordinate to revoke ROAs at the same time.
          </li>
          <li>
            When a CA becomes unavailable or unable to publish for some time, e.g., due to the CA expiring (<xref target="CA-Outage1"/>, <xref target="CA-Outage2"/>, <xref target="CA-Outage3"/>, <xref target="CA-Outage4"/>).
          </li>
        </ul>
      </section>
      <section title="Validator Loss" anchor="outage-validator-loss">
        <t>
          Similar to the issuance/revocation of routes, the validation pipeline of an operator may encounter issues.
          For example, any of the following events may lead to RTR services used by an operator no longer providing validation state to routers, leading to routes changing from VALID to UNKNOWN:
        </t>
        <ul spacing="normal">
          <li>
            The RTR service may have to be taken offline due to local issues (<xref target="CVE-2021-3761"/>, <xref target="CVE-2021-41531"/>, <xref target="CVE-2021-43114"/>), or, even worse, a misconfiguration may lead to the service flapping, e.g., when the system runs out of memory after a few minutes of communicating validation state to routers.
          </li>
          <li>
            Validation state may seemingly lapse due to issues with time syncronization if, e.g., the clock of the validator diverts significantly, starting to consider CA's certificates invalid.
          </li>
          <li>
            Multiple operators use one central RTR service hosted by an external party, or depend on a similar validator, which becomes unavailable, e.g., due to maintenance or an outage, and local instances are not able to handle loss of this external service without changing validation state, i.e., do not serve from cache.
          </li>
        </ul>
      </section>
      <section title="Outage Scenario Summary" anchor="outage-summary">
        <t>
          The above non-exhaustive listing suggests that issues in general operations, CA operations, and RPKI cache implementations simply are unavoidable.
          Hence, Operators <bcp14>MUST</bcp14> plan and design accordingly.
        </t>
      </section>
    </section>

    <section title="Scaling issues">
      <t>
        Following an RPKI service affecting <xref target="outage">outage</xref>, and considering roughly half the global Internet routing table nowadays is covered by RPKI ROAs <xref target="NIST"/>, any Autonomous System in which the local routing policy sets a BGP Community based on the ROV-Valid validation state, would need to send BGP UPDATE messages for roughly half the global Internet routing table if the validation state changes to ROV-NotFound.
        The same, reversed case, would be true for every new ROA created by the address space holders, whereas a new BGP update would be generated, as the validation state would change to ROV-Valid.
      </t>
      <t>
        As the global Internet routing table currently contains close to 1,000,000 prefixes <xref target="CIDR Report"/>, such convergence events represent a significant burden.
        See <xref target="How-to-break"/> for an elaboration on this phenomenon.
      </t>
      <t>
        Furthermore, adding additional attributes to routes increases their size and memory consumption in the RIB of BGP routers.
        Given the continuous growth of the global routing table, operators should be--in general--conservative regarding the additional information they add to routes.
      </t>
    </section>

    <section title="Cascading of BGP UPDATES" anchor="cascade">
      <t>
        The aforementioned issues that may lead to changes in validation state for a large number of routes are not confined to singular UPDATE events.
        Instead, given that routers' view of the RPKI with RTR is only eventually consistent, update messages may cascade, i.e., one change in validation state may actually trigger multiple subsequent BGP UPDATE storms.
        If, for example, AS65536 is a downstream of AS65537 (both annotating validation state with BGP Communities), and a major CA fails, but AS65537 has their validator's cache updated before AS65536, AS65536 will first receive updates for all formerly valid routes learned from AS65537 when validation state changes there, and propagate these down its cone.
        Then, when the cache of AS65536 is updated as well, the community of AS65536 will again change for these routes, while also being propagated down the cone again.
      </t>
    </section>

    <section title="Observed data">
      <t>
        In February 2024, a data-gathering initiative <xref target="Side-Effect"/> reported that between 8% and 10% of BGP updates seen on the Routing Information Service - RIS, contained well-known communities from large ISPs signalling either ROV-NotFound or ROV-Valid BGP Validation states.
        The study also demonstrated that the creation or removal of a ROA object triggered a chain of updates in a period of circa 1 hour following the change.
      </t>
      <t>
        Such a high percentage of unneeded BGP updates constitutes a considerable level of noise, impacting the capacity of the global routing system while generating load on router CPUs and occupying more RAM than necessary.
        Keeping this information inside the realms of the single autonomous system would help reduce the burden on the rest of the global routing platform, reducing workload and noise.
      </t>
    </section>
    <section title="Lacking Value of Signaling Validation State">
      <t>
        RTR has been developped to communicate validation information to routers.
        BGP Attributes are not signed, and provide no assurance against third parties adding them, apart from BGP communities--ideally--being filtered at a networks edge.
        So, even in iBGP scenarios, their benefit in comparison to using RTR on all BGP speakers is limited.
      </t>
      <t>
        For eBGP, given they are not signed, they provide even less information to other parties except introspection into an ASes internal validation mechanics.
        Crucially, they provide no actionable information for BGP neighbors.
        If an AS validates and enforces based on RPKI, INVALID routes should never be imported and, hence, never be send to neighbors.
        Hence, the argument that adding validation state to communities enables, e.g., downstreams to filter RPKI INVALID routes is mute, as the only routes a downstream should see are UNKNOWN and VALID.
        Furthermore, in any case, the operators <bcp14>SHOULD</bcp14> run their own validation infrastructure and not rely on centralized services or attributes communicated by their neighbors.
        Everything else circumvents the purpose of RPKI.
      </t>
    </section>

  </section>

  <section title="Advantages of Dissociating Validation States and BGP Path Attributes">
    <t>
      As outlined in <xref target="signalling-risks"/>, signalling validation state with transitive attributes carries significant risks for the stability of the global routing ecosystem.
      Not signalling validation state, hence, has tangible benefits, specifically:
    </t>
    <ul spacing="normal">
      <li>
        Reduction of memory consumption on customer/peer facing PE routers (less BGP communities == less memory pressure).
      </li>
      <li>
        No effect on the age of a BGP route when a ROA or ASPA <xref target="I-D.ietf-sidrops-aspa-profile"/> is issued or revoked.
      </li>
      <li>
        Avoids having to resend, e.g., more than 500,000 BGP routes towards BGP neighbors (for the own cone to peers and upstreams, for the full table towards customers) if the RPKI cache crashes and RTR sessions are terminated, or if flaps in validation are caused by other events.
      </li>
    </ul>
    <t>
      Hence, operators <bcp14>SHOULD NOT</bcp14> signal RPKI validation state using transitive BGP attributes.
    </t>
  </section>

  <section title="Security Considerations">
    <t>
      The use of transitive attributes to signal RPKI validation state may enable attackers to cause notable route churn by issuing and widthdrawing, e.g., ROAs for their prefixes.
      DFZ routers may not be equiped to handle churn in all directions at global scale, especially if said churn cascades or repeats periodically.
    </t>
    <t>
      To prevent this, operators <bcp14>SHOULD NOT</bcp14> signal validation state to neighbors.
      Furthermore, validation state signaling <bcp14>SHOULD NOT</bcp14> be accepted from a neighbor AS.
      Instead, the validation state of a received announcement has only local scope due to issues such as scope of trust and RPKI synchrony.
    </t>
  </section>

  <section title="IANA Considerations">
    <t>
      None.
    </t>
  </section>

  <section title="Acknowledgements">

    <t>
      The authors would like to thank
      ...
      and
      ...
      for their helpful review of this document.
    </t>

  </section>

</middle>

<back>
  <references title="Normative References">
    <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
    <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
  </references>

  <references title="Informative References">
    <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.1997.xml"/>
    <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4271.xml"/>
    <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6480.xml"/>
    <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6482.xml"/>
    <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.6811.xml"/>
    <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8092.xml"/>
    <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8097.xml"/>
    <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8210.xml"/>
    <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.9319.xml"/>
    <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.ietf-sidrops-aspa-profile.xml"/>

    <reference anchor="How-to-break" target="https://cds.cern.ch/record/2805326">
      <front>
        <title>How to break the Internet: a talk about outages that never happened</title>
        <author fullname="Job Snijders"/>
        <date month="March" year="2022"/>
      </front>
      <refcontent>CERN Academic Training Lecture Regular Programme; 2021-2022</refcontent>
    </reference>

    <reference anchor="CVE-2021-3761" target="https://github.com/cloudflare/cfrpki/security/advisories/GHSA-c8xp-8mf3-62h9">
      <front>
        <title>OctoRPKI lacks contextual out-of-bounds check when validating RPKI ROA maxLength values</title>
        <author>
          <organization/>
        </author>
        <date month="September" year="2021"/>
      </front>
    </reference>

    <reference anchor="CVE-2021-41531" target="https://www.nlnetlabs.nl/downloads/routinator/CVE-2021-41531.txt">
      <front>
        <title>Routinator prior to 0.10.0 produces invalid RTR payload if an RPKI CA uses too large values in the max-length parameter in a ROA</title>
        <author>
          <organization>NLnet Labs</organization>
        </author>
        <date month="September" year="2021"/>
      </front>
    </reference>

    <reference anchor="CVE-2021-43114" target="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2021-43114">
      <front>
        <title>FORT Validator versions prior to 1.5.2 will crash if an RPKI CA publishes an X.509 EE certificate</title>
        <author>
          <organization>FORT project</organization>
        </author>
        <date month="November" year="2021"/>
      </front>
    </reference>

    <reference anchor="CA-Outage1" target="https://www.arin.net/announcements/20200813/">
      <front>
        <title>RPKI Service Notice Update</title>
        <author>
          <organization>ARIN</organization>
        </author>
        <date month="August" year="2020"/>
      </front>
    </reference>

    <reference anchor="CA-Outage2" target="https://www.ripe.net/ripe/mail/archives/routing-wg/2021-April/004314.html">
      <front>
        <title>Issue affecting rsync RPKI repository fetching</title>
        <author>
          <organization>RIPE NCC</organization>
        </author>
        <date month="April" year="2021"/>
      </front>
    </reference>

    <reference anchor="CA-Outage3" target="https://mail.lacnic.net/pipermail/lacnog/2023-April/009471.html">
      <front>
        <title>problemas con el TA de RPKI de LACNIC</title>
        <author fullname="Job Snijders"/>
        <date month="April" year="2023"/>
      </front>
    </reference>

    <reference anchor="CA-Outage4" target="https://lists.afrinic.net/pipermail/dbwg/2023-November/000493.html">
      <front>
        <title>AFRINIC RPKI VRP graph for November 2023 - heavy fluctuations affecting 2 members</title>
        <author fullname="Job Snijders"/>
        <date month="November" year="2023"/>
      </front>
    </reference>

    <reference anchor="NIST" target="https://rpki-monitor.antd.nist.gov/">
      <front>
        <title>NIST RPKI Monitor</title>
        <author>
          <organization>NIST</organization>
        </author>
        <date month="January" year="2024"/>
      </front>
    </reference>

    <reference anchor="Side-Effect" target="https://labs.ripe.net/author/stucchimax/a-bgp-side-effect-of-rpki/">
      <front>
        <title>A BGP Side Effect of RPKI</title>
        <author fullname="Massimiliano Stucchi"/>
        <date month="February" year="2024"/>
      </front>
    </reference>

    <reference anchor="CIDR Report" target="https://www.cidr-report.org/as2.0/">
      <front>
        <title>CIDR REPORT</title>
        <author fullname="Geoff Huston"/>
        <date month="January" year="2024"/>
      </front>
    </reference>

  </references>

</back>

</rfc>
