<?xml version="1.0" encoding="US-ASCII"?>
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-ietf-idr-link-bandwidth-14"
     ipr="trust200902">
  <front>
    <title abbrev="BGP Link Bandwidth Extended Community">BGP Link Bandwidth
    Extended Community</title>

    <author fullname="Pradosh Mohapatra" initials="P." surname="Mohapatra">
      <organization>Sproute Networks</organization>

      <address>
        <email>pradosh@sproute.com</email>
      </address>
    </author>

    <author fullname="Reshma Das" initials="R." role="editor" surname="Das">
      <organization>Juniper Networks, Inc.</organization>

      <address>
        <postal>
          <street>1133 Innovation Way,</street>

          <city>Sunnyvale</city>

          <region>CA</region>

          <code>94089</code>

          <country>US</country>
        </postal>

        <email>dreshma@juniper.net</email>
      </address>
    </author>

    <author fullname="Satya Mohanty" initials="S." role="editor"
            surname="Mohanty">
      <organization>Zscaler</organization>

      <address>
        <postal>
          <street>120 Holger Way,</street>

          <city>San Jose</city>

          <region>CA</region>

          <code>95134</code>

          <country>US</country>
        </postal>

        <email>smohanty@zscaler.com</email>
      </address>
    </author>

    <author fullname="Serge Krier" initials="S." surname="Krier">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street>Pegasus Parc, De Kleetlaan 6a</street>

          <country>Belgium</country>
        </postal>

        <email>sekrier@cisco.com</email>
      </address>
    </author>

    <author fullname="Rafal Jan Szarecki" initials="R.J." surname="Szarecki">
      <organization>Google LLC</organization>

      <address>
        <postal>
          <street>1160 N Mathilda Ave,</street>

          <city>Sunnyvale</city>

          <region>CA</region>

          <code>94089</code>

          <country>US</country>
        </postal>

        <email>rszarecki@gmail.com</email>
      </address>
    </author>

    <author fullname="Akshay Gattani" initials="A." surname="Gattani">
      <organization>Arista Networks</organization>

      <address>
        <postal>
          <street>5453 Great America Parkway</street>

          <city>Santa Clara</city>

          <region>CA</region>

          <code>95054</code>

          <country>US</country>
        </postal>

        <email>akshay@arista.com</email>
      </address>
    </author>

    <date day="01" month="08" year="2025"/>

    <abstract>
      <t>This document describes an application of BGP extended communities
      that allows a router to perform WECMP (Weighted Equal-Cost
      Multipath).</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119"/>.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>Load balancing is a critical aspect of network design, enabling
      efficient utilization of available bandwidth and improving overall
      network performance. Traditional equal-cost multi-path (ECMP) routing
      does not account for the varying capacities of different paths. This
      document suggests that the external link bandwidth be carried in the
      network using one of two new extended communities <xref
      target="RFC4360"/>: the transitive and the non-transitive Link Bandwidth
      Extended Community. The Link Bandwidth Extended Community provides a
      mechanism for routers to advertise the bandwidth of their downstream
      path(s), facilitating maximum utilization of network resources.</t>
    </section>

    <section title="Link Bandwidth Extended Community">
      <t>The Link Bandwidth Extended Community is defined as a BGP extended
      community that carries the bandwidth information of a router,
      represented by the BGP Protocol Next Hop, connecting to a remote
      network. This community can be used to inform other routers about the
      available bandwidth through a given route.</t>

      <t>The Link Bandwidth Extended Community can be either transitive or
      non-transitive. Therefore, the value of the high-order octet of the
      extended Type Field can be 0x00 or 0x40, respectively. The value of the
      low-order octet of the extended Type Field for these communities is
      0x04. The value of the Global Administrator subfield in the Value Field
      SHOULD represent the Autonomous System of the router that attaches the
      Link Bandwidth Extended Community, but it can be set to any 2-byte
      value. If the Autonomous System number cannot be represented in two
      octets, as enabled by <xref target="RFC6793"/>, AS_TRANS should be used
      in the Global Administrator subfield. The encoding of a 4-octet ASN is
      out of scope for this document. The bandwidth of the link is expressed
      as 4 octets in <xref target="IEEE.754-2019"/> floating point format, in
      units of bytes (not bits) per second. It is carried in the Local
      Administrator subfield of the Value Field.</t>

      <figure anchor="LBWExtCom" suppress-title="false"
              title="Link Bandwidth Extended Community">
        <artwork align="left" xml:space="preserve">
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Type=0x00/0x40 | SubType= 0x04 |       AS Number               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Link Bandwidth Value                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 Type:   1-octet field; MUST be set to 0x00 or 0x40
         to indicate transitive/non-transitive.

 SubType: 1-octet field; MUST be set to 0x04
          to indicate 'Link-Bandwidth'.

 Global Administrator sub-field:
          2-octet field representing the Autonomous System.

 Local Administrator sub-field:
          Bandwidth value (bytes per sec) encoded as 4 octets
          in IEEE floating point format.
</artwork>
      </figure>
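
      <t>As a non-normative illustration of the layout above, the following
      Python sketch packs and unpacks the 8-octet community. The function and
      constant names are hypothetical and chosen only for this example. The
      sketch assumes network byte order and the IEEE 754 binary32 format for
      the bandwidth value, and it uses 23456 for AS_TRANS as assigned by <xref
      target="RFC6793"/>. Substituting AS_TRANS automatically for any ASN
      larger than two octets is a choice of this sketch.</t>

      <figure>
        <artwork align="left" xml:space="preserve"><![CDATA[
```python
import struct

AS_TRANS = 23456               # RFC 6793 stand-in for 4-octet ASNs
SUBTYPE_LINK_BANDWIDTH = 0x04
TYPE_TRANSITIVE = 0x00
TYPE_NON_TRANSITIVE = 0x40


def encode_link_bandwidth(asn, bytes_per_sec, transitive=True):
    """Return the 8-octet Link Bandwidth Extended Community."""
    if bytes_per_sec < 0:
        raise ValueError("negative values MUST NOT be originated")
    # Assumption of this sketch: ASNs that do not fit in two
    # octets are replaced by AS_TRANS.
    global_admin = asn if asn <= 0xFFFF else AS_TRANS
    if transitive:
        ec_type = TYPE_TRANSITIVE
    else:
        ec_type = TYPE_NON_TRANSITIVE
    # "!" is network byte order; B, B, H, f are 1, 1, 2, 4 octets.
    return struct.pack("!BBHf", ec_type, SUBTYPE_LINK_BANDWIDTH,
                       global_admin, bytes_per_sec)


def decode_link_bandwidth(octets):
    """Return (transitive, global_admin, bytes_per_sec)."""
    fields = struct.unpack("!BBHf", octets)
    ec_type, subtype, global_admin, bandwidth = fields
    if ec_type not in (TYPE_TRANSITIVE, TYPE_NON_TRANSITIVE):
        raise ValueError("not a two-octet AS-specific type")
    if subtype != SUBTYPE_LINK_BANDWIDTH:
        raise ValueError("not a Link Bandwidth sub-type")
    return ec_type == TYPE_TRANSITIVE, global_admin, bandwidth


# A 10 Gbit/s link is 1.25e9 bytes per second.
wire = encode_link_bandwidth(65001, 1.25e9, transitive=False)
print(wire.hex())
print(decode_link_bandwidth(wire))
```
]]></artwork>
      </figure>

      <t>Note that the example converts 10 Gbit/s to bytes per second before
      encoding, since the unit of the field is bytes rather than bits.</t>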
    </section>

    <section title="Protocol Procedures">
      <section anchor="Originator"
               title="Sender (Originating Link Bandwidth Extended Community)">
        <t>An originator of Link Bandwidth Extended Community SHOULD be able
        to originate either a transitive or a non-transitive Link Bandwidth
        Extended Community. Implementations SHOULD provide configuration to
        set the transitivity type of the Link Bandwidth Extended Community, as
        well as the Global Administrator value and the bandwidth value (in the
        Local Administrator subfield), using local policy. For backward
        compatibility, different implementations MAY use different default
        values for the transitivity type of the Link Bandwidth Extended
        Community. The provided configuration SHOULD allow operators to
        override the default transitivity value as needed. An implementation
        MAY advertise a link bandwidth value as zero.</t>

        <t>No more than one Link Bandwidth Extended Community SHOULD be
        attached to a route. For the purpose of backward compatibility during
        transition, a BGP speaker MAY attach one Link Bandwidth Extended
        Community per transitivity type (transitive and non-transitive), both
        having the same 'Link Bandwidth Value' field.</t>

        <t>A Link Bandwidth Extended Community MAY be attached or updated for
        a BGP route upon receipt during Adj-RIB-In processing. The Link
        Bandwidth Extended Community MAY be attached or updated for a BGP
        route's Adj-RIB-Out entry while being advertised to a neighboring BGP
        speaker.</t>

        <t>Note: Implementations MAY provide a configuration option to send
        non-transitive Link Bandwidth extended communities on external BGP
        sessions.</t>
      </section>

      <section anchor="Receiver"
               title="Receiver (Receiving Link Bandwidth Extended Community)">
        <t>A BGP receiver MUST be able to process Link Bandwidth Extended
        Communities of both transitive and non-transitive types. The receiver
        MUST NOT flap or treat the route as malformed based on the
        transitivity of the Link Bandwidth Extended Community and/or BGP
        session type (internal vs. external). An implementation MUST be able
        to process and accept a Link Bandwidth Extended Community where the
        bandwidth value is set to zero.</t>

        <t>Note: Implementations MAY provide configuration to accept
        non-transitive Link Bandwidth extended communities from external BGP
        sessions.</t>
      </section>

      <section anchor="Re-advertisement" title="Re-advertisement Procedures">
        <section anchor="NextHopSelf"
                 title="Re-advertisement with Next Hop Self">
          <t>When a BGP speaker re-advertises a route with Link Bandwidth
          Extended Community and sets the next hop to itself, it SHOULD follow
          the same procedures as outlined in <xref target="Originator"/>.</t>

          <t>In the absence of any import or export policies that alter the
          Link Bandwidth Extended Community, any received Link Bandwidth
          Extended Community on the route will be re-advertised unchanged, in
          accordance with standard BGP procedures.</t>
        </section>

        <section anchor="NextHopUnchanged"
                 title="Re-advertisement with Next Hop Unchanged">
          <t>A BGP speaker that receives a route with a Link Bandwidth
          Extended Community and re-advertises or reflects it without
          changing its next hop SHOULD NOT change the Link Bandwidth Extended
          Community in any way.</t>
        </section>
      </section>

      <section title="Link Bandwidth Extended Community Arithmetic and BGP Multipath">
        <t>In a BGP multipath ECMP environment, the link bandwidth value that
        is sent or re-advertised may be calculated based on the Link Bandwidth
        Extended Community of the routes contributing to multipath in the
        Local Routing Information Base (Local-RIB). This topic is beyond the
        scope of this document.</t>
      </section>
    </section>

    <section anchor="Error" title="Error Handling">
      <t>If a BGP speaker receives a route with more than one Link Bandwidth
      Extended Community and uses the route to compute WECMP, it SHOULD use
      the extended community with the lowest 'Link Bandwidth Value', ignoring
      the transitivity. Implementations MAY provide configuration to change
      the above preference.</t>

      <t>When transitive and non-transitive Link Bandwidth extended
      communities carry the same 'Link Bandwidth Value', the transitivity
      does not matter for the purpose of computing WECMP or programming the
      FIB (Forwarding Information Base).</t>

      <t>Note that these procedures mean that a BGP speaker reflecting a route
      with next hop unchanged, such as a route reflector (RR), will
      re-advertise the Link Bandwidth extended communities received on the
      route as-is without any modification, while following the extended
      community transitivity rules.</t>

      <t>Link bandwidth extended communities with a negative value SHALL be
      ignored and MUST NOT be originated.</t>
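
      <t>The default preference for multiple communities and the treatment of
      negative values can be sketched as follows. This Python fragment is
      illustrative only. The function name is hypothetical, the input is
      assumed to be the list of decoded 'Link Bandwidth Value' fields found
      on one route, and skipping NaN values is an assumption of the sketch
      rather than a rule stated in this document.</t>

      <figure>
        <artwork align="left" xml:space="preserve"><![CDATA[
```python
import math


def effective_link_bandwidth(values):
    """Pick the bandwidth to use from the communities on one route.

    values: the 'Link Bandwidth Value' of every Link Bandwidth
    Extended Community on the route, transitive or not.
    Returns None when no usable community is present.
    """
    # Negative values are ignored. NaN is skipped as well, which
    # is an assumption of this sketch.
    usable = [v for v in values if not math.isnan(v) and v >= 0]
    # Default preference: lowest value, regardless of transitivity.
    return min(usable) if usable else None


print(effective_link_bandwidth([2.5e9, 1.25e9]))
print(effective_link_bandwidth([-1.0]))
```
]]></artwork>
      </figure>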

      <t>WECMP can be utilized only when all contributing paths have a
      non-zero value in the Link Bandwidth Extended Community. If any of the
      paths lack a valid Link Bandwidth Extended Community, ECMP (Equal-Cost
      Multi-Path) MUST be used instead.</t>
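
      <t>A minimal Python sketch of the fallback rule above is shown below.
      It is not normative. The function name is hypothetical, and splitting
      traffic in direct proportion to the advertised bandwidth is an
      assumption of the sketch, since this document does not define how
      weights are derived. The sketch also assumes that a path with a zero
      value triggers the same ECMP fallback as a path without a valid
      community.</t>

      <figure>
        <artwork align="left" xml:space="preserve"><![CDATA[
```python
def multipath_weights(bandwidths):
    """Return the traffic share of each contributing path.

    bandwidths: one entry per path; None means the path has no
    valid Link Bandwidth Extended Community.
    """
    if not bandwidths:
        return []
    weighted = all(bw is not None and bw > 0 for bw in bandwidths)
    if not weighted:
        # Fall back to ECMP: every path gets an equal share.
        return [1.0 / len(bandwidths)] * len(bandwidths)
    total = sum(bandwidths)
    # WECMP (assumed proportional): share follows the bandwidth.
    return [bw / total for bw in bandwidths]


print(multipath_weights([1.25e9, 3.75e9]))
print(multipath_weights([1.25e9, None]))
```
]]></artwork>
      </figure>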
    </section>

    <section anchor="History" title="Document History">
      <t>BGP Link Bandwidth Extended Community has evolved over several
      versions of the IETF draft. In the earlier versions up to
      draft-ietf-idr-link-bandwidth-08, only the non-transitive version of
      Link Bandwidth Extended Community was supported. However, starting from
      draft-ietf-idr-link-bandwidth-09, both transitive and non-transitive
      versions of Link Bandwidth Extended Community are supported.</t>

      <t>An old sender/receiver is a BGP speaker that uses the procedures
      specified up to draft-ietf-idr-link-bandwidth-08
      (https://datatracker.ietf.org/doc/html/draft-ietf-idr-link-bandwidth-08)
      or any undocumented behavior for Link Bandwidth Extended Community.</t>

      <t>A new sender/receiver is a BGP speaker that implements procedures
      specified in this document.</t>

      <t>A BGP speaker (Sender or Receiver) needs to be upgraded to support
      the procedures defined in this document to provide full interoperability
      for both transitive and non-transitive versions of Link Bandwidth
      Extended Community. In order to simplify implementations, it is not a
      goal to provide interoperability by upgrading only the RR.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document defines a specific application of the two-octet AS
      specific extended community.</t>

      <t>IANA is requested to update the name of Sub-Type 0x04 in the
      Transitive Two-Octet AS-Specific Extended Community Sub-Types registry
      (Type 0x00) to:</t>

      <figure>
        <artwork align="left" xml:space="preserve">    Name
    ----
    transitive Link Bandwidth Extended Community</artwork>
      </figure>

      <t>IANA is requested to update the name of Sub-Type 0x04 in the
      Non-Transitive Two-Octet AS-Specific Extended Community Sub-Types
      registry (Type 0x40) to:</t>

      <figure>
        <artwork align="left" xml:space="preserve">    Name
    ----
    non-transitive Link Bandwidth Extended Community</artwork>
      </figure>

      <t>Both updated entries are to reference this document.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>There are no additional security risks introduced by this design.</t>
    </section>

    <section title="Operational Considerations">
      <section title="Inconsistent Deployment">
        <t>Prior deployments of the feature specified in this document have
        involved implementations that only understood one of the two extended
        community transitivity types. As a result, such implementations would
        treat the use of the other transitivity type in a "ships in the night"
        fashion. The procedures in this document govern how multiple
        transitivity types for link bandwidth should operate.</t>

        <t>In circumstances where networks have deployed a mixture of
        implementations supporting this document's current procedures for both
        transitivity types, and older implementations that only understand one
        transitivity type, inconsistent behavior could result. A primary
        example is a route received by a BGP speaker that contains both a
        transitive and a non-transitive Link Bandwidth Extended Community. If
        that BGP speaker performs an operation that updates only one of the
        Link Bandwidth Extended Communities, the other community may have an
        inconsistent value. As a result, downstream BGP speakers that receive
        such routes may perform inappropriate ECMP load balancing.</t>

        <t>To mitigate such issues, when operators are aware that older
        implementations are present in their networks, they may wish to
        take actions to address such inconsistencies. One example would be to
        filter, at advertisement time on the older BGP speaker, the
        unsupported transitivity type of Link Bandwidth Extended Community,
        if the implementation is capable of such filtering. Alternatively, a
        receiving BGP speaker, knowing that the sending speaker is incapable
        of doing such operations, could strip the Link Bandwidth Extended
        Community type that is unsupported by the sender.</t>
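
        <t>The stripping action described above could look like the following
        Python sketch. It is a hedged illustration, not a description of any
        implementation's policy language. The function name is hypothetical,
        and the extended communities on a route are assumed to be available
        as a list of 8-octet byte strings.</t>

        <figure>
          <artwork align="left" xml:space="preserve"><![CDATA[
```python
TYPE_TRANSITIVE = 0x00
TYPE_NON_TRANSITIVE = 0x40
SUBTYPE_LINK_BANDWIDTH = 0x04


def strip_link_bandwidth(communities, unsupported_type):
    """Drop Link Bandwidth communities of one transitivity type.

    communities: list of 8-octet extended communities (bytes).
    unsupported_type: TYPE_TRANSITIVE or TYPE_NON_TRANSITIVE.
    """
    kept = []
    for ec in communities:
        same_type = ec[0] == unsupported_type
        if same_type and ec[1] == SUBTYPE_LINK_BANDWIDTH:
            continue  # filtered: peer only handles the other type
        kept.append(ec)  # all other communities pass through
    return kept


transitive = bytes.fromhex("0004fde94e9502f9")
non_transitive = bytes.fromhex("4004fde94e9502f9")
route_target = bytes.fromhex("0002fde900000064")
route = [transitive, non_transitive, route_target]
kept = strip_link_bandwidth(route, TYPE_NON_TRANSITIVE)
print([ec.hex() for ec in kept])
```
]]></artwork>
        </figure>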

        <t>Ideally this operational consideration is short-lived until the
        network has been upgraded to implementations that consistently support
        the procedures in this document.</t>
      </section>
    </section>

    <section anchor="Contributors" title="Contributors">
      <author fullname="Kaliraj Vairavakkalai" initials="K."
              surname="Vairavakkalai">
        <organization>Juniper Networks, Inc.</organization>

        <address>
          <postal>
            <street>1133 Innovation Way,</street>

            <city>Sunnyvale</city>

            <region>CA</region>

            <code>94089</code>

            <country>US</country>
          </postal>

          <email>kaliraj@juniper.net</email>
        </address>
      </author>

      <author fullname="Natrajan Venkataraman" initials="N."
              surname="Venkataraman">
        <organization>Juniper Networks, Inc.</organization>

        <address>
          <postal>
            <street>1133 Innovation Way,</street>

            <city>Sunnyvale</city>

            <region>CA</region>

            <code>94089</code>

            <country>US</country>
          </postal>

          <email>natv@juniper.net</email>
        </address>
      </author>

      <author fullname="Rex Fernando" initials="R." surname="Fernando">
        <organization>Cisco Systems</organization>

        <address>
          <postal>
            <street>170 W. Tasman Drive</street>

            <city>San Jose</city>

            <region>CA</region>

            <code>95134</code>

            <country>US</country>
          </postal>

          <email>rex@cisco.com</email>
        </address>
      </author>
    </section>

    <section anchor="Acknowledgments" title="Acknowledgments">
      <t>The authors would like to thank Yakov Rekhter, Srihari Sangli and Dan
      Tappan for proposing unequal cost load balancing as one possible
      application of the extended community attribute. The authors would like
      to thank Jeff Haas for all the discussions and providing text for
      operational considerations.</t>

      <t>The authors would like to thank Bruno Decraene, Robert Raszuk, Joel
      Halpern, Aleksi Suhonen, Randy Bush, Stephane Litkowski, Mankamana
      Mishra, Moshiko Nayman, Yingzhen Qu, Anoop Ghanwani, Dongjie (Jimmy) and
      John Scudder for their comments and contributions.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"?>

      <?rfc include="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4360.xml"?>

      <?rfc include="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.6793.xml"?>

      <reference anchor="IEEE.754-2019"
                 target="https://ieeexplore.ieee.org/document/8766229">
        <front>
          <title abbrev="IEEE-754-2019">IEEE Standard for Floating-Point
          Arithmetic</title>

          <author>
            <organization showOnFrontPage="true">IEEE</organization>
          </author>

          <date day="22" month="July" year="2019"/>
        </front>
      </reference>
    </references>
  </back>
</rfc>
