<?xml version="1.0" encoding="US-ASCII"?>
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-ietf-idr-link-bandwidth-17"
     ipr="trust200902">
  <front>
    <title abbrev="BGP Link Bandwidth Extended Community">BGP Link Bandwidth
    Extended Community</title>

    <author fullname="Pradosh Mohapatra" initials="P." surname="Mohapatra">
      <organization>Google LLC</organization>

      <address>
        <email>pradosh@google.com</email>
      </address>
    </author>

    <author fullname="Reshma Das" initials="R." role="editor" surname="Das">
      <organization>Juniper Networks, Inc.</organization>

      <address>
        <postal>
          <street>1133 Innovation Way,</street>

          <city>Sunnyvale</city>

          <region>CA</region>

          <code>94089</code>

          <country>US</country>
        </postal>

        <email>dreshma@juniper.net</email>
      </address>
    </author>

    <author fullname="Satya Mohanty" initials="S." role="editor"
            surname="Mohanty">
      <organization>Zscaler</organization>

      <address>
        <postal>
          <street>120 Holger Way,</street>

          <city>San Jose</city>

          <region>CA</region>

          <code>95134</code>

          <country>US</country>
        </postal>

        <email>smohanty@zscaler.com</email>
      </address>
    </author>

    <author fullname="Serge Krier" initials="S." surname="Krier">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street>Pegasus Parc, De Kleetlaan 6a</street>

          <country>Belgium</country>
        </postal>

        <email>sekrier@cisco.com</email>
      </address>
    </author>

    <author fullname="Rafal Jan Szarecki" initials="R.J." surname="Szarecki">
      <organization>Google LLC</organization>

      <address>
        <postal>
          <street>1160 N Mathilda Ave,</street>

          <city>Sunnyvale</city>

          <region>CA</region>

          <code>94089</code>

          <country>US</country>
        </postal>

        <email>rszarecki@gmail.com</email>
      </address>
    </author>

    <author fullname="Akshay Gattani" initials="A." surname="Gattani">
      <organization>Arista Networks</organization>

      <address>
        <postal>
          <street>5453 Great America Parkway</street>

          <city>Santa Clara</city>

          <region>CA</region>

          <code>95054</code>

          <country>US</country>
        </postal>

        <email>akshay@arista.com</email>
      </address>
    </author>

    <date day="10" month="09" year="2025"/>

    <abstract>
      <t>This document specifies a type of BGP Extended Community that enables
      routers to perform weighted load-balancing in multipath scenarios.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119"/>.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>Load balancing is a critical aspect of network design, enabling
      efficient utilization of available bandwidth and improving overall
      network performance. Traditional equal-cost multi-path (ECMP) routing
      does not account for the varying capacities of different paths. This
      document specifies that the link bandwidth be carried in the network
      using one of two new extended communities <xref target="RFC4360"/>: the
      transitive and the non-transitive Link Bandwidth Extended Community.
      The Link Bandwidth Extended Community provides a mechanism for routers
      to advertise the bandwidth of their downstream path, which may be either
      a directly connected link or a multi-hop or multipath next hop. This
      mechanism facilitates maximizing utilization of network resources.</t>
    </section>

    <section title="Link Bandwidth Extended Community">
      <t>The Link Bandwidth Extended Community is defined as a BGP extended
      community that carries the bandwidth information of a router,
      represented by BGP Next Hop, connecting to a remote network. This
      community can be used to inform other routers about the available
      bandwidth through a given route.</t>

      <t>The Link Bandwidth Extended Community can be either transitive or
      non-transitive. Therefore, the value of the high-order octet of the
      extended Type Field can be 0x00 or 0x40, respectively. The value of the
      low-order octet of the extended Type Field for these communities is
      0x04. The value of the Global Administrator subfield in the Value Field
      SHOULD represent the Autonomous System of the router that attaches the
      Link Bandwidth Extended Community, but it can be set to any 2-octet
      value. If the Autonomous System number cannot be represented in two
      octets, AS_TRANS <xref target="RFC6793"/> SHOULD be used in the Global
      Administrator subfield. The encoding of 4-octet ASNs is out of scope of
      this document. The bandwidth of the link is expressed as 4 octets in
      <xref target="IEEE.754-2019"/> floating point format, in units of bytes
      (not bits) per second. It is carried in the Local Administrator
      subfield of the Value Field.</t>

      <figure anchor="LBWExtCom" suppress-title="false"
              title="Link Bandwidth Extended Community">
        <artwork align="left" xml:space="preserve">
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Type=0x00/0x40 | SubType= 0x04 |       AS Number               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Link Bandwidth Value                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 Type:   1-octet field MUST be set to 0x00 or 0x40 
         to indicate transitive/non-transitive.

 SubType: 1-octet field MUST be set to 0x04 
          to indicate 'Link-Bandwidth'.

 Global Administrator sub-field:
          2-octet field representing the Autonomous System.

 Local Administrator sub-field: 
          Bandwidth value (bytes per sec) encoded as 4 octets
          in IEEE floating point format.
</artwork>
      </figure>
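
      <t>As an illustration of the layout in <xref target="LBWExtCom"/>, the
      following Python sketch packs and unpacks the 8-octet community. It is
      not part of the specification. The function and constant names are
      hypothetical and chosen for this example only. The sketch assumes
      network byte order, and it assumes that AS_TRANS (23456) is substituted
      whenever the supplied AS number does not fit in two octets.</t>

      <figure>
        <artwork align="left" xml:space="preserve"><![CDATA[
```python
import struct

AS_TRANS = 23456           # RFC 6793 stand-in for 4-octet ASNs
TYPE_TRANSITIVE = 0x00
TYPE_NON_TRANSITIVE = 0x40
SUBTYPE_LINK_BANDWIDTH = 0x04


def encode_link_bandwidth(asn, bytes_per_sec, transitive=True):
    """Return the 8-octet wire form of the community."""
    high = TYPE_TRANSITIVE if transitive else TYPE_NON_TRANSITIVE
    # Assumption: fall back to AS_TRANS for ASNs above 65535.
    global_admin = asn if asn <= 0xFFFF else AS_TRANS
    # ">BBHf": network byte order; Type, SubType, 2-octet AS,
    # and a 4-octet IEEE 754 single-precision float.
    return struct.pack(">BBHf", high, SUBTYPE_LINK_BANDWIDTH,
                       global_admin, bytes_per_sec)


def decode_link_bandwidth(data):
    """Return (transitive, asn, bytes_per_sec) from 8 octets."""
    high, subtype, asn, bandwidth = struct.unpack(">BBHf", data)
    if high not in (TYPE_TRANSITIVE, TYPE_NON_TRANSITIVE):
        raise ValueError("unexpected Type octet")
    if subtype != SUBTYPE_LINK_BANDWIDTH:
        raise ValueError("unexpected SubType octet")
    return high == TYPE_TRANSITIVE, asn, bandwidth
```
]]></artwork>
      </figure>

      <t>Under these assumptions, a 10 Gbit/s link advertised by AS 65001
      would be passed to encode_link_bandwidth() as 1.25e9, since the unit is
      bytes per second rather than bits per second.</t>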
    </section>

    <section title="Protocol Procedures">
      <t>The procedures cover both the transitive and non-transitive variants
      of the Link Bandwidth Extended Community so that implementations can
      handle both variants in a way that supports existing deployments. Please
      refer to <xref target="IANA"/> and <xref target="Appendix"/> for more
      details.</t>

      <section anchor="Originator"
               title="Sender (Originating Link Bandwidth Extended Community)">
        <t>A BGP speaker that attaches a Link Bandwidth Extended Community
        SHOULD be able to advertise either a transitive or a non-transitive
        Link Bandwidth Extended Community. Implementations SHOULD provide
        configuration to set the transitivity type of the Link Bandwidth
        Extended Community, as well as the Global Administrator and bandwidth
        values in the Local Administrator field, using local policy. Different
        implementations MAY use different default values for the transitivity
        type of the Link Bandwidth Extended Community. The provided
        configuration SHOULD allow operators to override the default
        transitivity value as needed. An implementation MAY advertise a link
        bandwidth value as zero.</t>

        <t>Generally, a single Link Bandwidth Extended Community of the
        transitivity type that is desired in a deployment is attached to a
        route. However, during transition (refer to <xref
        target="Operational Condiderations"/> for details), a BGP speaker MAY
        attach one Link Bandwidth Extended Community per transitivity type
        (transitive and non-transitive), both having the same 'Link Bandwidth
        Value' field.</t>

        <t>A Link Bandwidth Extended Community MAY be attached or updated for
        a BGP route upon receipt during Adj-RIB-In processing. The Link
        Bandwidth Extended Community MAY be attached or updated for a BGP
        route's Adj-RIB-Out entry while being advertised to a neighboring BGP
        speaker.</t>

        <t>Implementations MAY provide a configuration option to send
        non-transitive Link Bandwidth Extended Communities on external BGP
        sessions.</t>
      </section>

      <section anchor="Receiver"
               title="Receiver (Receiving Link Bandwidth Extended Community)">
        <t>A BGP receiver MUST be able to process Link Bandwidth Extended
        Communities of both transitive and non-transitive types. The receiver
        MUST NOT flap or treat the route as malformed based on the
        transitivity of the Link Bandwidth Extended Community and/or BGP
        session type (internal vs. external).</t>

        <t>Implementations MAY provide configuration to accept non-transitive
        Link Bandwidth Extended Communities from external BGP sessions.</t>

        <t>A BGP update with an attached Link Bandwidth Extended Community
        with a bandwidth value of zero is valid. Weighted ECMP (WECMP), as
        described in Section 6.3 of <xref target="RFC7938"/>, can be utilized
        when all contributing paths have a non-zero value in the Link
        Bandwidth Extended Community. However, when the paths have a mix of
        zero and non-zero values, or all zero values, the behavior is
        determined by local policy. For example, implementations MAY exclude
        the paths with a zero value from WECMP formation as long as at least
        one path with a non-zero value exists, or they MAY fall back to
        ECMP.</t>
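
        <t>The zero-value handling described above can be sketched in Python
        as follows. This is only one possible reading of the local-policy
        choices. The function name wecmp_weights and the zero_policy knob are
        hypothetical, and sharing traffic equally when every path carries a
        zero value is an assumption of this sketch, since that case is left
        to local policy.</t>

        <figure>
          <artwork align="left" xml:space="preserve"><![CDATA[
```python
def wecmp_weights(bandwidths, zero_policy="exclude"):
    """Map per-path bandwidth (bytes/sec) to normalized weights.

    bandwidths: dict of path name -> link bandwidth value.
    zero_policy (hypothetical local-policy knob):
      "exclude": drop zero-value paths if a non-zero path exists
      "ecmp":    use equal weights when any path has a zero value
    Negative values are assumed to have been discarded already.
    """
    if not bandwidths:
        return {}
    nonzero = {p: bw for p, bw in bandwidths.items() if bw > 0}
    has_zero = len(nonzero) < len(bandwidths)
    if not nonzero or (has_zero and zero_policy == "ecmp"):
        # All zero, or a mix under the "ecmp" policy: share equally.
        share = 1.0 / len(bandwidths)
        return {p: share for p in bandwidths}
    # Weighted case: each path gets its fraction of the total.
    total = sum(nonzero.values())
    return {p: bw / total for p, bw in nonzero.items()}
```
]]></artwork>
        </figure>

        <t>With two paths advertising 100 and 300 bytes per second, the sketch
        yields weights of one quarter and three quarters. A path missing from
        the returned mapping is one that the "exclude" policy removed from
        WECMP formation.</t>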
      </section>

      <section anchor="Re-advertisement" title="Re-advertisement Procedures">
        <t>This section describes the procedures to be followed when a BGP
        speaker receives a route with an attached Link Bandwidth Extended
        Community and subsequently re-advertises that route.</t>

        <section anchor="NextHopSelf"
                 title="Re-advertisement with Next Hop Self">
          <t>When a BGP speaker re-advertises a route with Link Bandwidth
          Extended Community and sets the next hop to itself, it SHOULD follow
          the same procedures as outlined in <xref target="Originator"/>.</t>

          <t>In the absence of any route policies that alter the Link
          Bandwidth Extended Community, any received Link Bandwidth Extended
          Community on the route will be re-advertised unchanged. Please also
          refer to <xref target="LinkBandwidthArithmetic"/> for use in a BGP
          multipath environment.</t>
        </section>

        <section anchor="NextHopUnchanged"
                 title="Re-advertisement with Next Hop Unchanged">
          <t>A BGP speaker that receives a route with a Link Bandwidth
          Extended Community and re-advertises or reflects it without
          changing its next hop SHOULD NOT change the Link Bandwidth Extended
          Community in any way.</t>
        </section>
      </section>

      <section anchor="LinkBandwidthArithmetic"
               title="Link Bandwidth Extended Community Arithmetic and BGP Multipath">
        <t>In a BGP multipath ECMP environment, the link bandwidth value that
        is sent or re-advertised may be calculated based on the Link Bandwidth
        Extended Community of the routes contributing to multipath in the
        Local Routing Information Base (Local-RIB). This topic is beyond the
        scope of this document.</t>
      </section>
    </section>

    <section anchor="Error" title="Error Handling">
      <t>If a BGP speaker receives a route with more than one Link Bandwidth
      Extended Community and uses the route to compute WECMP, it SHOULD use
      the extended community with the lowest 'Link Bandwidth Value', ignoring
      the transitivity. Implementations MAY provide configuration to change
      this preference.</t>

      <t>Between transitive and non-transitive types of Link Bandwidth
      Extended Communities that have the same 'Link Bandwidth Value', the
      transitivity does not matter for the purpose of computing WECMP or
      programming the Forwarding Information Base (FIB).</t>

      <t>Note that these procedures mean that a BGP speaker reflecting a route
      with next hop unchanged (e.g., a Route Reflector (RR)) will re-advertise
      the Link Bandwidth Extended Communities received on the route as-is,
      without any modification, while following the extended community
      transitivity rules.</t>

      <t>Link Bandwidth Extended Communities with a negative value SHALL be
      ignored and MUST NOT be advertised.</t>

      <t>Link Bandwidth Extended Communities with a zero value MUST NOT be
      considered malformed.</t>

      <t>If any of the paths lacks a valid Link Bandwidth Extended Community,
      ECMP MUST be used instead.</t>
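
      <t>The selection rules in this section can be sketched in Python as
      follows. The sketch is illustrative only. The function names
      effective_bandwidth and use_wecmp are hypothetical, the default
      preference for the lowest value is assumed (no configuration override),
      and treating a NaN value as invalid is an assumption of this sketch
      rather than a rule stated in this document.</t>

      <figure>
        <artwork align="left" xml:space="preserve"><![CDATA[
```python
def effective_bandwidth(values):
    """Pick one value from all Link Bandwidth communities on a route.

    values: list of bandwidth values, transitivity already ignored.
    Returns None when no valid community remains.
    """
    # Negative values are ignored; zero is valid. NaN also fails
    # this comparison (assumption: NaN is treated as invalid).
    valid = [v for v in values if v >= 0]
    # Default preference: the lowest value wins.
    return min(valid) if valid else None


def use_wecmp(paths):
    """Return True only if every path has a valid community.

    paths: dict of path name -> list of bandwidth values.
    When this returns False, plain ECMP is used instead.
    """
    if not paths:
        return False
    return all(effective_bandwidth(v) is not None
               for v in paths.values())
```
]]></artwork>
      </figure>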
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>IANA is requested to update the entry for Sub-Type 0x04 in the
      Transitive Two-Octet AS-Specific Extended Community Sub-Types registry
      (Type 0x00) to:</t>

      <figure>
        <artwork align="left" xml:space="preserve">    Name
    ----
    transitive Link Bandwidth Extended Community</artwork>
      </figure>

      <t>IANA is requested to update the entry for Sub-Type 0x04 in the
      Non-Transitive Two-Octet AS-Specific Extended Community Sub-Types
      registry (Type 0x40) to:</t>

      <figure>
        <artwork align="left" xml:space="preserve">    Name
    ----
    non-transitive Link Bandwidth Extended Community</artwork>
      </figure>

      <t>Both updates are to reference this document.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>This extension to BGP has security implications similar to those of
      BGP Extended Communities <xref target="RFC4360"/>.</t>

      <t>The Link Bandwidth Extended Community conveys bandwidth and capacity
      information that may be sensitive. Exporting this community outside of
      an administrative domain can expose private network resource details.
      When propagating routes with a Link Bandwidth Extended Community
      towards an untrusted network or outside of an administrative domain, it
      is recommended that operators use policy to filter out this community.</t>
    </section>

    <section anchor="Operational Condiderations"
             title="Operational Considerations">
      <section title="Inconsistent Deployment">
        <t>Prior deployments of the feature specified in this document have
        involved implementations that only understood one of the two extended
        community transitivity types. As a result, such implementations would
        treat the use of the other transitivity type in a "ships in the night"
        fashion. The procedures in this document govern how multiple
        transitivity types for link bandwidth should operate.</t>

        <t>In circumstances where networks have deployed a mixture of
        implementations supporting this document's procedures for both
        transitivity types, and older implementations that only understand one
        transitivity type, inconsistent behavior could result. A prime example
        is a route received by a BGP speaker that contains both a transitive
        and a non-transitive Link Bandwidth Extended Community: if that BGP
        speaker performs an operation that updates only one of the Link
        Bandwidth Extended Communities, the other community may be left with
        an inconsistent value. As a result, downstream BGP speakers that
        receive such routes may perform inappropriate WECMP load
        balancing.</t>

        <t>To mitigate such issues, when operators are aware that older
        implementations are present in their networks, they may wish to take
        actions to address such inconsistencies. One option would be to
        filter the unsupported transitivity type of Link Bandwidth Extended
        Community at advertisement time on the older BGP speaker, if the
        implementation is capable of such filtering. Alternatively, a
        receiving BGP speaker, knowing that the sending speaker is incapable
        of such filtering, could strip the Link Bandwidth Extended
        Community type that is unsupported by the sender.</t>
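
        <t>As a rough illustration of the receiver-side mitigation, the
        following Python sketch removes Link Bandwidth Extended Communities of
        one transitivity type from a list of extended communities. It assumes
        that each community is available in its 8-octet wire form. The
        function name strip_link_bandwidth and its strip_transitive parameter
        are hypothetical; real implementations would express this through
        their own policy language.</t>

        <figure>
          <artwork align="left" xml:space="preserve"><![CDATA[
```python
def strip_link_bandwidth(communities, strip_transitive):
    """Drop Link Bandwidth communities of one transitivity type.

    communities: list of 8-octet bytes objects (wire form).
    strip_transitive: True removes Type 0x00, False removes 0x40.
    All other extended communities are kept, in order.
    """
    unwanted_type = 0x00 if strip_transitive else 0x40
    kept = []
    for community in communities:
        is_unwanted = (len(community) == 8
                       and community[0] == unwanted_type
                       and community[1] == 0x04)
        if not is_unwanted:
            kept.append(community)
    return kept
```
]]></artwork>
        </figure>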

        <t>Ideally this operational consideration is short-lived until all the
        routers in the network have been upgraded to implementations that
        consistently support the procedures in this document.</t>
      </section>
    </section>

    <section anchor=" Contributors" title="Contributors">
      <author fullname="Kaliraj Vairavakkalai" initials="K."
              surname="Vairavakkalai">
        <organization>Juniper Networks, Inc.</organization>

        <address>
          <postal>
            <street>1133 Innovation Way,</street>

            <city>Sunnyvale</city>

            <region>CA</region>

            <code>94089</code>

            <country>US</country>
          </postal>

          <email>kaliraj@juniper.net</email>
        </address>
      </author>

      <author fullname="Natrajan Venkataraman" initials="N."
              surname="Venkataraman">
        <organization>Juniper Networks, Inc.</organization>

        <address>
          <postal>
            <street>1133 Innovation Way,</street>

            <city>Sunnyvale</city>

            <region>CA</region>

            <code>94089</code>

            <country>US</country>
          </postal>

          <email>natv@juniper.net</email>
        </address>
      </author>

      <author fullname="Rex Fernando" initials="R." surname="Fernando">
        <organization>Cisco Systems</organization>

        <address>
          <postal>
            <street>170 W. Tasman Drive</street>

            <city>San Jose</city>

            <region>CA</region>

            <code>95134</code>

            <country>US</country>
          </postal>

          <email>rex@cisco.com</email>
        </address>
      </author>
    </section>

    <section anchor="Acknowledgments" title="Acknowledgments">
      <t>The authors would like to thank Yakov Rekhter, Srihari Sangli and Dan
      Tappan for proposing unequal cost load balancing as one possible
      application of the extended community attribute. The authors would like
      to thank Jeff Haas for all the discussions and for providing text for
      the operational considerations.</t>

      <t>The authors would like to thank Bruno Decraene, Robert Raszuk, Joel
      Halpern, Aleksi Suhonen, Randy Bush, Stephane Litkowski, Mankamana
      Mishra, Moshiko Nayman, Yingzhen Qu, Anoop Ghanwani, Dongjie (Jimmy) and
      John Scudder for their comments and contributions.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"?>

      <?rfc include="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4360.xml"?>

      <?rfc include="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.6793.xml"?>

      <?rfc include="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.7938.xml"?>

      <reference anchor="IEEE.754-2019"
                 target="https://ieeexplore.ieee.org/document/8766229">
        <front>
          <title abbrev="IEEE-754-2019">IEEE Standard for Floating-Point
          Arithmetic</title>

          <author>
            <organization showOnFrontPage="true">IEEE</organization>
          </author>

          <date day="22" month="July" year="2019"/>
        </front>
      </reference>
    </references>

    <section anchor="Appendix" title="Document History">
      <t>BGP Link Bandwidth Extended Community has evolved over several
      versions of the IETF draft. In the earlier versions up to
      draft-ietf-idr-link-bandwidth-08, only the non-transitive version of
      Link Bandwidth Extended Community was supported. However, starting from
      draft-ietf-idr-link-bandwidth-09, both transitive and non-transitive
      versions of Link Bandwidth Extended Community are supported.</t>

      <t>A BGP speaker (Sender or Receiver) needs to be upgraded to support
      the procedures defined in this document to provide full interoperability
      for both transitive and non-transitive versions of Link Bandwidth
      Extended Community. In order to simplify implementations, it is not a
      goal to provide interoperability by upgrading only the RR.</t>
    </section>
  </back>
</rfc>
