<?xml version="1.0" encoding="US-ASCII"?>
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-ietf-idr-link-bandwidth-19"
     ipr="trust200902">
  <front>
    <title abbrev="BGP Link Bandwidth Extended Community">BGP Link Bandwidth
    Extended Community</title>

    <author fullname="Pradosh Mohapatra" initials="P." surname="Mohapatra">
      <organization>Google LLC</organization>

      <address>
        <email>pradosh@google.com</email>
      </address>
    </author>

    <author fullname="Reshma Das" initials="R." role="editor" surname="Das">
      <organization>Juniper Networks, Inc.</organization>

      <address>
        <postal>
          <street>1133 Innovation Way,</street>

          <city>Sunnyvale</city>

          <region>CA</region>

          <code>94089</code>

          <country>US</country>
        </postal>

        <email>dreshma@juniper.net</email>
      </address>
    </author>

    <author fullname="Satya Mohanty" initials="S." role="editor"
            surname="Mohanty">
      <organization>Zscaler</organization>

      <address>
        <postal>
          <street>120 Holger Way,</street>

          <city>San Jose</city>

          <region>CA</region>

          <code>95134</code>

          <country>US</country>
        </postal>

        <email>smohanty@zscaler.com</email>
      </address>
    </author>

    <author fullname="Serge Krier" initials="S." surname="Krier">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street>Pegasus Parc, De Kleetlaan 6a</street>

          <country>Belgium</country>
        </postal>

        <email>sekrier@cisco.com</email>
      </address>
    </author>

    <author fullname="Rafal Jan Szarecki" initials="R.J." surname="Szarecki">
      <organization>Google LLC</organization>

      <address>
        <postal>
          <street>1160 N Mathilda Ave,</street>

          <city>Sunnyvale</city>

          <region>CA</region>

          <code>94089</code>

          <country>US</country>
        </postal>

        <email>rszarecki@gmail.com</email>
      </address>
    </author>

    <author fullname="Akshay Gattani" initials="A." surname="Gattani">
      <organization>Arista Networks</organization>

      <address>
        <postal>
          <street>5453 Great America Parkway</street>

          <city>Santa Clara</city>

          <region>CA</region>

          <code>95054</code>

          <country>US</country>
        </postal>

        <email>akshay@arista.com</email>
      </address>
    </author>

    <date day="06" month="10" year="2025"/>

    <abstract>
      <t>This document specifies a type of BGP Extended Community that enables
      routers to perform weighted load-balancing in multipath scenarios.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119"/>.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>Load balancing is a critical aspect of network design, enabling
      efficient utilization of available bandwidth and improving overall
      network performance. Traditional equal-cost multi-path (ECMP) routing
      does not account for the varying capacities of different paths. This
      document specifies that the bandwidth be carried in the network using
      one of two extended communities <xref target="RFC4360"/>: the
      transitive and the non-transitive Link Bandwidth Extended Community.
      The Link Bandwidth Extended Community provides a mechanism for routers
      to advertise the bandwidth of their downstream path, which may be
      either a directly connected link or a multi-hop or multipath next hop.
      Routers receiving this information can use it for weighted load
      balancing, making better use of network resources.</t>
    </section>

    <section title="Link Bandwidth Extended Community">
      <t>The Link Bandwidth Extended Community is a BGP extended community
      that carries bandwidth information for the path through a router,
      identified by the BGP Next Hop, toward a remote network. It informs
      other routers of the bandwidth available via a given route.</t>

      <t>The Link Bandwidth Extended Community can be either transitive or
      non-transitive; accordingly, the high-order octet of the Type Field is
      0x00 (transitive) or 0x40 (non-transitive). The low-order octet of the
      Type Field (the Sub-Type) is 0x04 for both communities. The Global
      Administrator subfield of the Value Field SHOULD contain the Autonomous
      System number of the router that attaches the Link Bandwidth Extended
      Community, but it can be set to any 2-octet value. If that Autonomous
      System number cannot be represented in two octets, AS_TRANS <xref
      target="RFC6793"/> SHOULD be used in the Global Administrator
      subfield. Encoding a 4-octet AS number in this community is outside
      the scope of this document. The bandwidth value is carried in the
      Local Administrator subfield of the Value Field as 4 octets in <xref
      target="IEEE.754-2019"/> single-precision (binary32) floating-point
      format, in units of bytes (not bits) per second.</t>

      <figure anchor="LBWExtCom" suppress-title="false"
              title="Link Bandwidth Extended Community">
        <artwork align="left" xml:space="preserve">
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Type=0x00/0x40 | SubType= 0x04 |       AS Number               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Bandwidth Value                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 Type:    1-octet field, MUST be set to 0x00 (transitive) or
          0x40 (non-transitive).

 SubType: 1-octet field, MUST be set to 0x04 to indicate
          'Link-Bandwidth'.

 Global Administrator sub-field (AS Number):
          2-octet field representing the Autonomous System.

 Local Administrator sub-field (Bandwidth Value):
          Bandwidth in bytes per second, encoded as 4 octets
          in IEEE 754 single-precision floating-point format.
</artwork>
      </figure>
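      <t>To make the layout above concrete, the following Python sketch
      packs and parses the 8-octet community. It is an illustration under
      stated assumptions, not a reference implementation: the function and
      constant names are hypothetical, and the caller is assumed to supply
      the bandwidth already converted to bytes per second. Substituting
      AS_TRANS for an AS number that does not fit in two octets follows the
      text above. The struct format "!BBHf" packs, in network byte order,
      the Type and Sub-Type octets, the 2-octet Global Administrator, and the
      bandwidth as an IEEE 754 binary32 value.</t>

      <figure>
        <artwork align="left" xml:space="preserve"><![CDATA[
```python
import struct

TYPE_TRANSITIVE = 0x00      # high-order Type octet, transitive
TYPE_NON_TRANSITIVE = 0x40  # high-order Type octet, non-transitive
SUBTYPE_LINK_BW = 0x04      # low-order Type octet (Sub-Type)
AS_TRANS = 23456            # reserved 2-octet AS number, RFC 6793

# Network byte order, no padding: Type (1 octet), Sub-Type (1 octet),
# Global Administrator (2 octets), bandwidth as IEEE 754 binary32.
LINK_BW_FORMAT = "!BBHf"


def encode_link_bw(asn: int, bytes_per_sec: float,
                   transitive: bool) -> bytes:
    """Build one Link Bandwidth Extended Community (8 octets)."""
    if bytes_per_sec < 0:
        raise ValueError("negative bandwidth must not be advertised")
    type_octet = TYPE_TRANSITIVE if transitive else TYPE_NON_TRANSITIVE
    # An AS number that does not fit in two octets becomes AS_TRANS.
    global_admin = asn if 0 <= asn <= 0xFFFF else AS_TRANS
    return struct.pack(LINK_BW_FORMAT, type_octet, SUBTYPE_LINK_BW,
                       global_admin, bytes_per_sec)


def decode_link_bw(community: bytes) -> tuple[bool, int, float]:
    """Return (is_transitive, global_administrator, bytes_per_sec)."""
    fields = struct.unpack(LINK_BW_FORMAT, community)
    type_octet, subtype, global_admin, bw = fields
    if (type_octet not in (TYPE_TRANSITIVE, TYPE_NON_TRANSITIVE)
            or subtype != SUBTYPE_LINK_BW):
        raise ValueError("not a Link Bandwidth Extended Community")
    return type_octet == TYPE_TRANSITIVE, global_admin, bw


# A 10 Gbit/s link: 10e9 bits/s / 8 = 1.25e9 bytes/s.
community = encode_link_bw(64512, 10e9 / 8, transitive=True)
print(community.hex())
print(decode_link_bw(community))
```
]]></artwork>
      </figure>

      <t>Because the value is stored in single precision, a bandwidth that
      is not exactly representable as binary32 is rounded when packed, so a
      receiver may see a slightly different number than the sender
      configured.</t>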
    </section>

    <section title="Protocol Procedures">
      <t>The procedures in this section cover both the transitive and
      non-transitive variants of the Link Bandwidth Extended Community, so
      that implementations can handle both variants in a way that supports
      existing deployments. See <xref target="IANA"/> and <xref
      target="Appendix"/> for more details.</t>

      <section anchor="Originator"
               title="Sender (Originating Link Bandwidth Extended Community)">
        <t>A BGP speaker that attaches a Link Bandwidth Extended Community
        SHOULD be able to advertise either a transitive or a non-transitive
        Link Bandwidth Extended Community. Implementations SHOULD provide
        configuration, via local policy, to set the transitivity type of the
        Link Bandwidth Extended Community, the Global Administrator value,
        and the bandwidth value carried in the Local Administrator subfield.
        Different implementations MAY use different default values for the
        transitivity type; the provided configuration SHOULD allow operators
        to override the default as needed. An implementation MAY advertise a
        bandwidth value of zero.</t>

        <t>Generally, a route carries a single Link Bandwidth Extended
        Community of the transitivity type desired in a deployment. However,
        during a transition (see <xref target="Operational Condiderations"/>
        for details), a BGP speaker MAY attach one Link Bandwidth Extended
        Community of each transitivity type (transitive and non-transitive),
        both carrying the same 'Bandwidth Value'.</t>
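        <t>The following Python sketch shows one way a sender might build
        the communities to attach for each transitivity setting, including
        the transition case above in which one community of each type
        carries the same 'Bandwidth Value'. The mode strings and the function
        are hypothetical stand-ins for configuration. The sketch also assumes
        that the configured link speed is expressed in bits per second, which
        it converts to the bytes per second carried in the community, and
        that the Global Administrator value already fits in two octets.</t>

        <figure>
          <artwork align="left" xml:space="preserve"><![CDATA[
```python
import struct

# Hypothetical configuration modes mapped to high-order Type octets.
MODE_TO_TYPES = {
    "transitive": (0x00,),
    "non-transitive": (0x40,),
    "both": (0x00, 0x40),  # transition: one community per type
}


def build_link_bw_communities(asn16: int, link_speed_bps: float,
                              mode: str) -> list[bytes]:
    """Return the Link Bandwidth communities to attach to a route.

    asn16 must already fit in two octets (AS_TRANS otherwise).
    """
    bytes_per_sec = link_speed_bps / 8  # wire units are bytes/second
    # Every community built here shares the same Bandwidth Value.
    return [struct.pack("!BBHf", type_octet, 0x04, asn16, bytes_per_sec)
            for type_octet in MODE_TO_TYPES[mode]]


# A 1 Gbit/s link during a transition: both types are attached.
for c in build_link_bw_communities(64512, 1e9, "both"):
    print(c.hex())
```
]]></artwork>
        </figure>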

        <t>A Link Bandwidth Extended Community MAY be attached or updated for
        a BGP route upon receipt during Adj-RIB-In processing. The Link
        Bandwidth Extended Community MAY be attached or updated for a BGP
        route's Adj-RIB-Out entry while being advertised to a neighboring BGP
        speaker.</t>

        <t>Implementations MAY provide a configuration option to send
        non-transitive Link Bandwidth Extended Communities on external BGP
        sessions.</t>
      </section>

      <section anchor="Receiver"
               title="Receiver (Receiving Link Bandwidth Extended Community)">
        <t>A BGP receiver MUST be able to process Link Bandwidth Extended
        Communities of both the transitive and non-transitive types. The
        receiver MUST NOT flap or treat the route as malformed based on the
        transitivity of the Link Bandwidth Extended Community and/or the BGP
        session type (internal vs. external).</t>

        <t>Implementations MAY provide configuration to accept non-transitive
        Link Bandwidth Extended Communities from external BGP sessions.</t>

        <t>A BGP update carrying a Link Bandwidth Extended Community with a
        bandwidth value of zero is valid. When all contributing paths have a
        non-zero value in the Link Bandwidth Extended Community, the
        bandwidth values of those paths (or their ratio) can be used as
        weights for weighted load balancing. Details of weighted load
        balancing are outside the scope of this document. When the paths
        have a mix of zero and non-zero values, or all zero values, the
        behavior is determined by local policy. For example, implementations
        MAY exclude the paths with a zero value from weighted load balancing
        as long as at least one path with a non-zero value exists, or they
        MAY fall back to ECMP.</t>
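        <t>Because the treatment of zero values is left to local policy, the
        following Python sketch illustrates the two example policies named
        above: excluding zero-bandwidth paths while a non-zero path exists,
        or falling back to ECMP. It assumes the bandwidth values have already
        been decoded and validated as non-negative; the path identifiers and
        policy names are hypothetical, and how the resulting weights are
        programmed into forwarding is out of scope.</t>

        <figure>
          <artwork align="left" xml:space="preserve"><![CDATA[
```python
def compute_weights(path_bw: dict[str, float],
                    zero_policy: str = "exclude") -> dict[str, float]:
    """Map each path to its share of traffic (shares sum to 1.0).

    path_bw: hypothetical path identifier -> bandwidth value (>= 0).
    zero_policy: "exclude" or "ecmp" (hypothetical policy names).
    """
    if not path_bw:
        return {}
    equal_share = 1.0 / len(path_bw)
    ecmp = {path: equal_share for path in path_bw}
    nonzero_total = sum(bw for bw in path_bw.values() if bw > 0)
    all_nonzero = all(bw > 0 for bw in path_bw.values())
    if all_nonzero or (zero_policy == "exclude" and nonzero_total > 0):
        # Weight by bandwidth; zero-bandwidth paths get a weight of 0.
        return {path: bw / nonzero_total for path, bw in path_bw.items()}
    # All-zero values, or a mix of zero and non-zero under "ecmp".
    return ecmp


print(compute_weights({"pe1": 100.0, "pe2": 300.0}))
print(compute_weights({"pe1": 0.0, "pe2": 200.0}))
print(compute_weights({"pe1": 0.0, "pe2": 200.0}, zero_policy="ecmp"))
```
]]></artwork>
        </figure>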
      </section>

      <section anchor="Re-advertisement" title="Re-advertisement Procedures">
        <t>This section describes the procedures to be followed when a BGP
        speaker receives a route with an attached Link Bandwidth Extended
        Community and subsequently re-advertises that route.</t>

        <section anchor="NextHopSelf"
                 title="Re-advertisement with Next Hop Change">
          <t>When a BGP speaker re-advertises a route received with a Link
          Bandwidth Extended Community and sets the next hop to itself or to
          another address, it MAY, as its default behavior, do any one of the
          following: remove the Link Bandwidth Extended Community,
          re-advertise it unchanged, or regenerate it with an appropriate
          value. Implementations SHOULD provide a local configuration method,
          with per-session granularity, to change this default to any of the
          other options.</t>
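          <t>The three default behaviors and a per-session override can be
          sketched in Python as follows. The enumeration, session names, and
          function are hypothetical; communities are modeled as raw 8-octet
          values, and the regenerated community's type, Global
          Administrator, and bandwidth are assumed to come from local
          configuration, as for an originating speaker.</t>

          <figure>
            <artwork align="left" xml:space="preserve"><![CDATA[
```python
import struct
from enum import Enum


class NextHopChangeAction(Enum):
    """The three behaviors listed above (hypothetical names)."""
    REMOVE = "remove"
    UNCHANGED = "unchanged"
    REGENERATE = "regenerate"


def is_link_bw(community: bytes) -> bool:
    """True for either Link Bandwidth type (0x00/0x40, Sub-Type 0x04)."""
    return (len(community) == 8 and community[0] in (0x00, 0x40)
            and community[1] == 0x04)


def readvertise_with_nh_change(communities: list[bytes],
                               action: NextHopChangeAction,
                               local_asn16: int = 0,
                               new_bw: float = 0.0,
                               regen_type: int = 0x00) -> list[bytes]:
    """Extended communities to send after setting a new next hop."""
    others = [c for c in communities if not is_link_bw(c)]
    if action is NextHopChangeAction.REMOVE:
        return others
    if action is NextHopChangeAction.UNCHANGED:
        return list(communities)
    # REGENERATE follows the originator procedures: the type, Global
    # Administrator, and value come from local configuration.
    regenerated = struct.pack("!BBHf", regen_type, 0x04, local_asn16,
                              new_bw)
    return others + [regenerated]


# Hypothetical per-session configuration overriding a REMOVE default.
DEFAULT_ACTION = NextHopChangeAction.REMOVE
SESSION_ACTIONS = {"peer-a": NextHopChangeAction.UNCHANGED,
                   "peer-b": NextHopChangeAction.REGENERATE}
action_for_peer_c = SESSION_ACTIONS.get("peer-c", DEFAULT_ACTION)
```
]]></artwork>
          </figure>

          <t>Choosing REMOVE as the default here is only an example; this
          document lets each implementation pick its own default.</t>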

          <t>When regenerating the Link Bandwidth Extended Community, the
          procedures outlined in <xref target="Originator"/> apply. See also
          <xref target="LinkBandwidthArithmetic"/> for use in a BGP multipath
          environment.</t>
        </section>

        <section anchor="NextHopUnchanged"
                 title="Re-advertisement with Next Hop Unchanged">
          <t>A BGP speaker that receives a route with a Link Bandwidth
          Extended Community and re-advertises or reflects it without
          changing its next hop SHOULD NOT change the Link Bandwidth Extended
          Community in any way.</t>
        </section>
      </section>

      <section anchor="LinkBandwidthArithmetic"
               title="Link Bandwidth Extended Community Arithmetic and BGP Multipath">
        <t>In a BGP multipath environment, the bandwidth value that is sent
        or re-advertised MAY be calculated from the Link Bandwidth Extended
        Communities of the constituent paths contributing to multipath in the
        Local Routing Information Base (Loc-RIB). This topic is beyond the
        scope of this document; see <xref target="draft-ietf-bess-ebgp-dmz"/>,
        which describes how this can be done in specific scenarios.</t>
      </section>
    </section>

    <section anchor="Error" title="Error Handling">
      <t>If a BGP speaker receives a route with more than one Link Bandwidth
      Extended Community and uses the route to compute weighted load
      balancing, it SHOULD use the extended community with the lowest
      'Bandwidth Value', regardless of transitivity. Implementations MAY
      provide configuration to change this preference.</t>

      <t>When transitive and non-transitive Link Bandwidth Extended
      Communities carry the same 'Bandwidth Value', transitivity does not
      matter for the purpose of computing weighted load balancing or
      programming the Forwarding Information Base (FIB).</t>

      <t>Note that these procedures mean that a BGP speaker, such as a route
      reflector (RR), that reflects a route with the next hop unchanged
      re-advertises the Link Bandwidth Extended Communities received on the
      route as-is, without modification, while following the extended
      community transitivity rules.</t>

      <t>Link Bandwidth Extended Communities with a negative value SHALL be
      ignored and MUST NOT be advertised.</t>

      <t>Link Bandwidth Extended Communities with a zero value MUST NOT be
      considered malformed.</t>

      <t>If any of the paths lacks a valid Link Bandwidth Extended Community,
      ECMP MUST be used instead of weighted load balancing.</t>
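      <t>The rules in this section can be combined into a small Python
      sketch: select the lowest valid 'Bandwidth Value' on each path
      regardless of transitivity, ignore negative values, accept zero, and
      fall back to ECMP if any path has no valid value. The function names
      are hypothetical, and the sketch hard-codes the default "lowest value"
      preference. Treating NaN and infinity as invalid is an assumption not
      stated in this document, and the handling of zero values shown here
      (ECMP when all are zero, a weight of zero otherwise) is only one of the
      local-policy choices described in the receiver procedures.</t>

      <figure>
        <artwork align="left" xml:space="preserve"><![CDATA[
```python
import math
import struct
from typing import Optional


def path_bandwidth(communities: list[bytes]) -> Optional[float]:
    """Lowest valid Link Bandwidth value on a path, or None if none."""
    values = []
    for c in communities:
        if len(c) == 8 and c[0] in (0x00, 0x40) and c[1] == 0x04:
            (bw,) = struct.unpack("!f", c[4:8])
            # Negative values are ignored; zero is valid. Treating NaN
            # and infinity as invalid is an assumption of this sketch.
            if math.isfinite(bw) and bw >= 0:
                values.append(bw)
    # Lowest value wins, whichever transitivity type carried it.
    return min(values) if values else None


def multipath_weights(paths: dict[str, list[bytes]]) -> dict[str, float]:
    """Per-path traffic shares; ECMP if any path lacks a valid value."""
    if not paths:
        return {}
    bws = {p: path_bandwidth(cs) for p, cs in paths.items()}
    ecmp = {p: 1.0 / len(bws) for p in bws}
    if any(bw is None for bw in bws.values()):
        return ecmp  # required ECMP fallback
    total = sum(bws.values())
    if total == 0:
        return ecmp  # all-zero case: local policy, ECMP chosen here
    # Mixed case: zero-bandwidth paths receive a weight of 0.
    return {p: bw / total for p, bw in bws.items()}


route_a = [struct.pack("!BBHf", 0x00, 0x04, 64512, 300.0),
           struct.pack("!BBHf", 0x40, 0x04, 64512, 600.0)]
route_b = [struct.pack("!BBHf", 0x40, 0x04, 64512, 100.0)]
print(multipath_weights({"p1": route_a, "p2": route_b}))
```
]]></artwork>
      </figure>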
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>IANA is requested to update the name of Sub-Type 0x04 in the
      "Transitive Two-Octet AS-Specific Extended Community Sub-Types"
      registry (Type 0x00) as follows:</t>

      <figure>
        <artwork align="left" xml:space="preserve">    Name
    ----
    transitive Link Bandwidth Extended Community</artwork>
      </figure>

      <t>IANA is requested to update the name of Sub-Type 0x04 in the
      "Non-Transitive Two-Octet AS-Specific Extended Community Sub-Types"
      registry (Type 0x40) as follows:</t>

      <figure>
        <artwork align="left" xml:space="preserve">    Name
    ----
    non-transitive Link Bandwidth Extended Community</artwork>
      </figure>

      <t>Both updates are to reference this document.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>This extension to BGP has security implications similar to those
      of BGP Extended Communities <xref target="RFC4360"/>.</t>

      <t>The Link Bandwidth Extended Community conveys bandwidth and capacity
      information that may be sensitive. Exporting this community outside an
      administrative domain can expose private details of network resources.
      When propagating routes carrying a Link Bandwidth Extended Community
      toward an untrusted network or outside an administrative domain, it is
      recommended that operators use policy to filter out this
      community.</t>
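      <t>A minimal Python sketch of such an export filter follows. How a
      peer is classified as trusted is a local policy matter and is simply
      passed in as a flag; the function names are hypothetical, and
      communities are modeled as raw 8-octet values. Both transitivity types
      are removed, since either one conveys the same bandwidth
      information.</t>

      <figure>
        <artwork align="left" xml:space="preserve"><![CDATA[
```python
def is_link_bw(community: bytes) -> bool:
    """True for either Link Bandwidth type (0x00/0x40, Sub-Type 0x04)."""
    return (len(community) == 8 and community[0] in (0x00, 0x40)
            and community[1] == 0x04)


def export_filter(communities: list[bytes],
                  peer_trusted: bool) -> list[bytes]:
    """Strip all Link Bandwidth communities toward untrusted peers."""
    if peer_trusted:
        return list(communities)
    return [c for c in communities if not is_link_bw(c)]
```
]]></artwork>
      </figure>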
    </section>

    <section anchor="Operational Condiderations"
             title="Operational Considerations">
      <section title="Inconsistent Deployment">
        <t>Prior deployments of the feature specified in this document have
        involved implementations that understood only one of the two extended
        community transitivity types. As a result, such implementations would
        treat the other transitivity type in a "ships in the night" fashion.
        The procedures in this document govern how the two transitivity
        types of the Link Bandwidth Extended Community operate together.</t>

        <t>Where networks have deployed a mixture of implementations that
        support this document's procedures for both transitivity types and
        older implementations that understand only one transitivity type,
        inconsistent behavior could result. For example, suppose a route
        received by a BGP speaker carries both a transitive and a
        non-transitive Link Bandwidth Extended Community, and that speaker
        updates only one of them. The other community may then carry an
        inconsistent value, and downstream BGP speakers receiving such routes
        may perform inappropriate weighted load balancing.</t>

        <t>To mitigate such issues, operators who are aware that older
        implementations are present in their networks may wish to take
        action to address such inconsistencies. One option is for the older
        BGP speaker, if it is capable of such filtering, to filter out the
        Link Bandwidth Extended Community of the unsupported transitivity
        type at advertisement time. Alternatively, a receiving BGP speaker,
        knowing that the sending speaker is incapable of such filtering,
        could strip the Link Bandwidth Extended Community type that the
        sender does not support.</t>
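        <t>The receiver-side option can be sketched in Python as follows,
        together with a simple check for the inconsistency described above.
        The function names and the sender_type setting, which records the
        single transitivity type an older neighbor understands, are
        hypothetical; in a real deployment such filtering would presumably
        be configured as import policy rather than written as code.</t>

        <figure>
          <artwork align="left" xml:space="preserve"><![CDATA[
```python
import struct


def is_link_bw(community: bytes) -> bool:
    """True for either Link Bandwidth type (0x00/0x40, Sub-Type 0x04)."""
    return (len(community) == 8 and community[0] in (0x00, 0x40)
            and community[1] == 0x04)


def strip_unsupported_type(communities: list[bytes],
                           sender_type: int) -> list[bytes]:
    """Keep only the Link Bandwidth type the older sender supports.

    sender_type is 0x00 or 0x40: the single transitivity type that the
    (hypothetically configured) older neighbor understands.
    """
    return [c for c in communities
            if not is_link_bw(c) or c[0] == sender_type]


def has_inconsistent_values(communities: list[bytes]) -> bool:
    """True if the Link Bandwidth communities carry differing values."""
    values = {struct.unpack("!f", c[4:8])[0]
              for c in communities if is_link_bw(c)}
    return len(values) > 1
```
]]></artwork>
        </figure>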

        <t>Ideally, this operational consideration is short-lived, lasting
        only until all the routers in the network have been upgraded to
        implementations that consistently support the procedures in this
        document.</t>
      </section>
    </section>

    <section anchor=" Contributors" title="Contributors">
      <author fullname="Kaliraj Vairavakkalai" initials="K."
              surname="Vairavakkalai">
        <organization>Juniper Networks, Inc.</organization>

        <address>
          <postal>
            <street>1133 Innovation Way,</street>

            <city>Sunnyvale</city>

            <region>CA</region>

            <code>94089</code>

            <country>US</country>
          </postal>

          <email>kaliraj@juniper.net</email>
        </address>
      </author>

      <author fullname="Natrajan Venkataraman" initials="N."
              surname="Venkataraman">
        <organization>Juniper Networks, Inc.</organization>

        <address>
          <postal>
            <street>1133 Innovation Way,</street>

            <city>Sunnyvale</city>

            <region>CA</region>

            <code>94089</code>

            <country>US</country>
          </postal>

          <email>natv@juniper.net</email>
        </address>
      </author>

      <author fullname="Rex Fernando" initials="R." surname="Fernando">
        <organization>Cisco Systems</organization>

        <address>
          <postal>
            <street>170 W. Tasman Drive</street>

            <city>San Jose</city>

            <region>CA</region>

            <code>95134</code>

            <country>US</country>
          </postal>

          <email>rex@cisco.com</email>
        </address>
      </author>
    </section>

    <section anchor="Acknowledgments" title="Acknowledgments">
      <t>The authors would like to thank Yakov Rekhter, Srihari Sangli, and
      Dan Tappan for proposing unequal-cost load balancing as one possible
      application of the extended community attribute. The authors would
      also like to thank Jeff Haas for the discussions and for providing
      text for the operational considerations.</t>

      <t>The authors would like to thank Bruno Decraene, Robert Raszuk, Joel
      Halpern, Aleksi Suhonen, Randy Bush, Stephane Litkowski, Mankamana
      Mishra, Moshiko Nayman, Keon Vafai, Ketan Talaulikar, Yingzhen Qu, Anoop
      Ghanwani, Dongjie (Jimmy) and John Scudder for their comments and
      contributions.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"?>

      <?rfc include="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4360.xml"?>

      <?rfc include="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.6793.xml"?>

      <reference anchor="IEEE.754-2019"
                 target="https://ieeexplore.ieee.org/document/8766229">
        <front>
          <title abbrev="IEEE-754-2019">IEEE Standard for Floating-Point
          Arithmetic</title>

          <author>
            <organization showOnFrontPage="true">IEEE</organization>
          </author>

          <date day="22" month="July" year="2019"/>
        </front>
      </reference>
    </references>

    <references title="Informative References">
      <reference anchor="draft-ietf-bess-ebgp-dmz"
                 target="https://tools.ietf.org/html/draft-ietf-bess-ebgp-dmz">
        <front>
          <title abbrev="ebgp-dmz">Cumulative DMZ Link Bandwidth and
          load-balancing</title>

          <author fullname="Satya Ranjan Mohanty" initials="S."
                  surname="Mohanty"/>

          <date day="20" month="07" year="2025"/>
        </front>
      </reference>
    </references>

    <section anchor="Appendix" title="Document History">
      <t>The BGP Link Bandwidth Extended Community has evolved over several
      versions of this Internet-Draft. Versions up to and including
      draft-ietf-idr-link-bandwidth-08 supported only the non-transitive
      Link Bandwidth Extended Community. Starting with
      draft-ietf-idr-link-bandwidth-09, both the transitive and
      non-transitive Link Bandwidth Extended Communities are supported.</t>

      <t>A BGP speaker (sender or receiver) needs to be upgraded to support
      the procedures defined in this document to provide full
      interoperability for both the transitive and non-transitive Link
      Bandwidth Extended Communities. To simplify implementations, providing
      interoperability by upgrading only the route reflector (RR) is not a
      goal.</t>
    </section>
  </back>
</rfc>
