<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-kaliraj-idr-multinexthop-attribute-02"
     ipr="trust200902">
  <front>
    <title abbrev="BGP MultiNexthop attribute">BGP MultiNexthop
    attribute</title>

    <author fullname="Kaliraj Vairavakkalai" initials="K."
            surname="Vairavakkalai">
      <organization>Juniper Networks, Inc.</organization>

      <address>
        <postal>
          <street>1194 N. Mathilda Ave.</street>

          <city>Sunnyvale</city>

          <region>CA</region>

          <code>94089</code>

          <country>US</country>
        </postal>

        <email>kaliraj@juniper.net</email>
      </address>
    </author>

    <author fullname="Minto Jeyananth" initials="M." surname="Jeyananth">
      <organization>Juniper Networks, Inc.</organization>

      <address>
        <postal>
          <street>1194 N. Mathilda Ave.</street>

          <city>Sunnyvale</city>

          <region>CA</region>

          <code>94089</code>

          <country>US</country>
        </postal>

        <email>minto@juniper.net</email>
      </address>
    </author>

    <author fullname="Gyan Mishra" initials="G." surname="Mishra">
      <organization>Verizon Communications Inc.</organization>

      <address>
        <postal>
          <street>13101 Columbia Pike</street>

          <city>Silver Spring</city>

          <region>MD</region>

          <code>20904</code>

          <country>USA</country>
        </postal>

        <email>gyan.s.mishra@verizon.com</email>
      </address>
    </author>

    <date day="28" month="December" year="2021"/>

    <abstract>
      <t>Today, a BGP speaker can advertise one nexthop for a set of NLRIs in
      an Update. This nexthop can be encoded in either the BGP-Nexthop
      attribute (code 3), or inside the MP_REACH attribute (code 14).</t>

      <t>For cases where multiple nexthops need to be advertised, BGP-Addpath
      is used. Though Addpath provides a basic ability to advertise multiple
      nexthops, it does not allow the sender to specify the desired
      relationship between the nexthops being advertised, e.g.,
      relative preference or type of load-balancing. These are local decisions
      at the receiving speaker based on local configuration and path-selection
      between the various additional-paths, which may tie-break on some
      arbitrary step like Router-Id or BGP nexthop address.</t>

      <t>Some scenarios with a BGP-free core may benefit from having a
      mechanism where an egress node can signal multiple nexthops, along with
      their relationship, in one BGP route, to ingress nodes. This document
      defines a new BGP attribute "MultiNexthop (MNH)" that can be used for
      this purpose.</t>

      <t>This attribute can be used for both labeled and unlabeled BGP
      families. The MNH can be used to advertise an MPLS label along with the
      nexthop for unlabeled families (e.g., Inet Unicast, Inet6 Unicast), so
      that mechanisms at the transport layer can work uniformly on labeled and
      unlabeled BGP families. Service route scale can be confined closer to
      the service edge nodes, making the transport layer nodes light and
      nimble: they do not hold any service route state, only service
      end-point state.</t>

      <t>The MNH plays a different role in the "downstream allocation" scenario
      than in the "upstream allocation" scenario. For example, for RFC 8277
      families that advertise downstream-allocated labels, the MNH can play
      the "Label Descriptor" role, describing the forwarding semantics of the
      label being advertised. This can be useful in network visualization and
      controller-based traffic engineering (e.g., EPE).</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119">RFC 2119</xref>.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>Today, a BGP speaker can advertise one nexthop for a set of NLRIs in
      an Update. This nexthop can be encoded in either the top-level
      BGP-Nexthop attribute (code 3), or inside the MP_REACH attribute (code
      14).</t>

      <t>For cases where multiple nexthops need to be advertised, BGP-Addpath
      is used. Though Addpath provides a basic ability to advertise multiple
      nexthops, it does not allow the sender to specify the desired
      relationship between the nexthops being advertised, e.g.,
      relative ordering, type of load-balancing, or fast-reroute. These are
      local decisions at the receiving node based on local configuration and
      path-selection between the various additional-paths, which may tie-break
      on some arbitrary step like Router-Id or BGP nexthop address.</t>

      <t>Some scenarios with a BGP-free core may benefit from having a
      mechanism where an egress node can signal multiple nexthops, along with
      their relationship, to ingress nodes. This document defines a new BGP
      attribute "MultiNexthop (MNH)" that can be used for this purpose.</t>

      <t>This attribute can be used for both labeled and unlabeled BGP
      families. The MNH can be used to advertise an MPLS label along with the
      nexthop for unlabeled families (e.g., Inet Unicast, Inet6 Unicast), so
      that mechanisms at the transport layer can work uniformly on labeled and
      unlabeled BGP families. Service route scale can be confined closer to
      the service edge nodes, making the transport layer nodes light and
      nimble: they do not hold any service route state, only service
      end-point state.</t>

      <t>The MNH plays a different role in the "downstream allocation" scenario
      than in the "upstream allocation" scenario. For example, for RFC 8277
      families that advertise downstream-allocated labels, the MNH can play
      the "Label Descriptor" role, describing the forwarding semantics of the
      label being advertised. This can be useful in network visualization and
      controller-based traffic engineering (e.g., EPE).</t>

      <t>A new BGP capability (<xref target="RFC3392"/>) called "MultiNexthop
      (MNH)" is defined with type code: IANA TBD. This capability is used to
      express the ability to send and receive the MNH attribute.</t>

      <t/>
    </section>

    <section title="Use-case examples">
      <t/>

      <section title="Optimal forwarding exit-points signaling to ingress-node">
        <t>In a BGP-free core, one BGP route containing this attribute can
        dynamically signal to the ingress node how traffic should be
        load-balanced towards a set of exit nodes.</t>

        <t>For example, for prefix1, perform equal-cost load-balancing towards
        exit-nodes A and B, whereas for prefix2, perform unequal-cost
        load-balancing (40%, 30%, 30%) towards exit-nodes A, B and C.</t>

        <t>As another example, for prefix1, use PE1 as the primary nexthop
        and PE2 as a backup nexthop.</t>
      </section>

      <section title="Choosing a received label based on its forwarding-semantic at advertising node">
        <t>In the downstream label allocation case, the MNH plays the role of
        "Label Descriptor" and describes the forwarding treatment given to the
        label at the advertising speaker. The receiving speaker can benefit
        from this information as in the following examples:</t>

        <t>- For a Prefix, a label with an FRR-enabled nexthop-set can be
        preferred to another label with a nexthop-set that doesn't provide
        FRR.</t>

        <t>- For a Prefix, a label pointing to a 10g nexthop can be preferred
        to another label pointing to a 1g nexthop.</t>

        <t>- A set of advertised labels can be aggregated if they have the
        same forwarding semantics (e.g., VPN per-prefix-label case).</t>
      </section>

      <section title="Signaling desired forwarding behavior when installing MPLS Upstream labels at receiving node">
        <t>In the upstream label allocation case, the receiving speaker's
        forwarding-state can be controlled by the advertising speaker, thus
        enabling a standardized API to program the desired MPLS
        forwarding-state at the receiving node. This is described in <xref
        target="MPLS-NAMESPACES"/>.</t>
      </section>

      <section title="Load-balancing over EBGP parallel links">
        <t>Consider N parallel links between two EBGP speakers. There are
        different models possible to do load balancing over these links:<list>
            <t>N single-hop EBGP sessions over the N links. Interface
            addresses are used as next-hops. N copies of the RIB are exchanged
            to form N-way ECMP paths. The routes advertised on the N sessions
            can be tagged with the Link Bandwidth community to perform
            weighted ECMP.</t>

            <t>1 multi-hop EBGP session between loopback addresses, reachable
            via static route over the N links. Loopback addresses are used as
            next-hops. 1 copy of the RIB is exchanged with the loopback
            address as nexthop, and a static route to the loopback address
            provides the desired N-way ECMP path. M loopbacks are
            configured in this model, to achieve M different load balancing
            schemes: ECMP, weighted ECMP, Fast-reroute enabled paths etc.</t>

            <t>1 multi-hop EBGP session between loopback addresses, reachable
            via static route over the N links. Interface addresses are used as
            next-hops, without using additional loopbacks. 1 copy of the RIB
            is exchanged with MNH attribute to form N-way ECMP paths, weighted
            ECMP, Fast-reroute backup paths etc. BFD may be run towards these
            directly connected BGP nexthops to detect liveness.</t>
          </list></t>
      </section>

      <section title="Flowspec routes with multiple Redirect-IP nexthops">
        <t>Existing protocol machinery can benefit from the
        ability of MNH to clearly specify fallback behavior when multiple
        nexthops are involved. One example is the scenario described in <xref
        target="FLWSPC-REDIR-IP"/> where multiple Redirect-to-IP nexthop
        addresses exist for a Flowspec prefix. In such a scenario, the
        receiving speakers may redirect the traffic to different nexthops,
        based on variables like IGP-cost. If, instead, the MNH is used to
        specify the redirect-to-IP nexthops, then the order of preference
        between the different nexthops can be clearly specified using one
        flowspec route carrying a MNH that contains those
        nexthop-addresses in the desired preference-order. As a result,
        irrespective of IGP-cost, the receiving speakers will redirect the
        flow towards the same traffic collector device.</t>
      </section>

      <section title="Color-Only resolution nexthop">
        <t>Another piece of existing protocol machinery, which manufactures
        nexthop addresses from an overloaded extended color community, is
        specified in <xref target="SRTE-COLOR-ONLY"/>. In a way, the color
        field is overloaded to carry one anycast BGP next-hop with
        pre-specified fallback options. This approach gives us only two
        next-hops to play with: the 'BGP nexthop address' and the 'Color-only
        nexthop'.</t>

        <t>Instead, the MNH could be used to achieve the same result with more
        flexibility. Multiple BGP nexthops can be carried, each resolving over
        a desired Transport class (Color), and with customizable fallback
        order. The solution will also work for non-SRTE networks.</t>
      </section>
    </section>

    <section title="The &quot;MultiNexthop (MNH)&quot; BGP attribute encoding">
      <t>"MultiNexthop (MNH)" is a new BGP optional non-transitive attribute
      (code TBD) that can be used to convey multiple nexthops to a
      BGP-speaker. This attribute describes forwarding semantics using one or
      more Nexthop-Forwarding-Semantics TLVs.</t>

      <figure>
        <artwork>     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |1 0 0 1(Flags) |Attr. Type Code|          Length               |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |     MNH-Flags                 |   PNH-Len     |  ..Advertising|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | PNH Address /32 or /128..     |       Num-Nexthops            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |     ...one or more "Nexthop-Forwarding-Semantics TLV"...      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        </artwork>

        <postamble>Fig 1: MultiNexthop - BGP Attribute</postamble>
      </figure>

      <t/>

      <figure>
        <artwork>- Flags 
        BGP Path-attribute flags. 1001 to indicate Optional
        Non-Transitive, Extended-length field.

- Attr. Type Code
        IANA TBD.

- Length
       Two bytes field stating length of attribute value in bytes.

- MNH-Flags
       16 bit flag (UR..R)
       Only one bit MSB is defined currently, others are reserved.
           R: Reserved 
           U: 1 means the Upstream-allocation, attribute describes 
              forwarding state desired at receiving speaker. 
              0 means the Downstream-allocation, attribute describes
              forwarding state present at advertising-speaker.
- PNH-Len
       Length in bits of the Advertising PNH address: 32 for IPv4 or
       128 for IPv6.

- PNH-address
       BGP Protocol Nexthop address (Len = 32 or 128) advertised in NEXT_HOP or
       MP_REACH_NLRI attr. Used to sanity-check this attribute.

- Num-Nexthops
       Number of nexthop addresses carried in the MNH.
       &gt;1 if ECMP or Alternate-paths. </artwork>
      </figure>
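
      <t>As an illustration only, the fixed part of the attribute shown in
      Fig 1 could be packed and parsed roughly as in the following Python
      sketch. The type code value 255 is a placeholder (the real code point
      is IANA TBD), the function names encode_mnh and decode_mnh are
      hypothetical, and the sketch assumes that Num-Nexthops is a 2-octet
      field as drawn and that TLVs are supplied already encoded.</t>

      <figure>
        <artwork><![CDATA[
```python
import ipaddress
import struct

ATTR_FLAGS = 0x90      # 1001 0000: Optional, Non-Transitive, Extended-length
MNH_TYPE_CODE = 255    # placeholder only; the real code point is IANA TBD
U_BIT = 0x8000         # MSB of the 16-bit MNH-Flags field


def encode_mnh(pnh, num_nexthops, tlvs=b"", upstream=False):
    """Build the attribute: header fields followed by pre-encoded TLVs."""
    addr = ipaddress.ip_address(pnh)
    mnh_flags = U_BIT if upstream else 0
    # MNH-Flags (2 octets), PNH-Len in bits (1 octet), PNH address
    value = struct.pack("!HB", mnh_flags, addr.max_prefixlen) + addr.packed
    # Num-Nexthops (assumed 2 octets, as drawn in Fig 1), then the TLVs
    value += struct.pack("!H", num_nexthops) + tlvs
    return struct.pack("!BBH", ATTR_FLAGS, MNH_TYPE_CODE, len(value)) + value


def decode_mnh(data):
    """Parse the fixed part; the TLVs are returned as raw bytes."""
    _flags, _type_code, length = struct.unpack_from("!BBH", data, 0)
    value = data[4:4 + length]
    if len(value) != length:
        raise ValueError("truncated MNH attribute")
    mnh_flags, pnh_len = struct.unpack_from("!HB", value, 0)
    if pnh_len not in (32, 128):
        raise ValueError("PNH-Len must be 32 or 128")
    end = 3 + pnh_len // 8
    pnh = ipaddress.ip_address(value[3:end])
    (num_nexthops,) = struct.unpack_from("!H", value, end)
    return {
        "upstream": bool(mnh_flags & U_BIT),
        "pnh": str(pnh),
        "num_nexthops": num_nexthops,
        "tlvs": value[end + 2:],
    }
```
]]></artwork>
      </figure>

      <t>For instance, decode_mnh(encode_mnh("192.0.2.1", 2, upstream=True))
      returns the same PNH, nexthop count and U-bit that were encoded.</t>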

      <t/>

      <t>Sec 3.2 describes the Nexthop-Forwarding-Semantics TLV.</t>

      <section title="Operations">
        <section title="BGP Capability for MNH attribute">
          <t>A new BGP capability <xref target="RFC3392"/> called
          "MultiNexthop (MNH)" is defined with type code: IANA TBD. The MNH
          attribute MUST NOT be sent to a BGP speaker that has not advertised
          the MNH capability. A BGP speaker MUST ignore the MNH attribute
          received from a peer which has not advertised the MNH capability.</t>
        </section>

        <section title="Scope of use, and propagation">
          <t>The MNH attribute is intended to be used in a BGP free core,
          between egress and ingress BGP speakers that understand this
          attribute. </t>

          <t>It is also necessary to avoid unintentionally leaking the
          attribute to another AS over an EBGP session, via a BGP speaker
          that does not understand the MNH attribute.</t>

          <t>To achieve this, the attribute is defined as "optional
          non-transitive", and uses a new BGP capability. If a MNH-attribute
          is received by a PE BGP-speaker that does not understand it, the
          optional non-transitive nature avoids unintentionally propagating it
          towards EBGP-peers. </t>

          <t>This also means that a RR needs to be upgraded to support this
          attribute before any PEs in the network can make use of it. When a
          RR receives the MNH-attribute from a client that supports the
          attribute, it propagates the attribute as-is when reflecting the
          route with nexthop unchanged. </t>

          <t>When a BGP speaker receives the MNH-attribute from another
          speaker that did not advertise support of the attribute, the
          attribute is ignored.</t>

          <t>The MNH capability provides additional protection
          against receiving this attribute from EBGP peers when not
          intended.</t>
        </section>

        <section title="Interaction of MNH with Nexthop (in attr-code 3, 14)">
          <t>When adding a MultiNexthop attribute to an advertised BGP route,
          the speaker MUST put the same next-hop address in the Advertising
          PNH field as it put in the Nexthop field inside NEXT_HOP attribute
          or MP_REACH_NLRI attribute. Any speaker that recognizes this
          attribute and changes the PNH while re-advertising the route MUST
          remove the MultiNexthop-Attribute in the re-advertisement. The
          speaker MAY however add a new MultiNexthop-Attribute to the
          re-advertisement; while doing so the speaker MUST record in the
          "Advertising-PNH" field the same next-hop address as used in
          NEXT_HOP field or MP_REACH_NLRI attribute.</t>

          <t>A speaker receiving a MNH attribute SHOULD ignore it if the
          next-hop address contained in Advertising-PNH field is not the same
          as the next-hop address contained in NEXT_HOP field or MP_REACH_NLRI
          field.</t>
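
          <t>The receive-side checks described above (the capability check
          and the Advertising-PNH comparison) might be combined as in the
          following sketch. The function name accept_mnh and its parameters
          are hypothetical, and the sketch treats the SHOULD as "ignore",
          which is only one possible local policy.</t>

          <figure>
            <artwork><![CDATA[
```python
import ipaddress


def accept_mnh(peer_advertised_mnh_capability, advertising_pnh,
               route_nexthop):
    """Return True if a received MNH attribute is used, False if ignored."""
    if not peer_advertised_mnh_capability:
        # The peer never advertised the MNH capability: ignore.
        return False
    # Compare parsed addresses so that textual variants of the same
    # address (e.g. compressed vs. expanded IPv6) still match.
    return (ipaddress.ip_address(advertising_pnh)
            == ipaddress.ip_address(route_nexthop))
```
]]></artwork>
          </figure>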
        </section>

        <section title="Interaction with Addpath">
          <t><xref target="ADDPATH-GUIDELINES"/> suggests the following:</t>

          <t>"Diverse path: A BGP path associated with a different BGP
          next-hop and BGP router than some other set of paths. The BGP router
          associated with a path is inferred from the ORIGINATOR_ID attribute
          or, if there is none, the BGP Identifier of the peer that advertised
          the path."</t>

          <t>When selecting "diverse paths" for ADD_PATH as specified above,
          the MNH attribute should also be compared if it exists, to determine
          if two routes have "different BGP next-hop".</t>
        </section>

        <section title="Path-selection considerations">
          <t>When tie-breaking in path selection as described in RFC 4271,
          Section 9.1.2.2, step (e), viz. the "IGP cost to nexthop", use
          the highest cost among the nexthop-legs present in this
          attribute.</t>
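
          <t>A minimal sketch of this rule follows. It assumes the IGP cost
          of every nexthop-leg has already been resolved to an integer; the
          function names are hypothetical.</t>

          <figure>
            <artwork><![CDATA[
```python
def mnh_igp_cost(leg_costs):
    """IGP cost used at step (e): the highest cost among the legs."""
    if not leg_costs:
        raise ValueError("MNH attribute carries no nexthop legs")
    return max(leg_costs)


def prefer_by_igp_cost(legs_a, legs_b):
    """Compare two routes at step (e); the lower cost wins."""
    cost_a, cost_b = mnh_igp_cost(legs_a), mnh_igp_cost(legs_b)
    if cost_a == cost_b:
        return "tie"
    return "a" if cost_a < cost_b else "b"
```
]]></artwork>
          </figure>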
        </section>

        <section title="MNH-Flags U bit, denoting upstream/downstream semantics">
          <t>The U-bit being set indicates that this attribute describes what
          the forwarding semantics of an Upstream-allocated label at the
          receiving-speaker should be. All other bits in MNH-Flags are
          currently reserved; they MUST be set to 0 by the sender and MUST be
          ignored by the receiver.</t>

          <t>This attribute can be used for both labeled and unlabeled BGP
          families.</t>

          <t>A MultiNexthop attribute with U=0 is said to play the "Label
          Descriptor" role. A BGP speaker advertising a downstream-allocated
          label-route MAY add this attribute to the BGP route Update, to
          "describe" to the receiving speaker what the label's forwarding
          semantics at the sending speaker are.</t>

          <t>Today, the semantics of a downstream-allocated label are known
          only to the egress-node advertising the label. The speaker receiving
          the label-binding doesn't know what the label's forwarding semantic
          at the advertiser is. In some environments, it may be useful to
          convey this information to the receiving speaker. This may help in
          better debugging and manageability, or enable the receiving speaker,
          which could also be some centralized controller, to make better
          decisions about which label to use, based on the label's
          forwarding-semantic.</t>

          <t>While doing upstream-label allocation, this attribute (U-bit Set)
          can be used to convey what the forwarding-semantics at the receiving
          node should be. Details of the BGP protocol extensions required for
          signaling upstream-label allocation are out of scope of this
          document, and are described in <xref target="MPLS-NAMESPACES"/>.</t>

          <t>In the rest of this document, the term "Label" means a
          downstream-allocated label, unless it is explicitly specified as an
          upstream-allocated label.</t>

          <t>When using the MultiNexthop attribute for IP-routes, the U-bit is
          Set, since IP prefixes are by nature upstream allocated.</t>
        </section>
      </section>

      <section title="Nexthop Forwarding Semantics TLV">
        <t>Each Forwarding-Semantics TLV expresses a nexthop leg's forwarding
        action, i.e., a "FwdAction" with an associated Nexthop. The types of
        actions defined by this TLV are given below. The "Nexthop-Leg" field
        takes appropriate values based on the FwdAction.</t>

        <figure>
          <preamble/>

          <artwork>  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
 |             FwdAction         |            Len                |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |             ...Nexthop-Leg Descriptor-TLV...                  |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          </artwork>

          <postamble>Fig 2: Nexthop Forwarding Semantics TLV</postamble>
        </figure>

        <t/>

        <figure>
          <artwork> FwdAction         Meaning
 ---------      -------------
       1        Forward 
       2        Pop-And-Forward 
       3        Swap 
       4        Push 
       5        Pop-And-Lookup
       6        Replicate
 
 - Len
    Length of Nexthop Forwarding Semantics TLV including all
    Nexthop-Leg Descriptor TLVs.
</artwork>
        </figure>

        <t/>

        <t>The meaning of most of the above FwdActions is well
        understood. FwdAction 1 is applicable for both IP and MPLS routes.
        FwdActions 2-5 are applicable for MPLS routes only. FwdActions 1 and 6
        are applicable for Flowspec routes, for Redirect and Mirror
        actions.</t>

        <t>The "Forward" action means forward the IP/MPLS packet with the
        destination prefix (IP-dest-addr/MPLS-label) value unchanged. For IP
        routes, this is the forwarding-action given for next-hop addresses
        contained in BGP path-attributes: Nexthop (code 3) or MP_REACH_NLRI
        (code 14). For MPLS routes, usage of this action is equivalent to SWAP
        with same label-value; one such usage is explained in <xref
        target="MPLS-NAMESPACES"/> when Upstream-label-allocation is in
        use.</t>

        <t>The "Pop-And-Forward" action means Pop the MPLS-label and forward
        the payload towards the Nexthop IP-address specified in the sub-TLV,
        using appropriate encapsulation to reach the Nexthop.</t>

        <t>The "Pop-And-Lookup" action may result in a MPLS-lookup or an
        upper-layer header (like IPv4, IPv6) lookup, depending on whether the
        label that was popped was the bottom of stack label.</t>

        <t>If an incompatible FwdAction is received for a prefix-type, or an
        unsupported FwdAction is received, it is considered a semantic-error
        and MUST be dealt with as explained in section 5.</t>
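
        <t>The applicability rules above could be checked on receipt along
        the lines of the following sketch. The route-type keys ("ip", "mpls",
        "flowspec"), the ALLOWED table and the function name are illustrative
        assumptions; the table is one reading of the text above, and the
        ValueError merely stands in for the semantic-error handling that this
        document defines elsewhere.</t>

        <figure>
          <artwork><![CDATA[
```python
from enum import IntEnum


class FwdAction(IntEnum):
    FORWARD = 1
    POP_AND_FORWARD = 2
    SWAP = 3
    PUSH = 4
    POP_AND_LOOKUP = 5
    REPLICATE = 6


# One reading of the applicability text; the keys are illustrative names.
ALLOWED = {
    "ip": {FwdAction.FORWARD},
    "mpls": {FwdAction.FORWARD, FwdAction.POP_AND_FORWARD,
             FwdAction.SWAP, FwdAction.PUSH, FwdAction.POP_AND_LOOKUP},
    "flowspec": {FwdAction.FORWARD, FwdAction.REPLICATE},
}


def check_fwd_action(route_type, code):
    """Return the FwdAction, or raise ValueError on a semantic error."""
    try:
        action = FwdAction(code)
    except ValueError:
        raise ValueError(f"unsupported FwdAction {code}") from None
    if action not in ALLOWED[route_type]:
        raise ValueError(
            f"{action.name} is not valid for {route_type} routes")
    return action
```
]]></artwork>
        </figure>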
      </section>

      <section title="Nexthop-Leg Descriptor TLV">
        <t>The Nexthop-Leg Descriptor TLV describes various attributes of the
        Nexthop-legs that the FwdAction is associated with.</t>

        <t/>

        <figure>
          <preamble/>

          <artwork>  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |           NhopDescrType       |            Len                |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |         Flags                 |      Relative-Preference      |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                 ..Nexthop Attributes SubTLV..                 |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ 
 |                 ..Nexthop Attributes SubTLV..                 |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          </artwork>

          <postamble>Fig 3: Nexthop-Leg Descriptor TLV</postamble>
        </figure>

        <figure>
          <artwork>  NhopDescrType  Meaning
  -------------  ---------
     1           IPv4-nexthop
     2           IPv6-nexthop
     3           Labeled-IP-Nexthop 
     4           Forwarding-Context-Nexthop


- Len (2 octets)
    Length in bytes of Nexthop-Leg Descriptor TLV, including Flags,
    Relative-Preference and all Nexthop Attributes SubTLVs.

- Flags
     2 octets. Must send zero. Must ignore on receive.
         
- Relative-Preference
     Unsigned 2 octet integer specifying relative order or
     preference, to use in FIB. Use in FIB all usable legs with the lowest
     Relative-Preference value. If multiple legs share that value, form ECMP.
          </artwork>
        </figure>
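
        <t>The Relative-Preference rule can be sketched as follows. The Leg
        class and the "usable" flag are hypothetical stand-ins for whatever
        liveness or resolution state an implementation keeps per leg.</t>

        <figure>
          <artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class Leg:
    nexthop: str
    relative_preference: int
    usable: bool = True


def select_fib_legs(legs):
    """Return the usable legs sharing the lowest Relative-Preference.

    More than one leg in the result means the legs form an ECMP set.
    """
    usable = [leg for leg in legs if leg.usable]
    if not usable:
        return []
    best = min(leg.relative_preference for leg in usable)
    return [leg for leg in usable if leg.relative_preference == best]
```
]]></artwork>
        </figure>

        <t>With two legs at preference 1 and one leg at preference 2, the
        two preference-1 legs are installed as ECMP; the preference-2 leg is
        selected only once neither of them is usable.</t>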
      </section>

      <section title="Nexthop Attributes Sub-TLV">
        <t/>

        <figure>
          <artwork> SubTLV type       Meaning
 -----------      ---------- 
      1            IP-Address
      2            Labeled-IP-Nexthop
      3            Transport Class ID (Color)
      4            Bandwidth
      5            Load-Balance-Factor
      6            Forwarding-context Name 
      7            Forwarding-context Route-Target
      </artwork>
        </figure>

        <t/>

        <section title="IP Address">
          <t/>

          <figure>
            <preamble/>

            <artwork>  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |      Attr SubTLV Type = 1     |      Len (2 bytes)            |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |        Flags (2 bytes)        |    PfxLen    |      ..IPv4 or |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  IPv6 Address ..              |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

- Len (2 octets)
    Length in bytes of remaining portion of SubTLV.

- Flags
     2 octets. Must send zero. Must ignore on receive.

- PfxLen (1 octet)
    Length in bits of Nexthop IP-address (32 or 128)

- IPv4 or IPv6 Address
    Remaining bytes in sub-TLV are the 32 bit or 128 bit Nexthop address.    
       </artwork>

            <postamble>Fig 4: IP-Address attribute sub-TLV</postamble>
          </figure>

          <t/>

          <t>This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV
          with FwdAction of Pop-And-Forward or Forward.</t>
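
          <t>As a non-normative illustration of Fig 4, the sub-TLV could be
          encoded and decoded as below. The function names are hypothetical,
          and Len is taken to count the octets that follow the Len field
          (Flags, PfxLen and the address).</t>

          <figure>
            <artwork><![CDATA[
```python
import ipaddress
import struct

IP_ADDRESS_SUBTLV = 1


def encode_ip_subtlv(address):
    """Encode an IP-Address sub-TLV with Flags set to zero."""
    addr = ipaddress.ip_address(address)
    # Flags (2 octets, zero), PfxLen in bits (1 octet), then the address
    body = struct.pack("!HB", 0, addr.max_prefixlen) + addr.packed
    return struct.pack("!HH", IP_ADDRESS_SUBTLV, len(body)) + body


def decode_ip_subtlv(data):
    """Return the nexthop address carried in an IP-Address sub-TLV."""
    subtype, length = struct.unpack_from("!HH", data, 0)
    body = data[4:4 + length]
    if subtype != IP_ADDRESS_SUBTLV or length < 3 or len(body) != length:
        raise ValueError("not a well-formed IP-Address sub-TLV")
    _flags, pfx_len = struct.unpack_from("!HB", body, 0)  # Flags ignored
    if pfx_len not in (32, 128) or length != 3 + pfx_len // 8:
        raise ValueError("PfxLen does not match the address length")
    return str(ipaddress.ip_address(body[3:]))
```
]]></artwork>
          </figure>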
        </section>

        <section title="Labeled IP nexthop">
          <t/>

          <figure>
            <preamble/>

            <artwork>
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |      Attr SubTLV Type = 2     |      Len (2 bytes)            |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |        Flags (2 bytes)        |        Label (20 bits)        | 
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |       |Rsrv |S|    PfxLen     |     ..IPv4 or IPv6 Address .. |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


- Len (2 octets)
    Length in bytes of remaining portion of SubTLV.

- Flags (2 octets):
     ELC (MSB bit): indicates if this egress NH is Entropy Label Capable.
     Remaining bits are Reserved. Must send zero. Must ignore on receive.

- Label:
     The Label field is a 20-bit field containing an MPLS label value
     (see [RFC3032]).

- Rsrv:
      This 3-bit field SHOULD be set to zero on transmission and
      MUST be ignored on reception.

- S:
      This 1-bit field MUST be set to one on last label being pushed.

- PfxLen (1 octet)
    Length in bits of Nexthop IP-address (32 or 128)

- IPv4 or IPv6 Address
    Remaining bytes in sub-TLV are the 32 bit or 128 bit Nexthop address.
            </artwork>

            <postamble>Fig 5: "Labeled nexthop" attribute sub-TLV</postamble>
          </figure>

          <t/>

          <t>This sub-TLV would be valid with Nexthop-Forwarding-Semantics
          TLV with FwdAction of Swap or Push.</t>
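
          <t>The 20-bit Label, 3-bit Rsrv and 1-bit S fields of Fig 5 occupy
          three octets. A sketch of how they might be packed is shown below;
          the function names are hypothetical, and Len is again assumed to
          count the octets following the Len field.</t>

          <figure>
            <artwork><![CDATA[
```python
import ipaddress
import struct

LABELED_NEXTHOP_SUBTLV = 2
ELC = 0x8000  # MSB of the sub-TLV Flags field: Entropy Label Capable


def encode_labeled_nexthop(label, address, bottom_of_stack=True,
                           elc=False):
    """Encode a Labeled-IP-Nexthop sub-TLV carrying a single label."""
    if not 0 <= label < 2 ** 20:
        raise ValueError("MPLS label must fit in 20 bits")
    addr = ipaddress.ip_address(address)
    # 24-bit word: Label (20 bits) | Rsrv (3 bits, zero) | S (1 bit)
    word = (label << 4) | int(bottom_of_stack)
    body = struct.pack("!H", ELC if elc else 0)
    body += word.to_bytes(3, "big")
    body += bytes([addr.max_prefixlen]) + addr.packed
    return struct.pack("!HH", LABELED_NEXTHOP_SUBTLV, len(body)) + body


def decode_label_word(three_octets):
    """Split the 3-octet label word into (label, S bit)."""
    word = int.from_bytes(three_octets, "big")
    return word >> 4, bool(word & 1)
```
]]></artwork>
          </figure>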
        </section>

        <section title="Transport Class ID (Color)">
          <t>The Nexthop can be associated with a Transport Class, so as to
          resolve a path that satisfies required Transport tunnel
          characteristics. Transport Class is defined in <xref
          target="BGP-CT"/>.</t>

          <t>Transport Class is a per-nexthop scoped attribute. Without MNH,
          the Transport class is applied to the nexthop IP-address encoded in
          the BGP-Nexthop attribute (code 3), or inside the MP_REACH attribute
          (code 14). With MNH, the Transport Class can be specified per
          Nexthop-Leg TLV. It is applied to the IP-address encoded in the
          Nexthop Attribute Sub-TLVs of type "IP Address", "Labeled IP
          nexthop". </t>

          <t>The format of the Transport Class ID Sub-TLV is as follows:</t>

          <figure>
            <preamble/>

            <artwork>  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |      Attr SubTLV Type = 3     |      Len (2 bytes)            |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                    Transport Class ID (4 bytes)               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

- Len (2 octets)
    Length in bytes of remaining portion of SubTLV.

- Transport Class ID (Color):
    This is a 32 bit identifier, associated with the Nexthop address.
    The Nexthop specified in the "IP-Address" or "Labeled-IP-Nexthop"
    sub-TLV is resolved over tunnels of this color.
    Defined in [BGP-CT] [draft-kaliraj-idr-bgp-classful-transport-planes]
</artwork>

            <postamble>Fig 6: "Transport Class ID (Color)" attribute
            sub-TLV</postamble>
          </figure>

          <t/>

          <t>This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV
          with FwdAction of Forward, Swap or Push.</t>
        </section>

        <section title="Available Bandwidth">
          <t/>

          <figure>
            <preamble/>

            <artwork>  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |      Attr SubTLV Type = 4     |      Len (2 bytes)            |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                   Bandwidth (8 octets)                        |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                   Bandwidth (contd.)                          | 
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

- Len (2 octets)
    Length in bytes of remaining portion of SubTLV.

- Bandwidth
    The bandwidth of the link expressed as 8 octets,
    units being bits per second.
</artwork>

            <postamble>Fig 6: "Bandwidth" attribute sub-TLV</postamble>
          </figure>

          <t/>

          <t>This sub-TLV would be valid with Nexthop-Forwarding-Semantics TLV
          with FwdAction of Forward, Swap or Push.</t>

          <t>This sub-TLV would also be valid in a Label-Descriptor-attribute,
          i.e., one whose U-bit is not set (U=0).</t>
        </section>

        <section title="Load balance factor">
          <t/>

          <figure>
            <preamble/>

            <artwork>  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |      Attr SubTLV Type = 5     |      Len (2 bytes)            |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |      Balance Percentage       |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

- Len (2 octets)
    Length in bytes of remaining portion of SubTLV.

- Balance Percentage:
    This is the explicit "balance percentage" requested by the sender,
    for unequal load-balancing over these Nexthop-Descriptor-TLV legs.
    This balance percentage would override the implicit
    balance-percentage calculated using "Bandwidth" attribute
    sub-TLV.
</artwork>

            <postamble>Fig 7: "Load-Balance-Factor" attribute
            sub-TLV</postamble>
          </figure>

          <t/>

          <t>This sub-TLV is valid with a Nexthop-Forwarding-Semantics TLV
          whose FwdAction is Forward, Swap or Push.</t>

          <t>The sender uses it to request an explicit balance percentage
          for unequal load-balancing over these Nexthop-Descriptor-TLV legs.
          When present, it overrides the implicit balance percentage
          calculated from the "Bandwidth" attribute sub-TLV.</t>
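
          <t>The override rule can be sketched in Python as follows. This is
          only an illustration: the function name and the input shape are
          hypothetical, the implicit percentage is assumed to be each leg's
          share of the total advertised bandwidth, and the handling of a mix
          of explicit and implicit legs is an assumption that this document
          does not specify.</t>

          <figure>
            <artwork><![CDATA[
```python
from typing import List, Optional, Tuple

# Each leg is (bandwidth in bits per second, explicit percentage or None).
Leg = Tuple[int, Optional[int]]


def effective_balance(legs: List[Leg]) -> List[float]:
    """Return the balance percentage to apply to each nexthop leg."""
    total_bw = sum(bw for bw, _ in legs)
    result = []
    for bw, explicit in legs:
        if explicit is not None:
            # An explicit Load-Balance-Factor overrides the implicit value.
            result.append(float(explicit))
        elif total_bw > 0:
            # Assumption: implicit value is the leg's share of bandwidth.
            result.append(100.0 * bw / total_bw)
        else:
            result.append(0.0)
    return result
```
]]></artwork>
          </figure>

          <t>In this sketch the results are not renormalized, so a mix of
          explicit and implicit legs may not add up to 100.</t>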
        </section>

        <section title="Forwarding-context name">
          <t/>

          <figure>
            <preamble/>

            <artwork>  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |      Attr SubTLV Type = 6     |      Len (2 bytes)            |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |    NameLen (2 octets)         | ..Fwd-Context-name...(unicode)|
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

 - Len (2 octets)
    Length in bytes of remaining portion of SubTLV.

 - NameLen (2 octets)
    Length in bytes of Fwd-Context-Name

 - Forwarding Context Name:
    Name of forwarding context (e.g. VRF-name) where lookup should happen.
            </artwork>

            <postamble>Fig 8: Forwarding-Context name attribute
            sub-TLV</postamble>
          </figure>

          <t/>

          <t>This sub-TLV is valid with a Nexthop-Forwarding-Semantics TLV
          whose FwdAction is Pop-And-Lookup (see use case 2.3). The
          Forwarding-context-name identifies the forwarding context (e.g.
          the VRF name) where the lookup happens after the label is popped.</t>
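
          <t>The nesting of Len and NameLen can be sketched in Python as
          follows. The function names are hypothetical, and the sketch
          assumes that the "unicode" name is carried as UTF-8, which the
          figure does not specify.</t>

          <figure>
            <artwork><![CDATA[
```python
import struct

FWD_CONTEXT_NAME_SUBTLV_TYPE = 6  # "Attr SubTLV Type = 6" in the figure


def encode_fwd_context_name(name: str) -> bytes:
    """Pack Type, Len, NameLen and the forwarding-context name."""
    raw = name.encode("utf-8")  # Assumption: UTF-8 encoding of the name.
    # Len covers everything after the Len field: NameLen plus the name.
    body = struct.pack("!H", len(raw)) + raw
    return struct.pack("!HH", FWD_CONTEXT_NAME_SUBTLV_TYPE, len(body)) + body


def decode_fwd_context_name(data: bytes) -> str:
    """Return the forwarding-context name, or raise ValueError."""
    if len(data) < 6:
        raise ValueError("truncated sub-TLV")
    subtlv_type, length, name_len = struct.unpack("!HHH", data[:6])
    if subtlv_type != FWD_CONTEXT_NAME_SUBTLV_TYPE:
        raise ValueError("not a Forwarding-context name sub-TLV")
    if length != name_len + 2 or len(data) < 6 + name_len:
        raise ValueError("inconsistent Len and NameLen")
    return data[6:6 + name_len].decode("utf-8")
```
]]></artwork>
          </figure>

          <t>A decoder written this way rejects a sub-TLV whose Len and
          NameLen disagree instead of reading past the end of the name.</t>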
        </section>

        <section title="Forwarding-context Route-Target">
          <t/>

          <figure>
            <preamble/>

            <artwork>  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |      Attr SubTLV Type = 7     |      Len (2 bytes)            |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |      Type (2 octets)          |  ...Route Target... (8 octets)|
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |               ..Route Target... (continued)                   |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |...Route Target... (8 octets)  |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+


 - Len (2 octets)
    Length in bytes of remaining portion of SubTLV.

 - Type:
      value of 1 indicates Route Target follows.

 - Route Target:
       Import Route Target of the forwarding context
       (e.g. VRF-name) where lookup should happen.

            </artwork>

            <postamble>Fig 9: "Route-Target identifying the
            Forwarding-Context" attribute sub-TLV</postamble>
          </figure>

          <t/>

          <t>This sub-TLV is valid with a Nexthop-Forwarding-Semantics TLV
          whose FwdAction is Pop-And-Lookup (see use case 2.3). The Route
          Target identifies the forwarding context (e.g. the VRF) where the
          lookup happens after the label is popped.</t>
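
          <t>A minimal Python sketch of this encoding follows. The function
          names are hypothetical, and the sketch assumes that the 8-octet
          Route Target field carries a Route Target extended community in
          its usual two-octet-AS form (type 0x00, sub-type 0x02); this
          document does not spell out that inner format.</t>

          <figure>
            <artwork><![CDATA[
```python
import struct

FWD_CONTEXT_RT_SUBTLV_TYPE = 7  # "Attr SubTLV Type = 7" in the figure
RT_FOLLOWS = 1                  # Type value 1: Route Target follows


def route_target_as2(asn: int, local_admin: int) -> bytes:
    """Build an 8-octet two-octet-AS Route Target (assumed inner format)."""
    return struct.pack("!BBHI", 0x00, 0x02, asn, local_admin)


def encode_fwd_context_rt(route_target: bytes) -> bytes:
    """Pack Type, Len, the 2-octet Type field and the 8-octet Route Target."""
    if len(route_target) != 8:
        raise ValueError("Route Target must be exactly 8 octets")
    # Len covers the 2-octet Type field plus the 8-octet Route Target.
    body = struct.pack("!H", RT_FOLLOWS) + route_target
    return struct.pack("!HH", FWD_CONTEXT_RT_SUBTLV_TYPE, len(body)) + body
```
]]></artwork>
          </figure>

          <t>With these assumptions the sub-TLV is always 14 octets long: a
          4-octet header and a 10-octet body.</t>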

          <t/>

          <t>If a receiving speaker does not recognize or support any of
          these sub-TLVs or FwdAction combinations, it treats the condition
          as a semantic error, and the error-handling procedures described
          in section 4 should be followed.</t>
        </section>
      </section>
    </section>

    <section title="Error handling procedures">
      <t/>

      <t>When the U-bit is Reset, this attribute describes the label
      advertised by the BGP peer. If the value in the attribute is
      syntactically parseable but not semantically valid, the receiving
      speaker should deal with the error gracefully and MUST NOT tear down the
      BGP session. In such cases the rest of the BGP update can be consumed if
      possible.</t>

      <t>When the U-bit is Set, this attribute specifies the forwarding
      action at the receiving BGP peer. If the value in the attribute is
      syntactically parseable but not semantically valid, the receiving
      speaker SHOULD deal with the error gracefully by ignoring the MNH
      attribute, and continue processing the route. It MUST NOT tear down the
      BGP session.</t>

      <t>If an MNH with U-bit Reset is received for an IP route (SAFI Unicast),
      the MNH attribute SHOULD be ignored, because IP route prefixes are
      upstream allocated by nature.</t>

      <t>If an MNH with U-bit Reset is received for an <xref
      target="MPLS-NAMESPACES"/> route, the MNH attribute SHOULD be ignored,
      because the label prefix in MPLS-NAMESPACES family routes is upstream
      allocated.</t>

      <t>The receiving BGP speaker MAY consider the "Num-Nexthop" value in an
      MNH attribute (U-bit Set) unacceptable, based on its forwarding
      capabilities. In such cases, the MNH attribute SHOULD be considered
      unusable and ignored on receipt. The condition SHOULD be handled
      gracefully, and the speaker MUST NOT tear down the BGP session.</t>
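
      <t>The rules in this section can be summarized by the following Python
      sketch. It is an illustration only: the names, the string labels for
      route families and the per-speaker nexthop limit are hypothetical, and
      treating a semantically invalid attribute with U-bit Reset as "ignore"
      is an assumption about what handling the error gracefully means. No
      outcome of the sketch tears down the BGP session.</t>

      <figure>
        <artwork><![CDATA[
```python
from enum import Enum


class Action(Enum):
    USE_MNH = "use"        # apply the MNH attribute to the route
    IGNORE_MNH = "ignore"  # keep the session up, process route without MNH


# Route families whose prefixes are upstream allocated (hypothetical labels).
UPSTREAM_ALLOCATED = {"ip-unicast", "mpls-namespaces"}


def mnh_action(u_bit_set: bool, semantically_valid: bool,
               route_family: str, num_nexthops: int,
               max_nexthops: int) -> Action:
    """Decide how to treat a syntactically parseable MNH attribute."""
    if not semantically_valid:
        # Semantic error: never a session reset, the attribute is skipped.
        return Action.IGNORE_MNH
    if not u_bit_set and route_family in UPSTREAM_ALLOCATED:
        # A label description makes no sense for upstream-allocated prefixes.
        return Action.IGNORE_MNH
    if u_bit_set and num_nexthops > max_nexthops:
        # Num-Nexthop exceeds what the local forwarding plane supports.
        return Action.IGNORE_MNH
    return Action.USE_MNH
```
]]></artwork>
      </figure>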
    </section>

    <section title="Scaling considerations">
      <t>The MNH attribute allows receiving multiple nexthops on the same BGP
      session. This flexibility also opens up the possibility that a peer
      sends a large number of multipath (ECMP/UCMP/FRR) nexthops that may
      overwhelm the local system's forwarding plane. Prefix-limit based checks
      do not prevent this situation, because they bound the number of
      prefixes, not the number of nexthops per prefix.</t>

      <t>To keep scaling limits in check, a BGP speaker MAY keep count of
      the number of unique multipath nexthops received from a BGP peer, and
      impose a configurable maximum limit on it. This is especially useful
      for EBGP peers.</t>
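
      <t>One possible shape for such a per-peer limit is sketched below in
      Python. The class and method names are hypothetical, a "unique
      multipath nexthop" is assumed to mean a distinct set of nexthop legs,
      and the sketch does not release entries when routes are withdrawn.</t>

      <figure>
        <artwork><![CDATA[
```python
from typing import Dict, FrozenSet, Iterable, Set


class PeerNexthopLimiter:
    """Track unique multipath nexthop sets per peer against a maximum."""

    def __init__(self, max_unique: int) -> None:
        self.max_unique = max_unique  # configurable per-peer limit
        self._seen: Dict[str, Set[FrozenSet[str]]] = {}

    def admit(self, peer: str, nexthop_legs: Iterable[str]) -> bool:
        """Return True if this multipath nexthop may be installed."""
        key = frozenset(nexthop_legs)
        seen = self._seen.setdefault(peer, set())
        if key in seen:
            return True   # already counted for this peer
        if len(seen) >= self.max_unique:
            return False  # limit reached: caller ignores the MNH
        seen.add(key)
        return True
```
]]></artwork>
      </figure>

      <t>A caller would typically consult admit() before programming a new
      multipath nexthop received from the peer, and treat a False result
      like an unacceptable "Num-Nexthop" value.</t>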

      <t>Conveying multipath nexthops using the MNH attribute with N nexthop
      legs on one BGP session, as opposed to BGP routes on N BGP sessions,
      has a good scaling property: it avoids the transitory multipath
      combinatorial state of the latter model. Because the final multipath
      state is conveyed by one route update in a deterministic manner, there
      is no transitory multipath combinatorial explosion created during
      establishment of N sessions.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document requests that IANA allocate the following code in the
      BGP attributes registry.</t>

      <t>1. MultiNexthop (MNH) BGP-attribute: A new BGP attribute code
      TBD.</t>

      <t>This document requests that IANA create the following
      sub-registries for the MNH attribute:</t>

      <t>1. "FwdAction" type as defined in 3.1.</t>

      <t>2. Nexthop-Leg Descriptor TLV:"NhopDescrType" as defined in 3.2.</t>

      <t>3. "Nexthop Attributes Sub-TLV type" as defined in 3.3.</t>

      <t>This document requests that IANA allocate a BGP capability code
      TBD for the MNH attribute.</t>

      <t>Note to RFC Editor: this section may be removed on publication as an
      RFC.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>The attribute is defined as an optional non-transitive BGP attribute,
      so that it is not accidentally propagated or leaked via BGP speakers
      that do not support this feature, and in particular does not
      unintentionally leak across EBGP boundaries.</t>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>Thanks to Robert Raszuk, Gyan Mishra and Ron Bonica for their review,
      discussions and input to the draft.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"?>

      <?rfc include="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.3392.xml"?>
    </references>

    <references title="References">
      <reference anchor="MPLS-NAMESPACES"
                 target="https://datatracker.ietf.org/doc/html/draft-kaliraj-bess-bgp-sig-private-mpls-labels-04">
        <front>
          <title abbrev="MPLS-NAMESPACES">BGP signalled
          MPLS-namespaces</title>

          <author fullname="Kaliraj Vairavakkalai" initials="K." role="editor"
                  surname="Vairavakkalai"/>

          <date day="28" month="12" year="2021"/>
        </front>
      </reference>

      <reference anchor="BGP-CT"
                 target="https://datatracker.ietf.org/doc/draft-kaliraj-idr-bgp-classful-transport-planes/12/">
        <front>
          <title abbrev="BGP-CT">BGP Classful Transport Planes</title>

          <author fullname="Kaliraj Vairavakkalai" initials="K." role="editor"
                  surname="Vairavakkalai"/>

          <date day="25" month="08" year="2021"/>
        </front>
      </reference>

      <reference anchor="FLWSPC-REDIR-IP"
                 target="https://datatracker.ietf.org/doc/html/draft-ietf-idr-flowspec-redirect-ip#section-3">
        <front>
          <title abbrev="FLWSPC-REDIR-IP">BGP Flow-Spec Redirect to IP
          Action</title>

          <author fullname="Adam Simpson" initials="A." role="editor"
                  surname="Simpson"/>

          <date day="2" month="2" year="2015"/>
        </front>
      </reference>

      <reference anchor="SRTE-COLOR-ONLY"
                 target="https://tools.ietf.org/html/draft-filsfils-spring-segment-routing-policy-06#section-8.8.1">
        <front>
          <title abbrev="SRTE-COLOR-ONLY">Segment Routing Policy
          Architecture</title>

          <author fullname="Clarence Filsfils" initials="C." role="editor"
                  surname="Filsfils"/>

          <date day="21" month="2" year="2018"/>
        </front>
      </reference>

      <reference anchor="ADDPATH-GUIDELINES"
                 target="https://datatracker.ietf.org/doc/html/draft-ietf-idr-add-paths-guidelines-08#section-2">
        <front>
          <title abbrev="ADDPATH-GUIDELINES">Best Practices for Advertisement
          of Multiple Paths in IBGP</title>

          <author fullname="Jim Uttaro" initials="J." role="editor"
                  surname="Uttaro"/>

          <date day="25" month="4" year="2016"/>
        </front>
      </reference>
    </references>
  </back>
</rfc>
