<?xml version="1.0" encoding="utf-8"?>
<?xml-model href="rfc7991bis.rnc"?>

<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<rfc
  xmlns:xi="http://www.w3.org/2001/XInclude"
  category="std"
  docName="draft-wang-idr-next-next-hop-nodes-04"
  ipr="trust200902"
  obsoletes=""
  updates=""
  submissionType="IETF"
  xml:lang="en"
  version="3"
  consensus="true">

  <front>
    <title abbrev="NNHN">BGP Next-next Hop Nodes</title>

    <seriesInfo name="Internet-Draft" value="draft-wang-idr-next-next-hop-nodes-04"/>
   
    <author fullname="Kevin Wang" initials="K" surname="Wang">
      <organization>Juniper Networks</organization>
      <address>
        <postal>
          <street>10 Technology Park Dr</street>
          <city>Westford</city>
          <region>MA</region>
          <code>01886</code>
          <country>USA</country>
        </postal>        
        <email>kfwang@juniper.net</email>  
      </address>
    </author>
   
    <author fullname="Jeff Haas" initials="J" surname="Haas">
      <organization>Juniper Networks</organization>
      <address>
        <postal>
          <street>1133 Innovation Way</street>
          <city>Sunnyvale</city>
          <region>CA</region>
          <code>94089</code>
          <country>USA</country>
        </postal>        
        <email>jhaas@juniper.net</email>  
      </address>
    </author>
   
    <author fullname="Changwang Lin" initials="C" surname="Lin">
      <organization>New H3C Technologies</organization>
      <address>
        <postal>
          <country>China</country>
        </postal>        
        <email>linchangwang.04414@h3c.com</email>  
      </address>
    </author>
      
    <author fullname="Jeff Tantsura" initials="J" surname="Tantsura">
      <organization>Nvidia</organization>
      <address>
        <postal>
          <country>USA</country>
        </postal>        
        <email>jefftant.ietf@gmail.com</email>  
      </address>
    </author>

    <date year="2025"/>
    <area>Routing</area>
    <workgroup>IDR</workgroup>

    <keyword>BGP</keyword>

    <abstract>
      <t>
        BGP speakers learn the next hop addresses for NLRI from the NEXT_HOP
        attribute defined in RFC 4271 and from the "Network Address of Next
        Hop" field defined in RFC 4760.
        Under certain circumstances, it might
        be desirable for a BGP speaker to know both the next hops and the
        next-next hops of NLRI to make optimal forwarding decisions.  One such
        example is global load balancing (GLB) in a Clos network.
      </t>

      <t>
        The Internet-Draft draft-ietf-idr-entropy-label defines the "Next Hop
        Dependent Characteristics Attribute" (NHC), which allows a BGP speaker
        to signal the forwarding characteristics associated with a given next hop.
      </t>
      
      <t>
        This document defines a new NHC characteristic, the
        Next-next Hop Nodes (NNHN) characteristic, which can be used to advertise
        the next-next hop nodes associated with a given next hop.
      </t>
    </abstract>
 
  </front>

  <middle>
    
    <section>
      <name>Introduction</name>
      <t>
        BGP speakers learn the next hop addresses for NLRI from the NEXT_HOP
        attribute defined in <xref target="RFC4271"/> and from the
        "Network Address of Next Hop" field defined in <xref target="RFC4760"/>.
        Under certain circumstances, it might
        be desirable for a BGP speaker to know both the next hops and the
        next-next hops of NLRI to make optimal forwarding decisions.  One such
        example is global load balancing (GLB) in a Clos network
        <xref target="I-D.cheng-rtgwg-adaptive-routing-framework"/>.
      </t>

      <t>
        When a route's ECMP set has multiple next hops, packets forwarded using
        that ECMP set are hashed across the member next hops for load balancing.
        If one of the member next hop links is congested due
        to uneven hashing, dynamic load balancing (DLB) allows the node to adjust
        the hashing so that the congestion on that link can be mitigated. When
        all next hop links are congested, DLB on the local node cannot
        mitigate the congestion. Such a node requires help from the previous
        hop(s), which shift traffic towards alternative nodes.
        This process is called global load balancing.
      </t>

      <t>
	In a Clos network, a congested link will affect the
	load balancing decisions of the previous layer nodes equally. Because of this, the
	previous-previous layer nodes do not need to change their load balancing decisions
	towards the previous layer nodes to mitigate this link congestion. This means we
	only need to know the link congestion status of the next-next hops of a given BGP route
	in order to make GLB decisions. The combined link quality of each next hop and its
	corresponding next-next hops can be used as the feedback for DLB.
      </t>
      <t>
        The purpose of this document is to provide a method for BGP to learn
        the next-next hops - or more specifically, the next-next hop
        nodes. When a next hop node has more than one next-next hop towards a
        next-next hop node, DLB helps to balance the load between the multiple
        next-next hops by locally adjusting the volume of traffic hashed over a
        given ECMP member link. Thus, only the overall link congestion between
        the next hop node and the next-next hop node is important for GLB.
      </t>
      <t>
        Note that the mechanism for detecting link congestion and communicating
        it to the previous hop nodes is out of scope for this document.
      </t>

      <t>
        This document defines a new NHC characteristic, the Next-next Hop Nodes
        (NNHN) characteristic, for the BGP Next Hop Dependent Characteristics
        Attribute (NHC) defined in 
        <xref target="I-D.ietf-idr-entropy-label"/>.
	A downstream BGP speaker can use the NNHN to advertise the
	next-next hop nodes corresponding to the next hop of an NLRI.
	This allows the upstream BGP speaker to learn the next-next hop nodes
	corresponding to each of its next hop nodes.
      </t>
      <section>
        <name>Requirements Language</name>
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL",
          "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT
          RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be
          interpreted as described in BCP 14 <xref target="RFC2119"/>
          <xref target="RFC8174"/> when, and only when, they appear in
          all capitals, as shown here.</t>
      </section>

    </section>
    
    <section>
      <name>BGP Next-next Hop Nodes (NNHN) Characteristic</name>

      <t>
        <xref target="I-D.ietf-idr-entropy-label"/> defines NHC as a container
        for characteristic TLVs. Next-next Hop Nodes is one such characteristic. It
        specifies the next-next hop nodes corresponding to the next hop field
        in the NHC.
      </t>
      
      <section>
	<name>Encoding NNHN</name>
	
	<t>
          The NNHN TLV has the NHC characteristic code 2, as assigned in
	  <xref target="I-D.ietf-idr-entropy-label" sectionFormat="of" section="5"/>.
	  The NHC characteristic length specifies the remaining number of octets in the
	  NNHN TLV. The NNHN characteristic format is shown in
	  <xref target="nnh_attr"/>:
	</t>
      
	<figure anchor="nnh_attr">
          <name>NNHN Characteristic TLV Format</name>
          <artwork type="ascii-art">
            <![CDATA[
   0                   1                   2                   3
   0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |    Characteristic Code = 2    |Characteristic Length(variable)|
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  |                    Next-hop BGP ID                            |
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  ~               Next-next-hop BGP IDs (variable)                ~
  +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            ]]>
          </artwork>
	</figure>

	<dl newline="true">
          <dt>
	    Next-hop BGP ID: 
	  </dt>
	  <dd>
	    32-bit BGP Identifier of the next hop node that attaches this NHC characteristic.
	  </dd>
          <dt>
	    Next-next-hop BGP IDs: 
	  </dt>
	  <dd>
	    One or more 32-bit BGP Identifiers, each representing a next-next hop node used by
	    the next hop node for ECMP forwarding for the NLRI in the BGP Update. 
	  </dd>
	</dl>
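        <t>
          As a non-normative illustration, the layout in
          <xref target="nnh_attr"/> could be packed as in the following Python
          sketch. The function name encode_nnhn and the constant NNHN_CODE are
          hypothetical and not part of this specification. The sketch assumes
          the 16-bit code and length fields shown in the figure and takes BGP
          Identifiers as 32-bit unsigned integers.
        </t>
        <sourcecode type="python"><![CDATA[
```python
import struct

NNHN_CODE = 2  # NHC characteristic code for NNHN


def encode_nnhn(next_hop_id, next_next_hop_ids):
    """Pack an NNHN TLV; identifiers are 32-bit unsigned integers."""
    if not next_next_hop_ids:
        raise ValueError("NNHN needs at least one next-next-hop BGP ID")
    # The length counts the octets after the length field: the
    # Next-hop BGP ID plus every Next-next-hop BGP ID, 4 octets each.
    length = 4 * (1 + len(next_next_hop_ids))
    # "!" selects network byte order with no padding.
    header = struct.pack("!HHI", NNHN_CODE, length, next_hop_id)
    body = b"".join(struct.pack("!I", i) for i in next_next_hop_ids)
    return header + body


tlv = encode_nnhn(0x0A000001, [0x0A000002, 0x0A000003])
```
]]></sourcecode>
        <t>
          In this example the next hop 10.0.0.1 advertises two next-next hop
          nodes, so the characteristic length is 12 and the whole TLV
          occupies 16 octets.
        </t>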
      </section>

      <section>
	<name>Sending NNHN</name>

        <t>
          All procedures from 
          <xref target="I-D.ietf-idr-entropy-label" sectionFormat="of" section="2.2"/>
          apply.
        </t>
	
	<t>
          When a BGP speaker S has a BGP route R it wishes to advertise
          with next hop self to its peer, it MAY choose to originate an NNHN
          characteristic. The "Next-hop BGP ID" field MUST be set to the BGP
          Identifier this BGP speaker uses with the peer.
	</t>

	<t>
          For all the ECMP paths of route R which are used for forwarding, the
          BGP Identifiers of those BGP peers MUST be encoded as the
          "Next-next-hop BGP IDs".  When more than one path is from the same
          BGP peer, the characteristic MUST contain the BGP Identifier of that
          peer only once.
	</t>

	<t>
          When there is more than one BGP Identifier in the "Next-next-hop BGP
          IDs" field, the identifiers MUST be encoded in numerically ascending
          order, treating each BGP Identifier as a 32-bit unsigned integer
          encoded in network byte order.
        </t>

        <t>
          An NNHN with no "Next-next-hop BGP IDs" MUST NOT be sent.
        </t>
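        <t>
          The de-duplication, ordering, and non-empty rules above can be
          sketched as follows. This is a non-normative illustration: the
          function name next_next_hop_ids is hypothetical, and the sketch
          assumes that the BGP Identifiers of the ECMP peers are available as
          dotted-quad strings.
        </t>
        <sourcecode type="python"><![CDATA[
```python
import ipaddress


def next_next_hop_ids(ecmp_peer_ids):
    """Return the sorted, de-duplicated ID list, or None if empty."""
    # Converting a dotted quad to int reads it in network byte order,
    # so the comparison matches the ordering rule in the text.
    ids = {int(ipaddress.IPv4Address(p)) for p in ecmp_peer_ids}
    if not ids:
        return None  # an NNHN with no next-next-hop IDs is not sent
    return sorted(ids)


ids = next_next_hop_ids(["10.0.0.3", "10.0.0.2", "10.0.0.3"])
```
]]></sourcecode>
        <t>
          Here two ECMP paths come from the peer 10.0.0.3, so its BGP
          Identifier appears once in the result, after that of 10.0.0.2.
        </t>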

	<t>
          When a BGP speaker S has a BGP route R it wishes to advertise with
          next hop self to its peer, it MUST NOT forward the NNHN characteristic
          received from downstream peers. It either originates its own NNHN
          characteristic as described above or does not send one.
	</t>
	
	<t>
          When a BGP speaker S has a BGP route R it wishes to advertise
          with the next hop that has not been set to self, it MUST NOT
          originate an NNHN characteristic.
          However, if an NNHN characteristic has been
          received for route R and passed the NHC validation as defined in
          <xref target="I-D.ietf-idr-entropy-label"/>, the NNHN
          characteristic SHOULD be forwarded.
	</t>
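        <t>
          The choice between originating, forwarding, and omitting the NNHN
          characteristic might be summarized as in the sketch below. It is
          non-normative. The function nnhn_to_send and its parameters are
          hypothetical; own_nnhn stands for a locally originated
          characteristic (or None when the speaker chooses not to originate
          one), and received_valid stands for the outcome of NHC validation.
          The sketch follows the SHOULD for forwarding a validated NNHN.
        </t>
        <sourcecode type="python"><![CDATA[
```python
def nnhn_to_send(next_hop_self, own_nnhn, received_nnhn, received_valid):
    """Pick the NNHN to advertise for a route, or None for no NNHN."""
    if next_hop_self:
        # Never forward a downstream NNHN; originate one or send nothing.
        return own_nnhn
    # Next hop not set to self: never originate, and forward only an
    # NNHN that was received and passed NHC validation.
    if received_nnhn is not None and received_valid:
        return received_nnhn
    return None
```
]]></sourcecode>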

	<t>
	  A BGP speaker MUST NOT include more than one instance of NNHN in an NHC.
        </t>
      </section>
      
      <section>
	<name>Receiving NNHN</name>

        <t>
          All procedures from 
          <xref target="I-D.ietf-idr-entropy-label" sectionFormat="of" section="2.3"/>
          apply.
        </t>
	
        <t>
          When a BGP speaker wishes to enforce hop-by-hop eBGP propagation of
          the NNHN, if the received NNHN characteristic's Next-hop BGP Identifier
          does not match the BGP Identifier of the BGP speaker the UPDATE was
          received from, the NNHN characteristic MUST be ignored and discarded.
        </t>

	<t>
	  The receiver of the NNHN characteristic MUST be able to handle any order of the
          "Next-next-hop BGP IDs".
	</t>
        
	<t>
          Duplicate BGP Identifiers in the "Next-next-hop BGP IDs" MUST be
          silently ignored.
	</t>
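        <t>
          A receiver applying these rules could look roughly like the
          following non-normative sketch. The function accept_nnhn and the
          strict flag are hypothetical; strict models a speaker that wishes
          to enforce hop-by-hop eBGP propagation. Identifiers are taken as
          integers that have already been decoded from the TLV.
        </t>
        <sourcecode type="python"><![CDATA[
```python
def accept_nnhn(peer_bgp_id, next_hop_id, next_next_hop_ids, strict=True):
    """Return the usable next-next-hop IDs, or None if NNHN is discarded."""
    if strict and next_hop_id != peer_bgp_id:
        return None  # Next-hop BGP ID is not that of the sending peer
    # Any order is accepted; duplicates are dropped silently while the
    # first occurrence of each identifier is kept.
    return list(dict.fromkeys(next_next_hop_ids))


ids = accept_nnhn(1, 1, [7, 5, 7, 6])
```
]]></sourcecode>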

	<t>
          The details of how the NNHN characteristic is used for global load
          balancing are out of scope for this document.
	</t>
      </section>
      
      <section>
	<name>NNHN Error Handling</name>
	
        <t>
          The NNHN characteristic length MUST be at least 8 and MUST be divisible
          by 4, otherwise it is malformed.  Malformed NNHN characteristics MUST be
          discarded and SHOULD be logged.
        </t>

	<t>
	  If more than one instance of NNHN is included in an NHC, instances
	  beyond the first MUST be discarded and SHOULD be logged.
        </t>
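        <t>
          The length checks and the handling of repeated instances can be
          sketched as below. This is a non-normative illustration. The
          functions parse_nnhn and first_nnhn are hypothetical, value is
          assumed to hold exactly the octets that follow the length field,
          and the NHC is assumed to have been split into (code, value) pairs
          already. Logging is left out.
        </t>
        <sourcecode type="python"><![CDATA[
```python
import struct


def parse_nnhn(value):
    """Split an NNHN value into (next_hop_id, ids); None if malformed."""
    if len(value) < 8 or len(value) % 4 != 0:
        return None  # malformed: discard (a real speaker would also log)
    # Decode the value as a run of 32-bit words in network byte order.
    words = struct.unpack("!%dI" % (len(value) // 4), value)
    return words[0], list(words[1:])


def first_nnhn(characteristics):
    """Return the value of the first code-2 entry in (code, value) pairs."""
    for code, value in characteristics:
        if code == 2:
            return value  # any later NNHN instances are discarded
    return None
```
]]></sourcecode>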
     </section>
    </section>

    <section>
      <name>Operational Considerations</name>

      <t>
        Since BGP Identifiers are used to identify the next-next hop nodes,
        operators need to ensure that they are unique across the network in
        which the NNHN characteristic is sent.
      </t>
    </section>
    
    <section anchor="IANA">
      <name>IANA Considerations</name>
      <t>
	NHC Characteristic Code 2 has been assigned in
	<xref target="I-D.ietf-idr-entropy-label" sectionFormat="of" section="5"/>,
	for the NNHN characteristic defined in this document. 
      </t>
    </section>
    
    <section anchor="Security">
      <name>Security Considerations</name>
      <t>
        Insertion of a syntactically valid but bogus NNHN characteristic by an
        attacker could potentially make the forwarding behavior of the route
        non-optimal.
      </t>
    </section>
  </middle>

  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4271.xml"/>
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.draft-ietf-idr-entropy-label-17.xml"/>

      </references>
 
      <references>
        <name>Informative References</name>
       
        <xi:include href="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4760.xml"/>
	<xi:include href="https://bib.ietf.org/public/rfc/bibxml3/reference.I-D.draft-cheng-rtgwg-adaptive-routing-framework-04.xml"/>

      </references>
    </references>

    <section>
      <name>Alternative Solutions</name>
      <t>
	An alternative way to carry next-next hops is via a separate path attribute.
	We evaluated both approaches and chose the NNHN characteristic approach for several
	reasons:
      </t>
	
      <ul>
        <li>
	  Next-next hops depend on next hops, so they fit naturally into the
	  existing NHC attribute.
	</li>
        <li>
	  The next hop carried in the existing NHC attribute can help to validate that the
	  next-next hop nodes are indeed for the next hop of the NLRI.
	</li>
        <li>
	  Carrying next-next hop nodes via a separate path attribute would consume an
	  additional attribute code, and such codes are better reserved for more
	  generally used attributes.
	</li>
       </ul>
    </section>
    
    <section anchor="Acknowledgements" numbered="false">
      <name>Acknowledgements</name>
      <t>
        TBD.
      </t>
    </section>
    
    <section anchor="Contributors" numbered="false">
      <name>Contributors</name>
      <t>
        TBD.
      </t>
    </section>
    
 </back>
</rfc>
