<?xml version="1.0" encoding="iso-8859-1" ?>
<?rfc toc="yes" ?>
<?rfc symrefs="yes" ?>
<?rfc sortrefs="yes" ?>
<?rfc compact="yes" ?>
<?rfc subcompact="no" ?>

<rfc category="std" ipr="trust200902" docName="draft-xiao-rtgwg-proxy-congestion-notification-01" consensus="true" submissionType="IETF">

<front>
        <title abbrev="Proxy for Congestion Notification"> Proxy for Congestion Notification </title>
 
  <author fullname="Xiao Min" initials="X" surname="Min">
      <organization>ZTE Corp.</organization>
     <address>
       <postal>
         <street/>

         <!-- Reorder these if your country does things differently -->

         <city>Nanjing</city>

         <region/>

         <code/>

         <country>China</country>
       </postal>

       <phone>+86 18061680168</phone>

       <email>xiao.min2@zte.com.cn</email>

       <!-- uri and facsimile elements may also be added -->
     </address>
    </author>

    <date year="2025"/>
  
    <area>Routing</area>
    <workgroup>RTGWG Working Group</workgroup>

    <keyword>Request for Comments</keyword>
    <keyword>RFC</keyword>
    <keyword>Internet Draft</keyword>
    <keyword>I-D</keyword>

    <abstract>
  <t> This document describes the necessity and feasibility of introducing a proxy network node between the congested network node and the 
  traffic sender. The proxy network node translates congestion notifications. The congested network node sends a congestion 
  notification to the proxy network node in the format defined in this document; the proxy network node then translates the received congestion 
  notification into a format known by the traffic sender and forwards the translated congestion notification to the traffic sender. </t>
    </abstract>
    
</front>
  
<middle>

  <section title="Introduction">
  
  <t> <xref target="I-D.xiao-rtgwg-rocev2-fast-cnp"/> describes a congestion notification message called Fast Congestion Notification Packet 
  (Fast CNP), which can be sent by a congested network node to the traffic sender directly. Fast CNP extends the CNP <xref target="IBTA-SPEC"/> 
  consumed by the traffic sender supporting Remote Direct Memory Access (RDMA) over Converged Ethernet version 2 (RoCEv2). </t>
  
  <t> RoCEv2 is already widely deployed. It runs the InfiniBand transport layer over UDP and IP on an Ethernet network, bringing 
  many of the advantages of InfiniBand to Ethernet networks. For a traffic sender supporting RoCEv2, congestion control is important, and the 
  RoCEv2 CNP or RoCEv2 Fast CNP must be used to tell the sender to slow down its transmission rate. For a traffic sender not supporting RoCEv2, 
  congestion control is still important, and the corresponding congestion notification message supported by the sender must be used to tell the 
  sender to slow down its transmission rate. </t>
  
  <t> Sending a congestion notification message directly from the congested network node to the traffic sender raises three problems. First, because 
  multiple congestion notification messages exist, the congested network node would need to know which kind of 
  congestion notification message each specific traffic sender supports. Second, when the congested network node is a VPN Provider (P) router, 
  it is difficult for the congested network node to send a congestion notification message to the traffic sender directly, because the VPN P router 
  and the VPN Customer Edge (CE) router are in different routing domains. Third, when the traffic sender supports RoCEv2, it is difficult for the 
  congested network node to construct a standard RoCEv2 CNP (see Section 3 of <xref target="I-D.xiao-rtgwg-rocev2-fast-cnp"/>). </t>
  
  <t> A proxy network node between the congested network node and the traffic sender can help to resolve the problems described above, independently 
  of the extension proposed in <xref target="I-D.xiao-rtgwg-rocev2-fast-cnp"/>. The congested network node first sends a congestion notification message to 
  a proxy network node, and the proxy network node then notifies the traffic sender about the congestion using a congestion notification message 
  known by the traffic sender (e.g., the standard RoCEv2 CNP). At least three rules apply to the selection of the proxy network node. First, 
  the selected proxy network node must know what kind of congestion notification message is supported by the traffic sender. Second, the selected 
  proxy network node and the congested network node must be within the same routing domain. Third, for a RoCEv2 network, the selected proxy network 
  node must be able to learn the mapping between Source Queue Pair and Destination Queue Pair from data traffic, which means the selected proxy 
  network node must be located on the path of both the forward direction traffic and the backward direction traffic. How to select a proxy network 
  node for a specific traffic sender is deployment specific and beyond the scope of this document. </t>
   
  <t> This document describes the necessity and feasibility of introducing a proxy network node between the congested network node and the traffic 
  sender. Specifically, the problem statement is given in Sections 1 and 3, the format of the congestion notification message sent from the 
  congested network node to the proxy network node is defined in Section 4, and the way the congested network node learns the address of 
  the proxy node is defined in Section 5. </t>
  
  </section>
  
  <section title="Conventions Used in This Document">
   
    <section title="Abbreviations">
      <t> ABR: Area Border Router</t>
      <t> CNP: Congestion Notification Packet</t>
      <t> DoS: Denial-of-Service</t>
      <t> ECN: Explicit Congestion Notification</t>
      <t> ELC: Entropy Label Capability</t>
      <t> ELCv3: Entropy Label Characteristic</t>
      <t> IBTA: InfiniBand Trade Association</t>
      <t> PNC: Proxy Node Capability</t>
      <t> RDMA: Remote Direct Memory Access</t>
      <t> RoCEv2: RDMA over Converged Ethernet version 2</t>
    </section>
  
    <section title="Requirements Language">  
	  <t> The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", 
	  and   "OPTIONAL" in this document are to be interpreted as described in BCP 14  <xref target="RFC2119"/> <xref target="RFC8174"/> when, 
	  and only when, they appear in all capitals, as shown here.</t>	
    </section>
  
  </section>
  
  <section title="Congestion Notification Mechanisms">

  <t> In the field of congestion control, there are at least three commonly referenced kinds of congestion notification mechanisms. This document introduces a 
  fourth congestion notification mechanism called "Fast Congestion Notification with Proxy".</t>
  
  <t> The first congestion notification mechanism is referred to as classical congestion notification without a dedicated packet, as shown in <xref target="Figure_1"/>.</t>
  
  <figure anchor="Figure_1" title="Classical Congestion Notification without Dedicated Packet">
  <artwork align="left"> <![CDATA[
                Congestion Notification by TCP Marking
    |<-------------------------------------------------------+
    |                                                        |
    |                        Congestion Notification by ECN Marking
    |                                          |------------>|
+--------+     +-------+     +-------+     +-------+     +--------+
|Traffic |<===>|Network|<===>|Network|<===>|Network|<===>|Traffic |
|Sender  |     |Node 1 |     |Node 2 |     |Node 3 |     |Receiver|
+--------+     +-------+     +-------+     +-------+     +--------+
                                           Congestion
                                           Point
]]>  </artwork>
  </figure>
  
  <t> With this congestion notification mechanism, the traffic sender indicates that it supports congestion notification from the traffic receiver 
  by a specific Explicit Congestion Notification (ECN) marking within the IP header of the data packet, and the congested network node (Network Node 3 in 
  <xref target="Figure_1"/>) notifies the traffic receiver about the congestion by a specific ECN marking. After receiving a data packet with the specific ECN 
  marking, the traffic receiver notifies the traffic sender of the congestion by a specific TCP marking within the TCP header of a packet sent back to the traffic sender. <xref target="RFC3168"/> 
  details how this kind of congestion notification mechanism works. </t>
  
  <t> The second congestion notification mechanism is referred to as classical congestion notification with a dedicated packet, as shown in <xref target="Figure_2"/>.</t>
  
  <figure anchor="Figure_2" title="Classical Congestion Notification with Dedicated Packet">
  <artwork align="left"> <![CDATA[
               Congestion Notification Packet Type 1
    |<-------------------------------------------------------+
    |                                                        |
    |                        Congestion Notification by ECN Marking
    |                                          |------------>|
+--------+     +-------+     +-------+     +-------+     +--------+
|Traffic |<===>|Network|<===>|Network|<===>|Network|<===>|Traffic |
|Sender  |     |Node 1 |     |Node 2 |     |Node 3 |     |Receiver|
+--------+     +-------+     +-------+     +-------+     +--------+
                                           Congestion
                                           Point
]]>  </artwork>
  </figure>
  
  <t> With this congestion notification mechanism, the traffic sender indicates that it supports congestion notification from the traffic receiver 
  by a specific ECN marking within the IP header of the data packet, and the congested network node (Network Node 3 in <xref target="Figure_2"/>) notifies the 
  traffic receiver about the congestion by a specific ECN marking. After receiving a data packet with the specific ECN marking, the traffic receiver notifies 
  the traffic sender of the congestion by a dedicated congestion notification packet. <xref target="IBTA-SPEC"/> details an example of how this kind of congestion 
  notification mechanism works. </t>
  
  <t> The third congestion notification mechanism is referred to as fast congestion notification without proxy, as shown in <xref target="Figure_3"/>.</t>
  
  <figure anchor="Figure_3" title="Fast Congestion Notification without Proxy">
  <artwork align="left"> <![CDATA[
        Congestion Notification Packet Type 2
    |<-----------------------------------------+
    |                                          |
+--------+     +-------+     +-------+     +-------+     +--------+
|Traffic |<===>|Network|<===>|Network|<===>|Network|<===>|Traffic |
|Sender  |     |Node 1 |     |Node 2 |     |Node 3 |     |Receiver|
+--------+     +-------+     +-------+     +-------+     +--------+
                                           Congestion
                                           Point
]]>  </artwork>
  </figure>
  
  <t> With this congestion notification mechanism, the congested network node (Network Node 3 in <xref target="Figure_3"/>) notifies the traffic sender about the 
  congestion directly by a dedicated congestion notification packet. <xref target="I-D.xiao-rtgwg-rocev2-fast-cnp"/> details an example of how this kind of 
  congestion notification mechanism works. </t>
  
  <t> The fourth congestion notification mechanism is referred to as fast congestion notification with proxy, as shown in <xref target="Figure_4"/>.</t>
  
  <figure anchor="Figure_4" title="Fast Congestion Notification with Proxy">
  <artwork align="left"> <![CDATA[
                Congestion Notification Packet Type 3
                    |<--------------------------+
                    |                           |
Congestion Notification Packet Type 4           |
     |<-------------+                           |
     |              |                           |
 +--------+     +-------+     +-------+     +-------+     +--------+
 |Traffic |<===>|Network|<===>|Network|<===>|Network|<===>|Traffic |
 |Sender  |     |Node 1 |     |Node 2 |     |Node 3 |     |Receiver|
 +--------+     +-------+     +-------+     +-------+     +--------+
                Congestion                  Congestion
                Notification                Point
                Proxy
]]>  </artwork>
  </figure>
  
  <t> With this congestion notification mechanism, the congested network node (Network Node 3 in <xref target="Figure_4"/>) notifies the proxy network node about the 
  congestion by a dedicated congestion notification packet, and then the proxy network node notifies the traffic sender about the congestion by a congestion notification 
  message supported by the traffic sender. This document details how this kind of congestion notification mechanism works, except that the specific congestion notification 
  message between the proxy network node and the traffic sender is beyond the scope of this document. </t>
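  
  <t> As a non-normative illustration of the proxy step described above, the following Python sketch shows one way a proxy network node might 
  choose the notification format for a given traffic sender. The table contents, the format names, and the function name select_format are 
  hypothetical; how the proxy network node learns which message a traffic sender supports is not specified by this document. </t>
  
  <figure>
  <artwork align="left"> <![CDATA[
```python
import ipaddress

# Hypothetical table: sender prefix -> format the sender supports.
SENDER_FORMATS = {
    ipaddress.ip_network("192.0.2.0/24"): "rocev2-cnp",
    ipaddress.ip_network("198.51.100.0/24"): "other-format",
}


def select_format(sender_ip, table=SENDER_FORMATS):
    """Return the format for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(sender_ip)
    matches = [net for net in table if addr in net]
    if not matches:
        # Unknown sender: the proxy cannot translate for it.
        return None
    best = max(matches, key=lambda net: net.prefixlen)
    return table[best]
```
]]>  </artwork>
  </figure>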
  
  </section>
  
  <section title="Congestion Notification to the Proxy Node">
  
  <t> The congestion notification message sent from the congested network node to the proxy network node can be a UDP message or an ICMP message. If it is a UDP message, it is 
  formatted as follows: </t>

  <figure anchor="Figure_5" title="Congestion Notification Message Format">
  <artwork align="left"> <![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|        UDP Source Port        |  UDP Destination Port = TBD1  |
+-------------------------------+-------------------------------+
|           UDP Length          |          UDP Checksum         |
+-------------------------------+-------------------------------+
|                                                               |
~                IP Five-Tuple + Congestion Level               ~
|                                                               |
+---------------------------------------------------------------+
|           As much of the invoking packet as possible          |
+            without the UDP packet exceeding 576 bytes         +
|               in IPv4 or the minimum MTU in IPv6              |
]]>  </artwork>
  </figure>
  
  <t> UDP Header: The UDP header as specified in <xref target="RFC768"/> includes the UDP source port, UDP destination port, UDP length, and UDP checksum. 
  A well-known UDP destination port needs to be allocated for this Congestion Notification Message. </t>
  
  <t> IP Five-Tuple: The IP five-tuple as described in <xref target="RFC6438"/> includes the source IP address, destination IP address, protocol number, 
  source port number, and destination port number. The IP five-tuple is copied from the data packet causing congestion, and it's used to identify a flow 
  for which the transmission rate needs to be reduced by the traffic sender. When the congested network node is a VPN P router, the IP five-tuple is carried 
  below the VPN encapsulation. </t>
  
  <t> Congestion Level: This 3-bit field indicates the congestion level. Value 0 of this field represents the lowest congestion level and value 7 of this 
  field represents the highest congestion level. </t>
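  
  <t> The following Python sketch illustrates, non-normatively, how a congested network node might assemble the UDP payload shown in 
  <xref target="Figure_5"/>. The byte layout of the IP Five-Tuple and Congestion Level fields is an assumption made only for this example 
  (IPv4 addresses, and the Congestion Level carried in the low-order 3 bits of one octet), as is the payload budget of 576 bytes minus a 
  20-byte IPv4 header and an 8-byte UDP header. The function name build_payload is hypothetical. </t>
  
  <figure>
  <artwork align="left"> <![CDATA[
```python
import socket
import struct

# Assumed layout (not defined by this document): IPv4 source and
# destination address, protocol, source port, destination port, then
# one octet whose low-order 3 bits carry the Congestion Level.
FIVE_TUPLE_LEVEL = struct.Struct("!4s4sBHHB")


def build_payload(src, dst, proto, sport, dport, level, invoking,
                  max_payload=576 - 20 - 8):
    """Build the UDP payload of a Congestion Notification Message."""
    if not 0 <= level <= 7:
        raise ValueError("Congestion Level is a 3-bit field (0..7)")
    head = FIVE_TUPLE_LEVEL.pack(
        socket.inet_aton(src), socket.inet_aton(dst),
        proto, sport, dport, level)
    # Append as much of the invoking packet as the budget allows.
    room = max(0, max_payload - len(head))
    return head + invoking[:room]
```
]]>  </artwork>
  </figure>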
  
  </section>
  
  <section title="Advertising Proxy Node Capability Using IGP/BGP">

  <t> Before the congested network node can send the congestion notification message to the proxy network node, the congested network node must know 
  the IP address of the proxy network node. The proxy network node can inform the congested network node of its IP address by advertising its 
  proxy capability in advance. </t>

  <t> Even though the Proxy Node Capability (PNC) is a property of the node, in some cases it is advantageous to associate and advertise the PNC with a 
  prefix. When the PNC is advertised with a prefix, the congested network node should send the congestion notification packet to the proxy network 
  node rather than to the traffic sender associated with that prefix. </t>

  <section title="Advertising Proxy Node Capability Using IS-IS">
  
  <t> Analogous to the Entropy Label Capability (ELC) Flag (E-Flag) defined in Section 3 of <xref target="RFC9088"/>, a new bit, the PNC Flag (P-Flag), is defined. 
  It is Bit 7 in the Prefix Attribute Flags <xref target="RFC7794"/>, as shown in <xref target="Figure_6"/>. </t>
  
  <figure anchor="Figure_6" title="IS-IS Prefix Attribute Flags">
  <artwork align="center"> <![CDATA[
   0 1 2 3 4 5 6 7...
  +-+-+-+-+-+-+-+-+...
  |X|R|N|E|A|U|U|P|...
  | | | | | | |P| |...
  +-+-+-+-+-+-+-+-+...
]]>  </artwork>
  </figure>
  
     <t> P-Flag:  PNC Flag (Bit 7)
		   <list>
		   <t> Set for the local host prefix of the originating node if it's used as a congestion notification proxy node for the prefix.</t>
		   </list>
	 </t>
  
  <t> The PNC signaling MUST be preserved when a router propagates a prefix between IS-IS levels <xref target="RFC5302"/>. </t>  
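  
  <t> As a non-normative illustration, the following Python sketch sets and tests the P-Flag in the first octet of the IS-IS Prefix Attribute 
  Flags, where Bit 0 is the most significant bit. Bit 7 is the value requested by this document and is not yet assigned; the function names 
  are hypothetical. </t>
  
  <figure>
  <artwork align="left"> <![CDATA[
```python
P_FLAG_BIT = 7  # requested in this document, not yet assigned


def flag_mask(bit):
    """Mask for a flag in the first flags octet; bit 0 is the MSB."""
    if not 0 <= bit <= 7:
        raise ValueError("only the first flags octet is handled here")
    return 0x80 >> bit


def set_p_flag(flags_octet):
    """Return the flags octet with the P-Flag set."""
    return flags_octet | flag_mask(P_FLAG_BIT)


def has_p_flag(flags_octet):
    """Return True if the P-Flag is set in the flags octet."""
    return bool(flags_octet & flag_mask(P_FLAG_BIT))
```
]]>  </artwork>
  </figure>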
  
  </section>
  
  <section title="Advertising Proxy Node Capability Using OSPFv2">
	
  <t> Analogous to the ELC Flag (E-Flag) defined in Section 3.1 of <xref target="RFC9089"/>, a new bit, the PNC Flag (P-Flag), 
  is defined. It is Bit 2 in the OSPFv2 Prefix Attribute Flags field <xref target="RFC9792"/>, as shown in <xref target="Figure_7"/>. </t>
  
  <figure anchor="Figure_7" title="OSPFv2 Prefix Attribute Flags">
  <artwork align="center"> <![CDATA[
   0 1 2 3 4...
  +-+-+-+-+-+...
  |U|U|P| | |...
  | |P| | | |...
  +-+-+-+-+-+...
]]>  </artwork>
  </figure>
  
     <t> P-Flag:  PNC Flag (Bit 2)
		   <list>
		   <t> Set for the local host prefix of the originating node if it's used as a congestion notification proxy node for the prefix.</t>
		   </list>
	 </t>
  
  <t> The PNC signaling MUST be preserved when an OSPFv2 Area Border Router (ABR) distributes information between areas. To do so, an 
  ABR MUST originate an OSPFv2 Extended Prefix Opaque LSA <xref target="RFC7684"/> including the received PNC setting.</t>
  
  </section>
  
  <section title="Advertising Proxy Node Capability Using OSPFv3">
	
  <t> Analogous to the ELC Flag (E-Flag) defined in Section 3.2 of <xref target="RFC9089"/>, a new bit, the PNC Flag (P-Flag), 
  is defined. It is Bit 2 in the OSPFv3 Prefix Attribute Flags field <xref target="RFC9792"/>, as shown in <xref target="Figure_9"/>. </t>
    
  <figure anchor="Figure_9" title="OSPFv3 Prefix Attribute Flags">
  <artwork align="center"> <![CDATA[
   0 1 2 3 4...
  +-+-+-+-+-+...
  |U|U|P| | |...
  | |P| | | |...
  +-+-+-+-+-+...
]]>  </artwork>
  </figure>
  
     <t> P-Flag:  PNC Flag (Bit 2)
		   <list>
		   <t> Set for the local host prefix of the originating node if it's used as a congestion notification proxy node for the prefix.</t>
		   </list>
	 </t>
  
  <t> The PNC signaling MUST be preserved when an OSPFv3 Area Border Router (ABR) distributes information between areas. The setting 
  of the PNC Flag in the Inter-Area-Prefix-LSA <xref target="RFC5340"/> or in the Inter-Area-Prefix TLV <xref target="RFC8362"/>, 
  generated by an ABR, MUST be the same as the value of the PNC Flag associated with the prefix in the source area.</t>
  
  </section>

  <section title="Advertising Proxy Node Capability Using BGP">

  <t> Analogous to the Entropy Label Characteristic (ELCv3) TLV defined in Section 3.1 of <xref target="I-D.ietf-idr-entropy-label"/>, a new PNC characteristic 
  TLV is defined. It uses code value TBD2 in the "BGP Next Hop Dependent Characteristic Codes" registry requested by <xref target="I-D.ietf-idr-entropy-label"/>, 
  as shown in <xref target="Figure_10"/>. </t>
    
  <figure anchor="Figure_10" title="BGP Next Hop Dependent Characteristic PNC TLV Format">
  <artwork align="center"> <![CDATA[
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Characteristic Code = TBD2  |   Characteristic Length = 0   |
+-------------------------------+-------------------------------+
]]>  </artwork>
  </figure>
  
     <t> PNC TLV:  code TBD2, length 0, and carries no value
		   <list>
		   <t> Carried for the local host prefix of the originating node if it's used as a congestion notification proxy node for the prefix.</t>
		   </list>
	 </t>
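  
  <t> The following Python sketch illustrates, non-normatively, the encoding shown in <xref target="Figure_10"/>: a 16-bit Characteristic Code 
  followed by a 16-bit Characteristic Length of zero and no value. Because TBD2 is not yet assigned, the code value is passed in by the caller; 
  the function names are hypothetical. </t>
  
  <figure>
  <artwork align="left"> <![CDATA[
```python
import struct


def encode_pnc_tlv(code):
    """Encode the PNC TLV: 16-bit code, 16-bit zero length, no value."""
    if not 0 <= code <= 0xFFFF:
        raise ValueError("Characteristic Code must fit in 16 bits")
    return struct.pack("!HH", code, 0)


def is_pnc_tlv(data, code):
    """Check whether data is exactly a PNC TLV for the given code."""
    return len(data) == 4 and struct.unpack("!HH", data) == (code, 0)
```
]]>  </artwork>
  </figure>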
  
  </section>
  
  </section>
  
  <section title="Security Considerations">
  
  <t> The congestion notification from the congested network node to the proxy network node MUST be applied only within a specific controlled domain. A limited 
  administrative domain provides the network administrator with the means to select, monitor, and control the access to the network, making it a 
  trusted domain.</t>
   
  <t> To avoid potential Denial-of-Service (DoS) attacks, it is RECOMMENDED that implementations apply rate-limiting policies when generating and 
  receiving congestion notification messages.</t>
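  
  <t> One possible rate-limiting policy is a token bucket, sketched non-normatively below in Python. This document does not mandate any 
  particular algorithm; the class name, the parameter names, and the choice of a token bucket are assumptions made for illustration. </t>
  
  <figure>
  <artwork align="left"> <![CDATA[
```python
import time


class TokenBucket:
    """Allow up to `rate` messages per second, bursts of `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = float(rate)
        self.burst = float(burst)
        self.clock = clock
        self.tokens = float(burst)
        self.last = clock()

    def allow(self):
        """Return True if one more message may be sent or accepted."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill in proportion to elapsed time, capped at the burst.
        self.tokens = min(self.burst,
                          self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```
]]>  </artwork>
  </figure>
  
  <t> In this sketch, a node would call allow() before generating a congestion notification message, or on receiving one, and drop the 
  message when the call returns False. </t>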
  
  <t> A deployment MUST ensure that border filtering drops inbound congestion notification messages arriving from outside the domain and drops outbound 
  congestion notification messages leaving the domain.</t>
  
  <t> A deployment MUST support a configuration option to enable or disable the congestion notification proxy feature defined in this document. By 
  default, the congestion notification proxy feature MUST be disabled.</t>
  
  </section>
  
  <section title="IANA Considerations"> 
  
  <t> This document requests the following allocations from IANA:
     <list>	 
	 <t> - A well-known UDP port number TBD1 in the "Service Name and Transport Protocol Port Number" registry is requested to be assigned to the 
	 Congestion Notification Message.</t>
	 <t> - Bit 7 in the "IS-IS Bit Values for Prefix Attribute Flags Sub-TLV" registry is requested to be assigned to the PNC Flag (P-Flag).</t>
	 <t> - Bit 2 in the "OSPFv2 Prefix Attribute Flags" registry is requested to be assigned to the PNC Flag (P-Flag).</t>
	 <t> - Bit 2 in the "OSPFv3 Prefix Attribute Flags" registry is requested to be assigned to the PNC Flag (P-Flag).</t>
	 <t> - Code value TBD2 in the "BGP Next Hop Dependent Characteristic Codes" registry is requested to be assigned to the PNC characteristic TLV.</t>
	 </list>
  </t>
  
  </section>

  <section title="Acknowledgements">
  <t> The author would like to acknowledge Jinghai Yu and Shaofu Peng for the very helpful discussion.</t>
  </section>  
  
</middle>
  
<back>
    <references title="Normative References">
     <?rfc include="reference.RFC.2119"?>
     <?rfc include="reference.RFC.8174"?>
     <?rfc include="reference.RFC.768"?>
     <?rfc include="reference.RFC.7794"?>
     <?rfc include="reference.RFC.5302"?>
     <?rfc include="reference.RFC.9792"?>
     <?rfc include="reference.RFC.7684"?>
     <?rfc include="reference.RFC.5340"?>
     <?rfc include="reference.RFC.8362"?>
     <?rfc include="reference.I-D.ietf-idr-entropy-label"?>
    </references>
	
    <references title="Informative References">
     <?rfc include="reference.RFC.6438"?>
     <?rfc include="reference.RFC.9088"?>
     <?rfc include="reference.RFC.9089"?>
     <?rfc include="reference.RFC.3168"?>
     <?rfc include="reference.I-D.xiao-rtgwg-rocev2-fast-cnp"?>
     <reference anchor="IBTA-SPEC" target="https://www.infinibandta.org/ibta-specification">
        <front>
            <title>InfiniBand Architecture Specification Volume 1, Release 1.8</title>
            <author>
              <organization>InfiniBand Trade Association</organization>
            </author>
            <date month="July" year="2024"/>
        </front>
     </reference>
    </references>	
</back>

</rfc>
