<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<!-- generated by https://github.com/cabo/kramdown-rfc version 1.7.8 (Ruby 3.0.2) -->
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902" docName="draft-lyu-rtgwg-coordinated-cm-01" category="std" consensus="true" submissionType="IETF" xml:lang="en" version="3">
  <!-- xml2rfc v2v3 conversion 3.21.0 -->
  <front>
    <title abbrev="CCM">Coordinated Congestion Management</title>
    <seriesInfo name="Internet-Draft" value="draft-lyu-rtgwg-coordinated-cm-01"/>
    <author initials="Y." surname="Lyu" fullname="Yunping(Lily) Lyu">
      <organization>Huawei</organization>
      <address>
        <email>lvyunping@huawei.com</email>
      </address>
    </author>
    <author initials="Y." surname="Zhang" fullname="Yuhan Zhang">
      <organization>Huawei</organization>
      <address>
        <email>zhangyuhan6@huawei.com</email>
      </address>
    </author>
    <author initials="M." surname="Liu" fullname="Mengzhu Liu">
      <organization>Huawei</organization>
      <address>
        <email>liumengzhu@huawei.com</email>
      </address>
    </author>
    <date year="2024" month="April" day="19"/>
    <area>Routing</area>
    <workgroup>RTGWG</workgroup>
    <keyword>adaptive routing</keyword>
    <keyword>congestion control</keyword>
    <abstract>
      <?line 67?>

<t>AI fabric is sensitive to bandwidth. Congestion management, including congestion control and load balancing, is a main method to fully utilize network resource. However, current congestion management mechanisms are not coordinated, which lead to throughput decreasing.  This document provides a scheme to coordinate different congestion management mechanisms. It describes the design principle, behaviors of network switches and hosts in the scheme, and gives an example to show end-to-end procedure.</t>
    </abstract>
  </front>
  <middle>
    <?line 71?>

<section anchor="intro">
      <name>Introduction</name>
      <t>ML/AI has been progressing rapidly over the last decade. ChatGPT is a milestone of generative AI that has ignited the industry's enthusiasm for large AI models. A single AI accelerator, or a single server with multiple AI accelerators, is not capable of training the large models, due to a lack of memory and compute power. It is therefore imperative to employ distributed systems with parallel processing to train those models.</t>
      <t>AI training is bandwidth sensitive. Taking data parallelism and MoE (Mixture of Experts), which are commonly used forms of parallel processing in AI training, as examples, the required bandwidth is at the GB level. That poses a major challenge to the AI fabric. Increasing link speed is an important approach, from 400 Gbps to 800 Gbps, or even 1.6 Tbps in the future. Moreover, how to use that bandwidth effectively also becomes a critical issue. Link bandwidth is expected to be fully utilized to achieve high throughput. Network congestion is a major problem that degrades performance, so congestion management is always applied in the network to alleviate congestion. Usually, congestion management includes congestion control and load balancing. Today, however, congestion control and load balancing work independently, without any coordination.</t>
      <t>This document discusses the lack of coordination among the mechanisms in current congestion management. That lack of coordination leads to throughput issues, which are particularly harmful in AI fabrics. This document proposes a scheme for coordinating different congestion management mechanisms that can be effectively and widely deployed in AI fabrics.</t>
    </section>
    <section anchor="terminology">
      <name>Terminology</name>
      <ul spacing="normal">
        <li>
          <t>ML: Machine Learning</t>
        </li>
        <li>
          <t>AI: Artificial Intelligence</t>
        </li>
        <li>
          <t>ECN: Explicit Congestion Notification</t>
        </li>
        <li>
          <t>AR: Adaptive Routing</t>
        </li>
        <li>
          <t>DCQCN: Data Center Quantized Congestion Notification <xref target="DCQCN"/></t>
        </li>
        <li>
          <t>CNP: Congestion Notification Packet</t>
        </li>
        <li>
          <t>PLB: Protective Load Balancing <xref target="PLB"/></t>
        </li>
        <li>
          <t>CC: Congestion Control</t>
        </li>
        <li>
          <t>ECMP: Equal-cost multi-path routing</t>
        </li>
      </ul>
    </section>
    <section anchor="requirements-language">
      <name>Requirements Language</name>
      <t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>", "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL
NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>", "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
"<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to be interpreted as
described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they
appear in all capitals, as shown here.</t>
      <?line -18?>

</section>
    <section anchor="background">
      <name>Existing congestion management</name>
      <t>Congestion management includes congestion control and load balancing. Flow control such as PFC (Priority-based Flow Control) is not discussed in this document. It is useful as the last line of defense against packet loss, but it is not counted here as part of congestion management.</t>
      <ul spacing="normal">
        <li>
          <t>There are many congestion control mechanisms, such as DCQCN <xref target="DCQCN"/> and Timely <xref target="Timely"/>. Although they follow different procedures and use different algorithms, their purpose is to control the sending rate at the sending host. Basically, congestion control identifies network congestion from network status, such as the queue length of a switch port or the end-to-end delay (RTT), and then adjusts the sending rate at the sender to alleviate congestion. How quickly to reduce the rate to avoid packet loss, and how to recover the rate so as to limit the loss of throughput, are essential to a congestion control mechanism.</t>
        </li>
        <li>
          <t>Load balancing, by contrast, alleviates congestion by adjusting the forwarding paths for traffic. ECMP is one way of load balancing. It hashes the 5-tuple of each flow to pin the flow to a specific path. This does not work well for AI workloads, because an AI workload has a small number of flows, and most of them are large. ECMP therefore cannot distribute the traffic evenly over the network, so adaptive routing is preferred. Adaptive routing changes the path of a single flow according to network status. For example, flow 1 originally uses path 1 for forwarding. When the network switch detects that path 1 is becoming heavily loaded, it selects path 2, which is lightly loaded, for the following packets in the flow. The path status could be indicated by local link status and/or downstream link status, etc. How to judge whether a path is heavily loaded is implementation dependent. Adaptive routing can select the path for each packet, and thus uses network resources most efficiently. But avoiding unnecessary path switching is critical, because each path switch may increase system complexity, for example through packet re-ordering. Another load balancing mechanism is packet spray, in which the source host or network switch evenly distributes packets over the available paths. The distribution does not consider the actual path status. Compared with adaptive routing, packet spray is easier to implement, but it is not the most optimized way. This document focuses on adaptive routing. The proposed scheme is also applicable to packet spray.</t>
        </li>
      </ul>
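      <t>The ECMP limitation described above can be illustrated with a short, non-normative Python sketch. It assumes a CRC32 hash over a string form of the 5-tuple; the function names ecmp_path and paths_in_use, the FlowKey layout, and the hash choice are all assumptions of this example rather than anything specified by ECMP implementations or by this document. Because each flow is pinned to one path, a workload with only two large flows can occupy at most two of the available paths, however many exist.</t>
      <sourcecode type="python"><![CDATA[
```python
import zlib
from typing import Iterable, Tuple

# (src_ip, dst_ip, protocol, src_port, dst_port); layout assumed for this sketch
FlowKey = Tuple[str, str, int, int, int]


def ecmp_path(flow: FlowKey, num_paths: int) -> int:
    """Pick a path index by hashing the 5-tuple (CRC32 is an arbitrary choice)."""
    key = "|".join(str(field) for field in flow).encode()
    return zlib.crc32(key) % num_paths


def paths_in_use(flows: Iterable[FlowKey], num_paths: int) -> int:
    """Count the distinct paths that carry at least one flow."""
    return len({ecmp_path(flow, num_paths) for flow in flows})
```
]]></sourcecode>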
      <t>Currently, congestion control and adaptive routing work independently, without coordination. That has a negative impact on system performance. For example, when congestion caused by imbalanced load in the network occurs on a switch, both DCQCN and adaptive routing are activated. ECN is marked in data packets, causing CNPs to be sent back to the sender, so the sender slows down the sending rate of the congested flow. Meanwhile, the switch changes the path for packets of the congested flow, steering newly arriving packets to a light-loaded path. The result is that the congested flow is forwarded on the light-loaded path at a low rate, and DCQCN then needs some time to recover the sending rate on the new path. This reduces effective bandwidth and seriously impacts computation efficiency in AI training. As another example, if the congestion is caused by in-cast traffic, congestion control should be enough. Additional adaptive routing adjustments not only fail to mitigate congestion, but may also introduce more out-of-order packets.</t>
      <t>The problem is that current congestion management does not distinguish the cause of congestion, but triggers every mechanism whenever congestion is detected. In principle, in-cast congestion cannot be mitigated by load balancing, and reducing the flow rate through congestion control for congestion caused by load imbalance (in-network congestion) decreases network efficiency.</t>
    </section>
    <section anchor="principle">
      <name>Design principle of coordinated congestion management</name>
      <t>Coordinated congestion management is designed to coordinate congestion control and adaptive routing. The design principles are as follows.</t>
      <ul spacing="normal">
        <li>
          <t>Avoid unnecessary sending rate reduction  <br/>
AI fabrics are bandwidth sensitive, so high throughput is extremely important. Multipath is needed to make full use of network bandwidth. Slowing down the sending rate while there are still available paths for traffic wastes network resources, thereby increasing communication time in the AI cluster and reducing AI training performance.</t>
        </li>
        <li>
          <t>Fully use multipath while reducing invalid path switching <br/>
While searching for light-loaded paths for load balancing, new paths should be located quickly and accurately. The search should not be restricted to local paths but should extend to available paths upstream. Invalid path switching should be avoided. Invalid path switching includes switching in-cast traffic: no matter how the traffic path is switched, the traffic will finally be congested at the last hop.</t>
        </li>
        <li>
          <t>Reuse current CC and AR algorithms  <br/>
There is already a variety of CC and AR algorithms. They can still be used in the coordinated congestion management scheme. The scheme enables CC and AR to be triggered in a coordinated way, adjusting the sending rate or switching the path depending on the cause of congestion.</t>
        </li>
        <li>
          <t>Applicable to various topologies <br/>
Most AI fabrics use Clos or fat-tree topologies, but there are also new studies considering the use of direct topologies, such as torus, dragonfly, and dragonfly+. Some existing solutions for CC and AR coordination, e.g., PLB <xref target="PLB"/>, rely on ECMP, which can only be used in topologies with equal-cost paths, such as Clos. For topologies without equal-cost paths, such as dragonfly+, such solutions do not work. The coordination scheme should be applicable to different topologies.</t>
        </li>
      </ul>
    </section>
    <section anchor="scheme">
      <name>Coordinated congestion management scheme</name>
      <t>The key to the coordinated congestion management is to identify CC traffic and non-CC traffic, thereby they are treated differently in network when congestion occurs. CC traffic is those packets which cause in-cast congestion. Non-CC traffic is the rest packets in network.</t>
      <t>CC traffic recognized by network is notified to the source host. The subsequent packets of the same flow are tagged by the source host. This indicates  the network switch to perform CC mechanism on those packets instead of AR. For non-CC traffic, the network switch first performs AR. Only when AR mechansim cannot find light-loaded path for switching, the traffic turns to be CC traffic and CC will be run to alleviate congestion.</t>
      <t>Coordinated congestion management requires interaction between network switches and source hosts. The following sections explain the detail of the scheme.</t>
      <section anchor="coordination-tag">
        <name>Coordination tag</name>
        <t>A coordination tag is inserted into data packets by the source host when it sends them. The tag contains a CC indicator and an AR indicator.</t>
        <ul spacing="normal">
          <li>
            <t>CC indicator: indicates if the packet may cause in-cast congestion.</t>
          </li>
          <li>
            <t>AR indicator: indicates the location of the upstream AR point where adaptive routing can be performed. The AR point can be a network switch or a source host. The AR indicator can be an ID, an IP address, or other information that guides how to send a message to the AR point.</t>
          </li>
        </ul>
        <t>The tag can be carried in data packets using an in-band telemetry scheme. CSIG <xref target="I-D.draft-ravi-ippm-csig"/>, a newer method, may provide another possibility.</t>
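        <t>As a non-normative illustration, the coordination tag could be modelled as a small record holding the two indicators. In the Python sketch below, the names CoordinationTag, cc, and ar_point, and the use of None to mean "no upstream AR point", are assumptions of this example; this document does not define an encoding for the tag.</t>
        <sourcecode type="python"><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CoordinationTag:
    """Illustrative model of the coordination tag (field names are hypothetical)."""
    cc: bool = False                # CC indicator: packet may cause in-cast congestion
    ar_point: Optional[str] = None  # AR indicator: ID or IP address of the upstream AR point


def new_flow_tag() -> CoordinationTag:
    # A source host starts a flow with the CC indicator unset and no AR point.
    return CoordinationTag()


def upstream_ar_available(tag: CoordinationTag) -> bool:
    # True when the AR indicator names a point that can be asked to re-route.
    return tag.ar_point is not None
```
]]></sourcecode>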
      </section>
      <section anchor="notification-message">
        <name>Notification message</name>
        <t>There are three types of notification message.</t>
        <ul spacing="normal">
          <li>
            <t>Type 1:  congestion control required <br/>
Example: A type 1 message is sent from the switch experiencing in-cast congestion to the source host, notifying the source host to tag (set the CC indicator in) the packets belonging to the flow that causes the in-cast congestion.</t>
          </li>
          <li>
            <t>Type 2: congestion control released   <br/>
Example: When in-cast congestion is eliminated, the switch sends a type 2 message to the corresponding hosts, notifying the source hosts to clear the CC indicator in the subsequent packets of the corresponding flow.</t>
          </li>
          <li>
            <t>Type 3: upstream AR required    <br/>
Example: If the switch determines that AR should be performed upstream, it sends a type 3 message to the upstream AR point. The upstream AR point can be a one-hop neighbour of the switch or a point multiple hops away.</t>
          </li>
        </ul>
        <t>The notification message includes the source IP, destination IP, notification type, and flow key. The source IP is the IP address of the switch that sends the notification. The destination IP is the IP address of the destination that will handle the notification message. The notification type is one of the three types above. The flow key is the information that identifies the flow to be handled, such as the 5-tuple.</t>
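        <t>The four fields of the notification message can be sketched as a simple record. The Python below is illustrative only: the class and field names, the FlowKey layout, and the helper cc_required are assumptions of this example, and no wire format is implied. The numeric values 1 to 3 simply mirror the type numbers used in this section.</t>
        <sourcecode type="python"><![CDATA[
```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Tuple

# (src_ip, dst_ip, protocol, src_port, dst_port); layout assumed for this sketch
FlowKey = Tuple[str, str, int, int, int]


class NotificationType(IntEnum):
    CC_REQUIRED = 1           # Type 1: congestion control required
    CC_RELEASED = 2           # Type 2: congestion control released
    UPSTREAM_AR_REQUIRED = 3  # Type 3: upstream AR required


@dataclass(frozen=True)
class Notification:
    source_ip: str        # switch that sends the notification
    destination_ip: str   # source host or upstream AR point that handles it
    ntype: NotificationType
    flow_key: FlowKey


def cc_required(switch_ip: str, flow: FlowKey) -> Notification:
    # A type 1 message goes to the source host, i.e. the flow's source IP.
    return Notification(switch_ip, flow[0], NotificationType.CC_REQUIRED, flow)
```
]]></sourcecode>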
      </section>
      <section anchor="behavior-of-network-switches">
        <name>Behavior of network switches</name>
        <section anchor="identify-congestion-type">
          <name>Identify congestion type</name>
          <t>When congestion is detected, the network switch judges whether it is in-cast congestion.</t>
          <t>If congestion occurs at a switch egress port, and the switch is the last-hop switch to the destination host, the congestion is determined to be in-cast congestion. The flows causing the in-cast congestion are identified as in-cast flows.</t>
          <t>There may be other methods to identify the congestion type. This document does not place any limitation on them.</t>
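          <t>The last-hop rule above can be written as a one-line check. The following Python sketch is non-normative; the function name classify_congestion, its boolean inputs, and the returned labels are assumptions of this example, and other identification methods remain possible.</t>
          <sourcecode type="python"><![CDATA[
```python
def classify_congestion(egress_congested: bool, is_last_hop: bool) -> str:
    """Label congestion as in-cast only when it occurs at the last-hop egress port."""
    if not egress_congested:
        return "none"
    # Congestion anywhere else is treated as in-network (load imbalance) congestion.
    return "in-cast" if is_last_hop else "in-network"
```
]]></sourcecode>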
        </section>
        <section anchor="notify-cc-congestion">
          <name>Notify CC congestion</name>
          <t>When the network switch determines that congestion is in-cast congestion, it generates a type 1 notification message for each identified flow and sends the messages to the source hosts of those flows. When the in-cast congestion is eliminated, the switch sends type 2 notification messages to the source hosts.</t>
        </section>
        <section anchor="notify-upstream-point-to-perform-ar">
          <name>Notify upstream point to perform AR</name>
          <t>When the network switch determines that AR should be performed but cannot perform it locally, and the AR indicator in the data packet shows that AR is available upstream, the switch sends a type 3 notification message to the upstream point identified by the AR indicator.</t>
        </section>
        <section anchor="perform-congestion-control">
          <name>Perform congestion control</name>
          <t>The network switch performs congestion control in the following cases.</t>
          <ul spacing="normal">
            <li>
              <t>The congestion is identified as in-cast congestion.</t>
            </li>
            <li>
              <t>The congestion is not identified as in-cast congestion, but adaptive routing cannot be used because no new path is available for traffic switching, either locally or upstream.</t>
            </li>
          </ul>
          <t>This document does not limit which CC mechanism is performed.</t>
        </section>
        <section anchor="perform-adaptive-routing">
          <name>Perform adaptive routing</name>
          <t>The network switch performs adaptive routing in the following cases.</t>
          <ul spacing="normal">
            <li>
              <t>The packet is not in-cast traffic. The CC indicator in the data packet is used to determine whether it is in-cast traffic.</t>
            </li>
            <li>
              <t>Type 3 notification message is received. According to flow information in the notification, new path is selected for the subsequent packets of the flow.</t>
            </li>
          </ul>
          <t>To enable upstream AR, the AR indicator needs to be updated hop by hop. When a data packet arrives at a network switch:</t>
          <ul spacing="normal">
            <li>
              <t>If there are several local light-loaded paths available for AR on the switch, the switch updates the AR indicator in the data packet to point to itself, for example by writing its own ID. The switch then selects an appropriate local path to send the data packet. This document does not define the algorithm for local path selection; it depends on the routing strategy of the network switch.</t>
            </li>
            <li>
              <t>If there is only one local light-loaded path available, the network switch can only select that path for the traffic. The AR indicator in the data packet is not updated.</t>
            </li>
            <li>
              <t>If there is no local light-loaded path, the network switch learns whether upstream AR is available by reading the AR indicator in the data packet. If the AR indicator indicates that an upstream point can perform AR, the network switch generates a type 3 notification message and sends it directly to that upstream point. Otherwise, the network switch triggers the congestion control mechanism, for example by setting ECN in the data packet.</t>
            </li>
          </ul>
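          <t>The per-packet handling described above can be summarized in a short, non-normative Python sketch. The function handle_packet, its parameters, and the action labels are assumptions of this example. Picking the first entry of light_paths stands in for a local path selection algorithm, which this document leaves undefined. The action "cc" means only that the packet is left to the CC mechanism (for example, ECN marking) and is not re-routed.</t>
          <sourcecode type="python"><![CDATA[
```python
from typing import List, Optional, Tuple


def handle_packet(switch_id: str,
                  light_paths: List[str],
                  cc_indicator: bool,
                  ar_indicator: Optional[str]
                  ) -> Tuple[str, Optional[str], Optional[str]]:
    """Return (action, chosen_path_or_target, updated_ar_indicator).

    action is one of "cc", "local_ar", "forward", "notify_upstream".
    """
    if cc_indicator:
        # Tagged as in-cast traffic: AR is skipped and CC applies.
        return ("cc", None, ar_indicator)
    if len(light_paths) > 1:
        # Several light-loaded paths: this switch becomes the AR point.
        return ("local_ar", light_paths[0], switch_id)
    if len(light_paths) == 1:
        # Only one choice: use it and leave the AR indicator untouched.
        return ("forward", light_paths[0], ar_indicator)
    if ar_indicator is not None:
        # No local path: send a type 3 notification to the upstream AR point.
        return ("notify_upstream", ar_indicator, ar_indicator)
    # No path locally or upstream: fall back to CC.
    return ("cc", None, ar_indicator)
```
]]></sourcecode>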
        </section>
      </section>
      <section anchor="behavior-of-source-hosts">
        <name>Behavior of source hosts</name>
        <t>When receiving a type 1 notification message, the source host sets the CC indicator in the subsequent packets of the corresponding flow.</t>
        <t>When receiving a type 2 notification message, the source host clears the CC indicator in the subsequent packets of the corresponding flow.</t>
        <t>When receiving a type 3 notification message, the source host performs AR on the subsequent packets of the corresponding flow.</t>
        <t>When receiving congestion control signals for a flow whose CC indicator is set, the source host performs CC on the flow.</t>
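        <t>The source host rules above amount to a small amount of per-flow state. The Python sketch below is non-normative; the class SourceHost, its method names, and the counter used to record type 3 requests are assumptions of this example. How the host actually re-routes a flow or runs its CC algorithm is outside the sketch.</t>
        <sourcecode type="python"><![CDATA[
```python
from typing import Dict, Hashable, Set


class SourceHost:
    """Tracks the CC indicator per flow (class and method names are hypothetical)."""

    def __init__(self) -> None:
        self.cc_flows: Set[Hashable] = set()             # flows tagged with the CC indicator
        self.reroute_requests: Dict[Hashable, int] = {}  # type 3 messages seen per flow

    def on_notification(self, ntype: int, flow: Hashable) -> None:
        if ntype == 1:      # CC required: tag subsequent packets
            self.cc_flows.add(flow)
        elif ntype == 2:    # CC released: stop tagging subsequent packets
            self.cc_flows.discard(flow)
        elif ntype == 3:    # upstream AR required: the host acts as the AR point
            self.reroute_requests[flow] = self.reroute_requests.get(flow, 0) + 1
        else:
            raise ValueError(f"unknown notification type: {ntype}")

    def cc_indicator(self, flow: Hashable) -> bool:
        return flow in self.cc_flows

    def should_run_cc(self, flow: Hashable, congestion_signal: bool) -> bool:
        # CC runs only when a congestion signal arrives for a tagged flow.
        return congestion_signal and self.cc_indicator(flow)
```
]]></sourcecode>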
      </section>
    </section>
    <section anchor="example">
      <name>An example of end-to-end procedure</name>
      <t>The network topology is shown in <xref target="ref-to-fig"/>. This is a 4-layer fat-tree topology. There are n computing racks and m switching racks. Computing racks contain source hosts, layer 1 switches, and layer 2 switches. Switching racks contain layer 3 and layer 4 switches.</t>
      <figure anchor="ref-to-fig">
        <name>Network Topology</name>
        <artwork><![CDATA[
      Switching Rack 1    Switching Rack m
      +---------------+   +---------------+
      |L4-1-1...L4-1-e|   |L4-m-1...L4-m-e|
      |  | \    / |   |   |  | \    / |   |
      |  |  \  /  |   |   |  |  \  /  |   |
      |  |   \/   |   |   |  |   \/   |   |
      |  |   /\   |   |...|  |   /\   |   |
      |  |  /  \  |   |   |  |  /  \  |   |
      |  | /    \ |   |   |  | /    \ |   |
      |L3-1-1...L3-1-d|   |L3-m-1...L3-m-d|
      +--+-----------\    +-/----------+--+
         |            \    /           |
         |             \  /            |
         |  ......      \/     ......  |
         |              /\             |
         |             /  \            |
         |            /    \           |
      +--+-----------/      \----------+---+
      |L2-1-1...L2-1-c|    |L2-n-1...L2-n-c|
      |  | \    / |   |    |  | \    / |   |
      |  |  \  /  |   |    |  |  \  /  |   |
      |  |   \/   |   |    |  |   \/   |   |
      |  |   /\   |   |... |  |   /\   |   |
      |  |  /  \  |   |    |  |  /  \  |   |
      |  | /    \ |   |    |  | /    \ |   |
      |L1-1-1...L1-1-b|    |L1-n-1...L1-n-b|
      |  +        +   |    |  +        +   |
      | H-1-1... H-1-a|    | H-n-1... H-n-a|
      +---------------+    +---------------+
      Computing Rack 1     Computing Rack n

]]></artwork>
      </figure>
      <ul spacing="normal">
        <li>
          <t>Host H-1-1 in computing rack 1 sends a data packet P1, belonging to flow F1, to H-n-1 in computing rack n. The CC indicator in the packet tag is not set, indicating that this packet belongs to a non-in-cast flow. The AR indicator in the packet tag does not point to any available AR point.</t>
        </li>
        <li>
          <t>P1 arrives at switch L1-1-1 in computing rack 1. L1-1-1 has multiple light-loaded paths for AR. The path from L1-1-1 to L2-1-1 is selected for P1. The AR indicator in the P1 tag is updated to L1-1-1.</t>
        </li>
        <li>
          <t>P1 arrives at switch L2-1-1. L2-1-1 also has multiple light-loaded paths for AR. The path from L2-1-1 to L3-1-1 is selected for P1. The AR indicator in the P1 tag is updated to L2-1-1.</t>
        </li>
        <li>
          <t>P1 arrives at switch L3-1-1. L3-1-1 has only one light-loaded path. That path, from L3-1-1 to L4-1-1, is selected for P1. The AR indicator in the P1 tag remains L2-1-1.</t>
        </li>
        <li>
          <t>P1 arrives at switch L4-1-1. L4-1-1 is congested and has no local path available for performing AR. By reading the AR indicator in the P1 tag, L4-1-1 sends a type 3 notification to L2-1-1.</t>
        </li>
        <li>
          <t>After receiving the type 3 notification, L2-1-1 switches the path from L2-1-1-&gt;L3-1-1 to L2-1-1-&gt;L3-m-1 for the newly arriving packets of flow F1.</t>
        </li>
        <li>
          <t>After a while, L1-n-1 becomes congested due to in-cast. Flow F1 is identified as an in-cast flow. L1-n-1 sends a type 1 notification to H-1-1.</t>
        </li>
        <li>
          <t>On receiving the type 1 notification, H-1-1 sets the CC indicator in the subsequent packets of F1, indicating that the packets belong to an in-cast flow. AR is therefore not performed on those packets. The sending rate of F1 is also reduced according to the congestion control algorithm.</t>
        </li>
      </ul>
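      <t>The way the AR indicator of P1 evolves in the steps above can be replayed with a few lines of Python. This is a non-normative sketch: the function trace_ar_indicator and the HOPS table are assumptions of this example, and the count 2 merely stands for "multiple" light-loaded paths.</t>
      <sourcecode type="python"><![CDATA[
```python
from typing import List, Optional, Tuple

# (switch, number of light-loaded local paths seen by P1); 2 stands for "multiple".
HOPS: List[Tuple[str, int]] = [
    ("L1-1-1", 2), ("L2-1-1", 2), ("L3-1-1", 1), ("L4-1-1", 0),
]


def trace_ar_indicator(hops: List[Tuple[str, int]]
                       ) -> Tuple[List[Optional[str]], Optional[str]]:
    """Return the AR indicator after each hop and the target of any type 3 message."""
    ar: Optional[str] = None
    notify: Optional[str] = None
    history: List[Optional[str]] = []
    for switch, light_paths in hops:
        if light_paths > 1:
            ar = switch      # a switch with several choices becomes the AR point
        elif light_paths == 0 and ar is not None:
            notify = ar      # a congested switch asks the recorded AR point to re-route
        history.append(ar)
    return history, notify
```
]]></sourcecode>
      <t>With the HOPS table above, the indicator takes the values L1-1-1, L2-1-1, L2-1-1, and L2-1-1, and the type 3 notification is addressed to L2-1-1, matching the walk-through.</t>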
    </section>
    <section anchor="security-considerations">
      <name>Security Considerations</name>
      <t>TBD.</t>
    </section>
    <section anchor="iana-considerations">
      <name>IANA Considerations</name>
      <t>TBD.</t>
    </section>
  </middle>
  <back>
    <references>
      <name>References</name>
      <references anchor="sec-normative-references">
        <name>Normative References</name>
        <reference anchor="RFC2119">
          <front>
            <title>Key words for use in RFCs to Indicate Requirement Levels</title>
            <author fullname="S. Bradner" initials="S." surname="Bradner"/>
            <date month="March" year="1997"/>
            <abstract>
              <t>In many standards track documents several words are used to signify the requirements in the specification. These words are often capitalized. This document defines these words as they should be interpreted in IETF documents. This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.</t>
            </abstract>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="2119"/>
          <seriesInfo name="DOI" value="10.17487/RFC2119"/>
        </reference>
        <reference anchor="RFC8174">
          <front>
            <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
            <author fullname="B. Leiba" initials="B." surname="Leiba"/>
            <date month="May" year="2017"/>
            <abstract>
              <t>RFC 2119 specifies common key words that may be used in protocol specifications. This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the defined special meanings.</t>
            </abstract>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="8174"/>
          <seriesInfo name="DOI" value="10.17487/RFC8174"/>
        </reference>
      </references>
      <references anchor="sec-informative-references">
        <name>Informative References</name>
        <reference anchor="I-D.draft-ravi-ippm-csig">
          <front>
            <title>Congestion Signaling (CSIG)</title>
            <author fullname="Abhiram Ravi" initials="A." surname="Ravi">
              <organization>Google LLC</organization>
            </author>
            <author fullname="Nandita Dukkipati" initials="N." surname="Dukkipati">
              <organization>Google LLC</organization>
            </author>
            <author fullname="Naoshad Mehta" initials="N." surname="Mehta">
              <organization>Google LLC</organization>
            </author>
            <author fullname="Jai Kumar" initials="J." surname="Kumar">
              <organization>Broadcom Inc.</organization>
            </author>
            <date day="2" month="February" year="2024"/>
            <abstract>
              <t>   This document presents Congestion Signaling (CSIG), an in-band
   network telemetry protocol that allows end-hosts to obtain visibility
   into fine-grained network signals for congestion control, traffic
   management, and network debuggability in the network.  CSIG provides
   a simple, low-overhead, and extensible packet header mechanism to
   obtain fixed-length summaries from bottleneck devices along a packet
   path.  This summarized information is collected over L2 CSIG-tags in
   a compare-and-replace manner across network devices along the path.
   Receivers can reflect this information back to senders via L4+ CSIG
   reflection headers.

   CSIG builds upon the successful aspects of prior work such as switch
   in-band network telemetry (INT) that incorporates multibit signals in
   live data packets.  At the same time, CSIG's end-to-end mechanism for
   carrying the signals via fixed size header is simple, practical and
   deployable akin to Explicit Congestion Notification (ECN).

   In addition to a detailed description of the end-to-end protocol,
   this document also motivates the use cases for CSIG and the rationale
   for design choices made in CSIG.  It describes a set of signals of
   interest to applications (minimum available bandwidth, maximum link
   utilization, and maximum hop delay), methods to compute these signals
   in network devices, and how these signals can be leveraged in
   applications.  Additionally, it describes how attributes about the
   bottleneck's location can be carried and made useful to applications.
   It also provides the framework to incorporate future signals.
   Finally, this document addresses incremental deployment, backward
   compatibility and nuances of CSIG's applicability in a range of
   scenarios.

              </t>
            </abstract>
          </front>
          <seriesInfo name="Internet-Draft" value="draft-ravi-ippm-csig-01"/>
        </reference>
        <reference anchor="DCQCN" target="https://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p523.pdf">
          <front>
            <title>Congestion Control for Large-Scale RDMA Deployments</title>
            <author>
              <organization/>
            </author>
            <date year="2015" month="August"/>
          </front>
        </reference>
        <reference anchor="Timely" target="https://conferences.sigcomm.org/sigcomm/2015/pdf/papers/p537.pdf">
          <front>
            <title>TIMELY: RTT-based Congestion Control for the Datacenter</title>
            <author>
              <organization/>
            </author>
            <date year="2015" month="August"/>
          </front>
        </reference>
        <reference anchor="PLB" target="https://dl.acm.org/doi/pdf/10.1145/3544216.3544226">
          <front>
            <title>PLB: Congestion Signals are Simple and Effective for Network Load Balancing</title>
            <author>
              <organization/>
            </author>
            <date year="2022" month="August"/>
          </front>
        </reference>
      </references>
    </references>
    <?line 285?>



  </back>
  <!-- ##markdown-source:
H4sIAAAAAAAAA7Vc/3IbN5L+n0+Bs/+4TUJSpqxkE9Xe7smyHKtKsrW2Uqnc
5eoKnAHJiecHdzAjhbG9z3LPck92X3cDMwBnKDt7dy5XmcRggEaj++uvG6Bn
s9nENrpM/1PnVWlOVVO3ZpLoxqyreneqbJNObLssMmuzqrzdbdHl8uL2xSTb
1tzZNsdPnnz35HiS63J9qkw5mTRZk6PbeVXVaVZiqBSfy7WxDYZQ17rUa1OY
spno5bI2d+h5fj1Jq6TUBV5La71qZvmundXN+n49S/phZkkxe7KYpPh8qo6f
HJ/MnpzMFt9NqqWtctMYezppt/QUHx6rrtvx7An9VbMZt6nMqlWW5xArK5Vu
m6rQTZboPN+p5U79WuTH9SpR2UqVVaPW2R2tSddGn6o3Vdtk5XpyX9Xv1nXV
btF0+/2P308m7+5PJ0rNlE71tsErqnZdqTHpV4+PTV3lGLBtNlWNl2bokpX2
VP00V1e7Ft9EDz+15RYD/OEqy3dfuCdVvdZl9pumoU7Vy1bfmwzNptBZfqry
u528868bfjJPqgJPH0MW2g+TZk1Vc4NtamMamgM9G51xY5I12PBXuvxF5EZL
lZIGF0/wxzW0EB+9zjfYkVj2f9tofs1Lj69d28Ny/0a9dvTCN5Hk3ejX0EzW
a+balOvfNq1r+4ROsraQ7uHQk7KqadPvDO3a4zcvzr/97ttvTieTrFx1T5Si
Z5ez5/PMNKuZ3dbQy2y5Sk99e5Pp0rfb+u6bWWq2ebUj24bZljZLTS1y4Q16
QWy71nfZLNtui1liszU9e37+1/NX9EEp5zyPAo85F5tREE1d6XptZm9hrUa9
eX59pp53U9pHPIA3+8XXsyffypD0DnZ70zRbe3p0BNFWpjZlYuwcAkAhxRxa
PHKfj+jdo226Otrqrant0fbr46dzfFeskdusMPkuFvb28vri6idyhtvZUtvY
4UPxm41Rz2FxCeQ19f+fwE//SAJjsJurZ7Go1BBK9zZblzq3Cg6Oz8UWegUa
qovVyiTsyCT2K9OQy6urSqfqmQbUJdhzEd85suIvkMp9CtFnfFVpPteJLCSt
MpZ/8WS+WJx8ffT065OT48U3c/73+BtS/GQG+NJLeK5Omsnk7FKt9LLOEkIz
a2BrLGxTqSXEv8/SZjMPl1l0qDuFTyV5C0hdjwATLz6nZS79Mqc0hcYIQMvC
YLEpTbNqCS+BcXn2m1GlU1BtbNXWiZmrl9W9uTP1VCVtja1rwrl6YTBgAr/P
bCE7QIgbAP5U3W+yZKNyo3nSZgNYXW+2baNSkwCSLeSbwyY3EBERpOUxt3V1
B9cjoW2ywTz0aj+qSrMVm9OnZZqrS5rJJnW2xHhkvfgGi1Hk9EkGa5mqpdnA
oavaqmrVKcLeZw3mtqzPTWUbS8GGBhCRpvyAggt1UeZXzaYHQe2mukcYTWdN
NcM/tJrEpG1t5mIDRZamuZkgwF3SjqVtwvK/f5zR14+TyfXVEaxjoy0kMyRp
tca2kKZUrbdZim2rsDMsTK4tq1Kn2LLzjW6+3zZuu7McmgEloFWtTclIBgs7
u2SlQAcZ4iwWlYIC1Lt/tpC52bQ207agVyBCTgavCgSRHJo8UyRCTiMonSQm
pxHhW/ir/SNrapIMutuoos0bUvB+f0jHVqK3eikag0s43YYzTlXa8uNcJ+9I
osIUoDRi4K4J6AFbMmoLY63n6m2lMl4+cMCvFwOYgiAWZoOFZsuW+Izd2cYU
IuhW1+AOJpedEkUHUlW21wH7LT+gTpioc9beh+fqVr+jx0AQjTFpbJvDGlny
69cXzifIXQj9qpL8kDB3O5QDEoQzwiacpU1ZX7X5W5vVJg3kgFDfP4PD3Zkc
ksAi1JLiG5nEMgNibGgOeA0tsQMhmETp3REht4T9bw3xKzZuaLOqwTEbpbeQ
TSebqVrVVaFOnjz5frm1NNS38nFK5oC5S7WYf3NLz7CCVduQ+asfIQ3sDLsI
8clLaG88TosSeFX9YgDsAEQDNTEawI2Z6EEu2xqxY9LIFkOYEVRjk6LV9COi
D+TPIKLaZOtNgEjzLkoEqOKQ8xesCiuHvRZu91LQ1RqgQXhE08DemHmUhJ23
8KPpAXCiEfN7vbOkzDwTEksjeOQhCbFFdxkN3Y8xVz/YljjucGAZl4MC1PRZ
MWGungGDmyrV8XiH31AsHODCbIFqmJEkIf8BT0b3XQ/QLCxFvBjV4X5Ja62D
4bYMwkQYRaCNByPOfMJGTRHF7oUUtgobuBc8G/bSAlVgFBtdFzAP51Le7s98
iCGa0C+B3PezowztKaxjW1m/m8G6fQhM4EhLE9k7KRlWSR+FeMrrvXQUJG5N
XWRllVfr3WTypbq+An8mCwawXxldEyyg+ezyVJ1hsassyeAfiCwmz7M1US48
vQA5VRe/wtyQI4Ss4lXFr/CeYccwzhuM41Mgny196fgtMz8l1E+hQb1/zw8+
fkSX81c3pweHvgFem4ZnYP52U1eNo2cxJcOQ6PDxI/U8Pz8doaG8nGvMdfE3
uANoOgIgh5rZVsO/fdpGqnsj6MjcGsS7XLfYOzJLo96ZHRk0TOjR9Q9vbx9N
5V/16jV/fnPx1x8u31w8p89vX55dXXUfJq7H25evf7h63n/q3zx/fX198eq5
vIxWFTVNHl2f/fRI2MOj1ze3l69fnV09GlgNmy+RQWS7pO5tbchPtJ14NsOm
8uz85r//a3ECrf0TcqDjxeI7qE6+fLv44wm+3G9MKbNxnJGvcMDdBPAD++EU
Os8pHGeNpqiLGEMUplQbw5zly38nzfzHqfrTMtkuTv7sGmjBUaPXWdTIOhu2
DF4WJY40jUzTaTNq39N0LO/ZT9F3r/eg8U9/ycmlZotv//LnCVvPxa/gC3sk
O3D/94+XMGoqIpQpSNs+Vf+HMPnmxTni1TuAUY7o6Hs6wuTxcwgxPhAifhLA
adtTwzXHpwrwREEZ1FocMa8s2MyPYMOVo+wtidvQu5phU9jVGP4Ssn+JEAfr
YCstBPwHC+zxcapsS4hsBUd62Ji6ZBQt8uHjRyByTjGFQ7Mh1L7zhF8yA+HS
Uyw2BmmdryvQgw1NxwG5rQmQSS+cPYhQTOARv4RMQzcIJdzmsp5noEDJIMp2
O0FxD6iGHS2HXGG567OHRjdEAXgz/9YaEFkiXcAnqFVSC0WcahomCggEekcp
OFqbZM7LgHOmv4CgHxacgnF9mDS8FJoFIEzeQdGrXDcNRk3JwZlB8gttLUxZ
31VZGhqJS394jBo8zOcd/BqFTHBbG8ZgkFGX0JBt4CFpDBFJ9uCgjcxhUi+I
UmqYI0wLxkKkbrpPQsYWSYoXLVEPCHWva9YTBQQrZYtar1YU7il0kElQXgQS
Rtux74TwJeRelPoZxFlxRVoOUeKEQhqPS5N+PWtaSnAwCOmEes59KmvEadka
7hGLWQxEdmqgGWFqyNqI8bpcT6uVuVdlWyyxeoxIo1lB7oJiXDCJpNqcuxCl
t6C6bmUgGQ4qXKbD77jVMy+n5DHim5w16XSrw6onMxowWgMiBlHPBo9hiFRj
Ft+i8p+jdayaVZgSsv6Q/1WyJ+gfO8lcvaCUwWc18OE1GBh7IL+6IFizMvCC
h+53mFIKGHOcszM1T5pAHsrTKIeg+TdG3+1mtANUnQDiWSSm1NsbHljTpnEd
+PWpDHI87QpgqyqHZGJh5CldacBbgJtXFkjgmqcSyUVrKRlPXlEiI7mWQwvs
9RGmIN+kAq8uwseMCeqsd8df2hRZXLaK1hkvryGm3E9PGiYIF04WEPl5z/n8
DhNdFdX0e8r+IEumsVvrQHi/eMScQqzWkOFlhM/wNkk6gDFsCS2VA61GRi/K
IntOnO35RI+KM+Ilbm7XkfjvGpFnRyGW8lYxdMnqOa3GUn/Nmp2D4NrMYH+m
Zps58zsdQ0uHRWz8AoEWKfmOPISXtWE/rPftzblV73S2swuouRNcDKPrxVvg
YcIXm+EoDVhtaD5UBSwQkU0qHr9/QDF1JQ9K3g07x95GQ4dt4zrRXKQoQRSM
UwA7Utmby2HiQqYOzsHwtzetmGJfDwsSICtpO6e3Cdd4SKhYoyBMkuQdiLQE
e/tTPpiBxtknp4gwR+QF7JylWQuAQTVQMS3IVYCivD1CIqLKkWiaizRw3awQ
qzGOwVU9BFUJYql18YLNA+qHuTnmM7osAnNNyRBBAwH5KxJZqkceXyyMvX5H
Pk1iMJBC9ci5XJJAYVYRIeUKJHMCX4ZwDMFy4OjifsQlXGxxi8WyBMiujS6R
vPpikzP3UbjvDH5spClFIDCHTu4SkQ6OK4jsXyX6MURf8RrZSuZyG0d84in4
TFDigkl9hBsMRqQJc6A7LZuHhnvI1pTGIBu0FUxZwS/MPuMZI1+0DhHy0jEf
Yg3dyUNQx8K+W6oXtTbfeSOU4qWAscfJZLdX8uvhqjPMLNKxq1IF1lnOEqL+
LuqPuhcSPBcUTEnMjQJAmlEXgM/QQJlcSRpNCMKZ5EpnzOoKvLaOCZkgDqEz
A0HmytyGK38Kg86qleCx3/y5lIuAFKQYv80Pnzx04JlKntZmdiOK4YAR5S8i
EYB3veYgICDIgF9kduDqhIJMIsgfwxoqFtIC0BgsgxMEr/IILZiJQcMFCM26
j/rxyQwZBhsOU1dvmdRzZNcE2zvsCXr8ARIMM5Iv/CFLkK/0hiY1usfq+d6B
iGiur80dSn+7Fz7SOOeffIOVSlNJpTY4z/lM/J8PRc18rYIPSxi0JD0940Sm
LUvjSUbkv32WQv2js7jRiv7LuFQsVeeGSkvizlIeB2LymYfjYoQostZCU07f
Igtwlul3Izjre+so5ThCMwxTs0u6oS2Mpu/ghBxhB6kOAiM6wPw0orttolk9
T5vKgMuORUm1oyja0hfuGAkFkZIcIEAUJTTZ8HAiCqW8Cy+kGk8HKJ1iZCXd
AFl5p/Ms3ad2eP9H7miNrqWJs819RJc173uVx2UbAB3xbTJOnwmzhVG0Risx
4NsAz/1rzoOhL0CHP2IQ4i7DE6rAELBT1u0ZSSuZdLw17VYoPUHH6Ip7STkJ
J+Q50LOrKkVtEeiTP5Rkdw3tmEsYXPwOM0LJb8AN2VpWlHiptWmC4OpDKQ2+
qbZz5RzsjaFt9Qh9ft7XX1izZ2+CBnqlrxfpHHpIsQHqTteZaTgJf3AAS7tD
tRxOShpn2K3tz07GESfkhI6myj47ympK2iDLk8uUS+ODBMOYRygimn2FIaZO
dbALvFFCT+krMfyuPkX+BZYfx6W5gFXPlLFNpBXwBHzcUtWfik3Q3zUx9g6o
uNanzq9evyUBXpzd3r65uAjecPEu0Lmt2Lpt06aZFCQ54fCx0OFSmtWU8YUD
+dJdU9XEJtNar6tyRRrpPn5FGVLBAxhfNLVVzjmOOGiv4XBPpsrM13Qg4Kv+
U2gpJ/GgOa5n9EcnzDnCXe+1w2mRodMAxacB4nCc95GGhNbLOe7eW5Q47L/o
MsZ+cU4F/YJczVSKJ7dsfQM7C9052t3eIHpZ5hyHPx1B3dDvH8uHj0opz5vo
PIMPw8xnxG4pibpi5o42p4ONkkCvnPVNfYzgciwfS8CUafBuKflOciyJLvtk
SlKieTgNEzzaD8/9/T6TGQ7J1Fy9imSS9wWXwxJMV9GaqEnQm1j8uuRENyjR
SjZMxdzUa872ab7DiXZpYR994bxLcKwufGmLNKIBGTz6yDCZDcpl0UmvB+TK
h07SUV+GqMo9LWUlIJnyzRU8Scx6ZLP2h19lNalJZrD85mt/FEQeKRParPCc
FWEgHcmdViHUTaMw0rR1aV0mumdN+OqZSN2Wh+vUn8Mg3W0HK0diWgjcEqs1
gyKgu7gTbIWVLe1rd9Yk4s/m123uL6CA9lNe43fZxQy4Z+CfzIv0OhTZN/Fe
I9FrGKTI3cMsHvYRCCQbwBVIog+VAHafEZG0NCZRYojHYcoZUlV7OO0ausAc
9joNLK8rFHIZhpKzw+7GY30ZjR8OxYSgcgwRqvLchl7YVlnJS6OoM1ZUXHZ3
JSS5Mv1b7rHet2ApJ4deFUrWvVaqy+dT/ucGU6d0Z4peley5uxgKkQVtkDAS
ifLkiE5gNJwB2YJcjmkC0UQlHmx5VzCPUx+ReAWSAFNpKNHor6zput7t1XLo
xgEFYncR7/zt5feIfn85dLn040feKncjritSbytrs2WWZ82OqTbMMzpv98tw
EjsW8FQ1u62Ra25B78506Ha4WpyqsYSsu2mErhdSiTh1L/SzyVXGRi4HgaZ2
diWRsYO7YC+nIsrOE5HQQWgToOo/WNNEZv1F6Cic95Vrf2nLVeHDiGLHbJzW
K/Ifn46vNzd8EVZFK+bzhnBlXcEAxKXwlx6Dapk4d8MThdaVVGDOdlsJU2SA
YlWsxnXB6NqWpI4IBvydxIOBKp5n5VLkYL+fnkb+G+xztPDLVbgqvv1UZAL6
PnjhbT/SVMmSn3ZL9qbh9mgAGQIFQyRxzl2VZob0A66DuLSEXjqIDiBC3pDL
INRZU11bXLYc844ukXKKvryZUomi8YhO36MXeU3k7Wxi4Fzd0cDljScl2bYH
n0hEsUjb5YqxCz6P5j08WCifjMjhFRE8lQLB6FLnMTzwOtwpqBtXL6u7DiHA
LdwCO0EC+AxOIl3Ml9nTPlnwJ6PBa67c9Fg9c9dtR2/buk6P1aVnp4GXsdjU
4cfD9brpfvCQYzIEJAkDcj10POZNJperIXPtTtrdaQ9fxnWn99qfgcizrL95
wQbYI164bQJ7WePFJkciErpX2XYLG4CN4zFczveHAUNEIrzvrivQxSHfx0GA
jwsUXfiOhQQWCUtxfrC3A90Rt7/X5+uwXOUiFHQ1beavupm7HWUL5GwjGLDb
zZESaqweR67j3WU1usvNxsHsYtQDbH+CGahFTiekOj/ml/3bcdSy0XG8O4aO
V/aZIeHgbPsBYN77htNkB5aCezES94od2FnYTwoFey7jbxHwbWoud7l6WUS7
PGPuuQ3XYq0vfTE7Yduv4tCgfWwYB+U+UOwtMLpEMMJ9STU3bmUjEZ26vIoX
2iVFY3d9SqkoQxvWWFfQlHtW+341TqBV15+U+al3ZCPGGLMrQsrxjjsNl8oO
Dx1UGrvyZVgH7qtTJnPH3rKh6NMXJYe3db1Xs0O7UBOlp+6KiCPywy0Y/JLu
kPIHix6o3vOVPn/xWo3LnvMBOQqtU27IpQLHzh0oLYpjQjeU28Lbh221NomB
7HRJJjRPOZMMgqa/5B0M09epxehzucXub5kc5nQC4YhWwNja3f6SWmZIoHyI
6RgdexT/gHLfj6MslQIXwJaLvYwgOtIichr50UszgsdUMKSdkmTTn1XQ74jo
rou78TIo4fcWzJek3viis8f4ADvdz0M/iUQUvxqodNVzkowUeE9Johz8xpCc
d1eG+DcO25orFH2tv8sR92Y6GA1TsyL76qvafNWsG05mFLDg3yhtpQpQdl5A
vxRrzHr/vpYTeS74suqhoJK7XeaQngdqHlClrs7qLv4wI9kHFEz8KeUzI/Ww
xRvGALEnb1kdknQg2JrsMswNohADY6UTBT6OeliyOQmw18cXNfaCDekiDJMD
iSLOcQAfemKRNa62ngd12jAzi6eHul6Tou4zawZzuxOK0ajVwXNv95Q9D++U
zAdcPKI3PX8QgGNYO8yuplHebmm7IiT2mdAQ0jzajeSp4wJ0lCl7QIK23K8Z
/F+KML7bsQRBvbWDs9819VD/Y/c4/I9fHTDF4Y/3/oBY6FmVUTR5rM763zHS
Uc7ITxjV+8eux8eIT7mzjF1/IA9rU+/f12ZFY6yokMURVcrhdNv1BGnSji6s
6YZ+zN4NMQ+ulpfujoyctiXvZKFFwGm4FSOf7/XjlCa06Kly8y3iCrE0HneN
yOi7i4gylKu/up5Pg7dOgrdYHX//+9/lN9b0520n5Bu6l7UYaStc169m8Z+v
xtpc3w9XJ7PFbDGfz/mD+eDaCt9WoM33pb8/06cj9UG+D9vCvvTgSMV9w7ao
r/r5SO33Ddrivkc/+3ZIOWiL+h7xlPG4QVvYlybDg6hv2Nbp7KnXGX1IP7i2
wrcVaOv3IlT9z9J2FGxEvxdu2u6PU2v/58OBjk6nhzrO+Y/rKf1808ERRZuf
MfWRk/NTHZ0ehx33NOTW8XOkocBcj73q8c8s+eAbS2mkD8lD9vq7DPZ3Wezv
Mtlh4wM2+7uM9gGrXYSqW37wjaVvLNHYj/yV36ivgpHjxq7zSzcyf9AffGPp
G0s0PgBOB9Gpx+Ee9PYbSwbKyftT9biPD+4/ePiXRz6k3Lp48Mgfc3+pXlIE
Y8n5V6IR5KtFf4wWJyw3i/iIgFOzFwv6yAseGcvV2O503hp3O2VAK32iIad+
xHaJcrhuUsHv75FzfkkHW1U5i+pw7vzrodG7jKIr89APrnoyH55R0e8rF2GK
5iijWNKY1ub+Gf0GpfvJ/oF7VnR2fMP5AB3wuBchkPj4IIu9WcwHi4N4TmUu
M+D3xdBlow8tQXDEz8U3Wv4BmY87mZ/+L2U+/gyZnzqZZS7OrUhmztIG4oo1
cKdtL/DTTuCT3yvwO2O2/lj+s8Q9ceJ2M/X3wORmSJjBxumkI5aShM3Vs8M5
2c1i6mcQl9XlKK12Ou6LaWcruszW82EMHJdT/O52BG+7t++zPwfq7FtAAzom
Pno13f0cC6ixL41W7oK8wHKsM/f/V4jD92V7Ap+RKmKACm6woFC8GOjmZb+f
IhCrvMtT6JiaX4xV9NLp/fMTNDyBvDGw9YevfMxAyLaHau3ghk9QE+jKhmwq
6u3eTxEwHfd2//mDu1ifxvXfsTvDvtRC2QzSmbcmaWsqEJyH/5WTnUxunz2f
y298L89enR14TP9VC/2gYsJ//gcQuQY+Z00AAA==

-->

</rfc>
