<?xml version='1.0' encoding='utf-8'?>
<?xml-model href="rfc7991bis.rnc"?>
<?rfc toc='yes'?>
<?rfc compact='yes'?>
<?rfc subcompact='no'?>

<rfc xmlns:xi="http://www.w3.org/2001/XInclude"
     xml:lang="en"
     ipr="trust200902"
     submissionType="IETF"
     consensus="true"
     category="info"
     docName="draft-stewart-tsvwg-sctpecn-07"
     version="3">

<front>
<title abbrev='ECN for SCTP'>
Explicit Congestion Notification for the Stream Control Transmission Protocol
</title>
<seriesInfo name="Internet-Draft"
            value="draft-stewart-tsvwg-sctpecn-07"/>

<author initials="R." surname="Stewart" fullname="Randall R. Stewart">
<organization>Netflix, Inc.</organization>
<address>
    <postal>
        <street>15214 Pendio Drive</street>
        <city>Bella Collina</city>
        <region>FL</region>
        <code>34756</code>
        <country>United States of America</country>
    </postal>
    <email>randall@lakerest.net</email>
</address>
</author>

<author initials="M." surname="Tüxen" fullname="Michael Tüxen">
  <organization abbrev='Münster Univ. of Appl. Sciences'>
                Münster University of Applied Sciences</organization>
  <address>
    <postal>
        <street>Stegerwaldstrasse 39</street>
        <city>48565 Steinfurt</city>
        <country>Germany</country>
    </postal>
    <email>tuexen@fh-muenster.de</email>
  </address>
</author>

<date />

<abstract>
<t>This document describes the addition of the Explicit Congestion
Notification (ECN) to the Stream Control Transmission Protocol (SCTP).</t>
</abstract>
</front>

<middle>

<section>
<name>Introduction</name>
<t>At the time SCTP was initially defined in <xref target='RFC2960'/>,
ECN as specified in <xref target='RFC2481'/> was still an experimental document.
This left the authors of SCTP in a position where they could not directly
refer to ECN without creating a normative reference in a standards track
document to an experimental RFC.
To work around this problem, the authors of SCTP decided to add two reserved
chunk types for ECN (CWR and ECNE) but did not fully specify how they were to
be used except in a vague way within an appendix of the document.
This worked around the document reference problem, but left ECN and its
implementation for SCTP unspecified.
This document is intended to fill in the details of ECN processing in SCTP
in a standards track document.</t>
<t>This document assumes that the reader is familiar with ECN
<xref target='RFC3168'/>.
Readers unfamiliar with ECN are strongly encouraged to first read
<xref target='RFC3168'/> since this document will not repeat any of the details
on how the various IP level bits are set.
This document uses the same terminology as <xref target='RFC3168'/>.
For example, the term ECT is used to indicate that the IP level packet is marked
to show that the transport (SCTP) supports ECN.</t>
</section>

<section anchor="conventions">
<name>Conventions</name>
<t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>",
"<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL NOT</bcp14>",
"<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>",
"<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
"<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are to
be interpreted as described in
BCP 14 <xref target='RFC2119'/> <xref target='RFC8174'/> when, and only when,
they appear in all capitals, as shown here.</t>
</section>

<section>
<name>Terminology</name>
<t>All integer fields defined in this document included in an SCTP
packet <bcp14>MUST</bcp14> be transmitted in network byte order, unless
otherwise stated.</t>
<dl newline="false">
<dt>ECT:</dt>
<dd>The term used to indicate that the IP level packet is marked indicating the
transport is willing to support ECN for this packet.</dd>
<dt>not-ECT:</dt>
<dd>The term used to indicate that the IP level packet is marked indicating the
transport is not willing to support ECN for this packet.</dd>
<dt>CE:</dt>
<dd>The term used to indicate that the IP level packet is marked indicating that
a router in the network has marked the packet as having experienced
congestion.</dd>
</dl>
</section>

<section>
<name>Chunk and Parameter Formats</name>
<section>
<name>ECN Support Parameter (32768)</name>
<figure>
<name>ECN Support Chunk Parameter</name>
<artwork align="center">
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     Parameter Type = 32768    |      Parameter Length = 4     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork>
</figure>
<dl newline="true">
<dt>Type: 16 bits (unsigned integer)</dt>
<dd>
<t>This field holds the IANA defined parameter type for the
"ECN Support" chunk parameter.
IANA is requested to assign the value 32768 (0x8000) (suggested) for this
parameter type.</t>
</dd>
<dt>Length: 16 bits (unsigned integer)</dt>
<dd>
<t>This field holds the length in bytes of the chunk parameter;
the value <bcp14>MUST</bcp14> be 4.</t>
</dd>
</dl>
<t>The ECN Support Chunk Parameter <bcp14>MAY</bcp14> appear in INIT and
INIT ACK chunks and <bcp14>MUST NOT</bcp14> appear in any other
chunk.</t>
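<t>As a non-normative illustration, the parameter layout above could be
encoded and checked as in the following sketch.
The helper names are hypothetical, and the type value 32768 is only the
value suggested in this document, not an IANA assignment.</t>
<sourcecode type="c"><![CDATA[
#include <stddef.h>
#include <stdint.h>

/* Suggested value from this document; not yet assigned by IANA. */
#define ECN_SUPPORT_PARAM_TYPE 0x8000u
#define ECN_SUPPORT_PARAM_LEN  4u

/* Hypothetical helper: writes the parameter in network byte order.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t ecn_support_param_write(uint8_t *buf, size_t buflen)
{
    if (buf == NULL || buflen < ECN_SUPPORT_PARAM_LEN)
        return 0;
    buf[0] = (uint8_t)(ECN_SUPPORT_PARAM_TYPE >> 8);
    buf[1] = (uint8_t)(ECN_SUPPORT_PARAM_TYPE & 0xffu);
    buf[2] = (uint8_t)(ECN_SUPPORT_PARAM_LEN >> 8);
    buf[3] = (uint8_t)(ECN_SUPPORT_PARAM_LEN & 0xffu);
    return ECN_SUPPORT_PARAM_LEN;
}

/* Hypothetical helper: returns 1 if buf starts with a well-formed
 * ECN Support parameter (type 32768, length exactly 4), else 0. */
int ecn_support_param_is_valid(const uint8_t *buf, size_t buflen)
{
    unsigned type, len;

    if (buf == NULL || buflen < ECN_SUPPORT_PARAM_LEN)
        return 0;
    type = ((unsigned)buf[0] << 8) | buf[1];
    len = ((unsigned)buf[2] << 8) | buf[3];
    return type == ECN_SUPPORT_PARAM_TYPE &&
           len == ECN_SUPPORT_PARAM_LEN;
}
]]></sourcecode>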
</section>
<section>
<name>ECN Echo (12)</name>
<figure>
<name>ECN Echo Chunk</name>
<artwork align="center">
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Chunk Type=12 | Flags=00000000|       Chunk Length = 12       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Lowest TSN Number                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|            Number CE Marked Packets Seen since CWR            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork>
</figure>
<dl newline="true">
<dt>Flags: 8 bits</dt>
<dd>
<t>Set to all zeros on transmit and ignored on receipt.</t>
</dd>
<dt>Length: 16 bits (unsigned integer)</dt>
<dd>
<t>This field holds the length in bytes of the chunk;
the value <bcp14>MUST</bcp14> be 12.</t>
</dd>
<dt>Lowest TSN Number: 32 bits (unsigned integer)</dt>
<dd>
<t>This parameter contains the lowest TSN number contained in the last packet
received that was marked by the network with a CE indication.</t>
</dd>
<dt>Number CE Marked Packets: 32 bits (unsigned integer)</dt>
<dd>
<t>This parameter contains the total number of CE marked packets that have been
seen since the first CE mark received while waiting for a CWR chunk.
Note that the CE counter will overflow from 0xffffffff to 0 if a CWR chunk is
not received.</t>
</dd>
</dl>
<t>Note that the appendix of <xref target='RFC4960'/> did not have the field
Number CE Marked Packets.
Implementations <bcp14>SHOULD</bcp14> accept an 8-byte form of this chunk that
does not include this field.
In such a case the implementation <bcp14>SHOULD</bcp14> treat the missing field
as indicating one CE marked packet for any purpose for which the implementation
is using this field.</t>
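<t>The following non-normative sketch shows one way a receiver of an
ECN Echo chunk might parse both the 12-byte form defined here and the
older 8-byte form.
The structure and function names are hypothetical.</t>
<sourcecode type="c"><![CDATA[
#include <stddef.h>
#include <stdint.h>

#define ECNE_CHUNK_TYPE  12u
#define ECNE_LEN_CURRENT 12u  /* form defined in this document */
#define ECNE_LEN_LEGACY   8u  /* older form without the counter */

/* Hypothetical parsed representation of an ECN Echo chunk. */
struct ecn_echo {
    uint32_t lowest_tsn;
    uint32_t ce_count;
};

/* Reads a 32-bit integer stored in network byte order. */
static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Returns 0 on success and -1 if the chunk is malformed. */
int ecn_echo_parse(const uint8_t *buf, size_t buflen,
                   struct ecn_echo *out)
{
    unsigned len;

    if (buf == NULL || out == NULL || buflen < ECNE_LEN_LEGACY)
        return -1;
    if (buf[0] != ECNE_CHUNK_TYPE)
        return -1;
    /* buf[1] holds the flags, which are ignored on receipt. */
    len = ((unsigned)buf[2] << 8) | buf[3];
    if (len != ECNE_LEN_CURRENT && len != ECNE_LEN_LEGACY)
        return -1;
    if (buflen < len)
        return -1;
    out->lowest_tsn = rd32(buf + 4);
    /* A missing counter is treated as one CE marked packet. */
    out->ce_count = (len == ECNE_LEN_CURRENT) ? rd32(buf + 8) : 1u;
    return 0;
}
]]></sourcecode>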
</section>
<section>
<name>CWR Chunk (13)</name>
<figure>
<name>CWR Chunk</name>
<artwork align="center">
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Chunk Type=13 | Flags=0000000R|           Length = 8          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           TSN Number                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
</artwork>
</figure>
<dl newline="true">
<dt>Flags: 8 bits</dt>
<dd>
<t>The R Bit indicates if the CWR is a retransmission of an earlier CWR
that may have been lost.
If this bit is set, then the TSN number included is the latest TSN that a CWR
has been responded to.
If the R bit is clear, then the TSN indicated is the latest TSN for that
destination.</t>
</dd>
<dt>Length: 16 bits (unsigned integer)</dt>
<dd>
<t>This field holds the length in bytes of the chunk;
the value <bcp14>MUST</bcp14> be 8.</t>
</dd>
<dt>TSN Number: 32 bits (unsigned integer)</dt>
<dd>
<t>This parameter contains the TSN number up to which the sender has reduced its
congestion window.</t>
</dd>
</dl>
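<t>A non-normative sketch of building the CWR chunk follows.
The function name is hypothetical, and the R bit is assumed to be the
lowest-order flag bit, as drawn in the figure above.</t>
<sourcecode type="c"><![CDATA[
#include <stddef.h>
#include <stdint.h>

#define CWR_CHUNK_TYPE 13u
#define CWR_CHUNK_LEN   8u
#define CWR_FLAG_R      0x01u  /* assumed bit position of R */

/* Hypothetical helper: writes a CWR chunk in network byte order.
 * Returns the number of bytes written, or 0 if buf is too small. */
size_t cwr_chunk_write(uint8_t *buf, size_t buflen, uint32_t tsn,
                       int is_retransmission)
{
    if (buf == NULL || buflen < CWR_CHUNK_LEN)
        return 0;
    buf[0] = (uint8_t)CWR_CHUNK_TYPE;
    buf[1] = is_retransmission ? (uint8_t)CWR_FLAG_R : 0u;
    buf[2] = (uint8_t)(CWR_CHUNK_LEN >> 8);
    buf[3] = (uint8_t)(CWR_CHUNK_LEN & 0xffu);
    buf[4] = (uint8_t)(tsn >> 24);
    buf[5] = (uint8_t)(tsn >> 16);
    buf[6] = (uint8_t)(tsn >> 8);
    buf[7] = (uint8_t)tsn;
    return CWR_CHUNK_LEN;
}
]]></sourcecode>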
</section>
</section>

<section>
<name>Procedures</name>
<section>
<name>SCTP Initialization</name>
<t>In the SCTP association setup phase, the source and destination SCTP
endpoints exchange information about their willingness to use ECN.
After the completion of this negotiation, an SCTP sender sets an ECT
codepoint in the IP header of data packets to indicate to the network
that the transport is capable and willing to participate in ECN for
this packet.
This indicates to the routers that they may mark this packet with the
CE codepoint.</t>
<t>If the SCTP association does not wish to use ECN notification for a
particular packet, the sending SCTP sets the ECN codepoint to not-ECT,
and the SCTP receiver ignores the CE codepoint in the received packet.</t>
<t>For this discussion we will call the endpoint initiating the SCTP
association as EP-A and the listening SCTP endpoint as EP-Z.</t>
<t>Before an SCTP association can use ECN, EP-A sends an INIT chunk
which includes the ECN Support parameter.
By including the ECN Support parameter the sending endpoint (EP-A) will
participate in ECN as both a sender and a receiver.
Specifically, as a receiver, it will respond to incoming data packets that have
the CE codepoint set in the IP header by sending an ECN Echo chunk bundled with
the next outgoing SACK Chunk.
As a sender, it will respond to incoming packets that include an ECN Echo chunk
by reducing the congestion window and sending a CWR chunk when appropriate.</t>
<t>Including an ECN Support parameter in an INIT or INIT-ACK does not commit
the SCTP sender to setting the ECT codepoint in any or all of the packets it
may transmit.
However, the commitment to respond appropriately to incoming packets with the
CE codepoint set remains.</t>
<t>When EP-Z sends an INIT-ACK chunk, it also includes an ECN Support parameter.
Including the ECN Support parameter indicates that the SCTP transmitting the
INIT-ACK chunk is ECN-Capable.</t>
<t>The following rules apply to the use of ECN for an SCTP association.</t>
<ul>
<li><t>If the SCTP endpoint supports ECN, a sender of either an INIT or
INIT-ACK chunk <bcp14>MUST</bcp14> always include the
ECN Support parameter.</t></li>
<li><t>After the exchange of the INIT and INIT-ACK, if the endpoints have
not both indicated support of ECN by including an ECN Support parameter, then
ECT <bcp14>MUST NOT</bcp14> be set on any IP packets sent by any endpoint which
is ECN capable.
Furthermore, upon receiving IP packets with a CE codepoint set, the ECN capable
endpoint <bcp14>SHOULD</bcp14> ignore the CE codepoint.</t></li>

<li><t>If both endpoints have included an ECN Support parameter in the INIT
and INIT-ACK exchange, then both endpoints <bcp14>MUST</bcp14> follow the
ECN procedures defined in the rest of this document.</t></li>

<li><t>A sending endpoint <bcp14>SHOULD</bcp14> set an ECT codepoint on
IP packets that carry DATA chunks.
This includes IP packets that have other control chunks bundled with the
DATA chunks.</t></li>
</ul>
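<t>The rules above can be summarized by the following non-normative
sketch.
The structure and function names are hypothetical; the sketch only
records whether each side included the ECN Support parameter and
derives the resulting behavior from that.</t>
<sourcecode type="c"><![CDATA[
#include <stdbool.h>

/* Hypothetical per-association negotiation state. */
struct ecn_assoc {
    bool local_sent_param;  /* we included ECN Support */
    bool peer_sent_param;   /* the peer included ECN Support */
};

/* ECN is in use only if both INIT and INIT-ACK carried the
 * ECN Support parameter. */
bool ecn_negotiated(const struct ecn_assoc *a)
{
    return a->local_sent_param && a->peer_sent_param;
}

/* ECT is set only when ECN was negotiated and the packet carries
 * at least one DATA chunk. */
bool ecn_should_set_ect(const struct ecn_assoc *a, bool has_data)
{
    return ecn_negotiated(a) && has_data;
}

/* A received CE codepoint is acted upon only when ECN was
 * negotiated; otherwise it is ignored. */
bool ecn_should_honor_ce(const struct ecn_assoc *a)
{
    return ecn_negotiated(a);
}
]]></sourcecode>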
</section>
<section>
<name>The SCTP Sender</name>
<t>For an SCTP association using ECN, new data packets are transmitted with
an ECT codepoint set in the IP header.
When only one ECT codepoint is needed by a sender for all packets sent on
an SCTP association, ECT(0) <bcp14>SHOULD</bcp14> be used.
If the sender receives a packet containing an ECN Echo chunk, then the sender knows that
congestion was encountered in the network on the path from the sender to the
receiver.
The indication of congestion should be treated just as a congestion loss in
non-ECN-Capable SCTP.
That is, the SCTP source halves the congestion window "cwnd" for the
destination address that the sender transmitted the data to and reduces the
slow start threshold "ssthresh".
A packet containing an ECN Echo chunk should not trigger new data to be sent.
SCTP follows the normal procedures for increasing the congestion window when it
receives a packet with a SACK chunk without the ECN Echo chunk.</t>
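<t>As a non-normative illustration, the window reduction described
above can be sketched using the loss adjustment from
<xref target='RFC9260'/> Section 7.2.3, where ssthresh becomes
max(cwnd/2, 4*MTU) and cwnd is set to ssthresh.
The structure and function names are hypothetical.</t>
<sourcecode type="c"><![CDATA[
#include <stdint.h>

/* Hypothetical per-destination congestion control state. */
struct path_cc {
    uint32_t cwnd;                 /* bytes */
    uint32_t ssthresh;             /* bytes */
    uint32_t partial_bytes_acked;  /* bytes */
    uint32_t pmtu;                 /* path MTU in bytes */
};

/* Applies the same reduction as for a detected loss:
 *   ssthresh = max(cwnd / 2, 4 * MTU); cwnd = ssthresh */
void ecn_reduce_cwnd(struct path_cc *p)
{
    uint32_t half = p->cwnd / 2u;
    uint32_t lower_bound = 4u * p->pmtu;

    p->ssthresh = (half > lower_bound) ? half : lower_bound;
    p->cwnd = p->ssthresh;
    p->partial_bytes_acked = 0;
}
]]></sourcecode>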
<t>SCTP should not react to congestion indications more than once every
round-trip time.
That is, the SCTP sender's congestion window should be reduced only once in
response to a series of dropped and/or CE packets from a single window of data.
In addition, the SCTP source should not decrease the slow-start threshold,
ssthresh, if it has been decreased within the last round trip time.</t>
<t>One method to accomplish this is as follows:</t>
<ol>
<li><t>During association setup, create two new state variables, ECN_ECHO_TSN
and ECN_ECHO_LAST, for each destination.
The initial value of each of these variables is set to the initial TSN that will be
assigned minus 1.</t></li>
<li><t>When an ECN Echo chunk arrives, use the TSN in the ECN Echo to
establish which destination the packet was sent to.
We will call this destination the selected destination.
If the chunk cannot be found, note that an override is occurring.
From the selected destination (if found), select its ECN_ECHO_TSN.</t></li>
<li><t>Compare the ECN Echo TSN with the ECN_ECHO_TSN for the selected
destination.
If an override is not noted and the value of the ECN_ECHO_TSN is greater than
the ECN Echo TSN, proceed to step 4; otherwise, proceed to step 6.</t></li>

<li><t>Reduce the cwnd and ssthresh for the selected destination the same
as if a loss was detected during a fast retransmit.
For details, see <xref target='RFC9260'/> Section 7.2.3 and
Section 7.2.4.</t></li>
<li><t>Record in ECN_ECHO_TSN the last TSN that was sent, and record
in ECN_ECHO_LAST the TSN number from the ECN Echo Chunk.</t></li>
<li><t>If the implementation is tracking the number of marked packets, record
the value found in the 'Number CE Marked Packets Seen since
CWR' field and also add this number to the running loss count.
If such a count is not being maintained, then proceed to step 8.</t></li>
<li><t>If the implementation is tracking the number of marked packets, compare
the TSN number in the ECN Echo Chunk to the ECN_ECHO_LAST.
If it is greater than ECN_ECHO_LAST, update ECN_ECHO_LAST with
this value.
Take the difference between the stored 'Number CE Marked Packets' field and the
value from the newly arriving 'Number CE Marked Packets' and add this difference
to the total loss count.
Then update the stored 'Number CE Marked Packets' with the
ECN Echo Chunk TSN.</t></li>

<li><t>Create a CWR chunk with the value found in the ECN_ECHO_LAST for
the selected destination.
If an override was noted, set the 'O' bit within the CWR flags.
Queue this chunk for transmission to the peer destination.
Note if there is already such a chunk in queue to be sent, remove that chunk and
replace it with the new chunk.</t></li>
</ol>
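<t>Steps 6 and 7 leave some room for interpretation.
The following non-normative sketch shows one possible reading of the
optional loss accounting: the first ECN Echo of an episode adds its
whole counter to the running total, and each repeated ECN Echo adds
only the difference from the previously stored counter.
The names are hypothetical, the stored value is assumed to be the
counter rather than a TSN, and unsigned 32-bit subtraction is used so
that a counter wrapping from 0xffffffff to 0 still yields the correct
difference.</t>
<sourcecode type="c"><![CDATA[
#include <stdint.h>

/* Hypothetical per-destination accounting state. */
struct ecn_loss_state {
    uint32_t last_ce_count;  /* counter from the last ECN Echo */
    uint64_t total_marked;   /* running count of CE marked packets */
};

/* First ECN Echo of a new episode: add the whole counter. */
void ecn_account_first(struct ecn_loss_state *s, uint32_t reported)
{
    s->total_marked += reported;
    s->last_ce_count = reported;
}

/* Repeated ECN Echo in the same episode: add only the growth.
 * The subtraction is modulo 2^32, which tolerates wrap-around. */
void ecn_account_repeat(struct ecn_loss_state *s, uint32_t reported)
{
    uint32_t delta = (uint32_t)(reported - s->last_ce_count);

    s->total_marked += delta;
    s->last_ce_count = reported;
}
]]></sourcecode>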
<t>After the sending SCTP reduces its congestion window in response to
an ECN Echo, incoming SACKs that continue to arrive can "clock out"
outgoing packets as allowed by the reduced congestion window.
Note that continued arrival of ECN Echo chunks should still be processed as
described above, possibly reducing the cwnd, but always sending a
CWR to the receiving SCTP.
This assures that the ECN Echo and CWR are robust with regard to loss in either
direction and that the implementation, if it desires, can maintain an accurate
loss count per destination.</t>
<t>Note, originally in the appendix of <xref target='RFC4960'/> a definition was
supplied for the ECN Echo chunk.
This definition did not include the 'Number CE Marked Packets' field.
An implementation <bcp14>SHOULD</bcp14> accept such a chunk, delineating it from
the standards track version by the fact that the length field will be 8 bytes
instead of 12.
When processing this older style chunk, the 'Number CE Marked Packets'
<bcp14>SHOULD</bcp14> be treated as if it contains the number 1.
This may cause incorrect loss counts but will not cause any issues with
SCTP's ECN handling.</t>
</section>
<section>
<name>The SCTP Receiver</name>
<t>When an SCTP endpoint first receives a CE data packet at the destination
end-system, the SCTP data receiver creates an ECN Echo chunk and records the
lowest TSN number found in the data packet.
It also sets the 'Number CE Marked Packets' to 1 and queues this chunk for
transmission at the next opportunity.
If there is any ACK withholding implemented, as in current "delayed-SACK" SCTP
implementations where the SCTP receiver can send a SACK for two arriving
data packets, then the ECN Echo chunk will not be sent until the SACK is sent.
If the next arriving data packet also has the CE codepoint set, then the
receiver updates the queued ECN Echo chunk to have a higher TSN value (the
lowest one in the newly arriving data packet) and increments the
'Number CE Marked Packets' field in the queued chunk.</t>
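<t>The receiver bookkeeping described above might look like the
following non-normative sketch.
The structure and function names are hypothetical.
Following the definition of the Lowest TSN Number field, the stored TSN
is simply replaced by the lowest TSN of the most recent CE marked
packet.</t>
<sourcecode type="c"><![CDATA[
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for the queued ECN Echo chunk. */
struct ecne_pending {
    bool     queued;      /* an ECN Echo chunk is waiting for CWR */
    uint32_t lowest_tsn;  /* Lowest TSN Number field */
    uint32_t ce_count;    /* Number CE Marked Packets field */
};

/* Called for each arriving data packet that carries the CE
 * codepoint; lowest_tsn is the lowest TSN in that packet. */
void ecn_on_ce_packet(struct ecne_pending *q, uint32_t lowest_tsn)
{
    if (!q->queued) {
        /* First CE mark: create the chunk with a count of 1. */
        q->queued = true;
        q->lowest_tsn = lowest_tsn;
        q->ce_count = 1u;
        return;
    }
    /* Further CE marks update the queued chunk in place.
     * The counter wraps from 0xffffffff to 0 by design. */
    q->lowest_tsn = lowest_tsn;
    q->ce_count++;
}
]]></sourcecode>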
<t>Multi-homing requires one added restriction upon the ECN Echo chunk:
such a chunk <bcp14>MUST</bcp14> be bundled with a SACK, and the
SACK <bcp14>MUST</bcp14> follow the ECN Echo Chunk.
This ordering is necessary so that the receiver of the ECN Echo chunk will at
least one time find the proper destination to which the chunk was originally
sent.
Without this restriction it is possible a SACK could arrive ahead of the
ECN Echo Chunk, no matter what the sending order, causing the sender to free
the DATA chunk and thus lose track of the destination to which it was
sent.
For the same reason we also require the ECN Echo Chunk be earlier in the packet
ahead of the SACK so that the SACK is not processed before
the ECN Echo Chunk.</t>
<t>After transmission of the ECN Echo chunk, usually bundled with the SACK, the
receiver does not discard the ECN Echo chunk.
Instead it keeps the chunk in its queue and continues to send this chunk bundled
with at least a SACK chunk on each outgoing packet, updating it as described
above if other CE codepoint data packets arrive.
The ECN Echo chunk should only be discarded when a CWR Chunk arrives holding a
TSN value that is greater than or equal to the value inside
the ECN Echo Chunk.</t>
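<t>The discard rule above compares TSNs, which wrap around at 2^32.
The following non-normative sketch assumes serial number arithmetic for
that comparison, as is usual for TSNs in SCTP.
The structure and function names are hypothetical.</t>
<sourcecode type="c"><![CDATA[
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical state for the queued ECN Echo chunk. */
struct ecne_pending {
    bool     queued;
    uint32_t lowest_tsn;
    uint32_t ce_count;
};

/* Serial number comparison: true if a is equal to b or ahead of
 * b by less than 2^31, so wrap-around is handled. */
static bool tsn_ge(uint32_t a, uint32_t b)
{
    return (uint32_t)(a - b) < 0x80000000u;
}

/* Called when a CWR chunk arrives. Returns true if the queued
 * ECN Echo chunk was discarded as a result. */
bool ecne_discard_on_cwr(struct ecne_pending *q, uint32_t cwr_tsn)
{
    if (q->queued && tsn_ge(cwr_tsn, q->lowest_tsn)) {
        q->queued = false;
        q->ce_count = 0u;
        return true;
    }
    return false;  /* keep sending the ECN Echo chunk */
}
]]></sourcecode>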
<t>This provides robustness against the possibility of a dropped SACK packet
carrying an ECN Echo chunk.
The SCTP receiver continues to transmit the ECN Echo chunk in subsequent SACK
packets until the correct CWR is received.</t>
<t>After the receipt of the CWR chunk, acknowledgments for subsequent
non-CE data packets will not have an ECN Echo chunk bundled with them.
If another CE packet is received by the data receiver, the receiver would once
again send SACK packets bundled with a newly created ECN Echo chunk.
The receipt of a CWR packet guarantees that the data sender has received the
ECN Echo chunk for the TSN specified, and reduced its congestion window at some
point after it sent the data packet for which the CE codepoint was set.</t>
<t>When processing a CWR, it is important that the receiver of the CWR validate
the source address from which the CWR came.
It <bcp14>SHOULD</bcp14> match the destination the ECN Echo was sent to unless
the override bit is set in the CWR Chunk.</t>
</section>
<section>
<name>Congestion on the SACK Path</name>
<t>For the current generation of SCTP congestion control algorithms,
pure acknowledgement packets (e.g., packets that do not contain any accompanying
data) <bcp14>MUST</bcp14> be sent with the not-ECT codepoint.
Current SCTP receivers have no mechanisms for reducing traffic on the
SACK-path in response to congestion notification.
Mechanisms for responding to congestion on the SACK-path are areas for current
and future research.
For current SCTP implementations, a single dropped SACK generally has only a
very small effect on SCTP's sending rate.</t>
</section>
<section>
<name>Retransmitted SCTP Packets</name>
<t>This document specifies that ECN-capable SCTP implementations
<bcp14>MUST NOT</bcp14> set either ECT codepoint (ECT(0) or ECT(1)) in the
IP header for retransmitted data packets, and that the SCTP data receiver
<bcp14>SHOULD</bcp14> ignore the ECN field on arriving data packets that are
outside of the receiver's current window.
The reasons for this can be found in <xref target='RFC3168'/> Section 6.1.5.</t>
</section>
<section>
<name>SCTP Window Probes</name>
<t>When the SCTP data receiver advertises a zero window, the SCTP data sender
sends window probes to determine if the receiver's window has increased.
Window probe packets for SCTP do contain user data (one chunk).
If a window probe packet is dropped in the network, this loss can be detected
by the receiver.
Therefore, the SCTP data sender <bcp14>MAY</bcp14> set an ECT codepoint on the
initial send of the window probe, but the SCTP sender <bcp14>MUST NOT</bcp14>
set the ECT codepoint on retransmissions of that TSN.</t>
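<t>The marking rules for pure acknowledgement packets, retransmitted
data, and window probes can be combined in a single decision, as in
the following non-normative sketch.
The structure and function names are hypothetical.
The codepoint values are the two-bit ECN field values from
<xref target='RFC3168'/>.
Treating every packet without DATA chunks as not-ECT is an assumption
of this sketch.</t>
<sourcecode type="c"><![CDATA[
#include <stdbool.h>

/* Two-bit ECN field values. */
enum ecn_codepoint {
    ECN_NOT_ECT = 0x0,
    ECN_ECT_1   = 0x1,
    ECN_ECT_0   = 0x2,
    ECN_CE      = 0x3
};

/* Hypothetical description of an outgoing packet. */
struct pkt_info {
    bool ecn_negotiated;  /* both sides sent ECN Support */
    bool has_data;        /* carries at least one DATA chunk */
    bool has_rtx_data;    /* carries a retransmitted DATA chunk */
};

enum ecn_codepoint ecn_select_codepoint(const struct pkt_info *p)
{
    if (!p->ecn_negotiated)
        return ECN_NOT_ECT;
    if (!p->has_data)
        return ECN_NOT_ECT;  /* e.g., a pure SACK packet */
    if (p->has_rtx_data)
        return ECN_NOT_ECT;  /* includes re-sent window probes */
    return ECN_ECT_0;        /* new data, including a first probe */
}
]]></sourcecode>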
</section>
</section>

<section>
<name>Socket API Considerations</name>
<t>This section describes how the socket API defined in
<xref target='RFC6458'/> needs to be extended to support ECN as defined in this
document.</t>
<t>Please note that this section is informational only.</t>
</section>

<section>
<name>IANA Considerations</name>
<t>TBD.</t>
</section>

<section>
<name>Security Considerations</name>
<t><xref target='RFC3168'/> defines the security considerations for ECN.
The same considerations that are described for TCP are applicable to SCTP.</t>
</section>

</middle>

<back>
<references>
<name>References</name>
<references>
<name>Normative References</name>
<xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
<xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.3168.xml"/>
<xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
<xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.9260.xml"/>
</references>
<references>
<name>Informative References</name>
<xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2481.xml"/>
<xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2960.xml"/>
<xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.4960.xml"/>
<xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.6458.xml"/>
</references>
</references>

<section numbered='false'>
<name>Acknowledgments</name>
<t>Special thanks to <contact fullname="Xuesong Dong"/> for being a coauthor on
early versions of the document.</t>
<t>Thanks to <contact fullname="Richard Scheffenegger"/> for his helpful
comments and review.</t>
</section>
</back>
</rfc>
