<?xml version="1.0" encoding="utf-8"?>
<!-- name="GENERATOR" content="github.com/mmarkdown/mmark Mmark Markdown Processor - mmark.miek.nl" -->
<rfc version="3" ipr="trust200902" docName="draft-morton-tsvwg-sce-04" submissionType="IETF" category="exp" xml:lang="en" xmlns:xi="http://www.w3.org/2001/XInclude" updates="3168, 8311" consensus="true">

<front>
<title abbrev="sceb">The Some Congestion Experienced ECN Codepoint</title><seriesInfo value="draft-morton-tsvwg-sce-04" stream="IETF" status="experimental" name="Internet-Draft"></seriesInfo>
<author initials="J." surname="Morton" fullname="Jonathan Morton"><organization></organization><address><postal><street>Kokkonranta 21</street>
<city>Pitkajarvi</city>
<code>31520</code>
<country>Finland</country>
</postal><phone>+358 44 927 2377</phone>
<email>chromatix99@gmail.com</email>
</address></author>
<author initials="P." surname="Heist" fullname="Peter G. Heist"><organization></organization><address><postal><street>Redacted</street>
<city>Liberec 30</city>
<code>463 11</code>
<country>Czech Republic</country>
</postal><email>pete@heistp.net</email>
</address></author>
<author role="editor" initials="R.W." surname="Grimes" fullname="Rodney W. Grimes"><organization></organization><address><postal><street>Redacted</street>
<city>Portland</city>
<code>97217</code>
<country>United States</country>
<region>OR</region>
</postal><email>rgrimes@freebsd.org</email>
</address></author>
<date/>
<area>Internet</area>
<workgroup>Transport Working Group</workgroup>
<keyword>Internet-Draft</keyword>
<keyword>congestion control</keyword>
<keyword>SCE</keyword>

<abstract>
<t>This memo reclassifies ECT(1) to be an early notification of
congestion on ECT(0) marked packets, which can be used by AQM
algorithms and transports as an earlier signal of congestion than
CE.  It is a simple, transparent, and backward compatible upgrade to
existing IETF-approved AQMs, RFC3168, and nearly all congestion
control algorithms.</t>
</abstract>

</front>

<middle>

<section anchor="terminology"><name>Terminology</name>
<t>The key words &quot;MUST&quot;, &quot;MUST NOT&quot;, &quot;REQUIRED&quot;, &quot;SHALL&quot;, &quot;SHALL NOT&quot;,
&quot;SHOULD&quot;, &quot;SHOULD NOT&quot;, &quot;RECOMMENDED&quot;, &quot;NOT RECOMMENDED&quot;, &quot;MAY&quot;, and
&quot;OPTIONAL&quot; in this document are to be interpreted as described in
<xref target="RFC2119"></xref> and <xref target="RFC8174"></xref> when, and only when, they appear in all
capitals, as shown here.</t>
</section>

<section anchor="introduction"><name>Introduction</name>
<t>Traditional TCP congestion control exhibits a &quot;sawtooth&quot; pattern
which, in the most favourable cases, oscillates around the optimum
operating point of maximum throughput and minimum delay, which
exists at the point where the congestion window equals path BDP.
The term &quot;sawtooth&quot; brings to mind the straight-edged graphs of
TCP Reno, but the equally common TCP CUBIC is essentially similar
in character, as are other AIMD-derived algorithms.</t>
<t>A number of proposals have sought to improve this, but introduce
various other tradeoffs in return.  TCP Vegas is consistently
outcompeted by standard TCPs, DCTCP proved to be too aggressive
for deployment in the public Internet, and while BBR appears to
have avoided both of these problems, its complexity makes it
difficult to implement correctly.  Each of these proposals is
characterised by primarily changing only the endpoints, not the
network nodes on the path between them; though DCTCP is intended
for use with a specific style of AQM, it can work with standard
AQMs as long as there is no competing non-DCTCP traffic.</t>
<t>Some other proposals have attempted to convey information about
the network path explicitly, by having network nodes inject data
about link capacity and/or utilisation into passing traffic.
These proposals have generally been unsuccessful due to the
complex slow-path processing required in network nodes, and are
not widely deployed.  The only successful proposal of this type
is Explicit Congestion Notification <xref target="RFC3168"></xref> which allows an
AQM to signal congestion by marking packets with (essentially)
a one-bit signal in preference to dropping them.</t>
<t>ECN defines a two-bit field supporting four codepoints, of which
three are in active use and the fourth is a semantic duplicate.
It was explicitly suggested during ECN's development that new
meaning could be given to this spare codepoint, including as a
lesser indication of congestion in <xref target="RFC3168"></xref> (section 20.2).  With
an alternative use of this codepoint having fallen out of favour,
the time is right to revisit this suggestion and propose a
workable method of applying it.</t>
<t>In so doing, care must be taken that backwards compatibility is
maintained with existing traffic, endpoints and network nodes
that are known or suspected to have been deployed.  Keeping the
changes to on-wire protocols minimal, and the complexity of
implementation low, are also highly desirable.</t>
<t>This memo reclassifies ECT(1) to be an early notification of
congestion on ECT(0) marked packets, which can be used by AQM
algorithms and transports as an earlier signal of congestion than
CE (&quot;Congestion Experienced&quot;).</t>
<t>This memo also briefly discusses how transports should respond
to ECT(1) marked packets.  Detailed specifications of this
behaviour are left to transport-specific memos.</t>
</section>

<section anchor="background"><name>Background</name>
<t><xref target="RFC3168"></xref> defines the lower two bits of the (former) TOS byte in the
IPv4/6 header as the ECN field.  This may take four values:
Not-ECT, ECT(0), ECT(1) or CE.</t>
<table>
<thead>
<tr>
<th>Binary</th>
<th>Keyword</th>
<th>References</th>
</tr>
</thead>

<tbody>
<tr>
<td>00</td>
<td>Not-ECT (Not ECN-Capable Transport)</td>
<td><xref target="RFC3168"></xref></td>
</tr>

<tr>
<td>01</td>
<td>ECT(1) (ECN-Capable Transport(1))</td>
<td><xref target="RFC3168"></xref></td>
</tr>

<tr>
<td>10</td>
<td>ECT(0) (ECN-Capable Transport(0))</td>
<td><xref target="RFC3168"></xref></td>
</tr>

<tr>
<td>11</td>
<td>CE (Congestion Experienced)</td>
<td><xref target="RFC3168"></xref></td>
</tr>
</tbody>
</table><t>Research has shown that the ECT(1) codepoint goes essentially unused,
with the &quot;Nonce Sum&quot; extension to ECN not having been implemented in
practice and thus subsequently obsoleted by <xref target="RFC8311"></xref> (section
3). Additionally, known <xref target="RFC3168"></xref> compliant senders do not emit
ECT(1), and compliant middleboxes do not alter the field to ECT(1),
while compliant receivers all interpret ECT(1) identically to ECT(0).
These are useful properties which represent an opportunity for
improvement.</t>
<t>Experience gained with 7 years of <xref target="RFC8290"></xref> deployment in the field
suggests that it remains difficult to maintain the desired 100% link
utilisation, whilst simultaneously strictly minimising induced delay
due to excess queue depth, irrespective of whether ECN is in use.
This leads to a reluctance amongst hardware vendors to implement the
most effective AQM schemes because their headline benchmarks are
throughput-based.</t>
<t>The underlying cause is the very sharp &quot;multiplicative decrease&quot;
reaction required of transport protocols to congestion signalling
(whether that be packet loss or CE marks), which tends to leave the
congestion window significantly smaller than the ideal BDP when
triggered at only slightly above the ideal value.  The availability of
this sharp response is required to assure network stability (AIMD
principle), but there is presently no standardised and
backwards-compatible means of providing a less drastic signal.</t>
</section>

<section anchor="some-congestion-experienced"><name>Some Congestion Experienced</name>
<t>As consensus has arisen that some form of ECN signaling should be an
earlier signal than drop, this memo changes the meaning of ECT(1) to
SCE, meaning &quot;Some Congestion Experienced&quot;.  Since there is no longer
ambiguity between two ECT codepoints, ECT(0) is referred to as ECT.
The ECN-field codepoint table then becomes:</t>
<table>
<thead>
<tr>
<th>Binary</th>
<th>Keyword</th>
<th>References</th>
</tr>
</thead>

<tbody>
<tr>
<td>00</td>
<td>Not-ECT (Not ECN-Capable Transport)</td>
<td><xref target="RFC3168"></xref></td>
</tr>

<tr>
<td>01</td>
<td>SCE (Some Congestion Experienced)</td>
<td>[This draft]</td>
</tr>

<tr>
<td>10</td>
<td>ECT (ECN-Capable Transport)</td>
<td><xref target="RFC3168"></xref></td>
</tr>

<tr>
<td>11</td>
<td>CE (Congestion Experienced)</td>
<td><xref target="RFC3168"></xref></td>
</tr>
</tbody>
</table><t>This permits middleboxes implementing AQM to signal incipient
congestion, below the threshold required to justify setting CE, by
converting some proportion of ECT codepoints to SCE (&quot;SCE marking&quot;).
Existing <xref target="RFC3168"></xref> compliant receivers MUST transparently ignore this new
signal with respect to congestion control, and both existing and SCE-aware
middleboxes SHOULD convert SCE to CE in the same circumstances as for ECT,
thus ensuring backwards compatibility with <xref target="RFC3168"></xref> ECN endpoints.</t>
<t>The permitted ECN codepoint transitions by middleboxes are:</t>
<table>
<thead>
<tr>
<th>From</th>
<th>To</th>
</tr>
</thead>

<tbody>
<tr>
<td>Not-ECT</td>
<td>Not-ECT</td>
</tr>

<tr>
<td>ECT</td>
<td>ECT or SCE or CE</td>
</tr>

<tr>
<td>SCE</td>
<td>SCE or CE</td>
</tr>

<tr>
<td>CE</td>
<td>CE</td>
</tr>
</tbody>
</table><t>Note that dropping a packet is an allowed action for any ECN codepoint.  While
that is the only way of indicating congestion with Not-ECT, it may also be used
to both indicate and reduce congestion in any state.</t>
<t>To re-state the allowed transitions another way: for ECN-aware flows, the ECN
marking of an individual packet MAY be increased by a middlebox to signal
congestion, but MUST NOT be decreased, and packets SHALL NOT be altered to
appear to be ECN-aware if they were not originally, nor vice versa.  Note
however that SCE is numerically less than ECT, but semantically greater, and the
latter definition applies for this rule.</t>
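<t>As a non-normative illustration, the transition rules above can be sketched
in Python.  The function and constant names are hypothetical and chosen for
this example only; the severity ordering encodes the note that SCE ranks above
ECT semantically despite being numerically lower.</t>
<sourcecode type="python"><![CDATA[
```python
# ECN codepoints as two-bit values, per the codepoint table in this section.
NOT_ECT = 0b00
SCE = 0b01  # ECT(1) in RFC 3168 terminology
ECT = 0b10  # ECT(0) in RFC 3168 terminology
CE = 0b11

# Semantic severity order (hypothetical helper table): SCE ranks above ECT
# even though its numeric value is lower.
SEVERITY = {ECT: 0, SCE: 1, CE: 2}


def ecn_field(tos_byte):
    """Return the ECN field: the lower two bits of the (former) TOS byte."""
    return tos_byte & 0b11


def transition_allowed(before, after):
    """Check a middlebox remarking against the permitted transitions."""
    if before == NOT_ECT or after == NOT_ECT:
        # Not-ECT stays Not-ECT, and ECN-aware packets never become Not-ECT.
        return before == after
    # ECN-aware packets may keep or increase their marking, never decrease it.
    return SEVERITY[after] >= SEVERITY[before]
```
]]></sourcecode>
<t>For example, under this sketch a change from ECT to SCE is accepted, while
a change from SCE back to ECT, or from Not-ECT to any other codepoint, is
rejected.</t>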
<t>Receivers and transport protocols conforming to this specification SHALL
continue to apply the <xref target="RFC3168"></xref> interpretation of the CE codepoint, that
is, to signal the sender to back off send rate to the same extent as if a
packet loss were detected.  This maintains compatibility with existing
middleboxes, senders and receivers.</t>
<t>New SCE-aware receivers and transport protocols SHOULD interpret the SCE
codepoint as an indication of mild congestion, and respond accordingly by
applying send rates intermediate between those resulting from a continuous
sequence of ECT codepoints, and those resulting from a CE codepoint.  The
ratio of ECT and SCE codepoints received indicates the relative severity
of such congestion, with a higher proportion of SCE codepoints indicating
more congestion.</t>
<t>The intent of SCE marking is a &quot;cruise control&quot; signal which permits
middleboxes to request relatively small reductions in send rate, or merely
a slowing of send rate growth.  Accordingly, SCE marks SHOULD progressively
trigger exit from exponential slow-start growth, then reduction to Reno-linear
growth (for congestion control algorithms which support higher growth rates
in congestion-avoidance phase), then a halt to send rate growth, then a
gradual reduction of send rate.  For immediate large reductions of send rate,
the CE mark MUST retain its original Multiplicative Decrease power as per
<xref target="RFC8511"></xref>, and compliant AQMs SHOULD retain the ability to employ it where
appropriate.</t>
<t>Details of how to implement SCE awareness at the transport layer are left to
additional Internet Drafts.  To ensure RTT-fair convergence with single-queue
SCE AQMs, transports SHOULD stabilise at lower SCE-mark ratios for higher BDPs,
and MAY reduce their response to CE marks if and only if they are responding to SCE signals
received at around the same time (e.g. within 1-2 RTTs) in the same flow.</t>
<t>To maximise the benefit of SCE, middleboxes SHOULD begin to produce SCE marks
at lower congestion levels than they begin to produce CE marks.  This will
usually ensure that SCE-aware flows avoid receiving CE marks.  When a
single-queue AQM is upgraded to SCE awareness, this will tend to cause SCE
flows to give way to non-SCE flows; to avoid this behaviour, single-queue
AQMs MAY be left as <xref target="RFC3168"></xref> compliant without SCE support.</t>
<t>For the avoidance of doubt, a decision to mark CE or to drop a packet always
takes precedence over SCE marking.</t>
</section>

<section anchor="design-rationale"><name>Design Rationale</name>
<t>The SCE design sees ECN as a &quot;network feature&quot;.  The risks with ECN signaling
(<xref target="ecn-risks"></xref>), the need to handle unresponsive flows (<xref target="unresponsive-flows"></xref>),
the utility of fairness (<xref target="fairness"></xref>), and the availability of only one ECN
codepoint all influenced the SCE signaling design.  This section discusses these
related concerns, along with what is needed from middleboxes to address them,
and how that ultimately led to the selection of ECT(1) as an additional signal
of lesser congestion (<xref target="ect1-as-sce"></xref>).</t>

<section anchor="ecn-risks"><name>Risks with ECN Signaling</name>
<t>The safety and effectiveness of ECN signaling depends upon the unaltered
transmission of the ECN bits, both for the indication of ECN support, and for
ECN signaling.  Unlike a drop, which is reliably and irrevocably signaled, ECN
signals may be erased or manipulated.  Specifically, any of the following
results in the lack of a congestion response, which is likely to lead to the
near starvation of competing flows:</t>

<ul>
<li>if transports indicate ECT(0) but do not respond to CE</li>
<li>if packets are erroneously changed from Not-ECT to ECT(0) in the network</li>
<li>if CE marks are erased after a bottleneck</li>
<li>if ECE marks are erased post-negotiation</li>
</ul>
<t>Although the lack of a congestion response is similar to when transports do not
respond appropriately to drop, the difference is that with ECN, the behavior can
be brought about in the network, without changes to the endpoint.  This may
happen by accident, for example due to a broken network configuration or
endpoint implementation, or on purpose, e.g. using a simple firewall rule.</t>
<t>Unresponsive flow mitigation, discussed in the next section, deals with flows
that are not responding to congestion signals, including for the reasons listed
above.</t>
</section>

<section anchor="unresponsive-flows"><name>Unresponsive Flows</name>
<t>A single unresponsive flow has the potential to nearly starve all other
competing flows in a congested bottleneck, resulting in unacceptable network
delays and collapses in throughput.  The need to handle unresponsive flows is
corroborated in <xref target="RFC7567"></xref> (section 4), stating:</t>
<blockquote><t>&quot;Research, engineering, and measurement efforts are needed regarding the
design of mechanisms to deal with flows that are unresponsive to congestion
notification or are responsive, but are more aggressive than present TCP.&quot;</t>
</blockquote><t>The source language from <xref target="RFC2309"></xref> (section 5) is more direct:</t>
<blockquote><t>&quot;It is urgent to begin or continue research, engineering, and measurement
efforts contributing to the design of mechanisms to deal with flows that are
unresponsive to congestion notification or are responsive but more aggressive
than TCP.&quot;</t>
</blockquote><t>The <xref target="COBALT"></xref> AQM algorithm is one example of how unresponsive flows can be
dealt with, using the <xref target="BLUE"></xref> algorithm to detect overload and trigger drops.</t>
<t>Regardless of exactly how it is done, unresponsive flow mitigation is most
effectively implemented with some level of flow awareness, so that drops may be
directed to the offending flow or flows.  Once flow awareness is available, fairness
steering becomes possible, as discussed in the following section.</t>
</section>

<section anchor="fairness"><name>Fairness</name>
<t>In order for SCE flows to compete fairly with non-SCE flows, at least one of the
following is required: some form of fairness steering, or some way of separating
SCE and non-SCE flows.  The following is a non-exhaustive list of options:</t>

<ul>
<li>FQ (fair queueing), to isolate and schedule flows fairly from separate queues</li>
<li>AF (approximate fairness), so that SCE and non-SCE flows can share the same
queue, e.g. <xref target="AFD"></xref>, <xref target="I-D.morton-tsvwg-codel-approx-fair"></xref>,
<xref target="I-D.morton-tsvwg-lightweight-fair-queueing"></xref></li>
<li>DSCP <xref target="RFC2474"></xref>, to explicitly separate SCE and non-SCE flows
(see <xref target="diffserv-usage"></xref>)</li>
</ul>
<t>When available, fairness is viewed as an advantage, in that it:</t>

<ul>
<li>controls aggressive flows</li>
<li>prevents network bias</li>
<li>promotes the fair interoperation between the ever-expanding matrix of new
congestion control mechanisms</li>
</ul>
<t>The abundance of new and proposed congestion controls is making their fair
competition across bandwidths, RTTs and network conditions more difficult if not
impossible to ensure in the endpoint alone <xref target="CC-REVOLUTION"></xref> <xref target="CC-COMPAT"></xref>.
Congestion control implementations may dominate one another under different
conditions, e.g. <xref target="BBR-CUBIC"></xref>, while the widespread deployment of potentially
beneficial congestion controls that seek to minimize delay is discouraged by the
fact that they are often out-competed in bottlenecks by standard TCP.  Fairness
in the network both improves these conditions and assists transports responding
to SCE.</t>
</section>

<section anchor="ect1-as-sce"><name>ECT(1) as SCE</name>
<t>With only a single ECN codepoint remaining, options are limited for how to
signal congestion with high fidelity.  Meanwhile, the recent rise in ECN
signaling prevalence in the Internet makes backwards compatibility with <xref target="RFC3168"></xref>
a requirement.  The existence of two distinct levels of ECN signalling also
potentially enables new congestion control paradigms, e.g. max-min-fair or
power-fair instead of RTT-fair, to coexist on the Internet, even in the presence
of legacy infrastructure and traffic.</t>
<t>Fortunately, the same network technologies that mitigate the well recognized
risks listed in <xref target="design-rationale"></xref> above, also make the use of ECT(1) as
defined by SCE possible, without a separate traffic identifier.  Where those
technologies cannot be deployed, Diffserv may be used to identify SCE traffic
(see <xref target="diffserv-usage"></xref>), a purpose for which it was expressly designed.  Where
that is impossible, SCE allows a graceful fallback to <xref target="RFC3168"></xref> ECN.  SCE's
usage of ECT(1) provides a safe and solid foundation on which future innovations
in the network can improve the availability and performance of high-fidelity
congestion signaling.</t>
</section>
</section>

<section anchor="diffserv-usage"><name>Diffserv Usage</name>
<t>SCE is not dependent on Diffserv <xref target="RFC2474"></xref> for its signaling, but makes use of
it in the following ways:</t>

<ul>
<li>to mark SCE traffic for experimental or private use</li>
<li>to assist middleboxes in their operation</li>
<li>to request separation of traffic having different classes of SCE response</li>
</ul>

<section anchor="sce-diffserv-codepoints-dscps"><name>SCE Diffserv Codepoints (DSCPs)</name>
<t>All SCE DSCPs indicate SCE support in the originating endpoint.  This MAY assist
SCE marking middleboxes in their operation, but MUST NOT be depended upon for
effective congestion control, because the DSCP field cannot be relied upon to survive
end-to-end in the Internet.  See <xref target="simple-two-queue-middleboxes"></xref> for an
example of such a usage.</t>
<t>SCE middleboxes MUST retain any SCE DSCPs that arrive on incoming
packets, and MUST NOT set them on packets that do not already have them.  The DSCP
field MAY be translated between Diffserv domains by the SCE middlebox, whilst
retaining the sense of the SCE-related meaning thus encoded.</t>
<t>The SCE DSCPs MAY be set on TCP ACK and control packets which have the Not-ECT
codepoint set in the ECN field, provided the TCP connection as a whole is SCE
capable (or in the process of being negotiated as such).  This allows all
packets relating to that connection to be treated equally by middleboxes which
distinguish them.  Should ECN negotiation fail, the DSCP should be changed to
some non-SCE value for subsequent traffic on that connection.</t>
<t>SCE DSCPs are not intended to imply a priority class of service.  Legacy
middleboxes are expected to map SCE DSCPs to a best-effort PHB, and the DSCP
numerical value should be chosen to make this mapping natural.</t>

<section anchor="sce-rtt-fair"><name>SCE-RTT-FAIR</name>
<t>The SCE-RTT-FAIR DSCP indicates SCE support, with standard, best-effort
service implied.  The response to SCE signals is in the &quot;RTT fair&quot; class.</t>
</section>

<section anchor="sce-max-min-fair"><name>SCE-MAX-MIN-FAIR</name>
<t>The SCE-MAX-MIN-FAIR DSCP indicates SCE support, with standard, best-effort
service implied.  The response to SCE signals is in the &quot;max-min fair&quot; class.</t>
</section>

<section anchor="sce-power-fair"><name>SCE-POWER-FAIR</name>
<t>The SCE-POWER-FAIR DSCP indicates SCE support, with standard, best-effort
service implied.  The response to SCE signals is in the &quot;power fair&quot; class.</t>
</section>
</section>

<section anchor="codepoints-private"><name>Diffserv Codepoints for Experimental and Private Use</name>
<t>Prior to approval for public experiment, the SCE DSCPs are defined in the
experimental pool xxxx11, and the following rules MUST be observed to contain
SCE traffic within the experimental network:</t>

<ul>
<li>SCE senders SHOULD set one of the SCE DSCPs when participating in an
SCE experimental network.</li>
<li>SCE middleboxes MUST NOT mark SCE on packets lacking an SCE DSCP, or
packets that may leave the experimental network.</li>
<li>SCE receivers MUST check that one of the SCE DSCPs is present before
returning SCE feedback.</li>
<li>All SCE DSCPs MUST be bleached at the experimental network boundaries.</li>
</ul>
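<t>A minimal, non-normative sketch of the containment rules above follows.  It
assumes the illustrative DSCP values proposed in this section, and it assumes
that bleaching means resetting the DSCP to the default value 0.  Both are
assumptions that an experimental network may change, and all function names
are hypothetical.</t>
<sourcecode type="python"><![CDATA[
```python
# Illustrative SCE DSCP values (assumption: the guidance values 3, 7 and 11).
SCE_DSCPS = frozenset({3, 7, 11})


def middlebox_may_mark_sce(dscp, may_leave_network):
    """A middlebox marks SCE only on contained packets carrying an SCE DSCP."""
    return dscp in SCE_DSCPS and not may_leave_network


def receiver_may_feed_back_sce(dscp):
    """A receiver returns SCE feedback only if an SCE DSCP is present."""
    return dscp in SCE_DSCPS


def bleach_at_boundary(dscp):
    """Clear SCE DSCPs at the network boundary (assumption: reset to 0)."""
    return 0 if dscp in SCE_DSCPS else dscp
```
]]></sourcecode>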
<t>The following values are proposed for guidance only.  Because they are in the
experimental pool, they may be changed to suit the environment:</t>
<table>
<thead>
<tr>
<th>Name</th>
<th>Value (Binary)</th>
<th>Value (Decimal)</th>
</tr>
</thead>

<tbody>
<tr>
<td>SCE-RTT-FAIR</td>
<td>000011</td>
<td>3</td>
</tr>

<tr>
<td>SCE-MAX-MIN-FAIR</td>
<td>000111</td>
<td>7</td>
</tr>

<tr>
<td>SCE-POWER-FAIR</td>
<td>001011</td>
<td>11</td>
</tr>
</tbody>
</table></section>

<section anchor="diffserv-codepoints-for-public-use"><name>Diffserv Codepoints for Public Use</name>
<t>In the event that SCE is approved for public experiment, the DSCPs will be
allocated in an appropriate standards action pool, using a value that is
intended to be treated as best-effort traffic by existing deployed devices.</t>
<t>One of the SCE DSCPs SHOULD be set by sending endpoints on all SCE capable
traffic.  However, they neither need to be checked by middleboxes that do not
require them before marking SCE, nor by receiving endpoints before returning SCE
feedback.  That way, they can serve as hints for middleboxes, but the SCE
signaling mechanism is not dependent on end-to-end DSCP traversal.</t>
<t>Unless and until a public experiment is approved, the guidance in
<xref target="codepoints-private"></xref> MUST be followed.</t>
</section>
</section>

<section anchor="examples-of-use"><name>Examples of use</name>

<section anchor="codel-type-aqms"><name>Codel-type AQMs</name>
<t>A simple and natural way to implement SCE in a Codel-type AQM is to mark all
ECT packets as SCE if they are over half the Codel target sojourn time, and not
marked CE by Codel itself.  This threshold function does not necessarily
produce the best performance, but is very easy to implement and provides useful
information to SCE-aware flows, often sufficient to avoid receiving CE marks
whilst still efficiently using available capacity.</t>
<t>For a more sophisticated approach avoiding even small-scale oscillation, a
stochastic ramp function may be implemented with 100% marking at the Codel
target, falling to 0% marking at or above zero sojourn time.  The lower point
of the ramp should be chosen so that SCE is not accidentally signalled due to
CPU scheduling latencies or serialisation delays of single packets.  Absent
rigorous analysis of these factors, setting the lower limit at half the Codel
target should be safe in many cases.</t>
<t>The default configuration of Codel is 100ms interval, 5ms target.  A typical
ramp function for these parameters might cease marking below 2.5ms sojourn
time, increase marking probability linearly to 100% at 5ms, and mark at 100%
for sojourn times above 5ms (in which CE marking is also possible).</t>
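<t>The threshold and ramp functions described above can be sketched as
follows, using the default 5 ms target and a 2.5 ms lower limit from the text.
This is an illustrative sketch rather than a reference implementation; the
function names and the injectable random source are assumptions made for the
example.</t>
<sourcecode type="python"><![CDATA[
```python
import random

TARGET_MS = 5.0                  # default Codel target, from the text
RAMP_START_MS = TARGET_MS / 2.0  # lower limit of the ramp (2.5 ms)


def sce_threshold_mark(sojourn_ms, ce_marked):
    """Simple variant: mark SCE above half the target unless already CE."""
    return (not ce_marked) and sojourn_ms > RAMP_START_MS


def sce_ramp_probability(sojourn_ms):
    """Ramp variant: 0 up to 2.5 ms, rising linearly to 1 at 5 ms and above."""
    if sojourn_ms <= RAMP_START_MS:
        return 0.0
    if sojourn_ms >= TARGET_MS:
        return 1.0
    return (sojourn_ms - RAMP_START_MS) / (TARGET_MS - RAMP_START_MS)


def sce_ramp_mark(sojourn_ms, ce_marked, rng=random.random):
    """Stochastically mark an ECT packet as SCE; a CE mark takes precedence."""
    return (not ce_marked) and rng() < sce_ramp_probability(sojourn_ms)
```
]]></sourcecode>
<t>With these parameters, a packet that has waited 3.75 ms would be marked
with probability 0.5 under the ramp, and always under the simple threshold.</t>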
<t>In single-queue AQMs, the above strategy will result in SCE flows yielding to
pressure from non-SCE flows, since CE marks do not occur until SCE marking has
reached 100%.  A balance between smooth SCE behaviour and fairness versus
non-SCE traffic can be found by having the marking ramp cross the Codel target
at some lower SCE marking rate, perhaps even 0%.  A two-part ramp, reaching
1/sqrt(X) at the Codel target (for some chosen X, a cwnd at which the crossover
between smoothness and fairness occurs) and ramping up more steeply thereafter,
has been implemented successfully for experimentation.</t>
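<t>One possible shape for such a two-part ramp is sketched below.  Only the
value 1/sqrt(X) at the Codel target is taken from the text.  The linear shape
of both segments, the 2.5 ms starting point, and the 10 ms point at which
marking reaches 100% are assumptions chosen for illustration, as is the
function name.</t>
<sourcecode type="python"><![CDATA[
```python
import math


def two_part_ramp(sojourn_ms, x, target_ms=5.0, start_ms=2.5, full_ms=10.0):
    """SCE marking probability reaching 1/sqrt(x) at the Codel target.

    start_ms and full_ms are illustrative assumptions, not values from
    the specification.
    """
    knee = 1.0 / math.sqrt(x)
    if sojourn_ms <= start_ms:
        return 0.0
    if sojourn_ms <= target_ms:
        # Shallow segment: 0 at start_ms, rising to the knee at the target.
        return knee * (sojourn_ms - start_ms) / (target_ms - start_ms)
    if sojourn_ms >= full_ms:
        return 1.0
    # Steeper segment: from the knee at the target to 1 at full_ms.
    return knee + (1.0 - knee) * (sojourn_ms - target_ms) / (full_ms - target_ms)
```
]]></sourcecode>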
<t>The CNQ algorithm <xref target="I-D.morton-tsvwg-cheap-nasty-queueing"></xref> offers a relatively simple way to limit this yielding
behaviour and ensure that, even in competition with non-SCE flows, SCE flows
maintain a reasonable minimum throughput capability.  This may be sufficient
to avoid the need for the two-part ramp described above.</t>
<t>Flow-isolating AQMs, including especially CNQ and DRR++ based algorithms,
should avoid signalling SCE to flows classified as &quot;sparse&quot;,
in order to encourage the fastest possible convergence to the fair share.</t>
</section>

<section anchor="red-type-aqms-including-pie"><name>RED-type AQMs (including PIE)</name>
<t>There are several reasonable methods of producing SCE signals in a RED-type AQM.</t>
<t>The simplest would be a threshold function, giving a hard boundary in queue depth
between 0% and 100% SCE marking.  This could be a sensible option for limited
hardware implementations.  The threshold should be set below the point at which
a growing queue might trigger CE marking or packet drops.</t>
<t>Another option would be to implement a second marking probability function,
occupying a queue-depth space just below that occupied by the main marking
probability function.  This should be arranged so that high marking rates
(ideally 100%) are achieved at or before the point at which CE marking or
packet drops begin.</t>
<t>For PIE specifically, a second marking probability function could be added with
the same parameters as the main marking probability function, except for a lower
QDELAY_REF value.  This would result in the SCE marking probability remaining
strictly higher than the CE marking probability for ECT flows.</t>
</section>

<section anchor="simple-two-queue-middleboxes"><name>Simple Two-Queue Middleboxes</name>
<t>In high-capacity or resource constrained SCE marking middleboxes, DSCP may be
used to select one of two queues, in lieu of implementing fairness steering.
Packets marked with an SCE DSCP are placed in an SCE queue, where an AQM
instance may mark congestion with either SCE or CE.  Packets not marked with an
SCE DSCP are placed in a second <xref target="RFC3168"></xref> queue, whose AQM instance may only
mark congestion with CE.  For approximate flow fairness, the queues may be
scheduled in proportion to the number of flows they contain.</t>
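<t>The queue selection and scheduling described above might be sketched as
follows.  The DSCP values are the illustrative experimental-pool values from
<xref target="codepoints-private"></xref>, and the function names, queue
labels, and the even split used when no flows are present are assumptions made
for this example.</t>
<sourcecode type="python"><![CDATA[
```python
# Illustrative SCE DSCP values (assumption: the guidance values 3, 7 and 11).
SCE_DSCPS = frozenset({3, 7, 11})


def dscp_of(tos_byte):
    """Return the DSCP: the upper six bits of the (former) TOS byte."""
    return tos_byte >> 2


def select_queue(dscp):
    """Place packets with an SCE DSCP in the SCE queue, all others in RFC 3168."""
    return "sce" if dscp in SCE_DSCPS else "rfc3168"


def allowed_marks(queue):
    """The SCE queue's AQM may mark SCE or CE; the other queue only CE."""
    return {"SCE", "CE"} if queue == "sce" else {"CE"}


def scheduling_weights(sce_flows, rfc3168_flows):
    """Schedule the two queues in proportion to the flows they contain."""
    total = sce_flows + rfc3168_flows
    if total == 0:
        return (0.5, 0.5)  # assumption: split evenly when both are idle
    return (sce_flows / total, rfc3168_flows / total)
```
]]></sourcecode>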
<t>Note that as long as the SCE DSCP remains intact from the sending endpoint to
the marking queue, the SCE queue may be used.  If it has been erased or altered
to a non-SCE DSCP, the packet will be placed in the <xref target="RFC3168"></xref> queue, and may
still benefit from standard ECN.</t>
<t>If this middlebox is to be used in public environments, some form of
unresponsive flow mitigation is warranted to ensure that flows haven't indicated
their support for either SCE or <xref target="RFC3168"></xref> ECN incorrectly.  If flows do not
respond to the signals they advertise support for, they will dominate competing
traffic in the same queue.</t>
</section>

<section anchor="tcp"><name>TCP</name>
<t>The proposed mechanism for TCP to feed back SCE signals to the sender is
outlined in <xref target="I-D.grimes-tcpm-tcpsce"></xref>.  Use is made of the redundant NS bit in
the TCP header, which was formerly associated with ECT(1) in the Nonce Sum
specification.</t>
<t>The recommended response to each single segment marked with SCE is to reduce
cwnd by an amortised 1/sqrt(cwnd) segments.  Other responses, such as the 1/cwnd
from DCTCP, are also acceptable but may perform less well.</t>
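<t>As a rough illustration of the two responses mentioned above, the following
sketch applies the per-segment reduction to a congestion window measured in
(fractional) segments.  The two-segment floor and the function names are
assumptions for the example and are not taken from the referenced draft.</t>
<sourcecode type="python"><![CDATA[
```python
import math

MIN_CWND = 2.0  # assumption: illustrative floor, in segments


def on_sce_sqrt(cwnd):
    """Recommended response: reduce cwnd by 1/sqrt(cwnd) segments per SCE mark."""
    return max(cwnd - 1.0 / math.sqrt(cwnd), MIN_CWND)


def on_sce_dctcp_style(cwnd):
    """Alternative response: reduce cwnd by 1/cwnd segments per SCE mark."""
    return max(cwnd - 1.0 / cwnd, MIN_CWND)
```
]]></sourcecode>
<t>At a window of 100 segments, one SCE-marked segment reduces the window by
0.1 segment under the first response and by 0.01 segment under the second.</t>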
</section>

<section anchor="other"><name>Other</name>
<t>New transports under development, such as QUIC, may implement a fine-grained
signal back to the sender based on SCE.  QUIC itself appears to have this sort
of feedback already (counting ECT(0), ECT(1) and CE packets received), and the
data should be made available for congestion control.</t>
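<t>A hedged sketch of how such counts could be turned into a congestion signal
follows.  It assumes the sender sees cumulative counts of ECT(0), ECT(1) and CE
packets and derives the proportion of SCE marks between two reports; the
function name and the tuple layout are hypothetical.</t>
<sourcecode type="python"><![CDATA[
```python
def sce_fraction(prev, curr):
    """Return (SCE share, new CE count) between two cumulative reports.

    prev and curr are (ect0, ect1, ce) tuples of cumulative packet counts,
    assumed to be non-decreasing.  The SCE share is the proportion of newly
    reported ECT(1) packets among newly reported ECT(0) and ECT(1) packets.
    """
    d_ect0 = curr[0] - prev[0]
    d_sce = curr[1] - prev[1]
    d_ce = curr[2] - prev[2]
    seen = d_ect0 + d_sce
    share = d_sce / seen if seen > 0 else 0.0
    return share, d_ce
```
]]></sourcecode>
<t>A higher share indicates more congestion, and any newly reported CE marks
would still call for the conventional response.</t>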
</section>
</section>

<section anchor="compatibility"><name>Compatibility</name>

<section anchor="existing-ecn-aqm-deployments"><name>Existing ECN &amp; AQM Deployments</name>
<t>SCE explicitly retains <xref target="RFC8511"></xref> compliant Multiplicative Decrease responses to
CE marks, and conventional Multiplicative Decrease responses to packet loss.
SCE senders' behaviour is thus naturally compliant with existing specifications
when running over existing networks.</t>
<t>Existing endpoints, supporting Not-ECT or <xref target="RFC3168"></xref> compliant congestion
control, are required to treat SCE marks (that is, ECT(1)) as identical to
ECT(0), and will thus transparently ignore SCE marks.  This is allowed for in
SCE's design, and allows SCE middleboxes to be deployed into a heterogeneous
network.</t>
<t>Hence the incremental deployability of SCE endpoints and middleboxes is good.</t>
</section>

<section anchor="l4s"><name>L4S</name>
<t>L4S <xref target="I-D.ietf-tsvwg-l4s-arch"></xref> also claims the ECT(1) codepoint, with
significantly different semantic meaning than SCE, so a discussion around the
potential for L4S and SCE compatibility is warranted.  In the L4S system, ECT(1)
is used to identify L4S flows, to distinguish them from <xref target="RFC3168"></xref> flows.
This is necessary since in L4S, the semantic meaning of CE marks is also changed.</t>
<t>Since L4S connections are explicitly negotiated through support of AccECN, and
AccECN doesn't support SCE, there is no ambiguity regarding the mode of the
connection as far as endpoints are concerned.</t>
<t>SCE middleboxes will treat L4S flows in the same way as <xref target="RFC3168"></xref> does.
However, because SCE middleboxes are likely to upgrade ECT(1) marked packets to
CE at a higher threshold than L4S middleboxes would, L4S flows will outcompete
non-L4S flows in a single SCE-aware queue.  This is the same known safety
concern that L4S deployment raises with regard to existing <xref target="RFC3168"></xref> queues, resulting
from the redefinition of CE in L4S.  Fairness steering in SCE middleboxes could
mitigate this.</t>
<t>L4S middleboxes may interpret ECT packets which have received SCE markings at
another SCE-aware middlebox as though they were L4S traffic.  This may result
in a higher CE marking rate and/or different queuing behaviour.  It may also
result in the reordering of packets for both SCE-aware and non-SCE-aware flows through
L4S middleboxes, as packets marked ECT(1) will on average traverse the
bottleneck with lower delay than packets not marked ECT(1).  Although
<xref target="I-D.ietf-tcpm-rack"></xref> could mitigate this reordering, it may still lead to reduced throughput
and head-of-line blocking for flows that traverse both SCE and L4S bottlenecks.</t>
<t>There are at least two secondary concerns brought about by the L4S use of ECT(1)
as a traffic identifier:</t>

<ul>
<li>If it is found necessary to firewall L4S traffic off from the general
Internet, then SCE-marked packets are also likely to be dropped at this
boundary.  This could have a significant detrimental effect on ECT traffic
traversing both an SCE-enabled and an L4S-enabled network, even if the endpoints
are not explicitly SCE-aware.</li>
<li>If it is found necessary to bleach ECT(1) in order to disable L4S in a
network, this would erase SCE signals sent to endpoints.  Although not ideal,
SCE transports would still safely fall back to relying on CE for congestion
notification.</li>
</ul>
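<t>The fallback described in the second item above can be sketched informally
as follows.  The sketch assumes that bleaching rewrites ECT(1) to ECT(0) rather
than to Not-ECT, which is only one possible behaviour; the function name
<tt>bleach_ect1</tt> is hypothetical.</t>
<sourcecode type="python"><![CDATA[
ECN_MASK = 0b11
ECT_0 = 0b10
ECT_1 = 0b01  # carries the SCE signal under this proposal
CE = 0b11


def bleach_ect1(tos: int) -> int:
    """Hypothetical bleaching step: rewrite ECT(1) to ECT(0).

    The DSCP bits and every other ECN codepoint, including CE, are
    left unchanged.
    """
    if (tos & ECN_MASK) == ECT_1:
        return (tos & 0xFC) | ECT_0
    return tos


# An SCE mark is erased, so the receiver sees plain ECT(0) ...
assert (bleach_ect1(0x01) & ECN_MASK) == ECT_0
# ... but a CE mark survives, so CE-based congestion control still works.
assert (bleach_ect1(0x03) & ECN_MASK) == CE
]]></sourcecode>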
<t>Lastly, an ambiguous definition of ECT(1) complicates network debugging with
packet captures, since it would be unclear whether a packet was marked ECT(1)
due to congestion at an SCE bottleneck, or because it belongs to an L4S flow.  Although
examination of other packets in the flow could reduce this ambiguity,
reliance on observing flow state is generally discouraged for debugging
purposes.</t>
<t>Thus far, the working group is operating under the assumption that coexistence
of SCE and L4S is not an option.</t>
</section>
</section>

<section anchor="ongoing-research-and-development"><name>Ongoing Research and Development</name>
<t>The SCE proposal is a work in progress, with ongoing or planned work in at
least the following areas:</t>

<ul>
<li>AQM strategies for a small number of FIFO queues</li>
<li>Tunnel traversal, with possible updates to <xref target="RFC3168"></xref> and <xref target="RFC6040"></xref></li>
<li>Research into ways of reducing RTT dependence (Prague requirement #5)</li>
<li>Performance in environments with jitter and burstiness</li>
<li>New testing tools that cover many short flows, and VBR UDP flows</li>
<li>Testing, with guidance from <xref target="RFC2914"></xref>, <xref target="RFC7141"></xref> and <xref target="RFC5033"></xref></li>
</ul>
</section>

<section anchor="related-work"><name>Related Work</name>
<t><xref target="RFC8087"></xref> <xref target="RFC7567"></xref> <xref target="RFC7928"></xref> <xref target="RFC8290"></xref> <xref target="RFC8289"></xref> <xref target="RFC8033"></xref> <xref target="RFC8034"></xref>
<xref target="I-D.morton-tsvwg-interflow-intraflow-delays"></xref></t>
</section>

<section anchor="iana-considerations"><name>IANA Considerations</name>
<t>There are no IANA considerations.</t>
</section>

<section anchor="security-considerations"><name>Security Considerations</name>
<t>An adversary could inappropriately set SCE marks at middleboxes it controls to
slow down SCE-aware flows, eventually driving them to a minimum congestion window.
However, the same threat already exists with respect to inappropriately setting
CE marks on normal ECN flows, and this would have a greater impact per mark.
Therefore no new threat is exposed by SCE in practice.</t>
<t>An adversary could also simply ignore SCE marks at the receiver, or ignore SCE
information fed back from the receiver to the sender, in an attempt to gain some
advantage in throughput.  Again, the same could be said about ignoring CE marks,
so no truly new threat is exposed.  Additionally, correctly implemented SCE
detection may actually improve long-term goodput compared to ignoring SCE.</t>
<t>An adversary could erase congestion information by converting SCE marks to ECT
or Not-ECT codepoints, thus hiding it from the receiver.  This has an effect
equivalent to ignoring SCE signals at the receiver.  An identical threat already
exists for erasing congestion information from CE-marked packets, and may be
mitigated by AQMs switching to dropping packets from flows observed to be
non-responsive to CE.</t>
<t>An adversary could drop SCE-marked packets, believing them to be bogons (see
also <xref target="l4s"></xref>).  Endpoints should be able to recover from this
through retransmission and a reduction of cwnd.  However, it is possible for
this to lead to a significant denial of service.  A workaround is to disable ECN
for connections over the affected path.</t>
</section>

<section anchor="acknowledgements"><name>Acknowledgements</name>
<t>Thanks to Dave Taht for his contributions to the SCE effort, and for his work on
writing the original draft-morton-taht-sce-00, which was submitted for IETF 104 and
on which this draft is based.</t>
<t>Many thanks to John Gilmore, the members of the ecn-sane project and the
cake@lists.bufferbloat.net mailing list, and the former IETF AQM working group.</t>
</section>

</middle>

<back>
<references><name>Normative References</name>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8311.xml"/>
</references>
<references><name>Informative References</name>
<reference anchor="AFD" target="https://dl.acm.org/doi/10.1145/956981.956985">
  <front>
    <title>Approximate fairness through differential dropping</title>
    <author initials="R." surname="Pan"></author>
    <author initials="L." surname="Breslau"></author>
    <author initials="B." surname="Prabhakar"></author>
    <author initials="S." surname="Shenker"></author>
    <date year="2003" month="4"></date>
  </front>
  <seriesInfo name="in" value="ACM SIGCOMM Computer Communication Review"></seriesInfo>
</reference>
<reference anchor="BBR-CUBIC" target="https://www.uio.no/studier/emner/matnat/ifi/INF5072/v18/inf5072_example1.pdf">
  <front>
    <title>Comparing BBR and CUBIC Congestion Controls</title>
    <author initials="R.J." surname="Borgli"></author>
    <author initials="J." surname="Misund"></author>
    <date year="2018"></date>
  </front>
  <seriesInfo name="in" value="University of Oslo, INF5072"></seriesInfo>
</reference>
<reference anchor="BLUE" target="http://www.eecs.umich.edu/techreports/cse/99/CSE-TR-387-99.pdf">
  <front>
    <title>BLUE: A New Class of Active Queue Management Algorithms</title>
    <author initials="W." surname="Feng"></author>
    <author initials="D.D." surname="Kandlur"></author>
    <author initials="D." surname="Saha"></author>
    <author initials="K.G." surname="Shin"></author>
    <date year="1999" month="4"></date>
  </front>
  <seriesInfo name="in" value="Computer Science Technical Report"></seriesInfo>
</reference>
<reference anchor="CC-COMPAT" target="http://ppv.elte.hu/scalable-cc-comp">
  <front>
    <title>Compatibility of Scalable Congestion Controls</title>
    <author initials="F." surname="Fejes"></author>
    <author initials="G." surname="Gombos"></author>
    <author initials="S." surname="Laki"></author>
    <author initials="S." surname="Nadas"></author>
    <date year="2020"></date>
  </front>
  <seriesInfo name="in" value="Second Workshop on the Future of Internet Transport - FIT 2020, Paris, France (Virtual)"></seriesInfo>
</reference>
<reference anchor="CC-REVOLUTION" target="http://ppv.elte.hu/buffer-sizing">
  <front>
    <title>Who will Save the Internet from the Congestion Control Revolution?</title>
    <author initials="F." surname="Fejes"></author>
    <author initials="G." surname="Gombos"></author>
    <author initials="S." surname="Laki"></author>
    <author initials="S." surname="Nadas"></author>
    <date year="2019"></date>
  </front>
  <seriesInfo name="in" value="Workshop on Buffer Sizing, Stanford University"></seriesInfo>
</reference>
<reference anchor="COBALT" target="https://ieeexplore.ieee.org/abstract/document/8847054">
  <front>
    <title>Design and Evaluation of COBALT Queue Discipline</title>
    <author initials="J." surname="Palmei"></author>
    <author initials="S." surname="Gupta"></author>
    <author initials="P." surname="Imputato"></author>
    <author initials="J." surname="Morton"></author>
    <author initials="M.P." surname="Tahiliani"></author>
    <author initials="S." surname="Avallone"></author>
    <author initials="D." surname="Taht"></author>
    <date year="2019" month="9"></date>
  </front>
  <seriesInfo name="in" value="2019 IEEE International Symposium on Local and Metropolitan Area Networks (LANMAN)"></seriesInfo>
</reference>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml-ids/reference.I-D.grimes-tcpm-tcpsce.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml-ids/reference.I-D.ietf-tcpm-rack.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml-ids/reference.I-D.ietf-tsvwg-l4s-arch.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml-ids/reference.I-D.morton-tsvwg-cheap-nasty-queueing.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml-ids/reference.I-D.morton-tsvwg-codel-approx-fair.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml-ids/reference.I-D.morton-tsvwg-interflow-intraflow-delays.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml-ids/reference.I-D.morton-tsvwg-lightweight-fair-queueing.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2309.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2474.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.2914.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.3168.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.5033.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.6040.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7141.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7567.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.7928.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8033.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8034.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8087.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8289.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8290.xml"/>
<xi:include href="https://xml2rfc.ietf.org/public/rfc/bibxml/reference.RFC.8511.xml"/>
</references>

</back>

</rfc>
