<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="2"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="exp" docName="draft-ietf-lsr-isis-fast-flooding-04" ipr="trust200902">
	<front>
		<title abbrev="IS-IS Fast Flooding">IS-IS Fast Flooding</title>
		<author fullname="Bruno Decraene" initials="B." surname="Decraene">
			<organization>Orange</organization>
			<address>
				<email>bruno.decraene@orange.com</email>
			</address>
		</author>

		<author fullname="Les Ginsberg" initials="L" surname="Ginsberg">
			<organization>Cisco Systems</organization>

			<address>
				<postal>
					<street>821 Alder Drive</street>

					<city>Milpitas</city>

					<code>95035</code>

					<region>CA</region>

					<country>USA</country>
				</postal>


				<email>ginsberg@cisco.com</email>
			</address>
		</author>

		<author fullname="Tony Li" initials="T." surname="Li">
			<organization>Juniper Networks, Inc.</organization>
			<address>
				<phone/>
				<email>tony.li@tony.li</email>
			</address>
		</author>

		<author fullname="Guillaume Solignac" initials="G." surname="Solignac">

			<address>
				<email>gsoligna@protonmail.com</email>
			</address>
		</author>

		<author fullname="Marek Karasek" initials="M" surname="Karasek">
			<organization>Cisco Systems</organization>

			<address>
				<postal>
					<street>Pujmanove 1753/10a, Prague 4 - Nusle</street>

					<city>Prague</city>

					<region/>

					<code>10 14000</code>

					<country>Czech Republic</country>
				</postal>

				<phone/>

				<facsimile/>

				<email>mkarasek@cisco.com</email>

				<uri/>
			</address>
		</author>

		<author initials="C." surname="Bowers" fullname="Chris Bowers">
			<organization>Juniper Networks, Inc.</organization>
			<address>
				<postal>
					<street>1194 N. Mathilda Avenue</street>
					<city>Sunnyvale</city>
					<region>CA</region>
					<code>94089</code>
					<country>USA</country>
				</postal>
				<email>cbowers@juniper.net</email>
			</address>
		</author>

		<author initials="G." surname="Van de Velde" fullname="Gunter Van de Velde">
			<organization>Nokia</organization>
			<address>
				<postal>
					<street>Copernicuslaan 50</street>
					<city>Antwerp</city>
					<code>2018</code>
					<country>Belgium</country>
				</postal>
				<email>gunter.van_de_velde@nokia.com</email>
			</address>
		</author>

		<author fullname="Peter Psenak" initials="P" surname="Psenak">
			<organization>Cisco Systems</organization>

			<address>
				<postal>
					<street>Apollo Business Center Mlynske nivy 43</street>

					<city>Bratislava</city>

					<code>821 09</code>

					<country>Slovakia</country>
				</postal>

				<email>ppsenak@cisco.com</email>
			</address>
		</author>


		<author fullname="Tony Przygienda" initials="T" surname="Przygienda">
			<organization>Juniper</organization>

			<address>
				<postal>
					<street>1137 Innovation Way</street>

					<city>Sunnyvale</city>

					<region>CA</region>

					<code/>

					<country>USA</country>
				</postal>

				<phone/>

				<facsimile/>

				<email>prz@juniper.net</email>

				<uri/>
			</address>
		</author>






		<date year="2023"/>
		<abstract>
		  <t>
		    Current Link State Protocol Data Unit (PDU)
		    flooding rates are much slower than what modern
		    networks can support.  The use of IS-IS at larger
		    scale requires faster flooding rates to achieve
		    desired convergence goals.  This document
		    discusses the need for faster flooding, the issues
		    around faster flooding, and some example
		    approaches to achieve faster flooding. It also
		    defines protocol extensions relevant to faster
		    flooding.
		  </t>
		</abstract>
	</front>
	<middle>

		<section title="Introduction">
			<t>Link state IGPs such as Intermediate-System-to-Intermediate-System
      (IS-IS) depend upon having consistent Link State Databases (LSDB) on all
      Intermediate Systems (ISs) in the network in order to provide correct
      forwarding of data packets. When topology changes occur, new/updated
      Link State PDUs (LSPs) are propagated network-wide. The speed of
      propagation is a key contributor to convergence time.</t>

			<t>Historically, flooding rates have been conservative, on the order of
      tens of LSPs per second. This is the result of guidance in the base
      specification <xref target="ISO10589"/> and of early deployments, when
      both CPU speeds and interface speeds were much slower and the scale of
      an area was much smaller than it is today.</t>

			<t>As IS-IS is deployed in greater scale both in the number of nodes in an
      area and in the number of neighbors per node, the impact of the historic
      flooding rates becomes more significant. Consider the bringup or failure
      of a node with 1000 neighbors. This will result in a minimum of 1000 LSP
      updates. At typical LSP flooding rates used today
      (33 LSPs/second), it would take 30+ seconds simply to send the updated
      LSPs to a given neighbor. Depending on the diameter of the network,
      achieving a consistent LSDB on all nodes in the network could easily
      take a minute or more.</t>

			<t>Increasing the LSP flooding rate therefore becomes an essential element
      of supporting greater network scale.</t>

			<t>Improving the LSP flooding rate is complementary to protocol
	extensions that reduce LSP flooding traffic by reducing the
	flooding topology, such as Mesh Groups <xref target="RFC2973"/>
	or Dynamic Flooding <xref target="I-D.ietf-lsr-dynamic-flooding"/>. Reduction of the
	flooding topology does not alter the number of LSPs required
	to be exchanged between two nodes, so increasing the overall
	flooding speed is still beneficial when such extensions are in
	use. It is also possible that the flooding topology can be
	reduced in ways that prefer the use of neighbors that support
	improved flooding performance.</t>

	</section>

	<section anchor="Language" title="Requirements Language">
          <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
          NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
          "MAY", and "OPTIONAL" in this document are to be interpreted as
          described in BCP 14 <xref target="RFC2119"/> <xref
          target="RFC8174"/> when, and only when, they appear in all capitals,
          as shown here.</t>
	</section>

	<section anchor="HISTORY" title="Historical Behavior">
		<t>The base specification for IS-IS <xref target="ISO10589"/> was first
      published in 1992 and updated in 2002. The update made no changes
      with regard to suggested timer values. Convergence targets at the time were
      on the order of seconds and the specified timer values reflect that.
      Here are some examples:</t>

		<t>
			<figure>
				<artwork><![CDATA[minimumLSPGenerationInterval - This is the minimum time interval
     between generation of Link State PDUs. A source Intermediate 
     system shall wait at least this long before re-generating one
     of its own Link State PDUs.]]></artwork>
			</figure>
		</t>
		<t>
	The recommended value is 30 seconds.
		</t>
		<t>
			<figure>
				<artwork><![CDATA[minimumLSPTransmissionInterval - This is the amount of time an 
     Intermediate system shall wait before further propagating 
     another Link State PDU from the same source system.]]></artwork>
			</figure>
		</t>
		<t>
	The recommended value is 5 seconds.
		</t>
		<t>
			<figure>
				<artwork><![CDATA[partialSNPInterval - This is the amount of time between periodic 
     action for transmission of Partial Sequence Number PDUs.
     It shall be less than minimumLSPTransmissionInterval.]]></artwork>
			</figure>
		</t>
		<t>
	The recommended value is 2 seconds.
		</t>
		<t>Most relevant to a discussion of the LSP flooding rate is the recommended
      interval between the transmission of two different LSPs on a given
      interface.</t>

		<t>For broadcast interfaces, <xref target="ISO10589"/>
 defined:</t>

		<t>
			<figure>
				<artwork><![CDATA[  minimumBroadcastLSPTransmissionInterval - the minimum interval
     between PDU arrivals which can be processed by the slowest 
     Intermediate System on the LAN.]]></artwork>
			</figure>
		</t>

		<t>
	  The default value was defined as 33 milliseconds.
	  Sending multiple LSPs "back-to-back" as a burst was
	  permitted, but this was limited to 10 LSPs in a one-second
	  period.
		</t>
		<t>
	  Although this value was specific to LAN interfaces, implementations
      have commonly applied it to all interfaces, though that was not
      the original intent of the base specification. In fact, Section
      12.1.2.4.3 states:</t>

		<t>
			<figure>
				<artwork><![CDATA[  On point-to-point links the peak rate of arrival is limited only 
  by the speed of the data link and the other traffic flowing on 
  that link.]]></artwork>
			</figure>
		</t>

		<t>Although modern implementations have not strictly adhered to the 33
      millisecond interval, it is commonplace for implementations to limit
      the flooding rate to an order of magnitude similar to the 33 ms value.</t>

		<t>In the past 20 years, significant work on achieving faster
      convergence - more specifically sub-second convergence - has resulted in
      implementations modifying a number of the above timers in order to
      support faster signaling of topology changes. For example,
      minimumLSPGenerationInterval has been modified to support millisecond
      intervals, often with a backoff algorithm applied to prevent LSP
      generation storms in the event of a series of rapid oscillations.</t>

		<t>However, the flooding rate has not been fundamentally altered.</t>
	</section>





	<section anchor="FloodingTLV" title="Flooding Parameters TLV">
		<t>
		    This document defines a new Type-Length-Value
		    tuple (TLV) called the "Flooding Parameters TLV"
		    that may be included in IS to IS Hellos (IIH) or
		    Partial Sequence Number PDUs (PSNPs). It allows
		    IS-IS implementations to advertise flooding
		    related parameters and capabilities which may be
		    of use to the peer in support of faster flooding.
		</t>
		<t>Type: 21</t>
		<t>Length: variable, the size in octets of the Value field</t>

		<t>Value: One or more sub-TLVs</t>
		<t>Several sub-TLVs are defined in this document. The support of any sub-TLV is OPTIONAL.</t>

		<t>
			For a given IS-IS adjacency, the Flooding
			Parameters TLV does not need to be advertised
			in each IIH or PSNP.  An IS uses the latest
			received value for each parameter until a new
			value is advertised by the peer.  However, as
			IIHs and PSNPs are not reliably exchanged, and
			may never be received, parameters SHOULD be
			sent even if there is no change in value since
			the last transmission.  For a parameter which
			has never been advertised, an IS SHOULD use
			its local default value.  That value SHOULD be
			configurable on a per node basis and MAY be
			configurable on a per interface basis.
		</t>
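		<t>
			As a non-normative illustration, the following Python sketch encodes and
			decodes a Flooding Parameters TLV carrying the fixed-length sub-TLVs
			defined in the following subsections. It assumes the usual IS-IS
			one-octet type and one-octet length fields, and it assumes that values
			are unsigned integers in network byte order; the function names and the
			choice to silently skip unknown or badly sized sub-TLVs are hypothetical
			choices of this sketch, not requirements of this document.
		</t>
		<figure>
			<artwork><![CDATA[
```python
import struct

FLOODING_PARAMETERS_TLV = 21  # TLV type given in this document

# Sub-TLV type -> struct format of its value (widths from this document).
# Unsigned, network-byte-order integers are an assumption of this sketch.
SUB_TLV_FORMATS = {
    1: "!I",  # LSP Burst Window, 4 octets
    2: "!I",  # LSP Transmission Interval, 4 octets
    3: "!H",  # LSPs Per PSNP, 2 octets
    5: "!H",  # Partial SNP Interval, 2 octets
    6: "!H",  # Receive Window, 2 octets
}


def encode_sub_tlv(sub_type, value):
    body = struct.pack(SUB_TLV_FORMATS[sub_type], value)
    return bytes([sub_type, len(body)]) + body


def encode_flooding_parameters(params):
    """Encode a {sub_tlv_type: value} mapping as one TLV."""
    value = b"".join(encode_sub_tlv(t, v) for t, v in sorted(params.items()))
    if len(value) > 255:
        raise ValueError("TLV value exceeds 255 octets")
    return bytes([FLOODING_PARAMETERS_TLV, len(value)]) + value


def decode_flooding_parameters(data):
    """Decode one TLV back into a {sub_tlv_type: value} mapping."""
    if (len(data) < 2 or data[0] != FLOODING_PARAMETERS_TLV
            or data[1] != len(data) - 2):
        raise ValueError("not a well-formed Flooding Parameters TLV")
    params, offset = {}, 2
    while offset + 2 <= len(data):
        sub_type, sub_len = data[offset], data[offset + 1]
        body = data[offset + 2 : offset + 2 + sub_len]
        fmt = SUB_TLV_FORMATS.get(sub_type)
        # Unknown types and unexpected lengths are skipped, not rejected.
        if fmt is not None and len(body) == sub_len == struct.calcsize(fmt):
            params[sub_type] = struct.unpack(fmt, body)[0]
        offset += 2 + sub_len
    return params
```
]]></artwork>
		</figure>
		<t>
			The variable-length Flags sub-TLV is intentionally left out of the table
			above, so this decoder skips it.
		</t>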
		<section anchor="LSPReceptionWindow" title="LSP Burst Window sub-TLV">
			<t>The LSP Burst Window sub-TLV advertises the maximum number of LSPs that the node can receive with no separation interval between LSPs.</t>
			<t>Type: 1</t>
			<t>Length: 4 octets</t>
			<t>Value: number of LSPs that can be sent back to back.</t>
		</section>
		<section anchor="InterfaceLSPTransmissionInterval" title="LSP Transmission Interval sub-TLV">
			<t>The LSP Transmission Interval sub-TLV advertises the minimum interval, in microseconds, between LSP arrivals which can be received on this interface, after the maximum number of unacknowledged LSPs has been sent.</t>
			<t>Type: 2</t>
			<t>Length: 4 octets</t>
			<t>Value: minimum interval, in microseconds, between two consecutive LSPs sent after the burst window has been used</t>
			<t>The LSP Transmission Interval is an advertisement of the receiver's steady-state LSP reception rate.</t>

		</section>
		<section anchor="LPP" title="LSPs Per PSNP sub-TLV">
			<t>The LSPs Per PSNP (LPP) sub-TLV advertises the number of received LSPs that triggers the immediate sending of a PSNP to acknowledge them.</t>
			<t>Type: 3</t>
			<t>Length: 2 octets</t>
			<t>Value: number of LSPs acknowledged per PSNP</t>
			<t>A node advertising this sub-TLV with a value LPP MUST send a PSNP once LPP LSPs have been received and need to be acknowledged.</t>
		</section>
		<section anchor="Flags" title="Flags sub-TLV">
			<t>The Flags sub-TLV advertises a set of flags.</t>
			<t>Type: 4</t>
			<t>Length: Indicates the length in octets (1-8) of the Value field. The length SHOULD be the minimum required to send all bits that are set.</t>
			<t>Value: List of flags.</t>
			<t>
				<figure>
					<artwork align="left">
          0 1 2 3 4 5 6 7 ...
         +-+-+-+-+-+-+-+-+...
         |O|              ...
         +-+-+-+-+-+-+-+-+...</artwork>
				</figure>
			</t>
			<t>When the O-flag (Ordered acknowledgement) is set, LSPs will be
			acknowledged in the order in which they are received: a
			PSNP acknowledging N LSPs is acknowledging the
			N oldest LSPs received. The order inside the
			PSNP is meaningless. If the sender keeps track
			of the order of LSPs sent, this indication
			allows a fast detection of the loss of an
			LSP. This MUST NOT be used to trigger faster
			retransmission of LSPs. This MAY be used to
			trigger a congestion signal.</t>
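			<t>
				The bit layout and the minimum-length rule can be sketched as
				follows. This is a non-normative Python illustration: the function
				names are hypothetical, and two behaviors are assumptions of the
				sketch rather than statements of this document, namely sending a
				single zero octet when no flag is set and treating bits beyond the
				received length as clear.
			</t>
			<figure>
				<artwork><![CDATA[
```python
O_FLAG_BIT = 0  # bit 0 is the leftmost bit of the first octet


def encode_flags(set_bits):
    """Encode flag bit positions into the shortest Value field."""
    if not set_bits:
        return b"\x00"  # assumption: one zero octet when no flag is set
    if min(set_bits) < 0 or max(set_bits) > 63:
        raise ValueError("flag bits must fit in 1 to 8 octets")
    value = bytearray(max(set_bits) // 8 + 1)
    for bit in set_bits:
        value[bit // 8] |= 0x80 >> (bit % 8)
    return bytes(value)


def flag_is_set(value, bit):
    """Test one flag; bits past the received length count as clear."""
    octet = bit // 8
    return octet < len(value) and bool(value[octet] & (0x80 >> (bit % 8)))
```
]]></artwork>
			</figure>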
		</section>

      <section anchor="partialSNPI" title="Partial SNP Interval sub-TLV">
        <t>The Partial SNP Interval sub-TLV advertises the amount of
	time in milliseconds between periodic action for transmission of Partial
        Sequence Number PDUs. This time will trigger the sending of a PSNP
        even if the number of unacknowledged LSPs received on a given
        interface does not exceed LPP (<xref target="LPP"/>). The time is
	measured from the reception of the first unacknowledged LSP.</t>

        <t>Type: 5</t>

        <t>Length: 2 octets</t>

        <t>Value: partialSNPInterval in milliseconds</t>

        <t>A node advertising this sub-TLV SHOULD send a PSNP at least once
        per Partial SNP Interval if one or more unacknowledged LSPs have been
        received on a given interface.</t>
      </section>

		<section anchor="RWIN" title="Receive Window sub-TLV">
			<t>The Receive Window (RWIN) sub-TLV advertises the maximum number of unacknowledged LSPs that the node can receive.</t>
			<t>Type: 6</t>
			<t>Length: 2 octets</t>
			<t>Value: maximum number of unacknowledged LSPs</t>			
		</section>


		<section anchor="TLVoperationLAN" title="Operation on a LAN interface">
			<t>On a LAN interface, all LSPs are link-level multicasts. Each LSP sent will be received by all ISs on the LAN and each IS will receive LSPs from all transmitters. In this section, we clarify how the flooding parameters should be interpreted in the context of a LAN.</t>
			<t>An LSP receiver on a LAN will communicate its desired flooding parameters using a single Flooding Parameters TLV, copies of which will be received by all transmitters. The flooding parameters sent by the LSP receiver MUST be understood as instructions from the receiver to each transmitter about the desired maximum transmit characteristics of each transmitter. The receiver is aware that there are multiple transmitters that can send LSPs to the receiver LAN interface. The receiver might want to take that into account by advertising more conservative values, e.g., a higher LSP Transmission Interval. When the transmitters receive the LSP Transmission Interval value advertised by an LSP receiver, the transmitters should rate limit LSPs according to the advertised flooding parameters. They should not apply any further interpretation to the flooding parameters advertised by the receiver.</t>
			<t>A given LSP transmitter will receive multiple flooding parameter advertisements from different receivers that may carry different flooding parameter values. A given transmitter SHOULD use the most conservative value on a per parameter basis. For example, if the transmitter receives multiple LSP Burst Window values, it should use the smallest value.</t>
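			<t>
				The per-parameter selection can be sketched as follows. This is a
				non-normative Python illustration; the dictionary keys and the
				function name are hypothetical, and the direction in which each
				parameter is considered "more conservative" is an assumption
				derived from the meaning of each sub-TLV.
			</t>
			<figure>
				<artwork><![CDATA[
```python
# How to pick the most conservative value of each parameter (assumption
# of this sketch, derived from the meaning of each sub-TLV).
MOST_CONSERVATIVE = {
    "burst_window": min,              # fewer back-to-back LSPs
    "transmission_interval_us": max,  # longer gap between LSPs
    "receive_window": min,            # fewer unacknowledged LSPs
}


def combine_lan_parameters(advertisements):
    """Merge the parameters advertised by all receivers on one LAN."""
    combined = {}
    for name, pick in MOST_CONSERVATIVE.items():
        values = [adv[name] for adv in advertisements if name in adv]
        if values:  # parameters nobody advertised are left out
            combined[name] = pick(values)
    return combined
```
]]></artwork>
			</figure>
			<t>
				A parameter that no receiver advertises is absent from the result,
				leaving the transmitter free to apply its local default.
			</t>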
			<t>The Designated Intermediate System (DIS) plays a special role in the operation of flooding on the LAN as it is responsible for responding to PSNPs sent on the LAN circuit which are used to request LSPs that the sender of the PSNP does not have. If the DIS does not support faster flooding this will impact the maximum flooding speed which could occur on a LAN. Use of LAN priority to prefer a node which supports faster flooding in the DIS election may be useful.</t>
			<t>NOTE: The work used to develop the example algorithms discussed later in this document focused on operation over point-to-point interfaces. A full discussion of how best to do faster flooding on a LAN interface is therefore out of scope for this document.</t>
		</section>

	</section>


	<section anchor="Receiver" title="Performance improvement on the receiver">

		<t>This section defines two behaviors that SHOULD be implemented on the receiver.</t>


		<section anchor="LSPACKRate" title="Rate of LSP Acknowledgments">
			<t>On point-to-point networks, PSNP PDUs provide acknowledgments for
        received LSPs. <xref target="ISO10589"/>
 suggests that some delay be
        used when sending PSNPs. This provides some optimization as multiple
        LSPs can be acknowledged in a single PSNP.</t>

			<t>
	  Faster LSP flooding benefits from a faster feedback
          loop. This requires a reduction in the delay in sending
          PSNPs.
			</t>

			<t>The receiver SHOULD reduce its partialSNPInterval. The choice of this lower value is a local choice. It may depend on the available processing power of the node, the number of adjacencies, and the requirement to synchronize the LSDB more quickly. 200 ms seems to be a reasonable value.</t>
			<t>
			  In addition to the timer based
			  partialSNPInterval, the receiver SHOULD keep
			  track of the number of unacknowledged LSPs
			  per circuit and level. When this number
			  exceeds a preset threshold of LSPs Per PSNP
			  (LPP), the receiver SHOULD immediately send
			  a PSNP without waiting for the PSNP timer to
			  expire. In case of a burst of LSPs, this
			  allows for more frequent PSNPs, giving
			  faster feedback to the sender. Outside of
			  the burst case, the usual time-based PSNP
			  approach comes into effect. The LPP SHOULD
			  also be less than or equal to 90 as this is
			  the maximum number of LSPs that can be
			  acknowledged in a PSNP at common MTU sizes,
			  hence waiting longer would not reduce the
			  number of PSNPs sent but would delay the
			  acknowledgements. Based on experimental
			  evidence, 15 unacknowledged LSPs is a good
			  value assuming that the Receive Window is
			  at least 30 and reasonably fast CPUs for
			  both the transmitter and receiver. More
			  frequent PSNPs give the transmitter more
			  feedback on receiver progress, allowing the
			  transmitter to continue transmitting while
			  not burdening the receiver with undue
			  overhead.
			</t>
			<t>By deploying both the time-based and the threshold-based PSNP approaches, the receiver can be adaptive to both LSP bursts and infrequent LSP updates.  </t>
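			<t>
				The combination of the two triggers can be sketched as follows.
				This is a non-normative Python illustration for a single circuit
				and level; the class and method names are hypothetical, the
				defaults reuse the 15 LSPs and 200 ms values suggested above, and
				the sketch sends as soon as LPP LSPs are pending, with the timer
				measured from the first unacknowledged LSP.
			</t>
			<figure>
				<artwork><![CDATA[
```python
class PsnpScheduler:
    """Decide when to send a PSNP for one circuit and level."""

    def __init__(self, lpp=15, partial_snp_interval=0.2):
        self.lpp = lpp
        self.interval = partial_snp_interval  # seconds
        self.unacked = 0
        self.first_unacked_at = None

    def on_lsp_received(self, now):
        """Return True when the LPP threshold calls for an immediate PSNP."""
        if self.unacked == 0:
            self.first_unacked_at = now  # timer starts at the first LSP
        self.unacked += 1
        return self.unacked >= self.lpp

    def timer_expired(self, now):
        """Return True when the time-based trigger calls for a PSNP."""
        return (self.unacked > 0
                and now - self.first_unacked_at >= self.interval)

    def on_psnp_sent(self):
        self.unacked = 0
        self.first_unacked_at = None
```
]]></artwork>
			</figure>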

			<t>As PSNPs also consume link bandwidth, packet queue space, and
        protocol processing time on receipt, the increased sending of PSNPs
        should be taken into account when considering the rate at which LSPs
        can be sent on an interface.</t>
		</section>


		<section anchor="PKTPRI" title="Packet Prioritization on Receive">
			<t>There are three classes of PDUs sent by IS-IS:</t>

			<t>
				<list style="symbols">
					<t>Hellos</t>

					<t>LSPs</t>


					<t>Complete Sequence Number PDUs (CSNPs) and PSNPs</t>
				</list>Implementations today may prioritize the reception of Hellos
        over LSPs and SNPs in order to prevent a burst of LSP updates from
        triggering an adjacency timeout which in turn would require additional
        LSPs to be updated.</t>

			<t>CSNPs and PSNPs serve to trigger or acknowledge the transmission of specified
        LSPs. On a point-to-point link, PSNPs acknowledge the receipt of one
        or more LSPs. 
        For this reason, <xref target="ISO10589"/>
 specifies a delay
        (partialSNPInterval) before sending a PSNP so that the number of PSNPs
        required to be sent is reduced. On receipt of a PSNP, the set of LSPs
        acknowledged by that PSNP can be marked so that they do not need to be
        retransmitted.</t>

			<t>If a PSNP is dropped on reception, 
        the set of LSPs advertised in the PSNP cannot be marked as
        acknowledged and this results in needless retransmissions that will
        further delay transmission of other LSPs that have yet to be
        transmitted. It may also make it more likely that a receiver becomes
        overwhelmed by LSP transmissions.</t>

			<t>It is therefore RECOMMENDED that implementations prioritize the
        receipt of Hellos and then SNPs over LSPs. Implementations MAY also prioritize IS-IS packets over other less critical protocols.</t>
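			<t>
				One possible way to realize this ordering is a priority queue in
				front of the IS-IS receive path. The following Python sketch is
				non-normative; the class name, the PDU type labels, and the choice
				to keep arrival order within a class are hypothetical choices of
				this sketch.
			</t>
			<figure>
				<artwork><![CDATA[
```python
import heapq
import itertools

# Lower number is served first: Hellos, then SNPs, then LSPs.
PRIORITY = {"IIH": 0, "CSNP": 1, "PSNP": 1, "LSP": 2}


class ReceiveQueue:
    """Input queue that serves Hellos, then SNPs, then LSPs."""

    def __init__(self):
        self._heap = []
        self._arrival = itertools.count()  # keeps FIFO order within a class

    def push(self, pdu_type, pdu):
        entry = (PRIORITY[pdu_type], next(self._arrival), pdu_type, pdu)
        heapq.heappush(self._heap, entry)

    def pop(self):
        _, _, pdu_type, pdu = heapq.heappop(self._heap)
        return pdu_type, pdu

    def __len__(self):
        return len(self._heap)
```
]]></artwork>
			</figure>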
		</section>

	</section>

	<section anchor="Control" title="Congestion and Flow Control">

		<section anchor="Overview" title="Overview">
			<t>Ensuring the goodput between two entities is a layer 4 responsibility as per the OSI model. A typical example is the TCP protocol defined in
				<xref target="RFC9293"></xref>, which relies on flow control, congestion control, and reliability mechanisms.
			</t>
			<t>Flow control creates a control loop between a transmitter and a receiver so that the transmitter does not overwhelm the receiver. TCP provides a means for the receiver to govern the amount of data sent by the sender through the use of a sliding window.</t>
			<t>Congestion control creates multiple interacting control loops between multiple transmitters and multiple receivers to prevent the transmitters from overwhelming the overall network. For an IS-IS adjacency, the network between two IS-IS neighbors is relatively limited in scope and consists of a link that is typically over-sized compared to the capability of the IS-IS speakers, but may also include components inside both routers such as a switching fabric, line card CPU, and forwarding plane buffers that may experience congestion. These resources may be shared across multiple IS-IS adjacencies for the system and it is the responsibility of congestion control to ensure that these are shared reasonably.</t>
			<t>Reliability provides loss detection and recovery. IS-IS already has mechanisms to ensure the reliable transmission of LSPs. This is not changed by this document.</t>

			<t>The following two sections provide examples of flow and/or congestion control algorithms that may be implemented by taking advantage of the extensions defined in this document. They are non-normative. The IS-IS extensions defined in <xref target="FloodingTLV"/> and <xref target="Receiver"/> are generic and are designed to support different sender-side algorithms. A sender can unilaterally choose a different algorithm to use.</t>
		</section>

		<section anchor="ControlExample1" title="Congestion and Flow Control algorithm 1">


			<section anchor="FlowControl" title="Flow control">
				<t>
		    A flow control mechanism creates a control loop
		    between a single instance of a transmitter and a
		    single receiver. This section uses a mechanism
		    similar to the TCP receive window to allow the
		    receiver to govern the amount of data sent by the
		    sender. This receive window ('rwin') indicates an
		    allowed number of LSPs that the sender may
		    transmit before waiting for an acknowledgment. The
		    size of the receive window, in units of LSPs, is
		    initialized with the value advertised by the
		    receiver in the Receive Window sub-TLV. If no
		    value is advertised, the transmitter should
		    initialize rwin with its own local value.
				</t>
				<t>
		    When the transmitter sends a set of LSPs to the
		    receiver, it subtracts the number of LSPs sent
		    from rwin. If the transmitter receives a PSNP,
		    then rwin is incremented for each acknowledged
		    LSP. The transmitter must ensure that the value of
		    rwin never goes negative.
				</t>
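				<t>
					The bookkeeping described above can be sketched as follows. This
					is a non-normative Python illustration; the class name and the
					local default of 60 LSPs are hypothetical, and capping rwin at its
					initial value is a defensive assumption of this sketch rather than
					something stated above.
				</t>
				<figure>
					<artwork><![CDATA[
```python
class ReceiveWindow:
    """Sender-side view of the neighbor's receive window, in LSPs."""

    def __init__(self, advertised_rwin=None, local_default=60):
        # Use the Receive Window sub-TLV value when the neighbor sent one.
        if advertised_rwin is not None:
            self.limit = advertised_rwin
        else:
            self.limit = local_default
        self.rwin = self.limit

    def send(self, pending):
        """Return how many of `pending` LSPs may be sent now."""
        allowed = min(pending, self.rwin)  # keeps rwin from going negative
        self.rwin -= allowed
        return allowed

    def on_psnp(self, acked_lsps):
        # Assumption: never grow past the initial window, so that a
        # duplicate acknowledgment cannot inflate it.
        self.rwin = min(self.limit, self.rwin + acked_lsps)
```
]]></artwork>
				</figure>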


				<section anchor="TLVoperationP2P" title="Operation on a point to point interface">

					<t>By sending the LSP Burst Window sub-TLV, a node advertises to its neighbor its ability to receive that many un-acknowledged LSPs from the neighbor, with no separation interval. This is akin to a receive window or sliding window in flow control. In some implementations, this value should reflect the IS-IS socket buffer size. Special care must be taken to leave space for CSNP and PSNP (SNP) PDUs and IIHs if they share the same input queue. In this case, this document suggests advertising an LSP Burst Window corresponding to half the size of the IS-IS input queue. </t>

					<t>By advertising an LSP Transmission Interval sub-TLV, a node advertises its ability to receive LSPs separated by at least the advertised value, outside of LSP bursts.</t>

					<t>The LSP transmitter MUST NOT exceed these parameters. After having sent a full burst of unacknowledged LSPs, it MUST send the following LSPs with an LSP Transmission Interval between LSP arrivals. For CPU scheduling reasons, this rate may be averaged over a small period, e.g., 10 to 30 ms.</t>

					<t>If either the LSP transmitter or receiver does not adhere to these parameters, for example because of transient conditions, this causes no fatal condition to the operation of IS-IS. In the worst case, an LSP is lost at the receiver and this situation is already remedied by mechanisms in <xref target="ISO10589"/>. After a few seconds, neighbors will exchange PSNPs (for point to point interfaces) or CSNPs (for broadcast interfaces) and recover from the lost LSPs. This worst case should be avoided as those additional seconds impact convergence time as the LSDB is not fully synchronized. Hence it is better to err on the conservative side and to under-run the receiver rather than over-run it.</t>
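					<t>
						The interaction between the LSP Burst Window and the LSP
						Transmission Interval can be sketched as a send schedule. This
						non-normative Python illustration assumes that no PSNP arrives
						while the LSPs are being sent and idealizes the burst as
						instantaneous; the function name is hypothetical.
					</t>
					<figure>
						<artwork><![CDATA[
```python
def send_schedule(count, burst_window, interval_us):
    """Earliest send offsets, in microseconds, for `count` LSPs.

    Assumes no acknowledgment arrives during the run.
    """
    offsets = []
    for index in range(count):
        if index < burst_window:
            offsets.append(0)  # inside the burst: sent back to back
        else:
            # After the burst, one LSP per transmission interval.
            offsets.append((index - burst_window + 1) * interval_us)
    return offsets
```
]]></artwork>
					</figure>
					<t>
						For instance, send_schedule(5, 3, 1000) yields
						[0, 0, 0, 1000, 2000]: three LSPs in the burst, then one LSP
						every 1000 microseconds.
					</t>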


				</section>
				<section title="Operation on a broadcast LAN interface">
				  <t>
				    In order for the LSP Burst Window
				    to be a useful parameter, an LSP
				    transmitter needs to be able to
				    keep track of the number of
				    un-acknowledged LSPs it has sent
				    to a given LSP receiver. On a LAN
				    there is no explicit
				    acknowledgment of the receipt of
				    LSPs between a given LSP
				    transmitter and a given LSP
				    receiver. However, an LSP
				    transmitter on a LAN can infer
				    whether any LSP receiver on the
				    LAN has requested retransmission
				    of LSPs from the DIS by monitoring
				    PSNPs generated on the LAN. If no
				    PSNPs have been generated on the
				    LAN for a suitable period of time,
				    then an LSP transmitter can safely
				    set the number of un-acknowledged
				    LSPs to zero. Since this suitable
				    period of time is much higher than
				    the fast acknowledgment of LSPs
				    defined in <xref
				    target="LSPACKRate"/>, the
				    sustainable transmission rate of
				    LSPs will be much slower on a LAN
				    interface than on a point to point
				    interface. The LSP Burst Window is
				    still very useful for the first
				    burst of LSPs sent, especially in
				    the case of a single node failure
				    that requires the flooding of a
				    relatively small number of LSPs.
				  </t>
				</section>


		</section>
		<section anchor="CongestionControl" title="Congestion Control">
		  <t>Whereas flow control prevents the sender from overwhelming the receiver, congestion control prevents senders from overwhelming the network. For an IS-IS adjacency, the network between two IS-IS neighbors is relatively limited in scope and includes a single link which is typically over-sized compared to the capability of the IS-IS speakers.</t>
			<t>This section describes one sender-side congestion control algorithm largely inspired by the TCP congestion control algorithm <xref target="RFC5681"></xref>.</t>
			<t>The proposed algorithm uses a variable congestion window 'cwin'. It plays a role similar to the receive window described above. The main difference is that cwin is dynamically changed according to various events described below.</t>

			<section anchor="CC1Core" title="Core algorithm">
				<t>In its simplest form, the congestion control algorithm looks like the following:</t>
				<figure anchor="cc1_core_algo">
					<artwork>
   +---------------+
   |               |
   |               v
   |   +----------------------+
   |   | Congestion avoidance |
   |   +----------------------+
   |               |
   |               | Congestion signal
   ----------------+
					</artwork>
				</figure>
				<t>The algorithm starts with cwin := LPP + 1. In the congestion avoidance phase, cwin increases as LSPs are acked: for every acked LSP, cwin += 1 / cwin. Thus, the sending rate roughly increases linearly with the RTT. Since the RTT is low in many IS-IS deployments, the sending rate can reach fast rates in short periods of time.</t>

				<t>When updating cwin, it must not become higher than the number of LSPs waiting to be sent, otherwise the sending will not be paced by the receiving of acks. Said differently, tx pressure is needed to maintain and increase cwin.</t>

				<t>When the congestion signal is triggered, cwin is set back to its initial value and the congestion avoidance phase starts again.</t>
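				<t>
					The core algorithm can be sketched as follows. This is a
					non-normative Python illustration; the class and method names are
					hypothetical. The sketch interprets the rule on LSPs waiting to be
					sent as "skip the increase when it would push cwin above the
					number of waiting LSPs", which is one possible reading of the text
					above.
				</t>
				<figure>
					<artwork><![CDATA[
```python
class CongestionWindow:
    """Congestion avoidance only, as in the core algorithm."""

    def __init__(self, lpp):
        self.initial = float(lpp + 1)
        self.cwin = self.initial

    def on_lsp_acked(self, lsps_waiting):
        """Grow by 1/cwin per acked LSP while there is tx pressure."""
        grown = self.cwin + 1.0 / self.cwin
        # Interpretation: skip the increase when the send queue could
        # not fill the larger window.
        if grown <= lsps_waiting:
            self.cwin = grown

    def on_congestion_signal(self):
        self.cwin = self.initial

    def may_send(self, unacked):
        """True while fewer than cwin LSPs are unacknowledged."""
        return unacked < int(self.cwin)
```
]]></artwork>
				</figure>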
			</section>
			<section anchor="CC1CongestionSignals" title="Congestion signals">
				<t>The congestion signal can take various forms. The more reactive the congestion signals, the fewer LSPs will be lost due to congestion. However, overly aggressive congestion signals will cause a sender to keep a very low sending rate even without actual congestion on the path.</t>

				<t>Two practical signals are given hereafter.</t>

				<t>Timers: when receiving acknowledgements, a sender estimates the acknowledgement time of the receiver. Based on this estimation, it can infer that a packet was lost, and infer congestion on the path.</t>
				<t>There can be a timer per LSP, but this can become costly for implementations. It is possible to use only a single timer t1 for all LSPs: during t1, sent LSPs are recorded in a list list_1. Once the first t1 period is over, list_1 is kept and another list list_2 is used to store the next LSPs. LSPs are removed from the lists when acked. At the end of the second t1 period, every LSP in list_1 should have been acked, so list_1 is checked to be empty. list_1 can then be reused for the next t1 period.</t>
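				<t>
					The single-timer scheme can be sketched as follows. This is a
					non-normative Python illustration; the class and method names are
					hypothetical, sets stand in for the two lists, and the sketch
					rotates the lists at every expiry of t1. A non-empty result from
					on_t1_expired() would be treated as a congestion signal.
				</t>
				<figure>
					<artwork><![CDATA[
```python
class TwoListLossDetector:
    """Detect late acknowledgments with one timer and two lists."""

    def __init__(self):
        self.older = set()  # LSPs sent during the previous t1 period
        self.newer = set()  # LSPs sent during the current t1 period

    def on_lsp_sent(self, lsp_id):
        self.newer.add(lsp_id)

    def on_lsp_acked(self, lsp_id):
        self.older.discard(lsp_id)
        self.newer.discard(lsp_id)

    def on_t1_expired(self):
        """Return LSPs still unacked after a full t1 period, then rotate."""
        late = self.older
        self.older, self.newer = self.newer, set()
        return late
```
]]></artwork>
				</figure>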
				<t>There are multiple strategies to set the timeout value t1. It should be based on measures of the maximum acknowledgement time (MAT) of each PSNP. The simplest one is to use an exponential moving average of the MATs, like <xref target="RFC6298"/>. A more elaborate one is to take a running maximum of the MATs over a period of time of a few seconds. This value should include a margin of error to avoid false positives (e.g. estimated MAT measure variance) which would have a significant impact on performance.</t>
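				<t>
					The exponential moving average strategy can be sketched as
					follows. This is a non-normative Python illustration that borrows
					the smoothing constants and the variance-based margin of the TCP
					retransmission timer computation; applying them unchanged to MAT
					measures is an assumption of this sketch, and the class and method
					names are hypothetical.
				</t>
				<figure>
					<artwork><![CDATA[
```python
class MatEstimator:
    """Derive t1 from maximum acknowledgement time (MAT) measures."""

    # Constants borrowed from the TCP retransmission timer computation;
    # using them for MAT estimation is an assumption of this sketch.
    ALPHA, BETA, K = 1 / 8, 1 / 4, 4

    def __init__(self):
        self.smoothed = None
        self.variance = None

    def on_measure(self, mat):
        """Feed one MAT measure, in seconds."""
        if self.smoothed is None:
            self.smoothed, self.variance = mat, mat / 2
            return
        deviation = abs(self.smoothed - mat)
        self.variance = (1 - self.BETA) * self.variance + self.BETA * deviation
        self.smoothed = (1 - self.ALPHA) * self.smoothed + self.ALPHA * mat

    def t1(self):
        """Timeout: smoothed MAT plus a margin; needs one measure first."""
        if self.smoothed is None:
            raise ValueError("no MAT measure available yet")
        return self.smoothed + self.K * self.variance
```
]]></artwork>
				</figure>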

				<t>Reordering: a sender can record its sending order and check that acknowledgements arrive in the same order as the LSPs were sent. This relies on an additional assumption about the receiver's behavior and should ideally be backed up by a confirmation from the receiver that this assumption holds. The O flag defined in <xref target="Flags"/> serves this purpose.</t>
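				<t>A minimal sketch of the reordering check follows, assuming that the receiver acknowledges LSPs in the order in which it received them and that acknowledgements are processed one LSP at a time. The class and method names are hypothetical. An acknowledgement that skips over LSPs sent earlier returns those LSPs, which the caller would treat as a congestion signal.</t>
				<figure>
					<artwork><![CDATA[
```python
from collections import OrderedDict


class OrderedAckChecker:
    """Detects acknowledgements that skip earlier-sent LSPs.

    Hypothetical helper: names are illustrative, not from the draft.
    """

    def __init__(self):
        # Keys are kept in sending order.
        self.in_flight = OrderedDict()

    def on_send(self, lsp_id):
        # A retransmission moves the LSP to the end of the order.
        self.in_flight.pop(lsp_id, None)
        self.in_flight[lsp_id] = True

    def on_ack(self, lsp_id):
        """Returns LSPs sent before lsp_id that are still unacked."""
        if lsp_id not in self.in_flight:
            return []
        skipped = []
        for pending in self.in_flight:
            if pending == lsp_id:
                break
            skipped.append(pending)
        # Skipped LSPs are dropped from tracking so that each is
        # reported once; the caller decides whether to resend them.
        for pending in skipped:
            del self.in_flight[pending]
        del self.in_flight[lsp_id]
        return skipped
```
]]></artwork>
				</figure>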

			</section>

			<section anchor="CC1Refinement1" title="Refinement 1">
				<t>With the algorithm presented above, if congestion is detected, cwin goes back to its initial value, and does not use the information gathered in previous congestion avoidance phases.</t>

				<t>It is possible to use a fast recovery phase once congestion is detected, to avoid going through this linear rate of growth from scratch. When congestion is detected, a fast recovery threshold frthresh is set to frthresh := cwin / 2. In this fast recovery phase, for every acked LSP, cwin += 1. Once cwin reaches frthresh, the algorithm goes back to the congestion avoidance phase.</t>

				<figure anchor="cc1_algo_refinement_1">
					<artwork>
   +---------------+
   |               |
   |               v
   |   +----------------------+
   |   | Congestion avoidance |
   |   +----------------------+
   |               |
   |               | Congestion signal
   |               |
   |   +----------------------+
   |   |     Fast recovery    |
   |   +----------------------+
   |               |
   |               | frthresh reached
   +---------------+
					</artwork>
				</figure>
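				<t>The two-phase behavior can be sketched as below. Several points are assumptions made for illustration only: the class name and the initial window value are hypothetical, frthresh is computed from the value of cwin at the moment congestion is detected (before cwin is reset), and the growth rule used in the congestion avoidance phase (1/cwin per acked LSP) is a placeholder for the rule defined earlier in this document.</t>
				<figure>
					<artwork><![CDATA[
```python
class CongestionWindow:
    """Congestion avoidance with a fast recovery phase.

    Hypothetical helper: names and defaults are illustrative.
    """

    def __init__(self, initial_cwin=10.0):
        self.initial_cwin = initial_cwin
        self.cwin = initial_cwin
        # None means the sender is in congestion avoidance.
        self.frthresh = None

    def on_congestion(self):
        # ASSUMPTION: frthresh uses cwin as it was when the
        # congestion signal was triggered.
        self.frthresh = self.cwin / 2
        self.cwin = self.initial_cwin

    def on_ack(self):
        """Process one acked LSP."""
        in_recovery = (self.frthresh is not None
                       and self.cwin < self.frthresh)
        if in_recovery:
            # Fast recovery: cwin += 1 for every acked LSP.
            self.cwin += 1
            if self.cwin >= self.frthresh:
                self.frthresh = None
        else:
            self.frthresh = None
            # ASSUMPTION: placeholder congestion avoidance growth.
            self.cwin += 1 / self.cwin
```
]]></artwork>
				</figure>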

			</section>
			<section anchor="CC1Refinement2" title="Refinement 2">
				<t>The rates of increase were inspired by TCP <xref target="RFC5681"/>, but it is possible that a different rate of increase for cwin in the congestion avoidance phase actually yields better results due to the low RTT values in most IS-IS deployments.</t>
			</section>
			<section anchor="cc_remarks" title="Remarks">
			  <t>
			    This algorithm's performance is dependent
			    on the LPP value. Indeed, the smaller LPP
			    is, the more information is available for
			    the congestion control algorithm to
			    perform well. However, it also increases
			    the resources spent on sending PSNPs, so a
			    tradeoff must be made. This document
			    recommends using an LPP of 15 or less. If
			    an LSP Burst Window is advertised, LPP
			    SHOULD be lower and the best performance
			    is achieved when LPP is an integer
			    fraction of the LSP Burst Window.
			  </t>

				<t>Note that this congestion control algorithm benefits from the extensions proposed in this document. The advertisement of a receive window from the receiver (<xref target="FlowControl"/>) avoids the use of an arbitrary maximum value by the sender. The faster acknowledgment of LSPs (<xref target="LSPACKRate"/>) allows for a faster control loop and hence a faster increase of the congestion window in the absence of congestion.
				</t>
			</section>
		</section>
		
		<section anchor="Pacing" title="Pacing">
			<t>As discussed in <xref target="RFC9002" sectionFormat="comma" section="7.7" />, a sender SHOULD pace the sending of all in-flight LSPs based on input from the congestion controller.</t>
			<t>Sending multiple packets without any delay between them creates a packet burst that might cause short-term congestion and losses. Senders MUST either use pacing or limit such bursts. Senders SHOULD limit bursts to the initial congestion window. A sender with knowledge that the receiver can absorb larger bursts, such as by receiving the LSP Burst Window sub-TLV from this receiver, MAY use a higher limit.</t>
			<t>Senders can implement pacing as they choose. A perfectly paced sender spreads packets exactly evenly over time. For a window-based congestion controller, such as the one in this section, that rate can be computed by averaging the congestion window over the RTT. Expressed as an inter-packet interval in units of time:</t>
			<t>interval = ( smoothed_rtt / congestion_window ) / N</t> 
			<t>Using a value for N that is small, but at least 1 (for example, 1.25) ensures that variations in RTT do not result in underutilization of the congestion window.</t>
			<t>Practical considerations, such as scheduling delays and computational efficiency, can cause a sender to deviate from this rate over time periods that are much shorter than an RTT.</t>
			<t>One possible implementation strategy for pacing uses a leaky bucket algorithm, where the capacity of the "bucket" is limited to the maximum burst size and the rate the "bucket" fills is determined by the above function.</t>
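			<t>The interval formula and the bucket strategy above can be sketched together as follows. This is an illustration only: the function and class names are hypothetical, times are in seconds, and the caller is assumed to supply the current time. The bucket accumulates sending credit at the rate of one LSP per interval, capped at the maximum burst size.</t>
			<figure>
				<artwork><![CDATA[
```python
def pacing_interval(smoothed_rtt, congestion_window, n=1.25):
    """interval = ( smoothed_rtt / congestion_window ) / N"""
    return (smoothed_rtt / congestion_window) / n


class PacingBucket:
    """Bucket-based pacer (hypothetical helper).

    Capacity is the maximum burst size; the bucket fills at one
    LSP of credit per pacing interval.
    """

    def __init__(self, max_burst, now=0.0):
        self.capacity = float(max_burst)
        self.credit = float(max_burst)
        self.last = now

    def try_send(self, now, interval):
        """Returns True if one LSP may be sent at time 'now'."""
        # Add the credit earned since the last call, up to the cap.
        earned = (now - self.last) / interval
        self.credit = min(self.capacity, self.credit + earned)
        self.last = now
        if self.credit >= 1:
            self.credit -= 1
            return True
        return False
```
]]></artwork>
			</figure>
			<t>A sender would typically recompute the interval whenever smoothed_rtt or the congestion window changes and pass the latest value to try_send().</t>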
		</section>

		<section anchor="sec_determining_values" title="Determining values to be advertised in the Flooding Parameters TLV">
			<t>The values that a receiver advertises do not need to be perfect. If the values are too low, then the transmitter will not use the full bandwidth or available CPU resources. If the values are too high, then the receiver may drop some LSPs during the first RTT. This loss will reduce the usable receive window, and the protocol mechanisms will allow the adjacency to recover. Flooding several orders of magnitude slower than both nodes can achieve will hurt performance, as will consistently overloading the receiver.</t>
			<t>The values advertised need not be dynamic, as feedback is provided by the acknowledgment of LSPs in SNP messages. Acknowledgments provide a feedback loop on how fast the LSPs are processed by the receiver. They also signal that the LSPs can be removed from the receive window, explicitly signaling to the sender that more LSPs may be sent. By advertising relatively static parameters, we expect to produce overall flooding behavior similar to what might be achieved by manually configuring per-interface LSP rate limiting on all interfaces in the network. The advertised values may be based, for example, on offline tests of the overall LSP processing speed for a particular set of hardware and the number of interfaces configured for IS-IS. With such a formula, the values advertised in the Flooding Parameters TLV would only change when additional IS-IS interfaces are configured.</t>
			<t>The values may be updated dynamically, to reflect the relative change of load of the receiver, by improving the values when the receiver load is getting lower and degrading the values when the receiver load is getting higher. For example, if LSPs are regularly dropped, or if the queue regularly comes close to being filled, then the values may be too high. On the other hand, if the queue is barely used (by IS-IS), then values may be too low.</t>
			<t>The values may also be absolute values reflecting relevant average hardware resources that are being monitored, typically the amount of buffer space used by incoming LSPs. In this case, care must be taken when choosing the parameters influencing the values in order to avoid undesirable or unstable feedback loops. It would be undesirable to use a formula that depends, for example, on an active measurement of the instantaneous CPU load to modify the values advertised in the Flooding Parameters TLV. This could introduce feedback into the IGP flooding process that could produce unexpected behavior.</t>
		</section>

		<section anchor="OPS_Considerations" title="Operational Considerations">
			<t>As discussed in <xref target="TLVoperationLAN"/>, the solution is more effective on point-to-point adjacencies. Hence, a broadcast interface (e.g., Ethernet) shared by only two IS-IS neighbors should be configured as point-to-point in order to have more effective flooding.</t>
		</section>
	</section>
	<section anchor="ControlExample2" title="Congestion Control algorithm 2">
	          <t>This section describes a congestion control algorithm based on
        performance measured by the transmitter without dependence on
        signaling from the receiver.</t>

        <section anchor="Ex2-arch" title="Router Architecture Discussion">
          <t>(The following description is an abstraction - implementation
          details vary.)</t>

          <t>Existing router architectures may utilize multiple input queues.
          On a given line card, IS-IS PDUs from multiple interfaces may be
          placed in a rate limited input queue. This queue may be dedicated to
          IS-IS PDUs or may be shared with other routing related packets.</t>

          <t>The input queue may then pass IS-IS PDUs to a "punt queue" which
          is used to pass PDUs from the data plane to the control plane. The
          punt queue typically also has controls on its size and the rate at
          which packets will be punted.</t>

          <t>An input queue in the control plane may then be used to assemble
          PDUs from multiple linecards, separate the IS-IS PDUs from other
          types of packets, and place the IS-IS PDUs in an input queue
          dedicated to the IS-IS protocol.</t>

          <t>The IS-IS input queue then separates the IS-IS PDUs and directs
          them to an instance-specific processing queue. The
          instance-specific processing queue may then further separate the
          IS-IS PDUs by type (IIHs, SNPs, and LSPs) so that separate
          processing threads with varying priorities may be employed to
          process the incoming PDUs.</t>

          <t>In such an architecture, it may be difficult for IS-IS in the
          control plane to accurately track the state of the various input
          queues and determine what value should be advertised as a current
          receive window.</t>

          <t>The following section describes a congestion control algorithm
          based on performance measured by the transmitter without dependence
          on signaling from the receiver.</t>
        </section>

        <section anchor="Ex2-tx" title="Transmitter Based Flow Control">
          <t>The congestion control algorithm described in this section does
          not depend upon direct signaling from the receiver. Instead, it
          adapts the transmission rate based on measurement of the actual
          rate of acknowledgments received.</t>

          <t>When flow control is necessary, it can be implemented
          based on knowledge of the current flooding
          rate and the current acknowledgement rate. Such an algorithm is a
          local matter and there is no requirement or intent to standardize an
          algorithm. However, a number of guidelines can be
          described.</t>

          <t>A maximum target LSP transmission rate (LSPTxMax) SHOULD be
          configurable. This represents the fastest LSP transmission rate
          which will be attempted. This value SHOULD be applicable to all
          interfaces and SHOULD be consistent network wide.</t>

          <t>When the current rate of LSP transmission (LSPTxRate) exceeds the
          capabilities of the receiver, the flow control algorithm needs to
          quickly and aggressively reduce the LSPTxRate. Slower
          responsiveness is likely to result in a larger number of
          retransmissions which can introduce much larger delays in
          convergence.</t>

          <t>Dynamic adjustment of the rate of LSP transmission (LSPTxRate)
          upwards (i.e., faster) SHOULD be done less aggressively and only be
          done when the neighbor has demonstrated its ability to sustain the
          current LSPTxRate.</t>

          <t>The flow control algorithm MUST NOT assume that the receive
          performance of a neighbor is static, i.e., it MUST handle
          transient conditions which result in a slower or faster receive rate
          on the part of a neighbor.</t>

          <t>The flow control algorithm SHOULD consider the expected delay
          time in receiving an acknowledgment. It therefore incorporates the
          neighbor partialSNPInterval (<xref target="partialSNPI"/>) to help
          determine whether acknowledgments are keeping pace with the rate of
          LSPs transmitted. In the absence of an advertisement of
          partialSNPInterval, a locally configured value can be used.</t>
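          <t>Since the algorithm is a local matter, the following is only one
          possible illustration of the guidelines above, not a recommended
          design. The function name, the default parameters, and the
          multiplicative-decrease and additive-increase rules are all
          assumptions made for this example. Rates are in LSPs per second, and
          the acknowledgment rate is assumed to be measured over a window long
          enough to account for the neighbor partialSNPInterval.</t>
          <figure>
            <artwork><![CDATA[
```python
def adjust_lsp_tx_rate(tx_rate, ack_rate, tx_max,
                       tx_min=10.0, backoff=0.5,
                       step=10.0, tolerance=0.9):
    """Returns the next LSPTxRate (hypothetical helper).

    tx_rate:   current LSPTxRate
    ack_rate:  measured rate of acknowledgments
    tx_max:    configured LSPTxMax
    The remaining parameters are illustrative assumptions.
    """
    if ack_rate < tolerance * tx_rate:
        # Acknowledgments are not keeping pace: reduce the rate
        # quickly and aggressively.
        return max(tx_min, tx_rate * backoff)
    # The neighbor sustained the current rate: increase less
    # aggressively and never exceed LSPTxMax.
    return min(tx_max, tx_rate + step)
```
]]></artwork>
          </figure>
          <t>Calling this function once per measurement window lets the rate
          follow transient changes in the receive performance of the
          neighbor in both directions.</t>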
	</section>
      </section>
    </section>

<section anchor="IANA_Consideration" title="IANA Considerations">
	<section anchor="IANA_Consideration1" title="Flooding Parameters TLV">

	<t>IANA has made the following temporary allocation from the IS-IS TLV codepoint registry.</t>
	<figure anchor="IANA_Registration" title=''>
		<preamble></preamble>
		<artwork align="center">
   Type    Description                    IIH   LSP   SNP   Purge
   ----    ---------------------------    ---   ---   ---   ---
    21    Flooding Parameters TLV         y     n     y     n
		</artwork>
	</figure>

	</section>

	<section anchor="IANA_Consideration2" title="Registry: IS-IS Sub-TLV for Flooding Parameters TLV">
	<t>This document creates the following sub-TLV Registry:</t>
	<t>Name: IS-IS Sub-TLVs for Flooding Parameters TLV.</t>
	<t>Registration Procedure(s): Expert Review</t>
	<t>Expert(s): TBD</t>
	<t>Description: This registry defines sub-TLVs for the Flooding Parameters TLV (21).</t>
	<t>Reference: This document.</t>
	<texttable anchor="Registry_Flooding" title="Initial Sub-TLV allocations for Flooding Parameters TLV">
		<ttcol align='center'>Type</ttcol>
		<ttcol align='left'>Description</ttcol>
		<c>0</c>
		<c>Reserved</c>
		<c>1</c>
		<c>LSP Burst Window</c>
		<c>2</c>
		<c>LSP Transmission Interval</c>
		<c>3</c>
		<c>LSPs Per PSNP</c>
		<c>4</c>
		<c>Flags</c>
		<c>5</c>
		<c>Partial SNP Interval</c>
		<c>6</c>
		<c>Receive Window</c>
		<c>7-255</c>
		<c>Unassigned</c>
	</texttable>
	</section>

	<section anchor="IANA_Consideration3" title="Registry: IS-IS Bit Values for Flooding Parameters Flags Sub-TLV">
      <t>This document also requests IANA to create a new registry for
      assigning Flag bits advertised in the Flags sub-TLV.</t>

      <t>Name: IS-IS Bit Values for Flooding Parameters Flags Sub-TLV.</t>

      <t>Registration Procedure: Expert Review</t>

      <t>Expert(s): TBD</t>

	  <t>Description: This registry defines bit values for the Flags sub-TLV (4) advertised in the Flooding Parameters TLV (21).</t>
	  
	  <t>Reference: This document.</t>

	<texttable anchor="Registry_Flags" title="Initial bit allocations for Flags Sub-TLV">
		<ttcol align='center'>Bit #</ttcol>
		<ttcol align='left'>Description</ttcol>
		<c>0</c>
		<c>Ordered acknowledgement (O-flag)</c>
	</texttable>

    </section>
	</section>

    <section anchor="Security" title="Security Considerations" toc="default">


	<t>
    Security concerns for IS-IS are addressed in <xref target="ISO10589"/>,
    <xref target="RFC5304"/>, and <xref target="RFC5310"/>. These documents
    describe mechanisms that provide the authentication and integrity of IS-IS
    PDUs, including SNPs and IIHs. These authentication mechanisms are not
    altered by this document.</t>
<t>
    With the cryptographic mechanisms described in <xref target="RFC5304"/>
    and <xref target="RFC5310"/>, an attacker wanting to advertise an incorrect
    Flooding Parameters TLV would have to first defeat these mechanisms.
</t>
<t>In the absence of cryptographic authentication, as IS-IS does not run over IP but directly over the link layer, it is considered difficult to inject a false SNP/IIH without having access to the link layer.</t>
<t>If a false SNP/IIH is sent with a Flooding Parameters TLV set to conservative values, the attacker can reduce the flooding speed between the two adjacent neighbors, which can result in LSDB inconsistencies and transient forwarding loops. However, this is not significantly different from filtering or altering LSPs, which would also be possible with access to the link layer. In addition, if the downstream flooding neighbor has multiple IGP neighbors, which is typically the case for reliability or topological reasons, it would receive LSPs at a regular speed from its other neighbors and hence would maintain LSDB consistency.</t>
<t>If a false SNP/IIH is sent with a Flooding Parameters TLV set to aggressive values, the attacker can increase the flooding speed, which can either overload a node or, more likely, cause the loss of LSPs. However, this is not significantly different from sending many LSPs, which would also be possible with access to the link layer, even with cryptographic authentication enabled. In addition, IS-IS has procedures to detect the loss of LSPs and recover.</t>
<t>This TLV advertisement is not flooded across the network but only sent between adjacent IS-IS neighbors. This would limit the consequences in case of forged messages, and also limits the dissemination of such information.</t>
</section>

<section anchor="Contributors" title="Contributors">
<t>The following people made substantial contributions to the content of this document and should be considered as coauthors:<list style="symbols">
	<t>Acee Lindem, Cisco Systems, acee@cisco.com</t>
	<t>Jayesh J, Juniper Networks, jayeshj@juniper.net</t>
</list></t>
</section>

<section anchor="Acknowledgments" title="Acknowledgments">
<t>The authors would like to thank Henk Smit, Sarah Chen, Xuesong Geng, Pierre Francois and Hannes Gredler for their reviews, comments and suggestions.</t>
<t>The authors would like to thank David Jacquet, Sarah Chen, and Qiangzhou Gao for the tests performed on commercial implementations and their identification of some limiting factors.</t>
</section>
</middle>
<back>
<references title="Normative References">
<?rfc include="reference.RFC.2119"?>
<?rfc include="reference.RFC.8174"?>
<?rfc include="reference.RFC.5304"?>
<?rfc include="reference.RFC.5310"?>
<?rfc include="reference.RFC.6298"?>
<reference anchor="ISO10589">
<front>
	<title>Intermediate system to Intermediate system intra-domain routeing information exchange protocol for use in conjunction with the protocol for providing the connectionless-mode Network Service (ISO 8473)</title>
	<author>
		<organization abbrev="ISO">International Organization for Standardization</organization>
	</author>
	<date month="Nov" year="2002"/>
</front>
<seriesInfo name="ISO/IEC" value="10589:2002, Second Edition"/>
</reference>
</references>
<references title="Informative References">
<?rfc include="reference.I-D.ietf-lsr-dynamic-flooding"?>
<?rfc include="reference.RFC.9293"?>
<?rfc include="reference.RFC.9002"?>
<?rfc include="reference.RFC.2973"?>
<?rfc include="reference.RFC.5681"?>
</references>
<section anchor="authors-notes" title="Changes / Author Notes">
<t>[RFC Editor: Please remove this section before publication]</t>
<t>IND 00: Initial version.</t>
<t>WG 00: No change.</t>
<t>WG 01: IANA allocated code point.</t>
<t>WG 02: No change.</t>
<t>WG 03: <list style="symbols">
	<t>Pacing section added (taken from RFC 9002).</t>
	<t>Some text borrowed from RFC 9002 (QUIC Loss Detection and Congestion Control).</t>
	<t>Considerations on the special role of the DIS.</t>
	<t>Editorial changes.</t>
</list></t>
<t>WG 04: <list style="symbols">
	<t>Update IANA section as per IANA editor comments (2023-03-23).</t>
</list></t>

</section>
    <section anchor="open-issues" title="Issues for Further Discussion">
      <t>[RFC Editor: Please remove this section before publication]</t>

      <t>This section captures issues which the authors either have not yet
      had time to address or on which the authors have not yet reached
      consensus. Future revisions of this document may include new/altered
      text relevant to these issues.</t>

      <t>
	There are no open issues at this time.
      </t>

    </section>
</back>
</rfc>
