<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
    which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
    There has to be one entity for each item to be referenced. 
    An alternate method (rfc include) is described in the references. -->

<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC2544 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2544.xml">
<!ENTITY RFC4814 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.4814.xml">
<!ENTITY RFC5180 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5180.xml">
<!ENTITY RFC8174 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8174.xml">
<!ENTITY RFC8219 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8219.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs), 
    please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
    (Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space 
    (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="info" docName="draft-lencse-bmwg-benchmarking-stateful-04" ipr="trust200902">

  <front>
    <!-- The abbreviated title is used in the page header - it is only necessary if the 
          full title is longer than 39 characters -->

    <title abbrev="Benchmarking Stateful Gateways">Benchmarking Methodology
    for Stateful NATxy Gateways using RFC 4814 Pseudorandom Port Numbers</title>

    <!-- add 'role="editor"' below for the editors if appropriate -->

    <!-- Another author who claims to be an editor -->

    <author fullname="Gabor Lencse" initials="G." surname="Lencse">
      <organization>Szechenyi Istvan University</organization>
      <address>
        <postal>
          <street>Egyetem ter 1.</street>
          <!-- Reorder these if your country does things differently -->
          <city>Gyor</city>
          <region></region>
          <code>H-9026</code>
          <country>Hungary</country>
        </postal>
        <phone></phone>
        <email>lencse@sze.hu</email>
        <uri></uri>
      </address>
    </author>

    <author fullname="Keiichi Shima" initials="K." surname="Shima">
	  <organization>IIJ Innovation Institute</organization>
      <address>
        <postal>
          <street>Iidabashi Grand Bloom, 2-10-2 Fujimi</street>
          <city>Chiyoda-ku</city>
          <region>Tokyo</region>
          <code>102-0071</code>
          <country>Japan</country>
        </postal>
        <phone></phone>
        <email>keiichi@iijlab.net</email>
        <uri></uri>
      </address>
    </author>

    <date year="2022" />

    <!-- Meta-data Declarations -->

    <area>Operations and Management Area</area>

    <workgroup>Benchmarking Methodology Working Group</workgroup>

    <!-- WG name at the upperleft corner of the doc,
          IETF is fine for individual submissions.  
    If this element is not present, the default is "Network Working Group",
          which is used by the RFC Editor as a nod to the history of the IETF. -->

    <keyword>Benchmarking, Stateful NATxy, Measurement Procedure, Throughput, Frame Loss Rate, Latency, PDV</keyword>

    <!-- Keywords will be incorporated into HTML output
          files in a meta tag but they have no effect on text or nroff
          output. If you submit your draft to the RFC Editor, the
          keywords will be used for the search engine. -->

    <abstract>
      <t>RFC 2544 has defined a benchmarking methodology for network
      interconnect devices. RFC 5180 addressed IPv6 specificities and it also
      provided a technology update, but excluded IPv6 transition technologies.
      RFC 8219 addressed IPv6 transition technologies, including stateful NAT64.
	  However, none of them discussed how to apply RFC 4814 pseudorandom port numbers
	  to any stateful NATxy (NAT44, NAT64, NAT66) technologies.
	  We discuss why using pseudorandom port numbers with stateful NATxy gateways is a 
	  difficult problem. We recommend a solution limiting the port number ranges and using 
	  two phases: the preliminary phase and the real test phase. We show how the 
	  classic performance measurement procedures (e.g. throughput, frame loss rate, 
	  latency, etc.) can be carried out. We also define new performance metrics and 
	  measurement procedures for maximum connection establishment rate, connection 
	  tear down rate and connection tracking table capacity measurements.
	  </t>
    </abstract>
  </front>

  <middle>
    <section anchor="intro" title="Introduction">
      <t><xref target="RFC2544"/> has defined a comprehensive benchmarking 
	  methodology for network interconnect devices, which is still in use. It was 
	  mainly IP version independent, but it used IPv4 in its examples. 
	  <xref target="RFC5180"/> addressed IPv6 specificities
      and also added technology updates, but declared IPv6 transition technologies
      out of its scope. <xref target="RFC8219"/> addressed the IPv6 transition 
	  technologies, including stateful NAT64. It has reused several benchmarking
      procedures from <xref target="RFC2544"/> (e.g. throughput, frame loss rate), 
	  it has redefined the latency measurement, and added further ones, e.g. the PDV 
	  (packet delay variation) measurement.</t>  
	  <t>However, none of them discussed how to apply <xref target="RFC4814"/> 
	  pseudorandom port numbers when benchmarking stateful NATxy (NAT44, NAT64, NAT66) 
	  gateways. We are not aware of any other RFCs that address this question.
	  </t> 
      <t>First, we discuss why using pseudorandom port numbers with stateful NATxy 
	  gateways is a hard problem. 
	  </t>
	  <t>Then we recommend a solution.
	  </t>
		  
      <section title="Requirements Language">
        <t>
           The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", 
           "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", 
           and "OPTIONAL" in this document are to be interpreted as described 
           in BCP14 <xref target="RFC2119"/> <xref target="RFC8174"/> 
           when, and only when, they appear in all capitals, as shown here.
         </t>
      </section>
    </section>

    <section anchor="problem" title="Pseudorandom Port Numbers and Stateful Translation">
    <t>In its appendix, <xref target="RFC2544"/> has defined a frame format 
	for test frames including specific source and destination port numbers.
    <xref target="RFC4814"/> recommends using pseudorandom and
    uniformly distributed values for both source and destination port
    numbers. However, stateful NATxy (NAT44, NAT64, NAT66) solutions use the 
	port numbers to identify connections. The usage of pseudorandom port
	numbers causes different problems depending on the direction.
	<list style="symbols">
	  <t>In the private to public direction, pseudorandom source and 
	  destination port numbers could be used; however, this approach would 
	  amount to a denial of service attack against the stateful NATxy gateway, 
	  because it would exhaust the capacity of its connection tracking table. 
	  To illustrate, consider the following calculation using the recommendations of RFC 4814:
	  <list style="symbols">
	    <t>The recommended source port range is: 1024-65535, thus its size is: 64512.</t>
	    <t>The recommended destination port range is: 1-49151, thus its size is: 49151.</t>
	    <t>The number of source and destination port number combinations is:
	    3,170,829,312.</t>
	  </list>
      We note that section 12 of <xref target="RFC2544"/> also requires testing 
	  with 256 destination networks, which further increases the number of 
	  connection tracking table entries.</t>
	  <t>As for the public to private direction, the stateful DUT (Device Under Test) would drop any 
	  packets that do not belong to an existing connection, therefore, the 
	  direct usage of pseudorandom port numbers from the above-mentioned ranges 
	  is not feasible.</t>
	</list>
	</t>
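    <t>The figures above can be reproduced with a few lines of Python. 
    This is only an illustrative sketch: the constant and variable names are 
    our own choice and are not part of the methodology.</t>
    <figure>
      <artwork align="left"><![CDATA[
```python
# Port ranges recommended by RFC 4814 (inclusive bounds),
# as quoted in the list above.
SRC_PORT_MIN, SRC_PORT_MAX = 1024, 65535
DST_PORT_MIN, DST_PORT_MAX = 1, 49151

src_size = SRC_PORT_MAX - SRC_PORT_MIN + 1
dst_size = DST_PORT_MAX - DST_PORT_MIN + 1

# Every (source port, destination port) pair is a potential
# connection tracking table entry for one address pair.
combinations = src_size * dst_size

print(f"{src_size} x {dst_size} = {combinations:,}")
```
      ]]></artwork>
    </figure>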
    </section>


    <section anchor="setup_term" title="Test Setup and Terminology">
	  <t>
	  Our methodology works with any IP version. We use IPv4 in the Test Setup
	  shown in <xref target="test_setup"/> to facilitate its easy 
	  understanding based on the well-known stateful NAT44 
	  (also called NAPT: Network Address and Port Translation) solution.
	  </t>

        <figure anchor="test_setup" align="center" title="Test Setup for benchmarking
		stateful NATxy gateways">
          <preamble></preamble>

          <artwork align="left"><![CDATA[
              +--------------------------------------+
     10.0.0.2 |Initiator                    Responder| 198.19.0.2
+-------------|                Tester                |<------------+
| private IPv4|                         [state table]| public IPv4 |
|             +--------------------------------------+             |
|                                                                  |
|             +--------------------------------------+             |
|    10.0.0.1 |                 DUT:                 | 198.19.0.1  |
+------------>|        Stateful NATxy gateway        |-------------+
  private IPv4|     [connection tracking table]      | public IPv4
              +--------------------------------------+

            ]]></artwork>

        <postamble></postamble>
        </figure>
	  <t>As for the transport layer protocol, <xref target="RFC2544"/> recommended 
	  testing with UDP, and <xref target="RFC8219"/> retained this recommendation. For 
	  the general recommendation, we also keep UDP; thus the port numbers in the 
	  following text are to be understood as UDP port numbers. We discuss the 
	  limitation of this approach in <xref target="udp_or_tcp"/>.
	  </t>
	  <t>
      We define the most important elements of our proposed benchmarking system as follows.
	  <list style="symbols">
	  <t>Connection tracking table: The stateful NATxy gateway uses a connection 
	  tracking table to be able to perform the stateful translation in the public to
	  private direction. Its size, policy and content are unknown to the Tester.</t>
	  <t>Four tuple: The four numbers that identify a connection are source IP address, 
	  source port number, destination IP address, destination port number.</t>
	  <t>State table: The Responder of the Tester extracts the four tuple from each received
	  test frame and stores it in its state table. Recommendation is given for writing and 
	  reading order of the state table in <xref target="st_wr_order"/>.</t>
	  <t>Initiator: The port of the Tester that may initiate a connection through the 
	  stateful DUT in the private to public direction. Theoretically, it can use 
	  any source and destination port numbers from the ranges recommended by 
	  <xref target="RFC4814"/>: if the used four tuple does not belong to an existing 
	  connection, the DUT will register a new connection into its connection tracking table.</t>
	  <t>Responder: The port of the Tester that may not initiate a connection through 
	  the stateful DUT in the public to private direction. It may send only frames that 
	  belong to an existing connection. To that end, it uses four tuples that have been 
	  previously extracted from the received test frames and stored in its state table.</t>
	  <t>Preliminary test phase: Test frames are sent only by the Initiator to the 
	  Responder through the DUT to fill both the connection tracking table of the DUT 
	  and the state table of the Responder. This is a newly introduced operation phase 
	  for stateful NATxy benchmarking. The necessity of this phase is explained in 
	  <xref target="prelim"/>.</t>
	  <t>Real test phase: The actual test (e.g. throughput, latency, etc.) is performed 
	  in this phase after the completion of the preliminary test phase. Test frames 
	  are sent as required (e.g. bidirectional test or unidirectional test in any of 
	  the two directions).</t>

	  </list>
	  </t>
    </section>
	
    <section anchor="method" title="Recommended Benchmarking Method">
	
	  <section anchor="restr_port_range" title="Restricted Port Number Ranges">
	  <t>The Initiator SHOULD use restricted ranges for source and destination port 
	  numbers to avoid the denial of service like effect on the 
	  connection tracking table of the DUT described in <xref target="problem"/>. 
	  The size of the source port number range SHOULD be larger (e.g. on the order 
	  of a few tens of thousands), whereas the size of the destination port number 
	  range SHOULD be smaller (it may vary from a few to several hundreds or thousands 
	  as needed). 
	  The rationale is that the source and destination port numbers that can be observed in 
	  Internet traffic are not symmetrical. Whereas source port numbers may be random, 
	  there are a few very popular destination port numbers (e.g. 443, 80, etc., 
	  see <xref target="IIR2020"/>) and others hardly occur. We have also 
	  found that their role is asymmetric in the Linux kernel routing hash 
	  function <xref target="LEN2020"/>. 
	  </t>
	  <t>The product of the sizes of the two ranges can be used as a parameter. The 
	  performance of the stateful NATxy gateway MAY be examined as a function of this
	  parameter.
	  </t>
	  </section>

	  <section anchor="prelim" title="Preliminary Test Phase">
		<t>The preliminary phase serves two purposes:
		<list style="numbers">
		  <t>The connection tracking table of the DUT is filled. It is important, 
		  because its maximum connection establishment rate may be lower than its maximum
		  frame forwarding rate (that is throughput).</t>
		  <t>The state table of the Responder is filled with valid four tuples. It is 
		  a precondition for the Responder to be able to transmit frames that belong to connections 
		  that exist in the connection tracking table of the DUT.</t>
		</list>				
		Although both of the above are always necessary before the real test phase, 
		the preliminary phase can also be used on its own, without a real test phase. This is the case 
		when the maximum connection establishment rate is measured (as described in 
		<xref target="meas_max_conn_est_rate"/>).
		</t>
	    <t>A preliminary test phase MUST be performed before all tests performed in 
		the real test phase. In this phase, the following things happen:
		<list style="numbers">
		  <t>The Initiator sends test frames to the Responder through the DUT at a 
		  specific frame rate.</t>
		  <t>The DUT performs the stateful translation of the test frames and it also 
		  stores the new combinations in its connection tracking table.</t>
		  <t>The Responder receives the translated test frames and updates its state 
		  table with the received four tuples. The Responder transmits no test frames 
		  during the preliminary phase.</t>
		</list>  
		</t>
		<t>When the preliminary test phase is performed in preparation for the real test phase, 
		the applied frame rate and the duration of the preliminary phase SHOULD be 
		carefully selected so that:
		<list style="symbols">
		  <t>The applied frame rate is safely lower than the maximum connection establishment rate.</t>
		  <t>Enough four tuples are stored in the state table of the Responder so that 
		  it can generate frames with the proper distribution of the four tuples.</t>
		</list>
		Please refer to <xref target="ctrl_conntrack"/> for further conditions regarding timeout 
		and port number combinations.
		</t>
	  </section>
	  
	  <section anchor="consider_stateful" title="Consideration of the Cases of Stateful Operation">
		<t>We consider the most important Events that may happen during the operation 
		of a stateful NATxy gateway, and the Actions of the gateway as follows.
		<list style="numbers">
		  <t>EVENT: A packet not belonging to an existing connection arrives in the private to public 
		  direction. ACTION: A new connection is registered into the connection tracking 
		  table and the packet is translated and forwarded.</t>
		  <t>EVENT: A packet not belonging to an existing connection arrives in the public to private 
		  direction. ACTION: The packet is discarded.</t>		  
		  <t>EVENT: A packet belonging to an existing connection arrives (in any direction). 
		   ACTION: The packet is translated and forwarded and the timeout counter of the corresponding 
		  connection tracking table entry is reset.</t>
		  <t>EVENT: A connection tracking table entry times out.  ACTION: The entry is deleted from 
		  the connection tracking table.</t>
		</list>
		</t>
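		<t>The four cases above can be illustrated with a toy model. The following 
		Python sketch is not a real NAT implementation and makes several simplifying 
		assumptions of our own: time is passed in explicitly as "now", public port 
		numbers are allocated by a naive counter, and the class and method names 
		(ToyNat, outbound, inbound, expire) are hypothetical.</t>
		<figure>
		  <artwork align="left"><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class ToyNat:
    """Toy stateful NAT: illustrates the four events only."""
    public_ip: str
    timeout: float                 # connection timeout in seconds
    next_port: int = 1024          # naive allocator, never wraps
    by_private: dict = field(default_factory=dict)
    by_public: dict = field(default_factory=dict)

    def expire(self, now):
        """Event 4: delete entries whose timeout has elapsed."""
        for key, entry in list(self.by_private.items()):
            if now - entry["last_seen"] > self.timeout:
                del self.by_public[entry["public_key"]]
                del self.by_private[key]

    def outbound(self, pkt, now):
        """Private to public: event 1 (new) or event 3 (known)."""
        self.expire(now)
        _, _, dst_ip, dst_port = pkt
        entry = self.by_private.get(pkt)
        if entry is None:          # event 1: register a connection
            port = self.next_port
            self.next_port += 1
            key = (dst_ip, dst_port, self.public_ip, port)
            entry = {"port": port, "public_key": key}
            self.by_private[pkt] = entry
            self.by_public[key] = pkt
        entry["last_seen"] = now   # (re)start the timeout counter
        return (self.public_ip, entry["port"], dst_ip, dst_port)

    def inbound(self, pkt, now):
        """Public to private: event 2 (drop) or event 3 (known)."""
        self.expire(now)
        private = self.by_public.get(pkt)
        if private is None:        # event 2: discard the packet
            return None
        self.by_private[private]["last_seen"] = now
        return (pkt[0], pkt[1], private[0], private[1])


nat = ToyNat(public_ip="198.19.0.1", timeout=10.0)
out = nat.outbound(("10.0.0.2", 40000, "198.19.0.2", 80), now=0.0)
reply = (out[2], out[3], out[0], out[1])   # swap the endpoints
back = nat.inbound(reply, now=5.0)
```
		  ]]></artwork>
		</figure>
		<t>In this sketch a four tuple is written as (source IP address, source port 
		number, destination IP address, destination port number), and a return value 
		of None stands for a discarded packet.</t>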
		<t>Due to "black box" testing, the Tester is not able to directly examine (or delete) the entries 
		of the connection tracking table. However, the entries can be and MUST be controlled by setting 
		an appropriate timeout value and carefully selecting the port numbers of the packets
		(as described in <xref target="ctrl_conntrack"/>) to be able to produce meaningful and 
		repeatable measurement results.
		</t>
		<t>We aim to support the measurement of the following performance characteristics 
		of a stateful NATxy gateway:
		<list style="numbers">
		  <t>maximum connection establishment rate</t>
		  <t>all "classic" performance metrics like throughput, frame loss rate, latency, etc.</t>		  
		  <t>connection tear down rate</t>
		  <t>connection tracking table capacity</t>
		</list>
		</t>
	  </section>
	  
	  <section anchor="ctrl_conntrack" title="Control of the Connection Tracking Table Entries">
		<t>It is necessary to control the connection tracking table entries 
		of the DUT in order to achieve clear conditions for the measurements. We can easily 
		achieve the following two extreme situations:
		<list style="numbers">
		  <t>All frames create a new entry in the connection tracking table of the DUT and no 
		  old entries are deleted during the test. This is required for measuring the maximum 
		  connection establishment rate.</t>
		  <t>No new entries are created in the connection tracking table of the DUT and no old
		  ones are deleted during the test. This is ideal for the real test phase measurements, 
		  like throughput, latency, etc.</t>
		</list>	
		</t>

		<t>
		From this point we use the following three assumptions:
		<list style="numbers">
		  <t>A single source address destination address pair is used for all tests. We make this 
		  assumption for simplicity. Of course, we are aware that <xref target="RFC2544"/> requires 
		  testing also with 256 different destination networks.</t>
		  <t>The connection tracking table of the stateful NATxy is large enough to store all 
		  connections defined by the different source port number destination port number 
		  combinations.</t>
		  <t>Each experiment is started with an empty connection tracking table. (It can be ensured
		  by deleting its content before the experiment.)</t>
		</list>	
		</t>		
		
		<t>The first extreme situation can be achieved by 
		<list style="symbols">
		  <t>using different source port number destination port number combinations 
		  for every single test frame in the preliminary phase and</t>
		  <t> setting the UDP timeout of the NATxy gateway to a value higher than the length of 
		  the preliminary phase.</t>
		</list>			  
		</t>
		
		<t>The second extreme situation can be achieved by 
		<list style="symbols">
		  <t>enumerating all the possible source port number destination port number combinations 
		  in the preliminary phase and</t>
		  <t>setting the UDP timeout of the NATxy gateway to a value higher than the length of 
		  the preliminary phase plus the gap between the two phases plus the length of the real 
		  test phase.</t>
		</list>
		</t>
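		<t>The timeout conditions of the two extreme situations are simple arithmetic. 
		The following Python sketch works through one example; the function names, the 
		one second safety margin and all numeric values are illustrative assumptions 
		of ours, not recommendations.</t>
		<figure>
		  <artwork align="left"><![CDATA[
```python
def prelim_duration(n_combinations, frame_rate):
    """Seconds needed to send one frame per port combination."""
    return n_combinations / frame_rate


def min_udp_timeout(prelim_s, gap_s=0.0, real_test_s=0.0,
                    margin_s=1.0):
    """A UDP timeout (seconds) higher than the time to be covered.

    For the first extreme situation pass only prelim_s; for the
    second one also pass the gap and the real test phase length.
    margin_s is a hypothetical safety margin.
    """
    return prelim_s + gap_s + real_test_s + margin_s


# Example: 40,000 source ports x 100 destination ports,
# enumerated once at 1,000,000 frames per second.
prelim_s = prelim_duration(40_000 * 100, 1_000_000)
timeout_case1 = min_udp_timeout(prelim_s)
timeout_case2 = min_udp_timeout(prelim_s, gap_s=2.0,
                                real_test_s=60.0)
```
		  ]]></artwork>
		</figure>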
				
		<t>
		<xref target="RFC4814"/> recommends pseudorandom port numbers, which we believe are a good 
		approximation of the distribution of the source port numbers that a NATxy gateway on the 
		Internet may face.
		</t>
		
		<t>
		We note that the enumeration of all possible source port 
		number destination port number combinations is not a requirement 
		for the first extreme situation, and the usage of different source 
		port number destination port number combinations is not a 
		requirement for the second extreme situation. Nevertheless, pseudorandom 
		enumeration of the source port number destination port number combinations 
		is a good solution in both cases. Such an enumeration can be generated 
		in a computationally efficient way by first enumerating all possible 
		source port number destination port number combinations and then preparing 
		a random permutation of them using Durstenfeld's random shuffle algorithm <xref target="DUST1964"/>.
		</t>
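		<t>A minimal Python sketch of such a pseudorandom enumeration is given below. 
		It assumes that the whole list of combinations fits into the memory of the 
		Tester; the function name and the seed parameter are hypothetical. The loop 
		is the in-place shuffle usually attributed to Durstenfeld.</t>
		<figure>
		  <artwork align="left"><![CDATA[
```python
import random


def shuffled_port_pairs(src_ports, dst_ports, seed=None):
    """Return every (source port, destination port) pair
    exactly once, in pseudorandom order."""
    rng = random.Random(seed)
    # Step 1: enumerate all possible combinations.
    pairs = [(s, d) for s in src_ports for d in dst_ports]
    # Step 2: in-place random permutation. Walk from the last
    # index down to 1 and swap with a random index j, 0 <= j <= i.
    for i in range(len(pairs) - 1, 0, -1):
        j = rng.randint(0, i)      # both bounds are inclusive
        pairs[i], pairs[j] = pairs[j], pairs[i]
    return pairs


pairs = shuffled_port_pairs(range(1024, 1034), range(80, 84),
                            seed=1)
```
		  ]]></artwork>
		</figure>
		<t>Because the result is a permutation, it satisfies both conditions at once: 
		every test frame uses a different combination, and all combinations are 
		enumerated.</t>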

		<t>Important warning: in normal (non-NAT) router testing, the port number selection algorithm, 
		whether it is pseudorandom or enumerated in increasing (or decreasing) order, does not affect 
		the final results. However, our experience with iptables shows that if the connection tracking table 
		is filled using port number enumeration in increasing order, then the maximum connection 
		establishment rate of iptables degrades significantly compared to its performance using 
		pseudorandom port numbers <xref target="LEN2021"/>.
		</t>
		
		<t>The enumeration of the source port number destination port number combinations 
		in increasing or decreasing order (or in any other specific order) MAY be used as an 
		additional measurement. 
		</t>
	  </section>
	  
	  <section anchor="meas_max_conn_est_rate" title="Measurement of the Maximum Connection Establishment Rate">
	    <t>The maximum connection establishment rate is an important characteristic of
		the stateful NATxy gateway and its determination is necessary for the safe 
		execution of the preliminary test phase (without frame loss) before the real 
		test phase.
		</t>
		<t>The measurement procedure of the maximum connection establishment rate is 
		very similar to the throughput measurement procedure defined in 
		<xref target="RFC2544"/>.
		</t>
		<t>Procedure: The Initiator sends a specific number of test frames using all
		different source port number destination port number combinations at a specific rate
		through the DUT. The Responder counts the frames that are successfully translated 
		by the DUT. If the count of offered frames is equal to the count of received
		frames, the rate of the offered stream is raised and the test is rerun.  If fewer 
		frames are received than were transmitted, the rate of the offered stream is 
		reduced and the test is rerun.
		</t>
		<t>The maximum connection establishment rate is the fastest rate at which 
		the count of test frames successfully translated by the DUT is equal to the number 
		of test frames sent to it by the Initiator.
		</t>
		<t>Notes:
		<list style="numbers">
		  <t>In practice, we RECOMMEND the usage of binary search.</t>
		  <t>As for the successful translation, the Responder MAY check that the 
		  source IP address is different from the original source IP address set by the 
		  Initiator. However, this still does not guarantee that the 
		  connection was established in the DUT. Therefore we RECOMMEND the usage of the validation 
		  of the connection establishment defined in <xref target="validation_of_conn"/>.
		  </t>
		</list>
		</t>
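		<t>The search over the frame rate can be sketched in Python as follows. 
		The function name and the "trial" callback are hypothetical: "trial" is assumed 
		to run one complete test at the given rate and to return True if and only if 
		no frame was lost. The sketch also assumes that trials pass up to some rate 
		and fail above it.</t>
		<figure>
		  <artwork align="left"><![CDATA[
```python
def max_passing_rate(trial, lo, hi):
    """Binary search for the highest integer rate in [lo, hi]
    at which trial(rate) is True. Returns None if no rate in
    the interval passes."""
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if trial(mid):
            best = mid        # passed: try a higher rate
            lo = mid + 1
        else:
            hi = mid - 1      # frames were lost: try a lower rate
    return best


# Stand-in for a real DUT: pretend it copes with at most
# 1,234,567 new connections per second.
def fake_trial(rate):
    return rate <= 1_234_567


found = max_passing_rate(fake_trial, 1, 10_000_000)
```
		  ]]></artwork>
		</figure>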
	  </section>

	  <section anchor="validation_of_conn" title="Validation of Connection Establishment">
	    <t>Due to "black box" testing, the entries of the connection tracking table of 
		the DUT cannot be directly examined, but the presence of the connections can be 
		checked easily by sending frames from the Responder to the Initiator in the Real 
		Test Phase using all four tuples stored in the state table of the Tester 
		(at a low enough frame rate). The arrival of all test frames indicates that the 
		connections are really present.
		</t>

		<t>Procedure: When the Initiator has sent all N test frames 
		to the Responder at frame rate R in the Preliminary Phase of the maximum connection 
		establishment rate measurement, and the Responder has successfully received all 
		N frames, the establishment of the connections is checked in the Real Test 
		Phase as follows:
		<list style="symbols">
		  <t>The Responder sends test frames to the Initiator at frame rate: r=R*alpha, 
		  for the duration of N/r using a different four tuple from its state table for 
		  each test frame.</t> 
		  <t>The Initiator counts the received frames. If all N frames have arrived, 
		  then the frame rate R used for the maximum connection establishment rate measurement is raised; 
		  otherwise it is lowered (as it also is when test frames were missing 
		  in the preliminary phase).</t>
		</list>
		</t>
		<t>Notes:		  
		  <list style="symbols">
		    <t>The alpha is a kind of "safety factor". Its aim is to make sure that 
			the frame rate used for the validation is not too high, so that the test may fail only 
			if at least one connection is not present in the connection 
			tracking table of the DUT. (So alpha should typically be less than 1, e.g. 
			0.8 or 0.5.)
			</t>
			<t>The duration of N/r at the frame rate of r means that N frames are sent for validation.</t>
			<t>The order of four tuple selection is arbitrary, provided that all four tuples are used.</t>
			<t>Please refer to <xref target="meas_contr_capacity"/> for a short analysis 
			of the operation of the measurement and what problems may occur.</t>
		  </list>
		</t>
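		<t>The validation parameters can be derived as in the following Python sketch. 
		The function name and the example numbers are our own illustrative assumptions; 
		only the relations r=R*alpha and duration=N/r come from the procedure above.</t>
		<figure>
		  <artwork align="left"><![CDATA[
```python
def validation_plan(n_frames, prelim_rate, alpha=0.5):
    """Return (r, duration) for the validation step.

    n_frames:    N, the number of connections to validate
    prelim_rate: R, the frame rate of the preliminary phase
    alpha:       the safety factor, typically less than 1
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    r = prelim_rate * alpha
    duration = n_frames / r    # sending at r for N/r sends N frames
    return r, duration


# Example: N = 4,000,000 connections were loaded at
# R = 1,000,000 frames per second.
r, duration = validation_plan(4_000_000, 1_000_000, alpha=0.5)
```
		  ]]></artwork>
		</figure>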
	  </section>

	  
	  <section anchor="real_test" title="Real Test Phase">
	    <t>As for the traffic direction, there are three possible cases during the real 
	    test phase:
	    <list style="symbols">
		  <t>bidirectional traffic: The Initiator sends test frames to the Responder and 
		  the Responder sends test frames to the Initiator.</t>
		  <t>unidirectional traffic from the Initiator to the Responder: The Initiator 
		  sends test frames to the Responder but the Responder does not send test frames to 
		  the Initiator.</t>
		  <t>unidirectional traffic from the Responder to the Initiator: The Responder 
		  sends test frames to the Initiator but the Initiator does not send test frames to 
		  the Responder.</t>
		</list>
		</t>
		<t>If the Initiator sends test frames, then it uses pseudorandom source port numbers and 
		destination port numbers from the restricted port number ranges. The responder receives 
		the test frames, updates its state table and processes the test 
		frames as required by the given measurement procedure (e.g. only counts them for 
		throughput test, handles timestamps for latency or PDV tests, etc.).
		</t>
		<t>If the Responder sends test frames, then it uses the four tuples from its state 
		table. The reading order of the state table may follow different policies (discussed
		in <xref target="st_wr_order"/>). The Initiator receives the test frames, and 
		processes them as required by the given measurement procedure.
		</t>
		<t>
		As for the actual measurement procedures, we RECOMMEND using the updated ones 
		from Section 7 of <xref target="RFC8219"/>.
		</t>
	  </section>

	  <section anchor="meas_conn_tear_down_rate" title="Measurement of the Connection Tear Down Rate">	  
		<t>Connection tear down can cause significant load for the NATxy gateway. 
		The connection tear down performance can be measured as follows:
	    <list style="numbers">
		  <t>Load a certain number of connections (N) into the connection 
		  tracking table of the DUT (in the same way as done to measure the 
		  maximum connection establishment rate).</t>
		  <t>Record TimestampA.</t>
		  <t>Delete the content of the connection tracking table of the DUT.</t>
		  <t>Record TimestampB.</t>
  		</list>
		The connection tear down rate can be computed as:
		</t>
        <t>connection tear down rate = N / ( TimestampB - TimestampA)
        </t>
		<t>The connection tear down rate SHOULD be measured for various values of N.
		</t>
        <t>We assume that the content of the connection tracking table may be deleted
		by an out-of-band control mechanism specific to the given NATxy gateway implementation. 
		(E.g. by removing the appropriate kernel module under Linux.)
		</t>
        <t>We are aware that the performance of removing the entire content of the connection 
		tracking table at one time may be different from removing all the entries one by one. 
		</t>
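		<t>The computation can be sketched in Python as follows. The "delete_all" 
		callback is hypothetical: it stands for the implementation specific out-of-band 
		mechanism mentioned above and is assumed to return only when the connection 
		tracking table is empty. The sketch takes both timestamps on the Tester, so 
		the latency of the control channel is included in the result.</t>
		<figure>
		  <artwork align="left"><![CDATA[
```python
import time


def tear_down_rate(n_connections, delete_all):
    """Return N / (TimestampB - TimestampA) in connections/s."""
    timestamp_a = time.perf_counter()
    delete_all()               # blocks until the table is empty
    timestamp_b = time.perf_counter()
    elapsed = timestamp_b - timestamp_a
    if elapsed <= 0:
        raise RuntimeError("deletion too fast to be timed")
    return n_connections / elapsed


# Stand-in for a real DUT: deletion takes about 50 ms.
rate = tear_down_rate(1000, lambda: time.sleep(0.05))
```
		  ]]></artwork>
		</figure>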
		
	  </section>

	  <section anchor="meas_contr_capacity" title="Measurement of the Connection Tracking Table Capacity">	  
		<t>The connection tracking table capacity is an important metric of stateful 
		NATxy gateways. Its measurement is not easy, because an elementary 
		step of a validated maximum connection establishment rate measurement (defined in 
		<xref target="validation_of_conn"/>) has only a few distinct observable outcomes, 
		and some of them may have several different root causes: 
	    <list style="numbers">
		  <t>During the preliminary phase, the number of test frames received by the 
		  Responder is less than the number of test frames sent by the Initiator. 
		  It may have different root causes, including:
		  <list style="numbers">
		    <t>The R frame sending rate was higher than the maximum connection 
			establishment rate. (Note that now the maximum connection 
			establishment rate is considered unknown, because we cannot measure the 
			maximum connection establishment rate without our assumption 2 in 
			<xref target="ctrl_conntrack"/>!)
			This root cause may be eliminated by lowering the R rate and re-executing 
			the test. (This step may be performed multiple times, while R>0.)</t>
			<t>The capacity of the connection tracking table of the DUT has been 
			  exhausted. (And either the DUT does not want to delete connections 
			  or the deletion of the connections makes it slower. This case is not
			  investigated further in the preliminary phase.)</t>
		  </list>
	      </t>
		  <t>During the preliminary phase, the number of test frames received by the 
		  Responder equals the number of test frames sent by the Initiator. 
		  In this case the connections are validated in the Real Test Phase. 
		  The validation may have two kinds of observable results:
		  <list style="numbers">
		    <t>The number of validation frames received by the Initiator 
			equals the number of validation frames sent by the Responder. 
			(It proves that the capacity of the connection tracking table of 
			the DUT is enough and both R and r were chosen properly.)</t>
			<t>The number of validation frames received by the Initiator 
			is less than the number of validation frames sent by the Responder. 
			This phenomenon may have various root causes:
			<list style="numbers">
			  <t>The capacity of the connection tracking table of the DUT has been 
			  exhausted. (It does not matter, whether some existing connections are 
			  discarded and new ones are stored, or the new connections are discarded.
			  Some connections are lost anyway, and it makes validation fail.)</t>
			  <t>The R frame sending rate used by the Initiator was too high in the 
			  Preliminary Phase and thus some connections were not established, 
			  even though all test frames arrived at the Responder. This root cause 
			  may be eliminated by lowering the R rate and re-executing the test. 
			  (This step may be performed multiple times, while R>0.)</t>
			  <t>The r frame sending rate used by the Responder was too high in the Real 
			  Test Phase and thus some test frames did not arrive at the Initiator, even 
			  though all connections were present in the connection tracking table of the DUT. 
			  This root cause may be eliminated by lowering the r rate and re-executing the test. 
			  (This step may be performed multiple times, while r>0.)</t>
			</list>
			The problem is that the above three root causes are indistinguishable, 
			so it is not easy to decide whether R or r should be decreased.
			</t>
		  </list>
		  </t>
		</list>		
		</t>
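		<t>The observable outcomes discussed above can be summarized as a small 
		decision function. The following C sketch is only an illustration: the names 
		classify and outcome, as well as the four counter parameters, are hypothetical 
		and are not taken from any existing Tester. Note that the function can only 
		report which outcome was observed; it cannot tell the possible root causes apart.
		</t>

        <figure align="center">
          <artwork align="left"><![CDATA[
#include <assert.h>

enum outcome {
  PRELIMINARY_LOSS, /* some test frames were lost */
  VALIDATION_LOSS,  /* some validation frames were lost */
  VALIDATED         /* all connections were validated */
};

/* Classify one elementary test from its four frame counters
   (hypothetical helper, for illustration only). */
enum outcome classify(long test_sent, long test_rcvd,
                      long valid_sent, long valid_rcvd)
{
  assert(test_rcvd <= test_sent && valid_rcvd <= valid_sent);
  if (test_rcvd < test_sent)
    return PRELIMINARY_LOSS; /* lower R or suspect the table */
  if (valid_rcvd < valid_sent)
    return VALIDATION_LOSS;  /* R, r or the table: ambiguous */
  return VALIDATED;          /* table, R and r are all fine */
}
            ]]></artwork>
        </figure>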

		<t>We have some experience with benchmarking stateful NATxy gateways. When we tested 
		iptables with a very high number of connections, the 256GB RAM of the DUT was 
		exhausted and it stopped responding. Such a situation may make connection 
		tracking table capacity measurements rather inconvenient. We include this 
		possibility in our recommended measurement procedure, but we do not address the detection 
		and elimination of such a situation (e.g., how the algorithm can reset the DUT).
		</t>

		<t>For the connection tracking table size measurement, first we need a safe 
		number: C0. It is a precondition that C0 connections can surely be 
		stored in the connection tracking table of the DUT. Using C0 connections, 
		one can determine the maximum connection establishment rate. 
		This is done with a binary search using validation. The result is R0. The values 
		C0 and R0 will serve as "safe" starting values for the following two searches.
		</t>

		<t>First, we perform an exponential search to find the order of magnitude of the 
		connection tracking table capacity. The search stops if the DUT collapses or 
		the maximum connection establishment rate drops severely (e.g., to one tenth 
		of its previous value) due to doubling the number of connections.
		</t>

		<t>The result of the exponential search gives the order of magnitude of 
		the size of the connection tracking table. Before describing the possible algorithms to
		determine the size of the connection tracking table, we consider three possible 
		replacement policies of the NATxy gateway:
	    <list style="numbers">
		  <t>The gateway does not delete any live connections until their timeout expires.</t>
		  <t>The gateway replaces the live connections according to LRU (least recently used) policy.</t>
		  <t>The gateway performs garbage collection when its connection tracking table is full 
		  and a frame with a new four tuple arrives. During the garbage collection, it deletes the K 
		  least recently used connections, where K is greater than 1.</t>
  		</list>		
		Now we examine what happens and how many validation frames arrive in the three cases. 
		Let the size of the connection tracking table be S, and the number of preliminary 
		frames be N, where S is less than N.
	    <list style="numbers">
		  <t>The connections defined by the first S test frames are registered into 
		  the connection tracking table of the DUT, and the last N-S connections are lost. 
		  (It is another question whether the last N-S test frames are translated and 
		  forwarded in the preliminary phase or simply dropped.) During validation, the validation 
		  frames with four tuples corresponding to the first S test frames will arrive at the 
		  Initiator, and the other N-S validation frames will be lost.</t>
		  <t>All connections are registered into the connection tracking table of the DUT, 
		  but the first N-S connections are replaced (and thus lost). During validation, 
		  the validation frames with four tuples corresponding to the last S test frames 
		  will arrive at the Initiator, and the other N-S validation frames will be lost.</t>
		  <t>Depending on the values of K, S and N, fewer than S connections may survive.
		  In the worst case, only S-K+1 validation frames arrive, even though the size of 
		  the connection tracking table is S.</t>
  		</list>
		If we know that the stateful NATxy gateway uses the first or second replacement 
		policy, and we also know that both R and r rates are low enough, then the final
		step of determining the size of the connection tracking table is simple. If the Responder 
		sent N validation frames and the Initiator received N' of them, then the size of the 
		connection tracking table is N'.
 		</t>
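		<t>The effect of the three replacement policies on the number of surviving 
		connections can be illustrated with a small simulation. The following C sketch 
		is not part of any Tester; the names policy, survivors and simulate are 
		hypothetical. It assumes that no connection is refreshed during the preliminary 
		phase, so that the least recently used connection is always the oldest one. 
		For each policy it returns the index of the first surviving test frame and the 
		number of surviving connections, which is also the expected number of validation 
		frames that arrive at the Initiator.
		</t>

        <figure align="center">
          <artwork align="left"><![CDATA[
#include <assert.h>

enum policy { KEEP_OLD = 1, LRU = 2, GARBAGE_COLLECT = 3 };

struct survivors {
  long first; /* index of the first surviving test frame */
  long count; /* number of surviving connections */
};

/* S: table size, N: number of preliminary frames,
   K: connections deleted per garbage collection (policy 3). */
struct survivors simulate(enum policy p, long S, long N, long K)
{
  struct survivors r = { 0, 0 };
  long i, size = 0;
  assert(S > 0 && N >= 0);
  switch (p) {
  case KEEP_OLD:         /* the first S connections survive */
    r.count = N < S ? N : S;
    r.first = 0;
    break;
  case LRU:              /* the last S connections survive */
    r.count = N < S ? N : S;
    r.first = N - r.count;
    break;
  case GARBAGE_COLLECT:  /* K oldest deleted when table is full */
    assert(K > 1 && K <= S);
    for (i = 0; i < N; i++) {
      if (size == S)
        size -= K;
      size++;
    }
    r.count = size;
    r.first = N - size;
    break;
  }
  return r;
}
            ]]></artwork>
        </figure>

		<t>For example, with S=10, K=4 and N=11, the third policy leaves only 
		S-K+1=7 connections in the table, whereas the first two policies leave 10.
		</t>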
		
		<t>In the general case, we perform a binary search to find the value of the connection 
		tracking table capacity within E error. The search chooses the lower half of 
		the interval if the DUT collapses or the maximum connection establishment 
		rate drops severely (e.g., to half of its previous value); otherwise, it chooses the higher half. 
		The search stops when the size of the interval is not greater than the E error.
		</t>		

		<t>The algorithms for the general case are defined using C-like pseudocode in 
		<xref target="meas_contr_capacity_algo"/>. In practice, this algorithm may 
		be made more efficient by stopping the binary search for the maximum 
		connection establishment rate as soon as an elementary test fails at a rate 
		under RS*beta or RS*gamma during the exponential search or during the final 
		binary search for the capacity of the connection tracking table, respectively. 
		(This saves a lot of execution time by eliminating the long-lasting tests at 
		low rates.)		
		</t>

        <figure anchor="meas_contr_capacity_algo" align="center" title="Measurement of the Connection Tracking Table Capacity">
          <preamble></preamble>

          <artwork align="left"><![CDATA[
// The binary_search_for_maximum_connection_establishment_rate(c,r) 
// function performs a binary search for the maximum connection 
// establishment rate in the [0, r] interval using c number of 
// connections.

// This is an exponential search for finding the order of magnitude 
// of the connection tracking table capacity
// Variables:
//   C0 and R0 are beginning safe values for connection tracking table 
//     size and connection establishment rate, respectively
//   CS and RS are their currently used safe values
//   CT and RT are their values for current examination
//   beta is a factor expressing unacceptable drop of R (e.g. beta=0.1)
R0=binary_search_for_maximum_connection_establishment_rate(C0,maxrate);
for ( CS=C0, RS=R0;  1; CS=CT, RS=RT )
{
  CT=2*CS;
  RT=binary_search_for_maximum_connection_establishment_rate(CT,RS);
  if ( DUT_collapsed || RT < RS*beta )
    break;
}
// here the size of the connection tracking table is between CS and CT

// This is the final binary search for finding the connection tracking  
// table capacity within E error
// Variables:
//   CS and RS are the safe values for connection tracking table size 
//     and connection establishment rate, respectively
//   C and R are the values for current examination
//   gamma is a factor expressing unacceptable drop of R 
//     (e.g. gamma=0.5)
for ( D=CT-CS;  D>E; D=CT-CS )
{
  C=(CS+CT)/2;
  R=binary_search_for_maximum_connection_establishment_rate(C,RS);
  if ( DUT_collapsed || R < RS*gamma)
    CT=C; // take the lower half of the interval
  else
    CS=C,RS=R; // take the upper half of the interval
}
// here the size of the connection tracking table is CS within E error
            ]]></artwork>

        <postamble></postamble>
        </figure>
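
		<t>The early termination of the rate search mentioned before 
		<xref target="meas_contr_capacity_algo"/> could look like the following C sketch. 
		It is only an illustration under several assumptions: rates are integers, 
		an elementary test that passes at a given rate also passes at all lower rates, 
		and the names elementary_test_fn, toy_test, search_rate_with_limit and limit 
		are hypothetical. The toy_test function merely stands in for a real elementary 
		test with validation. The caller would pass RS*beta or RS*gamma as limit.
		</t>

        <figure align="center">
          <artwork align="left"><![CDATA[
#include <assert.h>

/* Hypothetical hook: returns 1 if an elementary test using c
   connections at the given rate passes validation, else 0. */
typedef int (*elementary_test_fn)(long c, long rate);

/* Toy stand-in for a real DUT, used only to try the sketch:
   every test passes up to 1000 connections per second. */
int toy_test(long c, long rate)
{
  (void)c;
  return rate <= 1000;
}

/* Binary search in [0, maxrate] for the highest passing rate.
   If a test fails at a rate below limit, the search gives up
   and returns an upper bound, which is below limit. */
long search_rate_with_limit(long c, long maxrate, long limit,
                            elementary_test_fn test)
{
  long lo = 0, hi = maxrate;
  assert(c > 0 && maxrate >= 0 && limit >= 0);
  while (lo < hi) {
    long mid = lo + (hi - lo + 1) / 2; /* round up */
    if (test(c, mid)) {
      lo = mid;
    } else {
      hi = mid - 1;
      if (mid < limit)
        return hi; /* the verdict is known: rate < limit */
    }
  }
  return lo;
}
            ]]></artwork>
        </figure>

		<t>With limit set to 0, the function degenerates to a plain binary search. 
		Because the caller only compares the result with the same limit, returning 
		an upper bound instead of the exact rate does not change its decision.
		</t>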
		
	  </section>
	  
	  <section anchor="st_wr_order" title="Writing and Reading Order of the State Table">	  
		<t>As for the writing policy of the state table of the Responder, we RECOMMEND round robin, 
		because it ensures that its entries are automatically kept fresh and consistent with 
		those of the connection tracking table of the DUT.
		</t>
		<t>The Responder can read its state table in various orders, for example:
	    <list style="symbols">
		  <t>pseudorandom</t>
		  <t>round robin</t>
		</list>
		</t>
		<t>
		We RECOMMEND pseudorandom to follow the spirit of <xref target="RFC4814"/>. Round 
		robin may be used as a computationally cheaper alternative. 
		</t>
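		<t>The writing and reading orders above can be sketched in C as follows. 
		This is a minimal illustration, not the siitperf implementation: the types 
		four_tuple and state_table and the st_ functions are hypothetical, the four 
		tuple is IPv4-only for brevity, and rand() merely stands in for whatever 
		pseudorandom number generator a real Tester would use.
		</t>

        <figure align="center">
          <artwork align="left"><![CDATA[
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct four_tuple {
  uint32_t src_ip, dst_ip;
  uint16_t src_port, dst_port;
};

struct state_table {
  struct four_tuple *entry; /* caller-provided storage */
  size_t size;              /* capacity of the table */
  size_t used;              /* number of valid entries */
  size_t write_idx;         /* next slot to (over)write */
  size_t read_idx;          /* next slot for round robin read */
};

/* Round robin write: the oldest entry is overwritten first. */
void st_write(struct state_table *st, struct four_tuple ft)
{
  st->entry[st->write_idx] = ft;
  st->write_idx = (st->write_idx + 1) % st->size;
  if (st->used < st->size)
    st->used++;
}

/* Pseudorandom read; rand() is used only for brevity. */
struct four_tuple st_read_random(const struct state_table *st)
{
  assert(st->used > 0);
  return st->entry[(size_t)rand() % st->used];
}

/* Round robin read, the computationally cheaper alternative. */
struct four_tuple st_read_round_robin(struct state_table *st)
{
  struct four_tuple ft;
  assert(st->used > 0);
  ft = st->entry[st->read_idx];
  st->read_idx = (st->read_idx + 1) % st->used;
  return ft;
}
            ]]></artwork>
        </figure>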
	  </section>
	    
    </section>	

    <section anchor="impl_exp" title="Implementation and Experience">
	  <t>The "stateful" branch of siitperf <xref target="SIITPERF"/> is an implementation of this concept.
	  It is documented in an open access paper <xref target="LEN2022"/>. 
	  </t>
	  <t>Our experience with this methodology using siitperf for measuring the 
	  scalability of the iptables stateful NAT44 and Jool stateful NAT64 
	  implementations is described in 
	  <xref target="I-D.lencse-v6ops-transition-scalability"/>.
	  </t>	  
    </section>

	
    <section anchor="udp_or_tcp" title="Limitations of using UDP as Transport Layer Protocol">
	  <t>Stateful NATxy solutions handle TCP and UDP differently, e.g., iptables uses a 30s 
	  timeout for UDP and a 60s timeout for TCP. Thus benchmarking results produced using UDP do not 
      necessarily characterize the performance of a NATxy gateway well enough when it 
	  is used for forwarding Internet traffic. As for the given example, the timeout values of the DUT may 
      be adjusted, but this requires extra consideration. 	  
	  </t> 
	  <t>Other differences in handling UDP or TCP are also possible. Thus we recommend that 
	  further investigations be performed in this field.
	  </t>
	  <t>As a mitigation of this problem, we recommend testing with protocols that use TCP 
	  (such as HTTP and HTTPS), as described in 
	  <xref target="I-D.ietf-bmwg-ngfw-performance"/>. This approach also solves the potential 
	  problem that protocol helpers may be present in the stateful DUT. 
	  </t>
    </section>

   <section anchor="Acknowledgements" title="Acknowledgements">
     <t>The authors would like to thank Al Morton, Sarah Banks, Edwin Cordeiro, Lukasz Bromirski 
	 and Sandor Repas for their comments.</t>
   </section>

   <!-- Possibly a 'Contributors' section ... -->

   <section anchor="IANA" title="IANA Considerations">
     <t>This document does not make any request to IANA.</t>
   </section>

   <section anchor="Security" title="Security Considerations">
     <t>We have no further security considerations beyond those of <xref target="RFC8219"/>. 
	 Perhaps they should be cited here so that they are applied not only to the 
	 benchmarking of IPv6 transition technologies, but also to the benchmarking of 
	 stateful NATxy gateways.</t>
   </section>
 </middle>

 <!--  *****BACK MATTER ***** -->

 <back>
   <!-- References split into informative and normative -->

   <!-- There are 2 ways to insert reference entries from the citation libraries:
    1. define an ENTITY at the top, and use "ampersand character"RFC2629; here (as shown)
    2. simply use a PI "less than character"?rfc include="reference.RFC.2119.xml"?> here
       (for I-Ds: include="reference.I-D.narten-iana-considerations-rfc2434bis.xml")

    Both are cited textually in the same manner: by using xref elements.
    If you use the PI option, xml2rfc will, by default, try to find included files in the same
    directory as the including file. You can also define the XML_LIBRARY environment variable
    with a value containing a set of directories to search.  These can be either in the local
    filing system or remote ones accessed by http (http://domain/dir/... ).-->

   <references title="Normative References">
    <!--?rfc include="http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml"?-->
    &RFC2119;
	&RFC2544;
    &RFC4814;
	&RFC5180;
    &RFC8174;
	&RFC8219;

   </references>

   <references title="Informative References">
     <!-- Here we use entities that we defined at the beginning. -->

    <?rfc include='reference.I-D.ietf-bmwg-ngfw-performance'?>
    <?rfc include='reference.I-D.lencse-v6ops-transition-scalability'?>
	
    <reference anchor="DUST1964" 
    target="https://dl.acm.org/doi/10.1145/364520.364540">
      <front>
        <title>Algorithm 235: Random permutation
        </title>
        <author initials="R." surname="Durstenfeld">
          <organization></organization>
		</author>
        <date day="" month="July" year="1964"/>
      </front>
      <seriesInfo name="" value="Communications of the ACM, vol. 7, no. 7, p.420."/>
      <seriesInfo name="DOI" value="10.1145/364520.364540"/>
    </reference>
	
    <reference anchor="IIR2020" 
    target="https://www.iij.ad.jp/en/dev/iir/pdf/iir_vol49_report_EN.pdf">
      <front>
        <title>Periodic observation report: Internet trends as seen from IIJ infrastructure - 2020
        </title>

        <author initials="T." surname="Kurahashi">
          <organization></organization>
        </author>
        <author initials="Y." surname="Matsuzaki">
          <organization></organization>
        </author>
        <author initials="T." surname="Sasaki">
          <organization></organization>
        </author>
        <author initials="T." surname="Saito">
          <organization></organization>		  
        </author>
        <author initials="F." surname="Tsutsuji">
          <organization></organization>
        </author>
        <date day="" month="Dec" year="2020"/>
      </front>
      <seriesInfo name="" value="Internet Infrastructure Review, vol. 49"/>
    </reference>
	

    <reference anchor="LEN2020" 
    target="http://www.hit.bme.hu/~lencse/publications/291-1113-1-PB.pdf">
      <front>
        <title>Adding RFC 4814 Random Port Feature to Siitperf: Design, Implementation and Performance Estimation
        </title>

        <author initials="G." surname="Lencse">
          <organization></organization>
        </author>
        <date day="" month="" year="2020"/>
      </front>
      <seriesInfo name="" value="International Journal of Advances in Telecommunications, Electrotechnics, Signals and Systems, vol 9, no 3, pp. 18-26."/>
      <seriesInfo name="DOI" value="10.11601/ijates.v9i3.291"/>
    </reference>

    <reference anchor="LEN2021"  
    target="http://www.hit.bme.hu/~lencse/publications/SFNAT64-tester-for-review.pdf">
      <front>
        <title>Design and Implementation of a Software Tester for Benchmarking Stateful NAT64 Gateways: 
		Theory and Practice of Extending Siitperf for Stateful Tests
        </title>

        <author initials="G." surname="Lencse">
          <organization></organization>
        </author>

        <date day="" month="" year="2021"/>
      </front>
      <seriesInfo name="" value="it was under review in Computer Communications"/>    
      <seriesInfo name="" value="then it was significantly rewritten"/>
    </reference>


	
    <reference anchor="LEN2022"  
    target="http://www.hit.bme.hu/~lencse/publications/ECC-2022-SFNAT64xy-Tester-published.pdf">
      <front>
        <title>Design and Implementation of a Software Tester for Benchmarking Stateful NAT64xy Gateways: 
		Theory and Practice of Extending Siitperf for Stateful Tests
        </title>

        <author initials="G." surname="Lencse">
          <organization></organization>
        </author>

        <date day="" month="" year="2022"/>
      </front>
      <seriesInfo name="" value="Computer Communications, vol. 172, no. 1, pp. 75-88, August 1, 2022"/>    
      <seriesInfo name="DOI" value="10.1016/j.comcom.2022.05.028"/>
    </reference>

	   
	
    <reference anchor="SIITPERF" 
    target="https://github.com/lencsegabor/siitperf">
      <front>
        <title>Siitperf: An RFC 8219 compliant SIIT (stateless NAT64) tester written in C++ using DPDK
        </title>

        <author initials="G." surname="Lencse">
          <organization></organization>
        </author>

        <date day="" month="" year="2019-2022" />
      </front>
      <seriesInfo name="" value="source code"/>
      <seriesInfo name="" value="available from GitHub"/>
    </reference>
	<!-- 	-->
	
   </references>

   <section anchor="change_log" title="Change Log">
    <section title="00">
      <t>Initial version.
      </t>
    </section>
    <section title="01">
      <t>Updates based on the comments received on the BMWG mailing list and minor corrections.
      </t>
    </section>   
    <section title="02">
      <t><xref target="ctrl_conntrack"/> was completely rewritten. As a consequence, 
	  the occurrences of the now undefined "mostly different" source port number and destination 
	  port number combinations were also deleted from <xref target="meas_max_conn_est_rate"/>.
      </t>
    </section>  	  
    <section title="03">
      <t>Added <xref target="consider_stateful"/> about the consideration of the
	  cases of stateful operation.
      </t>
      <t>Consistency checking. Removal of some parts obsoleted by the previous re-writing 
	  of <xref target="ctrl_conntrack"/>. 
      </t>
      <t>Added <xref target="meas_conn_tear_down_rate"/> about the method for measuring connection tear down rate.
      </t>
      <t>Updates for <xref target="impl_exp"/> about the implementation and experience.
      </t>
    </section>
    <section title="04">
      <t>Update of the abstract.
      </t>
      <t>Added <xref target="validation_of_conn"/> about validation of connection establishment.
      </t>
      <t>Added <xref target="meas_contr_capacity"/> about the method for measuring connection tracking table capacity.
      </t>
      <t>Consistency checking and corrections. 
      </t>
    </section>   	
  </section>
  </back>
</rfc>