Benchmarking Methodology for Stateful NATxy Gateways using RFC 4814 Pseudorandom Port Numbers

The Initiator SHOULD use restricted ranges for source and destination port numbers to avoid the denial-of-service-attack-like event against the connection tracking table of the DUT described in . The source port number range SHOULD be larger (e.g., on the order of a few tens of thousands). The destination port number range SHOULD be smaller (it may vary from a few to several hundreds or thousands, as needed). The rationale is that source and destination port numbers observed in Internet traffic are not symmetrical. Source port numbers may be random, whereas there are a few very popular destination port numbers (e.g., 443, 80, etc., see ) and others hardly occur. We have also found that their roles are asymmetric in the Linux kernel routing hash function . The product of the sizes of the two ranges can be used as a parameter. The performance of the stateful NATxy gateway MAY be examined as a function of this parameter.

The preliminary phase serves two purposes:

1. The connection tracking table of the DUT is filled. This is important because the maximum connection establishment rate of the DUT may be lower than its maximum frame forwarding rate (that is, its throughput).
2. The state table of the Responder is filled with valid four tuples. This is a precondition for the Responder to be able to transmit frames that belong to connections that exist in the connection tracking table of the DUT.

Both of these are always necessary before the real test phase. The preliminary phase can also be used on its own, without a real test phase, which is the case when the maximum connection establishment rate is measured (as described in ).

A preliminary test phase MUST be performed before all tests performed in the real test phase. In this phase, the following things happen:

1. The Initiator sends test frames to the Responder through the DUT at a specific frame rate.
2. The DUT performs the stateful translation of the test frames and also stores the new combinations in its connection tracking table.
3. The Responder receives the translated test frames and updates its state table with the received four tuples. The Responder transmits no test frames during the preliminary phase.

When the preliminary test phase is performed in preparation for the real test phase, the applied frame rate and the duration of the preliminary phase SHOULD be carefully selected so that:

1. The applied frame rate is safely lower than the maximum connection establishment rate.
2. Enough four tuples are stored in the state table of the Responder for it to generate frames with the proper distribution of the four tuples.

Please refer to for further conditions regarding timeout and port number combinations.

We consider the most important Events that may happen during the operation of a stateful NATxy gateway, and the corresponding Actions of the gateway, as follows:

1. EVENT: A packet not belonging to an existing connection arrives in the private to public direction. ACTION: A new connection is registered into the connection tracking table, and the packet is translated and forwarded.
2. EVENT: A packet not belonging to an existing connection arrives in the public to private direction. ACTION: The packet is discarded.
3. EVENT: A packet belonging to an existing connection arrives (in either direction). ACTION: The packet is translated and forwarded, and the timeout counter of the corresponding connection tracking table entry is reset.
4. EVENT: A connection tracking table entry times out. ACTION: The entry is deleted from the connection tracking table.

Because of "black box" testing, the Tester cannot directly examine (or delete) the entries of the connection tracking table. However, the entries can be, and MUST be, controlled to produce meaningful and repeatable measurement results. This is done by setting an appropriate timeout value and carefully selecting the port numbers of the packets (as described in ).

We aim to support the measurement of the following performance characteristics of a stateful NATxy gateway:

1. maximum connection establishment rate
2. all "classic" performance metrics like throughput, frame loss rate, latency, etc.
3. connection tear down rate
4. connection tracking table capacity

It is necessary to control the connection tracking table entries of the DUT in order to achieve clear conditions for the measurements. The following two extreme situations can be achieved simply:

1. All frames create a new entry in the connection tracking table of the DUT, and no old entries are deleted during the test. This is required for measuring the maximum connection establishment rate.
2. No new entries are created in the connection tracking table of the DUT, and no old ones are deleted during the test. This is ideal for the real test phase measurements, like throughput, latency, etc.

From this point on, we use the following three assumptions:

1. A single source address destination address pair is used for all tests. We make this assumption for simplicity. Of course, we are aware that requires testing also with 256 different destination networks.
2. The connection tracking table of the stateful NATxy gateway is large enough to store all connections defined by the different source port number destination port number combinations.
3. Each experiment is started with an empty connection tracking table. (This can be ensured by deleting its content before the experiment.)

The first extreme situation can be achieved in two steps. First, a different source port number destination port number combination is used for every single test frame in the preliminary phase. Second, the UDP timeout of the NATxy gateway is set to a value higher than the length of the preliminary phase.

The second extreme situation can also be achieved in two steps. First, all the possible source port number destination port number combinations are enumerated in the preliminary phase. Second, the UDP timeout of the NATxy gateway is set to a value higher than the length of the preliminary phase plus the gap between the two phases plus the length of the real test phase.

REQUIRES pseudorandom port numbers, which we believe to be a good approximation of the distribution of the source port numbers that a NATxy gateway on the Internet may face.
We note that enumerating all possible source port number destination port number combinations is not a requirement for the first extreme situation. Likewise, using a different combination for every frame is not a requirement for the second extreme situation. Even so, a pseudorandom enumeration of the source port number destination port number combinations is a good solution in both cases. It can be generated in a computationally efficient way: first enumerate all possible source port number destination port number combinations, then prepare a random permutation of them using Durstenfeld's random shuffle algorithm .

Important warning: in normal (non-NAT) router testing, the port number selection algorithm does not affect the final results, whether it is pseudorandom or enumerates in increasing (or decreasing) order. However, our experience with iptables is different. If the connection tracking table is filled using port number enumeration in increasing order, then the maximum connection establishment rate of iptables degrades significantly compared to its performance using pseudorandom port numbers . The enumeration of the source port number destination port number combinations in increasing or decreasing order (or in any other specific order) MAY be used as an additional measurement.

The maximum connection establishment rate is an important characteristic of the stateful NATxy gateway. Its determination is necessary for the safe execution of the preliminary test phase (without frame loss) before the real test phase. Its measurement procedure is very similar to the throughput measurement procedure defined in .

Procedure: The Initiator sends a specific number of test frames through the DUT at a specific rate, using all different source port number destination port number combinations. The Responder counts the frames that are successfully translated by the DUT. If the count of offered frames is equal to the count of received frames, the rate of the offered stream is raised and the test is rerun. If fewer frames are received than were transmitted, the rate of the offered stream is reduced and the test is rerun. The maximum connection establishment rate is the fastest rate at which the count of test frames successfully translated by the DUT is equal to the number of test frames sent to it by the Initiator.

Notes: In practice, we RECOMMEND the usage of binary search. As for the successful translation, the Responder MAY check that the source IP address is different from the original source IP address set by the Initiator. However, this is still not a guarantee that the connection has been established in the DUT. Therefore, we RECOMMEND using the validation of connection establishment defined in .

Because of "black box" testing, the entries of the connection tracking table of the DUT cannot be examined directly. The presence of the connections can still be checked easily in the Real Test Phase. The Responder sends frames to the Initiator (at a low enough frame rate) using all four tuples stored in the state table of the Tester. The arrival of all test frames indicates that the connections are really present.

Procedure: Suppose the Initiator sent all the desired N test frames to the Responder at frame rate R in the Preliminary Phase of the maximum connection establishment rate measurement, and the Responder received all N frames. The establishment of the connections is then checked in the Real Test Phase as follows. The Responder sends test frames to the Initiator at frame rate r=R*alpha for a duration of N/r, using a different four tuple from its state table for each test frame. The Initiator counts the received frames. If all N frames have arrived, then the frame rate used for the maximum connection establishment rate measurement is raised. Otherwise it is lowered, just as in the case when test frames were missing in the preliminary phase.

Notes: alpha is a kind of "safety factor". Its aim is to make sure that the frame rate used for the validation is not too high, so that the test fails only if at least one connection is missing from the connection tracking table of the DUT. (So alpha should typically be less than 1, e.g., 0.8 or 0.5.) The duration of N/r and the frame rate of r mean that N frames are sent for validation. The order of four tuple selection is arbitrary, but all four tuples MUST be used. Please refer to for a short analysis of the operation of the measurement and what problems may occur.

As for the traffic direction, there are three possible cases during the real test phase:

1. bidirectional traffic: The Initiator sends test frames to the Responder, and the Responder sends test frames to the Initiator.
2. unidirectional traffic from the Initiator to the Responder: The Initiator sends test frames to the Responder, but the Responder does not send test frames to the Initiator.
3. unidirectional traffic from the Responder to the Initiator: The Responder sends test frames to the Initiator, but the Initiator does not send test frames to the Responder.

If the Initiator sends test frames, then it uses pseudorandom source port numbers and destination port numbers from the restricted port number ranges. The Responder receives the test frames and updates its state table. It then processes the test frames as required by the given measurement procedure (e.g., it only counts them for the throughput test, or handles timestamps for latency or PDV tests).

If the Responder sends test frames, then it uses the four tuples from its state table. The reading order of the state table may follow different policies (discussed in ). The Initiator receives the test frames and processes them as required by the given measurement procedure.

As for the actual measurement procedures, we RECOMMEND using the updated ones from Section 7 of .

Connection tear down can cause significant load for the NATxy gateway. The connection tear down performance can be measured as follows:

1. Load a certain number of connections (N) into the connection tracking table of the DUT (in the same way as done to measure the maximum connection establishment rate).
2. Record TimestampA.
3. Delete the content of the connection tracking table of the DUT.
4. Record TimestampB.

The connection tear down rate can be computed as:

   connection tear down rate = N / (TimestampB - TimestampA)

The connection tear down rate SHOULD be measured for various values of N. We assume that the content of the connection tracking table can be deleted by an out-of-band control mechanism specific to the given NATxy gateway implementation (e.g., by removing the appropriate kernel module under Linux). We are aware that removing the entire content of the connection tracking table at once may perform differently from removing all the entries one by one.

The connection tracking table capacity is an important metric of stateful NATxy gateways. Its measurement is not easy. An elementary step of a validated maximum connection establishment rate measurement (defined in ) has only a few distinct observable outcomes, and some of them may have several different root causes:

1. During the preliminary phase, the number of test frames received by the Responder is less than the number of test frames sent by the Initiator. This may have different root causes, including:
   a. The R frame sending rate was higher than the maximum connection establishment rate. (Note that the maximum connection establishment rate is now considered unknown, because we cannot measure it without our assumption 2 in !) This root cause may be eliminated by lowering the R rate and re-executing the test. (This step may be performed multiple times, while R>0.)
   b. The capacity of the connection tracking table of the DUT has been exhausted. (Either the DUT does not want to delete connections, or the deletion of the connections makes it slower. This case is not investigated further in the preliminary phase.)
2. During the preliminary phase, the number of test frames received by the Responder equals the number of test frames sent by the Initiator. In this case, the connections are validated in the Real Test Phase. The validation may have two kinds of observable results:
   a. The number of validation frames received by the Initiator equals the number of validation frames sent by the Responder. (This proves that the capacity of the connection tracking table of the DUT is sufficient and that both R and r were chosen properly.)
   b. The number of validation frames received by the Initiator is less than the number of validation frames sent by the Responder. This phenomenon may have various root causes:
      i. The capacity of the connection tracking table of the DUT has been exhausted. (It does not matter whether some existing connections are discarded and new ones are stored, or the new connections are discarded. Some connections are lost either way, which makes the validation fail.)
      ii. The R frame sending rate used by the Initiator was too high in the Preliminary Phase, and thus some connections were not established, even though all test frames arrived at the Responder. This root cause may be eliminated by lowering the R rate and re-executing the test. (This step may be performed multiple times, while R>0.)
      iii. The r frame sending rate used by the Responder was too high in the Real Test Phase, and thus some test frames did not arrive at the Initiator, even though all connections were present in the connection tracking table of the DUT. This root cause may be eliminated by lowering the r rate and re-executing the test. (This step may be performed multiple times, while r>0.)

And here is the problem: since the above three root causes are indistinguishable, it is not easy to decide whether R or r should be decreased.

We have some experience with benchmarking stateful NATxy gateways. When we tested iptables with a very high number of connections, the 256GB RAM of the DUT was exhausted and it stopped responding. Such a situation may make connection tracking table capacity measurements rather inconvenient. We include this possibility in our recommended measurement procedure, but we do not address the detection and elimination of such a situation (e.g., how the algorithm can reset the DUT).

For the connection tracking table size measurement, we first need a safe number, C0. It is a precondition that C0 connections can surely be stored in the connection tracking table of the DUT. Using C0, one can determine the maximum connection establishment rate for C0 connections. This is done with a binary search using validation, and the result is R0. The values C0 and R0 will serve as "safe" starting values for the following two searches.
First, we perform an exponential search to find the order of magnitude of the connection tracking table capacity. The search doubles the number of connections at each step. It stops if the DUT collapses OR the maximum connection establishment rate drops severely (e.g., to one tenth of its previous value). The result of the exponential search gives the order of magnitude of the size of the connection tracking table.

Before disclosing the possible algorithms to determine the size of the connection tracking table, we consider three possible replacement policies of the NATxy gateway:

1. The gateway does not delete any live connections until their timeout expires.
2. The gateway replaces the live connections according to the LRU (least recently used) policy.
3. The gateway performs garbage collection when its connection tracking table is full and a frame with a new four tuple arrives. During the garbage collection, it deletes the K least recently used connections, where K is greater than 1.

Now we examine what happens, and how many validation frames arrive, in the three cases. Let the size of the connection tracking table be S, and the number of preliminary frames be N, where S is less than N.

1. The connections defined by the first S test frames are registered into the connection tracking table of the DUT, and the last N-S connections are lost. (It is another question whether the last N-S test frames are translated and forwarded in the preliminary phase or simply dropped.) During validation, the validation frames with four tuples corresponding to the first S test frames will arrive at the Initiator, and the other N-S validation frames will be lost.
2. All connections are registered into the connection tracking table of the DUT, but the first N-S connections are replaced (and thus lost). During validation, the validation frames with four tuples corresponding to the last S test frames will arrive at the Initiator, and the other N-S validation frames will be lost.
3. Depending on the values of K, S, and N, fewer than S connections may survive. In the worst case, only S-K+1 validation frames arrive, even though the size of the connection tracking table is S.

Suppose we know that the stateful NATxy gateway uses the first or second replacement policy, and that both the R and r rates are low enough. Then the final step of determining the size of the connection tracking table is simple: if the Responder sent N validation frames and the Initiator received N' of them, then the size of the connection tracking table is N'.

In the general case, we perform a binary search to find the exact value of the connection tracking table capacity within E error. The search chooses the lower half of the interval if the DUT collapses OR the maximum connection establishment rate drops severely (e.g., to half of its previous value). Otherwise, it chooses the upper half. The search stops if the size of the interval is less than the E error. The algorithms for the general case are defined using C-like pseudocode in .

In practice, this algorithm may be made more efficient. The binary search for the maximum connection establishment rate can stop as soon as an elementary test fails below a threshold. That threshold is RS*beta during the exponential search and RS*gamma during the final binary search for the capacity of the connection tracking table. (This saves a lot of execution time by eliminating the long-lasting tests at low rates.)

   // final binary search for the capacity of the connection tracking table
   for ( D=CT-CS; D>E; D=CT-CS ) {
     C=(CS+CT)/2;
     R=binary_search_for_maximum_connection_establishment_rate(C,RS);
     if ( DUT_collapsed || R < RS*gamma )
       CT=C; // take the lower half of the interval
     else
       CS=C,RS=R; // take the upper half of the interval
   }
   // here the size of the connection tracking table is CS within E error

As for the writing policy of the state table of the Responder, we RECOMMEND round robin. It ensures that the entries are automatically kept fresh and consistent with those of the connection tracking table of the DUT. The Responder can read its state table in various orders, for example:

1. pseudorandom
2. round robin

We RECOMMEND pseudorandom to follow the spirit of . Round robin may be used as a computationally cheaper alternative.