<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-mzhang-nfsv4-sequence-id-calibration-03"
     ipr="trust200902">
  <front>
    <title abbrev="Sequence ID calibration">Sequence ID calibration for
    mis-ordered requests</title>

    <author fullname="Zhang Mingqian" initials="M" role="editor"
            surname="Zhang">
      <organization>Huawei Technologies Co.</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <code/>

          <region/>

          <country>China</country>
        </postal>

        <phone/>

        <facsimile/>

        <email>zhangmingqian.zhang@huawei.com</email>
      </address>
    </author>

    <author fullname="Yang Jing" initials="J" role="editor" surname="Yang">
      <organization>Huawei Technologies Co.</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country>China</country>
        </postal>

        <phone/>

        <facsimile/>

        <email>yangjing8@huawei.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Sai Chakravarthy Tangudu" initials="C" role="editor"
            surname="Tangudu">
      <organization>Huawei Technologies Co.</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country>India</country>
        </postal>

        <phone/>

        <facsimile/>

        <email>sai.chakravarthy.tangudu@huawei.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Rijesh Kunhi Parambattu" initials="K" role="editor"
            surname="Parambattu">
      <organization>Huawei Technologies Co.</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country>India</country>
        </postal>

        <phone/>

        <facsimile/>

        <email>rijesh.kunhi.parambattu@huawei.com</email>

        <uri/>
      </address>
    </author>

    <date day="21" month="April" year="2023"/>

    <area>Transport</area>

    <workgroup>Network File System Version 4</workgroup>

    <keyword>Mis-ordered request</keyword>

    <abstract>
      <t>This document updates RFC 7862, Network File System (NFS) version 4
      minor version 2, by adding two operations that allow a requester to
      recover from a reply of NFS4ERR_SEQ_MISORDERED without destroying the
      session.</t>

      <t>In NFSv4 minor version 1, sequence IDs are used to ensure that the
      size of the needed reply cache is tightly bounded. If the server
      receives a mis-ordered request, the client will often destroy the
      session and establish a new one with the server. This approach places a
      significant burden on both the client and the server, and IO
      performance is affected while the session is being rebuilt. This is
      especially troublesome when network latency is substantial, for
      example, when the client and server are in different locations. This
      document proposes extensions to NFSv4 that allow the client to recover
      without re-establishing the session.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "<bcp14>MUST</bcp14>", "<bcp14>MUST NOT</bcp14>",
      "<bcp14>REQUIRED</bcp14>", "<bcp14>SHALL</bcp14>", "<bcp14>SHALL
      NOT</bcp14>", "<bcp14>SHOULD</bcp14>", "<bcp14>SHOULD NOT</bcp14>",
      "<bcp14>RECOMMENDED</bcp14>", "<bcp14>NOT RECOMMENDED</bcp14>",
      "<bcp14>MAY</bcp14>", and "<bcp14>OPTIONAL</bcp14>" in this document are
      to be interpreted as described in BCP&nbsp;14 <xref format="default"
      sectionFormat="of" target="RFC2119"/> <xref format="default"
      sectionFormat="of" target="RFC8174"/> when, and only when, they appear
      in all capitals, as shown here.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>Using the process detailed in <xref format="default"
      sectionFormat="of" target="RFC8178"/>, the revisions in this document
      become an extension of NFSv4.2 <xref format="default" sectionFormat="of"
      target="RFC7862"/>. They are built on top of the external data
      representation (XDR) <xref format="default" sectionFormat="of"
      target="RFC4506"/> generated from <xref format="default"
      sectionFormat="of" target="RFC7863"/>.</t>

      <t>In NFSv4 minor version 1, according to <xref format="default"
      sectionFormat="of" target="RFC8881"/>, the error code
      NFS4ERR_SEQ_MISORDERED can be returned by three operations:<list style="symbols">
          <t>The first operation is CREATE_SESSION. Its csa_sequence argument
          is used by the client to serialize CREATE_SESSION requests via a
          per-client ID sequence number. In a CREATE_SESSION request,
          csa_sequence should be equal to either the sequence ID in the
          client ID's slot (a retry) or the slot's sequence ID + 1 (a correct
          new request). Otherwise, the server returns
          NFS4ERR_SEQ_MISORDERED.</t>

          <t>The second operation is SEQUENCE, which takes sa_sequenceid as
          one of its arguments. In a SEQUENCE request, if the difference
          between sa_sequenceid and the server's cached sequence ID at the
          slot ID is two (2) or more, or if sa_sequenceid is less than the
          cached sequence ID (accounting for wraparound of the unsigned
          sequence ID value), then the server <bcp14>MUST</bcp14> return
          NFS4ERR_SEQ_MISORDERED.</t>

          <t>The third operation is CB_SEQUENCE, which is similar to
          SEQUENCE and takes csa_sequenceid as one of its arguments. In a
          CB_SEQUENCE request, if the difference between csa_sequenceid and
          the client's cached sequence ID at the slot ID is two (2) or more,
          or if csa_sequenceid is less than the cached sequence ID
          (accounting for wraparound of the unsigned sequence ID value), then
          the client <bcp14>MUST</bcp14> return NFS4ERR_SEQ_MISORDERED.</t>
        </list></t>
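
      <t>The SEQUENCE and CB_SEQUENCE checks listed above reduce to one
      modular comparison between the sequence ID in the request and the
      sequence ID cached on the slot. The following Python fragment is a
      minimal, non-normative sketch of that comparison. The names SEQ_MOD
      and classify are hypothetical and are not taken from <xref
      format="default" sectionFormat="of" target="RFC8881"/>, and the
      sketch assumes that sequenceid4 is an unsigned 32-bit value.</t>

      <sourcecode name="" type="python"><![CDATA[
```python
# Illustrative only: SEQ_MOD and classify are hypothetical names.
SEQ_MOD = 2 ** 32  # assumption: sequenceid4 is an unsigned 32-bit value


def classify(cached_seqid, request_seqid):
    """Classify a request against the sequence ID cached on a slot."""
    # Distance from the cached ID to the requested ID, modulo 2**32,
    # so that wraparound from 0xFFFFFFFF to 0 is treated as "+1".
    delta = (request_seqid - cached_seqid) % SEQ_MOD
    if delta == 0:
        return "RETRY"  # same sequence ID as the cached one
    if delta == 1:
        return "NEW"    # the next expected request
    # Two or more ahead, or behind the cached ID.
    return "NFS4ERR_SEQ_MISORDERED"
```
]]></sourcecode>

      <t>With this sketch, a request carrying sequence ID 0 on a slot whose
      cached sequence ID is 0xFFFFFFFF is classified as a new request,
      whereas a request that is two ahead of, or one behind, the cached
      value is classified as mis-ordered.</t>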

      <t>Mis-ordered requests may happen as a result of a network partition,
      a software bug, etc. For such a request, the operations subsequent to
      SEQUENCE, if any, are not processed, and so the slot's state (sequence
      ID, cached reply) is not changed. That means that the requests before
      the mis-ordered one were processed correctly and the session state
      remains correct. In most current client implementations, this error
      code nevertheless causes the requester to destroy the session and
      create a new one. This process unacceptably interferes with ongoing IO
      operations, including IO on slots that are otherwise operating
      normally.</t>

      <t>As an example, consider a persistent session with several slots.
      Requests on the slots are being processed correctly and replies are
      being received normally. Suppose that, on one of these slots, the
      server receives a mis-ordered request and returns a response with the
      NFS4ERR_SEQ_MISORDERED error to the client. The client then destroys
      the session and establishes a new one. New requests are not performed
      until the pending operations have finished and the new session is
      ready. The effect on IO on the normal slots becomes dramatic when
      network latency is substantial, for example, when the client and
      server are in different locations. The client has to break the session
      because it does not know which sequence ID is expected for that
      session and slot. If the currently cached sequence ID were available
      to the client, there would be no need to break the session.</t>

      <t>Two operations, SEQUENCE_QUERY and CB_SEQUENCE_QUERY, are added so
      that the requester can query the cached sequence ID after receiving an
      NFS4ERR_SEQ_MISORDERED error.</t>
    </section>

    <section title="Operations for Sequence ID calibration">
      <section title="Operation 76: SEQUENCE_QUERY - Query sequence ID of designated session and slot for calibration">
        <section title="ARGUMENT">
		  <sourcecode name="" type="" markers="true"><![CDATA[
///
/// /*
///  * structure for sequenceid query
///  */
/// struct SEQUENCE_QUERY4args {
///         sessionid4      sqa_sessionid;
///         slotid4         sqa_slotid;
/// };
///
 ]]>
			</sourcecode>
        </section>

        <section title="RESULT">
			<sourcecode name="" type="" markers="true"><![CDATA[
///
/// struct SEQUENCE_QUERY4resok {
///         sessionid4      sqr_sessionid;
///         slotid4         sqr_slotid;
///         sequenceid4     sqr_sequenceid;
/// };
///
/// union SEQUENCE_QUERY4res switch (nfsstat4 sqr_status) {
/// case NFS4_OK:
///         SEQUENCE_QUERY4resok    sqr_resok4;
/// default:
///         void;
/// };
///
 ]]>
			</sourcecode>
        </section>

        <section title="DESCRIPTION">
          <t>The SEQUENCE_QUERY operation is used by the client to query the
          sequence ID that the server has cached for a designated session and
          slot.</t>

          <t>SEQUENCE_QUERY <bcp14>MUST</bcp14> be the sole operation type of
          any COMPOUND in which it appears. A single COMPOUND can contain
          multiple SEQUENCE_QUERY operations in order to query the cached
          sequence IDs of multiple slots. If a COMPOUND mixes SEQUENCE_QUERY
          with any other operation type, the error NFS4ERR_NOT_ONLY_OP is
          returned. Operations other than SEQUENCE, SEQUENCE_QUERY,
          BIND_CONN_TO_SESSION, EXCHANGE_ID, CREATE_SESSION, and
          DESTROY_SESSION <bcp14>MUST NOT</bcp14> appear as the first
          operation in a COMPOUND.</t>

          <t>If SEQUENCE_QUERY is received on a connection not associated with
          the session via CREATE_SESSION or BIND_CONN_TO_SESSION, and
          connection association enforcement is enabled (see Section 18.35 of
          <xref format="default" sectionFormat="of" target="RFC8881"/>),
          then the server returns NFS4ERR_CONN_NOT_BOUND_TO_SESSION.</t>

          <t>The sqa_sessionid argument identifies the session to which this
          request applies. The sqr_sessionid result <bcp14>MUST</bcp14> equal
          sqa_sessionid.</t>

          <t>The sqa_slotid argument is the index in the reply cache for the
          request. The sqr_slotid result <bcp14>MUST</bcp14> equal
          sqa_slotid.</t>

          <t>The sqr_sequenceid field is the cached sequence ID on the slot.
          The client <bcp14>SHOULD</bcp14> use this value to calibrate
          sa_sequenceid in the next SEQUENCE operation, that is,
          sqr_sequenceid+1 <bcp14>SHOULD</bcp14> be used as the sequence ID of
          the next request on this slot.</t>
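
          <t>The calibration step described above can be illustrated with a
          short, non-normative Python sketch. The class SlotState and the
          function calibrate are hypothetical names used only for this
          illustration, and the sketch assumes that the sequence ID wraps
          around as an unsigned 32-bit value.</t>

          <sourcecode name="" type="python"><![CDATA[
```python
# Illustrative only: SlotState and calibrate are hypothetical names.
SEQ_MOD = 2 ** 32  # assumption: sequenceid4 is an unsigned 32-bit value


class SlotState:
    """Client-side record of one slot of a session."""

    def __init__(self, slotid, next_seqid):
        self.slotid = slotid
        self.next_seqid = next_seqid  # sa_sequenceid for the next request


def calibrate(slot, sqr_slotid, sqr_sequenceid):
    """Apply a SEQUENCE_QUERY result to the client's view of a slot."""
    # The result is expected to echo the slot that was queried.
    if sqr_slotid != slot.slotid:
        raise ValueError("SEQUENCE_QUERY result is for a different slot")
    # The next request on this slot uses sqr_sequenceid + 1.
    slot.next_seqid = (sqr_sequenceid + 1) % SEQ_MOD
    return slot.next_seqid
```
]]></sourcecode>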
        </section>

        <section title="IMPLEMENTATION">
          <t>For the CREATE_SESSION and SEQUENCE operations, if the sequence
          ID in the request is mis-ordered (see Section 18.46.3 of <xref
          format="default" sectionFormat="of" target="RFC8881"/>), the
          replier fails the request with NFS4ERR_SEQ_MISORDERED and leaves
          the reply cache on the slot of this session unchanged. On receiving
          the NFS4ERR_SEQ_MISORDERED error code in the response, the client
          <bcp14>SHOULD</bcp14> use SEQUENCE_QUERY to query the cached
          sequence ID of the session and slot in order to calibrate its
          sequence ID for subsequent requests. That is, the sequence ID in
          the next request on this slot <bcp14>SHOULD</bcp14> be
          sqr_sequenceid+1.</t>
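
          <t>The recovery flow described above, in which a mis-ordered
          request is followed by a query and a resend, might look as follows.
          This is a non-normative Python sketch against an in-memory
          stand-in for a single slot. FakeSlotServer and
          send_with_calibration are hypothetical names, and the single
          resend is a simplification made for the example rather than a
          requirement of this document.</t>

          <sourcecode name="" type="python"><![CDATA[
```python
# Illustrative only: all names below are hypothetical.
SEQ_MOD = 2 ** 32  # assumption: sequenceid4 is an unsigned 32-bit value


class FakeSlotServer:
    """In-memory stand-in for one slot of a session on a server."""

    def __init__(self, cached_seqid):
        self.cached_seqid = cached_seqid

    def sequence(self, sa_sequenceid):
        delta = (sa_sequenceid - self.cached_seqid) % SEQ_MOD
        if delta == 1:
            self.cached_seqid = sa_sequenceid  # new request accepted
            return "NFS4_OK"
        if delta == 0:
            return "NFS4_OK"  # retry, answered from the reply cache
        return "NFS4ERR_SEQ_MISORDERED"  # slot state is left untouched

    def sequence_query(self):
        return self.cached_seqid  # read-only, no state change


def send_with_calibration(server, seqid):
    """Send one request; on a mis-order, query once and resend."""
    status = server.sequence(seqid)
    if status == "NFS4ERR_SEQ_MISORDERED":
        seqid = (server.sequence_query() + 1) % SEQ_MOD
        status = server.sequence(seqid)
    return status, seqid
```
]]></sourcecode>

          <t>In this sketch, a client that wrongly believes the next
          sequence ID to be 12 while the slot has 7 cached recovers by
          resending with sequence ID 8, and the session is never
          destroyed.</t>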

          <t>SEQUENCE_QUERY leaves the state of the slot (sequence ID, cached
          reply) unchanged and does not renew the lease of state related to
          the client ID.</t>

          <t>If the client queries a session ID that is unknown to the
          server, the server <bcp14>SHOULD</bcp14> return NFS4ERR_BADSESSION
          in the response.</t>

          <t>If the client attempts to query a slot that the replier does not
          have in its slot table (the slot may have been retired),
          NFS4ERR_BADSLOT <bcp14>SHOULD</bcp14> be returned in the
          response.</t>
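
          <t>The replier-side behavior described in this section can be
          sketched as a read-only lookup. The following Python fragment is
          non-normative. The function name sequence_query and the
          representation of the session table as a dictionary that maps a
          session ID to a dictionary of slot IDs and cached sequence IDs are
          assumptions made for the example.</t>

          <sourcecode name="" type="python"><![CDATA[
```python
# Illustrative only: the table layout and function name are hypothetical.
def sequence_query(sessions, sqa_sessionid, sqa_slotid):
    """Return (status, resok) for one SEQUENCE_QUERY operation.

    sessions maps a session ID to {slot ID: cached sequence ID}.
    The table is only read, so slot state is never modified.
    """
    slots = sessions.get(sqa_sessionid)
    if slots is None:
        return ("NFS4ERR_BADSESSION", None)  # unknown session ID
    if sqa_slotid not in slots:
        return ("NFS4ERR_BADSLOT", None)     # slot not in the slot table
    # resok echoes the session ID and slot ID from the arguments.
    return ("NFS4_OK", (sqa_sessionid, sqa_slotid, slots[sqa_slotid]))
```
]]></sourcecode>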
        </section>
      </section>

      <section title="Operation 16: CB_SEQUENCE_QUERY - Query backchannel sequence ID of designated session and slot for calibration">
        <section title="ARGUMENT">
		  <sourcecode name="" type="" markers="true"><![CDATA[
///
/// /*
///  * callback program structure for sequenceid query
///  */
/// struct CB_SEQUENCE_QUERY4args {
///         sessionid4      csqa_sessionid;
///         slotid4         csqa_slotid;
/// };
///
 ]]>
			</sourcecode>
        </section>

        <section title="RESULT">
		  <sourcecode name="" type="" markers="true"><![CDATA[
///
/// struct CB_SEQUENCE_QUERY4resok {
///         sessionid4      csqr_sessionid;
///         slotid4         csqr_slotid;
///         sequenceid4     csqr_sequenceid;
/// };
///
/// union CB_SEQUENCE_QUERY4res switch (nfsstat4 csqr_status) {
/// case NFS4_OK:
///         CB_SEQUENCE_QUERY4resok csqr_resok4;
/// default:
///         void;
/// };
///
 ]]>
			</sourcecode>
        </section>

        <section title="DESCRIPTION">
          <t>CB_SEQUENCE_QUERY is used by the server to calibrate the
          sequence ID of its callback requests on the backchannel of the
          session.</t>

          <t>For each CB_COMPOUND request, the first operation
          <bcp14>MUST</bcp14> be CB_SEQUENCE or CB_SEQUENCE_QUERY. If any
          other operation is in the first position of a CB_COMPOUND,
          NFS4ERR_OP_NOT_IN_SESSION <bcp14>MUST</bcp14> be returned.</t>

          <t>If the first operation is CB_SEQUENCE, CB_SEQUENCE
          <bcp14>MUST</bcp14> appear exactly once. The error
          NFS4ERR_SEQUENCE_POS <bcp14>MUST</bcp14> be returned when
          CB_SEQUENCE is found in any position in a CB_COMPOUND beyond the
          first. If the first operation of a CB_COMPOUND is
          CB_SEQUENCE_QUERY, CB_SEQUENCE_QUERY <bcp14>MUST</bcp14> be the
          sole operation type. Such a CB_COMPOUND can contain multiple
          CB_SEQUENCE_QUERY operations in order to request the cached
          sequence IDs of multiple designated sessions and slots. If any
          other operation is found in this CB_COMPOUND, NFS4ERR_NOT_ONLY_OP
          <bcp14>MUST</bcp14> be returned.</t>
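
          <t>The positional rules above can be summarized in a short,
          non-normative Python sketch that works on a list of operation
          names. The function name check_cb_compound is hypothetical. The
          sketch returns None when none of the three errors named above
          applies, and it does not attempt to cover cases that the text does
          not address, such as an empty CB_COMPOUND or a CB_SEQUENCE_QUERY
          that follows CB_SEQUENCE.</t>

          <sourcecode name="" type="python"><![CDATA[
```python
# Illustrative only: check_cb_compound is a hypothetical name.
def check_cb_compound(ops):
    """Return the positional error for a CB_COMPOUND, or None."""
    if not ops:
        # The text above does not cover this case.
        raise ValueError("empty CB_COMPOUND is not handled by this sketch")
    first = ops[0]
    if first not in ("CB_SEQUENCE", "CB_SEQUENCE_QUERY"):
        return "NFS4ERR_OP_NOT_IN_SESSION"
    if first == "CB_SEQUENCE":
        # CB_SEQUENCE may only appear in the first position.
        if "CB_SEQUENCE" in ops[1:]:
            return "NFS4ERR_SEQUENCE_POS"
        return None
    # First operation is CB_SEQUENCE_QUERY: it must be the sole type.
    if any(op != "CB_SEQUENCE_QUERY" for op in ops):
        return "NFS4ERR_NOT_ONLY_OP"
    return None
```
]]></sourcecode>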

          <t>The csqa_sessionid argument identifies the session to which this
          request applies. The csqr_sessionid result <bcp14>MUST</bcp14> equal
          csqa_sessionid.</t>

          <t>The csqa_slotid argument is the index in the reply cache for the
          request. The csqr_slotid result <bcp14>MUST</bcp14> equal
          csqa_slotid.</t>

          <t>The csqr_sequenceid field is the cached sequence ID on the slot.
          The server <bcp14>SHOULD</bcp14> use this value to calibrate
          csa_sequenceid in the next CB_SEQUENCE operation, that is,
          csqr_sequenceid+1 <bcp14>SHOULD</bcp14> be used as the sequence ID
          of the next callback request on this slot.</t>
        </section>

        <section title="IMPLEMENTATION">
          <t>For CB_SEQUENCE operations, if the sequence ID in the callback
          request is mis-ordered (see Section 20.9 of <xref format="default"
          sectionFormat="of" target="RFC8881"/>), the client fails the
          request with NFS4ERR_SEQ_MISORDERED and leaves the reply cache on
          the slot of this session unchanged. On receiving the
          NFS4ERR_SEQ_MISORDERED error code in the response, the server
          <bcp14>SHOULD</bcp14> use CB_SEQUENCE_QUERY to query the cached
          sequence ID of the session and slot in order to calibrate its
          sequence ID for subsequent requests. That is, the sequence ID in
          the next callback request on this slot <bcp14>SHOULD</bcp14> be
          csqr_sequenceid+1. CB_SEQUENCE_QUERY leaves the state of the slot
          (sequence ID, cached reply) unchanged, and the reply to
          CB_SEQUENCE_QUERY does not renew the lease of state related to the
          client ID on the server side.</t>

          <t>If the server queries a session ID that is unknown to the
          client, the client <bcp14>SHOULD</bcp14> return NFS4ERR_BADSESSION
          in the response.</t>

          <t>If the server attempts to query a slot that the client does not
          have in its slot table (the slot may have been retired),
          NFS4ERR_BADSLOT <bcp14>SHOULD</bcp14> be returned in the
          response.</t>
        </section>
      </section>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This memo includes no request to IANA.</t>

      <t>Note to RFC Editor: this section may be removed on publication as an
      RFC.</t>
    </section>

    <section anchor="xdr_desc" numbered="true" removeInRFC="false"
             title="Extraction of XDR" toc="default">

      <t>This document contains the external data representation (XDR) <xref
      format="default" sectionFormat="of" target="RFC4506"/> description of
      the new operations for sequence ID calibration. The XDR description is
      embedded in this document in a way that makes it simple for the reader
      to extract into a ready-to-compile form. The reader can feed this
      document into the following shell script to produce the
      machine-readable XDR description of the new operations:</t>

      <sourcecode markers="true" name="" type="">
#!/bin/sh
grep '^ *///' $* | sed 's?^ */// ??' | sed 's?^ *///$??'
</sourcecode>

      <t>That is, if the above script is stored in a file called "extract.sh",
      and this document is in a file called "spec.txt", then the reader can
      do:</t>

      <sourcecode markers="true" name="" type="">
sh extract.sh &lt; spec.txt &gt; seqid_calibration.x
</sourcecode>

      <t>The effect of the script is to remove leading white space from each
      line, plus a sentinel sequence of "///". XDR descriptions with the
      sentinel sequence are embedded throughout the document.</t>
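
      <t>For readers who prefer not to rely on grep and sed, the same
      filtering can be approximated in Python. The following fragment is a
      non-normative sketch. The function name extract_xdr is hypothetical,
      and the regular expressions are a direct transcription of the three
      stages of the shell pipeline above.</t>

      <sourcecode name="" type="python"><![CDATA[
```python
# Illustrative only: extract_xdr is a hypothetical name.
import re


def extract_xdr(text):
    """Return the XDR lines of a document, with the sentinel removed."""
    out = []
    for line in text.splitlines():
        # Equivalent of: grep '^ *///'
        if not re.match(r" *///", line):
            continue
        # Equivalent of: sed 's?^ */// ??'
        line = re.sub(r"^ */// ", "", line)
        # Equivalent of: sed 's?^ *///$??'
        line = re.sub(r"^ *///$", "", line)
        out.append(line)
    return "".join(item + "\n" for item in out)
```
]]></sourcecode>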

      <t>Note that the XDR code contained in this document depends on types
      from the NFSv4.2 nfs4_prot.x file (generated from <xref format="default"
      sectionFormat="of" target="RFC7863"/>). This includes both NFS types
      that end with a 4, such as offset4 and length4, and more generic types
      such as uint32_t and uint64_t.</t>

      <t>While the XDR can be appended to that from <xref format="default"
      sectionFormat="of" target="RFC7863"/>, the various code snippets belong
      in their respective areas of that XDR.</t>
    </section>

    <section anchor="sec_security" numbered="true" removeInRFC="false"
             title="Security Considerations" toc="default">

      <t>There are no new security considerations beyond those in <xref
      format="default" sectionFormat="of" target="RFC7862"/>.</t>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>The authors would like to acknowledge David Noveck, Thomas Haynes,
      Rick Macklem, and Tom Talpey for their reviews of the various versions
      of the draft.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>

      <?rfc include='reference.RFC.8881'?>

      <?rfc include='reference.RFC.8174'?>

      <?rfc include='reference.RFC.7862'?>

      <?rfc include='reference.RFC.7863'?>

      <?rfc include='reference.RFC.4506'?>

      <?rfc include='reference.RFC.8178'?>
    </references>

  </back>
</rfc>
