<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-mzhang-nfsv4-recursively-setting-04"
     ipr="trust200902">
  <front>
    <title abbrev="Recursively Setting Attributes">Recursively Setting
    Attributes of Subdirectories and Files</title>

    <author fullname="Minqian Zhang" initials="M." surname="Zhang">
      <organization>Huawei Technologies</organization>

      <address>
        <postal>
          <street>1899 Xiyuan</street>

          <city>Chengdu</city>

          <code>611731</code>

          <region>High-tech West District</region>

          <country>China</country>
        </postal>

        <phone>+86-13547833949</phone>

        <facsimile/>

        <email>zhangmingqian.zhang@huawei.com</email>
      </address>
    </author>

    <author fullname="Sunil Kumar Bhargo" initials="S." surname="Bhargo">
      <organization>VMware</organization>

      <address>
        <postal>
          <street/>
        </postal>

        <phone>+</phone>

        <facsimile/>

        <email>marx_bhargav@yahoo.com</email>
      </address>
    </author>

    <author fullname="Rijesh Kunhi Parambattu" initials="R."
            surname="Parambattu">
      <organization>Huawei Technologies</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <code/>

          <region/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>rijesh.kunhi.parambattu1@huawei.com</email>
      </address>
    </author>

    <author fullname="Dongyu Geng" initials="D." surname="Geng">
      <organization>Huawei Technologies</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <code/>

          <region/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>gengdongyu@huawei.com</email>
      </address>
    </author>

    <author fullname="Yunfei Du" initials="Y." surname="Du">
      <organization>Huawei Technologies</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <code/>

          <region/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>duyunfei1@huawei.com</email>
      </address>
    </author>

    <date day="26" month="July" year="2024"/>

    <area>Transport Area</area>

    <workgroup>Network File System Version 4</workgroup>

    <keyword>Recursively setting</keyword>

    <abstract>
      <t>In recent years, near-data computing has been widely adopted in
      storage architectures. The core idea is to process data close to where
      it is stored, which reduces network transmission overhead and makes
      use of the computing capability of smart devices (such as intelligent
      NICs, smart SSDs, and DPUs). This lowers CPU and memory usage on
      clients (computing nodes) and improves data processing efficiency.
      NFSv4.2 already applies this idea, and future NFS versions may do so
      as well. One example is Server-Side Copy, in which the client sends a
      control command and the storage server copies the data without
      transferring it through the client. A traditional copy, by contrast,
      reads the data from the source storage server and then writes it to
      the target storage server, so the data crosses the network twice.
      Server-Side Copy reduces network traffic and frees bandwidth. In
      addition, the client changes from executing the copy to controlling
      it, and the storage server performs the actual work, which saves a
      large amount of computing and memory resources on the client.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119">RFC 2119</xref>.</t>
    </note>
  </front>

  <middle>
    <section title="Problem Statement">
      <t>In practice, users often set attributes recursively on a directory
      and everything beneath it (its files and subdirectories). With the
      existing operations, the message exchange between client and server is
      complex and the client consumes a lot of resources, which does not
      match the concept of near-data computing. Figure 1 shows the existing
      flow for recursively setting the attributes of all files under a
      directory.</t>

      <t>Step 1: The client sends the READDIR command to obtain the list of
      all files in dir1.</t>

      <t>Step 2: The storage server responds to the READDIR operation. If the
      directory contains many subdirectories and files, the client needs to
      issue the READDIR operation multiple times.</t>

      <t>Step 3: The client sends a SETATTR request for each subdirectory and
      file.</t>

      <t>Step 4: The storage server responds to the SETATTR request.</t>

      <t>If the parent directory contains 100,000 files, the client needs to
      repeat Steps 3 and 4 100,000 times. The whole process consumes
      considerable CPU and memory on the client, and a large number of RPC
      messages are exchanged between the client and the storage server. As a
      result, the end-to-end time for the attribute set operation is
      relatively long.</t>

      <t><figure>

          <artwork><![CDATA[      
                 Client                                Server
                 +                                       +
                 |                                       |
                 |------ READDIR ----------------------->|
                 |<--------------------------------------|
                 |------ GETATTR ----------------------->|
                 |<--------------------------------------|
                 |------ SETATTR ----------------------->|
                 |<--------------------------------------|
                 |         ....                          |            
                 |                                       |

       Figure 1: Existing flowchart for recursive set operation

]]></artwork>

        </figure></t>
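
      <t>To make the cost of the existing flow concrete, the following Python
      sketch counts round trips for a single flat directory and compares them
      with one compound request that carries the offloaded operation. It is
      illustrative arithmetic only: the function names, the per_reply
      parameter (entries returned per READDIR reply), and the number of
      status polls are assumptions made for this example and are not part of
      the protocol.</t>

      <t><figure>
          <artwork><![CDATA[
```python
import math


def iterative_rpc_count(num_entries, per_reply):
    """Round trips for the READDIR + per-entry SETATTR flow.

    Assumes one flat directory. per_reply is a hypothetical
    number of entries returned by each READDIR reply.
    """
    readdir_calls = max(1, math.ceil(num_entries / per_reply))
    setattr_calls = num_entries  # one SETATTR per entry
    return readdir_calls + setattr_calls


def recursive_set_rpc_count(status_polls=0):
    """One COMPOUND with RECURSIVE_SET, plus optional polls."""
    return 1 + status_polls


iterative = iterative_rpc_count(100_000, 1_000)
offloaded = recursive_set_rpc_count(status_polls=3)
print(iterative, offloaded)
```
]]></artwork>
        </figure></t>

      <t>Under these assumptions the iterative flow is dominated by the
      per-entry SETATTR calls, while the offloaded flow depends only on how
      often the client polls.</t>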

      <t>Similar to the design of Server-Side Copy, this document proposes
      four new operations for recursively setting the attributes of a
      directory and its subdirectories and files. The operations can be used
      in synchronous or asynchronous mode. The four new operations are
      RECURSIVE_SET, RECURSIVE_SET_STATUS, RECURSIVE_SET_CANCEL, and
      CB_RECURSIVE_SET_NOTIFY.</t>

      <t>RECURSIVE_SET is used by the client to request that the attributes
      of directories and files be set recursively.</t>

      <t>RECURSIVE_SET_STATUS is used by the client to query the status of a
      recursive set operation requested with RECURSIVE_SET.</t>

      <t>RECURSIVE_SET_CANCEL is used by the client to cancel a recursive set
      operation.</t>

      <t>CB_RECURSIVE_SET_NOTIFY is used by the server to notify the client
      that a recursive set operation has finished.</t>
    </section>

    <section anchor="IANA" title="Protocol Overview">
      <t>With near-data computing, the scenario above can be optimized as
      follows.</t>

      <t>Step 1: The client determines that the target of the attribute
      setting is a directory and that the setting is recursive, and invokes
      the new RECURSIVE_SET operation in a compound request, for example:</t>

      <t><figure>
          <artwork><![CDATA[
   Compound request:

      SEQUENCE
      PUTFH (directory filehandle)
      RECURSIVE_SET
      SETATTR
      RECURSIVE_SET_STATUS
]]></artwork>
        </figure></t>

      <t>Step 2: When the storage server receives a compound request in which
      RECURSIVE_SET precedes SETATTR, and the current filehandle is a
      directory filehandle, the server creates a recursive set task,
      recursively enumerates all files and subdirectories in the directory,
      and sets the attributes on each of them. If the filehandle refers to a
      regular file, the server SHOULD return NFS4ERR_NOTDIR.</t>

      <t>Step 3: The storage server responds to the request once the recursive
      set operation has set the attributes of all subdirectories and files.
      RECURSIVE_SET can be either synchronous or asynchronous. If the client
      chooses synchronous RECURSIVE_SET, the server must respond to the
      client once it finishes the operation. If the server cannot complete
      the attribute setting within the timeout, it responds with the error
      code NFS4ERR_PENDING together with a recursive task id and a verifier.
      The client then queries the result periodically until the operation
      completes on the server. The recommended timeout for the synchronous
      operation is one third of the lease timeout.</t>

      <t>If the client chooses asynchronous mode, the server immediately
      returns the error code NFS4ERR_PENDING with a recursive task id and a
      verifier, and the client starts an observer task to wait for the
      notification from the server. The server sends a callback operation to
      the client once it finishes the RECURSIVE_SET operation. The client
      terminates the observer task when it receives the callback notification
      from the server.</t>

      <t>Compared to the original iterative process, the proposed process not
      only reduces CPU and memory usage on the client, but also significantly
      reduces the number of RPCs exchanged between the client and server.
      This greatly improves the performance of setting attributes on
      subdirectories and files.</t>

      <t><list style="symbols">
          <t>If no backchannel is created when the client and server establish
          a connection, the client can only use the synchronous mode in the
          RECURSIVE_SET request. If the client uses the asynchronous mode, the
          server returns the error code NFS4ERR_CB_PATH_DOWN.</t>

          <t>If a backchannel is already established, the client can choose to
          use synchronous or asynchronous mode.</t>
        </list></t>

      <t>Server reboot</t>

      <t>When the server reboots, the client will get NFS4ERR_BADSESSION. The
      client SHOULD retry the RECURSIVE_SET operation after re-establishing
      the client ID and completing the RECLAIM_COMPLETE procedure.</t>

      <t>Client reboot</t>

      <t>If the client sends the RECURSIVE_SET operation and the network
      between the client and server is later disrupted, the client lease may
      expire. After the lease expires, the server will terminate the
      RECURSIVE_SET operation, which might leave the files and directories
      under the parent directory on which the RECURSIVE_SET operation was
      executed only partially modified.</t>

      <t>Lease Consideration</t>

      <t>A RECURSIVE_SET operation is tied to a specific client instance, so
      if the client lease has expired, the server should cancel the
      RECURSIVE_SET operation. When attributes need to be set on a very large
      number of files, the server can determine the timeout, but the timeout
      must be less than the lease time.</t>

      <t>Backchannel Consideration</t>

      <t>Before the client sends a RECURSIVE_SET operation to the server, the
      client MUST check whether it has a backchannel established with the
      server. If there is no backchannel, the client MUST use only the
      synchronous RECURSIVE_SET operation. If a backchannel exists, the
      client can use either the synchronous or the asynchronous RECURSIVE_SET
      operation. If the server wants to send a callback operation over the
      backchannel of a session and no backchannel exists for the session, the
      server cannot establish the backchannel, because only the client can
      associate connections with the backchannel. If there is no such
      connection, the server indicates that the session has no backchannel by
      setting the SEQ4_STATUS_CB_PATH_DOWN_SESSION flag bit in the response to
      the next SEQUENCE operation from the client. The client can then
      associate a connection with the session (or destroy the session).</t>
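
      <t>The mode selection rules above can be sketched as two small checks,
      one on each side. This is a minimal Python illustration, not a
      definitive implementation: the function names and the boolean
      has_backchannel and prefer_async inputs are hypothetical, and status
      codes are represented by their names rather than numeric values.</t>

      <t><figure>
          <artwork><![CDATA[
```python
NFS4ERR_CB_PATH_DOWN = "NFS4ERR_CB_PATH_DOWN"


def client_pick_sync(has_backchannel, prefer_async):
    """Return the rsa_sync value a client would send."""
    if not has_backchannel:
        return True  # only synchronous mode is usable
    return not prefer_async


def server_admit(rsa_sync, has_backchannel):
    """Return an early error status, or None to start the task."""
    if not rsa_sync and not has_backchannel:
        # Asynchronous mode needs a path for the callback.
        return NFS4ERR_CB_PATH_DOWN
    return None
```
]]></artwork>
        </figure></t>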

      <t>Grace Consideration</t>

      <t>The RECURSIVE_SET operation must honor the server grace period.
      During the grace period, the server should return NFS4ERR_GRACE to the
      client, and the client should retry the request until the grace period
      is over.</t>

      <t>Position Consideration</t>

      <t>The RECURSIVE_SET operation MUST NOT be the first operation of the
      compound request, and a compound request containing the RECURSIVE_SET
      operation should always have SEQUENCE as its first operation.</t>
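
      <t>As an illustration of the ordering rules, the following Python
      sketch checks a list of operation names in request order. It also
      checks that RECURSIVE_SET is followed by SETATTR, as the operation
      description requires. The check_compound helper and its message
      strings are hypothetical and exist only for this example.</t>

      <t><figure>
          <artwork><![CDATA[
```python
def check_compound(ops):
    """Return a list of rule violations for a COMPOUND op list.

    ops is a list of operation names in request order.
    """
    problems = []
    if "RECURSIVE_SET" not in ops:
        return problems  # rules apply to recursive requests only
    if ops[0] != "SEQUENCE":
        problems.append("SEQUENCE must be the first operation")
    idx = ops.index("RECURSIVE_SET")
    if idx == 0:
        problems.append("RECURSIVE_SET must not be first")
    if "SETATTR" not in ops[idx + 1:]:
        problems.append("RECURSIVE_SET must precede SETATTR")
    return problems


good = ["SEQUENCE", "PUTFH", "RECURSIVE_SET", "SETATTR",
        "RECURSIVE_SET_STATUS"]
print(check_compound(good))
```
]]></artwork>
        </figure></t>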

    </section>

    <section anchor="Security" title="Implementation Considerations">
      <t>The recommended synchronous mode of the Recursive Set operation is
      shown in Figure 2.</t>

      <t>Step 1: The client sends a RECURSIVE_SET request. In the request,
      rsa_sync must be set to true.</t>

      <t>Step 2: If the storage server finishes recursively setting the
      attributes within the timeout period, it returns the result to the
      client. If the attributes are not set within the timeout period, the
      server must generate rsr_callback_id and rsr_recursiveverf, return them
      to the client, and respond with NFS4ERR_PENDING.</t>

      <t>Step 3: The client sends a RECURSIVE_SET_STATUS request. The request
      carries rssa_recursive_taskid, which should be set to the
      rsr_callback_id obtained from the response to the RECURSIVE_SET
      operation. If the value of rssa_recursive_taskid matches the
      rsr_callback_id cached on the storage server, the storage server
      returns the current status of the attribute set operation: NFS4_OK if
      the server has set all the attributes, or NFS4ERR_PENDING if the
      operation is still in progress. If the server encountered an error
      while setting attributes, the result code must be cached and returned
      in the response. If the value of rssa_recursive_taskid in the request
      differs from the value cached on the server, the storage server returns
      the error code NFS4ERR_INVAL.</t>

      <t>Step 4: The client decodes the response. If the status is
      NFS4ERR_PENDING, the client retries the RECURSIVE_SET_STATUS query after
      a delay. If the status is NFS4_OK, the recursive attribute setting
      succeeded. If the SETATTR operation encountered an error, the recursive
      attribute setting failed. In this case, the client returns a response
      to the application.</t>


      <t><figure>

          <artwork><![CDATA[      
                 Client                                                     Server
                 +                                                             +
                 |                                                             |
                 |------ RECURSIVE_SET(rsa_sync = 1) ------------------------> |
                 |                                                             |
                 |<-----Response(rsr_callback_id = 0, rsr_recursiveverf = 0)---|  within the timeout period
                 |                                                             |  
                 |                                                             |
                 |<----Response(rsr_callback_id = 1, rsr_recursiveverf = 1)----|  beyond the timeout period
                 |                                                             |  
                 |                                                             |
                 |                                                             |
                 |-------RECURSIVE_SET_STATUS(rssa_recursive_taskid = 1)-----> |
                 |                                                             |
                 |<------Response--------------------------------------------- |
                 |                                                             |   
                 |                                                             |          

                           Figure 2:  A synchronous Recursive Set

]]></artwork>

        </figure></t>
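
      <t>The synchronous steps above can be sketched from the client side as
      a simple polling loop. This is a minimal Python illustration under
      stated assumptions: send_recursive_set and send_status are
      hypothetical stand-ins for the RPC layer, the delay and max_polls
      parameters are invented for the example, and status codes are
      represented by their names.</t>

      <t><figure>
          <artwork><![CDATA[
```python
import time

NFS4_OK = "NFS4_OK"
NFS4ERR_PENDING = "NFS4ERR_PENDING"


def wait_for_recursive_set(send_recursive_set, send_status,
                           delay=1.0, max_polls=10,
                           sleep=time.sleep):
    """Drive a synchronous RECURSIVE_SET to a final status.

    send_recursive_set() -> (status, task_id, verifier)
    send_status(task_id) -> status
    Both callables are hypothetical stand-ins for the RPC layer.
    """
    status, task_id, _verifier = send_recursive_set()
    polls = 0
    while status == NFS4ERR_PENDING and polls < max_polls:
        sleep(delay)  # wait before polling again
        status = send_status(task_id)
        polls += 1
    return status  # NFS4_OK, a SETATTR error, or still PENDING


# Usage with a fake server that finishes on the second poll.
replies = iter([NFS4ERR_PENDING, NFS4_OK])
result = wait_for_recursive_set(
    lambda: (NFS4ERR_PENDING, 1, 1),
    lambda task_id: next(replies),
    sleep=lambda seconds: None,
)
print(result)
```
]]></artwork>
        </figure></t>

      <t>Bounding the number of polls is a design choice of this sketch, so
      that a client does not wait indefinitely on a task that never
      completes.</t>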

      <t>An alternative, asynchronous mode of the Recursive Set operation is
      shown in Figure 3.</t>

      <t>Step 1: The client sends a RECURSIVE_SET request. In the request,
      rsa_sync flag should be set to false.</t>

      <t>Step 2: The storage server generates rsr_callback_id and
      rsr_recursiveverf, sets the status to NFS4ERR_PENDING, and continues
      executing the recursive set operation.</t>

      <t>Step 3: After receiving the response, and if the error code is
      NFS4ERR_PENDING, the client starts an asynchronous task to receive the
      callback message from the server.</t>

      <t>Step 4: The asynchronous listening task created by the client
      matches rsr_callback_id and rsr_recursiveverf against each callback it
      receives. If both parameters match, the callback is a valid response.
      If rsr_callback_id matches but rsr_recursiveverf does not, the client
      skips the message.</t>

      <t>Step 5: If the client does not receive the asynchronous message, the
      started task is forcibly terminated when the session is destroyed.</t>

      <t>If an error occurs while the storage server is recursively setting
      the attributes of subdirectories and files, the storage server
      terminates the task and returns the error code to the client. The
      possible errors are those defined for SETATTR.</t>

      <t><figure>

          <artwork><![CDATA[   
                 Client                                                     Server
                 +                                                             +
                 |                                                             |
                 |------ RECURSIVE_SET(rsa_sync = 0) ------------------------->|
                 |                                                             |
                 |<------Response(rsr_callback_id = 1, rsr_recursiveverf = 1)--|  
                 |                                                             |  
                 |                                                             |
                 |<------CB_RECURSIVE_SET_NOTIFY-------------------------------|
                 |                                                             | 
                 |                                                             |
                 |                                                             |

                            Figure 3: An asynchronous Recursive Set

]]></artwork>

        </figure></t>
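
      <t>The matching performed by the client in the asynchronous steps above
      might look like the following Python sketch. The PendingTasks class
      and its method names are hypothetical. The sketch assumes the client
      keeps one verifier per outstanding task id and, following the
      CB_RECURSIVE_SET_NOTIFY description, answers a mismatched notification
      with NFS4ERR_INVAL.</t>

      <t><figure>
          <artwork><![CDATA[
```python
NFS4_OK = "NFS4_OK"
NFS4ERR_INVAL = "NFS4ERR_INVAL"


class PendingTasks:
    """Client-side table of outstanding asynchronous tasks."""

    def __init__(self):
        self._verf_by_task = {}  # task id -> verifier
        self.results = {}        # task id -> final status

    def remember(self, task_id, verifier):
        """Record the ids returned with NFS4ERR_PENDING."""
        self._verf_by_task[task_id] = verifier

    def on_notify(self, task_id, verifier, status):
        """Handle a notification; return the reply status."""
        known = task_id in self._verf_by_task
        if not known or self._verf_by_task[task_id] != verifier:
            return NFS4ERR_INVAL  # skip: unknown id or verifier
        del self._verf_by_task[task_id]
        self.results[task_id] = status  # wakes the waiting task
        return NFS4_OK


tasks = PendingTasks()
tasks.remember(1, b"verf-1")
print(tasks.on_notify(1, b"other", NFS4_OK))
print(tasks.on_notify(1, b"verf-1", NFS4_OK))
```
]]></artwork>
        </figure></t>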

    </section>

    <section anchor="Acknowledgements" title="Recursive Set Operations">
      <t>4.1 Operation TBD1: RECURSIVE_SET &ndash; Recursively set the
      attributes of a directory and its subdirectories and files</t>

      <t>ARGUMENT</t>

      <t><figure>
          <artwork><![CDATA[
<CODE BEGINS>

struct RECURSIVE_SET4args {
        /* true: synchronous mode; false: asynchronous mode */
        bool            rsa_sync;
};

<CODE ENDS>
]]></artwork>
        </figure></t>

      <t>RESULT</t>

      <t><figure>
          <artwork><![CDATA[
<CODE BEGINS>

struct recursive_set_response4 {
        recursive_taskid4       rsr_callback_id;
        verifier4               rsr_recursiveverf;
};

union RECURSIVE_SET4res switch (nfsstat4 rsr_status) {
 case NFS4_OK:
         recursive_set_response4 rsr_resok4;
 default:
         void;
};

<CODE ENDS>
]]></artwork>
        </figure></t>

      <t>DESCRIPTION</t>

      <t>The RECURSIVE_SET operation is used by the client to recursively set
      the attributes of a directory and all its subdirectories and files. The
      operation should be placed before SETATTR in the compound request. When
      the storage server receives a compound request containing SETATTR and
      the SETATTR operation is not preceded by RECURSIVE_SET, the existing
      SETATTR processing is unchanged. If the SETATTR operation is preceded
      by RECURSIVE_SET, the storage server applies the attributes to the
      directory and to its subdirectories and files in recursive set
      mode.</t>

      <t>If the server completes the operation successfully, the values of
      rsr_callback_id and rsr_recursiveverf are 0.</t>

      <t>If the recursive SETATTR operation does not complete on the server
      within the timeout period, the server generates values for
      rsr_callback_id and rsr_recursiveverf.</t>

      <t>If rsa_sync is set to true, one of the implementations below can be
      chosen.</t>

      <t><list style="symbols">
          <t>According to the NFSv4 protocol, the client must wait for the
          response from the server. Therefore, the client can simply wait for
          the processing result from the server. A problem with this approach
          is that the request occupies a slot in the session, which reduces
          the number of available slots. If multiple tasks of the same type
          are running, in extreme cases no slot is available on the
          client.</t>

          <t>The storage server determines the execution duration. If
          execution takes too long, the storage server may return non-zero
          values of rsr_callback_id and rsr_recursiveverf.</t>
        </list></t>

      <t>After the client receives the response, it waits for a period of
      time and then issues RECURSIVE_SET_STATUS to query the progress of the
      current task. If the server has not finished, NFS4ERR_PENDING is
      returned, and the client retries the query after a further delay. If
      the execution is complete, NFS4_OK is returned.</t>

      <t>4.2 Operation TBD2: RECURSIVE_SET_STATUS &ndash; Query the result of
      recursively setting the attributes of subdirectories and files</t>

      <t>ARGUMENT</t>

      <t><figure>
          <artwork><![CDATA[
<CODE BEGINS>

struct RECURSIVE_SET_STATUS4args {
        stateid4        rssa_recursive_taskid;
};

<CODE ENDS>
]]></artwork>
        </figure></t>

      <t>RESULT</t>

      <t><figure>
          <artwork><![CDATA[
<CODE BEGINS>

#define NFS4ERR_PENDING 10090

struct RECURSIVE_SET_STATUS4res {
        nfsstat4        rssr_status;
};

<CODE ENDS>
]]></artwork>
        </figure></t>

      <t>DESCRIPTION</t>

      <t>rssa_recursive_taskid has the same value as rsr_callback_id in the
      RECURSIVE_SET response. The RECURSIVE_SET_STATUS operation is used by
      the client to query the status of a recursive set task (setting the
      attributes of subdirectories and files). The server must check whether
      rssa_recursive_taskid matches a task id on the server. If it matches
      and the task on the storage server is complete, NFS4_OK is returned. If
      an error occurred during task execution, that error code is returned
      without being extended or modified, so it is one of the error codes
      that the SETATTR operation can return. If the task is not yet
      complete, NFS4ERR_PENDING is returned.</t>
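
      <t>On the server side, the status lookup described above could be
      sketched in Python as follows. The TaskTable class and its start,
      finish, and query methods are hypothetical names for this example. The
      sketch assumes the server caches one result per task id and keeps it
      until the client has queried it.</t>

      <t><figure>
          <artwork><![CDATA[
```python
NFS4_OK = "NFS4_OK"
NFS4ERR_PENDING = "NFS4ERR_PENDING"
NFS4ERR_INVAL = "NFS4ERR_INVAL"


class TaskTable:
    """Server-side record of recursive set tasks (illustrative)."""

    def __init__(self):
        # task id -> None while running, else the cached result
        self._status = {}

    def start(self, task_id):
        self._status[task_id] = None

    def finish(self, task_id, result=NFS4_OK):
        # result is NFS4_OK or the SETATTR error that stopped it
        self._status[task_id] = result

    def query(self, task_id):
        """Answer RECURSIVE_SET_STATUS for task_id."""
        if task_id not in self._status:
            return NFS4ERR_INVAL
        result = self._status[task_id]
        return NFS4ERR_PENDING if result is None else result


table = TaskTable()
table.start(1)
print(table.query(1))
table.finish(1)
print(table.query(1))
```
]]></artwork>
        </figure></t>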

      <t>4.3 Operation TBD3: RECURSIVE_SET_CANCEL &ndash; Canceling a Running
      Task on the Client</t>

      <t>ARGUMENT</t>

      <t><figure>
          <artwork><![CDATA[
<CODE BEGINS>

/*
 * The following operation is used to cancel the recursive
 * setting task that is being executed.
 */
struct RECURSIVE_SET_CANCEL4args {
        stateid4        rsca_recursive_taskid;
};

<CODE ENDS>
]]></artwork>
        </figure></t>

      <t>RESULT</t>

      <t><figure>
          <artwork><![CDATA[
<CODE BEGINS>

struct RECURSIVE_SET_CANCEL4res {
        nfsstat4        rscr_status;
};

<CODE ENDS>
]]></artwork>
        </figure></t>

      <t>DESCRIPTION</t>

      <t>RECURSIVE_SET_CANCEL is used to cancel a task that is being
      executed. The request carries rsca_recursive_taskid, whose value is
      obtained from the response to RECURSIVE_SET. If the storage server
      fails to cancel the task, NFS4ERR_DELAY is returned. On receiving this
      error, the client retries after a delay. If the current task is
      complete, NFS4_OK is returned.</t>

      <t>4.4 Operation TBD4: CB_RECURSIVE_SET_NOTIFY &ndash; Notify the
      client of the result of a recursive set task</t>

      <t>ARGUMENT</t>

      <t><figure>
          <artwork><![CDATA[
<CODE BEGINS>

struct CB_RECURSIVE_SET_NOTIFY4args {
        nfs_fh4         crsna_fh;
        stateid4        crsna_recursive_taskid;
        verifier4       crsna_recursiveverf;
        nfsstat4        crsna_status;
};

<CODE ENDS>
]]></artwork>
        </figure></t>

      <t>RESULT</t>

      <t><figure>
          <artwork><![CDATA[
<CODE BEGINS>

struct CB_RECURSIVE_SET_NOTIFY4res {
        nfsstat4        crsnr_status;
};

<CODE ENDS>
]]></artwork>
        </figure></t>

      <t>DESCRIPTION</t>

      <t>CB_RECURSIVE_SET_NOTIFY is the callback the server sends to notify
      the client of the result of a task that recursively sets the attributes
      of subdirectories and files. The client checks crsna_recursive_taskid
      and crsna_recursiveverf. If they match the values received in the
      earlier RECURSIVE_SET response, the client finishes the wait task. If
      they do not match, the client skips the notification and returns
      NFS4ERR_INVAL to the server.</t>

      <t>Race condition between CB_RECURSIVE_SET_NOTIFY and
      RECURSIVE_SET_STATUS: a race can occur if a RECURSIVE_SET_STATUS
      request is in flight when the server sends CB_RECURSIVE_SET_NOTIFY. In
      this case the server may have cleaned up the recursive_taskid before it
      receives the RECURSIVE_SET_STATUS from the client. The server may then
      return NFS4ERR_INVAL, and the client should handle this
      gracefully.</t>
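
      <t>One possible way for a client to handle this race is sketched below
      in Python. It is an assumption of this example, not a requirement of
      the protocol, that the client records results delivered by
      CB_RECURSIVE_SET_NOTIFY in a table (here the hypothetical notified
      dictionary) and prefers that record when a late RECURSIVE_SET_STATUS
      reply carries NFS4ERR_INVAL.</t>

      <t><figure>
          <artwork><![CDATA[
```python
NFS4_OK = "NFS4_OK"
NFS4ERR_INVAL = "NFS4ERR_INVAL"


def resolve_status(task_id, status_reply, notified):
    """Pick the final status when STATUS and NOTIFY race.

    notified maps task id -> status already delivered by
    CB_RECURSIVE_SET_NOTIFY (a hypothetical client table).
    """
    if status_reply == NFS4ERR_INVAL and task_id in notified:
        # The server cleaned up the task after notifying us.
        return notified[task_id]
    return status_reply


print(resolve_status(7, NFS4ERR_INVAL, {7: NFS4_OK}))
```
]]></artwork>
        </figure></t>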
    </section>

    <section title="Security Considerations">
      <t>TBD</t>
    </section>

    <section title="IANA Considerations">
      <t>TBD</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>

      <?rfc include="reference.RFC.7862"?>
    </references>


  </back>
</rfc>
