<?xml version="1.0" encoding="US-ASCII"?>
<!-- This is built from a template for a generic Internet Draft. Suggestions for
     improvement welcome - write to Brian Carpenter, brian.e.carpenter @ gmail.com 
     This can be converted using the Web service at http://xml.resource.org/ -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<!-- You want a table of contents -->
<!-- Use symbolic labels for references -->
<!-- This sorts the references -->
<!-- Change to "yes" if someone has disclosed IPR for the draft -->
<!-- This defines the specific filename and version number of your draft (and inserts the appropriate IETF boilerplate -->
<?rfc sortrefs="yes"?>
<?rfc toc="yes"?>
<?rfc symrefs="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc topblock="yes"?>
<?rfc comments="no"?>
<rfc category="info" docName="draft-li-coinrg-compute-resource-scheduling-00"
     ipr="trust200902">
  <front>
    <title abbrev="Network Working Group">A Compute Resources Oriented
    Scheduling Mechanism based on Dataplane Programmability</title>

    <author fullname="Zhiqiang Li" initials="Z." surname="Li">
      <organization>China Mobile</organization>

      <address>
        <postal>
          <street/>

          <city>Beijing</city>

          <code>100053</code>

          <country>China</country>
        </postal>

        <email>lizhiqiangyjy@chinamobile.com</email>
      </address>
    </author>

    <author fullname="Kehan Yao" initials="K." surname="Yao">
      <organization>China Mobile</organization>

      <address>
        <postal>
          <street/>

          <city>Beijing</city>

          <code>100053</code>

          <country>China</country>
        </postal>

        <email>yaokehan@chinamobile.com</email>
      </address>
    </author>

    <author fullname="Yang Li" initials="Y." surname="Li">
      <organization>China Mobile</organization>

      <address>
        <postal>
          <street/>

          <city>Beijing</city>

          <code>100053</code>

          <country>China</country>
        </postal>

        <email>liyangzn@chinamobile.com</email>
      </address>
    </author>

    <date day="23" month="October" year="2021"/>

    <area>Routing</area>

    <workgroup>Network Working Group</workgroup>

    <keyword>Compute Force Network, Programmable Network Element, Compute
    Resources Lifetime Management</keyword>

    <abstract>
      <t>As the volume of data on the Internet grows, the effective use of
      compute resources has become an important topic. To relieve the
      pressure on today's large data centers, some compute resources have
      been moved towards the edge, gradually forming a distributed Compute
      Force Network (CFN). In physics, a force is a cause that can change the
      state of motion of an object. We borrow this definition and extend it to
      networking: in the future, the network can act as a compute force that
      facilitates the integration of different kinds of compute resources,
      whether hardware or software, making computation fast and efficient.
      This draft presents a compute-resource-oriented scheduling mechanism
      based on dataplane programmability, which can effectively schedule and
      manage compute resources in the network.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="intro" title="Introduction">
      <t>As Moore's law gradually approaches its limits, the computation of
      massive data and diverse computational requirements can no longer be
      satisfied by simply upgrading the computation resources on a single
      chip. Domain-specific computation resources such as GPUs, DPUs, and
      programmable switches are becoming increasingly popular, generating
      diverse use cases in the network, for example in-network computing and
      in-memory computing. In-network computing means using programmable
      switches or DPUs to offload network functions so as to speed up the
      network. In-memory computing means that computer memory serves not only
      as storage but also provides computation. With the development of these
      domain-specific architectures, the network should serve as a force that
      facilitates the integration of all these different types of computation
      resources, in turn forming a Compute Force Network (CFN). How to
      effectively schedule these computation resources in a CFN is a topic
      worth studying.</t>

      <t>Current approaches to compute resource allocation include extending
      protocols like DNS to realize the awareness and scheduling of compute
      resources, but the management of these resources must still be done by
      a centralized controller. For example, a DNS client that wants to run a
      computing task, e.g. machine learning model training, sends a request
      to the DNS server, and the DNS server then informs the client which
      compute node is currently available. However, activating and
      deactivating this compute node, e.g. creating a virtual machine, is
      done by the centralized controller, which we consider neither efficient
      nor timely given the massive amount of data waiting to be computed in
      the network. This weakness motivates the idea of realizing the
      scheduling and management of compute resources by extending current
      routing protocols like SRv6 with the help of programmable network
      elements. The detailed design is presented in this draft.</t>
    </section>

    <section title="Conventions Used in This Document">
      <section title="Terminology">
        <t>CFN Compute Force Network</t>

        <t>DNS Domain Name System</t>

        <t>SRv6 Segment Routing over IPv6</t>

        <t>GPU Graphics Processing Unit</t>

        <t>DPU Data Processing Unit</t>
      </section>

      <section title="Requirements Language">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
        "OPTIONAL" in this document are to be interpreted as described in BCP
        14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only
        when, they appear in all capitals, as shown here.</t>
      </section>
    </section>

    <section title="Design ">
      <t>This section presents the detailed design of the mechanism. The
      first subsection shows a typical network topology and defines each of
      its parts. The second subsection explains the whole procedure step by
      step.</t>

      <section title="Network Topology">
        <t>The network topology is shown in the figure below. It has four
        major parts: the consumer, the computation management node, compute
        nodes with programmable DPUs, and programmable network elements.</t>

        <figure align="center" title="Figure 1: Network Topology">
          <artwork type="ascii-art">                         +------------------+
                         |Compute node with |
                         |programmable DPU  |
+------------------+     +---------+--------+    +-----------------+
|Compute node with |               |             |Compute node with|
|programmable DPU  |      +--------+------+      |programmable DPU |
+--------+---------+      | programmable  |      +--------+--------+
         |             +--+network element+---+           |
         |             |  +---------------+   |           |
  +------+-------+     |                      |   +-------+-------+
  | programmable +-----+                      +---+ programmable  |
  |network element+----+                      +---+network element|
  +--------------+     |                      |   +---------------+
                       |                      |
                       |    +-----------+     |
                       +----+ Consumer  +-----+              
                            +-----------+   |   +-----------------+
                                            ----+   Computation   |
                                                | management node |
                                                +-----------------+</artwork>
        </figure>

        <t>- Consumer: An end node that generates computing tasks to be
        carried out by compute resources.</t>

        <t>- Compute node: A network node that has the resources to complete
        computing tasks generated by consumers, e.g. a server or a cluster of
        servers.</t>

        <t>- Programmable DPU: A unit that is connected to a compute node and
        a programmable network element, responsible for the lifetime
        management of the compute node and for communication with the
        programmable network element.</t>

        <t>- Programmable network element: A network device that communicates
        with consumers and programmable DPUs, forwarding messages
        bidirectionally, including requests for computing resources, messages
        that activate or deactivate a specific compute resource, and other
        routing messages.</t>

        <t>- Computation management node: A network node that has a full
        view of the computation resources in the network, dynamically manages
        these resources, and generates consumption receipts.</t>
      </section>

      <section title="Mechanism Statement">
        <t>This section describes, step by step, the message exchange among
        the consumer, the programmable network element, the computation
        management node, the programmable DPU, and the compute node.</t>

        <figure align="center" title="Figure 2: Computation Request Procedure">
          <artwork type="ascii-art">          1.Computation
                  Request   +---------------+
+----------+ +------------&gt; | Programmable  |
| Consumer |                |               |
+----------+ &lt;------------+ |Network Element|
              4.Computation +---+-+---------+
                Response        ^ |
                                | |
                                | |
     2.Compute Resource         | |  3.Registration
      Consuming Request         | |   Response
       Registration             | |
                  +-------------+ |
                  |               |
            +-----+------+        |
            |  Compute   +&lt;-------+
            | Management |
            |    Node    |
            +------------+</artwork>
        </figure>

        <t>* Step 1: Computation request registration. When a consumer wants
        to run a computing task, e.g. machine learning model training, it
        first sends a request message to the computation management node for
        computation resource pre-allocation. The message passes through a
        programmable network element, where the packet header can be modified
        on the dataplane. Information such as the computation category and a
        configuration template can be added to the packet header to tell the
        computation management node what kind of computation resource it
        needs to schedule, e.g. how many GPUs the task needs. The management
        node then sends back a message carrying the IP address of the
        selected compute node. If no suitable compute node is available at
        the moment, the management node sends back a refusal. Finally, the
        programmable network element forwards the message to the
        consumer.</t>
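
        <t>As a non-normative illustration of Step 1, the pre-allocation
        decision at the computation management node could be sketched as
        follows. The class names, the field names (category, units), the
        first-fit selection policy, and the documentation-prefix addresses
        are assumptions made for this example only; this draft does not
        define them.</t>

        <figure align="center">
          <artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ComputeRequest:
    """Fields the programmable network element adds to the header
    (hypothetical names)."""
    consumer: str
    category: str   # computation category, e.g. "gpu"
    units: int      # amount asked for by the configuration template


class ManagementNode:
    """Hypothetical management node holding the full resource view."""

    def __init__(self, inventory: Dict[str, Dict[str, int]]):
        # inventory maps compute node IP -> {category: free units}
        self.inventory = inventory

    def pre_allocate(self, req: ComputeRequest) -> Optional[str]:
        """Return the IP of the first node that fits, or None (refusal)."""
        for ip, free in self.inventory.items():
            if free.get(req.category, 0) >= req.units:
                free[req.category] -= req.units   # reserve the resources
                return ip
        return None


manager = ManagementNode({
    "2001:db8::10": {"gpu": 2},
    "2001:db8::20": {"gpu": 8},
})
first = manager.pre_allocate(ComputeRequest("consumer-a", "gpu", 4))
second = manager.pre_allocate(ComputeRequest("consumer-b", "gpu", 8))
print(first)    # 2001:db8::20
print(second)   # None, i.e. a refusal
```
]]></artwork>
        </figure>

        <t>In this sketch the second request is refused because no single
        compute node still has eight free GPUs after the first
        reservation.</t>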

        <figure align="center" title="Figure 3: Computation Activation ">
          <artwork type="ascii-art">               1.computation
                   task     +---------------+
+----------+ +------------&gt; | Programmable  |
| Consumer |                |               |
+----------+ &lt;------------+ |Network Element|
                            +-----+--+------+
                                  |  ^
                                  |  | 2.Computation
                                  |  | Message Routing
                                  |  |
               3.Activation       v  |
  +----------+              +-----+--+------+
  | Compute  | &lt;----------+ | Programmable  |
  |   Node   |              |     DPU       |
  +----------+ +----------&gt; +---------------+</artwork>
        </figure>

        <t>* Step 2: Computation activation. The consumer sends the actual
        computation task to the programmable network element, which modifies
        the packet. The activation message for the compute node is
        encapsulated into the packet, enabling lifetime management of the
        computation and tracking of the compute node's progress. The message
        is then forwarded to the programmable DPU directly connected to the
        compute node, where the packet is decapsulated. The DPU tells the
        compute node to start working and dynamically monitors the state of
        the compute node until the task is finished.</t>
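
        <t>The encapsulation and decapsulation in Step 2 can be illustrated
        with the following non-normative sketch. It models packets as Python
        objects rather than as a wire format, and every name in it (Packet,
        encapsulate, ProgrammableDPU, the "ACTIVATE" opcode, task_id) is
        hypothetical and chosen only for this example.</t>

        <figure align="center">
          <artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    dst: str                            # address of the compute node
    payload: bytes                      # the consumer's computation task
    activation: Optional[dict] = None   # lifetime management message


def encapsulate(pkt: Packet, task_id: int) -> Packet:
    """Network element: attach the activation message to the packet."""
    pkt.activation = {"op": "ACTIVATE", "task_id": task_id}
    return pkt


class ComputeNode:
    def __init__(self):
        self.state = "idle"

    def run(self, task: bytes) -> None:
        self.state = "running"


class ProgrammableDPU:
    """DPU attached to one compute node (hypothetical behaviour)."""

    def __init__(self, node: ComputeNode):
        self.node = node
        self.tasks = {}   # task_id -> last observed node state

    def receive(self, pkt: Packet) -> None:
        # Decapsulation: strip the activation message from the packet.
        msg, pkt.activation = pkt.activation, None
        if msg and msg["op"] == "ACTIVATE":
            self.node.run(pkt.payload)
            # Record the node state so it can be monitored afterwards.
            self.tasks[msg["task_id"]] = self.node.state


node = ComputeNode()
dpu = ProgrammableDPU(node)
task = Packet("2001:db8::20", b"train-model")
dpu.receive(encapsulate(task, task_id=7))
print(node.state, dpu.tasks)
```
]]></artwork>
        </figure>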

        <figure align="center" title="Figure 4: Consumption Finish">
          <artwork type="ascii-art">                            +---------------+
                            |  Computation  |
                            |Management Node|
                            +----+---+------+
                                 |   ^
                     3.Response  |   | 2.Finish
                                 |   |  Notification
                                 |   |
              1.Consumption      |   |
                 Finish          v   |
                 Request    +----+---+------+
+----------+ +------------&gt; | Programmable  |
| Consumer |                |               |
+----------+ &lt;------------+ |Network Element|
                            +------+--+-----+
                                   |  ^
                                   |  |
                                   |  |
                    4.Deactivation |  |
                                   v  |
+----------+                +------+--+-----+
| Compute  | +------------&gt; | Programmable  |
|   Node   |                |     DPU       |
+----------+ &lt;------------+ +---------------+
              5.Resource
                Reclaim</artwork>
        </figure>

        <t>* Step 3: Consumption finish. When the compute node notifies the
        consumer that the task has been finished, the consumer checks whether
        any task is still waiting. If not, the consumer sends a consumption
        finish request to the computation management node. As in computation
        request registration, the programmable network element inserts
        information about the compute node and forwards the notification
        message to the computation management node. When the programmable
        network element receives the response message, it starts the
        deactivation procedure and tells the compute node to reclaim the
        resources used for the previous computation. This ends the
        computation lifetime of a single task.</t>
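
        <t>Taken together, the three steps above describe a lifetime for
        each task. One possible way to picture it, offered only as a
        non-normative sketch, is a small state machine. The state and event
        names below are assumptions introduced for this example and are not
        defined by this draft.</t>

        <figure align="center">
          <artwork><![CDATA[
```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    RESERVED = auto()    # after Step 1: registration accepted
    ACTIVE = auto()      # after Step 2: compute node activated
    FINISHING = auto()   # Step 3: finish request sent, awaiting response


# Allowed transitions, keyed by (current state, event).
TRANSITIONS = {
    (State.IDLE, "registration_ok"): State.RESERVED,
    (State.RESERVED, "activate"): State.ACTIVE,
    (State.ACTIVE, "finish_request"): State.FINISHING,
    # The response triggers deactivation and resource reclaim.
    (State.FINISHING, "finish_response"): State.IDLE,
}


def step(state: State, event: str) -> State:
    """Advance the lifetime of one task; reject out-of-order events."""
    key = (state, event)
    if key not in TRANSITIONS:
        raise ValueError(f"{event!r} not allowed in state {state.name}")
    return TRANSITIONS[key]


state = State.IDLE
events = ["registration_ok", "activate",
          "finish_request", "finish_response"]
for event in events:
    state = step(state, event)
print(state.name)   # IDLE
```
]]></artwork>
        </figure>

        <t>Rejecting out-of-order events, such as an activation without a
        prior registration, is a design choice of this sketch and not a
        requirement stated by the mechanism.</t>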
      </section>
    </section>

    <section title="Typical Way of Realization">
      <t>The mechanism described in the previous section can be realized by
      extending protocols like SRv6. The lifetime management message can be
      inserted dynamically in the dataplane with the help of programmable
      hardware. Such modification can be done flexibly and at line rate.</t>
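
      <t>As one hedged illustration, the lifetime management message could
      be carried as a type-length-value (TLV) element, the general layout
      used by TLVs in the SRv6 Segment Routing Header (RFC 8754). The sketch
      below assumes a one-octet type, a one-octet length, and a value made of
      a one-octet opcode and a four-octet task identifier. The type value
      253, the opcodes, and the value layout are placeholders invented for
      this example; none of them is assigned by IANA or defined by this
      draft, and the padding needed to align a real header is omitted.</t>

      <figure align="center">
        <artwork><![CDATA[
```python
import struct

LIFETIME_TLV_TYPE = 253   # placeholder value, not assigned by IANA
OP_ACTIVATE, OP_DEACTIVATE = 1, 2   # hypothetical opcodes


def pack_lifetime_tlv(op: int, task_id: int) -> bytes:
    """Encode Type (1 octet), Length (1 octet), then the value."""
    # Value: 1-octet opcode and 4-octet task id, network byte order.
    value = struct.pack("!BI", op, task_id)
    return struct.pack("!BB", LIFETIME_TLV_TYPE, len(value)) + value


def unpack_lifetime_tlv(data: bytes):
    """Decode a TLV built by pack_lifetime_tlv; return (op, task_id)."""
    tlv_type, length = struct.unpack_from("!BB", data, 0)
    if tlv_type != LIFETIME_TLV_TYPE or length != 5:
        raise ValueError("not a lifetime management TLV")
    return struct.unpack_from("!BI", data, 2)


tlv = pack_lifetime_tlv(OP_ACTIVATE, 7)
print(tlv.hex())                  # fd050100000007
print(unpack_lifetime_tlv(tlv))   # (1, 7)
```
]]></artwork>
      </figure>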
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>TBD.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>TBD.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>

      <?rfc include="reference.RFC.8174"?>
    </references>
  </back>
</rfc>
