A Compute Resources Oriented Scheduling Mechanism based on Dataplane Programmability

As Moore's law approaches its limits, the computation of massive data and diverse computational requirements can no longer be satisfied simply by upgrading the computation resources on a single chip. Domain-specific computation resources such as GPUs, DPUs, and programmable switches are becoming increasingly popular, generating diverse use cases in the network, for example in-network computing and in-memory computing. In-network computing uses programmable switches or DPUs to offload network functions and thereby accelerate network processing. In-memory computing means that computer memory serves not only as storage but also provides computation. With the development of these domain-specific architectures, the network should facilitate the integration of all these different types of computation resources, in turn forming a Compute Force Network (CFN). In a CFN, how to schedule these computation resources effectively is a topic worth studying.

Current approaches to compute resource allocation include extending protocols such as DNS to realize the awareness and scheduling of compute resources, but the management of these resources must still be done by a centralized controller. For example, a DNS client that wants to run a computing task, e.g. machine learning model training, sends a request to the DNS server, and the DNS server informs the client which compute node is currently available. However, activating and deactivating this compute node, e.g. creating a virtual machine, is done by the centralized controller, which we consider neither efficient nor timely given the massive amount of data waiting to be computed in the network. This weakness motivates the idea of realizing the scheduling and management of compute resources by extending current routing protocols such as SRv6 with the help of programmable network elements.
The detailed design is presented in this draft.

The detailed design of the mechanism is presented in this section. A typical topology is shown first, together with a definition of each part of the network topology, and the whole procedure is then explained in the second subsection.

The network topology is shown in the figure below. It consists of several major parts: the consumer, the computation management node, compute nodes with programmable DPUs, and programmable network elements.

                         +------------------+
                         |Compute node with |
                         |programmable DPU  |
+------------------+     +---------+--------+    +-----------------+
|Compute node with |               |             |Compute node with|
|programmable DPU  |      +--------+------+      |programmable DPU |
+--------+---------+      | programmable  |      +--------+--------+
         |             +--+network element+---+           |
         |             |  +---------------+   |           |
  +------+--------+    |                      |   +-------+-------+
  | programmable  +----+                      +---+ programmable  |
  |network element+----+                      +---+network element|
  +---------------+    |                      |   +---------------+
                       |                      |
                       |    +-----------+     |
                       +----+ Consumer  +-----+
                            +-----+-----+
                                  |    +-----------------+
                                  +----+ Computation     |
                                       | management node |
                                       +-----------------+

- Consumer: An end node that generates computing tasks which need to be carried out by compute resources.
- Compute node: A network node that has the resources to complete computing tasks generated by consumers, e.g. a server or a cluster of servers.
- Programmable DPU: A unit that is connected to a compute node and a programmable network element, responsible for the lifetime management of the compute node and for communication with the programmable network element.
- Programmable network element: A network device that communicates with consumers and programmable DPUs, forwarding messages bidirectionally, including requests for computing resources, messages activating or deactivating a specific compute resource, and other routing messages.
- Computation management node: A network node that has the full view of the computation resources in the network, dynamically managing these resources and generating consumption receipts.

This section describes, step by step, the communication between the consumer and the compute management node, which passes through the programmable network element, the programmable DPU, and the compute node.

              1.Computation
                 Request       +---------------+
+----------+  +------------>   | Programmable  |
| Consumer |                   |               |
+----------+  <------------+   |Network Element|
              4.Computation    +---+-+---------+
                 Response          ^ |
                                   | |
                                   | |
               2.Compute Resource  | | 3.Registration
               Consuming Request   | |   Response
               Registration        | |
                     +-------------+ |
                     |               |
               +-----+------+        |
               | Compute    +<-------+
               | Management |
               | Node       |
               +------------+

* Step 1: Computation request registration. When a consumer wants to run a computing task, e.g. machine learning model training, it first sends a request message to the compute management node for computation resource pre-allocation. The message passes through the programmable network element, where the packet header can be modified on the dataplane. Information such as the computation category and the configuration template can be added to the packet header to notify the compute management node what kind of computation resource it needs to schedule, e.g. how many GPUs the task requires. The management node then sends back a message in which the IP address of the specific compute node is inserted. If no such compute node is available at the moment, the management node sends back a refusal. Finally, the programmable network element forwards the message to the consumer.
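The registration exchange in Step 1 can be illustrated with a minimal Python sketch. This draft does not define a wire format, so the 4-byte header layout (category, template identifier, GPU count), the ComputeRequest and ManagementNode names, and the first-fit allocation policy are all assumptions made purely for illustration.

```python
import struct
from dataclasses import dataclass
from typing import Dict, Optional

# Hypothetical header layout, NOT defined by this draft:
# category (1 byte), template id (1 byte), GPU count (2 bytes),
# all in network byte order.
_REQ = struct.Struct("!BBH")


@dataclass(frozen=True)
class ComputeRequest:
    category: int
    template_id: int
    gpu_count: int

    def pack(self) -> bytes:
        """Serialize the fields a network element would add to the header."""
        return _REQ.pack(self.category, self.template_id, self.gpu_count)

    @classmethod
    def unpack(cls, data: bytes) -> "ComputeRequest":
        """Parse the fields back out at the management node."""
        return cls(*_REQ.unpack(data[:_REQ.size]))


class ManagementNode:
    """Toy management node tracking free GPUs per compute node address."""

    def __init__(self, free_gpus: Dict[str, int]) -> None:
        self.free_gpus = dict(free_gpus)

    def register(self, raw: bytes) -> Optional[str]:
        """Pre-allocate resources for a request.

        Returns the address of the first compute node with enough free
        GPUs, or None to signal a refusal.
        """
        req = ComputeRequest.unpack(raw)
        for addr, free in self.free_gpus.items():
            if free >= req.gpu_count:
                self.free_gpus[addr] = free - req.gpu_count
                return addr
        return None
```

In this sketch a refusal is modeled as None; a real deployment would carry it in a response message, whose format is likewise left open here.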

              1.Computation
                 Task          +---------------+
+----------+  +------------>   | Programmable  |
| Consumer |                   |               |
+----------+  <------------+   |Network Element|
                               +-----+--+------+
                                     |  ^
                                     |  |
                    2.Computation    |  |
                    Message Routing  |  | 3.Activation
                                     v  |
+----------+                   +-----+--+------+
| Compute  |   <----------+    | Programmable  |
|   Node   |                   |      DPU      |
+----------+   +---------->    +---------------+

* Step 2: Computation activation. The consumer sends the actual computation task to the programmable network element, which modifies the packet. The activation message for the compute node is encapsulated into the packet, which enables lifetime management of the computation and tracking of the working progress of the compute node. The message is then forwarded to the programmable DPU directly connected to the compute node, where the packet is decapsulated. The DPU tells the compute node to start working and dynamically monitors the state of the compute node until the task is finished.
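The encapsulation and decapsulation in Step 2 might look roughly like the following Python sketch. The 7-byte activation header (opcode, task identifier, payload length), the ACTIVATE opcode value, and the Dpu class are hypothetical; the draft does not specify how the activation message is encoded.

```python
import struct
from typing import Dict

ACTIVATE = 0x01  # hypothetical opcode, not defined by this draft

# Hypothetical activation header: opcode (1 byte), task id (4 bytes),
# payload length (2 bytes), network byte order, 7 bytes in total.
_HDR = struct.Struct("!BIH")


def encapsulate(task_id: int, payload: bytes) -> bytes:
    """Network element side: prepend the activation header to the task."""
    return _HDR.pack(ACTIVATE, task_id, len(payload)) + payload


class Dpu:
    """Toy programmable DPU that decapsulates packets and tracks tasks."""

    def __init__(self) -> None:
        self.state: Dict[int, str] = {}  # task id -> "running"/"finished"

    def receive(self, packet: bytes) -> bytes:
        """Strip the activation header, mark the task running, and
        return the inner payload destined for the compute node."""
        if len(packet) < _HDR.size:
            raise ValueError("packet shorter than activation header")
        opcode, task_id, length = _HDR.unpack_from(packet)
        if opcode != ACTIVATE:
            raise ValueError("not an activation message")
        payload = packet[_HDR.size:_HDR.size + length]
        if len(payload) != length:
            raise ValueError("truncated payload")
        self.state[task_id] = "running"
        return payload

    def task_finished(self, task_id: int) -> None:
        """Record that the compute node reported completion."""
        self.state[task_id] = "finished"
```

Keeping the task state in the DPU, rather than in a central controller, mirrors the draft's intent that lifetime management happens next to the compute node.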

                               +---------------+
                               |  Computation  |
                               |Management Node|
                               +----+---+------+
                                    |   ^
                        3.Response  |   | 2.Finish
                                    |   |   Notification
              1.Consumption         |   |
                 Finish             v   |
                 Request       +----+---+------+
+----------+  +------------>   | Programmable  |
| Consumer |                   |               |
+----------+  <------------+   |Network Element|
                               +------+--+-----+
                                      |  ^
                                      |  |
                                      |  |
                      4.Deactivation  |  |
                                      v  |
+----------+                   +------+--+-----+
| Compute  |  +------------>   | Programmable  |
|   Node   |                   |      DPU      |
+----------+  <------------+   +---------------+
            5.Resource Reclaim

* Step 3: When the compute node notifies the consumer that the task has been finished, the consumer checks whether any task is still waiting. If not, the consumer sends a consumption finish request to the computation management node. As in computation request registration, the programmable network element then inserts information about the compute node and forwards the notification message to the computation management node. When the programmable network element receives the response message, it starts the deactivation procedure and tells the compute node to reclaim the resources used for the previous computation. This ends the computation lifetime of a single task.
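Taken together, the three steps describe a per-task lifetime that can be viewed as a small state machine. The Python sketch below is one possible reading; the state names and event names are assumptions chosen for illustration and are not terms defined by this draft.

```python
from enum import Enum, auto


class TaskState(Enum):
    REQUESTED = auto()   # Step 1: registration request sent
    ALLOCATED = auto()   # Step 1: compute node address returned
    ACTIVE = auto()      # Step 2: DPU has activated the compute node
    FINISHED = auto()    # Step 3: finish notification acknowledged
    RECLAIMED = auto()   # Step 3: resources collected back


# Hypothetical event names mapped onto the messages in the figures.
_TRANSITIONS = {
    (TaskState.REQUESTED, "registration_response"): TaskState.ALLOCATED,
    (TaskState.ALLOCATED, "activation"): TaskState.ACTIVE,
    (TaskState.ACTIVE, "finish_response"): TaskState.FINISHED,
    (TaskState.FINISHED, "deactivation"): TaskState.RECLAIMED,
}


def step(state: TaskState, event: str) -> TaskState:
    """Return the next state, rejecting events that are out of order."""
    try:
        return _TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} not allowed in state {state.name}"
        ) from None
```

A refusal in Step 1 is not modeled here; it would simply leave the task without an allocation.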