rtgwg                                                           Q. Xiong
Internet-Draft                                                    X. Zhu
Intended status: Standards Track                         ZTE Corporation
Expires: 17 December 2026                                   15 June 2026


      Precise Priority-based Flow Control Notification with RoCEv2
                      draft-xf-rtgwg-ppfc-rocv2-00

Abstract

   This document defines a format for Precise Priority-based Flow
   Control (PPFC) notifications within RoCEv2 (RDMA over Converged
   Ethernet version 2) networks.  The proposed format enables network
   devices experiencing congestion to generate explicit congestion
   signals that can be efficiently carried back to the source clients.
   This facilitates fast, fine-grained flow control, complementing
   traditional end-to-end congestion control and improving performance
   in high-throughput, low-latency environments.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 17 December 2026.

Copyright Notice

   Copyright (c) 2026 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Xiong & Zhu             Expires 17 December 2026                [Page 1]

Internet-Draft  Precise Priority-based Flow Control Noti       June 2026

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.
   Code Components extracted from this document must include Revised
   BSD License text as described in Section 4.e of the Trust Legal
   Provisions and are provided without warranty as described in the
   Revised BSD License.

Table of Contents

   1.  Introduction
   2.  Conventions Used in This Document
     2.1.  Abbreviations
     2.2.  Requirements Language
   3.  PPFC Notification with RoCEv2
   4.  Security Considerations
   5.  IANA Considerations
   6.  Normative References
   Authors' Addresses

1.  Introduction

   RoCEv2 (RDMA over Converged Ethernet version 2) is widely deployed
   in data centers and high-performance computing clusters to provide
   low-latency, kernel-bypass networking.  However, managing congestion
   in these environments remains a challenge.  End-to-end congestion
   control mechanisms, such as those using ECN or CNP (Congestion
   Notification Packets), incur at least one Round-Trip Time (RTT) of
   delay before a sending host adjusts its rate.

   Precise Priority-based Flow Control (PPFC)
   [I-D.xz-rtgwg-ppfc-notification] is a technique that promptly pauses
   traffic sources based on explicit notifications from the network.
   The PPFC notification can also be sent directly to the traffic
   source, using a message format designed to be:

   *  Efficient: Adds minimal overhead.

   *  Explicit: Carries detailed information about the congestion event
      (e.g., destination address, recommended action).

   *  Interoperable: Can be processed by both network devices and
      hosts.

   This document defines a lightweight, generic encapsulation format
   for PPFC notifications within RoCEv2 networks.

2.  Conventions Used in This Document

2.1.  Abbreviations

   RTT:     Round-Trip Time
   TCP:     Transmission Control Protocol
   RDMA:    Remote Direct Memory Access
   QUIC:    Quick UDP Internet Connections
   ECN:     Explicit Congestion Notification
   CNP:     Congestion Notification Packet
   GRH:     Global Route Header (InfiniBand)
   PPFC:    Precise Priority-based Flow Control
   RoCEv2:  RDMA over Converged Ethernet version 2

2.2.  Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all
   capitals, as shown here.

3.  PPFC Notification with RoCEv2

   The PPFC Notification message reuses the basic RoCEv2 encapsulation
   but is identified as a new extension.  The proposed packet format is
   as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |           Ethernet Header (Destined to PPFC Target)           | ^
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
   |                     IP Header (IPv4/IPv6)                     | | L2-L4
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ | Headers
   |        UDP Header (Dest Port = RoCEv2 message (4791))        | |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
   BTH (Base Transport Header, 96 bits):                             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
   | OpCode (0x81) |S|M|Pad| TVer  |         Partition Key         | |
   +---------------+-+-+---+-------+-------------------------------+ |
   |F|B|P| RSV(5b) |               DestQP (24 bits)                | |
   +-+-+-+-+-+-+-+-+-----------------------------------------------+ |
   |A|  RSV (7b)   |                 PSN (24 bits)                 | |
   +-+-------------+-----------------------------------------------+ v
   PPFC Extension Fields (present when P=1):
   +---------------------------------------------------------------+
   ~         Congested Destination IP (128 bits for IPv6)          ~
   +---------------------------------------------------------------+
   |           Flags            |PT|        Congestion Port        |
   +---------------------------------------------------------------+
   |           Reserved            |        Pause Duration         |
   +---------------------------------------------------------------+
   |                        ICRC (32 bits)                         |
   +---------------------------------------------------------------+
   |                         FCS (32 bits)                         |
   +---------------------------------------------------------------+

   where:

   *  UDP Destination Port: SHOULD be used to identify RoCEv2 traffic.
      The value 4791 indicates that a RoCEv2 Base Transport Header
      (BTH) follows the UDP header.

   *  P (1 bit): A new bit that MUST be set to 1 in a PPFC
      notification.  It indicates that the PPFC Extension Fields follow
      the BTH.

   *  Congested Destination IP (variable): Indicates the IPv4 or IPv6
      address of the congested network node.

   *  PT (2 bits): Indicates the action that the ingress node is to
      take on the traffic.  The values are 00 "stop", 01 "resume",
      10 "alarm", and 11 "hold".

   *  Congestion Port (16 bits): Identifier of the congested port.

   *  Pause Duration (16 bits): Recommended pause interval, in
      microseconds.

   The remaining fields are identical to the corresponding BTH fields.

4.  Security Considerations

   To be discussed in future versions of this document.

5.  IANA Considerations

   TBD.

6.  Normative References

   [I-D.xz-rtgwg-ppfc-notification]
              Xiong, Q. and X. Zhu, "Precise Priority-based Flow
              Control Notification", Work in Progress, Internet-Draft,
              draft-xz-rtgwg-ppfc-notification-00, 12 June 2026.

   [RFC768]   Postel, J., "User Datagram Protocol", STD 6, RFC 768,
              DOI 10.17487/RFC0768, August 1980.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997.

   [RFC8174]  Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC
              2119 Key Words", BCP 14, RFC 8174,
              DOI 10.17487/RFC8174, May 2017.

Authors' Addresses

   Quan Xiong
   ZTE Corporation
   Email: xiong.quan@zte.com.cn

   Xiangyang Zhu
   ZTE Corporation
   Email: zhu.xiangyang@zte.com.cn
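
   The following non-normative sketch illustrates how the BTH and PPFC
   Extension Fields described in Section 3 might be packed and parsed.
   It is not part of the specification and rests on several
   assumptions that the draft does not state: the function names
   pack_ppfc and unpack_ppfc are hypothetical, the Congested
   Destination IP is always carried as a 128-bit IPv6 address, Flags
   is taken to be 14 bits and Reserved 16 bits so that each extension
   row fills one 32-bit word, the Partition Key default is arbitrary,
   and the ICRC and FCS are left out.

```python
import ipaddress
import struct

PPFC_OPCODE = 0x81  # OpCode value shown in the Section 3 diagram
PT_NAMES = {0b00: "stop", 0b01: "resume", 0b10: "alarm", 0b11: "hold"}
PPFC_LEN = 12 + 16 + 8  # BTH + IPv6 address + two assumed 32-bit words


def pack_ppfc(dest_qp, psn, congested_ip, pt, port, pause_us, pkey=0xFFFF):
    """Pack the BTH (with P=1) and the PPFC extension; ICRC/FCS omitted."""
    if pt not in PT_NAMES:
        raise ValueError("PT is a 2-bit field")
    if not 0 <= pause_us <= 0xFFFF:
        raise ValueError("Pause Duration is a 16-bit field")
    # BTH word 0: OpCode | S, M, Pad, TVer (all zero here) | Partition Key
    word0 = (PPFC_OPCODE << 24) | (pkey & 0xFFFF)
    # BTH word 1: F | B | P | RSV(5b) | DestQP(24); only P is set
    word1 = (1 << 29) | (dest_qp & 0xFFFFFF)
    # BTH word 2: A | RSV(7b) | PSN(24); A is left at zero
    word2 = psn & 0xFFFFFF
    # Assumption: the address is always encoded as 16 bytes (IPv6)
    ip_bytes = ipaddress.IPv6Address(congested_ip).packed
    # Assumed layout: Flags(14, zero) | PT(2) | Congestion Port(16)
    word3 = (pt << 16) | (port & 0xFFFF)
    # Assumed layout: Reserved(16, zero) | Pause Duration(16)
    word4 = pause_us
    return (struct.pack("!III", word0, word1, word2)
            + ip_bytes
            + struct.pack("!II", word3, word4))


def unpack_ppfc(data):
    """Parse bytes produced by pack_ppfc back into a dict of fields."""
    if len(data) < PPFC_LEN:
        raise ValueError("buffer too short for a PPFC notification")
    word0, word1, word2 = struct.unpack("!III", data[:12])
    if (word0 >> 24) != PPFC_OPCODE or not (word1 >> 29) & 1:
        raise ValueError("not a PPFC notification (OpCode or P bit)")
    word3, word4 = struct.unpack("!II", data[28:36])
    return {
        "dest_qp": word1 & 0xFFFFFF,
        "psn": word2 & 0xFFFFFF,
        "congested_ip": str(ipaddress.IPv6Address(data[12:28])),
        "pt": PT_NAMES[(word3 >> 16) & 0b11],
        "port": word3 & 0xFFFF,
        "pause_us": word4 & 0xFFFF,
    }


if __name__ == "__main__":
    msg = pack_ppfc(dest_qp=0x12, psn=7, congested_ip="2001:db8::1",
                    pt=0b00, port=3, pause_us=50)
    print(len(msg), unpack_ppfc(msg))
```

   Under these assumptions, a receiver first checks the OpCode and the
   P bit and only then reads the extension words, so an ordinary
   RoCEv2 packet is rejected before any PPFC field is interpreted.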