<?xml version='1.0' encoding='utf-8'?>

<!DOCTYPE rfc [
  <!ENTITY RFC5040 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5040.xml">
  <!ENTITY RFC5041 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5041.xml">
  <!ENTITY RFC5042 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5042.xml">
  <!ENTITY RFC5043 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5043.xml">
  <!ENTITY RFC5044 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.5044.xml">
  <!ENTITY RFC6580 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6580.xml">
  <!ENTITY RFC6581 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6581.xml">
  <!ENTITY RFC6703 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6703.xml">  
  <!ENTITY RFC7306 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7306.xml">
]>


<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<?rfc toc="yes"?>
<?rfc sortrefs="yes"?>
<?rfc symrefs="yes"?>
<?rfc comments="yes"?>

<rfc xmlns:xi="http://www.w3.org/2001/XInclude" ipr="trust200902" docName="draft-kcrh-hpwan-state-of-art-02" category="info" obsoletes="" updates="" submissionType="IETF" xml:lang="en" tocInclude="true" sortRefs="true" symRefs="true" version="3">

  <front>
    <title abbrev="HP-WAN STATE OF ART">Current State of the Art for High Performance Wide Area Networks</title>
    <seriesInfo name="Internet-Draft" value="draft-kcrh-hpwan-state-of-art-02"/>

    <author initials="D." surname="King" fullname="Daniel King">
      <organization>Lancaster University</organization>
      <address>
        <email>d.king@lancaster.ac.uk</email>
      </address>
    </author>

    <author initials="T." surname="Chown" fullname="Tim Chown">
      <organization>Jisc</organization>
      <address>
      <email>tim.chown@jisc.ac.uk</email>
      </address>
    </author>

    <author initials="C." surname="Rapier" fullname="Chris Rapier">
      <organization>Pittsburgh Supercomputing Center</organization>
      <address>
        <email>rapier@psc.edu</email>
      </address>
    </author>

    <author initials="D." surname="Huang" fullname="Daniel Huang">
      <organization>ZTE Corporation</organization>
      <address>
      <email>huang.guangping@zte.com.cn</email>
      </address>
    </author>

    <date year="2025"/>

    <workgroup></workgroup>

    <keyword>HP-WAN</keyword>

    <abstract>

      <t>High Performance Wide Area Networks (HP-WANs) represent a critical 
         infrastructure for the modern global research and education community, 
         facilitating collaboration across national and international boundaries. 
         These networks, such as Janet, ESnet, GÉANT, Internet2, CANARIE, and others, 
         are designed to support the general needs of the research and education users they serve,
         as well as the transmission of vast amounts of data generated by scientific 
         research, high-performance computing, distributed AI training, and large-scale simulations.</t>

      <t>This document provides an overview of the terminology and techniques used for 
         existing HP-WANs. It also explores the technological advancements, operational tools, 
         and future directions for HP-WANs, emphasising their role in enabling cutting-edge 
         scientific research, big data analysis, AI training and massive industrial data 
         analysis.</t>

    </abstract>

  </front>

  <middle>

    <section anchor="INTRO" numbered="true" toc="default">
      <name>Introduction</name>

      <t>High Performance Wide Area Networks (HP-WANs) are the backbone of global research 
         and education infrastructure, enabling the seamless transfer of vast amounts of data
         and supporting advanced scientific collaborations worldwide. These networks 
         are designed to meet the demanding requirements of data-intensive research fields, 
         including high-energy physics, climate modeling, genomics, and artificial 
         intelligence.</t>
         
      <t>The evolution of HP-WANs is deeply intertwined with the growing need for advanced 
         scientific research and the increasing globalisation of collaboration. Traditional 
         WANs, which were sufficient for general business and communication needs, quickly 
         became inadequate for the specialised requirements of research institutions. As 
         scientific endeavours began to generate larger datasets, ranging from terabytes 
         to petabytes, there arose a need for networks capable of transferring these massive 
         volumes of data reliably and securely across long distances.</t>  
         
      <t>The first HP-WANs emerged as specialised research networks, such as ESnet in the 
         United States, Janet in the UK, and GÉANT in Europe, developed to support the unique 
         needs of the scientific community. These networks were designed to provide high 
         bandwidth and ensure low latency, high reliability, and robust security, critical for applications like real-time data analysis, distributed computing, and
         remote instrumentation.</t>           
         
      <t>Today, HP-WANs are foundational to the research community and are leading the way in
         demonstrating how advanced networking technologies can be applied to other sectors. 
         They serve as testbeds for innovations in networking that eventually trickle down to 
         broader commercial applications. As we look toward the future, HP-WANs will continue 
         to play a critical role in enabling scientific discoveries and fostering international 
         collaboration, particularly as emerging technologies such as quantum computing and 
         the Internet of Things (IoT) push the boundaries of what these networks must 
         support.</t>  
 
      <t>This document explores the current state of the art in HP-WANs, examining the 
         technological advancements, operational challenges, and emerging trends shaping the 
         future of networks built for research, education, massive data analysis and 
         collaborative AI training at scale and speed. Through this exploration, we aim to 
         provide a better understanding of how high performance computing is currently supported
         across wide area networks.</t>  

       <section anchor="BACK" numbered="true" toc="default">
        <name>Background</name>

        <t>High Performance Wide Area Networks (HP-WANs) evolved as specialised networks initially designed to facilitate scientific research requiring high-speed data transfer, high reliability, and minimal latency. Early networks such as ESnet, Janet, and GÉANT emerged in response to the increasing data volumes generated by scientific and educational institutions, transforming traditional WAN capabilities.</t>
     
        <t>HP-WANs have since become integral to research and educational communities, supporting distributed scientific collaborations, large-scale simulations, and intensive data analysis. Their capabilities have been continually enhanced to meet rising demands, laying foundations for future networking technologies.</t>

       </section> 

    </section> 


    <section anchor="TERM" numbered="true" toc="default">
        <name>Terminology</name>

        <t>This section defines terminology that relates to high performance 
           WANs.</t>

        <dl newline="false" spacing="normal">

          <dt>CERN:</dt>
          <dd>The European Organization for Nuclear Research, housing the Large Hadron Collider (LHC).</dd>

          <dt>High Performance Computing (HPC):</dt>
          <dd>A general term for computing with a high level
              of performance. Often, high performance computing refers specifically to 
              running highly parallel jobs, frequently on hundreds or even 
              thousands of cores.</dd>

          <dt>High Performance Wide Area Network (HP-WAN):</dt>
          <dd>A type of Wide Area Network (WAN) designed specifically to meet the high-speed, 
              low-latency, and high-capacity needs of scientific research, education, and 
              data-intensive applications. These networks connect research institutions, 
              universities, and data centers across large geographical areas.</dd>

          <dt>InfiniBand:</dt>
          <dd>Traditionally, a localised data interconnect used by many high performance computing (HPC) 
              systems, providing high bandwidth and low latency.</dd> 

          <dt>National Research and Education Network (NREN):</dt>
          <dd>A specialised network supporting the research and education community within a 
              specific country or region. NRENs provide high-speed connectivity and other 
              services tailored to the needs of academic and research institutions.</dd>

          <dt>Remote direct memory access (RDMA):</dt>
          <dd>Enables one networked node to access another 
              networked node's memory without involving either node's operating system 
              or interrupting either node's processing. This helps minimise latency and maximise 
              throughput, reducing memory bandwidth bottlenecks.</dd>

          <dt>RDMA over Converged Ethernet (RoCE):</dt>
          <dd>Traditionally, a network protocol that allows remote direct 
              memory access (RDMA) over a local Ethernet network. There are multiple RoCE versions. 
              RoCE v1 is an Ethernet link layer protocol and hence allows communication between any
              two hosts in the same Ethernet broadcast domain. RoCE v2 is carried over UDP/IP, 
              which means that RoCE v2 packets can be routed.</dd>

          <dt>Worldwide LHC Computing Grid (WLCG):</dt>
          <dd>A global network of over 170 computing centres across more than 40 countries, 
              designed to process, store, and analyse the vast amounts of data generated by the 
              Large Hadron Collider (LHC) at CERN.</dd>

          <dt>Performance Service Oriented Network monitoring Architecture (PerfSONAR):</dt>
          <dd>A network performance monitoring toolkit designed to provide end-to-end 
              performance measurement and monitoring across multi-domain network 
              infrastructures.</dd>

          <dt>Science DMZ:</dt>
          <dd>A model for deployment of infrastructure at a site (campus) to optimise 
              the performance of data transfers in and out of data transfer nodes 
              (DTNs) at the site – see https://fasterdata.es.net/science-dmz/. Elements 
              of the model include the local network architecture, tuning of DTNs, 
              selection of data transfer software, efficient implementation of 
              security policies, and persistent monitoring.</dd>



        </dl>
    </section>

    <section anchor="UC" numbered="true" toc="default">
      <name>Example Use Cases for HP-WANs</name>

        <t>HP-WAN applications have become synonymous with large-scale research and experimentation, big data, and AI. HPC, and therefore HP-WAN, is 
          driving continuous innovation in use cases across the following fields and industries.</t>

      <ul spacing="normal">
        <li>High-Energy Physics Research, e.g., the Large Hadron Collider (LHC)</li>
        <li>Climate Modeling</li>
        <li>Radioastronomy, e.g., the Square Kilometre Array (SKA) project</li>
        <li>Healthcare, Genomics and Life Sciences</li>
        <li>AI training</li>
        <li>Media Content Creation</li>
        <li>Government and Defence</li>

      </ul>          

    <t>The data rates required by HPC applications vary significantly based on the application type and data scale.</t> 
        
    <t>Scientific simulations, such as climate modeling and molecular dynamics, typically demand data rates from 
       10 Gbps to over 100 Gbps due to the large volumes of data processed and moved between nodes and storage systems.</t>

    <t>In high-energy physics, such as experiments at CERN, data rates can reach hundreds of gigabits per second, with 
       aggregate peaks between sites currently exceeding 1 Tbps, and predicted to rise to 10 Tbps, during intensive data processing.</t>

    <t>Healthcare, genomics, and life sciences workloads might typically operate at rates between 1 Gbps and 40 Gbps. These applications 
       require high throughput to handle large datasets efficiently, often through parallel data streams.</t>

    <t>AI training tasks, particularly those involving deep learning, require data rates ranging from 10 Gbps to 100 Gbps
      to ensure efficient data movement, keeping GPUs and other accelerators fully utilised.</t>

    <t>These varying data rates underscore the high demands of HPC applications, which are expected to grow as the field 
       evolves and datasets become larger.</t>
          
    </section>


    <section anchor="HP-WAN" numbered="true" toc="default">
      <name>Current Technologies Used in HP-WANs: Key Components</name>

      <t>High Performance Computing (HPC) networks are specialised networks designed to connect
         supercomputers and other high-performance computing resources, enabling them to 
         collaborate on computational tasks that require significant processing power, memory, 
         and data storage. These networks facilitate large-scale scientific research, complex simulations, and 
         data-intensive tasks that exceed the capabilities of standard computing systems.</t> 
         
      <t>The following sub-sections outline typical characteristics and requirements for HP-WANs. 
         These technical requirements ensure that wide-area interconnects can meet the 
         demanding needs of distributed HPC environments, enabling researchers and 
         scientists to collaborate effectively globally.</t>

       <section anchor="ARCH" numbered="true" toc="default">
           <name>Architectural Elements</name>

           <t>Resource Controllers provide detailed control over individual network resources, such as routers and switches, ensuring efficient usage and reliable network performance through comprehensive monitoring and configuration.</t>

           <t>Network Controllers maintain global visibility of network topology, resource availability, and status, essential for path computation, resource reservation, and dynamic reconfiguration to meet stringent performance demands.</t>

           <t>End-to-End Orchestration translates user and application requirements into actionable network operations, enabling automated, policy-driven management and significantly improving resource responsiveness and optimisation.</t>
       </section>


       <section anchor="TOPO" numbered="true" toc="default">
        <name>Topology</name>
       
         <t>HPC networks can be broadly categorised into intra-site networks, which connect components
            within a single HPC site, such as a data centre, and inter-site networks, which link 
            multiple HPC sites across different geographical locations. Intra-site networks typically 
            use high-speed, low-latency non-Internet interconnects like InfiniBand or high-speed Ethernet. In 
            contrast, inter-site networks rely on dedicated high-capacity wide area networks (WANs) 
            to facilitate distributed computing and data sharing on a regional and global scale.</t>
            
         <t>Each NREN operator, e.g., Jisc in the case of Janet in the UK, will build and operate the 
            NREN infrastructure for its research and education users. This may typically take the form 
            of a well-provisioned backbone, with regional access networks extending to the end sites 
            (campuses, research organisations, etc.). The NREN demarcation is typically at the campus 
            edge. In some countries the regional networks are operated separately.</t>
                       
         <t>The NRENs then typically have interconnects to other NRENs, forming a worldwide research and education (RE) 
            network infrastructure. In Europe, GÉANT provides connectivity between the European NRENs 
            and then wider connectivity to the rest of the world. NRENs also have other interconnects 
            to non-RE networks, e.g., via one or more national Internet exchanges (IXs), direct peerings with content providers 
            (including the big cloud providers), and "catch-all" commodity connectivity via one or 
            more Tier 1 ISPs.</t>

         <t>Dedicated infrastructure is commonly used in HPC environments where performance, security, 
            and reliability are paramount. In these cases, the network infrastructure is built 
            exclusively for HPC applications, including dedicated fibre-optic connections, private 
            data centres, and specialised network transport like RDMA over Converged Ethernet (RoCE) 
            and InfiniBand nodes. The primary benefits of dedicated infrastructure are its ability to 
            provide optimised performance for HPC tasks, ensure high levels of security by preventing 
            unauthorised access, and maintain consistent reliability by avoiding congestion or performance 
            issues caused by other network traffic.</t>
            
         <t>Usually, the responsibility for networking within an end site or campus lies with that 
            organisation, e.g., a university IT department, while the operation of an HPC facility may 
            have dedicated (separate) staff. With the additional administrative domains of the NRENs and 
            inter-NREN backbones like GÉANT, end-to-end traffic may pass through many networks operated by 
            different organisations. To achieve optimal end-to-end performance, every domain on the path needs to implement best 
            practices.</t>

       </section> 

       <section anchor="BW" numbered="true" toc="default">
        <name>Bandwidth and Latency</name>

         <t>The technical requirements for wide area interconnects between HPC sites are stringent, 
            given the unique demands of distributed high-performance computing. High bandwidth is a 
            primary requirement, as these interconnects must support the rapid transfer of large 
            datasets between sites, ensuring that data movement does not become a bottleneck in 
            computational workflows. HPC data flows might typically consume from 1 Gbps to beyond 400 Gbps.</t>
            
         <t>Low latency is equally critical for many HPC applications. Latency requirements between 
            data centre (inter-DC) locations will be in the low-millisecond range. This low latency is essential 
            for applications that require real-time or near-real-time data processing.</t>
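
         <t>As a hedged illustration of why bandwidth and latency need to be considered together, the 
            following sketch computes the bandwidth-delay product (BDP), i.e., the amount of data that must 
            be in flight to keep a path full. The function name and the example link rates and round-trip 
            times are illustrative assumptions and are not taken from any particular HP-WAN deployment.</t>

         <sourcecode type="python"><![CDATA[
```python
def bdp_bytes(rate_gbps: float, rtt_ms: float) -> float:
    """Return the bandwidth-delay product in bytes."""
    bits_per_second = rate_gbps * 1e9
    rtt_seconds = rtt_ms / 1e3
    return bits_per_second * rtt_seconds / 8


if __name__ == "__main__":
    # Example (rate in Gbps, RTT in ms) pairs; values are assumptions.
    for rate_gbps, rtt_ms in [(1, 5), (100, 5), (400, 20)]:
        mib = bdp_bytes(rate_gbps, rtt_ms) / 2**20
        print(f"{rate_gbps} Gbps, {rtt_ms} ms RTT: {mib:.1f} MiB in flight")
```
]]></sourcecode>

         <t>Under these assumptions, a 100 Gbps path with a 5 ms round-trip time needs roughly 62.5 MB of 
            data in flight, which is one reason why end-host buffer tuning matters on such paths.</t>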

       </section> 

       <section anchor="PROTO" numbered="true" toc="default">
        <name>Data Movement Protocols</name>
                           
         <t>Network-intensive applications like networked storage or cluster computing need 
            a network infrastructure with high bandwidth and low latency.</t>
                  
         <t>These interconnects may need to support specialised communication protocols 
            designed for HPC environments, such as Remote Direct Memory Access (RDMA) [RFC5040] [RFC7306], which 
            optimises the performance of distributed HPC applications by reducing overhead and 
            improving data transfer efficiency.</t>
            
         <t>InfiniBand (IB) is another computer networking communications standard used in 
            high-performance computing that features very high throughput and very low latency. 
            InfiniBand is also used as either a direct or switched interconnect between servers 
            and storage systems, as well as an interconnect between storage systems.</t>
            
         <t>The advantages of RDMA and IB over other network application programming interfaces 
            are lower latency, lower CPU load, and higher bandwidth. The downside of these specialised 
            protocols is that all interfaces and nodes on the end-to-end path need to support 
            the technique.</t>
        
         <t>iWARP is a computer networking protocol that implements remote direct memory access (RDMA) 
            for efficient data transfer over Internet Protocol networks. Several IETF specifications 
            define or relate to iWARP:</t>
         
           <ul spacing="normal">
           <li>[RFC5040] A Remote Direct Memory Access Protocol Specification is layered over Direct Data Placement Protocol (DDP). It defines how RDMA Send, Read, and Write operations are encoded using DDP into headers on the network.</li>
           <li>[RFC5041] Direct Data Placement over Reliable Transports is layered over MPA/TCP or SCTP. It defines how received data can be placed directly into an upper layer protocol's receive buffer without intermediate buffers.</li>
           <li>[RFC5042] Direct Data Placement Protocol (DDP) / Remote Direct Memory Access Protocol (RDMAP) Security analyzes security issues related to iWARP DDP and RDMAP protocol layers.</li>                
           <li>[RFC5043] Stream Control Transmission Protocol (SCTP) Direct Data Placement (DDP) Adaptation defines an adaptation layer that enables DDP over SCTP.</li>
           <li>[RFC5044] Marker PDU Aligned Framing for TCP Specification defines an adaptation layer that enables preservation of DDP-level protocol record boundaries layered over the TCP reliable connected byte stream.</li>       
           <li>[RFC6580] IANA Registries for the Remote Direct Data Placement (RDDP) Protocol defines IANA registries for Remote Direct Data Placement (RDDP) error codes, operation codes, and function codes.</li>                
           <li>[RFC6581] Enhanced Remote Direct Memory Access (RDMA) Connection Establishment fixes shortcomings with iWARP connection setup.</li>
           <li>[RFC7306] Remote Direct Memory Access (RDMA) Protocol Extensions extends [RFC5040] with atomic operations and RDMA Write with Immediate Data.</li>        
         </ul>        
                 
       </section> 

       <section anchor="RTG" numbered="true" toc="default">
        <name>Forwarding Optimisation</name>

         <t>The scaling of HPC applications, especially across a WAN between multiple sites, 
            requires the ability to route massive volumes of traffic. Specifically, the network 
            infrastructure must accommodate several traffic and forwarding characteristics, which are detailed below.</t>

         <ul spacing="normal">
           <li>Low entropy: Compared to traditional data center workloads, the number and diversity
               of flows are low, and flow patterns are usually repetitive and predictable.</li>
           <li>Burstiness: Flows usually exhibit an "on and off" pattern at a time granularity of milliseconds.</li>
           <li>Jumbo frames: Ethernet frames larger than the standard maximum transmission unit (MTU) size of 1,500 bytes, 
               typically carrying payloads of up to 9,000 bytes. Using jumbo frames can significantly enhance network 
               efficiency and reduce CPU overhead.</li>                
           <li>Elephant flows: For each burst, the intensity of each flow could reach up to the line rate of NICs.</li>
         </ul>     
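
         <t>To make the jumbo frame point above more concrete, the sketch below estimates the payload 
            efficiency and the frame rate at a given line rate for 1,500-byte and 9,000-byte MTUs. It 
            assumes TCP over IPv4 without options (40 bytes of headers) and 38 bytes of Ethernet per-frame 
            overhead (preamble, header, FCS, and inter-frame gap, with no VLAN tag). These constants, 
            the function names, and the 100 Gbps rate are assumptions chosen for illustration.</t>

         <sourcecode type="python"><![CDATA[
```python
ETH_OVERHEAD = 38    # preamble 8 + header 14 + FCS 4 + inter-frame gap 12
IP_TCP_HEADERS = 40  # IPv4 (20) + TCP (20), no options (assumption)


def payload_efficiency(mtu: int) -> float:
    """Fraction of on-wire bytes that carry application payload."""
    return (mtu - IP_TCP_HEADERS) / (mtu + ETH_OVERHEAD)


def frames_per_second(mtu: int, rate_gbps: float) -> float:
    """Full-size frames per second needed to fill the link."""
    return rate_gbps * 1e9 / 8 / (mtu + ETH_OVERHEAD)


if __name__ == "__main__":
    for mtu in (1500, 9000):
        eff = payload_efficiency(mtu) * 100
        fps = frames_per_second(mtu, 100)
        print(f"MTU {mtu}: {eff:.1f}% payload, {fps:,.0f} frames/s")
```
]]></sourcecode>

         <t>Under these assumptions, the larger MTU raises payload efficiency from about 94.9% to about 
            99.1% and reduces the number of frames that hosts and network devices must process by a 
            factor of nearly six.</t>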

         <t>It should be noted that efficiently handling these elephant flows is crucial in HPC as they can otherwise 
            saturate network links, leading to congestion and reduced performance for other network traffic. Strategies 
            to manage elephant flows effectively, such as prioritising these flows or segmenting network traffic, help 
            maintain overall network performance and ensure that large data transfers do not hinder the execution of 
            other critical tasks within the HPC environment.</t>

         <t>HPC transport options include UDP and TCP over IP, and emerging mechanisms such as QUIC. 
            Each transport technology has its own strengths and weaknesses. In all cases, the primary goal is to transmit 
            massive data sets effectively, with high throughput, low latency and jitter, and a low packet loss ratio.</t>

       </section> 

       <section anchor="REL" numbered="true" toc="default">
        <name>Reliability and High Availability</name>

         <t>In HPC networks, the resilience of the data stream is important due to the critical need for precise, high-speed data 
            transfer. These networks must maintain continuous data flow to support large-scale computations, where even minor 
            interruptions or packet loss can severely impact performance, causing delays or incorrect results. Therefore, resilience 
            must be implemented to ensure the network can recover from disruptions without compromising speed or integrity.</t>
            
         <t>For retransmission and lossless data transfer, HPC networks must have mechanisms to handle data loss efficiently. They must 
            quickly retransmit lost or corrupted packets while maintaining a seamless data flow to avoid performance degradation.
            The requirement for lossless communication is essential to meet the needs of scientific computations, simulations, and 
            data-intensive tasks.</t>            
            
         <t>High availability and redundancy are also essential to prevent data loss and ensure continuous 
            operation, especially given that HPC tasks often run for extended periods and involve 
            critical research. These networks must also incorporate advanced security measures, 
            including encryption and secure access controls, to protect the often sensitive or 
            classified data being transmitted.</t>

       </section> 

       <section anchor="QOS" numbered="true" toc="default">
        <name>Quality of Service</name>

         <t>The network should support Quality of Service (QoS) mechanisms to prioritise 
            traffic, ensuring that critical HPC tasks receive the necessary bandwidth and 
            low-latency performance.</t>          

         <t>An approach may be needed to enable applications to request specific bandwidth or latency guarantees, 
            ensuring that high-priority tasks receive required resources.</t>
            
         <t>Differentiated Services (Diffserv) offers a flexible method to manage traffic prioritisation without the 
            need for an explicit request-and-grant process. Diffserv operates by marking packets with different 
            priority levels, allowing the network to prioritise and protect access to capacity for critical tasks. 
            This approach may be useful in HPC environments where dynamic traffic patterns require adaptive resource management.</t>
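
         <t>As a minimal sketch of how an application might mark its own traffic, the following example 
            sets a Diffserv codepoint (DSCP) on a UDP socket using the IP_TOS socket option. The choice of 
            the AF41 codepoint and the helper name are assumptions for illustration. Whether the marking is 
            honoured depends on the policy of each network on the path, and the option is assumed to be 
            available on the host platform (e.g., Linux).</t>

         <sourcecode type="python"><![CDATA[
```python
import socket

DSCP_AF41 = 34  # Assured Forwarding class 4, low drop (example choice)


def open_marked_udp_socket(dscp: int) -> socket.socket:
    """Create an IPv4 UDP socket whose packets carry the given DSCP."""
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP must be in the range 0..63")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # The DSCP occupies the upper six bits of the former TOS byte.
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)
    return sock


if __name__ == "__main__":
    sock = open_marked_udp_socket(DSCP_AF41)
    tos = sock.getsockopt(socket.IPPROTO_IP, socket.IP_TOS)
    print(f"TOS byte: {tos:#04x}, DSCP: {tos >> 2}")
    sock.close()
```
]]></sourcecode>

         <t>Marking at the end host only expresses a request; networks along the path may re-mark or 
            ignore the codepoint according to their own policies.</t>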

       </section> 

       <section anchor="CC" numbered="true" toc="default">
        <name>Congestion Control</name>
            
         <t>Congestion control mechanisms ensure that data transfers between nodes and across networks 
            are efficient and do not overwhelm the HPC network infrastructure. By managing and 
            regulating the flow of data, congestion control mechanisms help prevent bottlenecks, 
            reduce latency, and maintain high throughput, which are essential for the performance 
            and reliability of HPC applications that require the rapid movement of large volumes
            of data across distributed systems.</t>
            
         <t>Depending on the transport technology used in the HPC environment, several congestion control
            schemes may be used:</t>

         <ul spacing="normal">
           <li>InfiniBand Congestion Control</li>
           <li>RDMA-based Data Center Quantized Congestion Notification (DCQCN)</li>
           <li>TCP-based Bottleneck Bandwidth and Round-Trip Time (BBRv3)</li>
           <li>eXplicit Control Protocol (XCP)</li>
         </ul>   

    
       </section> 
       
       <section anchor="PERF" numbered="true" toc="default">
        <name>Performance Monitoring</name>

         <t>End-to-end performance measurement and monitoring across multi-domain network infrastructures is 
            important in HPC environments. It provides a method to diagnose and troubleshoot network performance 
            issues that can affect the data-intensive applications and distributed computing tasks commonly found in 
            HPC.</t>
            
         <t>PerfSONAR is a commonly used network measurement toolkit. It is designed to provide federated coverage 
            of network paths. It provides an interface that allows for the scheduling of measurements, the storage of 
            data, and the generation of visualisations.</t>

       </section> 

       <section anchor="SCAL" numbered="true" toc="default">
        <name>Scalability</name>

         <t>Scalability is another crucial aspect, allowing the network to expand efficiently as 
            computational needs grow, accommodating additional sites or increased capacity without 
            significant reconfiguration. Interoperability is also necessary, ensuring that the 
            network can communicate seamlessly across different types of hardware, software, and 
            protocols used at various HPC sites.</t>

       </section> 

       <section anchor="NRG" numbered="true" toc="default">
        <name>Sustainability and Energy Efficiency</name>

        <t>As HP-WANs continue to expand, sustainability and energy efficiency are becoming critical considerations. The operational scale of these networks, which span global infrastructures and serve data-intensive applications, poses significant environmental and economic challenges. Future HP-WAN deployments will increasingly prioritise energy-efficient network components, smart power management systems, and sustainable operational practices.</t>

        <t>Emerging approaches include adaptive network management strategies designed to reduce energy consumption during periods of lower utilisation, and the use of advanced technologies such as optical networking and energy-aware routing protocols. Furthermore, industry-wide initiatives are focusing on measuring and reducing the carbon footprint of data transfers and network operations, contributing to broader climate goals.</t>
 
       </section>



       <section anchor="SCHD" numbered="true" toc="default">
        <name>Resource Scheduling</name>

         <t>[Editor's Note - Do we need to discuss service and resource
            scheduling?]</t>

       </section> 
       
    </section>


    <section anchor="EXMP" numbered="true" toc="default">
      <name>Examples of HP-WANs</name>

      <t>The following sub-sections highlight examples of HP-WANs and
         their technical specifications.</t>

       <section anchor="GÉANT" numbered="true" toc="default">
        <name>GÉANT</name>

        <t>The GÉANT network is a pan-European data network dedicated to research and education, providing 
           high-speed, high-capacity connectivity across Europe, between European NRENs and to other worldwide NRENs. 
           It is an essential infrastructure for HPC applications, enabling collaboration and data sharing among research institutions,
           universities, and HPC centers across the continent and beyond.</t>
           
        <t>The core of GÉANT operates at speeds of up to 600 Gbps, using Dense Wavelength Division Multiplexing
           (DWDM) technology. This provides connectivity suitable for HPC applications, particularly those 
           involving large-scale simulations, scientific research, and real-time data processing. Reliability is 
           provided by using multiple optical underlay paths for data to travel between GÉANT nodes. This design 
           ensures high availability and reliability, which is crucial for the continuous operation of HPC 
           environments.</t>
        
        <t>The GÉANT network integrates perfSONAR for real-time network performance monitoring and reporting of IP performance metrics [RFC6703], allowing HPC users 
           to detect and troubleshoot potential issues that could impact data transfer and overall performance. This 
           ensures that the high-performance requirements of HPC applications are met consistently across the network.</t>
        
        <t>GÉANT provides specialised services for specific HPC projects, such as the LHC Optical Private Network (LHCOPN) 
           and LHC Open Network Environment (LHCONE), which are critical for supporting the data-intensive needs of the 
           Large Hadron Collider (LHC) at CERN. These services offer dedicated, high-bandwidth connections that are 
           optimised for the massive data flows generated by LHC experiments.</t>   
           
        <t>The GÉANT network connects over 50 million users across more than 10,000 institutions in 40 countries. This 
           extensive reach supports a wide range of HPC applications by enabling seamless collaboration between 
           geographically dispersed research facilities. Beyond Europe, GÉANT connects to other major research and 
           education networks, including Internet2 in the United States and CANARIE in Canada, allowing for global 
           HPC collaborations and data exchanges.</t>   

       </section> 

       <section anchor="Janet" numbered="true" toc="default">
        <name>Janet</name>

        <t>The Janet network is the UK NREN, operated by Jisc. First established in 1984, Janet now has backbone links 
           running at up to 800 Gbps, with a growing number of sites connected at 100 Gbps, in some cases with multiple 
           100 Gbps links. A typical university site will have multiple 10 Gbps links.</t>
           
        <t>Janet connects to other research and education networks via a resilient 400 Gbps link to GÉANT. It has a presence 
           in multiple Internet exchange points (IXPs), predominantly LINX, connects to or peers directly with many content 
           and cloud providers, and has commodity connectivity via Tier 1 ISPs. The total aggregate external capacity is 
           around 4-5 Tbps.</t> 
        
        <t>Some private, dedicated optical links are used by Janet sites, e.g., the CERN to RAL (UK Tier 1 site) LHCOPN 
           link, which is a 200 Gbps path.</t>

       </section> 

      <section anchor="Effingo" numbered="true" toc="default">
        <name>Google Effingo</name>

        <t>Google Effingo is a state-of-the-art, high-performance infrastructure designed to meet the demanding data processing and 
           storage needs of large-scale machine learning (ML), artificial intelligence (AI), and computational workloads. As part of 
           Google's cloud offering, Effingo is an example of how WAN infrastructure supports high-performance computing 
           applications across diverse industries and research areas.</t>
           
        <t>Effingo leverages a global network of data centers interconnected with high-capacity, low-latency WAN links. These links 
           facilitate rapid data exchange and provide the performance required to handle real-time AI model training, complex 
           simulations, and large-scale data analytics. The network is optimised for high-throughput workloads, where low latency and 
           reliability are critical for processing large datasets across vast geographical areas and more than 100 data center sites.</t>   
           
        <t>Effingo utilises a private global network of high-capacity fiber links, combined with packet-layer protocols to deliver low-latency,
           high-speed data transfer across continents. This connectivity enables global collaboration between research centers, universities, 
           and data-driven enterprises, allowing them to share large datasets and results.</t>   
        
        <t>Currently, Effingo's daily data transfers exceed 1 exabyte.</t>   
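
        <t>To put this volume in context, the sustained average rate it implies can be estimated as follows. The estimate assumes decimal units (1 exabyte = 1.0e18 bytes) and transfers spread evenly over a 24-hour day. Both are simplifying assumptions, so the result is an order-of-magnitude figure only:</t>

        <artwork name="" type="" align="left" alt="">
          <![CDATA[

   average rate = (8 bits/byte * 1.0e18 bytes) / 86,400 s
                = approx. 9.26e13 bit/s
                = approx. 92.6 Tbps (aggregate, averaged over the day)
          ]]>
        </artwork>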

       </section>


       <section anchor="ESNET" numbered="true" toc="default">
        <name>Energy Sciences Network</name>

        <t>The Energy Sciences Network (ESnet) is a high-performance network dedicated to supporting scientific research within the United States, operated by the U.S. Department of Energy (DOE). Established in 1986, ESnet interconnects national laboratories, supercomputing centres, universities, and research institutions, enabling collaborative scientific projects, data-intensive applications, and high-performance computing (HPC) tasks across multiple geographical locations.</t>

        <t>ESnet delivers high-capacity, low-latency connectivity through its robust fibre-optic backbone, employing advanced optical networking technologies and dynamic circuit provisioning services. It supports data transfer rates ranging from tens to several hundred gigabits per second, essential for demanding scientific workflows such as high-energy physics experiments, climate modelling, and large-scale genomic research.</t>
        
        <t>A key feature of ESnet is its use of specialised services such as the On-Demand Secure Circuits and Advance Reservation System (OSCARS), providing dynamic, guaranteed-bandwidth paths that allow researchers to reserve network capacity tailored specifically to their project's needs. Additionally, the network incorporates advanced orchestration platforms like SENSE, offering intent-driven, automated management to ensure optimal network resource utilisation and agile response to evolving scientific requirements.</t>
        
        <t>ESnet’s infrastructure integrates comprehensive monitoring and diagnostic tools such as perfSONAR, ensuring end-to-end network visibility and performance analysis across institutional boundaries. This facilitates proactive identification and resolution of performance bottlenecks, maintaining the reliability and efficiency necessary for HPC operations.</t>
        
        <t>With interconnections to international research networks, including GÉANT, Janet, Internet2, and CANARIE, ESnet provides global reach, facilitating extensive international collaboration and enabling the seamless exchange of data among scientific communities worldwide.</t>        

       <section anchor="DYNNET" numbered="true" toc="default">
        <name>Practical Examples of Dynamic Network Management</name>

        <t>ESnet's OSCARS system exemplifies dynamic advance reservation and circuit provisioning, demonstrating the practical application of HP-WAN capabilities in operational scientific networks.</t>

        <t>The SENSE platform further illustrates how intent-based networking and automation can simplify complex resource allocation processes, significantly improving network agility and scalability.</t>
       </section>

         
       </section> 

       <section anchor="I2" numbered="true" toc="default">
        <name>Internet2</name>

        <t>Internet2 is a high-performance networking consortium serving the United States research and education community. Established in 1996, Internet2 provides advanced networking infrastructure specifically designed to support collaborative research, scientific discovery, and innovation among educational institutions, government laboratories, and industry partners.</t>

        <t>Internet2 operates an advanced optical backbone network capable of multi-terabit speeds, delivering exceptionally high-capacity and low-latency connections. As with the aforementioned networks, it supports dynamic bandwidth allocation, advanced monitoring tools, and federated identity management.</t>
        
       </section> 

       <section anchor="CN" numbered="true" toc="default">
        <name>CANARIE</name>

        <t>CANARIE is Canada's national research and education network, established in 1993, dedicated to providing robust, high-performance connectivity for research, education, and innovation. It interconnects universities, research centres, healthcare institutions, and government laboratories across Canada, as well as facilitating international collaboration through global interconnections with networks such as GÉANT, Internet2, and ESnet.</t>
 
        <t>As with the networks in other regions, the CANARIE network operates using a high-capacity fibre-optic backbone, delivering advanced networking services tailored specifically for demanding scientific and research applications. The network provides dynamic, software-driven capabilities, including dedicated high-speed links, automated resource allocation, and integrated identity and access management solutions. Additionally, CANARIE supports advanced services like the Digital Accelerator for Innovation and Research (DAIR), enabling cloud-based research and development.</t>
        
       </section> 
 
        <section anchor="AP" numbered="true" toc="default">
        <name>Asia-Pacific Advanced Network</name>

        <t>TBA</t>
        
       </section> 
 
    </section>
    
    <section anchor="FUT" numbered="true" toc="default">
      <name>Emerging Trends and Future Directions</name>

        <t>HP-WANs continue to evolve, driven by emerging requirements from scientific research, high-performance computing, distributed artificial intelligence, and industrial data analytics. Several key trends and future directions are shaping the next generation of HP-WANs.</t>

        <section anchor="NETCONT" numbered="true" toc="default">
        <name>Integrated Resource and Network Control</name>

        <t>Enhanced integration between resource controllers and network controllers for scheduled services can maximise network efficiency. This tighter integration aims to deliver more granular and efficient control over network resources, enabling dynamic, on-demand bandwidth allocation and better-informed resource allocation decisions. Such integration facilitates more effective orchestration of network resources, aligning network performance closely with application requirements.</t>
        
       </section> 

        <section anchor="IBN" numbered="true" toc="default">
        <name>Intent-Based Networking and Automation</name>

        <t>Intent-based networking (IBN) and automation technologies play an increasingly important role in the management and orchestration of HP-WANs. IBN allows network administrators to define desired network states or outcomes, with automated systems translating these intents into actionable network configurations. As discussed earlier, platforms such as ESnet's SENSE provide valuable practical demonstrations of how intent-driven orchestration can significantly enhance agility, scalability, and operational efficiency.</t>
        
       </section> 

        <section anchor="SIG" numbered="true" toc="default">
        <name>Network Signalling</name>

        <t>As the scale and complexity of HP-WAN deployments grow, efficient signalling mechanisms become increasingly critical, especially when running HP-WAN services over shared public infrastructure.</t>
 
        <t>Applications may want to signal their desired bandwidth to the network, enabling more precise rate negotiation and collaborative congestion control, to achieve a targeted completion time for the data transfer.</t>
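
        <t>As a hedged illustration of the rate an application might signal, consider a transfer of S bytes that should finish within T seconds. The symbols S, T, and R and the figures below are illustrative assumptions, not values taken from any deployment, and the calculation ignores protocol overhead, retransmissions, and competing traffic:</t>

        <artwork name="" type="" align="left" alt="">
          <![CDATA[

   Minimum sustained rate:   R >= (8 * S) / T

   Example:   S = 100 TB = 1.0e14 bytes
              T = 4 hours = 14,400 s
              R >= (8 * 1.0e14) / 14,400
                 = approx. 5.56e10 bit/s
                 = approx. 55.6 Gbps
          ]]>
        </artwork>

        <t>In practice, a signalled rate would likely need some headroom above R to absorb overhead and loss recovery.</t>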

        <t>Therefore, efficient and scalable signalling approaches are vital for dynamic resource allocation in HP-WAN environments. Effective protocols must support rapid dissemination of resource states and swift propagation of requests between network components, minimising latency and overhead.</t>

        <t>Desirable properties of signalling mechanisms in HP-WANs include extensibility, low overhead, real-time responsiveness, and robustness, so that they support diverse technologies and ensure reliable, high-performance communication.</t>
                
       </section> 

    </section>


    <section anchor="IANA" numbered="true" toc="default">
      <name>IANA Considerations</name>

        <t>This document makes no requests for action by IANA.</t>

    </section>


    <section anchor="SEC" numbered="true" toc="default">
      <name>Security Considerations</name>

      <t>The security requirements for HPC networks, particularly in inter-data center scenarios,
         are crucial to ensuring the integrity, confidentiality, and availability of sensitive 
         data and computational resources. These requirements are stringent due to the high-value 
         and often sensitive nature of the data processed within HPC systems, such as research data
         in fields like national defense, pharmaceuticals, and climate science.</t>

     </section>


    <section anchor="ACK" numbered="true" toc="default">
      <name>Acknowledgements</name>

      <t>This document was partly motivated by the discussion occurring on the IETF hp-wan@ietf.org mailing list.</t> 
         
      <t>The authors would like to thank Gorry Fairhurst and Zahed Sarker for their reviews and suggestions.</t>

    </section>

   <section anchor="contributors" numbered="false" toc="default">
      <name>Contributors</name>

      <t>The following authors contributed significantly to this document:</t>
        <artwork name="" type="" align="left" alt="">
          <![CDATA[

   Nicholas Race
   Lancaster University
   United Kingdom
   Email: n.race@lancaster.ac.uk
          ]]>
       </artwork>
    </section>


  </middle>

  <back>

    <references>
    
      <name>Normative References</name>

    </references>

    <references>
    
      <name>Informative References</name>

      &RFC5040;
      &RFC5041;
      &RFC5042;
      &RFC5043;
      &RFC5044;
      &RFC6580;      
      &RFC6581;
      &RFC6703;     
      &RFC7306;                  
    </references>

  </back>

</rfc>
