<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="exp" docName="draft-jain-lisp-network-ai-infra-02"
     ipr="trust200902" obsoletes="">
  <front>
    <title abbrev="LISP Network AI Infra">LISP-Based Network for AI
    Infrastructure</title>

    <author fullname="Prakash Jain" initials="P" surname="Jain">
      <organization>MIPS</organization>

      <address>
        <postal>
          <street/>

          <city>San Jose</city>

          <region>CA</region>

          <code/>

          <country>USA</country>
        </postal>

        <email>prjain@mips.com</email>
      </address>
    </author>

    <author fullname="Sanjay Hooda" initials="S" surname="Hooda">
      <organization>Cisco Systems</organization>

      <address>
        <postal>
          <street/>

          <city>San Jose</city>

          <region>CA</region>

          <code/>

          <country>USA</country>
        </postal>

        <phone/>

        <facsimile/>

        <email>shooda@cisco.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Durgesh Srivastava" initials="D" surname="Srivastava">
      <organization>DataraAI</organization>

      <address>
        <postal>
          <street/>

          <city>Cupertino</city>

          <region>CA</region>

          <code/>

          <country>USA</country>
        </postal>

        <phone/>

        <facsimile/>

        <email>durgesh@DataraAI.ai</email>

        <uri/>
      </address>
    </author>

    <author fullname="Padma Pillay-Esnault" initials="P"
            surname="Pillay-Esnault">
      <organization>Independent</organization>

      <address>
        <postal>
          <street/>

          <city>San Jose</city>

          <region>CA</region>

          <code/>

          <country>USA</country>
        </postal>

        <phone/>

        <facsimile/>

        <email>padma.ietf@gmail.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Victor Moreno" initials="V" surname="Moreno">
      <organization>Google</organization>

      <address>
        <postal>
          <street/>

          <city>Mountain View</city>

          <region>CA</region>

          <code/>

          <country>USA</country>
        </postal>

        <phone/>

        <facsimile/>

        <email>vimoreno@google.com</email>

        <uri/>
      </address>
    </author>

    <date day="21" month="August" year="2025"/>

    <abstract>
      <t>The LISP control plane provides the mechanisms to support both
      Scale-Up and Scale-Out backend networks within AI infrastructure. This
      document outlines how LISP can enable a unified control plane
      architecture that accommodates both scaling technologies, offering
      flexibility in deployment. This approach allows AI/ML
      applications&mdash;whether focused on training or inference&mdash;to
      operate workloads efficiently with resiliency on a converged
      infrastructure, supporting diverse deployment scenarios using the same
      underlying network fabric.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in BCP 14 <xref
      target="RFC2119">RFC 2119 </xref> <xref target="RFC8174">RFC 8174
      </xref>.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>This document outlines the architecture and mechanisms for
      implementing a LISP-based unified control plane to converge both
      Scale-Up and Scale-Out backend networks within data center AI
      infrastructure. The proposed architecture leverages LISP protocol
      capabilities to meet the demanding requirements of AI
      workloads&mdash;high bandwidth, ultra-low latency, deterministic
      behavior, and scalable network performance&mdash;utilizing appropriate
      methods tailored to each scaling technology. While LISP provides both
      the data plane and control plane mechanisms, this document focuses
      specifically on the unified control plane.</t>

      <t>The decision to send a flow over either the Scale-Up or Scale-Out
      network is determined by the traffic&rsquo;s destination. For example,
      intra-PoD traffic (which may be non-IP) is directed to the Scale-Up
      network, while inter-PoD traffic (primarily IP-based) is routed through
      the Scale-Out network. The segmentation allows both scaling domains to
      function as distinct network domains, enabling flexible deployment
      options&mdash;including Scale-Up only, Scale-Out only, or hybrid
      configurations&mdash;without requiring changes to the underlying
      architecture.</t>

      <t>The unified solution for Scale-Up and Scale-Out networks enables the
      extension of either or both scaling domains within data center backend
      networks, with optimizations tailored for latency, bandwidth, and packet
      loss. These enhancements help meet the performance demands of diverse
      AI/ML workloads, including inference and training. A key advantage of
      this architecture is its flexibility: while most data centers require
      both Scale-Up and Scale-Out capabilities, some applications may not be
      agnostic to the underlying network and may assume exclusive use of one
      model. In such cases, LISP-based segmentation ensures that specific
      functionalities are confined to either the Scale-Up or Scale-Out domain,
      preserving architectural integrity while supporting security and
      application-specific requirements.</t>
    </section>

    <section title="Definition of Terms">
      <t>&bull; LISP Terminology: All terms related to the Locator/ID
      Separation Protocol (LISP), including EID (Endpoint Identifier), RLOC
      (Routing Locator), xTR (Ingress/Egress Tunnel Router), MS/MR
      (Map-Server/Map-Resolver, together forming the Mapping System), and
      publish/subscribe, are used as defined in the LISP specifications:
      [RFC9300], [RFC9301], and [RFC9437].</t>

      <t>&bull; Accelerator (Acc): Specialized hardware designed to accelerate
      AI workloads. Examples include GPU (Graphics Processing Unit), TPU
      (Tensor Processing Unit), and NPU (Neural Processing Unit).</t>

      <t>&bull; PoD (Point of Delivery / Performance-Optimized Datacenter): A
      modular unit within a data center that integrates compute, storage, and
      networking resources to deliver localized services. PoDs are designed
      for scalability and performance optimization.</t>

      <t>&bull; Scale-Up Network: A segment of the data center backend network
      optimized for&nbsp;intra-PoD&nbsp;communication, typically among up to
      1,024 accelerators. It supports ultra-low latency operations through
      direct load/store memory access between accelerators.</t>

      <t>&bull; Scale-Out Network: A segment of the data center backend
      network designed for&nbsp;inter-PoD&nbsp;communication across clusters
      of thousands of accelerators. It leverages&nbsp;RDMA (Remote Direct
      Memory Access)&nbsp;for efficient, high-throughput data transfer between
      distributed compute nodes.</t>

      <t>&bull; INCC (In-Network Collective Communication): A technique that
      offloads collective communication operations to network switches to
      reduce data movement and improve performance. These
      operations&mdash;such
      as&nbsp;AllReduce,&nbsp;Broadcast,&nbsp;Gather/Scatter,
      and&nbsp;AllToAll&mdash;are essential for synchronizing data across
      multiple compute nodes during distributed AI training.</t>
    </section>

    <section title="AI Infra System Architecture">
      <t>In a LISP-based backend data center network architecture for AI, the
      Scale-Up network usually manages intra-PoD non-IP accelerator traffic
      and is optimized for ultra-low latency operations, while the Scale-Out
      network handles inter-PoD IP traffic with high throughput and
      scalability, adapting dynamically via LISP pub-sub mechanisms.
      Instance-IDs (IIDs) provide segmentation between domains and enable
      tenant- or job-level isolation and policy enforcement. The architecture
      supports multi-homing and multi-pathing for reliability, and allows
      flexible deployment options&mdash;Scale-Up only, Scale-Out only, or
      hybrid&mdash;without requiring changes to the underlying network
      infrastructure.</t>

      <figure>
        <artwork><![CDATA[   IP:10.0.0.1  IP:10.0.0.2                 IP:20.0.0.3  IP:20.0.0.4
    '--------'  '--------'                   '--------'  '--------'
    : AccID 1:  : AccID 2:                   : AccID 3:  : AccID 4:
    '--------'  '--------'                   '--------'  '--------'
        |     /\     |                           |     /\     |
    RLOC=IP_A1  RLOC=IP_A2                   RLOC=IP_B1  RLOC=IP_B2
    +-+--+--+    +-+--+--+                    +-+--+--+    +-+--+--+
   .| xTR A1|.-..| xTR A2|.-.                .| xTR B1|.-..| xTR B2|.-.
  ( +-+--+--+    +-+--+--+   )              ( +-+--+--+    +-+--+--+   )
  (         PoD A            )              (         PoD B            )
 (    +-------------------+   )            (    +-------------------+   )
  (   Scale-Up MS/MR & INCC  )              (   Scale-Up MS/MR & INCC  )
 (    +-------------------+ )              (    +-------------------+ )
  (     EID:10.0.0.0/16    )                (     EID:20.0.0.0/16    ) 
   (+-+--+--+    +-+--+--+)                  (+-+--+--+    +-+--+--+)
    | xTR A3|.-..| xTR A4|                    | xTR B3|.-..| xTR B4|
    +-+--+--+    +-+--+--+                    +-+--+--+    +-+--+--+
   RLOC=IP_A3    RLOC=IP_A4                  RLOC=IP_B3    RLOC=IP_B4
             \       |                            |       /
              .-._..-._.--._..|._.,.-._.,|-._.-_._.-.._.-.
          .-.'                                            '.-.
         (                  Scale-Out Fabric                  )
        (                   +--+--+   +-+---+                  )
       (                    |MS/MR|   |MS/MR|                   )
        (                   +-+---+   +-----+                  )
         (          (Scale-Out Mapping System with INCC)      )
          '.-._.'.-._.'--'._.'.-._.'.-._.'.-._.'.-._.'.-._.-.'
            /        |                           |       \
   RLOC=IP_C3    RLOC=IP_C4                  RLOC=IP_D3   RLOC=IP_D4
    +-+--+--+    +-+--+--+                   +-+--+--+    +-+--+--+
   .| xTR C3|.-..| xTR C4|.-.               .| xTR D3|.-..| xTR D4|.-.
  ( +-+--+--+    +-+--+--+   )             ( +-+--+--+    +-+--+--+  )
  (         PoD C             )            (          PoD D           )
 (    +-------------------+    )          (    +-------------------+   )
  (   Scale-Up MS/MR & INCC   )            (   Scale-Up MS/MR & INCC  )
 (    +-------------------+  )            (    +-------------------+ )
  (     EID:30.0.0.0/16    )               (     EID:40.0.0.0/16    ) 
   (+-+--+--+    +-+--+--+)                 (+-+--+--+    +-+--+--+)
    | xTR C1|.-..| xTR C2|                   | xTR D1|.-..| xTR D2|
    +-+--+--+    +-+--+--+                   +-+--+--+    +-+--+--+
    RLOC=IP_C1  RLOC=IP_C2                   RLOC=IP_D1  RLOC=IP_D2
        |     \/     |                            |    \/     |
    '--------'  '--------'                   '--------'  '--------'
    : AccID 5:  : AccID 6:                   : AccID 7:  : AccID 8:
    '--------'  '--------'                   '--------'  '--------'
   IP:30.0.0.5  IP:30.0.0.6                 IP:40.0.0.7  IP:40.0.0.8

   Figure 1: AI Infra System with converged Scale-Up and Scale-Out]]></artwork>
      </figure>

      <t/>

      <t>Figure 1 illustrates an example system architecture of a data center AI
      infrastructure. The system comprises four PoDs (A, B, C, and D), each
      equipped with a Scale-Up accelerator fabric, interconnected via a
      Scale-Out fabric. Within each PoD, accelerators communicate over the
      Scale-Up network, while inter-PoD communication is handled by the
      Scale-Out network. Each PoD includes xTRs (Tunnel Routers) that register
      accelerator EIDs with the LISP Mapping System, enabling seamless
      connectivity across the AI infrastructure. This architecture supports AI
      applications for both inference and training.</t>

      <t>The recommended deployment model maps intra-PoD traffic (which may be
      non-IP) to the Scale-Up network, and inter-PoD traffic (typically
      inter-subnet IP) to the Scale-Out network. This document assumes this
      traffic mapping model when describing packet flows.</t>

      <t>To support this unified architecture, the control plane utilizes two
      primary types of EID-to-RLOC mappings for accelerators:</t>

      <t>&bull; Scale-Up EID &#10216;IID, AccID&#10217; &rarr; RLOC
      &#10216;IP&#10217;: Used for the Scale-Up network.
      The&nbsp;AccID&nbsp;may be a non-IP identifier.</t>

      <t>&bull; Scale-Out EID &#10216;IID, AccIP&#10217; &rarr; RLOC
      &#10216;IP&#10217;: The traditional LISP mapping, used for the Scale-Out
      network with IP-based addressing.</t>
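
      <t>As a non-normative illustration, the two mapping types above can be
      modeled as a single keyed table. The sketch below is an assumption-laden
      example and not part of the protocol: the class name Eid, the helper
      register(), the IID values 1 and 2, and the name "AccID-7" are all
      hypothetical, and RLOCs are kept as the symbolic names used in
      Figure 1.</t>

      <figure>
        <artwork><![CDATA[
```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Dict, FrozenSet, Union


@dataclass(frozen=True)
class Eid:
    """EID key as used above: (IID, AccID) or (IID, AccIP)."""
    iid: int
    ident: Union[str, IPv4Address]

    @property
    def domain(self) -> str:
        # Assumption for this sketch only: an IP identifier means
        # Scale-Out, any other identifier means Scale-Up.
        if isinstance(self.ident, IPv4Address):
            return "scale-out"
        return "scale-up"


# EID -> RLOC set. RLOCs are the symbolic names of Figure 1.
MappingDb = Dict[Eid, FrozenSet[str]]


def register(db: MappingDb, eid: Eid, *rlocs: str) -> None:
    """Bind an EID to the locator set of its registering xTR(s)."""
    db[eid] = frozenset(rlocs)


db: MappingDb = {}
# IID values 1 and 2 and the name "AccID-7" are hypothetical.
register(db, Eid(1, "AccID-7"), "IP_D1", "IP_D2")
register(db, Eid(2, IPv4Address("40.0.0.7")), "IP_D1", "IP_D2")
```
]]></artwork>
      </figure>

      <t>Because the IID is part of the key, the same accelerator can appear
      once per domain without the two entries colliding.</t>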
    </section>

    <section title="Scale-Out Network Architecture">
      <t>Figure 2 illustrates the Scale-Out backend network architecture of a
      data center's AI infrastructure.</t>

      <figure>
        <artwork><![CDATA[           PoD A                                    PoD B          
        EID:10.0.0.0/16                          EID:20.0.0.0/16   
    +-+--+--+    +-+--+--+                  +-+--+--+    +-+--+--+
    | xTR A3|.-..| xTR A4|                  | xTR B3|.-..| xTR B4|
    +-+--+--+    +-+--+--+                  +-+--+--+    +-+--+--+
    RLOC=IP_A3   RLOC=IP_A4                 RLOC=IP_B3   RLOC=IP_B4
             \       |                           |       /
              .-._..-._.--._..|._.,.-._.,|-._.-_._.-.._.-.
          .-.'                                            '.-.
         (                    Scale-Out Fabric                )
        (               (RLOC Space Across PoDs)               )
        (            +----------------------------------+     )
       (             (Scale-Out Mapping System with INCC)      )
        (            +----------------------------------+       )
         (                  +--+--+   +-+---+                   )
        (                   |MS/MR|   |MS/MR|                  )
         (                  +-+---+   +-----+                 )
          '.-._.'.-._.'--'._.'.-._.'.-._.'.-._.'.-._.'.-._.-.'
            /        |                           |       \
    RLOC=IP_C3   RLOC=IP_C4                  RLOC=IP_D3   RLOC=IP_D4
    +-+--+--+    +-+--+--+                   +-+--+--+    +-+--+--+
    | xTR C3|.-..| xTR C4|                   | xTR D3|.-..| xTR D4|
    +-+--+--+    +-+--+--+                   +-+--+--+    +-+--+--+    
        EID:30.0.0.0/16                          EID:40.0.0.0/16     
           PoD C                                     PoD D          

    Figure 2: Scale-Out AI Infra Network Architecture]]></artwork>
      </figure>

      <section title="Scale-Out Registration and Fabric Paths">
        <t>Each PoD is assigned one or more unique IP subnets, which are
        provisioned and registered with both the Scale-Out and Scale-Up
        mapping systems by the xTRs/pxTRs participating in the Scale-Out
        network, as illustrated in Figure 2. The Map-Register message format
        is as specified in [RFC9301] and
        [I-D.ietf-lisp-site-external-connectivity]. Additional Accelerator
        metadata MAY be encoded using vendor-specific LCAF types as defined in
        [RFC9306].</t>

        <t>Once registered or de-registered, these subnet changes are
        published to all remote xTRs/pxTRs participating in the Scale-Out
        network and to all local xTRs of the Scale-Up network within the PoD,
        following the publication mechanisms outlined in [RFC9437] and
        [I-D.ietf-lisp-site-external-connectivity]. The Map-Notify/Publication
        message format is as specified in [RFC9301] and
        [I-D.ietf-lisp-site-external-connectivity]. Additional Accelerator
        metadata MAY be encoded in the Map-Notify or Publication message
        formats using vendor-specific LCAF types [RFC9306].</t>

        <t>Upon receipt, the published subnets are added to the map-cache
        (routing table) of each xTR or pxTR in both the Scale-Out and Scale-Up
        domains, establishing forwarding paths toward the xTRs/pxTRs
        responsible for the respective subnet.</t>

        <t>Each accelerator's IP address within a PoD is also registered with
        the local mapping system (Scale-Up MS/MR in Figure 1) by its
        associated xTR (RLOC). Within the PoD, these registrations
        (approximately 1,024 accelerators) are published to all xTRs,
        including remote xTRs/pxTRs, where they are added to the map-cache or
        routing table as the path to the registering xTRs. Path change
        mechanisms (due to failures, congestion, etc.) are described in later
        sections.</t>
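
        <t>The register-then-publish behavior described in this section can be
        sketched as follows. This is a minimal, non-normative model under
        stated assumptions: the classes XTR and MappingSystem and their method
        names are hypothetical, every subscriber receives every change (real
        deployments subscribe per EID as in [RFC9437]), and message
        authentication and encoding are omitted.</t>

        <figure>
          <artwork><![CDATA[
```python
from typing import Dict, List, Set


class XTR:
    """An xTR/pxTR holding a map-cache of EID prefix -> RLOC set."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.map_cache: Dict[str, Set[str]] = {}

    def on_publish(self, prefix: str, rlocs: Set[str]) -> None:
        # An empty RLOC set means the prefix was de-registered.
        if rlocs:
            self.map_cache[prefix] = set(rlocs)
        else:
            self.map_cache.pop(prefix, None)


class MappingSystem:
    """Toy MS/MR: stores registrations, publishes every change."""

    def __init__(self) -> None:
        self.db: Dict[str, Set[str]] = {}
        self.subscribers: List[XTR] = []

    def subscribe(self, xtr: XTR) -> None:
        self.subscribers.append(xtr)
        for prefix, rlocs in self.db.items():  # initial state push
            xtr.on_publish(prefix, rlocs)

    def register(self, prefix: str, rloc: str) -> None:
        self.db.setdefault(prefix, set()).add(rloc)
        self._publish(prefix)

    def deregister(self, prefix: str, rloc: str) -> None:
        rlocs = self.db.get(prefix, set())
        rlocs.discard(rloc)
        if not rlocs:
            self.db.pop(prefix, None)
        self._publish(prefix)

    def _publish(self, prefix: str) -> None:
        rlocs = self.db.get(prefix, set())
        for xtr in self.subscribers:
            xtr.on_publish(prefix, rlocs)


scale_out = MappingSystem()
pxtr_a3 = XTR("A3")
scale_out.subscribe(pxtr_a3)
scale_out.register("40.0.0.0/16", "IP_D3")
scale_out.register("40.0.0.0/16", "IP_D4")
```
]]></artwork>
        </figure>

        <t>After the two registrations, the map-cache of pxTR A3 holds the PoD
        D subnet with both locators. De-registering the last locator removes
        the entry again.</t>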
      </section>

      <section title="Scale-Out Subscription and Publication">
        <t>When an accelerator connected via an ITR initiates communication
        with a remote accelerator, the ITR sends a Map-Request for the remote
        EID. The request includes the&nbsp;N-bit&nbsp;set in the EID-Record,
        indicating that the ITR wishes to be notified of any changes to the
        RLOC-set associated with that EID, as defined by the publish-subscribe
        mechanism [RFC9437]. The Map-Request/Subscription message format is as
        specified in [RFC9301] and [I-D.ietf-lisp-site-external-connectivity].
        Additional Accelerator metadata MAY be encoded using vendor-specific
        LCAF types as defined in [RFC9306].</t>

        <t>Any xTR or pxTR in a PoD that has subscribed to updates for the
        Scale-Out EID&mdash;via the Scale-Out Mapping System (e.g., Scale-Out
        MS/MR) or Scale-Up Local Mapping System (e.g., Scale-Up
        MS/MR)&mdash;receives a Map-Notify from the mapping systems. This
        notification includes the updated RLOC-set (e.g., addition or removal
        of locators), as specified in the Mapping Notification Publish
        Procedures in [RFC9437]. The Map-Notify/Publication message format is
        as specified in [RFC9301] and
        [I-D.ietf-lisp-site-external-connectivity]. Additional Accelerator
        metadata MAY be encoded in the Map-Notify or Publication message
        formats using vendor-specific LCAF types [RFC9306].</t>

        <t>Upon receipt, the published EID is added to the map-cache (routing
        table) of each xTR or pxTR, establishing forwarding paths toward the
        xTR(s) responsible for the respective EID.</t>

        <t>When a destination accelerator is either undiscovered or
        deregistered in the Mapping System, it is treated as an&nbsp;Unknown
        Accelerator. In such cases, the Map-Server&nbsp;SHOULD&nbsp;respond to
        a Map-Request or subscription targeting the unknown accelerator with
        a&nbsp;Negative Map-Reply&nbsp;specifying the action&nbsp;"Drop" as in
        [RFC9301] and [I-D.ietf-lisp-site-external-connectivity].</t>
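
        <t>The handling of an Unknown Accelerator can be illustrated with the
        following non-normative sketch. The names MapReply,
        handle_map_request(), and install() are hypothetical, the action
        strings are illustrative labels and not wire values from [RFC9301],
        and the address 40.0.0.9 is an assumed example of an EID that was
        never registered.</t>

        <figure>
          <artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class MapReply:
    eid: str
    rlocs: FrozenSet[str] = frozenset()
    action: str = "no-action"  # illustrative label, not a wire value

    @property
    def negative(self) -> bool:
        return not self.rlocs


def handle_map_request(db: Dict[str, Set[str]], eid: str) -> MapReply:
    """Answer with the RLOC-set, or a negative reply with Drop."""
    rlocs = db.get(eid)
    if not rlocs:
        return MapReply(eid=eid, action="drop")
    return MapReply(eid=eid, rlocs=frozenset(rlocs))


def install(map_cache: Dict[str, object], reply: MapReply) -> None:
    """ITR side: cache the locators or an explicit drop entry."""
    map_cache[reply.eid] = "drop" if reply.negative else reply.rlocs


db = {"40.0.0.7": {"IP_D1", "IP_D2"}}
cache: Dict[str, object] = {}
install(cache, handle_map_request(db, "40.0.0.7"))
install(cache, handle_map_request(db, "40.0.0.9"))  # not registered
```
]]></artwork>
        </figure>

        <t>Caching the drop entry lets the ITR discard further packets toward
        the unknown accelerator locally instead of querying the mapping system
        for each one.</t>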
      </section>

      <section title="Scale-Out Packet Flow">
        <t>Inter-PoD traffic is encapsulated using Layer 3 (L3) encapsulation,
        following the procedures defined in [RFC9300], [RFC9301], and
        [RFC9437]. It is assumed that the accelerator in PoD A is aware of the
        EID (IP address) of the destination accelerator in PoD D&mdash;this
        may be obtained via packet inspection, address resolution, or
        provisioning. The following example illustrates a unicast packet flow
        and the associated control plane operations for the topology shown in
        Figure 1, where an accelerator in PoD A communicates with an
        accelerator in PoD D:</t>

        <t>&bull; Fabric paths across PoDs are established as described in
        Section 4.1.</t>

        <t>&bull; Within a PoD, each accelerator (Scale-Out EID, e.g.,
        40.0.0.7) is also registered with the local mapping system (Scale-Up
        MS/MR in Figure 1) by its associated xTRs (RLOCs IP_D1 &amp; IP_D2) as
        described in Section 4.1. These registrations (mapping
        40.0.0.7-&gt;IP_D1 &amp; IP_D2) are published to all xTRs/pxTRs
        within PoD D, where they are added to the map-cache or routing table
        as the path towards the registering xTRs D1 &amp; D2 for Accelerator
        40.0.0.7. Accelerators that are unknown or external to the PoD are
        registered as pETRs with the local mapping system, as defined in
        [I-D.ietf-lisp-site-external-connectivity].</t>

        <t>&bull; Accelerator 1 in PoD A sends an IP packet with source
        address&nbsp;10.0.0.1&nbsp;and destination address&nbsp;40.0.0.7.
        Since the destination lies in a different subnet, the local xTR in PoD
        A forwards the packet based on its map-cache/routing table to pxTR A3,
        which acts as the default gateway for the PoD. pxTR A3 then forwards
        the packet to xTR D3, using its map-cache entry that contains the
        subnet information for PoD D, as published by the Scale-Out mapping
        system as described in Section 4.1.</t>

        <t>&bull; xTR/pxTR D3 performs a Layer 3 lookup in its local map-cache
        for the destination IP&nbsp;40.0.0.7. Since Accelerator 7 is
        registered with the local mapping system and is published in PoD D,
        xTR D3 has a valid map-cache entry pointing to xTR D1 and D2 (with
        RLOCs&nbsp;IP_D1&nbsp;and&nbsp;IP_D2).</t>

        <t>&bull; xTR D3 encapsulates the packet using LISP, setting the
        destination RLOC to either&nbsp;IP_D1&nbsp;or&nbsp;IP_D2, depending on
        the load-balancing or redundancy policy.</t>
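
        <t>The two lookups in this flow (subnet entry at pxTR A3, host entry
        at xTR D3) can be sketched as a longest-prefix match followed by a
        per-flow locator choice. The sketch is non-normative: the function
        names, the flow-key string, and the CRC-based hash are assumptions
        chosen for illustration, and this document does not mandate a
        particular load-balancing policy.</t>

        <figure>
          <artwork><![CDATA[
```python
import ipaddress
import zlib


def lookup(map_cache, dst):
    """Longest-prefix match of dst in a {prefix: [rlocs]} cache."""
    addr = ipaddress.ip_address(dst)
    best = None
    for prefix, rlocs in map_cache.items():
        net = ipaddress.ip_network(prefix)
        if addr not in net:
            continue
        if best is None or net.prefixlen > best[0].prefixlen:
            best = (net, rlocs)
    return best[1] if best else None


def pick_rloc(rlocs, flow_key):
    """Stable per-flow choice among the candidate RLOCs."""
    ordered = sorted(rlocs)
    index = zlib.crc32(flow_key.encode()) % len(ordered)
    return ordered[index]


# pxTR A3: subnet entries published by the Scale-Out mapping system.
cache_a3 = {
    "20.0.0.0/16": ["IP_B3", "IP_B4"],
    "40.0.0.0/16": ["IP_D3", "IP_D4"],
}
# xTR D3: host entry published by the Scale-Up MS/MR of PoD D.
cache_d3 = {
    "10.0.0.0/16": ["IP_A3", "IP_A4"],
    "40.0.0.7/32": ["IP_D1", "IP_D2"],
}

flow = "10.0.0.1->40.0.0.7"  # hypothetical flow key
first_hop = pick_rloc(lookup(cache_a3, "40.0.0.7"), flow)
second_hop = pick_rloc(lookup(cache_d3, "40.0.0.7"), flow)
```
]]></artwork>
        </figure>

        <t>Hashing on a flow key keeps all packets of one flow on the same
        locator, which is one common way to avoid reordering. A redundancy
        policy could instead always prefer the first reachable RLOC.</t>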
      </section>

      <section title="Scale-Out Path Change">
        <t>When the Publish/Subscribe mechanism [RFC9437] is used, the
        signaling flow to manage accelerator path changes (due to failure,
        congestion, etc.) proceeds as follows:</t>

        <t>&bull; Registration: Upon attachment of Accelerator 7 to PoD D, the
        local ETR D1/D2 updates its local database with the mapping for the
        EID&nbsp;&lt;IID1, 40.0.0.7&gt;. The ETR then sends a Map-Register
        message to the local mapping system, registering its RLOCs
        (e.g.,&nbsp;IP_D1,&nbsp;IP_D2) as locators for the EID. The mapping
        system is updated with this EID-to-RLOC association.</t>

        <t>&bull; First Communication Request/Subscription: When Accelerator
        1, connected via ITR A1, initiates communication with remote
        Accelerator 7, the ITR sends a Map-Request for the remote EID. The
        request includes the&nbsp;N-bit&nbsp;set in the EID-Record, indicating
        that the ITR wishes to be notified of any changes to the RLOC-set
        associated with that EID, as described in Section 4.2.</t>

        <t>&bull; Deregistration: When Accelerator 7 is detached, the mapping
        system receives a deregistration message for the EID&nbsp;&lt;IID1,
        40.0.0.7&gt;&nbsp;from ETR D1. It then sends a Map-Notify to ETR D1 to
        confirm the deregistration. ETR D1 subsequently removes the local
        mapping entry and ceases to advertise the EID.</t>

        <t>&bull; Notification/Publication: Any xTR or pxTR in PoD D (or
        elsewhere) that had subscribed to updates for the EID&nbsp;&lt;IID1,
        40.0.0.7&gt;&mdash;typically via the local mapping system (e.g.,
        Scale-Up MS/MR)&mdash;receives a Map-Notify from the mapping system.
        This notification includes the updated RLOC-set (e.g., addition or
        removal of locators), as specified in the Mapping Notification Publish
        Procedures in [RFC9437].</t>

        <t>&bull; Map-Cache Update: Upon receiving the Map-Notify, the
        subscribing ITR (e.g., xTR D3) updates its local map-cache or routing
        table to reflect the new RLOC-set for the EID (Accelerator
        7/40.0.0.7). For example, if&nbsp;IP_D1&nbsp;is removed for
        Accelerator 7 (40.0.0.7), xTR D3 stops forwarding traffic to that
        locator (IP_D1), ensuring accurate and up-to-date routing
        behavior.</t>
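
        <t>The signaling steps listed above can be traced with a small,
        non-normative model. The class LocalMappingSystem, its method names,
        and the callback-based notification are hypothetical simplifications;
        the optional notify argument stands in for the N-bit of the
        Map-Request, and the tuple (1, "40.0.0.7") stands in for the EID
        &lt;IID1, 40.0.0.7&gt;.</t>

        <figure>
          <artwork><![CDATA[
```python
class LocalMappingSystem:
    """Toy Scale-Up MS/MR with per-EID subscriptions."""

    def __init__(self):
        self.db = {}    # (iid, eid) -> set of RLOCs
        self.subs = {}  # (iid, eid) -> list of notify callbacks

    def map_register(self, key, rloc):
        self.db.setdefault(key, set()).add(rloc)
        self._notify(key)

    def map_deregister(self, key, rloc):
        self.db.get(key, set()).discard(rloc)
        self._notify(key)

    def map_request(self, key, notify=None):
        # A non-None callback models a Map-Request with the N-bit set.
        if notify is not None:
            self.subs.setdefault(key, []).append(notify)
        return set(self.db.get(key, ()))

    def _notify(self, key):
        rlocs = set(self.db.get(key, ()))
        for callback in self.subs.get(key, []):
            callback(key, rlocs)


map_cache_d3 = {}


def d3_on_notify(key, rlocs):
    """Map-Cache Update step at the subscribing xTR D3."""
    map_cache_d3[key] = rlocs


ms = LocalMappingSystem()
eid = (1, "40.0.0.7")

ms.map_register(eid, "IP_D1")           # Registration
ms.map_register(eid, "IP_D2")
map_cache_d3[eid] = ms.map_request(     # Request/Subscription
    eid, notify=d3_on_notify)
ms.map_deregister(eid, "IP_D1")         # Deregistration + Publication
```
]]></artwork>
        </figure>

        <t>At the end of the trace, the map-cache of xTR D3 lists only IP_D2
        for Accelerator 7, so traffic is no longer forwarded toward
        IP_D1.</t>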
      </section>

      <section title="Deployment Considerations">
        <section title="Scale-Out Segmentation and INCC">
          <t>LISP Scale-Out segmentation is based on the use and propagation
          of Instance-IDs (IIDs), which are treated as part of the EID in
          control plane operations. The encoding format for IIDs is defined in
          [RFC8060]. Instance-IDs are unique within a given Mapping System and
          MAY be used to distinguish between Scale-Up and Scale-Out
          domains.</t>

          <t>A key aspect of Scale-Out segmentation is the ability to
          associate In-Network Collective Communication (INCC) groups with
          specific IIDs. In this model, an INCC domain, functionally
          equivalent to a Virtual Routing and Forwarding (VRF) instance, can
          be mapped to a corresponding IID representing a Scale-Out domain.
          Alternatively, each INCC group may be mapped in a 1:1 relationship
          with a unique Scale-Out segment instance.</t>

          <t>This use of Instance-IDs enables support for multiple Scale-Out
          segments, similar to extended VRFs or multi-VPN, as described in
          [I-D.ietf-lisp-vpn].</t>
        </section>

        <section title="Scale-Out Mappings">
          <t>When an accelerator is attached to or detected by an ETR that
          provides Scale-Out services, or when its path changes, Scale-Out
          mappings are registered with the mapping system using the following
          structures:</t>

          <t>* The EID 2-tuple (IID, Acc-IP Address) with its binding to a
          corresponding ETR locator set (RLOC IP Address).</t>

          <t>* The EID 2-tuple (IID, Acc-IP Subnet) with its binding to a
          corresponding pETR locator set (RLOC IP Address).</t>

          <t>The registration of these Accelerator/Subnet EIDs MUST follow the
          LCAF format as defined in [RFC8060] with the specific EID record as
          specified in [RFC9301] and can be used with pETR registration
          [I-D.ietf-lisp-site-external-connectivity].</t>
        </section>

        <section title="Scale-Out Mapping System (MS/MR)">
          <t>The Scale-Out Mapping System (across PoDs) also uses services
          from the Scale-Up Mapping Systems (within PoDs) to establish an
          end-to-end Scale-Out network of accelerators.</t>

          <t>The interface between xTRs/pxTRs and the Mapping System follows
          the procedures defined in [RFC9301] and
          [I-D.ietf-lisp-site-external-connectivity]. The addition and removal
          of subnets are handled through pxTR registration/deregistration and
          publication processes, as described in Section 4.1 and Section
          4.5.2. The Mapping System MAY be implemented as a distributed mapping
          system to avoid a single point of failure.</t>

          <t>To support system convergence following an accelerator or subnet
          path change, the local Mapping System (Scale-Up MS/MR) MUST also
          send a Map-Notify message to the full RLOC set&mdash;including all
          relevant pxTRs&mdash;within the PoD where the affected EID was last
          registered. This notification is triggered upon receiving a
          registration update for that specific accelerator or subnet EID. The
          Map-Notify serves to indicate the unavailability or change in the
          accelerator&rsquo;s path, as detailed in Section 4.4.</t>
        </section>

        <section title="Scale-Out Unknown Accelerators">
          <t>When a destination accelerator is either undiscovered or
          deregistered in the Mapping System, it is treated as an&nbsp;Unknown
          Accelerator. In such cases, the Map-Server&nbsp;SHOULD&nbsp;respond
          to a Map-Request or subscription targeting the unknown accelerator
          with a Negative Map-Reply&nbsp;specifying the action "Drop" as per
          [RFC9301] and [I-D.ietf-lisp-site-external-connectivity].</t>

          <t>Alternatively, the forwarding plane may be configured to default
          to the&nbsp;"Drop"&nbsp;action for Unknown Accelerators, thereby
          suppressing any forwarding attempts toward unregistered or
          unreachable destinations.</t>
        </section>
      </section>
    </section>

    <section title="Scale-Up Network Architecture">
      <t>The Scale-Up network architecture is shown in Figure 3. This section
      uses PoD C &amp; PoD D to describe the details.</t>

      <figure>
        <artwork><![CDATA[            /        |                           |       \
    RLOC=IP_C3   RLOC=IP_C4                  RLOC=IP_D3   RLOC=IP_D4
    +-+--+--+    +-+--+--+                   +-+--+--+    +-+--+--+
   .| xTR C3|.-..| xTR C4|.-.               .| xTR D3|.-..| xTR D4|.-.
  ( +-+--+--+    +-+--+--+   )             ( +-+--+--+    +-+--+--+   )
  (         PoD C            )             (          PoD D           )
 (    +-------------------+   )           (   +--------------------+   )
  (   Scale-Up MS/MR & INCC  )             (  Scale-Up MS/MR & INCC   )
 (    +-------------------+ )             (   +--------------------+ )
  (     EID:30.0.0.0/16    )               (     EID:40.0.0.0/16    ) 
   (+-+--+--+    +-+--+--+)                 (+-+--+--+    +-+--+--+)
    | xTR C1|.-..| xTR C2|                   | xTR D1|.-..| xTR D2|
    +-+--+--+    +-+--+--+                   +-+--+--+    +-+--+--+
    RLOC=IP_C1  RLOC=IP_C2                   RLOC=IP_D1  RLOC=IP_D2
        |     \/     |                            |    \/     |
    '--------'  '--------'                   '--------'  '--------'
    :  Acc   :  :  Acc   :                   :  Acc   :  :  Acc   :
    :  ID 5  :  :  ID 6  :                   :  ID 7  :  :  ID 8  :
    '--------'  '--------'                   '--------'  '--------'
   IP:30.0.0.5  IP:30.0.0.6                 IP:40.0.0.7  IP:40.0.0.8
                  
  Figure 3: Scale-Up AI Infra Network Architecture]]></artwork>
      </figure>

      <section title="Scale-Up Registration">
        <t>Each accelerator within a PoD is registered by its associated
        xTR(s) (RLOCs) using its&nbsp;AccID&nbsp;as the Scale-Up EID, with the
        registration sent to the local mapping system (Scale-Up MS/MR shown in
        Figure 3). The&nbsp;AccID&nbsp;MAY be a non-IP identifier and MAY be
        encoded using the LISP Name Encoding format defined in [RFC9735].
        The Map-Register message format is as specified in [RFC9301].
        Additional Accelerator metadata MAY be encoded using vendor-specific
        LCAF types as defined in [RFC9306].</t>

        <t>The local mapping system then publishes the EID-to-RLOC mappings
        (i.e.,&nbsp;AccID&nbsp;to xTR(s)) to all subscribed and authorized
        xTRs within the PoD, in accordance with [RFC9437].
        The Map-Notify/Publication message format is as specified in
        [RFC9301]. Additional Accelerator metadata MAY be encoded in the
        Map-Notify or Publication message formats using vendor-specific LCAF
        types [RFC9306]. These published mappings are subsequently added to the
        map-cache or routing table of remote xTRs/pxTRs within the same PoD,
        establishing the forwarding path to the registering xTR(s).</t>
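
        <t>As a non-normative illustration of carrying a non-IP AccID as an
        EID, the sketch below encodes a name as a 16-bit AFI followed by a
        NULL-terminated ASCII string. It assumes the Distinguished Name AFI
        value 17 and the layout described in [RFC9735]; that document remains
        the authoritative reference. The function names and the example name
        "AccID-7" are hypothetical, and the enclosing Instance-ID and
        EID-Record fields are omitted.</t>

        <figure>
          <artwork><![CDATA[
```python
import struct

# Assumed AFI for Distinguished Name; see [RFC9735] for the
# authoritative encoding.
AFI_DISTINGUISHED_NAME = 17


def encode_acc_id(acc_id: str) -> bytes:
    """Return AFI (2 bytes, network order) + name + NULL byte."""
    name = acc_id.encode("ascii")
    if b"\x00" in name:
        raise ValueError("AccID must not contain a NULL byte")
    return struct.pack("!H", AFI_DISTINGUISHED_NAME) + name + b"\x00"


def decode_acc_id(data: bytes) -> str:
    """Inverse of encode_acc_id(); rejects other address families."""
    (afi,) = struct.unpack("!H", data[:2])
    if afi != AFI_DISTINGUISHED_NAME:
        raise ValueError("unexpected AFI %d" % afi)
    end = data.index(b"\x00", 2)
    return data[2:end].decode("ascii")


wire = encode_acc_id("AccID-7")
```
]]></artwork>
        </figure>

        <t>The NULL terminator makes the name self-delimiting, so no separate
        length field is needed in this simplified form.</t>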
      </section>

      <section title="Scale-Up Subscription and Publication">
        <t>When an accelerator connected via an ITR initiates communication
        with a remote accelerator, the ITR sends a Map-Request for the remote
        EID. The request has the N-bit set in the EID-Record,
        indicating that the ITR wishes to be notified of any changes to the
        RLOC-set associated with that EID, as defined by the publish-subscribe
        mechanism [RFC9437]. The Map-Request/Subscription message format is as
        specified in [RFC9301] and [I-D.ietf-lisp-site-external-connectivity].
        Additional Accelerator metadata MAY be encoded using vendor-specific
        LCAF types as defined in [RFC9306].</t>

        <t>Any xTR or pxTR in the PoD that has subscribed to updates for the
        EID through the Scale-Up Mapping System (e.g., Scale-Up MS/MR)
        receives a Map-Notify from the mapping system. This notification
        includes the updated RLOC-set (e.g., addition or removal of locators),
        as specified in the Mapping Notification Publish Procedures in
        [RFC9437]. The Map-Notify/Publication message format is as specified
        in [RFC9301] and [I-D.ietf-lisp-site-external-connectivity].
        Additional Accelerator metadata MAY be encoded in the Map-Notify or
        Publication message formats using vendor-specific LCAF types
        [RFC9306].</t>

        <t>Upon receipt, the published EID is added to the map-cache (routing
        table) of each xTR or pxTR, establishing forwarding paths toward the
        xTR(s) responsible for the respective EID.</t>

        <t>When a destination accelerator is either undiscovered or
        deregistered in the Mapping System, it is treated as an Unknown
        Accelerator. In such cases, the Map-Server SHOULD respond to
        a Map-Request or subscription targeting the unknown accelerator with
        a Negative Map-Reply specifying the action "Drop", as
        per [RFC9301] and [I-D.ietf-lisp-site-external-connectivity].</t>
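
        <t>As a rough, non-normative sketch of the Map-Server behavior
        described in this section, the function below records a subscription
        when the N-bit is set and answers an Unknown Accelerator with a
        negative reply whose action is "Drop". The function name, the
        dictionary-based reply, and the choice to keep the subscription even
        when the EID is unknown are assumptions of this example, not
        requirements of [RFC9301] or [RFC9437].</t>

        <figure>
          <artwork><![CDATA[
```python
def handle_map_request(mappings, subscriptions, itr, eid, n_bit):
    """Return a reply dict for a Map-Request (field names hypothetical).

    mappings:      EID -> set of RLOCs currently registered
    subscriptions: EID -> set of ITRs to notify on RLOC-set changes
    """
    if n_bit:
        # N-bit set: remember the ITR so later changes can be published.
        subscriptions.setdefault(eid, set()).add(itr)

    rlocs = mappings.get(eid)
    if not rlocs:
        # Unknown Accelerator: negative reply with action "Drop".
        return {"eid": eid, "rlocs": [], "action": "Drop"}

    return {"eid": eid, "rlocs": sorted(rlocs), "action": None}


mappings = {"AccID-8": {"IP_D2"}}
subscriptions = {}
reply = handle_map_request(mappings, subscriptions,
                           "ITR-D1", "AccID-8", True)
unknown = handle_map_request(mappings, subscriptions,
                             "ITR-D1", "AccID-99", False)
```
]]></artwork>
        </figure>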
      </section>

      <section title="Scale-Up Packet Flow">
        <t>A Scale-Up packet (which may be non-IP) uses a Scale-Up technology
        (e.g., PCIe or Ethernet) and is encapsulated accordingly. This section
        illustrates an example of Scale-Up unicast packet flow and the
        associated control plane operations, based on the topology shown in
        Figure 3. In this scenario, Accelerator 7 in PoD D communicates with
        Accelerator 8, also located in PoD D. It is assumed that Accelerator 7
        is aware of Accelerator 8&rsquo;s AccID (e.g., learned via packet
        exchange, dynamic resolution, or a management interface).</t>

        <t>&bull; Each Accelerator within a PoD is registered by its xTRs
        (RLOCs) using its AccID (e.g., AccID 7, AccID 8) as the EID, as
        described in Section 5.1.</t>

        <t>&bull; If a path is not pre-established (e.g., by default or during
        provisioning), when Accelerator 7 (connected via xTR D1) initiates
        communication with Accelerator 8 (connected via xTR D2), ITR D1 issues
        a Map-Request for Accelerator 8. Following the Mapping Request
        Subscribe Procedures defined in [RFC9437], the Map-Request includes
        the N-bit set on the EID-Record to ensure the ITR is notified of the
        mapping and any RLOC-set changes for the Accelerator.</t>

        <t>&bull; The local mapping system publishes all the AccID-to-xTR(s)
        mappings (EID-to-RLOC(s)) to subscribing xTRs/pxTRs. These subscribing
        xTRs/pxTRs then establish a path to the registering xTR(s) within the
        PoD, as outlined in Section 5.1.</t>

        <t>&bull; When Accelerator 7 sends a Scale-Up packet, it includes the
        destination AccID 8 and source AccID 7, in accordance with [RFC9300]
        and [RFC9301].</t>

        <t>&bull; ITR D1 performs a lookup in its local map-cache for the
        destination AccID 8.</t>

        <t>&bull; Since ITR D1 already has the EID-to-RLOC mapping for AccID 8
        (pointing to xTR D2), it encapsulates all subsequent packets to AccID
        8 using destination RLOC IP_D2 and source RLOC IP_D1.</t>
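
        <t>The last two steps (map-cache lookup followed by encapsulation)
        can be illustrated with the short sketch below. It is not a data
        plane implementation: the itr_forward function, the dictionary used
        to represent the encapsulated packet, and the choice of the first
        RLOC in the list are all assumptions made for illustration.</t>

        <figure>
          <artwork><![CDATA[
```python
def itr_forward(map_cache, local_rloc, src_acc_id, dst_acc_id, payload):
    """Look up the destination AccID and build the outer header fields.

    Returns None when the map-cache has no entry; a real ITR would then
    resolve the mapping or apply its configured default action.
    """
    rlocs = map_cache.get(dst_acc_id)
    if not rlocs:
        return None
    return {
        "outer_src_rloc": local_rloc,   # RLOC of the encapsulating ITR
        "outer_dst_rloc": rlocs[0],     # first RLOC of the cached set
        "src_acc_id": src_acc_id,
        "dst_acc_id": dst_acc_id,
        "payload": payload,
    }


# Map-cache of ITR D1 after the publication for AccID 8 was received.
map_cache_d1 = {"AccID-8": ["IP_D2"]}
packet = itr_forward(map_cache_d1, "IP_D1", "AccID-7", "AccID-8", b"data")
```
]]></artwork>
        </figure>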
      </section>

      <section title="Scale-Up Path Change">
        <t>This section describes the mechanism for handling path changes of
        an accelerator within a PoD due to failure, upgrade, congestion, etc.,
        while maintaining uninterrupted communication among accelerators
        connected via multipath in the same Scale-Up domain. The mechanism
        ensures fast convergence of the Scale-Up network when an
        accelerator&rsquo;s path changes. Updates to ITR map-caches are
        managed using the Publish/Subscribe mechanisms defined in [RFC9437].
        The following steps outline the signaling and packet flow when the
        link between Accelerator 8 and xTR D2 (as shown in Figure 3) becomes
        unavailable due to congestion, link failure, or other issues:</t>

        <t>&bull; Initially, when Accelerator 7 (connected via ITR D1)
        establishes communication with Accelerator 8, ITR D1 issues a
        Map-Request for Accelerator 8. In accordance with the Mapping
        Request/Subscribe Procedures defined in [RFC9437], the Map-Request
        includes the N-bit set on the EID-Record to enable notification of any
        RLOC-set changes for Accelerator 8.</t>

        <t>&bull; When Accelerator 8 experiences a path change within PoD D
        (e.g., the link to xTR D2 becomes unavailable or congested), ETR D2
        removes the local mapping for Accelerator 8&rsquo;s EID &#10216;IID,
        AccID 8&#10217;. It then sends a Map-Deregister message to the local
        mapping system, deregistering RLOC IP_D2 as a locator for that
        EID.</t>

        <t>&bull; Upon receiving the deregistration, the mapping system
        updates the locator set for Accelerator 8&rsquo;s EID by removing
        IP_D2 and sends a Map-Notify back to ETR D2. ETR D2 then deletes the
        mapping from its local database and ceases registration for
        &#10216;IID, AccID 8&#10217;.</t>

        <t>&bull; Any ITR or PiTR participating in the same Scale-Up domain
        (associated with IID) that was previously encapsulating traffic to
        AccID 8 would have subscribed to receive updates on RLOC-set changes.
        The local mapping system publishes the updated locator set to these
        subscribers by sending Map-Notify messages, as defined in the Mapping
        Notification Publish Procedures in [RFC9437].</t>

        <t>&bull; Upon receiving the Map-Notify, the ITR updates its local
        map-cache for EID &#10216;IID, AccID 8&#10217;. Once the cache is
        updated, traffic is redirected and tunneled to the remaining xTR(s)
        in the locator set (e.g., xTR D1), and traffic via xTR D2 is
        halted.</t>
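
        <t>The effect of the steps above on the mapping system and on a
        subscribed ITR can be sketched as follows. This is a simplified model
        under stated assumptions: the deregister_rloc function, the IID value
        100, and the representation of map-caches as dictionaries are
        hypothetical, and the actual message exchange of [RFC9301] and
        [RFC9437] is reduced to direct dictionary updates.</t>

        <figure>
          <artwork><![CDATA[
```python
def deregister_rloc(mappings, subscriber_caches, eid, rloc):
    """Remove one RLOC from an EID and push the result to subscribers.

    Returns the remaining RLOC-set (empty when no locator is left).
    """
    remaining = set(mappings.get(eid, set())) - {rloc}
    if remaining:
        mappings[eid] = remaining
    else:
        mappings.pop(eid, None)

    # Model the publication: every subscribed map-cache is updated.
    for cache in subscriber_caches:
        if remaining:
            cache[eid] = set(remaining)
        else:
            cache.pop(eid, None)
    return remaining


EID_8 = (100, "AccID-8")  # (IID, AccID); the IID value is made up
ms_mappings = {EID_8: {"IP_D1", "IP_D2"}}
itr_cache = {EID_8: {"IP_D1", "IP_D2"}}

# The link between Accelerator 8 and xTR D2 becomes unavailable.
left = deregister_rloc(ms_mappings, [itr_cache], EID_8, "IP_D2")
```
]]></artwork>
        </figure>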
      </section>

      <section title="Deployment Considerations">
        <section title="Scale-Up Segmentation and INCC">
          <t>Similar to Scale-Out segmentation, LISP Scale-Up segmentation is
          based on the propagation and use of Instance-IDs (IIDs), which are
          treated as part of the EID in control plane operations. The encoding
          of Instance-IDs is defined in [RFC8060]. These IIDs are unique
          within a Mapping System and&nbsp;may&nbsp;be used to distinguish
          between Scale-Up and Scale-Out domains.</t>

          <t>A key aspect of Scale-Up segmentation is the potential mapping of
          INCC groups to Instance-IDs. In this context, an INCC
          Domain&mdash;functionally equivalent to a VRF as a forwarding
          context&mdash;can be mapped to an IID representing a Scale-Up
          domain. Alternatively, an INCC group may be mapped directly in a
          one-to-one relationship with a Scale-Up segment instance.</t>

          <t>Instance-IDs enable support for multiple Scale-Up segments,
          similar to extended VRFs or multi-VPN, as described in
          [I-D.ietf-lisp-vpn].</t>
        </section>

        <section title="Scale-Up Mappings">
          <t>When an accelerator is attached to or detached from an ETR
          providing Scale-Up services, a corresponding Scale-Up EID is
          registered or deregistered with the mapping system. The Scale-Up
          mapping follows this structure:</t>

          <t>&bull; EID Tuple: The Endpoint Identifier is represented as a
          2-tuple&nbsp;(IID, AccID), where:</t>

          <t>&bull; AccID&nbsp;may be a non-IP identifier. If the AccID is
          non-IP based, it&nbsp;may&nbsp;be encoded using the mechanisms
          described in [RFC9735].</t>

          <t>&bull; The structure of the Accelerator EID record adheres to the
          format defined in [RFC9301].</t>

          <t>&bull; The AccID is bound to a locator set consisting of one or
          more IP RLOCs.</t>
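
          <t>One possible in-memory representation of this mapping structure
          is sketched below. The ScaleUpEid class and make_mapping helper are
          hypothetical names introduced for this example, the 24-bit bound on
          the IID reflects the Instance-ID size used in the data plane, and
          the RLOC addresses are documentation addresses chosen for
          illustration. Wire encoding per [RFC9301] and [RFC9735] is out of
          scope for the sketch.</t>

          <figure>
            <artwork><![CDATA[
```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ScaleUpEid:
    """EID 2-tuple (IID, AccID); the AccID may be a non-IP name."""

    iid: int
    acc_id: str

    def __post_init__(self):
        if not 0 <= self.iid < 2 ** 24:
            raise ValueError("Instance-ID must fit in 24 bits")


def make_mapping(iid, acc_id, rlocs):
    """Bind an EID to a locator set of one or more IP RLOCs."""
    if not rlocs:
        raise ValueError("locator set must contain at least one RLOC")
    # ip_address() rejects strings that are not valid IPv4/IPv6 addresses.
    locator_set = tuple(ipaddress.ip_address(r) for r in rlocs)
    return ScaleUpEid(iid, acc_id), locator_set


eid, locator_set = make_mapping(100, "AccID-7",
                                ["192.0.2.1", "192.0.2.2"])
```
]]></artwork>
          </figure>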
        </section>

        <section title="Scale-Up Mapping System (MS/MR)">
          <t>The interface between xTRs and the Mapping System is defined in
          [RFC9301]. All accelerators are registered with the local Scale-Up
          Mapping System. The Mapping System MAY be implemented as a
          distributed mapping system to avoid a single point of failure.</t>

          <t>To support rapid system convergence following a path change, the
          Map-Server&nbsp;MUST&nbsp;send a Map-Notify to the entire RLOC set
          within the PoD that last registered the same EID, as well as to any
          xTRs in the PoD that have subscribed to that EID. This Map-Notify
          serves to track changes in the path of Accelerator EIDs, as
          described in Section 5.4.</t>
        </section>

        <section title="Scale-Up Unknown Accelerators">
          <t>When a destination accelerator is either undiscovered or
          deregistered in the Mapping System, it is treated as an&nbsp;Unknown
          Accelerator. In such cases, the Map-Server&nbsp;SHOULD&nbsp;respond
          to a Map-Request or subscription targeting the unknown accelerator
          with a&nbsp;Negative Map-Reply&nbsp;specifying the
          action&nbsp;"Drop".</t>

          <t>Alternatively, the forwarding plane may be configured to default
          to the&nbsp;"Drop"&nbsp;action for Unknown Accelerators, thereby
          suppressing any forwarding attempts toward unregistered or
          unreachable destinations.</t>
        </section>

        <section title="IP Forwarding of Scale-Up Traffic">
          <t>Providing non-IP extensions to cloud platforms is not always
          feasible. As a result, IP subnets might need to be used and extended
          at Layer 3 (L3) to support intra-PoD traffic as well.</t>
        </section>
      </section>
    </section>

    <section title="Multihoming &amp; Multipaths">
      <t>Multihoming support relies on the mechanisms defined in [RFC9300] and
      [RFC9301] to enable LISP-based multihoming for accelerators within a
      backend network. To illustrate the multihoming packet flow, this section
      references&nbsp;Figure 3. For example, in Figure 3,&nbsp;xTRs D1 and
      D2&nbsp;within&nbsp;PoD D&nbsp;provide multihoming services
      for&nbsp;Accelerators 7 and 8.</t>

      <section title="Multihomed Accelerators Registration">
        <t>The&nbsp;Site-ID, as defined in [RFC9301], serves as an identifier
        for logically grouping multiple xTRs that provide multihoming within a
        Scale-Up domain (e.g., a PoD). All EID-to-RLOC mappings from ETRs in a
        multihomed Scale-Up PoD&nbsp;MUST&nbsp;be registered with the
        corresponding Site-ID (e.g., PoD ID) by setting the&nbsp;'I'
        bit&nbsp;in the Map-Register message.</t>
      </section>

      <section title="Multihoming xTRs/RLOCs Merging">
        <t>Supporting multihoming requires that participating xTRs discover
        one another and implement multipath forwarding procedures. This is
        achieved through the registration of a common accelerator EID by all
        participating xTRs. Each registration includes the&nbsp;PoD ID&nbsp;as
        the&nbsp;Site-ID, indicating the PoD in which multihoming is being
        provided. The Mapping System merges these registrations and notifies
        all participating xTRs with the aggregated locator set.
        Using&nbsp;Figure 3&nbsp;as a reference, the xTR discovery process in
        a multihomed Scale-Up group proceeds as follows:</t>

        <t>&bull; xTR D1&nbsp;registers the EID&nbsp;"POD-D-AccID-7"&nbsp;with
        its locator set containing&nbsp;IP_D1.</t>

        <t>&bull; The&nbsp;Map-Server&nbsp;creates a mapping entry: EID
        ("POD-D-AccID-7") &rarr; RLOC (IP_D1), and sends
        a&nbsp;Map-Notify&nbsp;to xTR D1 with this mapping.</t>

        <t>&bull; xTR D2&nbsp;then registers the same
        EID&nbsp;"POD-D-AccID-7"&nbsp;with its locator set
        containing&nbsp;IP_D2.</t>

        <t>&bull; The&nbsp;Map-Server&nbsp;merges this new registration with
        the existing one, resulting in: EID ("POD-D-AccID-7") &rarr; RLOC
        {IP_D1, IP_D2}. It then sends a&nbsp;Map-Notify&nbsp;to both xTR D1
        and xTR D2 with the updated locator set.</t>

        <t>&bull; Whenever an xTR joins or leaves a multihoming group,
        the&nbsp;Map-Server MUST&nbsp;send an updated&nbsp;Map-Notify&nbsp;to
        all remaining participating xTRs to ensure they maintain an accurate
        and synchronized view of the locator set. As a result, all
        participating xTRs maintain an up-to-date view of the multihomed
        group, enabling coordinated multipath forwarding.</t>
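
        <t>The merge procedure walked through above can be sketched as
        follows. The merge_registration function, the table layout keyed by
        (Site-ID, EID), and the xTR-ID strings are assumptions of this
        example; the sketch only models the bookkeeping, not the
        Map-Register or Map-Notify message formats of [RFC9301].</t>

        <figure>
          <artwork><![CDATA[
```python
def merge_registration(table, site_id, eid, xtr_id, rloc):
    """Merge one xTR's registration and return the Map-Notify fan-out.

    table maps (site_id, eid) -> {xtr_id: rloc}. The return value maps
    each participating xTR to the merged, sorted locator set.
    """
    entry = table.setdefault((site_id, eid), {})
    entry[xtr_id] = rloc
    merged = sorted(entry.values())
    return {xtr: merged for xtr in entry}


table = {}
first = merge_registration(table, "POD-D", "POD-D-AccID-7",
                           "xTR-D1", "IP_D1")
second = merge_registration(table, "POD-D", "POD-D-AccID-7",
                            "xTR-D2", "IP_D2")
```
]]></artwork>
        </figure>

        <t>In this sketch, the first call notifies only xTR D1 with locator
        set {IP_D1}, and the second call notifies both xTR D1 and xTR D2 with
        the merged set {IP_D1, IP_D2}.</t>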
      </section>

      <section title="Multihoming/Multipath Forwarding">
        <t>In a PoD, both Scale-Out and Scale-Up xTRs can be used to provide
        multihomed access and forward traffic to and from remote PoDs or
        accelerators. Unicast traffic is typically load-balanced or sprayed
        across the multiple xTRs that have registered the accelerator's EID.
        In multicast scenarios, only the designated&nbsp;Scale-Up xTR&nbsp;may
        join the multicast group or replication list. If a&nbsp;Scale-Out
        xTR&nbsp;chooses to join the multicast group,
        it&nbsp;MUST&nbsp;implement split-horizon filtering and ensure that
        traffic from the PoD is not forwarded back into the same PoD, in order to
        prevent duplication. xTRs providing active multihoming access to a
        PoD&rsquo;s accelerators&nbsp;MUST&nbsp;support the following:</t>

        <t>&bull; Registration: All active xTRs must register the PoD&rsquo;s
        Scale-Up mappings with the Mapping System. Each registration must
        include the&nbsp;'I' bit&nbsp;set and carry both
        the&nbsp;Site-ID&nbsp;(PoD ID) and the corresponding&nbsp;xTR-ID.</t>

        <t>&bull; Multicast Forwarding: Only the selected Scale-Up xTR joins
        the Scale-Up multicast group or replication list.</t>

        <t>&bull; Broadcast Forwarding: Only the designated Scale-Out xTR is
        permitted to forward broadcast traffic to and from remote PoDs.</t>
      </section>
    </section>

    <section title="Data Plane Encapsulation Options">
      <t>The LISP control plane is decoupled from the data plane
      encapsulation, allowing flexibility in the choice of encapsulation
      formats for Scale-Up and Scale-Out. Common encapsulation formats include
      VXLAN-GPE, LISP, and VXLAN:</t>

      <t>&bull; VXLAN-GPE Encapsulation: Defined in [RFC9305], VXLAN-GPE
      supports encapsulation of both Scale-Up and Scale-Out packets.
      The&nbsp;VNI field&nbsp;directly maps to the&nbsp;Instance-ID&nbsp;used
      in the LISP control plane. For unified deployments,
      the&nbsp;P-bit&nbsp;is set, and the&nbsp;Next-Protocol&nbsp;field is
      used to indicate the payload type.</t>

      <t>&bull; LISP Encapsulation: As specified in [RFC9300], this format
      also supports encapsulation of both Scale-Up and Scale-Out packets.
      The&nbsp;Instance-ID&nbsp;embedded in the EID maps directly to
      the&nbsp;Instance-ID&nbsp;in the LISP header. Upon decapsulation at the
      ETR, the&nbsp;IID&nbsp;may be used to determine whether the packet
      should be processed as part of a Scale-Up or Scale-Out flow.</t>

      <t>Any alternative encapsulation format optimized for backend networks,
      capable of supporting a 24-bit Instance-ID, MAY be used to deploy
      Scale-Up, Scale-Out, or unified network data planes.</t>
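
      <t>To make the Instance-ID mapping concrete for the LISP encapsulation
      case, the sketch below packs and parses the 8-octet LISP header of
      [RFC9300] with the N-bit and I-bit set, so that the second word carries
      the 24-bit Instance-ID followed by 8 Locator-Status-Bits. The function
      names are hypothetical, the remaining flag bits are left at zero, and
      the outer IP/UDP headers are omitted; this is an illustration, not a
      complete encapsulation routine.</t>

      <figure>
        <artwork><![CDATA[
```python
import struct

N_BIT = 0x80  # nonce-present flag in the first octet
I_BIT = 0x08  # Instance-ID-present flag in the first octet


def pack_lisp_header(iid, nonce, lsb=0):
    """Build an 8-octet LISP header carrying a 24-bit Instance-ID."""
    if not 0 <= iid < 1 << 24:
        raise ValueError("Instance-ID must fit in 24 bits")
    if not 0 <= nonce < 1 << 24:
        raise ValueError("nonce must fit in 24 bits")
    if not 0 <= lsb < 1 << 8:
        raise ValueError("LSBs must fit in 8 bits when the I-bit is set")
    word0 = ((N_BIT | I_BIT) << 24) | nonce
    word1 = (iid << 8) | lsb
    return struct.pack("!II", word0, word1)


def unpack_iid(header):
    """Return the Instance-ID, or None when the I-bit is not set."""
    word0, word1 = struct.unpack("!II", header[:8])
    if not (word0 >> 24) & I_BIT:
        return None
    return word1 >> 8


header = pack_lisp_header(iid=5, nonce=0x123456)
```
]]></artwork>
      </figure>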
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>No new IANA considerations apply to this document.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>This document introduces no security considerations beyond those
      already discussed in [RFC9301].</t>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>This draft builds on many LISP RFCs and drafts. Many thanks to
      the authors of those RFCs and drafts.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include='reference.RFC.9300'?>

      <?rfc include='reference.RFC.9301'?>

      <?rfc include='reference.RFC.9306'?>

      <?rfc include='reference.RFC.9735'?>

      <?rfc include='reference.I-D.draft-ietf-lisp-site-external-connectivity-02'?>

      <?rfc include='reference.I-D.draft-ietf-lisp-vpn-12'?>

      <?rfc include='reference.RFC.9437'?>

      <?rfc include='reference.RFC.9305'?>

      <?rfc include='reference.RFC.8060'?>

      <?rfc include='reference.RFC.2119'?>

      <?rfc include='reference.RFC.8174'?>
    </references>
  </back>
</rfc>
