<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-xu-idr-bgp-route-broker-05"
     ipr="trust200902">
  <front>
    <title abbrev="BGP Route Broker">BGP Route Broker for Hyperscale
    SDN</title>

    <author fullname="Xiaohu Xu" initials="X." surname="Xu">
      <organization>China Mobile</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>xuxiaohu_ietf@hotmail.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Shraddha Hegde" initials="S." surname="Hegde">
      <organization>Juniper</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>shraddha@juniper.net</email>

        <uri/>
      </address>
    </author>

    <author fullname="Srihari Sangli" initials="S." surname="Sangli">
      <organization>Juniper</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>ssangli@juniper.net</email>

        <uri/>
      </address>
    </author>

    <author fullname="Shunwan" initials="S." surname="Zhang">
      <organization>Huawei</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>zhuangshunwan@huawei.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Jie Dong" initials="J." surname="Dong">
      <organization>Huawei</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>jie.dong@huawei.com</email>

        <uri/>
      </address>
    </author>

    <date day="1" month="November" year="2024"/>

    <abstract>
      <t>This document describes BGP route broker, an optimized mechanism
      for BGP route reflection. It allows BGP-based IP VPN to be used as a
      scalable overlay routing protocol in hyperscale data center network
      virtualization environments, commonly referred to as Software-Defined
      Networking (SDN) environments.</t>
    </abstract>
  </front>

  <middle>
    <section title="Problem Statement">
      <t>BGP/MPLS IP VPN has been successfully deployed in global service
      provider networks for two decades, proving its scalability in
      large-scale environments. Here, BGP/MPLS IP VPN refers to both
      BGP/MPLS IPv4 VPN <xref target="RFC4364"/> and BGP/MPLS IPv6 VPN <xref
      target="RFC4659"/>. In addition, the BGP/MPLS IP VPN-based data center
      network virtualization approach described in <xref
      target="RFC7814"/>, especially the virtual PE model described in
      <xref target="I-D.ietf-bess-virtual-pe"/>, has been widely deployed in
      small to medium-sized data centers for the purpose of network
      virtualization, also known as Software-Defined Networking (SDN).
      Examples include, but are not limited to, Tungsten Fabric (formerly
      known as OpenContrail).</t>

      <t>Hyperscale cloud data centers typically contain tens of thousands of
      servers that are virtualized as Virtual Machines (VMs) or containers.
      Assuming a virtual PE model is used, this results in at least tens of
      thousands of virtual Provider Edges (PEs), millions of Virtual Private
      Networks (VPNs), and tens of millions of VPN routes from the
      perspective of network virtualization. This scale poses a significant
      challenge to both the BGP session capacity and the VPN routing table
      size of any given BGP router.</t>

      <t>The route reflection (RR) mechanism is essential for addressing BGP
      scaling issues. In a one-level route reflector architecture, VPN routes
      within a data center can be distributed across multiple route
      reflectors. This is achieved by preconfiguring each route reflector with
      a designated block of route targets associated with specific VPNs. As a
      result, no single route reflector needs to manage all the VPN routes
      supported by the data center. For redundancy, multiple route reflectors
      should be configured with the same block of route targets to create a
      route reflector cluster.</t>
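
      <t>The following non-normative Python sketch illustrates one way such
      a partitioning could be represented. The block layout (an AS number
      plus a range of assigned numbers), the cluster names, and the helper
      cluster_for() are assumptions made for illustration only and are not
      defined by this document.</t>

      <figure>
        <artwork><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RtBlock:
    """A block of route targets (hypothetical layout)."""
    asn: int
    first: int  # first assigned number in the block, inclusive
    last: int   # last assigned number in the block, inclusive

    def covers(self, rt: str) -> bool:
        """Return True if a route target "ASN:number" is in the block."""
        asn, number = rt.split(":")
        return (int(asn) == self.asn
                and self.first <= int(number) <= self.last)


# Hypothetical preconfiguration: one block per route reflector
# cluster. All route reflectors in a cluster share the same block.
CLUSTER_BLOCKS = {
    "rr-cluster-a": RtBlock(asn=64512, first=1, last=500000),
    "rr-cluster-b": RtBlock(asn=64512, first=500001, last=1000000),
}


def cluster_for(rt: str):
    """Return the cluster whose block covers rt, or None."""
    for cluster, block in CLUSTER_BLOCKS.items():
        if block.covers(rt):
            return cluster
    return None
```
]]></artwork>
      </figure>

      <t>With this layout, a lookup such as cluster_for("64512:42") returns
      the single cluster responsible for that VPN, so no cluster has to hold
      routes outside its own block.</t>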

      <t>If each virtual PE is connected to at least one VPN associated with a
      specific route reflector, that route reflector would need to establish
      BGP sessions with all virtual PEs. This can lead to excessive BGP
      session demands on route reflectors. To address this scaling issue, a
      new level of route reflectors, referred to as bottom-level route
      reflectors, can be introduced between the existing top-level route
      reflectors and the virtual PEs. Each top-level route reflector would
      then establish BGP sessions with all the bottom-level route reflectors
      instead of with all the virtual PE routers. Additionally, the
      bottom-level route reflectors would only need to establish BGP sessions
      with a subset of the virtual PEs. This partitioning mechanism
      effectively resolves the scaling issue related to BGP session
      capacity.</t>
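
      <t>The effect on session counts can be estimated with a short
      non-normative Python sketch. It assumes, purely for illustration, that
      virtual PEs are spread evenly across the bottom-level route
      reflectors, that each virtual PE peers with exactly one bottom-level
      route reflector (redundancy is ignored), and that every bottom-level
      route reflector peers with every top-level route reflector. The
      function names are hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
import math


def one_level_sessions_per_rr(num_vpes: int) -> int:
    """Worst case: every virtual PE has a VPN on this reflector."""
    return num_vpes


def two_level_sessions(num_vpes: int, num_bottom: int,
                       num_top: int):
    """Return (sessions per top-level RR, per bottom-level RR)."""
    # Each top-level RR peers with all bottom-level RRs only.
    per_top = num_bottom
    # Each bottom-level RR peers with its share of the virtual
    # PEs plus all top-level RRs.
    per_bottom = math.ceil(num_vpes / num_bottom) + num_top
    return per_top, per_bottom
```
]]></artwork>
      </figure>

      <t>For instance, under these assumptions, 50,000 virtual PEs served by
      50 bottom-level and 10 top-level route reflectors yield 50 sessions
      per top-level route reflector and 1,010 per bottom-level route
      reflector, compared with up to 50,000 sessions per route reflector in
      the one-level case.</t>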

      <t>In a two-level route reflector hierarchy within hyperscale data
      centers, using the Route Target Constraint (RTC) mechanism <xref
      target="RFC4684"/> presents two main drawbacks. First, it can be
      challenging to effectively partition all the VPN routes supported by the
      data center among multiple top-level route reflectors. Second, virtual
      PEs have to receive RT membership Network Layer Reachability Information
      (NLRI) for all the route targets supported by the data center, which
      leads to unnecessary consumption of CPU and RAM resources on the
      virtual PEs.</t>

      <section title="Requirements Language">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
        "OPTIONAL" in this document are to be interpreted as described in
        BCP14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only
        when, they appear in all capitals, as shown here.</t>
      </section>
    </section>

    <section title="Solution Overview">
      <t>Bottom-level route reflectors, also known as route brokers, are
      modeled on message queuing systems such as RabbitMQ. Route brokers
      maintain route target membership information for their IBGP peers and
      reflect VPN routes among them as needed. In message queuing terms,
      route brokers act as the message brokers or exchanges. Top-level route
      reflectors, referred to as route collection servers, and virtual PEs,
      referred to as route broker clients, act as both message publishers
      (or producers) and subscribers (or consumers) within the same message
      queuing system.</t>
    </section>

    <section title="Route-Target Membership Advertisement Process">
      <t>Route collection servers advertise route target membership
      information based on the preconfigured block of route targets they have.
      Consequently, route brokers are aware of the VPNs partitioned to each
      server.</t>

      <t>Route brokers advertise default route target membership information
      to their clients so as to collect VPN routes from their clients.</t>

      <t>Route broker clients advertise their route target membership
      information based on a dynamically configured block of route targets.
      When a route broker receives this advertisement, it forwards the route
      target membership information to the corresponding route collection
      servers that are preconfigured to cover the advertised route targets.
      This action occurs only if the broker has not previously sent that route
      target membership information towards the corresponding route collection
      servers.</t>
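
      <t>The forwarding rule for client advertisements can be sketched in
      Python as follows. This is a non-normative illustration: the class
      RouteBroker, its attributes, and the representation of a preconfigured
      block as a set of route target strings are assumptions of the sketch,
      not protocol elements defined by this document.</t>

      <figure>
        <artwork><![CDATA[
```python
from collections import defaultdict


class RouteBroker:
    """Tracks RT membership and forwards it upstream once."""

    def __init__(self, server_blocks):
        # Hypothetical input: route collection server name ->
        # set of route targets preconfigured on that server.
        self.server_blocks = server_blocks
        # Route target -> clients that advertised membership.
        self.client_members = defaultdict(set)
        # (server, route target) pairs already forwarded.
        self.sent = set()

    def on_client_membership(self, client, rt):
        """Record a client's membership of rt.

        Returns the servers to which the membership must now be
        forwarded; the list is empty if it was forwarded before
        or if no server covers rt.
        """
        self.client_members[rt].add(client)
        notified = []
        for server, block in sorted(self.server_blocks.items()):
            if rt in block and (server, rt) not in self.sent:
                self.sent.add((server, rt))
                notified.append(server)
        return notified
```
]]></artwork>
      </figure>

      <t>In this sketch, the first client to advertise a given route target
      triggers forwarding to the covering route collection servers, and
      later advertisements of the same route target by other clients are
      only recorded locally.</t>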

      <t>The advertisement of route target membership information is based on
      Route Target Outbound Route Filtering (ORF) as defined in <xref
      target="I-D.xu-idr-route-target-orf"/>.</t>
    </section>

    <section title="Route Distribution Process">
      <t>When a route broker receives a route update message from a route
      collection server containing VPN routes for a specific VPN, and if those
      routes in the update message are selected as the best routes, the route
      broker will store them in its local Routing Information Base (RIB) and
      then reflect these routes to its clients associated with that VPN.
      Additionally, the cluster ID of the route broker SHOULD be prepended
      when reflecting the VPN routes.</t>

      <t>When a route broker receives a route update message from one of its
      clients containing VPN routes for a specific VPN, and if those routes
      are selected as the best routes, the route broker will store them in
      its local RIB and then reflect them to its other IBGP peers, including
      the corresponding route collection servers and the other route broker
      clients associated with the same VPN. Additionally, the cluster ID of
      the route broker SHOULD be prepended when reflecting these VPN
      routes.</t>

      <t>When a route broker receives an implicit request for VPN routes
      associated with one or more VPNs (through the advertisement of route
      target membership information) from a route broker client, the route
      broker SHOULD respond by providing the relevant VPN routes stored in its
      local RIB to that client.</t>

      <t>When a route broker receives an implicit request for all the VPN
      routes of one or more VPNs (through the advertisement of route target
      membership information) from a route collection server, the route
      broker SHOULD respond by providing to that route collection server the
      relevant VPN routes stored in its local RIB that were learned from its
      clients.</t>
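
      <t>The four cases above can be summarized in a non-normative Python
      sketch. It makes several simplifying assumptions: best-route
      selection is skipped (every received route is treated as best), each
      route is keyed by a single route target and prefix, and the class
      RouteBroker with its methods on_membership() and on_update() is
      hypothetical. It is meant only to show which peers receive which
      routes.</t>

      <figure>
        <artwork><![CDATA[
```python
from collections import defaultdict


class RouteBroker:
    def __init__(self, cluster_id, servers):
        self.cluster_id = cluster_id
        self.servers = set(servers)      # route collection servers
        self.members = defaultdict(set)  # route target -> peers
        self.rib = {}                    # (rt, prefix) -> learned_from

    def on_membership(self, peer, rt):
        """Register interest in rt; return stored prefixes to send."""
        self.members[rt].add(peer)
        out = []
        for (route_rt, prefix), src in sorted(self.rib.items()):
            if route_rt != rt or src == peer:
                continue
            # A server only receives routes learned from clients.
            if peer in self.servers and src in self.servers:
                continue
            out.append(prefix)
        return out

    def on_update(self, peer, rt, prefix, cluster_list=()):
        """Store a route; return (peers to reflect to, cluster list).

        The returned cluster list has this broker's cluster ID
        prepended to the one that was received.
        """
        self.rib[(rt, prefix)] = peer
        targets = set()
        for member in self.members[rt]:
            if member == peer:
                continue
            # Routes from a server go to clients only.
            if peer in self.servers and member in self.servers:
                continue
            targets.add(member)
        new_list = (self.cluster_id,) + tuple(cluster_list)
        return sorted(targets), new_list
```
]]></artwork>
      </figure>

      <t>In this sketch, a route learned from a client is reflected to the
      interested route collection servers and to the other interested
      clients, whereas a route learned from a route collection server is
      reflected to interested clients only.</t>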
    </section>

    <section title="Deployment Considerations">
      <t>To simplify control over VPN route distribution, each VPN SHOULD be
      assigned a globally unique export route target value.</t>

      <t>In data center SDN environments, the advertisement of multiple paths
      for a given VPN prefix is needed, so virtual PEs SHOULD be assigned
      different route distinguishers (RDs).</t>

      <t>Virtual PEs SHOULD NOT establish BGP sessions with more than one
      cluster of route brokers configured with the same cluster ID.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document has no IANA actions.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>TBD.</t>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>The authors would like to thank Robert Raszuk for valuable comments
      on this document.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>

      <?rfc include="reference.RFC.8174"?>

      <?rfc include="reference.RFC.4364"?>

      <?rfc include="reference.RFC.4659"?>

      <?rfc include="reference.RFC.4684"?>

      <?rfc include="reference.RFC.5291"?>

      <?rfc include="reference.RFC.7814"?>

    </references>

    <references title="Informative References">
      <?rfc include="reference.I-D.ietf-bess-virtual-pe"?>

      <?rfc include="reference.I-D.xu-idr-route-target-orf"?>

    </references>
  </back>
</rfc>
