<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-xu-idr-bgp-route-broker-04"
     ipr="trust200902">
  <front>
    <title abbrev="BGP Route Broker">BGP Route Broker for Hyperscale
    SDN</title>

    <author fullname="Xiaohu Xu" initials="X." surname="Xu">
      <organization>China Mobile</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>xuxiaohu_ietf@hotmail.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Shraddha Hegde" initials="S." surname="Hegde">
      <organization>Juniper</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>shraddha@juniper.net</email>

        <uri/>
      </address>
    </author>

    <author fullname="Srihari Sangli" initials="S." surname="Sangli">
      <organization>Juniper</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>ssangli@juniper.net</email>

        <uri/>
      </address>
    </author>

    <author fullname="Shunwan Zhuang" initials="S." surname="Zhuang">
      <organization>Huawei</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>zhuangshunwan@huawei.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Jie Dong" initials="J." surname="Dong">
      <organization>Huawei</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>jie.dong@huawei.com</email>

        <uri/>
      </address>
    </author>

    <date day="25" month="April" year="2024"/>

    <abstract>
      <t>This document describes an optimized BGP route reflector mechanism,
      referred to as a BGP route broker, that allows BGP-based IP VPN to be
      used as a scalable overlay routing protocol in hyperscale data center
      network virtualization environments, also known as Software-Defined
      Networking (SDN) environments.</t>
    </abstract>
  </front>

  <middle>
    <section title="Problem Statement">
      <t>BGP/MPLS IP VPN has been deployed in service provider networks
      worldwide for two decades and has proven to be scalable in large-scale
      networks. Here, BGP/MPLS IP VPN refers to both BGP/MPLS IPv4 VPN <xref
      target="RFC4364"/> and BGP/MPLS IPv6 VPN <xref target="RFC4659"/>. In
      addition, BGP/MPLS IP VPN-based data center network virtualization
      approaches as described in <xref target="RFC7814"/>, especially the
      virtual PE model described in <xref
      target="I-D.ietf-bess-virtual-pe"/>, have been widely deployed in small
      to medium-sized data centers for network virtualization, also known as
      Software-Defined Networking (SDN). Examples include, but are not
      limited to, OpenContrail.</t>

      <t>Hyperscale cloud data centers usually have tens of thousands of
      servers, which are virtualized as Virtual Machines (VMs) or containers.
      Assuming the virtual PE model is used, this means that, from the
      network virtualization perspective, there are at least tens of
      thousands of virtual PEs, millions of VPNs, and tens of millions of VPN
      routes. This scale poses a significant challenge to the BGP session
      capacity and the VPN routing table capacity of any given BGP router.</t>

      <t>The route reflection (RR) mechanism is crucial to addressing BGP
      scaling issues. If a one-level route reflector architecture is used,
      all the VPN routes supported by a data center can be divided among
      multiple route reflectors by preconfiguring each route reflector with a
      block of route targets associated with a subset of the VPNs. This means
      that no single route reflector needs to maintain all the VPN routes
      supported by the data center. For redundancy, more than one route
      reflector should be preconfigured with the same block of route targets
      to form an RR cluster.</t>
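
      <t>As a non-normative illustration, the partitioning described above
      can be sketched in Python as follows. The round-robin assignment
      policy, the class and function names, the cluster IDs, and the route
      target strings are all assumptions made for this sketch; this document
      does not specify how blocks of route targets are chosen.</t>

      <figure>
        <artwork><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class RRCluster:
    """Redundant route reflectors sharing one block of RTs."""
    cluster_id: str
    members: List[str]
    route_targets: Set[str] = field(default_factory=set)


def partition_route_targets(route_targets: Iterable[str],
                            clusters: List[RRCluster]) -> None:
    """Assign every route target to exactly one RR cluster.

    Round-robin is an assumed policy; any disjoint split works.
    """
    for index, rt in enumerate(sorted(route_targets)):
        clusters[index % len(clusters)].route_targets.add(rt)


def cluster_for(rt: str,
                clusters: List[RRCluster]) -> Optional[RRCluster]:
    """Return the cluster whose preconfigured block contains rt."""
    for cluster in clusters:
        if rt in cluster.route_targets:
            return cluster
    return None


# Hypothetical data: four VPNs, two clusters of two reflectors.
clusters = [
    RRCluster("0.0.0.1", ["rr1a", "rr1b"]),
    RRCluster("0.0.0.2", ["rr2a", "rr2b"]),
]
partition_route_targets(
    ["65000:1", "65000:2", "65000:3", "65000:4"], clusters)
```
]]></artwork>
      </figure>

      <t>In this sketch, each cluster holds a disjoint half of the route
      targets, so neither cluster needs to store the VPN routes of the
      other.</t>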

      <t>If each virtual PE is attached to at least one VPN corresponding to a
      given route reflector, that route reflector would have to establish BGP
      sessions with all virtual PEs, which puts heavy BGP session pressure on
      route reflectors. To solve this scaling issue, another level (i.e., the
      bottom level) of route reflectors can be introduced between the
      existing level (i.e., the top level) of route reflectors and the
      virtual PEs. Each top-level route reflector would establish BGP
      sessions with all bottom-level route reflectors, rather than with all
      virtual PEs. Each bottom-level route reflector, in turn, would only
      need to establish BGP sessions with a subset of the virtual PEs. This
      two-level hierarchy therefore relieves the BGP session capacity problem
      described above.</t>
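
      <t>The effect on per-node session counts can be illustrated with a
      small calculation. The sketch below is non-normative. The figures of
      50,000 virtual PEs, 100 bottom-level route reflectors, and 20
      top-level route reflectors are assumed values, as is the even spread
      of virtual PEs with each virtual PE peering with exactly one
      bottom-level route reflector.</t>

      <figure>
        <artwork><![CDATA[
```python
from typing import Dict


def sessions_one_level(num_vpes: int) -> int:
    """Worst case: a reflector peers with every virtual PE."""
    return num_vpes


def sessions_two_level(num_vpes: int, num_bottom: int,
                       num_top: int) -> Dict[str, int]:
    """Per-node session counts once a bottom level is inserted.

    Assumes virtual PEs are spread evenly and each one peers
    with exactly one bottom-level route reflector.
    """
    vpes_per_bottom = -(-num_vpes // num_bottom)  # ceiling
    return {
        # A top-level RR peers with every bottom-level RR.
        "top_level_rr": num_bottom,
        # A bottom-level RR peers with its share of virtual PEs
        # plus every top-level RR.
        "bottom_level_rr": vpes_per_bottom + num_top,
    }


flat = sessions_one_level(50_000)
tiered = sessions_two_level(50_000, 100, 20)
```
]]></artwork>
      </figure>

      <t>Under these assumptions, a top-level route reflector maintains 100
      sessions and a bottom-level route reflector maintains 520, compared
      with up to 50,000 sessions in the one-level case.</t>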

      <t>In a two-level RR hierarchy within hyperscale data centers, using the
      Route Target Constraint (RTC) mechanism <xref target="RFC4684"/> may
      have two drawbacks. First, it can be difficult to partition all the VPN
      routes supported by the data center among multiple top-level RRs.
      Second, virtual PEs may have to receive RT membership NLRIs
      corresponding to all route targets supported by the data center, which
      would unnecessarily consume the CPU and RAM resources of virtual
      PEs.</t>

      <section title="Requirements Language">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
        "OPTIONAL" in this document are to be interpreted as described in
        BCP14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only
        when, they appear in all capitals, as shown here.</t>
      </section>
    </section>

    <section title="Solution Overview">
      <t>The bottom-level route reflectors, referred to as route brokers, are
      modeled on high-performance message queuing systems such as RabbitMQ.
      Route brokers maintain the route target membership information of their
      IBGP peers and reflect VPN routes among them on demand. In effect,
      route brokers act as the message brokers (or exchanges) of a message
      queuing system, while top-level route reflectors, referred to as route
      collection servers, and virtual PEs, referred to as route broker
      clients, act as both publishers (producers) and subscribers
      (consumers).</t>
    </section>

    <section title="Route Target Membership Advertisement Process">
      <t>Route collection servers advertise route target membership
      information according to the block of route targets preconfigured on
      each of them. As a result, route brokers know which VPNs are assigned
      to each route collection server.</t>

      <t>Route brokers advertise default route target membership information
      to their own route broker clients so as to collect VPN routes from
      those clients and then reflect them to route collection servers.</t>

      <t>Route broker clients advertise route target membership information
      according to the block of route targets that is dynamically configured
      on each of them. Upon receiving such an advertisement, route brokers
      dispatch the received route target membership information towards the
      route collection servers whose preconfigured blocks of route targets
      cover the advertised route targets.</t>
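
      <t>The dispatch step described above can be pictured as a set
      intersection between the advertised route targets and the block held
      by each route collection server. The following Python sketch is
      non-normative. The function name, the server names, and the route
      target values are hypothetical, and the encoding of the advertisement
      on the wire is not modeled.</t>

      <figure>
        <artwork><![CDATA[
```python
from typing import Dict, Set


def dispatch_membership(advertised: Set[str],
                        server_blocks: Dict[str, Set[str]]
                        ) -> Dict[str, Set[str]]:
    """Map each route collection server to the advertised
    route targets that fall inside its preconfigured block."""
    result: Dict[str, Set[str]] = {}
    for server, block in server_blocks.items():
        covered = advertised & block
        if covered:
            result[server] = covered
    return result


# Hypothetical blocks learned from route collection servers.
server_blocks = {
    "rcs-1": {"65000:1", "65000:2"},
    "rcs-2": {"65000:3", "65000:4"},
}
# A route broker client joins the VPNs using these two RTs.
forwarded = dispatch_membership({"65000:2", "65000:3"},
                                server_blocks)
```
]]></artwork>
      </figure>

      <t>In this sketch, a route collection server whose block covers none
      of the advertised route targets receives nothing.</t>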

      <t>The advertisement of route target membership information is based on
      Route Target Outbound Route Filtering (ORF) as defined in <xref
      target="I-D.xu-idr-route-target-orf"/>.</t>
    </section>

    <section title="Route Distribution Process">
      <t>Upon receiving a route update message from a route collection server
      that contains VPN routes for a given VPN, route brokers store those VPN
      routes in their local RIBs if they are selected as best routes, and
      then reflect them to the route broker clients that are associated with
      that VPN. The cluster ID of the route brokers SHOULD be prepended when
      reflecting these VPN routes.</t>

      <t>Upon receiving a route update message from a route broker client
      that contains VPN routes for a given VPN, route brokers store those VPN
      routes in their local RIBs if they are selected as best routes, and
      then reflect them to the other IBGP peers (including route collection
      servers and other route broker clients) that are associated with that
      VPN. The cluster ID of the route brokers SHOULD be prepended when
      reflecting these VPN routes.</t>

      <t>Upon receiving an implicit route request for all the VPN routes of
      one or more VPNs (via the route target membership information
      advertisement) from a route broker client, route brokers SHOULD respond
      to that route broker client with the corresponding VPN routes stored in
      their local RIBs.</t>

      <t>Upon receiving an implicit route request for all the VPN routes of
      one or more VPNs (via the route target membership information
      advertisement) from a route collection server, route brokers SHOULD
      respond to that route collection server with the corresponding VPN
      routes in their local RIBs that were learnt from their own route broker
      clients.</t>
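
      <t>The four rules above can be summarized in a toy Python model. It is
      a non-normative sketch under several assumptions: every received route
      is treated as already selected as a best route, cluster ID handling is
      omitted, each route carries a single route target, and all class,
      method, and peer names are hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

SERVER = "route-collection-server"
CLIENT = "route-broker-client"


@dataclass(frozen=True)
class VpnRoute:
    rd: str
    prefix: str
    route_target: str
    learned_from: str  # peer that advertised the route


class RouteBroker:
    """Toy model; best-path selection and cluster ID handling
    are deliberately left out."""

    def __init__(self) -> None:
        self.role: Dict[str, str] = {}
        self.members = defaultdict(set)  # RT -> interested peers
        self.rib = defaultdict(list)     # RT -> stored routes
        self.sent: List[Tuple[str, VpnRoute]] = []

    def add_peer(self, peer: str, role: str) -> None:
        self.role[peer] = role

    def _eligible(self, route: VpnRoute, peer: str) -> bool:
        if peer == route.learned_from:
            return False
        # Servers only receive routes learnt from clients.
        if (self.role[peer] == SERVER
                and self.role[route.learned_from] == SERVER):
            return False
        return True

    def on_update(self, route: VpnRoute) -> None:
        """Store the route, then reflect it to interested peers."""
        self.rib[route.route_target].append(route)
        for peer in sorted(self.members[route.route_target]):
            if self._eligible(route, peer):
                self.sent.append((peer, route))

    def on_membership(self, peer: str, route_target: str) -> None:
        """Treat RT membership as an implicit route request."""
        self.members[route_target].add(peer)
        for route in self.rib[route_target]:
            if self._eligible(route, peer):
                self.sent.append((peer, route))


broker = RouteBroker()
broker.add_peer("rcs-1", SERVER)
broker.add_peer("vpe-1", CLIENT)
broker.add_peer("vpe-2", CLIENT)
broker.on_membership("rcs-1", "65000:1")
broker.on_membership("vpe-1", "65000:1")
# Learnt from a client: reflected to rcs-1.
broker.on_update(
    VpnRoute("10.0.0.1:1", "192.0.2.0/24", "65000:1", "vpe-1"))
# Learnt from a server: reflected to vpe-1 only.
broker.on_update(
    VpnRoute("10.0.0.9:1", "198.51.100.0/24", "65000:1", "rcs-1"))
# A late-joining client receives both stored routes.
broker.on_membership("vpe-2", "65000:1")
```
]]></artwork>
      </figure>

      <t>The list named sent in this sketch records each reflected route
      together with the peer it was sent to, which makes the on-demand
      behavior easy to inspect.</t>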
    </section>

    <section title="Deployment Considerations">
      <t>To simplify VPN route distribution control, each VPN SHOULD be
      assigned a globally unique export route target value.</t>

      <t>Since multiple paths for a given VPN prefix need to be advertised in
      data center SDN environments, virtual PEs SHOULD be assigned different
      RDs.</t>
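
      <t>The reason can be illustrated with a short, non-normative Python
      sketch. It assumes a reflector that advertises a single path per
      RD-plus-prefix key, with "first path wins" standing in for real
      best-path selection. The RD values, prefixes, and virtual PE names
      are hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
from typing import Dict, Iterable, Tuple

Advert = Tuple[str, str, str]  # (RD, prefix, advertising PE)


def reflected_paths(adverts: Iterable[Advert]
                    ) -> Dict[Tuple[str, str], str]:
    """Keep one path per (RD, prefix) key.

    "First path wins" is a stand-in for best-path selection.
    """
    table: Dict[Tuple[str, str], str] = {}
    for rd, prefix, pe in adverts:
        table.setdefault((rd, prefix), pe)
    return table


# Two virtual PEs advertise the same prefix with a shared RD.
shared_rd = [
    ("65000:100", "10.1.0.0/24", "vpe-1"),
    ("65000:100", "10.1.0.0/24", "vpe-2"),
]
# The same two advertisements, each PE using its own RD.
distinct_rd = [
    ("192.0.2.1:100", "10.1.0.0/24", "vpe-1"),
    ("192.0.2.2:100", "10.1.0.0/24", "vpe-2"),
]
shared = reflected_paths(shared_rd)
distinct = reflected_paths(distinct_rd)
```
]]></artwork>
      </figure>

      <t>With a shared RD the two advertisements collapse into one entry,
      whereas with distinct RDs both paths survive and can be reflected.</t>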

      <t>Virtual PEs SHOULD NOT establish BGP sessions with more than one
      cluster of route brokers configured with the same cluster ID.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document has no IANA actions.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>TBD</t>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>The authors would like to thank Robert Raszuk for valuable comments
      and suggestions on this document.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>

      <?rfc include="reference.RFC.8174"?>

      <?rfc include="reference.RFC.4364"?>

      <?rfc include="reference.RFC.4659"?>

      <?rfc include="reference.RFC.4684"?>

      <?rfc include="reference.RFC.5291"?>

      <?rfc include="reference.RFC.7814"?>

    </references>

    <references title="Informative References">
      <?rfc include="reference.I-D.ietf-bess-virtual-pe"?>

      <?rfc include="reference.I-D.xu-idr-route-target-orf"?>

    </references>
  </back>
</rfc>
