<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-xu-idr-bgp-route-broker-03"
     ipr="trust200902">
  <front>
    <title abbrev="BGP Route Broker">BGP Route Broker for Hyperscale
    SDN</title>

    <author fullname="Xiaohu Xu" initials="X." surname="Xu">
      <organization>China Mobile</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>xuxiaohu_ietf@hotmail.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Shraddha Hegde" initials="S." surname="Hegde">
      <organization>Juniper</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>shraddha@juniper.net</email>

        <uri/>
      </address>
    </author>

    <author fullname="Srihari Sangli" initials="S." surname="Sangli">
      <organization>Juniper</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>ssangli@juniper.net</email>

        <uri/>
      </address>
    </author>

    <author fullname="Shunwan Zhuang" initials="S." surname="Zhuang">
      <organization>Huawei</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>zhuangshunwan@huawei.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Jie Dong" initials="J." surname="Dong">
      <organization>Huawei</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>jie.dong@huawei.com</email>

        <uri/>
      </address>
    </author>

    <date day="16" month="August" year="2023"/>

    <abstract>
      <t>This document describes an optimized BGP route reflector mechanism,
      referred to as a BGP route broker. The route broker allows BGP-based IP
      VPN to be used as an overlay routing protocol in a scalable way for
      network virtualization in hyperscale data centers, also known as
      Software-Defined Networking (SDN) environments.</t>
    </abstract>
  </front>

  <middle>
    <section title="Problem Statement">
      <t>BGP/MPLS IP VPN has been deployed successfully in service provider
      networks worldwide for two decades. It has thereby proven sufficiently
      scalable for large networks. In this document, "BGP/MPLS IP VPN"
      refers to both BGP/MPLS IPv4 VPN <xref target="RFC4364"/> and BGP/MPLS
      IPv6 VPN <xref target="RFC4659"/>. The BGP/MPLS IP VPN-based data
      center network virtualization approaches described in <xref
      target="RFC7814"/> have also been widely deployed in small to
      medium-sized data centers for network virtualization, also known as
      Software-Defined Networking (SDN). This is especially true of the
      virtual PE model described in <xref
      target="I-D.ietf-bess-virtual-pe"/>. Examples include, but are not
      limited to, OpenContrail.</t>

      <t>Hyperscale cloud data centers typically house tens of thousands of
      servers, which are in turn virtualized as Virtual Machines (VMs) or
      containers. Suppose the virtual PE model mentioned above (a.k.a. a
      host-based network virtualization model) is used in such a data
      center. From the network virtualization perspective, there would then
      typically be at least tens of thousands of virtual PEs, millions of
      VPNs, and tens of millions of VPN routes. This poses a significant
      challenge to both the BGP session capacity and the VPN routing table
      capacity of any given BGP router.</t>

      <t>Route reflection should clearly be considered in order to address
      the BGP scaling issues mentioned above. Assume that a typical one-level
      route reflector architecture is used. It is then straightforward to
      partition all the VPN routes supported by a data center among multiple
      route reflectors. Each route reflector is preconfigured with a block of
      route targets associated with a subset of the VPNs. In other words, no
      single route reflector needs to maintain all the VPN routes supported
      by the data center. For redundancy purposes, more than one route
      reflector SHOULD be preconfigured with the same block of route targets
      so as to form an RR cluster.</t>
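      <t>The following non-normative Python sketch illustrates the
      partitioning described above. For illustration only, it assumes that
      route targets take the form "ASN:number". It also assumes that each RR
      cluster is preconfigured with a contiguous range of assigned numbers.
      The names RR_BLOCKS and rr_cluster_for() and the block boundaries are
      hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
# Hypothetical, non-normative sketch: find the RR cluster whose
# preconfigured block of route targets covers a given route target.
from typing import Dict, Optional

# Assumed configuration: each RR cluster owns a contiguous range of
# the assigned-number field of "ASN:number" route targets.
RR_BLOCKS: Dict[str, range] = {
    "rr-cluster-1": range(1, 500_001),
    "rr-cluster-2": range(500_001, 1_000_001),
}


def rr_cluster_for(route_target: str) -> Optional[str]:
    """Return the RR cluster responsible for an 'ASN:number' RT."""
    _asn, number = route_target.split(":")
    value = int(number)
    for cluster, block in RR_BLOCKS.items():
        if value in block:
            return cluster
    return None  # not covered by any preconfigured block
```
]]></artwork>
      </figure>

      <t>For example, rr_cluster_for("64512:42") returns "rr-cluster-1".
      The sketch uses ranges only to keep it short. Preconfigured blocks
      could equally be explicit lists of route targets.</t>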

      <t>However, suppose every virtual PE is attached to at least one VPN
      served by a given route reflector. That route reflector then has to
      establish BGP sessions with all virtual PEs, which places a heavy BGP
      session load on it. Now assume that another level (bottom-level) of
      route reflectors is introduced between the existing level (top-level)
      of route reflectors and the virtual PEs. Each top-level route reflector
      then establishes BGP sessions with all bottom-level route reflectors
      rather than with all virtual PEs. Each bottom-level route reflector
      only needs to establish BGP sessions with a subset of the virtual PEs.
      As a result, this partitioning solves the BGP session capacity
      issue.</t>
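      <t>The following back-of-the-envelope Python sketch compares the
      session counts of the two designs. For illustration only, it assumes
      that each virtual PE peers with exactly one bottom-level route
      reflector. It also assumes that virtual PEs are spread evenly across
      bottom-level route reflectors. The function names are
      hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
# Hypothetical, non-normative sketch: BGP sessions per route
# reflector in a one-level versus a two-level RR hierarchy.
import math
from typing import Tuple


def one_level(num_vpes: int) -> int:
    # Worst case from the text: every virtual PE has a VPN served
    # by the RR, so the RR peers with all virtual PEs.
    return num_vpes


def two_level(num_vpes: int, num_bottom: int) -> Tuple[int, int]:
    # A top-level RR peers with every bottom-level RR.  Assumes each
    # virtual PE peers with exactly one bottom-level RR and that the
    # virtual PEs are spread evenly across bottom-level RRs.
    per_top = num_bottom
    per_bottom = math.ceil(num_vpes / num_bottom)
    return per_top, per_bottom
```
]]></artwork>
      </figure>

      <t>Consider hypothetical figures of 50,000 virtual PEs and 100
      bottom-level RRs. Here one_level(50000) gives 50000 sessions, whereas
      two_level(50000, 100) gives (100, 500). That is 100 sessions per
      top-level RR and 500 per bottom-level RR. Peering each virtual PE with
      two bottom-level RRs for redundancy would double the latter
      figure.</t>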

      <t>In the above two-level RR hierarchy within hyperscale data centers,
      deploying the Route Target Constraint (RTC) mechanism as defined in
      <xref target="RFC4684"/> would have at least the following two
      drawbacks: 1) it is hard to partition all the VPN routes supported by
      the data center among multiple top-level RRs; 2) virtual PEs would have
      to receive RT membership NLRIs corresponding to all of the route
      targets supported by the data center, which unnecessarily wastes CPU
      and memory resources on virtual PEs.</t>

      <section title="Requirements Language">
        <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
        "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
        "OPTIONAL" in this document are to be interpreted as described in
        BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only
        when, they appear in all capitals, as shown here.</t>
      </section>
    </section>

    <section title="Solution Overview">
      <t>The bottom-level route reflectors borrow from widely adopted
      high-performance message queuing systems (e.g., RabbitMQ). These are
      referred to as route brokers in the rest of this document. They only
      need to maintain the route target membership information of their
      iBGP peers and reflect VPN routes among those peers on demand. In
      other words, route brokers act as the message brokers (exchanges) of a
      message queuing system. The top-level route reflectors are referred to
      as route collection servers, and the virtual PEs as route broker
      clients. Both act as message publishers (producers) and as
      subscribers (consumers).</t>
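      <t>The following Python sketch models the message queuing analogy as
      a simple publish/subscribe table keyed by route target. It is a
      hypothetical, non-normative illustration. RouteBrokerSketch and its
      methods are invented names. The sketch deliberately leaves out
      best-path selection, BGP encoding, and the direction-specific
      reflection rules of the route distribution process.</t>

      <figure>
        <artwork><![CDATA[
```python
# Hypothetical, non-normative sketch of the message queuing analogy:
# RT membership acts as a subscription and a VPN route acts as a
# published message delivered only to subscribed iBGP peers.
from collections import defaultdict
from typing import DefaultDict, Iterable, Set


class RouteBrokerSketch:
    def __init__(self) -> None:
        # route target -> peers subscribed to it
        self.subs: DefaultDict[str, Set[str]] = defaultdict(set)

    def subscribe(self, peer: str, rts: Iterable[str]) -> None:
        """Record RT membership advertised by a peer."""
        for rt in rts:
            self.subs[rt].add(peer)

    def publish(self, sender: str, rts: Iterable[str]) -> Set[str]:
        """Return the peers a route carrying these RTs goes to."""
        receivers: Set[str] = set()
        for rt in rts:
            receivers |= self.subs.get(rt, set())
        receivers.discard(sender)  # do not echo back to the sender
        return receivers
```
]]></artwork>
      </figure>

      <t>Suppose vpe-1 and vpe-2 both subscribe to route target 64512:10.
      A route published by vpe-1 with that route target is then delivered
      to vpe-2 and to any route collection server subscribed to it. It is
      not delivered back to vpe-1.</t>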
    </section>

    <section title="Route Target Membership Advertisement Process">
      <t>Each route collection server advertises route target membership
      information for the block of route targets preconfigured on it. As a
      result, route brokers learn which VPNs have been partitioned to each
      route collection server.</t>

      <t>Route brokers advertise default route target membership information
      to their own route broker clients in order to collect the VPN routes
      originated by those clients. The route brokers then reflect these
      routes to the corresponding route collection servers.</t>

      <t>Route broker clients advertise route target membership information
      for their dynamically configured route targets. Upon receiving such an
      advertisement, a route broker dispatches the received route target
      membership information towards the route collection servers whose
      preconfigured blocks of route targets cover the advertised route
      targets.</t>
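      <t>The membership handling above can be sketched in Python as
      follows. The class, its method names, and the DEFAULT_RT placeholder
      for the default route target membership are hypothetical. The sketch
      ignores the actual Route Target ORF encoding.</t>

      <figure>
        <artwork><![CDATA[
```python
# Hypothetical, non-normative sketch of RT membership handling on a
# route broker.  "DEFAULT_RT" stands in for the default route target
# membership advertised to route broker clients.
from typing import Dict, Set

DEFAULT_RT = "default"


class MembershipSketch:
    def __init__(self) -> None:
        # Preconfigured RT blocks learnt from route collection servers.
        self.server_blocks: Dict[str, Set[str]] = {}
        # Dynamically configured RTs learnt from route broker clients.
        self.client_rts: Dict[str, Set[str]] = {}

    def from_server(self, server: str, rts: Set[str]) -> None:
        self.server_blocks[server] = set(rts)

    def to_client(self) -> str:
        # Default membership: the client sends all its VPN routes.
        return DEFAULT_RT

    def from_client(self, client: str,
                    rts: Set[str]) -> Dict[str, Set[str]]:
        """Record a client's RTs; return, per route collection
        server, the RT membership to dispatch towards it."""
        self.client_rts.setdefault(client, set()).update(rts)
        dispatch: Dict[str, Set[str]] = {}
        for server, block in self.server_blocks.items():
            covered = set(rts) & block
            if covered:
                dispatch[server] = covered
        return dispatch
```
]]></artwork>
      </figure>

      <t>If no server block covers a client route target, that route target
      is simply not dispatched. The sketch does not suppress duplicate
      dispatches when several clients advertise the same route target. A
      real implementation would presumably do so.</t>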

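      <t>The advertisement of route target membership information is built on
      the Route Target Outbound Route Filtering (ORF) mechanism defined in
      <xref target="I-D.xu-idr-route-target-orf"/>. That mechanism makes use
      of the BGP Outbound Route Filtering capability <xref
      target="RFC5291"/>.</t>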
    </section>

    <section title="Route Distribution Process">
      <t>A route broker may receive a route update message from a route
      collection server containing VPN routes for a given VPN. If those VPN
      routes are selected as best routes, the route broker stores them in
      its local RIB. It then reflects them to those of its route broker
      clients that are associated with that VPN. When reflecting these VPN
      routes, the route broker SHOULD prepend its cluster ID to the
      CLUSTER_LIST attribute.</t>

      <t>A route broker may also receive a route update message from a
      route broker client containing VPN routes for a given VPN. If those
      VPN routes are selected as best routes, the route broker stores them
      in its local RIB. It then reflects them to its other iBGP peers
      (including route collection servers and other route broker clients)
      that are associated with that VPN. When reflecting these VPN routes,
      the route broker SHOULD prepend its cluster ID to the CLUSTER_LIST
      attribute.</t>
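      <t>The two reflection rules above can be sketched in Python as
      follows. The sketch assumes that best-path selection has already been
      performed. It also assumes that a peer is "associated with a VPN" when
      its route targets intersect those of the route. BrokerSketch,
      VpnRoute, and receive() are hypothetical names used only for
      illustration.</t>

      <figure>
        <artwork><![CDATA[
```python
# Hypothetical, non-normative sketch of the reflection rules of a
# route broker.  Best-path selection is assumed to have happened:
# only routes already selected as best are passed to receive().
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class VpnRoute:
    rd: str
    prefix: str
    rts: Set[str]
    cluster_list: List[str] = field(default_factory=list)


@dataclass
class BrokerSketch:
    cluster_id: str
    servers: Dict[str, Set[str]]  # route collection server -> RTs
    clients: Dict[str, Set[str]]  # route broker client -> RTs
    rib: Dict[Tuple[str, str], VpnRoute] = field(default_factory=dict)

    def receive(self, sender: str,
                route: VpnRoute) -> Dict[str, VpnRoute]:
        """Store a best route; return {peer: reflected copy}."""
        self.rib[(route.rd, route.prefix)] = route
        peers = dict(self.clients)
        if sender not in self.servers:
            # Routes from clients also go to route collection servers.
            peers.update(self.servers)
        out: Dict[str, VpnRoute] = {}
        for peer, rts in peers.items():
            if peer != sender and rts & route.rts:
                # Prepend own cluster ID to the CLUSTER_LIST.
                out[peer] = VpnRoute(
                    route.rd, route.prefix, route.rts,
                    [self.cluster_id] + route.cluster_list)
        return out
```
]]></artwork>
      </figure>

      <t>In this sketch, a route from a route collection server is
      reflected only to matching route broker clients. A route from a route
      broker client is reflected to matching route collection servers and
      to the other matching clients. Every reflected copy carries the route
      broker's cluster ID at the front of its CLUSTER_LIST.</t>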

      <t>A route broker may receive an implicit route request from a route
      broker client for all the VPN routes of one or more VPNs, via a route
      target membership information advertisement. The route broker SHOULD
      respond to that route broker client with the corresponding VPN routes
      stored in its local RIB.</t>

      <t>A route broker may also receive an implicit route request from a
      route collection server for all the VPN routes of one or more VPNs,
      via a route target membership information advertisement. The route
      broker SHOULD respond to that route collection server with those
      corresponding VPN routes stored in its local RIB that were learnt from
      its own route broker clients.</t>
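      <t>The handling of implicit route requests can be sketched in Python
      as a filter over the route broker's local RIB. RibEntry and
      answer_request() are hypothetical names. Each entry is simplified to
      carry a single route target.</t>

      <figure>
        <artwork><![CDATA[
```python
# Hypothetical, non-normative sketch of how a route broker might
# answer an implicit route request, i.e. an RT membership
# advertisement from a peer.  Each RIB entry records whether the
# route was learnt from one of the broker's own clients.
from typing import Iterable, List, NamedTuple, Set


class RibEntry(NamedTuple):
    prefix: str
    rt: str
    from_client: bool  # learnt from a route broker client?


def answer_request(rib: Iterable[RibEntry], requested: Set[str],
                   peer_is_server: bool) -> List[RibEntry]:
    """Return stored VPN routes to send to the requesting peer."""
    reply: List[RibEntry] = []
    for entry in rib:
        if entry.rt not in requested:
            continue
        # A route collection server only receives routes that were
        # learnt from the broker's own route broker clients.
        if peer_is_server and not entry.from_client:
            continue
        reply.append(entry)
    return reply
```
]]></artwork>
      </figure>

      <t>A route broker client thus receives all stored routes for the
      requested route targets. A route collection server receives only
      those routes learnt from the route broker's own clients.</t>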
    </section>

    <section title="Deployment Considerations">
      <t>To simplify VPN route distribution control, each VPN SHOULD be
      assigned a globally unique export route target value.</t>

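      <t>Data center SDN environments require multiple paths for a given
      VPN prefix to be advertised. Virtual PEs SHOULD therefore be assigned
      different Route Distinguishers (RDs). Routes for the same prefix
      originated by different virtual PEs are then treated as distinct VPN
      routes. Otherwise, route reflectors and route brokers would reduce
      them to a single best path.</t>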
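      <t>The following Python sketch illustrates why distinct RDs matter.
      It assumes that each BGP speaker advertises only a single best path
      per VPN-IP NLRI, i.e., per (RD, prefix) pair. The function
      reflected_paths() is hypothetical. It replaces real best-path
      selection with "keep the first path".</t>

      <figure>
        <artwork><![CDATA[
```python
# Hypothetical, non-normative sketch: a BGP speaker that advertises
# a single best path per VPN-IP NLRI keeps one path per (RD, prefix)
# pair.  The same RD on two virtual PEs collapses their paths into
# one NLRI; distinct RDs keep both paths as separate NLRIs.
from typing import Dict, List, Tuple

Advert = Tuple[str, str, str]  # (virtual PE, RD, IP prefix)


def reflected_paths(
        adverts: List[Advert]) -> Dict[Tuple[str, str], str]:
    best: Dict[Tuple[str, str], str] = {}
    for vpe, rd, prefix in adverts:
        # Placeholder for BGP best-path selection: keep the first.
        best.setdefault((rd, prefix), vpe)
    return best
```
]]></artwork>
      </figure>

      <t>If two virtual PEs use the same RD, their two advertisements of
      10.0.0.0/24 yield a single entry. With per-PE RDs such as 192.0.2.1:1
      and 192.0.2.2:1, both paths are retained.</t>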

      <t>Route collection servers SHOULD be configured with the same cluster
      ID. This prevents VPN routes learnt from one route collection server
      from being propagated to another route collection server, because the
      CLUSTER_LIST loop check discards such routes.</t>
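      <t>The effect of a shared cluster ID can be sketched in Python using
      the CLUSTER_LIST loop check of BGP route reflection. The functions
      reflect() and accept() and the cluster ID values used with them are
      hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
# Hypothetical, non-normative sketch of the route reflection
# CLUSTER_LIST handling: a reflector prepends its cluster ID when
# reflecting and ignores a received route whose CLUSTER_LIST
# already contains its own cluster ID.
from typing import List


def reflect(own_cluster_id: str,
            cluster_list: List[str]) -> List[str]:
    return [own_cluster_id] + cluster_list


def accept(own_cluster_id: str, cluster_list: List[str]) -> bool:
    return own_cluster_id not in cluster_list
```
]]></artwork>
      </figure>

      <t>Suppose a route is reflected by route collection server A and then
      by a route broker. With a shared cluster ID, its CLUSTER_LIST carries
      A's cluster ID, so route collection server B discards it. With
      distinct cluster IDs, B would accept it.</t>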

      <t>Virtual PEs SHOULD NOT establish BGP sessions with more than one
      cluster of route brokers configured with the same cluster ID.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>TBD</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>TBD</t>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>The authors would like to thank Robert Raszuk for their valuable
      comments and suggestions on this document.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>

      <?rfc include="reference.RFC.8174"?>

      <?rfc include="reference.RFC.4364"?>

      <?rfc include="reference.RFC.4659"?>

      <?rfc include="reference.RFC.4684"?>

      <?rfc include="reference.RFC.5291"?>

      <?rfc include="reference.RFC.7814"?>

    </references>

    <references title="Informative References">
      <?rfc include="reference.I-D.ietf-bess-virtual-pe"?>

      <?rfc include="reference.I-D.xu-idr-route-target-orf"?>

    </references>
  </back>
</rfc>
