<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
    which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
    There has to be one entity for each item to be referenced. 
    An alternate method (rfc include) is described in the references. -->
<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC2629 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2629.xml">
<!ENTITY RFC3552 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.3552.xml">
<!ENTITY I-D.narten-iana-considerations-rfc2434bis SYSTEM "http://xml.resource.org/public/rfc/bibxml3/reference.I-D.narten-iana-considerations-rfc2434bis.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs), 
    please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
    (Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space 
    (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="std" docName="draft-xu-lsr-ospf-flooding-reduction-in-msdc-05"
     ipr="trust200902">
  <front>
    <title abbrev="">OSPF Flooding Reduction in MSDCs</title>

    <author fullname="Xiaohu Xu" initials="X." surname="Xu">
      <organization>China Mobile</organization>

      <address>
        <email>xuxiaohu_ietf@hotmail.com</email>
      </address>
    </author>

    <author fullname="Luyuan Fang" initials="L." surname="Fang">
      <organization>eBay</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>luyuanf@gmail.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Jeff Tantsura" initials="J." surname="Tantsura">
      <organization>Nvidia</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>jefftant.ietf@gmail.com</email>

        <uri/>
      </address>
    </author>

    <author fullname="Shaowen Ma" initials="S." surname="Ma">
      <organization>Google</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <phone/>

        <facsimile/>

        <email>shaowen@google.com</email>

        <uri/>
      </address>
    </author>

    <date day="25" month="July" year="2023"/>

    <abstract>
      <t>OSPF is commonly used as an underlay routing protocol for Massively
      Scalable Data Center (MSDC) networks. A given OSPF router within a Clos
      topology receives multiple copies of exactly the same LSA from
      multiple OSPF neighbors. In addition, two OSPF neighbors may send each
      other the same LSA simultaneously. Because each OSPF router within a
      Clos topology has many OSPF neighbors, this unnecessary link-state
      flooding wastes a significant share of the routers' processing
      resources. This document proposes extensions to OSPF that reduce OSPF
      flooding within such MSDC networks and thereby improve their
      scalability. These modifications are applicable to both
      OSPFv2 and OSPFv3.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119">RFC 2119</xref>.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>OSPF is commonly used as an underlay routing protocol for Massively
      Scalable Data Center (MSDC) networks, where Clos is the most popular
      topology. MSDCs are also called Large-Scale Data Centers.</t>

      <t>A given OSPF router within a Clos topology receives
      multiple copies of exactly the same LSA from multiple OSPF neighbors. In
      addition, two OSPF neighbors may send each other the same LSA
      simultaneously. This unnecessary link-state flooding wastes a
      significant share of the processing resources of OSPF routers, and
      OSPF therefore does not scale well in MSDC networks. As a result,
      some MSDC operators have chosen BGP as the routing protocol in their
      data centers <xref target="RFC7938"/>. However, with the emergence of
      high-performance Ethernet networks for AI and high-performance computing
      (HPC), visibility of the whole network topology, and even of link
      load information, is crucial for end-to-end path load balancing.
      Link-state routing protocols such as OSPF therefore need to be
      reconsidered as the routing protocol for large-scale AI and HPC Ethernet
      networks, provided that the scaling issue described above can be
      addressed.</t>

      <t>This document describes a pragmatic approach to the above scaling
      issue. The basic idea is as follows: instead of flooding link-state
      information between neighboring OSPF routers within the MSDC network
      fabric, the link-state information originated by each OSPF router is
      collected by centralized controllers, which in turn reflect the
      collected link-state information to all OSPF routers within the MSDC. As
      shown in Figure 1, all OSPF routers within an MSDC network fabric are
      connected to one or more centralized controllers via a dedicated Local
      Area Network (LAN), referred to as the link-state collection and
      distribution LAN, which is used for collecting and distributing
      link-state information. For redundancy, there should be at least two
      link-state collection and distribution LANs.</t>

      <t><figure>
          <artwork align="center"><![CDATA[           +----------+                  +----------+                     
           |Controller|                  |Controller|                     
           +----+-----+                  +-----+----+                     
                |DR                            |BDR                       
                |                              |                          
                |                              |                          
   ---+---------+---+----------+-----------+---+---------+-  LS Collection&Distribution LAN       
      |             |          |           |             |                
      |Non-DR       |Non-DR    |Non-DR     |Non-DR       |Non-DR          
      |             |          |           |             |                
      |         +---+--+       |       +---+--+          |                
      |         |Router|       |       |Router|          |                
      |         *------*-      |      /*---/--*          |                
      |        /     \   --    |    //    /    \         |                
      |        /     \     --  |  //      /    \         |                
      |       /       \      --|//       /      \        |                
      |       /        \      /*-       /        \       |                
      |      /          \   // | --    /         \       |                
      |      /          \ //   |   --  /          \      |                
      |     /           /X     |     --           \      |                
      |     /         //  \    |     / --          \     |                
      |    /        //    \    |     /   --         \    |                
      |    /      //       \   |    /      --       \    |                
      |   /     //          \  |   /         --      \   |                
      |   /   //             \ |  /            --     \  |                
      |  /  //               \ |  /              --   \  |                
    +-+- //*                +\\+-/-+               +---\-++               
    |Router|                |Router|               |Router|               
    +------+                +------+               +------+               

                              Figure 1
]]></artwork>
        </figure></t>

      <t>With the assistance of these controllers, which act as the OSPF
      Designated Router (DR) and Backup Designated Router (BDR) for the
      link-state collection and distribution LAN, OSPF routers within the
      MSDC network do not need to exchange any OSPF packets other than Hello
      packets among themselves. As specified in <xref target="RFC2328"/>,
      Hello packets are used to establish and maintain
      neighbor relationships, to ensure bidirectional communication between
      OSPF neighbors, and to elect the DR/BDR when the OSPF routers are
      connected to a broadcast network. To
      obtain the full topology information (i.e., a fully synchronized
      link-state database) of the MSDC network, these OSPF routers only need
      to exchange link-state information with the controllers
      elected as OSPF DR/BDR for the link-state collection and distribution
      LAN.</t>

      <t>To further suppress the flooding of multicast OSPF packets originated
      by OSPF routers over the link-state collection and distribution LAN,
      OSPF routers do not send multicast OSPF Hello packets over that LAN.
      Instead, they initially just wait for
      OSPF Hello packets originated by the controllers elected as OSPF
      DR/BDR. Once the OSPF DR/BDR for the link-state collection and
      distribution LAN have been discovered, the routers start to send OSPF
      Hello packets periodically and directly (as unicasts) to the OSPF
      DR/BDR. In addition,
      OSPF routers send the other types of OSPF packets (e.g., Database
      Description, Link State Request, Link State Update, and
      Link State Acknowledgment packets) to the OSPF DR/BDR for the link-state
      collection and distribution LAN as unicasts as well. In contrast, the
      controllers elected as OSPF DR/BDR send OSPF packets as
      specified in <xref target="RFC2328"/>. As a result, OSPF routers within
      the MSDC do not receive OSPF packets from one another unless these
      OSPF packets are forwarded as unknown unicasts over the link-state
      collection and distribution LAN. Through these modifications to
      legacy OSPF router behavior, OSPF flooding is greatly reduced,
      which improves the overall scalability of MSDC
      networks. The modifications specified in this document are applicable
      to both OSPFv2 <xref target="RFC2328"/> and OSPFv3 <xref
      target="RFC5340"/>.</t>

      <t>The mechanism for OSPF refresh and flooding reduction in stable
      topologies as described in <xref target="RFC4136"/> may be considered as
      well.</t>
    </section>

    <section anchor="Abbreviations_Terminology" title="Terminology">
      <t>This memo makes use of the terms defined in <xref
      target="RFC2328"/>.</t>
    </section>

    <section title="Modifications to Legacy OSPF Behaviors">

      <section title="OSPF Routers as Non-DRs">
        <t>After the exchange of OSPF Hello packets among OSPF routers, the
        OSPF neighbor relationships among them transition to and remain
        in the 2-WAY state. OSPF routers originate Router-LSAs and/or
        Network-LSAs as appropriate for the link types. Note that
        neighbors in the 2-WAY state are advertised in the Router-LSAs
        and/or Network-LSAs. This differs slightly from the legacy OSPF
        router behavior specified in <xref target="RFC2328"/>, where
        neighbors in the 2-WAY state are not advertised. However, these
        self-originated LSAs no longer need to be exchanged directly among
        the OSPF routers. Instead, these LSAs only need to be sent to the
        controllers elected as OSPF DR/BDR for the link-state collection
        and distribution LAN.</t>

        <t>To further reduce the flooding of multicast OSPF packets over the
        link-state collection and distribution LAN, OSPF routers SHOULD send
        OSPF packets as unicasts. More specifically, OSPF routers SHOULD send
        unicast OSPF Hello packets periodically to the controllers
        elected as OSPF DR/BDR. Consequently, OSPF routers SHOULD NOT send
        any OSPF Hello packet over the link-state collection and distribution
        LAN until they have discovered an OSPF DR/BDR for that LAN. Note
        that OSPF routers within the MSDC SHOULD
        NOT be elected as OSPF DR/BDR for the link-state collection and
        distribution LAN; this is achieved by setting the Router Priority of
        those OSPF routers to zero. As a result, OSPF routers do not see each
        other over the link-state collection and distribution LAN.
        Furthermore, OSPF routers SHOULD also send all OSPF packets
        other than Hello packets to the controllers elected as OSPF
        DR/BDR as unicasts.</t>
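
        <t>The Hello-sending rule above can be illustrated with a minimal
        sketch. It is not a normative implementation: the LanInterface
        class, the on_controller_hello and hello_destinations functions,
        and the example addresses are hypothetical names introduced here
        for illustration only.</t>

        <t><figure>
            <artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class LanInterface:
    """Hypothetical state of a non-DR router's LAN interface."""
    router_priority: int = 0   # zero: never eligible as DR/BDR
    dr: Optional[str] = None   # DR address learned from Hellos
    bdr: Optional[str] = None  # BDR address learned from Hellos


def on_controller_hello(intf: LanInterface, dr: Optional[str],
                        bdr: Optional[str]) -> None:
    """Record the DR/BDR advertised in a received Hello."""
    intf.dr = dr
    intf.bdr = bdr


def hello_destinations(intf: LanInterface) -> List[str]:
    """Unicast destinations for the next periodic Hello.

    An empty list means the router stays silent on the LAN and
    keeps waiting for Hellos from the controllers.
    """
    return [a for a in (intf.dr, intf.bdr) if a is not None]
```
]]></artwork>
          </figure></t>

        <t>In this sketch, a router that has not yet heard from a controller
        gets an empty destination list and therefore sends nothing on the
        LAN. Once a Hello naming a DR and BDR has been recorded, the list
        contains only those two addresses.</t>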

        <t>To prevent data traffic from being forwarded across the
        link-state collection and distribution LAN, the cost of every OSPF
        router interface attached to that LAN
        SHOULD be set to the maximum value.</t>
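
        <t>The effect of the maximum interface cost can be shown with a
        small worked example. It assumes that the maximum value is 65535,
        the largest value of a 16-bit metric field. This document does not
        state the number, and the fabric link costs and path names below
        are hypothetical.</t>

        <t><figure>
            <artwork><![CDATA[
```python
from typing import Dict, List

# Assumption: the metric is a 16-bit field, so the maximum is 65535.
MAX_INTERFACE_COST = 0xFFFF


def cheapest_path(paths: Dict[str, List[int]]) -> str:
    """Return the name of the path with the lowest total cost."""
    return min(paths, key=lambda name: sum(paths[name]))


# Hypothetical outgoing-interface costs between two leaf routers.
# The LAN path counts only the interface towards the LAN, since the
# cost from a transit network to an attached router is zero.
paths = {
    "via-spine": [10, 10],             # leaf -> spine -> leaf
    "via-lan": [MAX_INTERFACE_COST],   # leaf -> LAN -> leaf
}
```
]]></artwork>
          </figure></t>

        <t>With these costs, cheapest_path(paths) selects the path through
        the fabric. The LAN path would only be selected if no fabric path
        with a lower total cost existed.</t>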

        <t>When a given OSPF router loses its connection to the link-state
        collection and distribution LAN, it SHOULD actively establish FULL
        adjacencies with all of its OSPF neighbors within the MSDC network. In
        this way, it can obtain the full LSDB of the MSDC network and flood
        its self-originated LSAs to the rest of the network.
        In other words, by default a given OSPF router within the MSDC network
        does not actively establish a FULL adjacency with an OSPF neighbor in
        the 2-WAY state. However, it SHOULD NOT refuse to establish a
        FULL adjacency with a given OSPF neighbor when it receives Database
        Description packets from that neighbor.</t>
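
        <t>The adjacency rules in the preceding paragraph can be summarized
        as a small decision function. This is a sketch under stated
        assumptions, not a normative state machine: the Action enumeration,
        the fabric_neighbor_action function, and its two boolean inputs are
        hypothetical names introduced here.</t>

        <t><figure>
            <artwork><![CDATA[
```python
from enum import Enum


class Action(Enum):
    """Hypothetical outcomes for a fabric neighbor in 2-WAY."""
    STAY_TWO_WAY = "remain in the 2-WAY state"
    FORM_FULL = "proceed to a FULL adjacency"


def fabric_neighbor_action(lan_connected: bool,
                           dd_received: bool) -> Action:
    """Decide how to treat a fabric neighbor in the 2-WAY state.

    lan_connected: the router still reaches the link-state
        collection and distribution LAN.
    dd_received: the neighbor has sent a Database Description
        packet, i.e. it is asking for an adjacency.
    """
    if not lan_connected:
        # LAN lost: actively synchronize with fabric neighbors.
        return Action.FORM_FULL
    if dd_received:
        # Never refuse a neighbor that starts the exchange.
        return Action.FORM_FULL
    # Default: the controllers supply the LSDB; stay in 2-WAY.
    return Action.STAY_TWO_WAY
```
]]></artwork>
          </figure></t>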
      </section>

      <section title="Controllers as DR/BDR">
        <t>The controllers being elected as OSPF DR/BDR would send OSPF
        packets as multicasts or unicasts as per <xref target="RFC2328"/>. In
        addition, Link State Acknowledgment packets are RECOMMENDED to be sent
        as unicasts rather than multicasts.</t>
      </section>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>The authors would like to thank Acee Lindem and Mohamed Boucadair for
      their valuable comments and suggestions on this document.</t>

    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>TBD.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>TBD.</t>

    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include='reference.RFC.2119'?>

      <?rfc include='reference.RFC.2328'?>

      <?rfc include='reference.RFC.5340'?>

    </references>

    <references title="Informative References">
      <?rfc include='reference.RFC.4136'?>

      <?rfc include='reference.RFC.7938'?>

    </references>
  </back>
</rfc>
