<?xml version="1.0" encoding="US-ASCII"?>
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="no"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc xmlns:xi="http://www.w3.org/2001/XInclude"  category="info" docName="draft-hegde-lsr-ospf-better-idbx-03"
     ipr="trust200902"
     obsoletes="" submissionType="IETF" updates="" xml:lang="en">
    <front>
        <title abbrev="draft-hegde-lsr-ospf-better-idbx">
            Improved OSPF Database Exchange Procedure
        </title>

        <author fullname="Shraddha Hegde" initials="S." surname="Hegde">
            <organization>Juniper</organization>
            <address>
                <postal>
                    <street>
                    </street>
                    <city></city>
                    <region>
                    </region>
                    <code/>
                    <country>India
                    </country>
                </postal>
                <phone/>
                <facsimile/>
                <email>shraddha@juniper.net
                </email>
                <uri/>
            </address>
        </author>
         <author fullname="Tony Przygienda" initials="A." surname="Przygienda">
            <organization>Juniper</organization>
            <address>
                <postal>
                    <street>1133 Innovation Way
                    </street>
                    <city>Sunnyvale</city>
                    <region>CA
                    </region>
                    <code/>
                    <country>USA
                    </country>
                </postal>
                <phone/>
                <facsimile/>
                <email>prz@juniper.net
                </email>
                <uri/>
            </address>
        </author>
         <author fullname="Acee Lindem" initials="A." surname="Lindem">
            <organization>LabN Consulting LLC</organization>
            <address>
                <postal>
                  <street>
                    301 Midenhall Way
                  </street>
                  <city>Cary</city>
                  <region>
                    North Carolina
                  </region>
                  <code/>
                  <country>
                    USA
                  </country>
                </postal>
                <phone/>
                <facsimile/>
                <email>acee.ietf@gmail.com
                </email>
                <uri/>
            </address>
        </author>

        <date year="2024"/>
        <abstract>
            <t>
                 When an OSPF router restarts, previous instances of LSAs originated by that router
                 may remain in the databases of other routers in the OSPF domain until those LSAs age out.
                 When the restarting router rejoins the network, neighboring routers re-establish
                 adjacencies while the restarting router is still bringing up its interfaces and
                 adjacencies and is originating LSAs with sequence numbers that may be lower than those of
                 the stale LSAs. The stale LSAs may be interpreted as bi-directional
                 connectivity before the initial database exchanges are finished and genuine bi-directional
                 connectivity exists.
                 Such incorrect interpretation may lead to, among other things, transient packet drops.
                 This document suggests improvements to the OSPF database exchange
                 process to prevent such problems caused by the use of stale LSAs. The solution does not preclude
                 changes in the existing standard but presents an extension that will prevent this scenario.
            </t>
        </abstract>
        <note title="Requirements Language">
          <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
          NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
          "MAY", and "OPTIONAL" in this document are to be interpreted as
          described in BCP 14 <xref target="RFC2119" format="default"/> <xref target="RFC8174" format="default"/>
          when, and only when, they appear in all capitals, as shown here.</t>
        </note>
    </front>
    <middle>


        <section title="Introduction" anchor='sec_introduction'>
            <t>
               When an OSPF <xref target ='RFC2328'/> router restarts, its stale LSAs are left in the
               databases of other routers in the OSPF domain until the LSAs are removed, either by being
               aged out intentionally or by their LS age elapsing. The stale
               Router LSA can contain links to all the neighbors that had Full adjacencies before the router
               restarted.
                 <figure anchor="Topology_1" title="OSPF Network">
                 <artwork>
                            ----------C--------
                            |                 |
                      A-----B                 E-----F
                            |                 |
                            --------D---------


                 </artwork>
                 </figure>
                 <xref target ='Topology_1'/> shows a very simple OSPF network. If C restarts without
                 purging its LSAs, the other routers
                 in the domain will hold the stale LSAs of Router C in their databases. A stale LSA may have
                 links to B and E, representing the topology of C before it went down. When C comes back up,
                 it initiates the database exchange process with B and E. B and E may hold C's stale LSAs with higher
                 sequence numbers than the new ones originated by C and
                 hence treat the stale LSAs as the most recent copies while bringing up the adjacency
                 with C and transitioning to Full state. Because C's stale LSA has links to B and E,
                 the Shortest Path First (SPF) back-link check is satisfied and B and E update their routing tables to point
                 to C. C may drop this traffic because it may not yet have all its previous adjacencies up and
                 all LSAs in place to correctly compute the necessary routes. The situation corrects itself when C
                 reissues its LSAs with even higher sequence numbers, so the condition is already transient today.
            </t>
        </section>
          <section title="Solution" anchor='sec_solution'>
               <t>
                 To prevent the transient condition described in <xref target="sec_introduction"/>,
                 the database exchange procedure from <xref target ='RFC2328'/> section 7.2 is extended with additional
                 constraints that prevent an OSPF router from transitioning to Full state while it has
                 stale LSAs originated by the database exchange neighbor in its Link State Database (LSDB).
               </t>
               <t>
                 During an initial database exchange (IDBX), a router that has determined it should become adjacent
                 with another OSPF router first enters the
                 ExStart state and creates a "Database summary list" for the neighbor. In addition to this list, a
                 "Stale database exchange list"
                 is created for the neighbor and is initially populated with the LSAs in the LSDB that were originated by it.
                 The neighbor will not transition to Full state until both the Link state request list and the Stale database
                 exchange list are empty. During the Database Exchange process, an entry is removed from the Stale database
                 exchange list when
                 the same or a more recent instance of the LSA is referenced in a Database Description packet, or when the
                 neighbor originates a more recent instance of the LSA and it is received in a Link State Update packet.
                 For stale LSAs in the
                 router's Link State Database, the neighbor will request the stale LSAs and originate more recent
                 instances via normal OSPF procedures as illustrated in <xref target="sec_ex"/>.
               </t>
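               <t>
                 The removal rule above depends on deciding whether the instance referenced by the neighbor is
                 "the same or more recent" than the local copy. The following Python sketch shows one way that
                 test could be expressed using the comparison rules of section 13.1 of <xref target="RFC2328"/>.
                 The names LsaHeader, compare_recency, and clears_stale_entry are hypothetical and are not defined
                 by this document or by RFC 2328; sequence numbers are assumed to be held as signed 32-bit values.
               </t>
               <figure>
                 <artwork>
<![CDATA[
```python
from dataclasses import dataclass

MAX_AGE = 3600       # seconds (MaxAge in RFC 2328)
MAX_AGE_DIFF = 900   # seconds (MaxAgeDiff in RFC 2328)


@dataclass(frozen=True)
class LsaHeader:
    """Hypothetical subset of the LSA header fields used for comparison."""
    seq: int        # LS sequence number, treated as a signed 32-bit value
    checksum: int   # LS checksum
    age: int        # LS age in seconds


def compare_recency(a, b):
    """Return 1 if a is more recent than b, -1 if less recent, 0 if the same."""
    if a.seq != b.seq:
        return 1 if a.seq > b.seq else -1
    if a.checksum != b.checksum:
        return 1 if a.checksum > b.checksum else -1
    # If only one instance is at MaxAge, that instance is more recent.
    if (a.age == MAX_AGE) != (b.age == MAX_AGE):
        return 1 if a.age == MAX_AGE else -1
    # Otherwise a large age difference makes the younger instance newer.
    if abs(a.age - b.age) > MAX_AGE_DIFF:
        return 1 if a.age < b.age else -1
    return 0


def clears_stale_entry(advertised, local_copy):
    """True when the instance shown by the neighbor is the same as, or
    more recent than, the local copy, so the entry may be removed from
    the Stale database exchange list."""
    return compare_recency(advertised, local_copy) >= 0
```
]]>
                 </artwork>
               </figure>
               <t>
                 With this sketch, an instance freshly originated by a restarted neighbor at a low sequence number
                 does not clear the entry, while a re-originated instance with a sequence number above the stale
                 copy does.
               </t>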
               <section title="OSPF Specification Additions and Modifications" anchor='sec_additions'>
                 <t>
                   The following additions and modifications to OSPF <xref target="RFC2328"/> are needed for this feature.
                   The application of the feature SHOULD be based on local configuration (refer to
                   <xref target="sec_mgmt"/>).
                 </t>
                 <ol spacing="normal">
                   <li>
                     In section 10.3 "The Neighbor State Machine",
                     State: ExStart, Event: NegotiationDone - The "Stale database exchange list"
                     is created including LSAs with the "Advertising Router" set to the neighbor's Router ID.
                   </li>
                   <li>
                     In section 10.3 "The Neighbor State Machine",
                     State: Exchange, Event: ExchangeDone - The new neighbor state will not be Full
                     unless both the Link state request list and the Stale database exchange list are empty.
                   </li>
                   <li>
                     In section 10.3 "The Neighbor State Machine",
                     State: Exchange or greater, Event: AdjOk? - If an adjacency will not be formed,
                     the Stale database exchange list is also cleaned up. 
                   </li>
                   <li>
                     In section 10.3 "The Neighbor State Machine",
                     State: Any state, Event: KillNbr - The Stale database exchange list
                     is also cleaned up. 
                   </li>
                   <li>
                     In section 10.3 "The Neighbor State Machine",
                     State: Any state, Event: LLDown - The Stale database exchange list
                     is also cleaned up. 
                   </li>
                   <li>
                     In section 10.3 "The Neighbor State Machine",
                     State: Any state, Event: InactivityTimer - The Stale database exchange list
                     is also cleaned up.
                   </li>
                   <li>
                     In section 10.3 "The Neighbor State Machine",
                     State: 2Way or greater, Event: 1-WayReceived - The Stale database exchange list
                     is also cleaned up.
                   </li>
                   <li>
                     In section 10.6 "Receiving Database Description Packets",
                     if the Database Description packet is accepted, LSAs originated by the neighbor that
                     are not less recent than the local database copy will be removed from the Stale database exchange
                     list. LSAs that are less recent than the local database copy will remain on the Stale database
                     exchange list and will be updated by the neighbor through the Loading phase of the Database exchange
                     process.
                   </li>
                   <li>
                     In section 13, "The Flooding Procedure", (5), (g): If the originator of the more recent LSA is
                     a neighbor that is in Exchange or Loading state, remove the LSA from the Stale database
                     exchange list. If the neighbor is in Loading state and both the Link state request list and
                     the Stale database exchange list are empty, the LoadingDone event is generated.
                     In most cases, the LSA will have been received from the neighbor in Exchange or Loading state.
                     However, this cannot be guaranteed due to packet loss and different propagation delays.
                     Additionally, the neighbor may be in Exchange or Loading state on more than one interface.
                   </li>
                 </ol>
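                 <t>
                   As a rough illustration of the items above, the following Python sketch models the per-neighbor
                   state kept by the local router. It is a simplification under stated assumptions, not an
                   implementation of the neighbor state machine: the class Neighbor, its method names, and the
                   tuple LSA key (LS type, Link State ID, Advertising Router) are hypothetical, LSA recency is
                   reduced to a plain integer sequence number, and the single teardown method stands in for the
                   AdjOk?, KillNbr, LLDown, InactivityTimer, and 1-WayReceived cases.
                 </t>
                 <figure>
                   <artwork>
<![CDATA[
```python
class Neighbor:
    """Per-neighbor state as seen by the local router (illustrative only)."""

    def __init__(self, router_id):
        self.router_id = router_id
        self.state = "ExStart"
        self.ls_request_list = set()
        self.stale_list = {}   # LSA key -> sequence number held locally

    def _synchronized(self):
        return not self.ls_request_list and not self.stale_list

    def negotiation_done(self, lsdb):
        # Item 1: track every LSDB entry whose Advertising Router is
        # the neighbor's Router ID.
        self.stale_list = {key: seq for key, seq in lsdb.items()
                           if key[2] == self.router_id}
        self.state = "Exchange"

    def dbd_header(self, key, seq):
        # Item 8: a header that is not less recent than the local copy
        # clears the entry; a less recent header leaves it in place.
        if key in self.stale_list and seq >= self.stale_list[key]:
            del self.stale_list[key]

    def exchange_done(self):
        # Item 2: Full requires both lists to be empty.
        self.state = "Full" if self._synchronized() else "Loading"

    def newer_lsa_installed(self, key):
        # Item 9: a more recent instance originated by the neighbor was
        # installed while the neighbor is in Exchange or Loading state.
        if self.state in ("Exchange", "Loading") and key[2] == self.router_id:
            self.stale_list.pop(key, None)
            if self.state == "Loading" and self._synchronized():
                self.state = "Full"   # corresponds to LoadingDone

    def teardown(self, new_state="Down"):
        # Items 3 to 7: the Stale database exchange list is cleaned up
        # together with the other per-neighbor lists.
        self.ls_request_list.clear()
        self.stale_list.clear()
        self.state = new_state


# Usage: the local router holds a stale Router-LSA from C (sequence 7).
lsdb = {(1, "C", "C"): 7, (1, "B", "B"): 3}
nbr = Neighbor("C")
nbr.negotiation_done(lsdb)
nbr.dbd_header((1, "C", "C"), 2)   # C advertises an older instance
nbr.exchange_done()                # stays in Loading, not Full
state_after_exchange = nbr.state
nbr.newer_lsa_installed((1, "C", "C"))
state_after_update = nbr.state
```
]]>
                   </artwork>
                 </figure>
                 <t>
                   In the usage lines, the adjacency is held in Loading after ExchangeDone because the stale entry
                   is still present, and it moves to Full only once a more recent instance originated by the
                   neighbor has been installed.
                 </t>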
               </section>
               <section title="Example" anchor='sec_ex'>
                 <t>
                   <xref target ='seq_dgm'/> provides an example in which C restarts after having originated an LSA with
                   sequence number Y. After restarting, C originates the same LSA with sequence number X, where X &lt; Y, since
                   it is not yet aware of the existence of version Y.
                   <figure anchor="seq_dgm" title="Modified Database Exchange Procedure">
                     <artwork>
                             <![CDATA[

                 C-----------------E
                               Create Stale DB Exchange
                                List

      DBD: LSA A origin:C  ------->
          Sequence:X
                              DBD: LSA A Origin:C
                      <------       Sequence:Y
      LSReq LSA A,
      origin C       ----->
                            Modified Procedure

                     <------- LSA  A, Origin C, Seq:Y
      C re-originating self LSA
      LSA A, Origin C, -------> Remove Stale
      Seq:Y+1                   LSA A, Origin C
                                From Stale DB Exchange
                                list.

                                Bring adjacency to Full state if
                                both LS Request list and Stale DB
                                Exchange list are empty.


                         ]]>
                         </artwork>
                         </figure>
                            As shown in <xref target ='seq_dgm'/>, C sends a Link State Request for LSA A after learning
                            that E holds sequence number Y, while E
                            waits until the LSA with sequence number Y+1 (or strictly speaking, an LSA that compares as newer
                            than the one it holds) arrives. As long as the LSA is still on the Stale DB Exchange list, the adjacency
                            will remain in Loading state and will not move to Full state. All the neighbors of the restarting router
                            hold the neighbor FSM in Loading state and do not transition to Full state until the stale
                            LSA is replaced with a new LSA that is more recent than the stale LSA. This ensures that other
                            routers in the network do not compute
                            a path through the restarting router since they cannot satisfy the bi-directionality condition
                            in SPF computations.
                 </t>
               </section>
          </section>
          <section title="Potential Optimizations" anchor='sec_optimize'>
            <t>
              The solution described in <xref target="sec_solution"/> ensures that stale LSAs for a
              restarting router are either updated or purged before neighbors of the restarting
              router advertise a link to the restarting router. However, several optimizations
              are being considered.
            </t>
            <section title="Restarting Neighbor Tracking" anchor='sec_nbr_track'>
              <t>
                OSPF routers would maintain a "Stale Neighbor List" containing the Router IDs and
                interfaces of neighbors that were previously in Full state but recently went down. The
                procedures in <xref target="sec_solution"/> would then only be applied to neighbors
                on the Stale Neighbor List. Neighbors would be removed from the Stale Neighbor List when
                a new adjacency is formed with the neighbor or after a configured interval. The
                interval SHOULD be longer than the expected duration of an unplanned neighbor restart.
              </t>
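              <t>
                One possible shape for such a list is sketched below in Python. The class StaleNeighborList,
                its method names, and the hold_interval parameter are assumptions made for illustration;
                this document does not define a data structure or a default interval. The clock is
                injectable only so that the expiry behavior can be exercised without waiting.
              </t>
              <figure>
                <artwork>
<![CDATA[
```python
import time


class StaleNeighborList:
    """Tracks neighbors that were Full and recently went down."""

    def __init__(self, hold_interval, clock=time.monotonic):
        self.hold_interval = hold_interval   # configured interval, seconds
        self._clock = clock
        self._entries = {}   # (router_id, interface) -> time it went down

    def neighbor_down(self, router_id, interface, was_full):
        # Only neighbors that were previously in Full state are tracked.
        if was_full:
            self._entries[(router_id, interface)] = self._clock()

    def adjacency_formed(self, router_id, interface):
        # A new adjacency with the neighbor removes it from the list.
        self._entries.pop((router_id, interface), None)

    def needs_stale_tracking(self, router_id, interface):
        """True if the stale-LSA procedures should be applied."""
        went_down = self._entries.get((router_id, interface))
        if went_down is None:
            return False
        if self._clock() - went_down > self.hold_interval:
            # The configured interval has elapsed: drop the entry.
            del self._entries[(router_id, interface)]
            return False
        return True
```
]]>
                </artwork>
              </figure>
              <t>
                A router using this sketch would call needs_stale_tracking when starting a database exchange
                and would build the Stale database exchange list only when it returns true.
              </t>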
            </section>
            <section title="Limited LSA Tracking" anchor='sec_lsa_track'>
              <t>
                Rather than adding all the restarting router's LSAs in the Link-State Database (LSDB) to the
                Stale DB Exchange list, only the following LSAs would be added:
              </t>
              <ul spacing="normal">
                <li>
                  Router-LSA containing one or more links to the local router 
                </li>
                <li>
                  Network-LSAs containing a link to the local router 
                </li>
              </ul>
              <t>
                The procedures in <xref target="sec_solution"/> will guarantee that the LSAs on the
                Stale DB Exchange list are updated before the local router advertises a link
                to the restarting router. Given that these stale LSAs are the only ones containing
                a link to the local router, their update will prevent usage of the link until the
                restarting router has established an adjacency with the local router. As part of
                the restarting router establishing an adjacency and advertising a link to the
                local router, the restarting router will synchronize its LSDB with the local router
                and will update or purge stale
                LSAs previously originated by the restarting router. These stale LSAs will be
                requested by the restarting router using Link State Request packets
                (section 10.9 in <xref target="RFC2328"/>)
                and, when received, updated or purged using the procedure specified
                in section 13.4 of <xref target="RFC2328"/>.
              </t>
              <t>
                The restarting router's LSAs containing a link to the local router are added to the
                Stale DB Exchange list and will be
                updated or purged prior to the local router advertising a link to the restarting router.
                Additionally, the restarting router will not advertise a link to the local router
                until it has synchronized its LSDB with the local router. Hence, the restarting
                router's other stale LSAs will be updated or purged prior to both routers advertising a
                link, and these other stale LSAs need not be added to the Stale DB Exchange list.
              </t>  
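              <t>
                The selection rule for this optimization can be sketched as a filter over the LSDB. In the
                Python sketch below, the Lsa class and the linked_routers field are hypothetical: real
                Router-LSAs and Network-LSAs describe links with several encodings, and the sketch assumes
                they have already been reduced to the set of Router IDs each LSA refers to. LS types 1 and 2
                are the Router-LSA and Network-LSA types from <xref target="RFC2328"/>.
              </t>
              <figure>
                <artwork>
<![CDATA[
```python
from dataclasses import dataclass

ROUTER_LSA = 1
NETWORK_LSA = 2


@dataclass(frozen=True)
class Lsa:
    """Hypothetical, simplified view of an LSDB entry."""
    ls_type: int
    link_state_id: str
    advertising_router: str
    linked_routers: frozenset = frozenset()   # Router IDs the LSA links to


def limited_stale_list(lsdb, neighbor_id, local_id):
    """Select only the neighbor's Router-LSA and Network-LSAs that
    contain a link to the local router."""
    return [lsa for lsa in lsdb
            if lsa.advertising_router == neighbor_id
            and lsa.ls_type in (ROUTER_LSA, NETWORK_LSA)
            and local_id in lsa.linked_routers]


# Usage: E is the local router and C is the restarting neighbor.
lsdb = [
    Lsa(ROUTER_LSA, "C", "C", frozenset({"B", "E"})),
    Lsa(NETWORK_LSA, "10.0.0.3", "C", frozenset({"C", "B"})),
    Lsa(3, "192.0.2.0", "C"),
    Lsa(ROUTER_LSA, "D", "D", frozenset({"B", "E"})),
]
tracked = limited_stale_list(lsdb, neighbor_id="C", local_id="E")
```
]]>
                </artwork>
              </figure>
              <t>
                In the usage lines, only C's Router-LSA is selected: C's Network-LSA does not list E, the
                type 3 LSA is not a Router-LSA or Network-LSA, and the last entry was not originated by C.
              </t>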
          </section> 
          </section> 
          <section title="Restarting Router Data Plane Convergence" anchor='data_plane_converge'>
            <t>
              The procedures in <xref target="sec_solution"/> do not attempt to solve the problem of the restarting
              router's adjacency being used before the data plane has been updated with the routes associated
              with the adjacency. This problem can occur on platforms with a non-negligible delay between the control plane
              Routing Information Base (RIB) being updated and the platform's data plane Forwarding
              Information Bases (FIBs) being updated. However, this delay is local to the restarting router and
              can be handled locally by delaying the advertisement of adjacencies (i.e., as links in the restarting router's
              router-LSA or network-LSAs) until the
              data plane has been updated. As long as the restarting router's stale LSAs have been updated or purged
              as described in <xref target="sec_solution"/>, delaying usage of the adjacency until
              the restarting router's data plane has converged is completely under local control.
            </t>
          </section> 
          <section anchor="sec_mgmt" title="Management Considerations" toc="default">
            <t>
              Application of the Database exchange procedure specified in this document SHOULD
              be based on local configuration, with the default
              behavior being not to perform the additional Stale database exchange list processing specified
              in this document. Not all OSPF networks require elimination of traffic drops during
              the Database exchange process (as evidenced by the fact that OSPF has been deployed
              for several decades without this added packet loss mitigation). Additionally, making the
              procedure an optional configuration will encourage implementation and deployment.
            </t>
          </section> 
          <section anchor="IANA" title="IANA Considerations" toc="default">

            <t>
              This document has no IANA considerations.
            </t>
          </section>

        <section title="Security Considerations">
            <t>
            </t>
        </section>

        <section title="Contributors">
            <t></t>
        </section>

    </middle>
    <back>
        <references title="Informative References">

        </references>
        <references title="Normative References">
            <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml"/>
            <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.2328.xml"/>
            <xi:include href="https://xml2rfc.tools.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml"/>


        </references>

    </back>
</rfc>
