<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="info" docName="draft-dong-lsvr-bgp-spf-selection-00"
     ipr="trust200902">
  <front>
    <title abbrev="BGP-SPF Selection Rules">Proposed Update to BGP Link-State
    SPF NLRI Selection Rules</title>

    <author fullname="Jie Dong" initials="J." surname="Dong">
      <organization>Huawei Technologies</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country>China</country>
        </postal>

        <email>jie.dong@huawei.com</email>
      </address>
    </author>

    <author fullname="Jinqiang Chen" initials="J." surname="Chen">
      <organization>Huawei Technologies</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country>China</country>
        </postal>

        <email>chenjinqiang@huawei.com</email>
      </address>
    </author>

    <author fullname="Sheng Fang" initials="S." surname="Fang">
      <organization>Huawei Technologies</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country>China</country>
        </postal>

        <email>fangsheng@huawei.com</email>
      </address>
    </author>

    <date day="23" month="October" year="2023"/>

    <area>Routing Area</area>

    <workgroup>Link State Vector Routing Working Group</workgroup>

    <abstract>
      <t>For network scenarios such as Massively Scaled Data Centers (MSDCs),
      BGP is extended for Link-State (LS) distribution and Shortest Path First
      (SPF) algorithm-based route calculation. BGP-LS-SPF leverages the
      mechanisms of both the BGP protocol and the BGP-LS protocol extensions,
      with new selection rules defined for BGP-LS-SPF NLRI. This document
      proposes updates to the BGP-LS-SPF NLRI selection rules to ensure a
      deterministic selection result. The proposed updates can also help to
      mitigate some issues in BGP-LS-SPF route convergence. This document
      updates the NLRI selection rules in I-D.ietf-lsvr-bgp-spf.</t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>For network scenarios such as Massively Scaled Data Centers (MSDCs),
      BGP is extended for Link-State (LS) distribution and Shortest Path First
      (SPF) algorithm-based route calculation. BGP-LS-SPF leverages the
      mechanisms of both the BGP protocol and the BGP-LS protocol extensions,
      with new selection rules for BGP-LS-SPF NLRI defined in <xref
      target="I-D.ietf-lsvr-bgp-spf"/>. For all BGP-LS-SPF NLRIs, the NLRI
      selection rules are defined as follows:</t>

      <t><list style="numbers">
          <t>NLRIs originated by directly connected BGP SPF peers are
          preferred.</t>

          <t>The NLRI with the most recent Sequence Number TLV, i.e., the
          highest sequence number, is selected.</t>

          <t>The NLRI received from the BGP SPF speaker with the numerically
          larger BGP Identifier is preferred.</t>
        </list></t>
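      <t>The following is a minimal, non-normative Python sketch of the
      existing selection rules. It assumes that each received copy of an NLRI
      has been reduced to three hypothetical fields, one per rule; the names
      Candidate, existing_rules_key, and select_existing are illustrative
      only and are not defined by <xref target="I-D.ietf-lsvr-bgp-spf"/>.
      Encoding the rules as a tuple compared left to right means that a later
      rule is consulted only when all earlier rules tie.</t>

      <t><figure>
          <artwork><![CDATA[
from dataclasses import dataclass


# Hypothetical, simplified view of one received copy of an NLRI.
@dataclass(frozen=True)
class Candidate:
    from_direct_originator: bool  # rule 1: originated by direct peer
    sequence_number: int          # rule 2: Sequence Number TLV value
    sender_bgp_id: int            # rule 3: sender's BGP Identifier


def existing_rules_key(c: Candidate) -> tuple:
    # Tuples compare element by element, so rule 2 is consulted
    # only when rule 1 ties, and rule 3 only when rules 1-2 tie.
    # For every element, the larger value is preferred.
    return (c.from_direct_originator, c.sequence_number,
            c.sender_bgp_id)


def select_existing(candidates: list) -> Candidate:
    return max(candidates, key=existing_rules_key)


older = Candidate(True, 10, 2)
newer = Candidate(False, 11, 4)
print(select_existing([older, newer]) == older)  # True
]]></artwork>
        </figure></t>

      <t>Because rule 1 occupies the first position of the key, a copy
      received from the directly connected originator is preferred even over
      a copy that carries a higher sequence number.</t>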

      <t>In some cases, these rules may not be sufficient to provide a
      deterministic selection result. In some failure cases, these rules may
      also delay the distribution of the latest link-state information, which
      would result in delayed route convergence in the network.</t>

      <t>This document first describes the network scenarios in which the
      existing NLRI selection rules are considered insufficient, and then
      proposes updates to the BGP-LS-SPF NLRI selection rules.</t>
    </section>

    <section title="Network Scenarios Which Triggered This Update">

      <section title="Delayed Convergence during Link Failure">
        <t>Section 6.5.2 of <xref target="I-D.ietf-lsvr-bgp-spf"/> describes
        the NLRI advertisement in case of node failures. However, in some
        cases, route convergence can be delayed due to the current NLRI
        selection rules.</t>

        <t><figure>
            <artwork align="center"
                     name="Figure 1. Example of Delayed Convergence "><![CDATA[ 
  +-----+         +-----+  link down  +-----+        +-----+
  | R1  +---------+  R2 +------X------+  R3 +--------+ R5  |
  +-----+         +--\--+             +--/--+        +-----+
                      \                 /
  R1-R2: down to up    \               / 
                        \             /
                         \           /
                          \         /
                           \+-----+/
                            |  R4 |
                            +--+--+
                               |
                               |
                               |
                               |
                            +--+--+
                            |  R6 |
                            +-----+
]]></artwork>
          </figure></t>

        <t>As shown in the example in Figure 1, a failure of the BGP session
        between R2 and R3 is detected by R3, using either BFD or other
        detection mechanisms. Since R3 cannot distinguish whether it is a node
        failure of R2 or a link failure of R2-R3, in order to avoid
        unnecessary route flaps, according to the description in Section 6.5.2
        of <xref target="I-D.ietf-lsvr-bgp-spf"/>, R3 will hold all the NLRIs
        received from R2 for the period of NLRIImplicitWithdrawalDelay. During
        this period, if the state of link R1-R2 changes from down to up, an
        updated link NLRI of R1-R2 with a greater sequence number would be
        originated by R2 and advertised to its neighboring nodes. Due to the
        failure of R2-R3, R3 cannot receive the updated link NLRI directly
        from R2, but it can receive the updated link NLRI of R1-R2 with the
        greater sequence number from R4. However, according to the first NLRI
        selection rule, R3 would prefer the link NLRI of R1-R2 held from R2,
        which is a directly connected peer and the originator of the NLRI,
        over the copy received from R4, even though the latter carries a
        greater sequence number. Consequently, R3 will not use the latest link
        NLRI of R1-R2 for SPF computation, nor will it advertise the latest
        link NLRI of R1-R2 to its neighbors. This would cause delayed
        convergence of the network.</t>
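        <t>The scenario can be reproduced with a small, non-normative Python
        sketch that encodes the existing rules as an ordered comparison key.
        The sequence numbers (10 for the copy held from R2, 11 for the copy
        received from R4), the BGP Identifiers, and all names used are
        hypothetical values chosen for illustration.</t>

        <t><figure>
            <artwork><![CDATA[
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    sender: str                   # peer the copy came from
    from_direct_originator: bool  # rule 1
    sequence_number: int          # rule 2
    sender_bgp_id: int            # rule 3


def existing_rules_key(c: Candidate) -> tuple:
    # Existing rules, in order; a larger value is preferred.
    return (c.from_direct_originator, c.sequence_number,
            c.sender_bgp_id)


# R3's copies of the link NLRI of R1-R2 (values are hypothetical).
# The copy from R2 is stale but is still held during
# NLRIImplicitWithdrawalDelay after the R2-R3 session failure.
retained_from_r2 = Candidate("R2", True, 10, 2)
fresh_from_r4 = Candidate("R4", False, 11, 4)

best = max([retained_from_r2, fresh_from_r4],
           key=existing_rules_key)
print(best.sender, best.sequence_number)  # R2 10
]]></artwork>
          </figure></t>

        <t>The stale copy wins because rule 1 is evaluated before the sequence
        number comparison of rule 2.</t>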
      </section>

      <section title="Unnecessary Redundant Advertisement">
        <t>According to the rules in <xref target="I-D.ietf-lsvr-bgp-spf"/>,
        for BGP-LS-SPF NLRIs with the same sequence number, the NLRI received
        from the speaker with the numerically larger BGP Identifier is
        preferred. However, in some cases, this may cause unnecessary
        redundant advertisement of the same NLRI.</t>

        <t><figure>
            <artwork align="center"
                     name="Figure 2. Example of Duplicated Update"><![CDATA[
  +----+  new  +----+         +----+       +----+
  | R6 +-------+ R1 +---------+ R2 +-------+ R5 |
  +----+       +-+--+         +-+--+       +----+
                 |              |
                 |              |
                 |              |
                 |              |
                 |              |
               +-+--+         +-+--+
               | R3 +---------+ R4 |
               +----+         +----+

]]></artwork>
          </figure></t>

        <t>As shown in the example in Figure 2, a new BGP session is
        established between R1 and R6, and R1 advertises the link NLRI of
        R1-R6 to its neighboring nodes (R2 and R3). R2 first receives the link
        NLRI of R1-R6 directly from R1 and advertises it further to its
        neighbors (R4 and R5). R4 receives the link NLRI of R1-R6 with the
        same sequence number from both R2 and R3. Assuming R3 has a
        numerically larger BGP Identifier than R2, R4 would prefer the NLRI
        received from R3 according to the BGP Identifier rule, and then
        advertises this link NLRI of R1-R6 to R2. R2 would in turn prefer the
        NLRI received from R4 according to the same rule (R4 having a larger
        BGP Identifier than R1), and further advertises this link NLRI to R5,
        which duplicates its previous advertisement of the same link
        NLRI.</t>
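        <t>The tie-break at R4 can be sketched in Python as follows. This is
        a non-normative illustration: the BGP Identifiers, the sequence
        number, the order in which R4 receives the two copies, and all names
        are assumptions. The sketch records each time R4's preferred copy
        changes, since each such change would cause R4 to send an update to
        its neighbors even though the link-state content is identical.</t>

        <t><figure>
            <artwork><![CDATA[
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    sender: str                   # peer the copy came from
    from_direct_originator: bool  # rule 1
    sequence_number: int          # rule 2
    sender_bgp_id: int            # rule 3


def existing_rules_key(c: Candidate) -> tuple:
    return (c.from_direct_originator, c.sequence_number,
            c.sender_bgp_id)


# R4's copies of the link NLRI of R1-R6. R1 (the originator) is not
# directly connected to R4, so rule 1 ties; the sequence numbers
# are equal, so only the BGP Identifier (R3 > R2, assumed) decides.
from_r2 = Candidate("R2", False, 5, 2)
from_r3 = Candidate("R3", False, 5, 3)

best = None
updates_sent = []
for received in [from_r2, from_r3]:  # assumed arrival order
    if best is None:
        new_best = received
    else:
        new_best = max(best, received, key=existing_rules_key)
    if new_best != best:
        updates_sent.append(new_best.sender)  # best copy changed
        best = new_best

print(updates_sent)  # ['R2', 'R3']
]]></artwork>
          </figure></t>

        <t>Both recorded updates carry the same link-state information; only
        the copy selected by R4 changes.</t>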
      </section>

      <section title="Parallel BGP-LS-SPF Peers">
        <t>In some scenarios, the BGP single-hop peering model is used between
        directly connected BGP nodes. When two or more parallel links exist
        between the BGP nodes, multiple BGP sessions are established between
        the peering nodes, and each session is used for the distribution of
        BGP-LS-SPF NLRIs.</t>

        <t><figure>
            <artwork align="center"
                     name="Figure 3. Example of Parallel BGP Peers"><![CDATA[
               parallel BGP sessions

  +----+       +----+         +----+       +----+
  |    |       |    +---------+    |       |    |
  | R3 +-------+ R1 +---------+ R2 +-------+ R4 |
  +----+       +-+--+         +-+--+       +----+
               
]]></artwork>
          </figure></t>

        <t>As shown in the example of Figure 3, there are two parallel links
        between R1 and R2, and a separate BGP session is established on each
        link. From R2's perspective, both sessions are with the same BGP SPF
        speaker, so copies of the same NLRI with the same sequence number
        received over them carry the same BGP Identifier, and the existing
        BGP-LS-SPF NLRI selection rules cannot break the tie. Either the route
        received from peer R1.1 or the route received from peer R1.2 (denoting
        R1's peering address on each link) may be selected as the best. To
        facilitate network operation and troubleshooting, it is preferable to
        have a deterministic NLRI selection result once the network enters a
        relatively stable state. Thus, a rule to select the preferred NLRI
        among parallel peering sessions is needed.</t>
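        <t>A short, non-normative Python sketch illustrates why the existing
        rules do not yield a deterministic result here. The field values and
        names are hypothetical. The sketch relies on the documented behavior
        of Python's built-in max(), which returns the first of several equal
        elements; in an implementation, the outcome could similarly depend on
        an implementation-specific factor such as arrival order.</t>

        <t><figure>
            <artwork><![CDATA[
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    session: str                  # peering session the copy came on
    from_direct_originator: bool  # rule 1
    sequence_number: int          # rule 2
    sender_bgp_id: int            # rule 3


def existing_rules_key(c: Candidate) -> tuple:
    # The session is not part of the key: the existing rules
    # have no criterion that distinguishes parallel sessions.
    return (c.from_direct_originator, c.sequence_number,
            c.sender_bgp_id)


# Both copies come from R1, so all three rules tie.
via_r1_1 = Candidate("R1.1", True, 7, 1)
via_r1_2 = Candidate("R1.2", True, 7, 1)

# max() keeps the first of several equal elements, so the
# result depends only on the order in which copies are seen.
first = max([via_r1_1, via_r1_2], key=existing_rules_key)
second = max([via_r1_2, via_r1_1], key=existing_rules_key)
print(first.session, second.session)  # R1.1 R1.2
]]></artwork>
          </figure></t>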
      </section>
    </section>

    <section title="Update to BGP-LS-SPF Selection Rules">
      <t>This document proposes to update the selection rules for all
      BGP-LS-SPF NLRIs as follows:</t>

      <t><list style="numbers">
          <t>NLRIs originated by directly connected BGP SPF peers SHOULD be
          preferred.</t>

          <t>The NLRI with the most recent Sequence Number TLV, i.e., the
          highest sequence number, SHOULD be selected.</t>

          <t>For NLRIs received from EBGP peers, the NLRI with the smaller
          number of AS numbers in the AS_PATH attribute SHOULD be
          preferred.</t>

          <t>For NLRIs received from IBGP peers, the NLRI with the smaller
          number of Cluster IDs in the CLUSTER_LIST attribute SHOULD be
          preferred.</t>

          <t>The NLRI received from the BGP SPF speaker with the numerically
          larger BGP Identifier SHOULD be preferred.</t>

          <t>The NLRI received from the BGP SPF peer with the smaller peer
          address SHOULD be preferred.</t>
        </list></t>
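      <t>The updated rules can be sketched in Python as a single ordered
      comparison key, in which criteria where a smaller value is preferred
      are negated. This is a non-normative illustration under assumptions
      not stated in this document: a missing AS_PATH or CLUSTER_LIST is
      treated as empty so that rules 3 and 4 can be applied to every
      candidate, which is only meaningful when all candidates for an NLRI
      arrive over the same type of session (all EBGP or all IBGP); all peer
      addresses belong to the same address family; and all names are
      hypothetical.</t>

      <t><figure>
          <artwork><![CDATA[
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    peer_address: str             # rule 6: peer's session address
    from_direct_originator: bool  # rule 1
    sequence_number: int          # rule 2
    as_path: tuple = ()           # rule 3: AS numbers in AS_PATH
    cluster_list: tuple = ()      # rule 4: Cluster IDs
    sender_bgp_id: int = 0        # rule 5: sender's BGP Identifier


def updated_rules_key(c: Candidate) -> tuple:
    # Larger key wins; "smaller is preferred" criteria are negated.
    return (
        c.from_direct_originator,                     # rule 1
        c.sequence_number,                            # rule 2
        -len(c.as_path),                              # rule 3
        -len(c.cluster_list),                         # rule 4
        c.sender_bgp_id,                              # rule 5
        -int(ipaddress.ip_address(c.peer_address)),   # rule 6
    )


def select_updated(candidates: list) -> Candidate:
    return max(candidates, key=updated_rules_key)


# Parallel sessions from the same speaker: rule 6 decides,
# independently of the order in which the copies are seen.
link1 = Candidate("192.0.2.5", True, 7, sender_bgp_id=1)
link2 = Candidate("192.0.2.1", True, 7, sender_bgp_id=1)
print(select_updated([link1, link2]).peer_address)  # 192.0.2.1
print(select_updated([link2, link1]).peer_address)  # 192.0.2.1

# EBGP copies: the shorter AS_PATH wins under rule 3 even though
# its sender has the smaller BGP Identifier (rule 5).
short = Candidate("198.51.100.1", False, 3,
                  as_path=(65002, 65001), sender_bgp_id=1)
longer = Candidate("198.51.100.9", False, 3,
                   as_path=(65004, 65003, 65001), sender_bgp_id=9)
print(select_updated([short, longer]) == short)  # True
]]></artwork>
        </figure></t>

      <t>Comparing EBGP-received and IBGP-received copies of the same NLRI
      against each other would require a different structure, such as a
      pairwise comparison function, and is outside the scope of this
      sketch.</t>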

      <t>The new rules 3 and 4 are intended to solve the duplicated
      advertisement problem described in Section 2.2. The new rule 6 is
      intended to solve the non-deterministic selection problem described in
      Section 2.3.</t>

      <t>For the problem illustrated in Section 2.1, there are several
      possible solutions; the details will be discussed further and
      documented in a future version of this document.</t>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document makes no request of IANA.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>The mechanism described in this document provides updates to the
      NLRI selection rules for BGP-LS-SPF. It does not introduce any security
      considerations beyond those described in <xref target="RFC4271"/> and
      <xref target="RFC4272"/>.</t>
    </section>

    <section anchor="Acknowledgements" title="Acknowledgements">
      <t>The authors would like to thank Haibo Wang, Jun Ge and Li Zhang for
      their valuable discussions and suggestions.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include='reference.RFC.4271'?>

      <?rfc include='reference.I-D.ietf-lsvr-bgp-spf'?>
    </references>

    <references title="Informative References">
      <?rfc include='reference.RFC.4272'?>
    </references>
  </back>
</rfc>
