<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE rfc [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">
]>
<!-- This template is for creating an Internet Draft using xml2rfc,
     which is available here: http://xml.resource.org. -->
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs), 
     please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
     (Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e., [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space 
     (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc xmlns:xi="http://www.w3.org/2001/XInclude" category="std" docName="draft-ietf-idr-long-lived-gr-05" ipr="trust200902" updates="6368" obsoletes="" submissionType="IETF" xml:lang="en" tocInclude="true" tocDepth="4" symRefs="true" sortRefs="true" version="3">
  <!-- xml2rfc v2v3 conversion 3.17.4 -->
  <!-- category values: std, bcp, info, exp, and historic
     ipr values: full3667, noModification3667, noDerivatives3667
     you can add the attributes updates="NNNN" and obsoletes="NNNN" 
     they will automatically be output with "(if approved)" -->

  <!-- ***** FRONT MATTER ***** -->

  <front>
    <!-- The abbreviated title is used in the page header - it is only necessary if the 
         full title is longer than 39 characters -->

    <title abbrev="Long-Lived Graceful Restart">Support for Long-lived BGP Graceful Restart</title>
    <seriesInfo name="Internet-Draft" value="draft-ietf-idr-long-lived-gr-05"/>
    <!-- add 'role="editor"' below for the editors if appropriate -->

    <author fullname="James Uttaro" initials="J." surname="Uttaro">
      <organization>Independent Contributor</organization>
      <address>
        <email>juttaro@ieee.org</email>
      </address>
    </author>
    <author fullname="Enke Chen" initials="E." surname="Chen">
      <organization>Palo Alto Networks</organization>
      <address>
        <email>enchen@paloaltonetworks.com</email>
      </address>
    </author>
    <author fullname="Bruno Decraene" initials="B." surname="Decraene">
      <organization>Orange</organization>
      <address>
        <email>bruno.decraene@orange.com</email>
      </address>
    </author>
    <author fullname="John G. Scudder" initials="J. G." surname="Scudder">
      <organization>Juniper Networks</organization>
      <address>
        <email>jgs@juniper.net</email>
      </address>
    </author>
    <date/>
    <!-- If the month and year are both specified and are the current ones, xml2rfc will fill 
         in the current day for you. If only the current year is specified, xml2rfc will fill 
	 in the current day and month for you. If the year is not the current one, it is 
	 necessary to specify at least a month (xml2rfc assumes day="1" if not specified for the 
	 purpose of calculating the expiry date).  With drafts it is normally sufficient to 
	 specify just the year. -->

    <!-- Meta-data Declarations -->

    <area>General</area>
    <workgroup>Internet Engineering Task Force</workgroup>
    <!-- WG name at the upperleft corner of the doc,
         IETF is fine for individual submissions.  
	 If this element is not present, the default is "Network Working Group",
         which is used by the RFC Editor as a nod to the history of the IETF. -->

    <keyword>template</keyword>
    <!-- Keywords will be incorporated into HTML output
         files in a meta tag but they have no effect on text or nroff
         output. If you submit your draft to the RFC Editor, the
         keywords will be used for the search engine. -->

    <abstract>
      <t>
In this document, we introduce a new BGP capability termed "Long-lived
Graceful Restart Capability" so that stale routes can be retained for a
longer time upon session failure than is provided for by BGP Graceful
Restart (RFC 4724). A well-known BGP community
"LLGR_STALE" is introduced for marking stale routes retained for a
longer time. A second well-known BGP community, "NO_LLGR", is introduced
to mark routes for which these procedures should not be applied.
We also specify that such long-lived stale routes be treated as
the least-preferred, and their advertisements be limited to BGP speakers
that have advertised the new capability. Use of this extension is not
advisable in all cases, and we provide guidelines to help determine if it
is.
      </t>
      <t>
We update RFC 6368 by specifying that the LLGR_STALE community must be 
propagated into, or out of, the path attributes exchanged between PE and
CE.
      </t>
    </abstract>
  </front>
  <middle>
    <section anchor="intro" numbered="true" toc="default">
      <name>Introduction</name>
      <t>
Historically, routing protocols in general, and BGP in particular, have been
designed with a focus on correctness, where a key part of "correctness" is
for each network element's forwarding state to converge toward the current
state of the network as quickly as possible. For this reason, the protocol
was designed to remove state advertised by routers that went down (from a
BGP perspective) as quickly as possible. Over time, this has been relaxed
somewhat, notably by BGP Graceful Restart (GR) <xref target="RFC4724" format="default"/>; 
however, the paradigm
has remained one of attempting to rapidly remove "stale" state from the 
network.
      </t>
      <t>
Over time, two phenomena have arisen that call into question the underlying
assumptions of this paradigm. The first is the widespread adoption of 
tunneled forwarding infrastructures, for example, MPLS. Such infrastructures
eliminate the risk of some types of forwarding loops that can arise in 
hop-by-hop forwarding and thus reduce one of the motivations for strong 
consistency between forwarding elements. The second is the increasing use
of BGP as a transport for data which is less closely associated with packet forwarding
than was originally the case. Examples include the use of BGP for 
autodiscovery (<xref target="RFC4761" format="default">VPLS</xref>) and filter programming 
(<xref target="RFC8955" format="default">FLOWSPEC</xref>). In these cases, 
BGP data takes on a character more akin to configuration than to traditional
routing.
      </t>
      <t>
The observations above motivate a desire to offer network operators the 
ability to choose to retain BGP data for a longer period than has hitherto 
been possible when the BGP control plane fails for some reason. Although
the semantics of BGP Graceful Restart <xref target="RFC4724" format="default"/> are close to those desired,
several gaps exist, most notably in the maximum time for which "stale" information
can be retained -- Graceful Restart imposes a 4095-second upper bound. 
      </t>
      <t>
In this document, we introduce a new BGP capability termed "Long-lived Graceful
Restart Capability" so that stale information can be retained for a longer time 
across a session reset. We also introduce two new BGP well-known communities, "LLGR_STALE",
to mark such information, and "NO_LLGR", to indicate that these procedures should not
be applied to the marked route. Long-lived stale information is to be treated as least-preferred, 
and its advertisement limited to BGP speakers that support the
new capability. Where possible, we reference the semantics of BGP Graceful
Restart <xref target="RFC4724" format="default"/> rather than specifying similar semantics in this document.
      </t>
      <t>
The expected deployment model for this extension is that it will only be invoked
for certain address families. This is discussed in more detail in <xref target="deploy" format="default">
the Deployment Considerations section</xref>. When used, its use may be combined 
with that of traditional Graceful Restart, in which case it is invoked only after
the traditional Graceful Restart interval has elapsed, or it may be invoked
immediately. Apart from the potential to greatly extend the timer, the most
obvious difference between Long-Lived and traditional Graceful Restart is that
in the Long-Lived version, routes are "depreferenced", that is, treated as least-preferred, 
whereas in the traditional version, route preference is not affected.
The design choice to treat Long-Lived Stale routes as least-preferred was informed by
the expectation that they might be retained for a (potentially) almost
unbounded period of time, whereas in the traditional Graceful Restart
case, stale routes are retained for only a brief interval. In the Graceful Restart case, the 
tradeoff between advertising new route status (at the cost of routing churn) 
and not advertising it (at the cost of suboptimal or incorrect route selection)
is resolved in favor of not advertising. In the LLGR case, it is resolved in 
favor of advertising new state, and using stale information only as a last resort.
      </t>
      <t>
<xref target="examples" format="default"/> provides some simple examples illustrating the 
operation of this extension.
      </t>
      <section numbered="true" toc="default">
        <name>Requirements Language</name>
        <t>
   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
   "OPTIONAL" in this document are to be interpreted as described in
   <xref target="RFC2119" format="default">BCP 14</xref> <xref target="RFC8174" format="default"/> when,
   and only when, they appear in all capitals, as shown here.
        </t>
      </section>
    </section>
    <section numbered="true" toc="default">
      <name>Definitions</name>
      <dl newline="false" spacing="normal" indent="2">
        <dt>CE:</dt>
        <dd>
A Customer Edge router. <xref target="RFC4364" format="default"/>
        </dd>
        <dt>Depreference, Depreferenced:</dt>
        <dd>
A route is said to be depreferenced if
it has its route selection preference reduced in reaction to some
event.
	</dd>
        <dt>EoR:</dt>
        <dd>
Marker for End-of-RIB, defined in <xref target="RFC4724" format="default"/> Section 2.
	</dd>
        <dt>GR:</dt>
        <dd>
Abbreviation for "Graceful Restart" <xref target="RFC4724" format="default"/>, also sometimes
referred to herein as "conventional Graceful Restart" or
"conventional GR" to distinguish it from the "Long-lived Graceful
Restart" defined by this document.
	</dd>
        <dt>Helper:</dt>
        <dd>
Or "helper router". During Graceful Restart or Long-lived
Graceful Restart, the router that detects a session failure and
applies the listed procedures. <xref target="RFC4724" format="default"/> refers to this as the
"receiving speaker".
	</dd>
        <dt>LLGR:</dt>
        <dd>
Abbreviation for "Long-lived Graceful Restart".
	</dd>
        <dt>LLST:</dt>
        <dd>
Abbreviation for "Long-lived Stale Time".
	</dd>
        <dt>PE:</dt>
        <dd>
A Provider Edge router. <xref target="RFC4364" format="default"/>
        </dd>
        <dt>Route:</dt>
        <dd>
We use "route" to mean any information encoded as a BGP NLRI and
set of path attributes. As discussed above, the connection between such
routes and the installation of forwarding state may be quite remote.
 	</dd>
        <dt>VRF:</dt>
        <dd>
 VPN Routing and Forwarding table. <xref target="RFC4364" format="default"/>
        </dd>
      </dl>
    </section>
    <section numbered="true" toc="default">
      <name>Protocol Extensions</name>
      <t>
A new BGP capability and two new BGP communities are introduced.
      </t>
      <section anchor="llgr_cap" numbered="true" toc="default">
        <name>Long-lived Graceful Restart Capability</name>
        <t>
The "Long-lived Graceful Restart Capability", or "LLGR Capability"
(value: 71) is a BGP capability <xref target="RFC5492" format="default"/>
that can be used by a BGP speaker to indicate its ability to preserve
its state according to the procedures of this document. This
capability MUST be advertised in conjunction with the Graceful Restart
capability <xref target="RFC4724" format="default"/>; see <xref target="use_of_gr" format="default">the "Use of Graceful Restart Capability"
section</xref>.
        </t>
        <t>
The capability value consists of zero or more tuples &lt;AFI, SAFI,
Flags, Long-lived Stale Time&gt; as follows:
        </t>
        <artwork align="left" name="" type="" alt=""><![CDATA[
      +--------------------------------------------------+
      | Address Family Identifier (16 bits)              |
      +--------------------------------------------------+
      | Subsequent Address Family Identifier (8 bits)    |
      +--------------------------------------------------+
      | Flags for Address Family (8 bits)                |
      +--------------------------------------------------+
      | Long-lived Stale Time (24 bits)                  |
      +--------------------------------------------------+
      | ...                                              |
      +--------------------------------------------------+
      | Address Family Identifier (16 bits)              |
      +--------------------------------------------------+
      | Subsequent Address Family Identifier (8 bits)    |
      +--------------------------------------------------+
      | Flags for Address Family (8 bits)                |
      +--------------------------------------------------+
      | Long-lived Stale Time (24 bits)                  |
      +--------------------------------------------------+
	]]></artwork>
        <t>
The meanings of the fields are as follows:
</t>
        <ul empty="true" spacing="normal">
          <li>
      Address Family Identifier (AFI), Subsequent Address Family
         Identifier (SAFI):

</li>
          <li>
            <ul empty="true" spacing="compact">
              <li>
      The AFI and SAFI, taken in combination, indicate that the BGP
      speaker has the ability to preserve its forwarding state for
      the address family during a subsequent BGP restart. Routes may
      be explicitly associated with a particular AFI and SAFI using
      the encoding of <xref target="RFC4760" format="default"/> or implicitly associated with
      &lt;AFI=IPv4, SAFI=Unicast&gt; if using the encoding of <xref target="RFC4271" format="default"/>.
</li>
            </ul>
          </li>
          <li>
      Flags for Address Family:

</li>
          <li>
            <ul empty="true" spacing="compact">
              <li>
         This field contains bit flags relating to routes that were
         advertised with the given AFI and SAFI.
</li>
            </ul>
          </li>
        </ul>
        <artwork name="" type="" align="left" alt=""><![CDATA[
             0 1 2 3 4 5 6 7
            +-+-+-+-+-+-+-+-+
            |F|   Reserved  |
            +-+-+-+-+-+-+-+-+
]]></artwork>
        <ul empty="true" spacing="compact">
          <li>
            <ul empty="true" spacing="compact">
              <li>
      The most significant bit is used to indicate whether the
      state for routes that were advertised with the given AFI and
      SAFI has indeed been preserved during the previous BGP restart. 
      When set (value 1), the bit indicates that the state has been
      preserved. This bit is called the "F bit" since it was
      historically used to indicate the preservation of Forwarding State. 
      Use of the F bit is detailed in the <xref target="session_resets" format="default">
      Session Resets section</xref>.
</li>
              <li>
      The remaining bits are reserved and MUST be set to zero by the
      sender and ignored by the receiver.
</li>
            </ul>
          </li>
          <li>
Long-lived Stale Time:

</li>
          <li>
            <ul empty="true" spacing="compact">
              <li>
      This time (in seconds) specifies how long stale information
      (for this AFI/SAFI) may be retained by the receiver (in addition
      to the period specified by the "Restart Time" in the
      Graceful Restart Capability). Because the potential use cases for
      this extension vary widely, there is no suggested default
      value for the LLST.
</li>
            </ul>
          </li>
        </ul>
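        <t>
The following non-normative Python sketch illustrates one way the capability
value described above might be decoded. The names parse_llgr_capability and
LlgrTuple are hypothetical and are not defined by this document. The sketch
assumes that its input is the capability value only (the capability code and
length octets having already been removed), that fields are in network byte
order as elsewhere in BGP, and that a value whose length is not a multiple of
seven octets is reported as an error.
        </t>
        <sourcecode type="python"><![CDATA[
```python
from typing import List, NamedTuple

TUPLE_LEN = 7  # AFI (2) + SAFI (1) + Flags (1) + LLST (3) octets
F_BIT = 0x80   # most significant bit of the Flags octet


class LlgrTuple(NamedTuple):
    afi: int
    safi: int
    f_bit: bool
    llst: int  # Long-lived Stale Time, in seconds


def parse_llgr_capability(value: bytes) -> List[LlgrTuple]:
    """Decode an LLGR Capability value into per-AFI/SAFI tuples."""
    if len(value) % TUPLE_LEN != 0:
        # Assumption of this sketch: a truncated tuple is an error.
        raise ValueError("length is not a multiple of 7 octets")
    tuples = []
    for off in range(0, len(value), TUPLE_LEN):
        afi = int.from_bytes(value[off:off + 2], "big")
        safi = value[off + 2]
        flags = value[off + 3]
        llst = int.from_bytes(value[off + 4:off + 7], "big")
        # Reserved flag bits are ignored; only the F bit is examined.
        tuples.append(LlgrTuple(afi, safi, bool(flags & F_BIT), llst))
    return tuples
```
]]></sourcecode>
        <t>
An empty value yields an empty list, which corresponds to a capability that
carries zero tuples.
        </t>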
      </section>
      <section numbered="true" toc="default">
        <name>LLGR_STALE Community</name>
        <t>
The well-known BGP community <xref target="RFC1997" format="default"/>
"LLGR_STALE" (value: 0xFFFF0006) can be used to mark stale routes
retained for a longer period of time. Such long-lived stale routes are to
be handled according to the procedures specified in the <xref target="operation" format="default">Operation section</xref>.
</t>
        <t>
An implementation MAY allow users to configure policies that accept,
reject, or modify routes based on the presence or absence of this community. 
</t>
      </section>
      <section numbered="true" toc="default">
        <name>NO_LLGR Community</name>
        <t>
The well-known BGP community "NO_LLGR" (value: 0xFFFF0007) can be
used to mark routes that a BGP speaker does not want to be treated according to 
these procedures, as detailed in <xref target="operation" format="default">the Operation 
section</xref>.
</t>
        <t>
An implementation MAY allow users to configure policies that accept,
reject, or modify routes based on the presence or absence of this community. 
</t>
      </section>
    </section>
    <section anchor="operation" numbered="true" toc="default">
      <name>Theory of Operation</name>
      <t>
If a BGP speaker is configured to support the procedures of this
document, it MUST use <xref target="RFC5492" format="default">BGP Capabilities
Advertisement</xref> to advertise the "Long-lived Graceful Restart
Capability". The setting of the parameters for an AFI/SAFI depends on
the properties of the BGP speaker, network scale, and local
configuration.
      </t>
      <t>
In the presence of the "Long-lived Graceful Restart Capability", the
procedures specified in <xref target="RFC4724" format="default"/> and <xref target="RFC8538" format="default"/> continue to apply
unless explicitly revised by this document.
      </t>
      <section anchor="use_of_gr" numbered="true" toc="default">
        <name>Use of Graceful Restart Capability</name>
        <t>
The Graceful Restart capability MUST be advertised in conjunction 
with the LLGR capability. If it is not so advertised, the LLGR 
capability MUST be disregarded. The purpose of mandating that both
be used in conjunction is to enable the reuse of certain base mechanisms
that are common to both "flavors", notably origination, collection,
and processing of EoR, as well as the finite state machine modifications
and connection reset logic introduced by GR.
        </t>
        <t>
We observe that if support for conventional Graceful Restart is not desired
for the session, the conventional GR phase can be skipped by omitting all
AFI/SAFI from the GR capability, advertising a Restart Time of zero, or
both. The <xref target="session_resets" format="default">Session Resets section</xref>
discusses the interaction of conventional and long-lived GR.      
        </t>
      </section>
      <section anchor="session_resets" numbered="true" toc="default">
        <name>Session Resets</name>
        <t>
<xref target="RFC4724" format="default">BGP Graceful Restart</xref>, updated by
<xref target="RFC8538" format="default"/>, defines conditions
under which a BGP session can reset and have its associated routes
retained. If such a reset occurs for a session for which the LLGR 
Capability has also been exchanged, the following procedures apply.
        </t>
        <t>
If the Graceful Restart Capability that was received does not list all
AFI/SAFI supported by the session, then for those non-listed AFI/SAFI the
GR "Restart Time" shall be deemed zero. Similarly, if the received LLGR
Capability does not list all AFI/SAFI supported by the session, then for
those non-listed AFI/SAFI the "Long-lived Stale Time" shall be deemed zero.
        </t>
        <t>
The following text in <xref target="RFC4724" format="default">Section 4.2 of the GR
specification</xref> no longer applies:
        </t>
        <ul empty="true" spacing="normal">
          <li>
   If the session does not get re-established within the "Restart
   Time" that the peer advertised previously, the Receiving Speaker
   MUST delete all the stale routes from the peer that it is
   retaining.
      	</li>
        </ul>
        <t>
and the following procedures are specified instead:
        </t>
        <t>
After the session goes down, and before the session is re-established,
the stale routes for an AFI/SAFI MUST be retained. The interval for
which they are retained is
limited by the sum of the "Restart Time" in the received Graceful Restart Capability
and the "Long-lived Stale Time" in the received Long-lived Graceful
Restart Capability. These timers MAY be modified by local configuration.
        </t>
        <t>
If the value of the "Restart Time" or the "Long-lived Stale Time" is zero,
the duration of the corresponding period would be zero seconds. For
example, if the "Restart Time" is zero and the "Long-lived Stale Time" is
nonzero, only the procedures particular to LLGR would apply. Conversely, if
the "Long-lived Stale Time" is zero and the "Restart Time" is nonzero, only
the procedures of GR would apply. If both are zero, none of these procedures
would apply; only those of the base BGP specification would (although EoR would
still be used as detailed in <xref target="RFC4724" format="default"/>). Finally, if both
are nonzero, then the procedures would be applied serially -- first those of 
GR, then those of LLGR. We observe that during the first interval, while the 
procedures of GR are in effect, route preference would not be affected. 
During the second interval, while LLGR procedures are in effect, routes
would be treated as least-preferred as specified elsewhere in this document.
        </t>
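        <t>
The four cases above can be sketched, non-normatively, as a function of the
two timer values and the time elapsed since the session went down. The
function name stale_phase and the phase labels are hypothetical. The sketch
assumes that the session has not been re-established, that both values are
those in effect after any local configuration, and that the LLGR period
begins at the instant the "Restart Time" period ends.
        </t>
        <sourcecode type="python"><![CDATA[
```python
def stale_phase(restart_time: int, llst: int, elapsed: int) -> str:
    """Return which procedures govern stale routes of one AFI/SAFI.

    restart_time: GR "Restart Time" in seconds (zero if not applicable).
    llst: "Long-lived Stale Time" in seconds (zero if not applicable).
    elapsed: seconds since the session went down.
    """
    if elapsed < restart_time:
        # Conventional GR: route preference is not affected.
        return "GR"
    if elapsed < restart_time + llst:
        # LLGR: stale routes are treated as least-preferred.
        return "LLGR"
    # Both periods are over (or were zero): stale routes are not retained.
    return "EXPIRED"
```
]]></sourcecode>
        <t>
For instance, with a "Restart Time" of zero and a nonzero "Long-lived Stale
Time", the sketch reports the LLGR phase from the moment the session goes
down, matching the case in which only the procedures particular to LLGR apply.
        </t>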
        <t>
Once the "Restart Time" period ends (including the case that the "Restart
Time" is zero), the LLGR period is said to have begun and the following 
procedures MUST be performed:
        </t>
        <ul spacing="normal">
          <li>
     For each AFI/SAFI for which it has received a nonzero "Long-lived
     Stale Time", the helper router MUST start a timer for that
     "Long-lived Stale Time". If the timer for the "Long-lived Stale
     Time" for a given AFI/SAFI expires before the session is
     re-established, the helper MUST delete all stale routes of that
     AFI/SAFI from the neighbor that it is retaining. 
      	</li>
          <li>
     The helper router MUST attach the LLGR_STALE community
     to the stale routes being retained. Note that this requirement
     implies that the routes would need to be readvertised, to 
     disseminate the modified community.
      	</li>
          <li>
     If any of the routes from the peer have been marked with 
     the NO_LLGR community, either as sent by the peer,
     or as the result of a configured policy, they
     MUST NOT be retained, but MUST be removed as per the
     normal operation of <xref target="RFC4271" format="default"/>.
	</li>
          <li>
     The helper router MUST perform the procedures listed under
     <xref target="processing_lls" format="default"/>.
        </li>
        </ul>
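        <t>
As a non-normative illustration, the marking and filtering steps listed above
might look as follows for the stale routes of a single AFI/SAFI. The names
StaleRoute and begin_llgr are hypothetical, and routes are modeled simply as a
prefix plus a set of community values. Starting the "Long-lived Stale Time"
timer, readvertising the modified routes, and the processing described in
<xref target="processing_lls" format="default"/> are outside the scope of this
sketch. Returning no routes when the timer value is zero is an assumption
drawn from the zero-duration rule earlier in this section.
        </t>
        <sourcecode type="python"><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List, Set

LLGR_STALE = 0xFFFF0006  # well-known community value
NO_LLGR = 0xFFFF0007     # well-known community value


@dataclass
class StaleRoute:
    prefix: str
    communities: Set[int] = field(default_factory=set)


def begin_llgr(stale_routes: List[StaleRoute],
               llst: int) -> List[StaleRoute]:
    """Return the routes retained when the LLGR period begins."""
    if llst == 0:
        # Assumption: a zero LLST means a zero-length LLGR period.
        return []
    retained = []
    for route in stale_routes:
        if NO_LLGR in route.communities:
            # Not retained; removed as in normal BGP operation.
            continue
        route.communities.add(LLGR_STALE)
        retained.append(route)
    return retained
```
]]></sourcecode>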
        <t>
Once the session is re-established, the procedures specified in <xref target="RFC4724" format="default"/>
apply for the stale routes irrespective of whether the stale routes are
retained during the "Restart Time" period or the "Long-lived Stale Time" period.
However, in the case of consecutive restarts, the previously marked stale routes MUST NOT
be deleted before the timer for the "Long-lived Stale Time" expires. 
        </t>
        <t>
Similarly to <xref target="RFC4724" format="default"/>, once the session is re-established, if the F bit
for a specific address family is not set in the newly received LLGR
Capability, or if a specific address family is not included in the newly
received LLGR Capability, or if the LLGR and accompanying GR Capability are
not received in the re-established session at all, then the Helper MUST
immediately remove all the stale routes from the peer that it is retaining
for that address family.
        </t>
        <t>
If a "Long-lived Stale Time" timer is running for routes with a given
AFI/SAFI received from a peer, it MUST NOT be updated (other than by
manual operator intervention) until the peer has established and
synchronized a new session. The session is termed "synchronized" for a
given AFI/SAFI once the EoR for that AFI/SAFI has been received from the
peer, or once the Selection_Deferral_Timer discussed in <xref target="RFC4724" format="default"/> expires. 
        </t>
        <t>
The value of a "Long-lived Stale Time" in the capability received
from a neighbor MAY be reduced by local configuration.
        </t>
        <t>
While the session is down, the expiration of a "Long-lived Stale Time"
timer is treated analogously to the expiration of the "Restart Time"
timer in Graceful Restart, other than applying only to the AFI/SAFI it
accompanies. However, the timer continues to run once the session has
re-established. The timer is neither stopped nor updated until EoR is
received for the relevant AFI/SAFI from the peer. If the timer expires
during synchronization with the peer, any stale routes that the peer has
not refreshed are removed. If the session subsequently resets prior to
becoming synchronized, any remaining routes (for the AFI/SAFI whose LLST
timer expired) MUST be removed immediately. 
        </t>
      </section>
      <section anchor="processing_lls" numbered="true" toc="default">
        <name>Processing LLGR_STALE Routes</name>
        <t>
A BGP speaker that has advertised the "Long-lived Graceful Restart
Capability" to a neighbor MUST perform the following upon
receiving a route from that neighbor with the "LLGR_STALE" community,
or upon attaching the "LLGR_STALE" community itself per 
<xref target="session_resets" format="default"/>:
        </t>
        <ul spacing="normal">
          <li>
     Treat the route as the least-preferred in route selection (see below).
     See the <xref target="depref" format="default">Risks of Depreferencing Routes
     section</xref> for a discussion of potential risks inherent in doing
     this.
          </li>
          <li>
     The route SHOULD NOT be advertised to any neighbor from which the
     Long-lived Graceful Restart Capability has not been received. The
     exception is described in the <xref target="partial_deploy" format="default">Optional
     Partial Deployment Procedure section</xref>. Note that this
     requirement implies that such routes should be withdrawn from any such
     neighbor.
          </li>
          <li>
     The "LLGR_STALE" community MUST NOT be removed when 
     the route is further advertised.
          </li>
        </ul>
      </section>
      <section numbered="true" toc="default">
        <name>Route Selection</name>
        <t>
A "least-preferred" route MUST be treated as less preferred than any other route that
is not also least-preferred. When performing route selection between two routes both
of which are least-preferred, normal tie-breaking applies. Note that this
would only be expected to happen if the only routes available for selection
were least-preferred -- in all other cases, such routes would have been
eliminated from consideration.
        </t>
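        <t>
The rule above can be sketched, non-normatively, as a wrapper around an
implementation's existing comparison. The name select_best and both callback
parameters are hypothetical. The sketch assumes that normal_cmp(a, b) returns
a negative number when route a is preferred by normal tie-breaking, and that
the list of candidate routes is not empty.
        </t>
        <sourcecode type="python"><![CDATA[
```python
from functools import cmp_to_key


def select_best(routes, is_least_preferred, normal_cmp):
    """Pick the best route, ranking least-preferred routes last."""
    def compare(a, b):
        a_lp = is_least_preferred(a)
        b_lp = is_least_preferred(b)
        if a_lp != b_lp:
            # A least-preferred route loses to any route that is not.
            return 1 if a_lp else -1
        # Same class: normal tie-breaking applies.
        return normal_cmp(a, b)

    return min(routes, key=cmp_to_key(compare))
```
]]></sourcecode>
        <t>
Placing the least-preferred check ahead of the normal comparison means that
normal tie-breaking among least-preferred routes only decides the outcome when
no other route is available.
        </t>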
      </section>
      <!--      
      <section title="Multicast VPN">
        <t>
  If LLGR is being used in a network that carries Multicast VPN (MVPN)
  traffic (<xref target="RFC6513"></xref>,<xref target="RFC6514"></xref>), 
  special considerations apply.
        </t>
        <t>
  <xref target="RFC6513"></xref> defines the notion of the "Upstream PE" and the "Upstream
  Multicast Hop" (UMH) for a particular multicast flow.  To determine the
  Upstream PE and/or the UMH for a particular flow, a particular set of
  comparable BGP routes (the "UMH-eligible" routes for that flow, as
  defined in <xref target="RFC6513"></xref>) is considered, and the "best" one (according to the
  BGP bestpath selection algorithm) is chosen.  The UMH-eligible routes are
  routes with AFI/SAFI 1/1, 1/2, 2/1, or 2/2.  When a router detects a
  change in the Upstream PE or UMH for a given flow, the router may modify
  its data plane state for that flow.  For example, the router may begin to
  discard any packets of the flow that it believes have arrived from the
  previously chosen Upstream PE or UMH.  The assumption is that the newly
  chosen Upstream PE and/or UMH will make the corresponding changes, if
  necessary, to their own data plane states.  In addition, if a router
  detects a change in the Upstream PE or UMH for a given flow, it may
  originate or readvertise (with different attributes) certain of the BGP
  MCAST-VPN routes (routes with SAFI 5) that are defined in <xref target="RFC6514"></xref>.  The
  assumption is that the MCAST-VPN routes will be properly distributed by
  BGP to other routers that have data plane states for the given flow, i.e.,
  that BGP will converge so that all routers handle the flow in a
  consistent manner.
        </t>
        <t>
  However, if detection of a change to the Upstream PE or UMH is based
  entirely on stale routes, one cannot assume that BGP will converge;
  rather one must assume that the UMH-eligible routes and the MCAST-VPN
  routes are not being properly distributed.  Since the purpose of the LLGR
  procedures is to try to keep the data flowing (by "freezing" the data
  plane states) when the control plane updates are not being properly
  distributed, it does not seem appropriate to react to changes that are
  based entirely on stale routes. Therefore, the following rules MUST be
  applied when a router is computing or recomputing the Upstream PE and/or
  the UMH for a given multicast flow:
        </t>
        <t><list style="symbols">
          <t>
    STALE routes (i.e., UMH-eligible routes with the LLGR_STALE attribute)
    are less preferable than non-STALE routes.
          </t>
          <t>
    If all the UMH-eligible routes for a given flow are STALE, then the
    Upstream PE and/or UMH for that flow is considered to be "stale".
          </t>
          <t>
    If the Upstream PE or UMH for a given multicast flow has already been
    determined, and the result of a new computation yields a new Upstream
    PE or UMH, but the Upstream PE or UMH is "stale" (as defined just
    above), then the Upstream PE and/or UMH for that flow MUST be left
    unchanged.
          </t>
          <t>
    If the Upstream PE or UMH for a given multicast flow has not already
    been determined, but is now determined to be STALE, the multicast flow
    is considered to have no reachable Upstream PE and/or UMH.
          </t>
        </list></t>
        <t>
  <xref target="RFC6514"></xref> also defines a set of route types with SAFI 5 ("MCAST-VPN"
  routes).  LLGR can be applied to MCAST-VPN routes.  However, the
  following MCAST-VPN route types require special procedures, as specified
  in this section:
        </t>
        <t><list style="symbols">
<?rfc subcompact="yes" ?>
          <t>
     Leaf A-D routes
          </t>
          <t>
     C-multicast Shared Tree Join routes
          </t>
          <t>
     C-multicast Source Tree Join routes
          </t>
        </list></t>
<?rfc subcompact="no" ?>
    <t>
  Routes of these three types are always "targeted" to a particular
  upstream router.  Depending on the situation, the targeted router
  may be the Upstream PE for a given flow or the UMH for a given
  flow. Alternatively, the targeted router may be determined by
  choosing the "best" route (according to the BGP bestpath
  algorithm) from among a set of comparable Intra-AS I-PMSI A-D
  routes, or from among a set of comparable Inter-AS I-PMSI A-D
  routes, or from among a set of comparable S-PMSI A-D routes.  (See
  <xref target="RFC6513"/>, <xref target="RFC6514"/>, <xref
  target="RFC6625"/>, and <xref target="RFC7524"/> for details.)  Once the
  target is chosen, it is identified in an IPv4-address-specific
  Route Target (RT) or an IPv6-address-specific RT that is attached
  to the route before the route is advertised.  If the target for
  one of these routes changes, the value of the attached RT will
  also change.  This in turn may cause the route to be advertised,
  readvertised, or withdrawn on specific BGP sessions.
    </t>
    <t>
  For cases where the targeted router is the Upstream PE or the UMH for a
  particular flow, the rules given previously in this section apply.  For
  example, if a Leaf A-D route is targeted to a flow's UMH, and all the
  relevant UMH-eligible routes are stale, the UMH is left unchanged.  Thus
  the Leaf A-D route is not readvertised with a new RT.
    </t>
    <t>
  In those cases where the targeted router for a given Leaf A-D route is
  selected by comparing a set of S-PMSI A-D routes, or where the targeted
  router for a given C-multicast Shared or Source Tree Join route is
  selected by comparing a set of Inter-AS I-PMSI A-D routes, the following
  rules MUST be applied:
    </t>
    <t><list style="symbols">
      <t>
    STALE routes (i.e., "I/S-PMSI A-D routes" with the LLGR_STALE attribute)
    are less preferable than non-STALE routes.
      </t>
      <t>
    If all the routes being considered are STALE, then the targeted router
    of the Leaf A-D route or C-multicast Shared or Source Tree Join route
    MUST NOT be changed.
      </t>
    </list></t>
    <t>
  This prevents a Leaf A-D route or C-multicast route from being targeted
  to a particular router if the relevant I/S-PMSI A-D routes from that
  router are stale.  Since those routes are stale, it is likely that the
  Leaf A-D route or C-multicast route would not make it to the targeted
  router, in which case it is better to maintain the existing data plane
  states than to make changes that presuppose that the MCAST-VPN routes
  will be properly distributed.
    </t>
      </section>
-->

      <section numbered="true" toc="default">
        <name>Errors</name>
        <t>
If the LLGR capability is received without an accompanying GR capability,
the LLGR capability MUST be ignored, that is, the implementation MUST behave
as though no LLGR capability had been received.
        </t>
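        <t>
As a non-normative illustration, the rule above might be sketched as
follows. The function name and the representation of received
capabilities as a set of integer codes are assumptions made for this
example. Code 64 is the Graceful Restart Capability of RFC 4724, and
code 71 is the LLGR Capability defined in this document.
        </t>
        <sourcecode type="python"><![CDATA[
# Non-normative sketch; names are hypothetical.
GR_CAPABILITY_CODE = 64    # Graceful Restart (RFC 4724)
LLGR_CAPABILITY_CODE = 71  # Long-lived Graceful Restart


def effective_capabilities(received_codes):
    """Return the set of capability codes to act upon.

    LLGR is dropped when GR is absent, so the caller behaves as
    though no LLGR capability had been received.
    """
    codes = set(received_codes)
    if (LLGR_CAPABILITY_CODE in codes
            and GR_CAPABILITY_CODE not in codes):
        codes.discard(LLGR_CAPABILITY_CODE)
    return codes
]]></sourcecode>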
      </section>
      <section anchor="partial_deploy" numbered="true" toc="default">
        <name>Optional Partial Deployment Procedure</name>
        <t>
Ideally, all routers in an Autonomous System would support this
specification before it was enabled. However, to facilitate incremental
deployment, stale routes MAY be advertised to neighbors that have not
advertised the Long-lived Graceful Restart Capability under the following
conditions:
        </t>
        <ul spacing="normal">
          <li>
     The neighbors MUST be internal (IBGP or Confederation) 
     neighbors.
	  </li>
          <li>   
     The NO_EXPORT community <xref target="RFC1997" format="default"/> MUST be attached to the stale 
     routes.
	  </li>
          <li>     
     The stale routes MUST have their LOCAL_PREF set to zero. See the <xref target="depref" format="default">Risks of Depreferencing Routes section</xref> for a
     discussion of potential risks inherent in doing this.
	  </li>
        </ul>
        <t>
If this strategy for partial deployment is used, the network operator should
set LOCAL_PREF to zero for all long-lived stale routes throughout the Autonomous System.
This trades off a small reduction in flexibility (ordering may not be
preserved between competing long-lived stale routes) for consistency between routers
that do, and do not, support this specification. Since the consistency of route
selection can be important for preventing forwarding loops, the latter
consideration dominates.
        </t>
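        <t>
The three conditions listed above might be applied as in the following
non-normative sketch. The route representation (a dictionary with
"communities" and "local_pref" keys) and the function name are
hypothetical; only the NO_EXPORT value is taken from RFC 1997.
        </t>
        <sourcecode type="python"><![CDATA[
# Non-normative sketch; the route layout is hypothetical.
NO_EXPORT = 0xFFFFFF01  # well-known community (RFC 1997)


def stale_route_for_legacy_neighbor(route, neighbor_is_internal):
    """Prepare a stale route for a neighbor lacking the LLGR
    capability.

    Returns a modified copy of the route, or None if the route
    must not be advertised to this neighbor.
    """
    if not neighbor_is_internal:
        return None  # only IBGP or confederation neighbors qualify
    out = dict(route)
    out["communities"] = set(route["communities"]) | {NO_EXPORT}
    out["local_pref"] = 0
    return out
]]></sourcecode>
        <t>
The input route is left unmodified, so the same stale route can still
be advertised with its original attributes to neighbors that do
support this specification.
        </t>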
      </section>
      <section anchor="pe_ce" numbered="true" toc="default">
        <name>Procedures when BGP is the PE-CE Protocol in a VPN</name>
        <section numbered="true" toc="default">
          <name>Procedures when EBGP is the PE-CE Protocol in a VPN</name>
          <t>
In VPN deployments, for example <xref target="RFC4364" format="default"/>, EBGP is
often used as a PE-CE protocol. It may be a practical necessity in such
deployments to accommodate interoperation with peer routers that cannot easily be
upgraded to support specifications such as this one. This leads to a
problem: in this specification, we take pains to ensure that "stale"
routing information will not leak beyond the perimeter of routers that
support these procedures so that it can be depreferenced as expected, and
we provide <xref target="partial_deploy" format="default">a workaround</xref> for the case
where one or more IBGP routers are not upgraded. However, in the VPN PE-CE
case, the protocol in use is EBGP, and our workaround does not work since
it relies on the use of LOCAL_PREF, an IBGP-only path attribute.
          </t>
          <t>
We observe that the principal motivation for restricting the propagation of
"stale" routing information is the desire to prevent it from spreading
without limit once it exits the "safe" perimeter. We further observe that
VPN deployments are typically topologically constrained, making this
concern moot. For this reason, an implementation MAY advertise stale routes
over a PE-CE session, when explicitly configured to do so. That is, the
second rule listed in <xref target="processing_lls" format="default"/> MAY be
disregarded in such cases. All other rules continue to apply. Finally,
if this exception is used, the implementation SHOULD by default attach
the NO_EXPORT community to the routes in question, as an additional 
protection against stale routes spreading without limit. Attachment of
the NO_EXPORT community MAY be disabled by explicit configuration, to 
accommodate exceptional cases.
          </t>
          <t>
See further discussion of using explicitly configured policy to
mitigate this issue in <xref target="deploy_pe_ce" format="default"/>.
          </t>
        </section>
        <section numbered="true" toc="default">
          <name>Procedures when IBGP is the PE-CE Protocol in a VPN</name>
          <t>
If IBGP is used as the PE-CE protocol, following the procedures of
<xref target="RFC6368" format="default"/>, then when a PE router imports a VPN
route that contains the ATTR_SET attribute into a destination VRF and
subsequently advertises that route to a CE router:
          </t>
          <ul spacing="normal">
            <li>
If the CE router does support the procedures of this document (in
other words, if the CE router has advertised the LLGR Capability): In
addition to including in the advertised route the path attributes
derived from the ATTR_SET as per <xref target="RFC6368" format="default"/>, the
PE router MUST also include the LLGR_STALE community if it is present
in the path attributes of the imported route, even if it is not
present in the ATTR_SET attribute.
	</li>
            <li>
If the CE router does not support the
procedures of this document, then the optional procedures of <xref target="partial_deploy" format="default"/> MAY be followed, attaching the NO_EXPORT
community and setting the value of LOCAL_PREF to zero, overriding the
value found in the ATTR_SET.
	</li>
          </ul>
          <t>
Similarly, when a PE router receives a route from a CE into its VRF
and subsequently exports that route to a VPN address family:
          </t>
          <ul spacing="normal">
            <li>
If the PE router does support the procedures of this document (in
other words, if the PE router has advertised the LLGR Capability):
In addition to including in the VPN route the ATTR_SET derived from
the path attributes as per <xref target="RFC6368" format="default"/>, the PE router
MUST also include the LLGR_STALE community in the VPN route if it is
present in the path attributes of the route as received from the CE.
	</li>
            <li>
If the PE router does not support the procedures of this document,
there exists no ideal solution. The CE could advertise a route with
LLGR_STALE, with the understanding that the LLGR_STALE marking will
only be honored by the provider network if appropriate policy
configuration exists on the PE (see <xref target="deploy_pe_ce" format="default"/>).
It is at least guaranteed that LLGR_STALE will be propagated when
the route is propagated beyond the provider network. Or, the CE
could refrain from advertising the LLGR_STALE route to the incapable
PE. 
	</li>
          </ul>
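          <t>
The PE-to-CE direction described above might be sketched as follows.
This is a non-normative illustration: the dictionary layout and names
are hypothetical, and the final branch (withholding the stale route
when the CE lacks the capability and the partial deployment procedure
is not used) is our assumption, based on the general rule that stale
routes are not advertised to peers that have not advertised the LLGR
Capability.
          </t>
          <sourcecode type="python"><![CDATA[
# Non-normative sketch; the dictionary layout is hypothetical.
LLGR_STALE = 0xFFFF0006  # well-known community (this document)
NO_EXPORT = 0xFFFFFF01   # well-known community (RFC 1997)


def attributes_for_ce(vpn_route, ce_has_llgr, partial_deploy=False):
    """Build the attributes a PE advertises to an IBGP CE.

    vpn_route["communities"] holds the communities of the imported
    VPN route; vpn_route["attr_set"] holds the attributes recovered
    from ATTR_SET.  Returns None if the route is not advertised.
    """
    inner = vpn_route["attr_set"]
    out = {
        "communities": set(inner["communities"]),
        "local_pref": inner["local_pref"],
    }
    if LLGR_STALE not in vpn_route["communities"]:
        return out                  # not stale: RFC 6368 only
    if ce_has_llgr:
        out["communities"].add(LLGR_STALE)
        return out
    if partial_deploy:
        out["communities"].add(NO_EXPORT)
        out["local_pref"] = 0       # overrides the ATTR_SET value
        return out
    return None  # assumption: stale route withheld from this CE
]]></sourcecode>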
        </section>
      </section>
    </section>
    <section anchor="deploy" numbered="true" toc="default">
      <name>Deployment Considerations</name>
      <t>
The deployment considerations discussed in <xref target="RFC4724" format="default"/> apply to this
document. In addition, network operators are cautioned to carefully
consider the potential disadvantages of deploying these procedures for a
given AFI/SAFI. Most notably, if used for an AFI/SAFI that conveys
traditional reachability information, the use of a long-lived stale route could
result in a loss of connectivity for the covered prefix. This specification
takes pains to mitigate this risk where possible, by making such routes
least-preferred and by restricting the scope of such routes to routers that
support these procedures (or, optionally, a single Autonomous System, see
<xref target="partial_deploy" format="default">"Optional Partial Deployment Procedure"</xref>). 
However, according to the
normal rules of IP forwarding, a stale more-specific route that has no
non-stale alternate paths available will still be used instead of a
non-stale less-specific route. Networks in which the deployment of these procedures
would be especially concerning include those that do not use "tunneled"
forwarding (in other words, those using traditional hop-by-hop forwarding).
      </t>
      <t>
Implementations MUST NOT enable these procedures by default. They MUST
require affirmative configuration per AFI/SAFI in order to enable them.
      </t>
      <t>
The procedures of this document do not alter the route resolvability
requirement of Section 9.1.2.1 of <xref target="RFC4271" format="default"/>. Because of this, it will commonly be
the case that "stale" IBGP routes will only continue to be used if the
router identified by the next hop remains resolvable, even if its BGP
component is down. Details of IGP
fault-tolerance strategies are beyond the scope of this document. In
addition to the foregoing, it may be advisable to check the viability of
the next hop through other means, for example, 
<xref target="RFC5880" format="default">BFD</xref>. This may be especially
useful in cases where the next hop is known directly at the network layer,
notably EBGP. 
      </t>
      <t>
As discussed in this document, after a BGP session goes down and before the
session is re-established, stale routes may be retained for up to two
consecutive periods, controlled by the "Restart Time" and the "Long-lived
Stale Time", respectively. During the first period, routing churn would be
prevented, but traffic might be blackholed. During the second
period, potential blackholing of traffic may be reduced, but routing churn
would be visible throughout the network. The setting of the relevant
parameters for a particular application should take into account the
tradeoffs, the network dynamics, and potential failure scenarios. If needed,
the first period can be bypassed either by local configuration or by setting
the "Restart Time" in the Graceful Restart Capability to zero and/or not
listing the AFI/SAFI in that Capability.
      </t>
      <t>
The setting of the F bit (and the "Forwarding State" bit of the
accompanying GR capability) depends in part on deployment considerations.
The F bit can be understood as an indication that the Helper should flush
associated routes (if the bit is left clear). As discussed in <xref target="intro" format="default">
the Introduction</xref>, an important use case for LLGR is for routes that are more
akin to configuration than to traditional routing. For such routes, it may
make sense to always set the F bit, regardless of other considerations.
Likewise, for control-plane-only entities, such as dedicated route
reflectors that do not participate in the forwarding plane, it makes sense
to always set the F bit. Overall, the rule of thumb is that if loss of
state on the restarting router can reasonably be expected to cause a
forwarding loop or black hole, the F bit should be set scrupulously
according to whether state has been retained. Specifics of when the F bit
is, and is not, set are implementation-dependent and may also be controlled
by configuration. Also, for every AFI/SAFI represented in the LLGR capability
that is also represented in the GR capability, there will be two corresponding
F bits -- the LLGR F bit and the GR F bit. If the LLGR F bit is set, the 
corresponding GR F bit should also be set, since to do otherwise would cause
the state to be cleared on the Receiving Router per the normal rules of GR,
violating the intent of the set LLGR bit.
      </t>
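      <t>
The relationship between the two F bits might be checked as in the
following non-normative sketch. The function name and the
representation of each capability as a mapping from (AFI, SAFI) to
the F bit value are assumptions made for this example.
      </t>
      <sourcecode type="python"><![CDATA[
# Non-normative sketch; names and data layout are hypothetical.
def inconsistent_f_bits(gr_f, llgr_f):
    """Find families whose LLGR F bit is set but GR F bit is clear.

    gr_f and llgr_f map (afi, safi) tuples to the boolean F bit
    carried for that family in the GR and LLGR capabilities.
    Families absent from the GR capability are not reported.
    """
    problems = []
    for family, llgr_set in sorted(llgr_f.items()):
        if llgr_set and family in gr_f and not gr_f[family]:
            problems.append(family)
    return problems
]]></sourcecode>
      <t>
An implementation or configuration checker could warn about each
family returned, since the Receiving Router would clear that state
under the normal rules of GR.
      </t>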
      <section anchor="deploy_pe_ce" numbered="true" toc="default">
        <name>When BGP is the PE-CE Protocol in a VPN</name>
        <t>
As discussed in <xref target="pe_ce" format="default"/>, it may be necessary for a PE to
advertise stale routes to a CE in some VPN deployments, even if the CE
does not support this specification. In that case, the operator
configuring their PE to advertise such routes should notify the operator
of the CE receiving the routes, and the CE should be configured to
depreference the routes.
        </t>
        <t>
Similarly, it may be necessary for a CE to advertise stale routes to a
PE, even if the PE does not support this specification. In that case,
the operator configuring their CE to advertise such routes should notify
the operator of the PE receiving the routes, and the PE should be
configured to depreference the routes.
        </t>
        <t>
Typical BGP implementations will be able to be configured to
depreference routes by matching on the LLGR_STALE community and setting
the LOCAL_PREF for matching routes to zero, similar to the procedure
described in <xref target="partial_deploy" format="default"/>.
        </t>
      </section>
      <section anchor="depref" numbered="true" toc="default">
        <name>Risks of Depreferencing Routes</name>
        <t>
Depreferencing EBGP routes is considered safe, no different from the 
common practice of applying a routing policy to an EBGP session. 
However, the same is not always true of IBGP.
        </t>
        <t>
Consistent route selection is a fundamental tenet of IBGP correctness and
safe operation in hop-by-hop routed networks. When routers within an AS apply different criteria in
selecting routes, they can arrive at inconsistent route selections. 
This can lead to the formation of forwarding loops unless some
form of tunneled forwarding is used to prevent "core" routers from 
making a (potentially inconsistent) forwarding decision based on the
IP header. 
        </t>
        <t>
This specification uses the state of a peering session as an input
to the selection criteria, depreferencing routes that are associated
with a session that has gone down but have not yet aged out. Since
different routers within an AS might have different notions as to 
whether their respective sessions with a given peer are up or down, they might 
apply different selection criteria to routes from that peer. This 
could result in a forwarding loop forming between such routers.
        </t>
        <t>
For an example of such a forwarding loop, consider the following 
simple topology:
        </t>
        <artwork align="left" name="" type="" alt=""><![CDATA[
     A ---- B ---- C ------------------------- D
     ^                                         ^
     |                                         |
     R1                                        R2
  	]]></artwork>
        <t>
In this example, A through D are routers with a full mesh of IBGP sessions
between them (the sessions are not shown). 
The short links have unit cost, the long link has cost 5.
Routers A and D are AS border routers, each advertising some route, R, into
the AS -- these are denoted R1 and R2 in the diagram. In ordinary
operation, it can be seen that routers B and C will select R1 for
forwarding, and will forward toward A.
        </t>
        <t>
Suppose that the session between A and B goes down for some reason, and
stays down long enough for LLGR processing to be invoked on B. Then on B,
route R1 will be depreferenced, leading to the selection of R2 by B.
However, C will continue to prefer R1. It can be seen that in this case, a
forwarding loop for packets destined to R would form between B and C. (We
note that other forwarding loop scenarios can be constructed for
traditional GR, but are generally considered less severe since GR can
remain in effect for a much more limited interval.)
        </t>
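        <t>
The loop described above can be reproduced with a small simulation.
The following non-normative sketch models the example topology and
assumes a simplified selection rule (non-stale routes first, then
lowest IGP cost to the exit router); all names are hypothetical.
        </t>
        <sourcecode type="python"><![CDATA[
# Non-normative sketch of the example topology; names are
# hypothetical and route selection is deliberately simplified.
import heapq

LINKS = {("A", "B"): 1, ("B", "C"): 1, ("C", "D"): 5}
EXITS = {"R1": "A", "R2": "D"}  # route name -> border router


def neighbors(node):
    for (u, v), cost in LINKS.items():
        if u == node:
            yield v, cost
        elif v == node:
            yield u, cost


def shortest(src):
    """Dijkstra: return {dst: (cost, first_hop)} from src."""
    best = {src: (0, None)}
    heap = [(0, src, None)]
    while heap:
        cost, node, first = heapq.heappop(heap)
        if best[node][0] < cost:
            continue  # outdated heap entry
        for nbr, weight in neighbors(node):
            new = cost + weight
            hop = nbr if first is None else first
            if nbr not in best or new < best[nbr][0]:
                best[nbr] = (new, hop)
                heapq.heappush(heap, (new, nbr, hop))
    return best


def next_hop(router, stale):
    """Prefer non-stale routes, then lowest IGP cost to the exit."""
    paths = shortest(router)

    def key(name):
        return (name in stale, paths[EXITS[name]][0])

    exit_router = EXITS[min(EXITS, key=key)]
    return paths[exit_router][1]  # None if router is the exit


def trace(start, stale_by_router):
    """Follow hop-by-hop forwarding until exit or a repeat."""
    path, node = [start], start
    while True:
        node = next_hop(node, stale_by_router.get(node, set()))
        if node is None:
            return path          # reached the exit router
        path.append(node)
        if path.count(node) > 1:
            return path          # router revisited: a loop
]]></sourcecode>
        <t>
With no stale routes, trace("C", {}) yields the path C, B, A. When
only B treats R1 as stale, trace("C", {"B": {"R1"}}) yields C, B, C,
which is the forwarding loop between B and C.
        </t>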
        <t>
The potential benefits of this specification can outweigh the 
risks discussed above, as long as care is exercised in deployment. 
The cardinal rule to be followed is, if a given set of routes is 
being used within an AS for hop-by-hop forwarding, it is not
recommended to enable LLGR procedures. If tunneled forwarding (such
as MPLS) is used within the AS, or if routes are being used for 
purposes other than hop-by-hop forwarding, less caution is needed,
though the operator should still carefully consider the consequences
of enabling LLGR.
        </t>
      </section>
    </section>
    <section numbered="true" toc="default">
      <name>Security Considerations</name>
      <t>
The security implications of the LLGR mechanism defined 
in this document are akin to those incurred by the maintenance of
stale routing information within a network.  This is particularly
relevant when considering the maintenance of routing information that
is used for service segregation, such as MPLS label entries.
      </t>
      <t>
For MPLS VPN services, the effectiveness of the traffic isolation
between VPNs relies on the correctness of the MPLS labels between
ingress and egress PEs.  In particular, when an egress PE withdraws a
label L1 allocated to a VPN1 route, this label MUST NOT be assigned
to a VPN route of a different VPN until all ingress PEs stop using
the old VPN1 route using L1.
      </t>
      <t>
Such a corner case may happen today if the propagation of VPN routes
by BGP messages between PEs takes more time than the label re-allocation 
delay on a PE.  Given that we can generally bound the worst-case 
BGP propagation time to a few minutes (for example, 2 to 5 minutes), the security
breach will not occur if PEs are designed not to reallocate a
previously used and withdrawn label until a few minutes have elapsed.
      </t>
      <t>
The problem is made worse with BGP GR between PEs as VPN routes can
be stalled for a longer period of time (for example 20 minutes).
      </t>
      <t>
This is further aggravated by the BGP LLGR extension proposed
in this document as VPN routes can be stalled for a much longer
period of time (for example 2 hours, 1 day).
      </t>
      <t>
Therefore, to avoid VPN breach, before enabling BGP LLGR for a VPN
address family, Service Providers
need to check how fast a given label can be reused by a PE, taking
into account:
      </t>
      <ul spacing="normal">
        <li>
      The load of the BGP route churn on a PE (in terms of the number 
      of VPN labels advertised and the churn rate).
      </li>
        <li>
      The label allocation policy on the PE, possibly depending 
      upon the size of the pool of the VPN labels (which can be
      restricted by hardware considerations or other MPLS usages),
      the label allocation scheme (for example, per route or per VRF/CE), and
      the re-allocation policy (for example, least recently used label).
      </li>
      </ul>
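      <t>
One possible way to frame this check is sketched below. It is
non-normative: the comparison of a PE's minimum label reuse delay
against the sum of the Restart Time, the LLST, and a propagation
allowance is our assumption about a conservative bound, and the
function and parameter names are hypothetical.
      </t>
      <sourcecode type="python"><![CDATA[
# Non-normative sketch; the bound used here is an assumption.
def label_reuse_is_safe(reuse_delay, restart_time, llst,
                        propagation=300):
    """Compare label reuse delay with worst-case stale usage.

    All values are in seconds.  reuse_delay is the shortest time
    after which the PE might reassign a withdrawn label.  A remote
    PE is assumed to keep using a stale route for at most
    restart_time + llst, plus a propagation allowance.
    """
    worst_case_use = restart_time + llst + propagation
    return reuse_delay >= worst_case_use
]]></sourcecode>
      <t>
For example, with a Restart Time of 120 seconds and an LLST of one
day (86400 seconds), a PE that may reuse a label after one hour would
fail this check.
      </t>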
      <t>
Note that <xref target="RFC4781" format="default"/>, which defines the Graceful Restart Mechanism
for BGP with MPLS, is also applicable to BGP LLGR.
      </t>
      <t>
These considerations notwithstanding, the LLGR mechanism
described within this document is considered to be complex to exploit
maliciously. In order to inject packets into a topology, there is a
requirement to engineer a specific LLGR state between two PE
devices, whilst engineering label reallocation to occur in a manner
that results in the two topologies overlapping.  Such allocation is
particularly difficult to engineer (since it is typically an internal
mechanism of a router).
      </t>
    </section>
    <section anchor="examples" numbered="true" toc="default">
      <name>Examples of Operation</name>
      <t>
For illustrative purposes, we present a few examples of how this
specification might be used in practice. These examples are neither
exhaustive nor normative.
      </t>
      <t>
Consider the following scenario: A border router, ASBR1, has an IBGP peering with
a route reflector, RR1, from which it learns routes. It has an EBGP peering
with an external peer, EXT, to which it advertises those routes. The external peer
has advertised the GR and LLGR Capabilities to ASBR1. ASBR1 is configured
to support GR and LLGR on its sessions with RR1 and EXT. RR1 advertises a GR Restart Time
of 1 (second) and an LLST of 3600 (seconds): 
      </t>
      <table align="center">
        <thead>
          <tr>
            <th align="left">Time</th>
            <th align="left">Event</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td align="left">t</td>
            <td align="left">ASBR1's IBGP session with RR1 fails. ASBR1
                           retains RR1's routes according to the rules
                           of <xref target="RFC4724" format="default">GR</xref>.</td>
          </tr>
          <tr>
            <td align="left">t+1</td>
            <td align="left">GR Restart Time expires. ASBR1 transitions
                           RR1's routes to long-lived stale by attaching
                           the LLGR_STALE community and depreferencing
                           them. However, since it has no backup
                           routes, it continues to make use of them. It
                           re-announces them to EXT with the
                           LLGR_STALE community attached.</td>
          </tr>
          <tr>
            <td align="left">t+1+3600</td>
            <td align="left">LLST expires. ASBR1 removes RR1's stale
                           routes from its own RIB and sends BGP updates
                           to withdraw them from EXT.</td>
          </tr>
        </tbody>
      </table>
      <t>
Next, imagine the same scenario but suppose RR1 advertised a GR Restart 
Time of zero, effectively disabling GR. Equally, ASBR1 could have used
local configuration to override RR1's offered Restart Time, setting it to
a locally-configured value of zero:
      </t>
      <table align="center">
        <thead>
          <tr>
            <th align="left">Time</th>
            <th align="left">Event</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td align="left">t</td>
            <td align="left">ASBR1's IBGP session with RR1 fails. ASBR1 transitions
                           RR1's routes to long-lived stale by attaching
                           the LLGR_STALE community and depreferencing
                           them. However, since it has no backup
                           routes, it continues to make use of them. It
                           re-announces them to EXT with the
                           LLGR_STALE community attached.</td>
          </tr>
          <tr>
            <td align="left">t+0+3600</td>
            <td align="left">LLST expires. ASBR1 removes RR1's stale
                           routes from its own RIB and sends BGP updates
                           to withdraw them from EXT.</td>
          </tr>
        </tbody>
      </table>
      <t>
Next, imagine the original scenario, but consider that the ASBR1-RR1
session comes back up and becomes synchronized 180 seconds after the failure was
detected:
      </t>
      <table align="center">
        <thead>
          <tr>
            <th align="left">Time</th>
            <th align="left">Event</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td align="left">t</td>
            <td align="left">ASBR1's IBGP session with RR1 fails. ASBR1
                           retains RR1's routes according to the rules
                           of <xref target="RFC4724" format="default">GR</xref>.</td>
          </tr>
          <tr>
            <td align="left">t+1</td>
            <td align="left">GR Restart Time expires. ASBR1 transitions
                           RR1's routes to long-lived stale by attaching
                           the LLGR_STALE community and depreferencing
                           them. However, since it has no backup
                           routes, it continues to make use of them. It
                           re-announces them to EXT with the
                           LLGR_STALE community attached.</td>
          </tr>
          <tr>
            <td align="left">t+1+179</td>
            <td align="left">Session is reestablished and resynchronized. ASBR1
                           removes the LLGR_STALE community from
                           RR1's routes and re-announces them to EXT
                           without the LLGR_STALE community.</td>
          </tr>
        </tbody>
      </table>
      <t>
Finally, imagine the original scenario, but consider that EXT has not 
advertised the LLGR Capability to ASBR1:
      </t>
      <table align="center">
        <thead>
          <tr>
            <th align="left">Time</th>
            <th align="left">Event</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td align="left">t</td>
            <td align="left">ASBR1's IBGP session with RR1 fails. ASBR1
                           retains RR1's routes according to the rules
                           of <xref target="RFC4724" format="default">GR</xref>.</td>
          </tr>
          <tr>
            <td align="left">t+1</td>
            <td align="left">GR Restart Time expires. ASBR1 transitions
                           RR1's routes to long-lived stale by attaching
                           the LLGR_STALE community and depreferencing
                           them. However, since it has no backup
                           routes, it continues to make use of them. It
                           withdraws them from EXT.</td>
          </tr>
          <tr>
            <td align="left">t+1+3600</td>
            <td align="left">LLST expires. ASBR1 removes RR1's stale
                           routes from its own RIB.</td>
          </tr>
        </tbody>
      </table>
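      <t>
The event times in the four tables above follow directly from the
Restart Time and the LLST. The following non-normative sketch
generates the same sequences; the function name, parameter names, and
event wording are hypothetical, and offsets are in seconds relative to
the session failure at time t.
      </t>
      <sourcecode type="python"><![CDATA[
# Non-normative sketch; names and event strings are hypothetical.
def timeline(restart_time, llst, ext_has_llgr=True,
             recovered_at=None):
    """Return [(offset, event)] for ASBR1 after its session fails."""
    events = []
    if restart_time > 0:
        events.append((0, "retain routes per GR"))
    to_ext = ("re-announce to EXT with LLGR_STALE"
              if ext_has_llgr else "withdraw from EXT")
    events.append((restart_time, "mark LLGR_STALE; " + to_ext))
    expiry = restart_time + llst
    if recovered_at is not None and recovered_at < expiry:
        # Session returned before the LLST ran out.
        events = [e for e in events if e[0] < recovered_at]
        events.append((recovered_at,
                       "session resynchronized; re-announce"))
    else:
        end = ("remove stale routes; withdraw from EXT"
               if ext_has_llgr else "remove stale routes")
        events.append((expiry, end))
    return events
]]></sourcecode>
      <t>
For instance, timeline(1, 3600) produces events at offsets 0, 1, and
3601, matching the first table, and timeline(1, 3600,
recovered_at=180) produces events at offsets 0, 1, and 180, matching
the third.
      </t>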
    </section>
    <section anchor="Acknowledgements" numbered="true" toc="default">
      <name>Acknowledgements</name>
      <t>
We would like to thank Nabil Bitar, Martin Djernaes, Roberto Fragassi,
Jeffrey Haas, Jakob Heitz, Daniam Henriques, Nicolai Leymann, Mike McBride, Paul Mattes, John Medamana, Pranav
Mehta, Han Nguyen, Saikat Ray, and Bo Wu for their valuable input
and contributions to the discussion and solution.
      </t>
    </section>
    <section numbered="true" toc="default">
      <name>Contributors</name>
      <artwork align="left" name="" type="" alt=""><![CDATA[
 Clarence Filsfils
 Cisco Systems
 Brussels  1150
 Belgium

 Email: cf@cisco.com
	]]></artwork>
      <artwork align="left" name="" type="" alt=""><![CDATA[
 Pradosh Mohapatra
 Sproute Networks

 Email: mpradosh@yahoo.com
	]]></artwork>
      <artwork align="left" name="" type="" alt=""><![CDATA[
 Yakov Rekhter
	]]></artwork>
      <!-- Note: Yakov has requested that his email address not be 
 published. I (John Scudder) can contact him if needed. -->
 
        <artwork align="left" name="" type="" alt=""><![CDATA[
 Eric Rosen

 Email: erosen52@gmail.com
	]]></artwork>
      <artwork align="left" name="" type="" alt=""><![CDATA[
  Rob Shakir
  Google, Inc.
  1600 Amphitheatre Parkway
  Mountain View, CA 94043
  United States of America

  Email: robjs@google.com
	]]></artwork>
      <artwork align="left" name="" type="" alt=""><![CDATA[
  Adam Simpson
  Nokia

  Email: adam.1.simpson@nokia.com
	]]></artwork>
    </section>
    <section anchor="IANA" numbered="true" toc="default">
      <name>IANA Considerations</name>
      <t>
This document defines a new BGP capability, the Long-lived Graceful Restart
Capability. IANA has assigned a Capability Code of 71, from the "Capability Codes"
registry. 
      </t>
      <t>
This document introduces a new BGP well-known community "LLGR_STALE" for marking
long-lived stale routes, and another well-known community "NO_LLGR" to 
mark routes that should not be retained if stale. IANA has assigned these
well-known community values 0xFFFF0006 and 0xFFFF0007, respectively, from the
"BGP Well-known Communities" registry.
      </t>
      <t>
For each of these three registrations, IANA is requested to update the 
reference to refer to this document.
      </t>
      <t>
IANA is requested to establish a new registry called "Long-lived Graceful
Restart Flags for Address Family" under the 
Border Gateway Protocol (BGP) Parameters group. The registration 
procedures are Standards Action. The registry should initially be 
populated as follows:
      </t>
      <table align="center">
        <thead>
          <tr>
            <th align="left">Bit Position</th>
            <th align="left">Name</th>
            <th align="left">Short Name</th>
            <th align="left">Reference</th>
          </tr>
        </thead>
        <tbody>
          <tr>
            <td align="left">0</td>
            <td align="left">Preservation of state</td>
            <td align="left">F</td>
            <td align="left">This document</td>
          </tr>
          <tr>
            <td align="left">1-7</td>
            <td align="left">Unassigned</td>
            <td align="left"/>
            <td align="left"/>
          </tr>
        </tbody>
      </table>
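      <t>
For convenience, the assigned values might be expressed in code as in
the following non-normative sketch. The constant and function names
are hypothetical. The F bit mask assumes that bit position 0 denotes
the most significant bit of the flags octet, following the usual IETF
bit-numbering convention.
      </t>
      <sourcecode type="python"><![CDATA[
# Non-normative sketch; names are hypothetical.
LLGR_CAPABILITY_CODE = 71
LLGR_STALE = 0xFFFF0006
NO_LLGR = 0xFFFF0007
F_BIT = 0x80  # assumption: bit position 0 is the MSB


def community_str(value):
    """Format a 32-bit community as 'high:low' decimal halves."""
    return f"{value >> 16}:{value & 0xFFFF}"


def f_bit_is_set(flags):
    """Test the F bit in a one-octet flags field."""
    return bool(flags & F_BIT)
]]></sourcecode>
      <t>
With these definitions, community_str(LLGR_STALE) gives "65535:6" and
community_str(NO_LLGR) gives "65535:7".
      </t>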
    </section>
  </middle>
  <back>
    <references>
      <name>References</name>
      <references>
        <name>Normative References</name>
        <reference anchor="RFC1997" target="https://www.rfc-editor.org/info/rfc1997" xml:base="https://www.rfc-editor.org/refs/bibxml/reference.RFC.1997.xml">
          <front>
            <title>BGP Communities Attribute</title>
            <author initials="R." surname="Chandra" fullname="R. Chandra">
              <organization/>
            </author>
            <author initials="P." surname="Traina" fullname="P. Traina">
              <organization/>
            </author>
            <author initials="T." surname="Li" fullname="T. Li">
              <organization/>
            </author>
            <date year="1996" month="August"/>
            <abstract>
              <t>This document describes an extension to BGP which may be used to pass additional information to both neighboring and remote BGP peers. [STANDARDS-TRACK]</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="1997"/>
          <seriesInfo name="DOI" value="10.17487/RFC1997"/>
        </reference>
        <reference anchor="RFC2119" target="https://www.rfc-editor.org/info/rfc2119" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.2119.xml">
          <front>
            <title>Key words for use in RFCs to Indicate Requirement Levels</title>
            <author fullname="S. Bradner" initials="S." surname="Bradner"/>
            <date month="March" year="1997"/>
            <abstract>
              <t>In many standards track documents several words are used to signify the requirements in the specification.  These words are often capitalized.  This document defines these words as they should be interpreted in IETF documents.  This document specifies an Internet Best Current Practices for the Internet Community, and requests discussion and suggestions for improvements.</t>
            </abstract>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="2119"/>
          <seriesInfo name="DOI" value="10.17487/RFC2119"/>
        </reference>
        <reference anchor="RFC4271" target="https://www.rfc-editor.org/info/rfc4271" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4271.xml">
          <front>
            <title>A Border Gateway Protocol 4 (BGP-4)</title>
            <author fullname="Y. Rekhter" initials="Y." role="editor" surname="Rekhter"/>
            <author fullname="T. Li" initials="T." role="editor" surname="Li"/>
            <author fullname="S. Hares" initials="S." role="editor" surname="Hares"/>
            <date month="January" year="2006"/>
            <abstract>
              <t>This document discusses the Border Gateway Protocol (BGP), which is an inter-Autonomous System routing protocol.</t>
              <t>The primary function of a BGP speaking system is to exchange network reachability information with other BGP systems. This network reachability information includes information on the list of Autonomous Systems (ASes) that reachability information traverses. This information is sufficient for constructing a graph of AS connectivity for this reachability from which routing loops may be pruned, and, at the AS level, some policy decisions may be enforced.</t>
              <t>BGP-4 provides a set of mechanisms for supporting Classless Inter-Domain Routing (CIDR). These mechanisms include support for advertising a set of destinations as an IP prefix, and eliminating the concept of network "class" within BGP. BGP-4 also introduces mechanisms that allow aggregation of routes, including aggregation of AS paths.</t>
              <t>This document obsoletes RFC 1771. [STANDARDS-TRACK]</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="4271"/>
          <seriesInfo name="DOI" value="10.17487/RFC4271"/>
        </reference>
        <reference anchor="RFC4724" target="https://www.rfc-editor.org/info/rfc4724" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4724.xml">
          <front>
            <title>Graceful Restart Mechanism for BGP</title>
            <author fullname="S. Sangli" initials="S." surname="Sangli"/>
            <author fullname="E. Chen" initials="E." surname="Chen"/>
            <author fullname="R. Fernando" initials="R." surname="Fernando"/>
            <author fullname="J. Scudder" initials="J." surname="Scudder"/>
            <author fullname="Y. Rekhter" initials="Y." surname="Rekhter"/>
            <date month="January" year="2007"/>
            <abstract>
              <t>This document describes a mechanism for BGP that would help minimize the negative effects on routing caused by BGP restart. An End-of-RIB marker is specified and can be used to convey routing convergence information. A new BGP capability, termed "Graceful Restart Capability", is defined that would allow a BGP speaker to express its ability to preserve forwarding state during BGP restart. Finally, procedures are outlined for temporarily retaining routing information across a TCP session termination/re-establishment.</t>
              <t>The mechanisms described in this document are applicable to all routers, both those with the ability to preserve forwarding state during BGP restart and those without (although the latter need to implement only a subset of the mechanisms described in this document). [STANDARDS-TRACK]</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="4724"/>
          <seriesInfo name="DOI" value="10.17487/RFC4724"/>
        </reference>
        <reference anchor="RFC4760" target="https://www.rfc-editor.org/info/rfc4760" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4760.xml">
          <front>
            <title>Multiprotocol Extensions for BGP-4</title>
            <author fullname="T. Bates" initials="T." surname="Bates"/>
            <author fullname="R. Chandra" initials="R." surname="Chandra"/>
            <author fullname="D. Katz" initials="D." surname="Katz"/>
            <author fullname="Y. Rekhter" initials="Y." surname="Rekhter"/>
            <date month="January" year="2007"/>
            <abstract>
              <t>This document defines extensions to BGP-4 to enable it to carry routing information for multiple Network Layer protocols (e.g., IPv6, IPX, L3VPN, etc.).  The extensions are backward compatible - a router that supports the extensions can interoperate with a router that doesn't support the extensions. [STANDARDS-TRACK]</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="4760"/>
          <seriesInfo name="DOI" value="10.17487/RFC4760"/>
        </reference>
        <reference anchor="RFC5492" target="https://www.rfc-editor.org/info/rfc5492" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5492.xml">
          <front>
            <title>Capabilities Advertisement with BGP-4</title>
            <author fullname="J. Scudder" initials="J." surname="Scudder"/>
            <author fullname="R. Chandra" initials="R." surname="Chandra"/>
            <date month="February" year="2009"/>
            <abstract>
              <t>This document defines an Optional Parameter, called Capabilities, that is expected to facilitate the introduction of new capabilities in the Border Gateway Protocol (BGP) by providing graceful capability advertisement without requiring that BGP peering be terminated.</t>
              <t>This document obsoletes RFC 3392. [STANDARDS-TRACK]</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="5492"/>
          <seriesInfo name="DOI" value="10.17487/RFC5492"/>
        </reference>
        <reference anchor="RFC6368" target="https://www.rfc-editor.org/info/rfc6368" xml:base="https://www.rfc-editor.org/refs/bibxml/reference.RFC.6368.xml">
          <front>
            <title>Internal BGP as the Provider/Customer Edge Protocol for BGP/MPLS IP Virtual Private Networks (VPNs)</title>
            <author fullname="P. Marques" initials="P." surname="Marques"/>
            <author fullname="R. Raszuk" initials="R." surname="Raszuk"/>
            <author fullname="K. Patel" initials="K." surname="Patel"/>
            <author fullname="K. Kumaki" initials="K." surname="Kumaki"/>
            <author fullname="T. Yamagata" initials="T." surname="Yamagata"/>
            <date month="September" year="2011"/>
            <abstract>
              <t>This document defines protocol extensions and procedures for BGP Provider/Customer Edge router iteration in BGP/MPLS IP VPNs.  These extensions and procedures have the objective of making the usage of the BGP/MPLS IP VPN transparent to the customer network, as far as routing information is concerned.  [STANDARDS-TRACK]</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="6368"/>
          <seriesInfo name="DOI" value="10.17487/RFC6368"/>
        </reference>
        <reference anchor="RFC8174" target="https://www.rfc-editor.org/info/rfc8174" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8174.xml">
          <front>
            <title>Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words</title>
            <author fullname="B. Leiba" initials="B." surname="Leiba"/>
            <date month="May" year="2017"/>
            <abstract>
              <t>RFC 2119 specifies common key words that may be used in protocol specifications.  This document aims to reduce the ambiguity by clarifying that only UPPERCASE usage of the key words have the defined special meanings.</t>
            </abstract>
          </front>
          <seriesInfo name="BCP" value="14"/>
          <seriesInfo name="RFC" value="8174"/>
          <seriesInfo name="DOI" value="10.17487/RFC8174"/>
        </reference>
        <reference anchor="RFC8538" target="https://www.rfc-editor.org/info/rfc8538" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8538.xml">
          <front>
            <title>Notification Message Support for BGP Graceful Restart</title>
            <author fullname="K. Patel" initials="K." surname="Patel"/>
            <author fullname="R. Fernando" initials="R." surname="Fernando"/>
            <author fullname="J. Scudder" initials="J." surname="Scudder"/>
            <author fullname="J. Haas" initials="J." surname="Haas"/>
            <date month="March" year="2019"/>
            <abstract>
              <t>The BGP Graceful Restart mechanism defined in RFC 4724 limits the usage of BGP Graceful Restart to BGP messages other than BGP NOTIFICATION messages.  This document updates RFC 4724 by defining an extension that permits the Graceful Restart procedures to be performed when the BGP speaker receives a BGP NOTIFICATION message or the Hold Time expires.  This document also defines a new subcode for BGP Cease NOTIFICATION messages; this new subcode requests a full session restart instead of a Graceful Restart.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="8538"/>
          <seriesInfo name="DOI" value="10.17487/RFC8538"/>
        </reference>
      </references>
      <references>
        <name>Informative References</name>
        <reference anchor="RFC4761" target="https://www.rfc-editor.org/info/rfc4761" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4761.xml">
          <front>
            <title>Virtual Private LAN Service (VPLS) Using BGP for Auto-Discovery and Signaling</title>
            <author fullname="K. Kompella" initials="K." role="editor" surname="Kompella"/>
            <author fullname="Y. Rekhter" initials="Y." role="editor" surname="Rekhter"/>
            <date month="January" year="2007"/>
            <abstract>
              <t>Virtual Private LAN Service (VPLS), also known as Transparent LAN Service and Virtual Private Switched Network service, is a useful Service Provider offering. The service offers a Layer 2 Virtual Private Network (VPN); however, in the case of VPLS, the customers in the VPN are connected by a multipoint Ethernet LAN, in contrast to the usual Layer 2 VPNs, which are point-to-point in nature.</t>
              <t>This document describes the functions required to offer VPLS, a mechanism for signaling a VPLS, and rules for forwarding VPLS frames across a packet switched network. [STANDARDS-TRACK]</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="4761"/>
          <seriesInfo name="DOI" value="10.17487/RFC4761"/>
        </reference>
        <reference anchor="RFC4781" target="https://www.rfc-editor.org/info/rfc4781" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4781.xml">
          <front>
            <title>Graceful Restart Mechanism for BGP with MPLS</title>
            <author fullname="Y. Rekhter" initials="Y." surname="Rekhter"/>
            <author fullname="R. Aggarwal" initials="R." surname="Aggarwal"/>
            <date month="January" year="2007"/>
            <abstract>
              <t>A mechanism for BGP that helps minimize the negative effects on routing caused by BGP restart has already been developed and is described in a separate document ("Graceful Restart Mechanism for BGP"). This document extends this mechanism to minimize the negative effects on MPLS forwarding caused by the Label Switching Router's (LSR's) control plane restart, and specifically by the restart of its BGP component when BGP is used to carry MPLS labels and the LSR is capable of preserving the MPLS forwarding state across the restart.</t>
              <t>The mechanism described in this document is agnostic with respect to the types of the addresses carried in the BGP Network Layer Reachability Information (NLRI) field. As such, it works in conjunction with any of the address families that could be carried in BGP (e.g., IPv4, IPv6, etc.). [STANDARDS-TRACK]</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="4781"/>
          <seriesInfo name="DOI" value="10.17487/RFC4781"/>
        </reference>
        <reference anchor="RFC4364" target="https://www.rfc-editor.org/info/rfc4364" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.4364.xml">
          <front>
            <title>BGP/MPLS IP Virtual Private Networks (VPNs)</title>
            <author fullname="E. Rosen" initials="E." surname="Rosen"/>
            <author fullname="Y. Rekhter" initials="Y." surname="Rekhter"/>
            <date month="February" year="2006"/>
            <abstract>
              <t>This document describes a method by which a Service Provider may use an IP backbone to provide IP Virtual Private Networks (VPNs) for its customers.  This method uses a "peer model", in which the customers' edge routers (CE routers) send their routes to the Service Provider's edge routers (PE routers); there is no "overlay" visible to the customer's routing algorithm, and CE routers at different sites do not peer with each other.  Data packets are tunneled through the backbone, so that the core routers do not need to know the VPN routes. [STANDARDS-TRACK]</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="4364"/>
          <seriesInfo name="DOI" value="10.17487/RFC4364"/>
        </reference>
        <reference anchor="RFC5880" target="https://www.rfc-editor.org/info/rfc5880" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.5880.xml">
          <front>
            <title>Bidirectional Forwarding Detection (BFD)</title>
            <author fullname="D. Katz" initials="D." surname="Katz"/>
            <author fullname="D. Ward" initials="D." surname="Ward"/>
            <date month="June" year="2010"/>
            <abstract>
              <t>This document describes a protocol intended to detect faults in the bidirectional path between two forwarding engines, including interfaces, data link(s), and to the extent possible the forwarding engines themselves, with potentially very low latency.  It operates independently of media, data protocols, and routing protocols. [STANDARDS-TRACK]</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="5880"/>
          <seriesInfo name="DOI" value="10.17487/RFC5880"/>
        </reference>
        <reference anchor="RFC8955" target="https://www.rfc-editor.org/info/rfc8955" xml:base="https://bib.ietf.org/public/rfc/bibxml/reference.RFC.8955.xml">
          <front>
            <title>Dissemination of Flow Specification Rules</title>
            <author fullname="C. Loibl" initials="C." surname="Loibl"/>
            <author fullname="S. Hares" initials="S." surname="Hares"/>
            <author fullname="R. Raszuk" initials="R." surname="Raszuk"/>
            <author fullname="D. McPherson" initials="D." surname="McPherson"/>
            <author fullname="M. Bacher" initials="M." surname="Bacher"/>
            <date month="December" year="2020"/>
            <abstract>
              <t>This document defines a Border Gateway Protocol Network Layer Reachability Information (BGP NLRI) encoding format that can be used to distribute (intra-domain and inter-domain) traffic Flow Specifications for IPv4 unicast and IPv4 BGP/MPLS VPN services. This allows the routing system to propagate information regarding more specific components of the traffic aggregate defined by an IP destination prefix.</t>
              <t>It also specifies BGP Extended Community encoding formats, which can be used to propagate Traffic Filtering Actions along with the Flow Specification NLRI. Those Traffic Filtering Actions encode actions a routing system can take if the packet matches the Flow Specification.</t>
              <t>This document obsoletes both RFC 5575 and RFC 7674.</t>
            </abstract>
          </front>
          <seriesInfo name="RFC" value="8955"/>
          <seriesInfo name="DOI" value="10.17487/RFC8955"/>
        </reference>
      </references>
    </references>
  </back>
</rfc>
