<?xml version="1.0" encoding="US-ASCII"?>
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc tocompact="yes"?>
<?rfc tocdepth="3"?>
<?rfc tocindent="yes"?>
<?rfc symrefs="yes"?>
<?rfc sortrefs="yes"?>
<?rfc comments="yes"?>
<?rfc inline="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<rfc category="std" docName="draft-wang-idr-bgp-error-enhance-00"
     ipr="trust200902">
  <front>
    <title abbrev="Revised BGP Error Handling">Revised Error Handling for BGP
    Messages</title>

    <author fullname="Haibo Wang" initials="H" surname="Wang">
      <organization>Huawei Technologies</organization>

      <address>
        <postal>
          <street>Huawei Campus, No. 156 Beiqing Road</street>

          <city>Beijing</city>

          <code>100095</code>

          <region/>

          <country>China</country>
        </postal>

        <phone/>

        <facsimile/>

        <email>rainsword.wang@huawei.com</email>
      </address>
    </author>

    <author fullname="Ming Shen" initials="M" surname="Shen">
      <organization>Huawei Technologies</organization>

      <address>
        <postal>
          <street>Huawei Campus, No. 156 Beiqing Road</street>

          <city>Beijing</city>

          <code>100095</code>

          <region/>

          <country>China</country>
        </postal>

        <phone/>

        <facsimile/>

        <email>shenming2@huawei.com</email>
      </address>
    </author>

    <author fullname="Jie Dong" initials="J" surname="Dong">
      <organization>Huawei Technologies</organization>

      <address>
        <postal>
          <street>Huawei Campus, No. 156 Beiqing Road</street>

          <city>Beijing</city>

          <code>100095</code>

          <region/>

          <country>China</country>
        </postal>

        <phone/>

        <facsimile/>

        <email>jie.dong@huawei.com</email>
      </address>
    </author>

    <date day="24" month="October" year="2021"/>

    <area>Routing Area</area>

    <workgroup>Interdomain Routing Working Group</workgroup>

    <keyword>BGP Error Handling</keyword>

    <keyword>Draft</keyword>

    <abstract>
      <t>This document supplements and revises RFC 7606. According to RFC
      7606, when an UPDATE message received from a neighbor contains a
      malformed attribute, the BGP session should not simply be reset.
      Instead, the error is handled according to the specific problem, so
      that the malformed message affects as few routes as possible without
      compromising the correctness of the protocol. The error handling
      methods are discarding attributes, withdrawing routes, disabling the
      address family, and resetting sessions.</t>

      <t>RFC 7606 specifies the error handling methods of some existing
      attributes and provides guidance for error handling of new
      attributes.</t>

      <t>This document specifies error handling methods for common attributes
      that RFC 7606 does not cover, and suggests revisions to the error
      handling methods for some other attributes. The general principle
      remains unchanged: maintain established BGP sessions and keep valid
      routes updated, while discarding incorrect attributes or messages so as
      to minimize the impact on the current session.</t>
    </abstract>

    <note title="Requirements Language">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
      document are to be interpreted as described in <xref
      target="RFC2119">RFC 2119</xref>.</t>
    </note>
  </front>

  <middle>
    <section title="Introduction">
      <t>According to RFC 4271, a BGP speaker that receives an UPDATE message
      containing a malformed attribute is required to reset the session over
      which the message was received.</t>

      <t>In our operational experience, malformed messages are usually the
      result of software bugs or of a misunderstanding of a standard during
      software development. Resetting the session causes the neighbor to
      flap, which does not help solve the problem. In addition, intermediate
      routers that do not recognize the attribute cannot check it, so they
      propagate the malformed attribute unchanged to the other routers with
      which they have sessions. When the attribute reaches a router that does
      recognize and check it, neighbor flapping may occur there as well.
      Worse, because routes are propagated along multiple paths, a route
      carrying the malformed attribute may be received over several sessions
      at the router that checks it, causing all of those sessions to be reset
      and multiplying the harm.</t>

      <t>For the preceding reasons, RFC 7606 defines a revised method for
      processing malformed UPDATE messages. If an UPDATE message received
      from a neighbor contains malformed attributes, the BGP session should
      not simply be reset. Instead, the error is handled in a specific
      manner, on the principle that a malformed message should affect routes
      as little as possible without affecting protocol correctness. The error
      handling methods include discarding attributes, withdrawing routes,
      disabling the address family, and resetting sessions.</t>

      <t>However, RFC 7606 does not specify error handling methods for some
      common attributes, and for some others its methods differ from those
      that vendors have implemented. This document supplies error handling
      methods for some of these common attributes and suggests modifications
      to the error handling methods of some others.</t>
    </section>

    <section title="Scenarios">
      <t><figure>
          <artwork align="center"><![CDATA[                              
 +-----+    +-----+   +-----+ 
 | RT1 |----| RR  |---| RT2 | 
 +-----+    +-.---+   +-----+ 
              |               
              \               
            +-----+           
            | RT3 |           
            +-----+           
Figure 1 A simple network
]]></artwork>
        </figure>Figure 1 shows a simple network. If RT3 has a software bug
      or misinterprets an RFC, it may send a malformed UPDATE message. The RR
      receives the message, classifies it as malformed according to its error
      handling rules, and resets the session. Once the session between RT3
      and the RR is re-established, RT3 resends the same message and the RR
      resets the session again. This cycle repeats until the operator changes
      the configuration, for example by removing the configuration of the new
      feature that generates the attribute; sometimes the operator does not
      know how to correct the problem. Throughout this process, the services
      of RT3 are unavailable. The repeated session re-establishment and route
      updates also consume additional system resources on the RR.</t>

      <t>If the RR does not recognize the attribute, it propagates the
      message, including the attribute, to RT1 and RT2. RT1 and RT2 may then
      reset their sessions with the RR, just as the RR did in the case
      above. As a result, services between RT1 and RT2 are interrupted.</t>
    </section>

    <section title="Error-Handling Procedures Update for NLRI">
      <t/>

      <section title="Prefix Length Error">
        <t>According to <xref target="RFC7606"/>, when the NLRI or Withdrawn
        Routes field, or an MP_REACH_NLRI or MP_UNREACH_NLRI attribute,
        contains a prefix with an invalid length (e.g., an IPv4 prefix length
        greater than 32), the receiver must drop that prefix and ignore the
        prefixes that follow it. The prefixes that were parsed correctly
        before the invalid one may be kept.</t>

        <t>The receiver may then continue parsing with the next UPDATE
        message, provided that it can correctly locate the start of that
        message.</t>

        <t>An NLRI or Withdrawn Routes field, or an MP_REACH_NLRI or
        MP_UNREACH_NLRI attribute, that contains a prefix with an invalid
        length is considered malformed.</t>
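
        <t>The following non-normative Python sketch illustrates the parsing
        rule described above for an IPv4 NLRI field: prefixes are read until
        the first invalid length, the invalid prefix and everything after it
        are dropped, and the prefixes parsed before it are kept. The function
        name parse_ipv4_prefixes is hypothetical, and treating a prefix that
        runs past the end of the field like an invalid length is an
        assumption of this example.</t>

        <figure>
          <artwork><![CDATA[
```python
import ipaddress


def parse_ipv4_prefixes(field: bytes):
    """Parse (length, prefix) pairs from an IPv4 NLRI field.

    Returns (prefixes_parsed_so_far, malformed_flag).
    """
    prefixes = []
    pos = 0
    while pos < len(field):
        bit_len = field[pos]
        byte_len = (bit_len + 7) // 8
        # Invalid length, or (assumption) prefix runs past the field:
        # drop this prefix and ignore everything that follows it.
        if bit_len > 32 or pos + 1 + byte_len > len(field):
            return prefixes, True
        raw = field[pos + 1:pos + 1 + byte_len].ljust(4, b"\x00")
        prefixes.append((str(ipaddress.IPv4Address(raw)), bit_len))
        pos += 1 + byte_len
    return prefixes, False
```
]]></artwork>
        </figure>

        <t>In this sketch, a field that carries 10.0.1.0/24 followed by a
        prefix with length 40 yields the first prefix together with the
        malformed flag; any prefixes after the invalid one are not
        returned.</t>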
      </section>

      <section title="Appears More Than Once">
        <t><xref target="RFC7606"/> specifies the following:</t>

        <t>If the MP_REACH_NLRI attribute or the MP_UNREACH_NLRI [RFC4760]
        attribute appears more than once in the UPDATE message, then a
        NOTIFICATION message MUST be sent with the Error Subcode "Malformed
        Attribute List".</t>

        <t>This document suggests the following revision:</t>

        <t>If the MP_REACH_NLRI attribute or the MP_UNREACH_NLRI [RFC4760]
        attribute appears more than once in the UPDATE message, only the last
        occurrence of that attribute SHOULD be processed; the earlier
        occurrences are ignored.</t>
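
        <t>The following non-normative Python sketch illustrates the revised
        behavior. It assumes that the path attributes have already been split
        into (type code, value) pairs in the order in which they appear in
        the message; the function name keep_last_mp_attrs and this
        representation are hypothetical. The type codes 14 and 15 are those
        assigned to MP_REACH_NLRI and MP_UNREACH_NLRI.</t>

        <figure>
          <artwork><![CDATA[
```python
MP_REACH_NLRI = 14    # attribute type codes assigned in RFC 4760
MP_UNREACH_NLRI = 15


def keep_last_mp_attrs(attrs):
    """attrs: list of (type_code, value) pairs in wire order.

    Returns (attributes_to_process, number_of_ignored_duplicates).
    """
    # Remember the position of the last occurrence of each MP attribute.
    last_index = {}
    for i, (code, _value) in enumerate(attrs):
        if code in (MP_REACH_NLRI, MP_UNREACH_NLRI):
            last_index[code] = i

    kept, ignored = [], 0
    for i, (code, value) in enumerate(attrs):
        if code in last_index and last_index[code] != i:
            ignored += 1      # earlier duplicate: ignore it
            continue
        kept.append((code, value))
    return kept, ignored
```
]]></artwork>
        </figure>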
      </section>
    </section>

    <section title="Error-Handling Procedures Update for Existing Attributes">
      <t/>

      <section title="Nexthop">
        <t><xref target="RFC4271"/> defines the IP address in the NEXT_HOP
        attribute as semantically incorrect if it meets either of the
        following criteria:</t>

        <t>a) It is the IP address of the receiving speaker.</t>

        <t>b) For a directly connected EBGP neighbor, the IP address is
        neither the neighbor's address nor an address that shares a common
        subnet with the receiving BGP speaker.</t>

        <t>A route received in an UPDATE message that matches case a) MAY be
        installed in the RIB but is treated as invalid.</t>

        <t>Whether an UPDATE message that matches case b) SHOULD be
        considered semantically incorrect depends on the user's
        configuration.</t>

        <t>An IP address that meets any of the following criteria must also
        be considered semantically incorrect:</t>

        <t>c) The IP address is all zeros.</t>

        <t>d) The IP address is all ones.</t>

        <t>e) The IP address is a multicast address (Class D) or a reserved
        address (Class E).</t>

        <t>f) The IP address is not a valid IP address.</t>

        <t>An UPDATE message that matches any of cases c) to f) SHOULD be
        logged, and its routes are handled using the "treat-as-withdraw"
        approach.</t>
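
        <t>The following non-normative Python sketch illustrates checks c) to
        f) for an IPv4 NEXT_HOP value. The function name classify_next_hop is
        hypothetical, and reading case f) as "the value is not exactly four
        octets long" is an assumption of this example, since the text does
        not define an invalid address more precisely.</t>

        <figure>
          <artwork><![CDATA[
```python
def classify_next_hop(raw: bytes):
    """Return the matching case letter ("c".."f"), or None if none match."""
    if len(raw) != 4:
        return "f"            # assumption: wrong size means invalid
    value = int.from_bytes(raw, "big")
    if value == 0:
        return "c"            # all zeros
    if value == 0xFFFFFFFF:
        return "d"            # all ones
    if value >> 28 >= 0xE:
        return "e"            # top bits 1110 (Class D) or 1111 (Class E)
    return None


def next_hop_action(raw: bytes) -> str:
    # Cases c) to f) are logged and handled as treat-as-withdraw.
    return "accept" if classify_next_hop(raw) is None else "treat-as-withdraw"
```
]]></artwork>
        </figure>

        <t>The all-ones check is made before the Class E check because
        255.255.255.255 also falls within the Class E range.</t>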
      </section>

      <section title="MP_REACH_NLRI">
        <t><xref target="RFC7606"/> suggests using the "session reset" or
        "AFI/SAFI disable" approach, but this is too strict.</t>

        <t>If the Length of Next Hop Network Address field of the
        MP_REACH_NLRI attribute is inconsistent with the expected value, the
        attribute is considered malformed. The whole MP_REACH_NLRI attribute
        is ignored, and the receiver attempts to parse the next UPDATE
        message. If the receiver cannot correctly locate the next UPDATE
        message, it follows the procedure specified in <xref
        target="RFC7606"/>. Otherwise, the error SHOULD only be logged, and
        message parsing continues.</t>
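
        <t>The following non-normative Python sketch illustrates the decision
        described above. The table of expected lengths covers only IPv4 and
        IPv6 unicast and is illustrative; the function name, the returned
        strings, and the next_message_locatable flag are hypothetical names
        introduced for this example.</t>

        <figure>
          <artwork><![CDATA[
```python
# (AFI, SAFI) -> acceptable next hop lengths in octets.
# Illustrative subset only: IPv4 unicast and IPv6 unicast.
EXPECTED_NH_LEN = {
    (1, 1): {4},
    (2, 1): {16, 32},   # global address, optionally plus link-local
}


def handle_mp_reach_next_hop(afi, safi, nh_len, next_message_locatable):
    expected = EXPECTED_NH_LEN.get((afi, safi))
    if expected is None or nh_len in expected:
        return "process"                    # nothing to check, or length OK
    if next_message_locatable:
        return "log-and-ignore-attribute"   # keep the session, go on parsing
    return "rfc7606-procedure"              # fall back to RFC 7606 handling
```
]]></artwork>
        </figure>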

        <t>An UPDATE message may contain both MP_REACH_NLRI and
        MP_UNREACH_NLRI. If the same prefixes appear in both MP_REACH_NLRI
        and MP_UNREACH_NLRI, the message SHOULD NOT be considered malformed.
        In this case, the receiver should first process the prefixes in
        MP_REACH_NLRI and then process the prefixes in MP_UNREACH_NLRI.</t>
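
        <t>The following non-normative Python sketch illustrates the
        processing order described above, using a dictionary as a stand-in
        for the RIB. The function name apply_update and the data model are
        hypothetical. Because reachable prefixes are applied first and
        unreachable prefixes second, a prefix that appears in both
        attributes ends up withdrawn.</t>

        <figure>
          <artwork><![CDATA[
```python
def apply_update(rib: dict, reach, next_hop, unreach) -> dict:
    """Apply reachable prefixes first, then unreachable prefixes."""
    for prefix in reach:
        rib[prefix] = next_hop      # install or replace the route
    for prefix in unreach:
        rib.pop(prefix, None)       # withdraw; absent prefixes are ignored
    return rib
```
]]></artwork>
        </figure>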
      </section>

      <section title="Prefix SID">
        <t>According to <xref target="RFC8669"/>, a BGP speaker that receives
        an UPDATE message containing a malformed or invalid BGP Prefix-SID
        attribute ignores the attribute and does not advertise it to other
        BGP peers. However, this procedure may lead to unexpected
        results.</t>

        <t>This document revises the error handling for this case to
        "treat-as-withdraw".</t>
      </section>

      <section title="AGGREGATOR and AS4_AGGREGATOR">
        <t>When the router ID in the AGGREGATOR or AS4_AGGREGATOR attribute
        is zero, the attribute SHOULD be considered semantically incorrect;
        the error SHOULD be logged and the attribute discarded.</t>
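
        <t>The following non-normative Python sketch illustrates this check.
        It assumes that the attribute length has already been validated and
        that the last four octets of the value carry the aggregator's router
        ID; the function name check_aggregator and the logger name are
        hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
import logging

log = logging.getLogger("bgp.sketch")   # hypothetical logger name


def check_aggregator(name: str, value: bytes):
    """Return the value to keep, or None if the attribute is discarded.

    Assumes the last four octets of `value` are the router ID.
    """
    if int.from_bytes(value[-4:], "big") == 0:
        log.warning("%s carries a zero router ID; discarding it", name)
        return None             # attribute discard; the route is kept
    return value
```
]]></artwork>
        </figure>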
      </section>

      <section title="ORIGINATOR_ID">
        <t>The error handling of <xref target="RFC4456"/> and <xref
        target="RFC7606"/> is revised as follows.</t>

        <t>When the BGP Identifier in the ORIGINATOR_ID attribute is zero,
        the attribute SHOULD be considered semantically incorrect; the error
        SHOULD be logged and the UPDATE message SHALL be handled using the
        approach of "treat-as-withdraw".</t>
      </section>

      <section title="Cluster-List">
        <t>The error handling of <xref target="RFC4456"/> and <xref
        target="RFC7606"/> is revised as follows.</t>

        <t>When a CLUSTER_ID value in the CLUSTER_LIST attribute is zero, the
        attribute SHOULD be considered semantically incorrect; the error
        SHOULD be logged and the UPDATE message SHALL be handled using the
        approach of "treat-as-withdraw".</t>
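
        <t>The following non-normative Python sketch illustrates this check
        on a CLUSTER_LIST value, which is a sequence of four-octet CLUSTER_ID
        values. It assumes that the attribute length has already been
        validated elsewhere; the function name check_cluster_list and the
        returned strings are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
def check_cluster_list(value: bytes) -> str:
    """Return "treat-as-withdraw" if any CLUSTER_ID is zero, else "accept"."""
    if len(value) == 0 or len(value) % 4 != 0:
        # Out of scope here: length errors are assumed handled earlier.
        raise ValueError("length must be validated before this check")
    cluster_ids = [int.from_bytes(value[i:i + 4], "big")
                   for i in range(0, len(value), 4)]
    if 0 in cluster_ids:
        return "treat-as-withdraw"
    return "accept"
```
]]></artwork>
        </figure>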
      </section>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document makes no request of IANA.</t>
    </section>

    <section anchor="Security" title="Security Considerations">
      <t>This document helps reduce the impact of malformed packets on the
      network and devices.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>
    </references>

    <references title="Informative References">
      <?rfc include="reference.RFC.4271"?>

      <?rfc include="reference.RFC.4456"?>

      <?rfc include="reference.RFC.7606"?>

      <?rfc include="reference.RFC.8669"?>
    </references>
  </back>
</rfc>
