<?xml version="1.0" encoding="utf-8"?>
<!--DOCTYPE rfc SYSTEM "rfc2629.dtd"-->
<?rfc comments="yes"?>
<?rfc compact="yes"?>
<?rfc subcompact="no"?>
<?rfc inline="yes"?>
<?rfc sortrefs="yes"?>
<?rfc symrefs="yes"?>
<?rfc toc="yes"?>
<?rfc tocdepth="6"?>
<?rfc tocindent="yes"?>
<?rfc tocompact="yes"?>

<rfc consensus="yes" category="std" submissionType="IETF" docName="draft-ymbk-idr-l3nd-06" ipr="trust200902">

<front>

  <title>Layer-3 Neighbor Discovery</title>

    <author fullname="Randy Bush" initials="R." surname="Bush">
      <organization>Arrcus &amp; Internet Initiative Japan</organization>
      <address>
        <postal>
          <street>5147 Crystal Springs</street>
          <city>Bainbridge Island</city>
          <region>WA</region>
          <code>98110</code>
          <country>US</country>
          </postal>
        <email>randy@psg.com</email>
      </address>
    </author>

    <author fullname="Russ Housley" initials="R" surname="Housley">
      <organization abbrev="Vigil Security">Vigil Security, LLC</organization>
      <address>
        <postal>
          <street>516 Dranesville Road</street>
          <city>Herndon</city>
          <region>VA</region>
          <code>20170</code>
          <country>USA</country>
           </postal>
        <email>housley@vigilsec.com</email>
      </address>
    </author>

    <author initials="R." surname="Austein" fullname="Rob Austein">
      <organization abbrev="Arrcus">Arrcus, Inc</organization>
      <address>
        <email>sra@hactrn.net</email>
      </address>
    </author>
    
    <author fullname="Susan Hares" initials="S" surname="Hares">
      <organization>Hickory Hill Consulting</organization>
      <address>
        <postal>
          <street>7453 Hickory Hill</street>
          <city>Saline</city>
          <region>MI</region>
          <code>48176</code>
          <country>USA</country>
        </postal>
          <phone>+1-734-604-0332</phone>
        <email>shares@ndzh.com</email>
      </address>
    </author>
      
    <author fullname="Keyur Patel" initials="K." surname="Patel">
      <organization>Arrcus</organization>
      <address>
        <postal>
          <street>2077 Gateway Place, Suite #400</street>
          <city>San Jose</city>
          <region>CA</region>
          <code>95119</code>
          <country>US</country>
          </postal>
        <email>keyur@arrcus.com</email>
        </address>
      </author>

  <date />

  <abstract>

    <t>Data Centers where the topology is BGP-based need to discover
    neighbor IP addressing, IP Layer-3 BGP neighbors, etc.  This Layer-3
    Neighbor Discovery protocol identifies BGP neighbor candidates.</t>

  </abstract>

  <note title="Requirements Language">

    <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
    "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
    "OPTIONAL" in this  document are to be interpreted as described in
    BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when,
    and only when, they appear in all capitals, as shown here.</t>

    </note>

  </front>

<middle>

  <section anchor="intro" title="Introduction">

    <t>The Massive Data Center (MDC) environment presents unusual
    problems of scale, e.g. O(10,000) forwarding devices, while its
    homogeneity presents opportunities for simple approaches.  Layer-3
    Discovery and Liveness (L3DL), <xref target="I-D.ietf-lsvr-l3dl"/>,
    provides neighbor discovery at Layer-2.  This document (set)
    provides a similar solution at Layer-3, attempting to be as similar
    as reasonable to L3DL.</t>

    <t>Some guiding principles when dealing with datacenters with tens
    of thousands of devices are <list style="symbols">
<?rfc subcompact="yes"?>
      <t>Predictable Reliability,</t>
      <t>Security: Authorization and Integrity more than
      Confidentiality, and</t>
      <t>Massive Scalability</t>
<?rfc subcompact="no"?>
    </list></t>

    <t>Layer-3 Neighbor Discovery (L3ND) provides brutally simple
    mechanisms for neighboring devices to <list style="symbols">
<?rfc subcompact="yes"?>
      <t>Discover each other's IP Addresses,</t>
      <t>Discover mutually supported layer-3 encapsulations, e.g.
      IPv4/IPv6/MPLS,</t>
      <t>Discover Layer-3 IP and/or MPLS addressing of interfaces of the
      encapsulations,</t>
      <t>Provide authenticity, integrity, and verification of protocol
      messages, and</t>
      <t>Accommodate scaling needed for EVPN etc.</t>
<?rfc subcompact="no"?>
    </list></t>

    <t>L3ND is intended for use within single IP subnets (IP over
    Ethernet or other point-to-point or multi-point IP link) in order to
    exchange the data needed to bootstrap BGP-based peering, EVPN, etc.;
    especially in a datacenter Clos <xref target="Clos"/> environment.
    Once IP connectivity has been leveraged to discover layer-3
    addressability and forwarding capabilities, normal IP forwarding and
    routing can take over.</t>

    <t>L3ND might be more widely applicable to a range of routing and
    similar protocols which need Layer-3 neighbor discovery.</t>

    </section>

  <section anchor="terminology" title="Terminology">

    <t>Even though it concentrates on the inter-device layer, this
    document relies heavily on routing terminology.  The following
    attempts to clarify the use of some possibly confusing terms:
    <list hangIndent="11" style="hanging">
<?rfc subcompact="yes"?>
<!--  <t hangText="ASN:">Autonomous System Number <xref
      target="RFC4271"/>, a BGP identifier for an originator of
      Layer-3 routes, particularly BGP announcements.</t>  -->
      <t hangText="Clos:">A hierarchic subset of a crossbar switch
      topology commonly used in data centers <xref target="Clos"/>.</t>
      <t hangText="Datagram:">The L3ND content of a single Layer-3
      UDP Datagram.</t>
      <t hangText="Encapsulation:">Address Family Identifier and
      Subsequent Address Family Identifier (AFI/SAFI), i.e., classes of
      Layer-2.5 and Layer-3 addresses such as IPv4, IPv6, and MPLS.</t>
      <t hangText="Link or Logical Link:">A logical connection between
      two interfaces on two different devices.  E.g. two VLANs between
      the same two ports are two links.</t>
      <t hangText="MDC:">Massive Data Center, commonly composed of
      thousands of Top of Rack Switches (TORs).</t>
      <t hangText="MTU:">Maximum Transmission Unit, the size in octets
      of the largest packet that can be sent on a medium, see <xref
      target="RFC1122"/> 1.3.3.</t>
      <t hangText="PDU:">Protocol Data Unit, an L3ND application layer
      message.</t>
<!--  <t hangText="RouterID:">An 32-bit identifier unique in the
      current routing domain, see <xref target="RFC6286"/>.</t> -->
      <t hangText="Session:">A session, established via an exchange of
      OPEN PDUs, between two L3ND-capable IP interfaces on a link.</t>
      <t hangText="TOR Switch:">Top Of Rack switch, aggregates the
      servers in a rack and connects to aggregation layers of the Clos
      tree, AKA the Clos spine.</t>
<!--  <t hangText="ZTP:">Zero Touch Provisioning gives devices initial
      addresses, credentials, etc. on boot/restart.</t> -->
<?rfc subcompact="no"?>
    </list></t>

    </section>

  <section anchor="background" title="Background">

    <t>L3ND is primarily designed for a Clos type datacenter scale and
    topology, but can accommodate richer topologies which contain
    potential cycles.</t>

    <t>While L3ND is designed for the MDC, there are no inherent reasons
    it could not run on a WAN.  The authentication and authorization
    needed to run safely on a WAN need to be considered, and the
    appropriate level of security options chosen.</t>

    <t>The number of addresses of one Encapsulation type on an interface
    link may be quite large given a TOR switch with tens of servers,
    each server having a few hundred micro-services, resulting in an
    inordinate number of addresses.  And highly automated micro-service
    migration can cause serious address prefix disaggregation, resulting
    in interfaces with thousands of disaggregated prefixes.</t>

    <t>To meet such scaling needs, the L3ND protocol is session oriented
    and uses incremental announcement and withdrawal with session
    restart, a la BGP (<xref target="RFC4271"/>).</t>

    </section>

  <section anchor="ilpo" title="Inter-Link Protocol Overview">

    <t>A device sends a Layer-3 multicast UDP datagram (HELLO)
    containing the port number on which it is willing to serve a TLS or
    raw TCP connection.  That connection supports the data exchange of
    the rest of the protocol in a reliable and preferably authenticated
    manner.</t>

    <t>Another device on the link then establishes a TLS or raw TCP
    session in which inter-device PDUs are used to exchange device and
    logical link identities and layer-2.5 (MPLS) and 3 identifiers
    (not payloads), e.g. more IP Addresses, loopback addresses, port
    identities, and Encapsulations.</t>

    <t>To assure discovery of new devices coming up on a multi-link
    topology, devices on such a topology, and only on a multi-link
    topology, send periodic HELLOs forever, see <xref
    target="dhello"/>.</t>

    <t>Once the TLS/TCP session is established, OPEN PDUs (<xref
    target="open"/>) are exchanged.  The Encapsulations (<xref
    target="afisafi"/>) configured on an end point may then be
    announced and modified.  Note that these are only the
    encapsulations and addresses configured on the announcing
    interface, though a device's loopback and overlay interface(s) may
    also be announced.</t>
 
    <section anchor="ladder" title="L3ND Ladder Diagram">

      <t>The HELLO, <xref target="hello"/>, is a priming message sent on
      all logical links configured for L3ND.  It is a small L3ND
      Multicast UDP PDU with the simple goal of advertising a TLS/TCP
      service available on an advertised port on the sending IP
      interface.</t>

      <t>The HELLO PDU is either IPv4 or IPv6, which selects the AFI to
      be used for the rest of the session(s) between end-points.  Two
      endpoints MAY establish a link for each AFI.</t>

      <t>An interface on the link receiving the HELLO PDU attempts to
      establish a TLS or raw TCP session, as specified by the HELLO, to
      the source IP address of the HELLO on the port advertised in the
      HELLO.</t>

      <t>The OPEN PDUs (<xref target="open"/>), used to exchange details
      about the L3ND session, and the ACK/ERROR PDU are mandatory.
      Other PDUs are optional, though at least one encapsulation SHOULD
      be agreed at some point.</t>

      <t>Like Multi-Protocol BGP, <xref target="RFC4760"/>, an L3ND
      session running over one AFI MAY carry encapsulations etc. of
      different AFIs.</t>
      
      <t>An L3ND extension, <xref target="I-D.ymbk-idr-l3nd-ulpc"/>,
      describes the next upper layer L3ND protocol to exchange BGP
      parameter information.</t>

      <t>The following is a ladder-style diagram of the L3ND protocol
      exchanges:</t>

    <figure>
        <artwork>
|            HELLO            | Logical Link Peer discovery
|----------------------------&gt;|
|          TCP OPEN           | Mandatory
|&lt;----------------------------|
|                             |
|                             |
|            OPEN             | IDs, security, etc.
|----------------------------&gt;|
|            ACK              |
|&lt;----------------------------|
|                             |
|            OPEN             | Mandatory
|&lt;----------------------------|
|            ACK              |
|----------------------------&gt;|
|                             |
|                             |
|   Interface IPv4 Addresses  | Interface IPv4 Addresses
|----------------------------&gt;| Optional
|            ACK              |
|&lt;----------------------------|
|                             |
|   Interface IPv4 Addresses  |
|&lt;----------------------------|
|            ACK              |
|----------------------------&gt;|
|                             |
|                             |
|   Interface IPv6 Addresses  | Interface IPv6 Addresses
|----------------------------&gt;| Optional
|            ACK              |
|&lt;----------------------------|
|                             |
|   Interface IPv6 Addresses  |
|&lt;----------------------------|
|            ACK              |
|----------------------------&gt;|
|                             |
|                             |
|   Interface MPLSv4 Labels   | Interface MPLSv4 Labels
|----------------------------&gt;| Optional
|            ACK              |
|&lt;----------------------------|
|                             |
|   Interface MPLSv4 Labels   | Interface MPLSv4 Labels
|&lt;----------------------------| Optional
|            ACK              |
|----------------------------&gt;|
|                             |
|                             |
|   Interface MPLSv6 Labels   | Interface MPLSv6 Labels
|----------------------------&gt;| Optional
|            ACK              |
|&lt;----------------------------|
|                             |
|   Interface MPLSv6 Labels   | Interface MPLSv6 Labels
|&lt;----------------------------| Optional
|            ACK              |
|----------------------------&gt;|
        </artwork>
      </figure>

      </section>
    </section>

  <section anchor="tlv" title="TLV PDUs">

    <t>The basic L3ND application layer PDU is a typical TLV (Type
    Length Value) PDU.  As it is transported over TCP, integrity is
    assured.  When it is transported over TLS, authenticity is also
    provided.</t>

<!--
    protocol "Version = 0:8,PDU Type:8,Payload Length:32,Payload ...:48"
-->

      <figure>
        <artwork> 
 0                   1                   2                   3  
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Version = 0  |    PDU Type   |         Payload Length        ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                               |                               ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               ~
~                          Payload ...                          ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        </artwork>
      </figure>

    <t>The fields of the basic L3ND header are as follows:
    <list style="hanging">

      <t hangText="Version:">An integer differentiating versions of the
      L3ND protocol.  Currently, only Version 0 is specified.</t>

      <t hangText="PDU Type:">An integer differentiating PDU payload
      types.  See <xref target="iana-types"/>.</t>
      
      <t hangText="Payload Length:">Total number of octets in the
      Payload field.</t>

      <t hangText="Payload:">The application layer content of the L3ND
      PDU.</t>

      </list></t>
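
    <t>As a non-normative illustration, the header above could be
    packed and parsed as in the following Python sketch.  It assumes
    network byte order, which this document does not state, and the
    names encode_pdu and decode_pdu are hypothetical.</t>

    <figure>
      <artwork><![CDATA[
```python
import struct

# Version (1 octet), PDU Type (1 octet), Payload Length (4 octets).
# Network byte order is an assumption of this sketch.
HEADER = struct.Struct("!BBI")
L3ND_VERSION = 0


def encode_pdu(pdu_type: int, payload: bytes) -> bytes:
    """Prefix a payload with the basic L3ND header."""
    header = HEADER.pack(L3ND_VERSION, pdu_type, len(payload))
    return header + payload


def decode_pdu(buffer: bytes):
    """Return (pdu_type, payload, rest), or None if incomplete."""
    if len(buffer) < HEADER.size:
        return None
    version, pdu_type, length = HEADER.unpack_from(buffer)
    if version != L3ND_VERSION:
        raise ValueError(f"unsupported L3ND version {version}")
    end = HEADER.size + length
    if len(buffer) < end:
        return None  # wait for more octets from the stream
    return pdu_type, buffer[HEADER.size:end], buffer[end:]
```
]]></artwork>
    </figure>

    <t>Since TCP delivers a byte stream rather than discrete
    messages, the sketch returns None until a whole PDU has been
    buffered and hands back any trailing octets for the next
    call.</t>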

    </section>

  <section anchor="hello" title="HELLO">

<!--
    protocol "Version = 0:8,PDU Type = 0:8,Payload Length = 4:32,Transport:8,Flags:8,Port:16"
-->

    <figure>
      <artwork>
 0                   1                   2                   3  
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Version = 0  |  PDU Type = 0 |       Payload Length = 4      ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                               |   Transport   |     Flags     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|              Port             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        </artwork>
      </figure>

    <t>The Payload Length is 4 to cover the Transport, Flags, and Port
    fields.</t>
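
    <t>A minimal, non-normative Python sketch of building and parsing
    the ten octet HELLO follows.  Network byte order is assumed, the
    function names are hypothetical, and the port number in the usage
    note is only a placeholder because TBD3 is not yet assigned.</t>

    <figure>
      <artwork><![CDATA[
```python
import struct

# Version, PDU Type, Payload Length, Transport, Flags, Port.
# Network byte order is an assumption of this sketch.
HELLO = struct.Struct("!BBIBBH")
PDU_TYPE_HELLO = 0
HELLO_PAYLOAD_LENGTH = 4  # Transport + Flags + Port


def encode_hello(transport: int, flags: int, port: int) -> bytes:
    """Build the octets of a version 0 HELLO datagram."""
    return HELLO.pack(0, PDU_TYPE_HELLO, HELLO_PAYLOAD_LENGTH,
                      transport, flags, port)


def decode_hello(datagram: bytes):
    """Return (transport, flags, port) from a received HELLO."""
    if len(datagram) != HELLO.size:
        raise ValueError("HELLO must be exactly 10 octets")
    version, pdu_type, length, transport, flags, port = (
        HELLO.unpack(datagram))
    if (version, pdu_type, length) != (0, PDU_TYPE_HELLO, 4):
        raise ValueError("not a version 0 HELLO")
    return transport, flags, port
```
]]></artwork>
    </figure>

    <t>For example, encode_hello(1, 0, 49152) would describe a TLS
    TOFU server on the placeholder port 49152 with no Flags set.</t>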

    <t>The IPv4 UDP packets are sent to an IPv4 link-local multicast
    address (TBD1) and the IPv6 UDP packets are sent to an IPv6
    link-local multicast address (TBD2).  See <xref target="dhello"/>
    for why multicast is used.</t>

    <t>The HELLO PDU solicits a unicast TLS/TCP session open request of
    the same AFI from other devices on the link.</t>

    <t>When a HELLO is received from a source IP address with which
    there is no established TLS/TCP L3ND session, if the receiver has
    the higher of the two IP addresses, it SHOULD respond by sending a
    TLS/TCP client open request, using the same AFI, to the source IP
    address of the HELLO to establish an L3ND TLS/TCP session.</t>

    <t>All L3ND PDUs other than HELLO are sent via TLS/TCP, as the
    server's destination IP address is known after the HELLO.</t>

    <t>When an interface is turned up on a device, it SHOULD issue a
    HELLO if it is configured to participate in L3ND sessions and repeat
    HELLOs at a configured interval, with a default of 60 seconds.</t>

    <t>If the configured multicast destination address is one that is
    propagated by switches, the HELLO SHOULD be repeated at a configured
    interval, with a default of 60 seconds.  This allows discovery by
    new devices which come up on the mesh.  In this multi-link scenario,
    the operator should be aware of the trade-off between timer tuning
    and network noise and adjust the inter-HELLO timer accordingly.</t>

    <t>By default, GTSM, <xref target="RFC5082"/>, SHOULD be enabled to
    verify that a received HELLO originated on the local link, thus
    leaving no middle on which a monkey in the middle might stand.  It
    MAY be disabled by configuration.  GTSM check failures SHOULD be
    logged, though with rate limiting to keep from overwhelming
    logs.</t>

    <t>If more than one device responds, one adjacency is formed for
    each unique source IP address.  L3ND treats each adjacency as a
    separate logical link.</t>

    <t>To ameliorate possible load spikes during bootstrap or event
    recovery, there SHOULD be a jittered delay between receipt of a
    HELLO and TLS/TCP open.  The default delay range SHOULD be zero to
    five seconds, and MUST be configurable.</t>
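
    <t>One way to pick such a delay is sketched below in Python.  The
    uniform distribution is an assumption of this sketch, as this
    document only asks for jitter within the configured range, and
    the function name is hypothetical.</t>

    <figure>
      <artwork><![CDATA[
```python
import random


def hello_response_delay(max_delay: float = 5.0) -> float:
    """Seconds to wait between a HELLO and the TLS/TCP open."""
    if max_delay < 0:
        raise ValueError("max_delay must be non-negative")
    # Uniform jitter is an assumption; the range is configurable.
    return random.uniform(0.0, max_delay)
```
]]></artwork>
    </figure>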

    <t>If a HELLO is received from an IP Address with which there is an
    established session for that AFI, the HELLO SHOULD be dropped.</t>

    <t>A device with a TLS/TCP listener SHOULD log or otherwise report
    repeated failed inbound attempts.</t>

    <section anchor="transport" title="Transport">

      <t>The Transport signals the type of transport security for the
      session.</t>

      <t>The actual transport options are pre-configured in the
      devices by provisioning, as most require certificates etc.  It is
      best to think of this field as in-band signaling to confirm the
      correctness of the pre-configurations.  Any disagreement MUST be
      considered to indicate an error condition and brought to the
      attention of the operator.</t>

      <t>The Transport field is an enumeration with the following values:
	<list style="hanging">
	  <?rfc subcompact="yes"?>
	  <t hangText="0: Raw TCP:">TLS is not used.</t>
	  <t hangText="1: TLS TOFU:">TLS using a self-signed server
	  certificate.</t>
	  <t hangText="2: TLS CA-NoIP:">TLS using a CA-Based server
	  certificate, with no IP address extension.</t>
	  <t hangText="3: TLS CA WithIP:">TLS using a CA-Based server
	  certificate, with the server's IP address in the subject
	  alternative name extension (see <xref target="RFC5280"/>
	  Section 4.2.1.6).</t>
	  <t hangText="4-255:">Reserved.</t>
	  <?rfc subcompact="no"?>
	</list></t>

      <t>If server certificates are to be used, they may be locally
      generated and then signed by a CA or generated by the CA and
      loaded.  See <xref target="RFC8635"/>.</t>
      
      </section>

    <section anchor="flags" title="Flags">

      <t>Though the Working Group scope for this protocol is within a
      data center, an issue was raised that, on an Internet exchange
      with route server(s), L3ND would attempt to form adjacencies with
      all members of the exchange.  Hence a Flag field is provided to
      indicate that a device does not intend to field a TLS/TCP server
      on the announcing interface, but does seek one or more from
      peers.</t>
      
      <t>Currently, only one bit of the Flags field is defined:
	<list style="hanging">
	  <?rfc subcompact="yes"?>
	  <t hangText="Bit 0: Client Only:">This interface does not
	  provide a TLS/TCP server.</t>
	  <t hangText="Bits 1-7:">Reserved.</t>
	  <?rfc subcompact="no"?>
	</list></t>

      </section>

    <section anchor="port" title="Port">
	
      <t>The Port is the two octet TCP Port Number (default is TBD3) on
      which the HELLO sender SHOULD have a TLS/TCP server (as specified
      in Transport) listening, unless the Client Only Flag is set.
      Though the IANA assigned well-known port SHOULD be used, this
      field allows configuration of alternate ports.</t>
      
      </section>
    </section>

  <section anchor="tcp" title="TCP Set-Up">

    <t>As it is assumed that the configured deployment of a data center
    would have compatible parameters on all devices, any disagreement
    over TLS/TCP or trust anchors MUST be logged, with rate limiting of
    the logging.</t>

    <t>By default, GTSM, <xref target="RFC5082"/>, SHOULD be enabled to
    ensure that a SYN received in response to a HELLO is on the local
    link.  It MAY be disabled by configuration.  GTSM check failures
    SHOULD be logged, though with rate limiting to keep from
    overwhelming logs.</t>

    <t>If the receiver of a HELLO agrees with the sender's choice of
    TLS/TCP and authentication, both sides have agreed on an AFI for the
    transport and on each other's IP address in that AFI.  This is
    sufficient to open a TCP session between them, which will allow for
    reliable transport of very large data PDUs while obviating the need
    to invent complex transports.</t>

    <t>The L3ND peer with the higher IP address MUST act as the TLS/TCP
    client and open the transport session (send SYN) toward the peer
    with the lower IP address.</t>
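
    <t>The role selection above can be sketched in Python with the
    standard ipaddress module.  The sketch assumes that "higher" means
    numerically greater within one address family; the function name
    transport_role is hypothetical.</t>

    <figure>
      <artwork><![CDATA[
```python
import ipaddress


def transport_role(local: str, peer: str) -> str:
    """Return "client" if the local address is the higher one."""
    a = ipaddress.ip_address(local)
    b = ipaddress.ip_address(peer)
    if a.version != b.version:
        raise ValueError("addresses must be of the same AFI")
    if a == b:
        raise ValueError("local and peer addresses are identical")
    # Numeric comparison is an assumption of this sketch.
    return "client" if a > b else "server"
```
]]></artwork>
    </figure>

    <t>Both ends evaluate the same comparison with the arguments
    swapped, so exactly one of them concludes that it is the
    client.</t>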

    <t>The server, the sender of the HELLO from the lower IP address,
    listens on the advertised port for the TLS/TCP session open.  The
    receiver of the acceptable HELLO, the TLS/TCP client, initiates a
    TLS or raw TCP session toward the sender of the HELLO, the TLS/TCP
    server, preferably TLS, as advertised.  If TLS, the server has
    chosen and signaled either a self-signed certificate or one
    configured from the operational CA trusted by both parties, as
    negotiated in the HELLO exchange.</t>

    <t>Once the TLS/TCP session is established, if its interface is
    configured as point-to-point, the client side SHOULD stop listening
    on any port for which it has sent a HELLO.  The server, if
    configured as a point-to-point interface, SHOULD stop sending
    HELLOs.</t>

    <t>If the TLS/TCP open fails, then this SHOULD be logged and the
    parties MUST go back to the initial state and try HELLO.  Logging
    SHOULD be rate limited.</t>

    <t>Should an interface with an established TLS/TCP session be
    reconfigured changing the TLS/TCP parameters, the TLS/TCP session
    should be closed or torn down and both parties should return to the
    HELLO state.</t>

    <t>Should the TLS/TCP session terminate for any reason, the devices
    SHOULD restart/resume HELLOs.  When the new TLS/TCP session is
    started, if possible the OPEN PDU SHOULD try to resume the lost
    logical session by using the same nonce and resuming from the last
    Serial Number.</t>

    <t>Once the TLS/TCP session has been established, the two devices
    exchange L3ND PDUs, starting with OPENs.</t>

    </section>
    
  <section anchor="open" title="OPEN">

    <t>Each device has learned the other's IP Address from the
    HELLO exchange (see <xref target="hello"/>) and has established a
    TLS/TCP session over a particular AFI.</t>

    <t>The first PDU each sends MUST be an OPEN, and the other side MUST
    respond with an ACK PDU.</t>

<!--
    protocol "Version = 0:8,PDU Type = 1:8,Payload Length:32,Session ID:32,Serial Number:32,AttrCount:8,Attribute List ...:40"
-->

      <figure>
        <artwork>
 0                   1                   2                   3  
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Version = 0  |  PDU Type = 1 |         Payload Length        ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                               |           Session ID          ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                               |         Serial Number         ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                               |   AttrCount   |               ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+               +
~                       Attribute List ...                      ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         </artwork>
      </figure>

    <t>The four octet Payload Length is the number of octets in all
    fields of the PDU from the Session ID through the Attribute
    List.</t>

    <t>The four octet Session ID is a nonce which uniquely identifies a
    session.  It enables detection of a duplicate OPEN PDU.  It SHOULD
    be either a random number or a high resolution timestamp.  It is
    needed to prevent session closure due to a repeated OPEN caused by a
    race or a dropped or delayed ACK.  It can be used to resume a
    dropped logical session.</t>

    <t>The one octet AttrCount is the number of attributes in the
    Attribute List.  A node may send zero or more attributes.</t>

    <t>Attributes are single octets the semantics of which are
    operator-defined, e.g.: spine, leaf, backbone, route reflector,
    arabica, ...</t>

    <t>Attribute syntax and semantics are local to an operator or
    datacenter; hence there is no global registry.  Nodes exchange
    their attributes only in the OPEN PDU.</t>
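
    <t>The following non-normative Python sketch assembles an OPEN
    PDU.  It assumes network byte order and assumes that the Payload
    Length counts every octet after the header, including AttrCount
    and the Attribute List, as in the generic TLV definition.  The
    function names and the attribute values in the usage note are
    hypothetical.</t>

    <figure>
      <artwork><![CDATA[
```python
import os
import struct

# Header (Version, PDU Type, Payload Length), then Session ID,
# Serial Number, and AttrCount.  Byte order is an assumption.
OPEN_FIXED = struct.Struct("!BBIIIB")
PDU_TYPE_OPEN = 1


def new_session_id() -> int:
    """Draw a random four octet nonce for the Session ID."""
    return int.from_bytes(os.urandom(4), "big")


def encode_open(session_id: int, serial: int,
                attributes: bytes = b"") -> bytes:
    """Build an OPEN; each attribute is one operator-defined octet."""
    if len(attributes) > 255:
        raise ValueError("AttrCount is a single octet")
    # Assumed: the length covers the whole payload after the header.
    payload_length = 4 + 4 + 1 + len(attributes)
    fixed = OPEN_FIXED.pack(0, PDU_TYPE_OPEN, payload_length,
                            session_id, serial, len(attributes))
    return fixed + attributes
```
]]></artwork>
    </figure>

    <t>For instance, encode_open(new_session_id(), 0, bytes([7, 9]))
    would open a fresh session carrying two attributes whose values,
    7 and 9, mean whatever the operator has defined them to
    mean.</t>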

    <t>Unlike L3DL <xref target="I-D.ietf-lsvr-l3dl"/>, there are no
    verifiable keys in the PDUs.  If the operator wants authentication,
    integrity, or confidentiality, then TLS MUST have been requested by
    the HELLO and agreed by the TLS session open.</t>

    <t>The Serial Number is a monotonically increasing four octet value
    representing the sender's state at the time of sending the last PDU.
    It may be a non-negative integer, a timestamp, etc.  If incrementing
    the Serial Number would cause it to be zero, it should be
    incremented again.</t>

    <t>On session restart (new OPEN, same Session ID), a receiver MAY
    send the last received Serial Number to tell the sender to only send
    data with a Serial Number greater (in the <xref target="RFC1982"/>
    sense), or send a Serial Number of zero to request all data.</t>

    <t>This allows a sender of an OPEN to tell the receiver that the
    sender would like to resume a logical session and that the receiver
    of the OPEN PDU only needs to send data starting with the PDU with
    the lowest Serial Number greater (in the <xref target="RFC1982"/>
    sense) than the one sent in the OPEN.  If the sender is not trying
    to resume a dropped session, the Serial Number MUST be zero.</t>
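
    <t>The Serial Number handling described above can be sketched in
    Python as follows.  The comparison follows the definition in
    <xref target="RFC1982"/> with 32 serial bits; the function names
    are hypothetical.</t>

    <figure>
      <artwork><![CDATA[
```python
MODULUS = 2 ** 32  # Serial Numbers are four octets
HALF = 2 ** 31


def next_serial(serial: int) -> int:
    """Increment a Serial Number, skipping zero when it wraps."""
    serial = (serial + 1) % MODULUS
    return serial if serial != 0 else 1


def serial_gt(a: int, b: int) -> bool:
    """True if a is greater than b in the RFC 1982 sense."""
    # A difference of exactly 2**31 is undefined in RFC 1982;
    # this sketch treats neither value as greater in that case.
    return a != b and (a - b) % MODULUS < HALF
```
]]></artwork>
    </figure>

    <t>A sender resuming a session would then replay only those PDUs
    whose Serial Number s satisfies serial_gt(s, requested), where
    requested is the non-zero value carried in the peer's OPEN.</t>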

    <t>If the receiver of an OPEN PDU with a non-zero Serial Number
    cannot resume from the requested point, it should return an ACK with
    an EType of 1 and an Error Code of 5, Session May Not Be Continued.
    The sender of the failing OPEN PDU SHOULD respond with an OPEN PDU
    with a Serial Number of zero.</t>

    <t>If a sender of OPEN does not receive an ACK of the OPEN PDU in a
    configurable time (default 5 seconds), then they SHOULD close or
    otherwise terminate the TLS/TCP session and restart from the HELLO
    state.</t>

    <t>If an OPEN arrives at L3ND speaker A from B with which A believes
    it already has an L3ND session (i.e. OPENs have already been
    exchanged), and the Serial Number in B's OPEN PDU is non-zero,
    speaker A SHOULD establish a new sending session by sending an OPEN
    with the Serial Number being the same as that of A's last sent and
    ACKed PDU.  A MUST resume sending encapsulations etc. subsequent to
    the requested Serial Number.  And B MUST retain all previously
    discovered encapsulation and other data received from A.</t>

    <t>If an OPEN arrives at L3ND speaker A from B with which A believes
    it already has an L3ND session (i.e. OPENs have already been
    exchanged), and the Serial Number in B's OPEN is zero, then A
    MUST assume that B's internal state has been reset.  All previously
    discovered encapsulation data MUST be discarded, and A MUST respond
    with a new OPEN PDU with a Serial Number of zero.</t>

    <t>TCP KeepAlives should be configured and tuned to meet local
    operational needs.  Some defaults and recommendations are needed
    here.</t>

    </section>

  <section anchor="ack" title="ACK">

    <t>The ACK PDU acknowledges receipt of a PDU and reports any error
    condition which might have been raised.</t>

<!--
    protocol "Version = 0:8,PDU Type = 3:8,Payload Length = 6:32,ACKed PDU:8,EType:8,Error Code:16,Error Hint:16"
-->

    <figure>
        <artwork>
 0                   1                   2                   3  
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Version = 0  |  PDU Type = 3 |       Payload Length = 6      ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                               |   ACKed PDU   |     EType     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Error Code          |           Error Hint          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        </artwork>
      </figure>

    <t>The ACK PDU acknowledges receipt of an OPEN, Encapsulation,
    Vendor PDU, etc. and is used to return error codes if any.</t>

    <t>The one octet ACKed PDU is the PDU Type of the PDU being
    acknowledged, e.g., OPEN, one of the Encapsulations, etc.</t>

    <t>If there was an error processing the received PDU, then the one
    octet EType is non-zero.  If the EType is zero, Error Code and Error
    Hint MUST also be zero.</t>

    <t>A non-zero EType is the receiver's way of telling the PDU's
    sender that the receiver had problems processing the PDU.  The Error
    Code and Error Hint will tell the sender more detail about the
    error.</t>

    <t>The decimal value of EType gives a strong hint how the receiver
    sending the ACK believes things should proceed:
    <list style="empty">
<?rfc subcompact="yes"?>
        <t>0 - No Error, Error Code and Error Hint MUST be zero</t>
        <t>1 - Warning, something not too serious happened, continue</t>
        <t>2 - Session should not be continued, try to restart</t>
        <t>3 - Restart is hopeless, call the operator</t>
        <t>4-255 - Reserved</t>
<?rfc subcompact="no"?>
        </list></t>

    <t>The values of the two octet Error Code, noting protocol
    failures, are listed in <xref target="iana-error"/>.  Someone stuck
    in the 1990s might think of the catenation of EType and Error Code
    as an echo of 0x1zzz, 0x2zzz, etc.  They might be right; or
    not.</t>

    <t>The two octet Error Hint is arbitrary additional data the sender
    of the error PDU thinks will help the recipient or the debugger with
    the particular error.</t>
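
    <t>A non-normative Python sketch of building an ACK is given
    below.  Network byte order is assumed and the function name is
    hypothetical.  The sketch enforces the rule that a zero EType
    requires a zero Error Code and Error Hint.</t>

    <figure>
      <artwork><![CDATA[
```python
import struct

# Version, PDU Type, Payload Length, ACKed PDU, EType,
# Error Code, Error Hint.  Byte order is an assumption.
ACK = struct.Struct("!BBIBBHH")
PDU_TYPE_ACK = 3
ACK_PAYLOAD_LENGTH = 6


def encode_ack(acked_pdu: int, etype: int = 0,
               error_code: int = 0, error_hint: int = 0) -> bytes:
    """Build an ACK for a received PDU of type acked_pdu."""
    if etype == 0 and (error_code != 0 or error_hint != 0):
        raise ValueError("EType 0 requires zero Code and Hint")
    return ACK.pack(0, PDU_TYPE_ACK, ACK_PAYLOAD_LENGTH,
                    acked_pdu, etype, error_code, error_hint)
```
]]></artwork>
    </figure>

    <t>For example, encode_ack(1) acknowledges an OPEN without error,
    while encode_ack(1, 1, 5) reports Error Code 5 with an EType of
    1.</t>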

    <section anchor="retrans" title="Retransmission">

      <t>If a PDU sender expects an ACK, e.g. for an OPEN, an
      Encapsulation, a Vendor PDU, etc., and does not receive the ACK
      within a configurable time (default five seconds), the TLS/TCP
      session should be closed or dropped, and both sides revert to
      HELLO state.</t>

      </section>
      
    </section>

  <section anchor="afisafi" title="The Encapsulations">

    <t>Once the devices know each other's IP Addresses, and have
    established a TLS/TCP session and have successfully exchanged OPENs,
    the L3ND session is considered established, and the devices SHOULD
    exchange Layer-3 interface encapsulations, Layer-3 addresses, and Layer-2.5
    labels.</t>

    <t>Encapsulation data for any AFI/SAFI may be exchanged over a
    TLS/TCP session irrespective of the AFI/SAFI of the session
    transport.</t>

    <t>The Encapsulation types the peers exchange may be IPv4 (<xref
    target="ipv4"/>), IPv6 (<xref target="ipv6"/>), MPLS IPv4 (<xref
    target="mpls4"/>), MPLS IPv6 (<xref target="mpls6"/>), and/or
    possibly others not defined here.</t>

    <t>The sender of an Encapsulation PDU MUST NOT assume that the
    receiver is capable of the same Encapsulation Type.  An ACK (<xref
    target="ack"/>) with EType of 0 merely acknowledges receipt.  Only
    if both peers have sent the same Encapsulation Type is it safe for
    Layer-3 protocols to assume that they are compatible for that
    Encapsulation Type.</t>

    <t>A receiver of an encapsulation might recognize an addressing
    conflict, such as both ends of the link trying to use the same
    address.  In this case, the receiver MUST respond with an error
    (Error Code 2, Logical Link Addressing Conflict) ACK.  As there may
    be other usable addresses or encapsulations, this error might simply
    be logged and the session continued, letting an upper layer topology
    builder deal with what works.</t>

    <t>Further, to consider a logical link of an Encapsulation Type to
    formally be established so that it may be used by other protocols,
    the addressing for the type must be compatible, e.g. on the same IP
    subnet.</t>
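
    <t>The two addressing checks described above (a conflict where both
    ends claim the same address, and compatibility in the sense of
    being on the same IP subnet) might be sketched as follows.  This is
    only an illustration using textual addresses; the function names
    addressing_conflict and same_subnet are hypothetical, and a real
    implementation may apply additional rules.</t>

      <figure>
        <artwork><![CDATA[
```python
import ipaddress


def addressing_conflict(local: str, remote: str) -> bool:
    """True if both ends of the link claim the same address."""
    a = ipaddress.ip_interface(local)
    b = ipaddress.ip_interface(remote)
    return a.version == b.version and a.ip == b.ip


def same_subnet(local: str, remote: str) -> bool:
    """True if both interface addresses fall in the same IP subnet."""
    a = ipaddress.ip_interface(local)
    b = ipaddress.ip_interface(remote)
    return a.version == b.version and a.network == b.network
```
]]></artwork>
      </figure>

    <t>For example, same_subnet("192.0.2.1/24", "192.0.2.2/24") holds,
    while addressing_conflict("192.0.2.1/24", "192.0.2.1/24") flags the
    case that would trigger Error Code 2.</t>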

    <section anchor="encaps" title="The Encapsulation PDU Skeleton">

      <t>The header for all encapsulation PDUs is as follows:</t>

<!--
    protocol "Version = 0:8,PDU Type:8,Payload Length:32,Count:24,Serial Number:32,Encapsulation List...:24"
-->

      <figure>
        <artwork> 
 0                   1                   2                   3  
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Version = 0  |    PDU Type   |         Payload Length        ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                               |             Count             ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~               |                 Serial Number                 ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~               |             Encapsulation List...             ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        </artwork>
      </figure>

      <t>An Encapsulation PDU describes zero or more addresses of the
      encapsulation type.</t>

      <t>The three octet Count is the number of Encapsulations in the
      Encapsulation List.</t>

      <t>The Serial Number is a monotonically increasing four octet
      value representing the sender's state in time.  It may be an
      integer, a timestamp, etc.  On session restart (new OPEN), a
      receiver MAY send the last received Serial Number to request the
      sender to only send newer data.</t>
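
      <t>A minimal sketch of encoding and decoding the 13 octet header
      shown above follows.  It assumes network byte order for the
      multi-octet fields, which this section does not state explicitly,
      and it treats Payload Length as an opaque value.  The names
      EncapHeader, build_encap_header, and parse_encap_header are
      hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
import struct
from typing import NamedTuple

HEADER_LEN = 13  # Version 1 + PDU Type 1 + Payload Length 4 + Count 3 + Serial 4


class EncapHeader(NamedTuple):
    version: int
    pdu_type: int
    payload_length: int
    count: int
    serial: int


def build_encap_header(pdu_type: int, payload_length: int,
                       count: int, serial: int) -> bytes:
    """Encode the header; Version is always 0 (assumes network byte order)."""
    return (struct.pack("!BBI", 0, pdu_type, payload_length)
            + count.to_bytes(3, "big")      # three octet Count
            + struct.pack("!I", serial))    # four octet Serial Number


def parse_encap_header(buf: bytes) -> EncapHeader:
    """Decode the first 13 octets of an Encapsulation PDU."""
    if len(buf) < HEADER_LEN:
        raise ValueError("short Encapsulation PDU header")
    version, pdu_type, payload_length = struct.unpack_from("!BBI", buf, 0)
    count = int.from_bytes(buf[6:9], "big")
    (serial,) = struct.unpack_from("!I", buf, 9)
    return EncapHeader(version, pdu_type, payload_length, count, serial)
```
]]></artwork>
      </figure>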

      <t>If a sender has multiple links on the same interface, separate
      state (data, ACKs, etc.) must be kept for each peer session.</t>

      <t>Over time, multiple Encapsulation PDUs may be sent for an
      interface in a session as configuration changes.</t>

      <t>The Receiver MUST acknowledge the Encapsulation PDU with an ACK
      PDU (<xref target="ack"/>) whose ACKed PDU field is the PDU Type
      of the Encapsulation PDU being acknowledged.</t>

      <t>If the Sender does not receive an ACK in a configurable
      interval (default five seconds), it SHOULD retransmit.  After a
      user configurable number of failures (default three), the L3ND
      session should be considered dead, the TLS/TCP session torn down,
      and the HELLO process SHOULD be restarted.</t>
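
      <t>The retransmission behavior just described might look roughly
      like the following sketch.  It assumes that every unacknowledged
      transmission, including the first, counts as one failure; the
      text does not pin this down.  The function name and the two
      callables, send and wait_for_ack, are hypothetical stand-ins for
      an implementation's I/O layer.</t>

      <figure>
        <artwork><![CDATA[
```python
from typing import Callable


def send_with_retransmit(
    send: Callable[[], None],
    wait_for_ack: Callable[[float], bool],
    ack_timeout: float = 5.0,   # default five seconds
    max_failures: int = 3,      # default three failures
) -> bool:
    """Send a PDU, retransmitting until it is ACKed or the limit is hit.

    Returns True once an ACK arrives.  Returns False after max_failures
    unacknowledged transmissions; the caller would then tear down the
    TLS/TCP session and restart the HELLO process.
    """
    for _ in range(max_failures):
        send()
        if wait_for_ack(ack_timeout):
            return True
    return False
```
]]></artwork>
      </figure>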

      <t>If the link is broken below layer-3, retransmission MAY be
      retried if data have not changed in the interim and the TLS/TCP
      session is still alive.</t>

      <t>Should an Encapsulation in the Encapsulation List be
      syntactically invalid, e.g. an out of bounds prefix length, the
      entire Encapsulation PDU MUST be dropped and the sending party
      notified by an ACK PDU with an EType of 1 and an Error Code of 3,
      Encapsulation Error.</t>

      </section>
    
   <section anchor="eflags" title="Encapsulation Flags">

     <t>The one octet Encapsulation Flags field is a sequence of one bit
     fields as follows:</t>

      <figure>
          <artwork>
 0           1            2            3            4  ...       7
+------------+------------+------------+------------+------------+
|  Ann/With  |   Primary  | Under/Over |  Loopback  | Reserved ..|
+------------+------------+------------+------------+------------+
        </artwork>
      </figure>

      <t>Each encapsulation in an Encapsulation PDU of Type T may
      announce new and/or withdraw old encapsulations of Type T.  It
      indicates this with the Ann/With Encapsulation Flag, Announce ==
      1, Withdraw == 0.</t>

      <t>Announcing an encapsulation which already exists SHOULD raise
      an Announce/Withdraw Error (see <xref target="iana-error"/>); the
      EType SHOULD be 2, suggesting a session restart (see <xref
      target="ack"/>) so all encapsulations will be resent.</t>

      <t>If an interface on a link has multiple addresses for an
      encapsulation type, one and only one address MAY be marked as
      primary (Primary Flag == 1) for that Encapsulation Type.</t>

      <t>An Encapsulation interface address in an Encapsulation PDU MAY
      be marked as a loopback, in which case the Loopback bit is set.
      Loopback addresses are generally not seen directly on an external
      interface.  One or more loopback addresses MAY be exposed by
      configuration on one or more L3ND speaking external interfaces,
      e.g. for iBGP peering.  They SHOULD be marked as such, Loopback
      Flag == 1.</t>

      <t>Each Encapsulation interface address in an Encapsulation PDU is
      that of the direct 'underlay' interface (Under/Over == 1), or an
      'overlay' address (Under/Over == 0), likely that of a VM or
      container guest bridged or configured on to the interface already
      having an underlay address.</t>
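
      <t>As an illustration, the flag octet could be decoded as in the
      sketch below.  It assumes that bit 0 in the figure is the most
      significant bit of the octet, as is usual in such diagrams, and
      it follows the polarity given in this section's text (Announce ==
      1, underlay == 1).  Reserved bits are ignored.  The names
      EncapFlags and decode_flags are hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
from enum import IntFlag


class EncapFlags(IntFlag):
    # Assumption: bit 0 of the figure is the most significant bit.
    ANNOUNCE = 0x80   # bit 0: 1 = Announce, 0 = Withdraw
    PRIMARY = 0x40    # bit 1: 1 = primary address
    UNDERLAY = 0x20   # bit 2: 1 = underlay, 0 = overlay
    LOOPBACK = 0x10   # bit 3: 1 = loopback address
    # bits 4-7 are reserved


def decode_flags(octet: int) -> dict:
    """Decode the four defined flag bits, ignoring reserved bits."""
    flags = EncapFlags(octet & 0xF0)
    return {
        "announce": EncapFlags.ANNOUNCE in flags,
        "primary": EncapFlags.PRIMARY in flags,
        "underlay": EncapFlags.UNDERLAY in flags,
        "loopback": EncapFlags.LOOPBACK in flags,
    }
```
]]></artwork>
      </figure>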
      
      </section>

    <section anchor="ipv4" title="IPv4 Encapsulation">

      <t>The IPv4 Encapsulation describes a device's ability to exchange
      IPv4 packets on one or more subnets.  It does so by stating the
      interface's addresses and the corresponding prefix lengths.</t>
  
<!--
    protocol "Version = 0:8,PDU Type = 4:8,Payload Length:32,Count:24,Serial Number:32,Encaps Flags:8,IPv4 Address:32,PrefixLen:8,more ...:8"
-->

    <figure>
        <artwork>
 0                   1                   2                   3  
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Version = 0  |  PDU Type = 4 |         Payload Length        ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                               |             Count             ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~               |                 Serial Number                 ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~               |  Encaps Flags |          IPv4 Address         ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                               |   PrefixLen   |    more ...   ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        </artwork>
      </figure>

      <t>The three octet Count is the sum of the number of IPv4
      Encapsulations being announced and/or withdrawn.</t>
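
      <t>A sketch of decoding the IPv4 entries that follow the common
      header is given below.  It assumes the entries are packed back to
      back as six octets each (Encaps Flags, IPv4 Address, PrefixLen)
      in network byte order.  An out of bounds prefix length raises an
      error, on which the caller would drop the entire PDU and send the
      ACK with Error Code 3.  The name parse_ipv4_entries is
      hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
import ipaddress
import struct

ENTRY_LEN = 6  # Encaps Flags (1) + IPv4 Address (4) + PrefixLen (1)


def parse_ipv4_entries(body: bytes, count: int) -> list:
    """Return [(flags, IPv4Interface), ...] for `count` packed entries."""
    if len(body) < count * ENTRY_LEN:
        raise ValueError("Encapsulation List shorter than Count")
    entries = []
    for i in range(count):
        flags, addr, plen = struct.unpack_from("!B4sB", body, i * ENTRY_LEN)
        if plen > 32:
            # Syntactically invalid: the whole PDU is to be dropped.
            raise ValueError("out of bounds IPv4 prefix length")
        text = "%s/%d" % (ipaddress.IPv4Address(addr), plen)
        entries.append((flags, ipaddress.ip_interface(text)))
    return entries
```
]]></artwork>
      </figure>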

      </section>

    <section anchor="ipv6" title="IPv6 Encapsulation">

      <t>The IPv6 Encapsulation describes a link's ability to
      exchange IPv6 packets on one or more subnets.  It does so by
      stating the interface's addresses and the corresponding prefix
      lengths.</t>
    
<!--
    protocol "Version = 0:8,PDU Type = 5:8,Payload Length:32,Count:24,Serial Number:32,Encaps Flags:8,IPv6 Prefix:128,PrefixLen:8,more ...:8"
-->

      <figure>
          <artwork>
 0                   1                   2                   3  
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Version = 0  |  PDU Type = 5 |         Payload Length        ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                               |             Count             ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~               |                 Serial Number                 ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~               |  Encaps Flags |                               ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               +
|                                                               ~
+                                                               +
~                                                               ~
+                                                               +
~                          IPv6 Prefix                          ~
+                               +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                               |   PrefixLen   |    more ...   ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
          </artwork>
        </figure>
    
      <t>The three octet Count is the sum of the number of IPv6
      Encapsulations being announced and/or withdrawn.</t>

      </section>

    <section anchor="mplslist" title="MPLS Label List">

      <t>As an MPLS enabled interface may have a label stack, see <xref
      target="RFC3032"/>, a variable length list of labels is needed.
      These are the labels the sender will accept for the prefix to
      which the list is attached.</t>

<!--
    protocol "Label Count:8,Label:20,Exp:3,S:1,Label:20,Exp:3,S:1,more ...:8"
-->

    <figure>
      <artwork>
 0                   1                   2                   3  
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Label Count  |                 Label                 | Exp |S|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                 Label                 | Exp |S|    more ...   ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        </artwork>
      </figure>

      <t>A one octet Label Count of zero is an implicit withdraw of all
      labels for that prefix on that interface.</t>

      <t>The bottom of the stack flag, S, MUST be set on one and only
      one label.  Should this not be the case, the receiver of the
      erroneous PDU MUST respond with an ACK PDU of EType 1 and Error
      Code 1, MPLS Error.</t>
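
      <t>The label list layout and the bottom of stack rule might be
      handled as in the following sketch.  It assumes each label entry
      is three octets (20 bit Label, 3 bit Exp, 1 bit S) in network
      byte order, and that the S rule is not applied to an empty list,
      since a Label Count of zero is a withdraw.  The name
      parse_label_list is hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
def parse_label_list(buf: bytes):
    """Return ([(label, exp, s), ...], octets_consumed)."""
    if not buf:
        raise ValueError("missing Label Count")
    count = buf[0]
    end = 1 + 3 * count
    if len(buf) < end:
        raise ValueError("Label List shorter than Label Count")
    labels = []
    for off in range(1, end, 3):
        word = int.from_bytes(buf[off:off + 3], "big")
        # 20 bit Label, 3 bit Exp, 1 bit S
        labels.append((word >> 4, (word >> 1) & 0x7, word & 0x1))
    if count and sum(s for _, _, s in labels) != 1:
        # The receiver would answer with EType 1, Error Code 1.
        raise ValueError("S must be set on one and only one label")
    return labels, end
```
]]></artwork>
      </figure>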

      </section>

    <section anchor="mpls4" title="MPLS IPv4 Encapsulation">

      <t>The MPLS IPv4 Encapsulation describes a logical link's ability
      to exchange labeled IPv4 packets on one or more subnets.  It does
      so by stating the interface's addresses, the corresponding prefix
      lengths, and the corresponding labels which will be accepted for
      each address.</t>
    
<!--
    protocol "Version = 0:8,PDU Type = 6:8,Payload Length:32,Count:24,Serial Number:32,Encaps Flags:8,MPLS Label List ...:16,IPv4 Address:32,PrefixLen:8,more ...:24"
-->

    <figure>
      <artwork>
 0                   1                   2                   3  
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Version = 0  |  PDU Type = 6 |         Payload Length        ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                               |             Count             ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~               |                 Serial Number                 ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~               |  Encaps Flags |      MPLS Label List ...      ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                          IPv4 Address                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   PrefixLen   |                    more ...                   ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        </artwork>
      </figure>
    
      <t>The three octet Count is the sum of the number of MPLS IPv4
      Encapsulations being announced and/or withdrawn.</t>

      </section>

    <section anchor="mpls6" title="MPLS IPv6 Encapsulation">

      <t>The MPLS IPv6 Encapsulation describes a logical link's ability
      to exchange labeled IPv6 packets on one or more subnets.  It does
      so by stating the interface's addresses, the corresponding prefix
      lengths, and the corresponding labels which will be accepted for
      each address.</t>
<!--
    protocol "Version = 0:8,PDU Type = 7:8,Payload Length:32,Count:24,Serial Number:32,Encaps Flags:8,MPLS Label List ...:16,IPv6 Address:128,Prefix Len:8,more ...:24"
-->

    <figure>
      <artwork>
 0                   1                   2                   3  
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Version = 0  |  PDU Type = 7 |         Payload Length        ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                               |             Count             ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~               |                 Serial Number                 ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~               |  Encaps Flags |      MPLS Label List ...      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               ~
+                                                               +
~                                                               ~
+                          IPv6 Address                         +
~                                                               ~
+                                                               +
~                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   Prefix Len  |                    more ...                   ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        </artwork>
      </figure>

      <t>The three octet Count is the sum of the number of MPLS IPv6
      Encapsulations being announced and/or withdrawn.</t>

      </section>
    </section>

  <section anchor="vendor" title="VENDOR - Vendor Extensions">
      
<!--
    protocol "Version = 0:8,PDU Type = 255:8,Payload Length:32,Serial Number:32,Enterprise Number:24,Ent Type:8,Enterprise Data ...:16"
-->

    <figure>
        <artwork>
 0                   1                   2                   3  
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Version = 0  | PDU Type = 255|         Payload Length        ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                               |         Serial Number         ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~                               |       Enterprise Number       ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
~               |    Ent Type   |      Enterprise Data ...      ~
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        </artwork>
      </figure>

    <t>Vendors or enterprises may define TLVs beyond the scope of L3ND
    standards.  This is done using a Private Enterprise Number <xref
    target="IANA-PEN"/> followed by Enterprise Data in a format defined
    for that three octet Enterprise Number and one octet Ent Type.</t>

    <t>Ent Type allows a Vendor PDU to be sub-typed in the event that
    the vendor/enterprise needs multiple PDU types.</t>

    <t>As with Encapsulation PDUs, a receiver of a Vendor PDU MUST
    respond with an ACK PDU, possibly signalling an error.  Similarly, a
    Vendor PDU MUST only be sent over an open session.</t>
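
    <t>A sketch of decoding the fixed fields of a Vendor PDU follows.
    It assumes network byte order and that the caller hands over exactly
    one PDU, so everything after the 14 fixed octets is treated as
    Enterprise Data; Payload Length is returned as read rather than
    interpreted.  The name parse_vendor_pdu is hypothetical.</t>

      <figure>
        <artwork><![CDATA[
```python
import struct

VENDOR_PDU_TYPE = 255
VENDOR_FIXED_LEN = 14  # 1 + 1 + 4 + 4 + 3 + 1 octets


def parse_vendor_pdu(buf: bytes) -> dict:
    """Decode one Vendor PDU (assumes buf holds exactly one PDU)."""
    if len(buf) < VENDOR_FIXED_LEN:
        raise ValueError("short Vendor PDU")
    version, pdu_type, payload_length, serial = struct.unpack_from(
        "!BBII", buf, 0)
    if pdu_type != VENDOR_PDU_TYPE:
        raise ValueError("not a Vendor PDU")
    return {
        "version": version,
        "payload_length": payload_length,
        "serial": serial,
        "enterprise_number": int.from_bytes(buf[10:13], "big"),
        "ent_type": buf[13],
        "enterprise_data": buf[14:],
    }
```
]]></artwork>
      </figure>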

    </section>

  <section anchor="discuss" title="Discussion">

    <t>This section explores some trade-offs taken and some
    considerations.</t>

    <section anchor="dhello" title="HELLO Discussion">

      <t>A device may send IP packets over a Layer-3 interface which
      transmits data over a single Layer-2 interface or multiple Layer-2
      interfaces.  A Layer-3 interface with multiple Layer-2 interfaces
      could have many devices which might come up at various times;
      therefore the configured HELLO PDU retransmit time SHOULD be set
      to a non-zero value, and periodic HELLOs should continue.  A
      Layer-3 interface on a single Layer-2 point-to-point (p2p)
      connection MAY set the configuration value to zero, as HELLOs are
      no longer desirable once a TLS/TCP session is up.</t>

      <t>A device with multiple Layer-2 interfaces, traditionally called
      a switch, may be used to forward packets from multiple devices to
      one Layer-3 interface, I, on an L3ND speaking device.  Interface I
      could discover a peer J across the switch.  Later, a prospective
      peer K could come up across the switch.  If I was not still
      sending and listening for HELLOs, the potential peering with K
      could not be discovered.  Therefore, on multi-link interfaces,
      L3ND MUST continue to send HELLOs as long as they are turned
      up.</t>

      </section>
     
    </section>
     
  <section anchor="vlans" title="VLANs/SVIs/Sub-interfaces">

    <t>One can think of the protocol as an instance (i.e. state machine)
    which runs on each logical link of a device.</t>

    <t>As the upper routing layer must view VLAN topologies as separate
    graphs, L3ND treats VLAN ports as separate links.</t>

    <t>As Sub-Interfaces each have their own layer-3 identities, they
    act as separate interfaces, forming their own links.</t>

    </section>

  <section anchor="impl" title="Implementation Considerations">

    <t>An implementation SHOULD provide the ability to configure each
    logical interface as L3ND speaking or not.</t>

    <t>An implementation SHOULD provide the ability to distribute one or
    more loopback addresses or interfaces into L3ND on an external L3ND
    speaking interface.</t>
    
    <t>An implementation SHOULD provide the ability to distribute one or
    more overlay and/or underlay addresses or interfaces into L3ND on an
    external L3ND speaking interface.</t>
    
    <t>An implementation SHOULD provide the ability to configure one of
    the addresses of an encapsulation as primary on an L3ND speaking
    interface.  If there is only one address for a particular
    encapsulation, the implementation MAY mark it as primary by
    default.</t>

    <t>An implementation MAY allow optional configuration which updates
    the local forwarding table with overlay and underlay data both
    learned from L3ND peers and configured locally.</t>
    
    </section>

  <section anchor="security" title="Security Considerations">

    <t>For TLS, versions greater than 1.1 MUST be used.</t>      

    <t>The protocol as is MUST NOT be used outside a datacenter or
    similarly closed environment without using TLS encapsulation which
    is based on a configured CA trust anchor.</t>

    <t>Many datacenter operators have a strange belief that physical
    walls and firewalls provide sufficient security.  This is not
    credible.  All DC protocols need to be examined for exposure and
    attack surface.  In the case of L3ND, authentication and integrity
    as provided by TLS validated to a configured shared CA trust anchor
    is strongly RECOMMENDED.</t>

    <t>It is generally unwise to assume that on-the-wire Layer-3 is
    secure.  Strange/unauthorized devices may plug into a port.
    Mis-wiring is very common in datacenter installations.  A poisoned
    laptop might be plugged into a device's port, form malicious
    sessions, etc. to divert, intercept, or drop traffic.</t>

    <t>Similarly, malicious nodes/devices could mis-announce
    addressing.</t>

    <t>If OPEN PDUs are not sent over validated TLS, an attacker could forge
    an OPEN for an existing session and cause the session to be
    reset.</t>

    </section>

    <section anchor="iana" title="IANA Considerations">

      <section anchor="iana-l3addr" title="Link Local Layer-3 Addresses">

        <t>IANA is requested to assign one address (TBD1) for
        L3DL-L3-LL from the IPv4 Multicast Address Space Registry from
        the Local Network Control Block (224.0.0.0 - 224.0.0.255
        (224.0.0/24)).</t>

	<t>IANA is requested to assign one address (TBD2) for L3DL-L3-LL
	from the IPv6 Multicast Address Space Registry in the IPv6
	Link-Local Scope Multicast address (TBD:2).</t>

      </section>

      <section anchor="iana-ports" title="Ports for TLS/TCP">

	<t>This document requests the IANA to assign a well-known TCP
	Port Number (TBD3) to the Layer-3 Neighbor Discovery Protocol for the
	following, see <xref target="tcp"/>:</t>

          <figure>
            <artwork>
          l3nd-server
             </artwork>
          </figure>
	  
      </section>

    <section anchor="iana-types" title="PDU Types">

      <t>This document requests the IANA create a registry for L3ND PDU
      Type, which may range from 0 to 255.  The name of the registry
      should be L3ND-PDU-Type.  The policy for adding to the registry is
      RFC Required per <xref target="RFC5226"/>, either standards track or
      experimental.  The initial entries should be the following:</t>
          <figure>
            <artwork>
          PDU
          Code      PDU Name
          ----      -------------------
            0       HELLO
            1       reserved
            2       OPEN
            3       ACK
            4       IPv4 Announcement
            5       IPv6 Announcement
            6       MPLS IPv4 Announcement
            7       MPLS IPv6 Announcement
            8-254   Reserved
            255     Vendor
             </artwork>
           </figure>

      </section>
           
    <section anchor="iana-flags" title="Flag Bits">
        
     <t>This document requests the IANA create a registry for L3ND PL
      Flag Bits, which may range from 0 to 7.  The name of the registry
      should be L3ND-PL-Flag-Bits.  The policy for adding to the registry is
      RFC Required per <xref target="RFC5226"/>, either standards track or
      experimental.  The initial entries should be the following:</t>
          <figure>
            <artwork>
          Bit     Bit Name
          ----    -------------------
           0      Announce/Withdraw (ann == 0)
           1      Primary
           2      Underlay/Overlay (under == 0)
           3      Loopback
           4-7    Reserved
             </artwork>
           </figure>

      </section>
           
    <section anchor="iana-error" title="Error Codes">
        
      <t>This document requests the IANA create a registry for L3ND Error
      Codes, a 16 bit integer.  The name of the registry should be
      L3ND-Error-Codes.  The policy for adding to the registry is RFC
      Required per <xref target="RFC5226"/>, either standards track or
      experimental.  The initial entries should be the following:</t>
          <figure>
            <artwork>
          Error
          Code    Error Name
          ----    -------------------
            0     No Error
            1     MPLS Error
            2     Logical Link Addressing Conflict
            3     Encapsulation Error
            4     Announce/Withdraw Error
            5     Session May Not Be Continued
             </artwork>
           </figure>

      </section>
           
    </section>

  <section anchor="acks" title="Acknowledgments">

    <t>The authors thank Ben Maddison and Jeff Haas.</t>
    
    </section>

  </middle>

<back>

  <references title="Normative References">
    <?rfc include="reference.RFC.2119.xml"?>
    <?rfc include="reference.RFC.3032.xml"?>
    <?rfc include="reference.RFC.4271.xml"?>
    <?rfc include="reference.RFC.5082.xml"?>
    <?rfc include="reference.RFC.5226.xml"?>
    <?rfc include="reference.RFC.5280.xml"?>
<!--<?rfc include="reference.RFC.6286.xml"?> -->
    <?rfc include="reference.RFC.8174.xml"?>
    <?rfc include="reference.RFC.8635.xml"?>
    <?rfc include="reference.I-D.ietf-lsvr-l3dl.xml"?>
<!--<?rfc include="reference.I-D.ietf-lsvr-l3dl-signing.xml"?> -->
    <reference anchor="IANA-PEN" target="https://www.iana.org/assignments/enterprise-numbers/enterprise-numbers">
      <front>
        <title>IANA Private Enterprise Numbers</title>
        <author/>
        <date/>
        </front>
      </reference>
    </references>

  <references title="Informative References">
    <?rfc include="reference.RFC.1122.xml"?>
    <?rfc include="reference.RFC.1982.xml"?>
    <?rfc include="reference.RFC.4760.xml"?>
    <?rfc include="reference.I-D.ymbk-idr-l3nd-ulpc.xml"?>
    <reference anchor="Clos" target="https://en.wikipedia.org/wiki/Clos_network/">
      <front>
        <title>Clos Network</title>
        <author/>
        <date/>
        </front>
      </reference>
    </references>
    
  </back>
</rfc>
