<?xml version="1.0" encoding="US-ASCII"?>
<!-- $Id: draft-boutros-spring-ng-vpls-00.xml 2015-07-05 sboutros $ -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!ENTITY rfc2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
]>
<rfc category="std" docName="draft-boutros-bess-elan-services-over-sr-03"
     ipr="trust200902" updates="">
  <?rfc toc="yes" ?>

  <?rfc compact="yes"?>

  <?rfc subcompact="no"?>

  <?rfc symrefs="yes"?>

  <?rfc sortrefs="yes" ?>

  <front>
    <title abbrev="ELAN Services with Segment Routing">A Simplified Scalable ELAN Service Model with Segment Routing Underlay</title>

    <author fullname="Sami Boutros" initials="S." surname="Boutros" role="editor">
   <organization>Ciena Corporation</organization>
      <address>
        <postal>
          <street></street>

          <city></city>

          <code></code>

          <region></region>

          <country>USA</country>
        </postal>

        <email>sboutros@ciena.com</email>
      </address>
    </author>

    <author fullname="Siva Sivabalan" initials="S." surname="Sivabalan" role="editor">
      <organization>Ciena Corporation</organization>
      <address>
        <postal>
          <street></street>

          <city></city>

          <code></code>

          <region></region>

          <country>Canada</country>
        </postal>

        <email>ssivabal@ciena.com</email>
      </address>
    </author>

    <author fullname="Himanshu Shah" initials="H." surname="Shah">
      <organization>Ciena Corporation</organization>

      <address>
        <postal>
          <street></street>

          <city></city>

          <code></code>

          <region></region>

          <country>USA</country>
        </postal>

        <email>hshah@ciena.com</email>
      </address>
    </author>

    <author fullname="James Uttaro" initials="J." surname="Uttaro">
      <organization>ATT</organization>

      <address>
        <postal>
          <street></street>

          <city></city>

          <code></code>

          <region></region>

          <country>USA</country>
        </postal>

        <email>ju1738@att.com</email>
      </address>
    </author>

    <author fullname="Daniel Voyer" initials="D." surname="Voyer">
      <organization>Bell Canada</organization>
    
      <address>
        <postal>
          <street></street>

          <city></city>

          <code></code>

          <region></region>

          <country>Canada</country>
        </postal>

        <email>daniel.voyer@bell.ca</email>
      </address>
    </author>

<author fullname="Bin Wen" initials="B." surname="Wen">
      <organization>Comcast</organization>

      <address>
        <postal>
          <street></street>

          <city></city>

          <code></code>

          <region></region>

          <country>USA</country>
        </postal>

        <email>bin_wen@cable.comcast.com</email>
      </address>
    </author>

   <author fullname="Luay Jalil" initials="L." surname="Jalil">
      <organization>Verizon</organization>

      <address>
        <postal>
          <street></street>

          <city></city>

          <code></code>

          <region></region>

          <country>USA</country>
        </postal>

        <email>luay.jalil@verizon.com</email>
      </address>
    </author>


    <date year="2021"/>

    <area>Routing</area>

    <workgroup>BESS Workgroup</workgroup>

<abstract>

<t>This document proposes a new approach for deploying Ethernet LAN (ELAN) services with the objectives of high scalability, faster network convergence, and reduced operational complexity. The approach also provides All-Active multihoming and MAC learning in the data plane.
</t> 

</abstract>
  </front>

  <middle>
    <section title="Introduction">

<t>Virtual Private LAN Service (VPLS) is based on the Pseudo-Wire (PW) construct, which identifies both the service type and the service termination node in both the control and data planes. RFC 4761 and RFC 4762 specify mechanisms to signal PWs for VPLS services using BGP and LDP, respectively. An ingress Provider Edge (PE) node needs to maintain a PW per VPLS instance for each egress PE node. So, if we assume 10K ELAN instances over a network of 100 PE nodes, each PE node needs to set up and maintain approximately 1M PWs, which can easily become a scalability bottleneck in large-scale deployments.</t>

<t>As described in RFC 7432, Ethernet Virtual Private Network (EVPN) technology builds ELAN services similar to BGP-based IP-VPN services, with additional features such as MAC address learning in the control plane and All-Active multihoming. It eliminates the need for PWs, and hence the scale problem associated with them. However, an egress PE node cannot unambiguously identify the ingress PE node in the data plane. As such, EVPN requires control plane mechanisms for MAC advertisement and learning, which increase control plane complexity and overhead.
</t>


<t>The goal of the proposed approach is to greatly simplify control plane functions and minimize the number of control plane messages PE nodes have to process. In this version of the document, we assume a Segment Routing (SR) underlay network. A future version of this document will generalize the underlay network to both classical MPLS and SR technologies.
</t>

<t>The proposed approach does not require PWs, and hence the control plane complexity and message overhead associated with signaling and maintaining PWs are eliminated.
</t>

<t>An ELAN instance is uniquely identified by a Segment Identifier (SID) regardless of the number of service termination points. Such a SID is referred to as a "Service SID" in the rest of the document. The number of states maintained at a PE node is equal to the number of ELAN instances in the corresponding broadcast domain. Referring to the above example, each PE node now needs to maintain state for 10K ELAN service instances, as opposed to 1M PWs in the data and control planes with the classical VPLS model. A node can advertise the Service SID(s) of the ELAN instance(s) that it hosts via BGP for auto-discovery purposes. A Service SID can be:

     <list style="symbols">
		<t>An MPLS label for SR-MPLS.</t>
		<t>A uSID (micro SID) for SRv6, representing the network function associated with an ELAN service instance.</t>
	</list>
</t>
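<t>The state comparison above can be checked with a short calculation. The sketch below is illustrative only: it assumes that every ELAN instance is present on every PE node, and the function names are hypothetical.</t>

<figure align="left">
          <preamble/>

          <artwork align="left"><![CDATA[
```python
def pw_state_per_pe(elan_instances: int, pe_nodes: int) -> int:
    """PWs one PE maintains: one per instance per remote PE."""
    return elan_instances * (pe_nodes - 1)


def service_sid_state_per_pe(elan_instances: int) -> int:
    """Service SID entries one PE maintains: one per instance."""
    return elan_instances


if __name__ == "__main__":
    # Figures from the example in the text: 10K instances, 100 PEs.
    print(pw_state_per_pe(10_000, 100))      # 990000, roughly 1M
    print(service_sid_state_per_pe(10_000))  # 10000
```
]]></artwork>
        </figure>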


<t>MAC addresses are learned in the data plane. The source node of a MAC address is identified by its node SID (assigned for regular SR operation) during the MAC learning phase. In the data packets, the node SID of the source is inserted directly below the service SID so that a destination node can uniquely identify the source of the packets in an SR domain.
</t>

<t>ELAN service instances are advertised such that a service message packs as many of the ELAN instances hosted by the advertising PE node as possible at the time of advertisement. A possible approach is to advertise a bitmap in which each bit position represents an ELAN instance, together with the starting value of the Service SID. Using these parameters, an ingress PE node receiving the advertisements can learn the ELAN instance(s) hosted by an egress PE node.
</t>
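<t>A minimal sketch of the bitmap idea follows. The encoding is not specified in this document; the sketch assumes that bit i of the bitmap corresponds to Service SID (starting value + i). The function names and the SID values used with them are hypothetical.</t>

<figure align="left">
          <preamble/>

          <artwork align="left"><![CDATA[
```python
from typing import Iterable, List


def encode_instances(start_sid: int, sids: Iterable[int]) -> int:
    """Set bit i when Service SID start_sid + i is hosted."""
    bitmap = 0
    for sid in sids:
        offset = sid - start_sid
        if offset < 0:
            raise ValueError("Service SID below the starting value")
        bitmap |= 1 << offset
    return bitmap


def decode_instances(start_sid: int, bitmap: int) -> List[int]:
    """Recover hosted Service SIDs from start value and bitmap."""
    sids = []
    offset = 0
    while bitmap:
        if bitmap & 1:
            sids.append(start_sid + offset)
        bitmap >>= 1
        offset += 1
    return sids
```
]]></artwork>
        </figure>

<t>With this assumed encoding, a PE hosting Service SIDs 20000, 20002, and 20005 would advertise the starting value 20000 and a bitmap with bits 0, 2, and 5 set.</t>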

<t>All-Active multihoming redundancy is supported at the underlay level by making use of an SR anycast SID. No overlay mechanism is required for this purpose.
</t>


<t>Each node is also associated with another SID, unique within the broadcast domain, that is used to identify incoming Broadcast, Unknown-unicast, and Multicast (BUM) traffic. We call such a SID a BUM SID. If node A wants to send BUM traffic to node B, it needs to use the BUM SID assigned to node B as the destination SID. BUM SIDs can also be advertised via BGP for auto-discovery purposes. In order to send BUM traffic within a broadcast domain, P2MP SR policies can be used. Such policies may or may not be shared by ELAN instances.
</t>

<t>The proposed solution is also applicable to the EVPN control plane without compromising its benefits, such as All-Active multihoming on access, multipathing in the core, auto-provisioning, and auto-discovery. With this approach, the need for advertising EVPN route types 1 through 4 as well as the Split-Horizon (SH) label is eliminated.
</t> 

   
<t>In the following sections, we will describe the functionalities of the proposed approach in detail.
</t>


</section>

<section title="Terminology">
<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in <xref target="RFC2119"/>.</t>
</section>

<section title="Abbreviations">

<t>BUM: Broadcast, Unknown-unicast, and Multicast.</t>

<t>CE: Customer Edge node, e.g., a host, router, or switch.</t>

<t>ELAN: Ethernet LAN.</t>

<t>EVPN: Ethernet VPN.</t>

<t>MAC: Media Access Control.</t> 

<t>MAC-VRF: A Virtual Routing and Forwarding table for Media Access Control (MAC) addresses on a PE.</t>

<t>MH: Multi-Home. </t>

<t>OAM: Operations, Administration and Maintenance.</t>  

<t>PE: Provider Edge node.</t>

<t>SID: Segment Identifier.</t>

<t>SR: Segment Routing.</t>

<t>VPLS: Virtual Private LAN Service. </t>

</section>

<section title="Control Plane Behavior">
<section title="Service discovery">

<t>A node can discover ELAN service instances, as well as the associated service SIDs hosted on other nodes, via configuration or auto-discovery. With the latter, the service SIDs can be advertised using BGP. As mentioned earlier, such an update message packs information about as many of the ELAN instances hosted by the advertising PE node as possible, in order to reduce the number of update messages exchanged by PE nodes.</t>

<t>Similar to the service SID, an ingress PE node can discover the BUM SID associated with an egress PE node via configuration or auto-discovery.</t>

<t>The necessary BGP extensions will be specified in a future version of this document.</t>

</section>

<section title="All-Active Service Redundancy">

<t>An anycast SID per Ethernet Segment (ES) can be associated with the PE nodes attached to a Multi-Home (MH) CE. The anycast SIDs will be advertised in BGP by the PE nodes. Based on the ES anycast SIDs, ingress PEs receiving the updates can discover the redundancy membership and perform Designated Forwarder (DF) election. Aliasing/Multipathing can be achieved using the same mechanisms exercised by the SR underlay for forwarding traffic to destinations belonging to an anycast group.
</t>
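<t>Discovery of redundancy membership from the advertised anycast SIDs can be sketched as a simple grouping step. This is an illustration under stated assumptions, not a specified procedure: the advertisement is modeled as a (PE name, anycast SID) pair, the SID values are made up, and DF election itself is not shown because this document does not define its algorithm.</t>

<figure align="left">
          <preamble/>

          <artwork align="left"><![CDATA[
```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def redundancy_groups(adverts: Iterable[Tuple[str, int]]
                      ) -> Dict[int, List[str]]:
    """Group advertising PEs by the ES anycast SID they announce."""
    groups: Dict[int, List[str]] = defaultdict(list)
    for pe, anycast_sid in adverts:
        # Ignore a repeated advertisement from the same PE.
        if pe not in groups[anycast_sid]:
            groups[anycast_sid].append(pe)
    return {sid: sorted(pes) for sid, pes in groups.items()}
```
]]></artwork>
        </figure>

<t>For the network in Figure 1, advertisements from PE5 and PE6 carrying the same hypothetical anycast SID would yield one group containing both PEs.</t>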

</section>

<section title="Mass service withdrawal">

<t>Node failure can be detected via IGP convergence. For faster detection of node failure, mechanisms such as BFD can be deployed. The proposed approach does not require an additional MAC withdrawal mechanism.</t>

<t>On PE-CE link failure, the corresponding PE node withdraws the route to the corresponding ES in BGP in order to stop receiving traffic for that ES. In the MH case with an anycast SID, upon detecting a failure on a PE-CE link, a PE node may, for faster convergence, forward incoming traffic for the impacted ES(s) to the other PE node(s) in the anycast group until it withdraws the routes to the impacted ES(s). For example, in Figure 1, assuming PE5 and PE6 are part of an anycast group, upon link failure between PE5 and CE5, PE5 can forward the packets received from the core to PE6 until it withdraws the anycast SID associated with the ES(s).</t>

</section>

<section title="E-Tree Support">
<t>To be covered in the next revision of this document.</t>
</section>

</section>

<section title="Data Plane Behavior">

<figure align="left">
          <preamble/>

          <artwork align="left"><![CDATA[ 
                                     ____ CE3
                                    /               ____CE1
                         --------  PE3 ---------  /
                        /                       PE1
                       /                         | \
                      PE5                        |  \
                     /|                          |   \
                    / | Service Provider Network |    \
                CE5   |                          |     CE2
                   \  |                          |   /
                    \ |                         PE2_/
                      PE6                       /
                      /  --------  PE4  --------
              CE6___ /     CE4_____/
                                         
                                         
      Figure 1: Reference network diagram used for examples below
]]></artwork>
        </figure>

<section title="Unicast Traffic">

<t>The proposed method requires that a unicast data packet be formed as shown in Figure 2.</t>

<figure align="left">
          <preamble/>

          <artwork align="left"><![CDATA[

                      +-------------------------------+
                      | SID(s) to reach destination   |
                      +-------------------------------+
                      |          Service SID          |
                      +-------------------------------+
                      |        Source node SID        |
                      +-------------------------------+
                      |        Layer-2 Payload        |
                      +-------------------------------+

        Figure 2: Data packet format for unicast traffic 
]]></artwork>
        </figure>

	<t>
	<list style="symbols">
	<t>SID(s) to reach destination: depends on the intent of the underlay transport:
	
	<list style="symbols">
<t>IGP shortest path: node SID of the destination. The destination can belong to an anycast group.</t>
<t>IGP path with intent: Flex-Algo SID if the destination can be reached using the Flex-Algo SID for a specific intent (e.g., low latency). The destination can belong to an anycast group.</t>
<t>SR policy (to support fine intent): a SID-list for the SR policy that can be used to reach the destination.</t>
	</list>
</t>	
	<t>Service SID: The SID that uniquely identifies an ELAN instance in a broadcast domain.</t>

	<t>Source node SID: The SID that uniquely identifies the source node. This can be a node SID which may be part of an anycast group. Note that such a SID is allocated as part of SR underlay operation, and the proposed approach does not impose any additional requirement.</t>

	</list>
	</t>
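<t>The ordering of the SIDs in Figure 2 can be sketched as follows. This is a minimal illustration that models the SID stack as a Python list with the top of the stack first; the function names and SID values are hypothetical, and the choice of transport SID(s) is left to the caller.</t>

<figure align="left">
          <preamble/>

          <artwork align="left"><![CDATA[
```python
from typing import List, Tuple


def build_unicast_stack(transport_sids: List[int], service_sid: int,
                        source_node_sid: int) -> List[int]:
    """Ingress PE: order the SIDs top-first as in Figure 2."""
    if not transport_sids:
        raise ValueError("need at least one SID to reach the egress")
    return list(transport_sids) + [service_sid, source_node_sid]


def read_service_and_source(stack: List[int]) -> Tuple[int, int]:
    """Egress PE: once transport SIDs are consumed, two SIDs remain."""
    if len(stack) != 2:
        raise ValueError("expected [service SID, source node SID]")
    return stack[0], stack[1]
```
]]></artwork>
        </figure>

<t>Because the source node SID always sits directly below the service SID, the egress PE can read both without any per-source state.</t>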

</section>

<section title="BUM Traffic">

<t>In order to identify incoming BUM traffic, a unique SID per PE node is allocated (referred to as the "BUM SID" in the rest of the document). A BUM packet is formatted as shown in Figure 3:</t>

<t>
<figure align="left">
          <preamble/>

          <artwork align="left"><![CDATA[

                      +-------------------------------+
                      |            BUM SID            |
                      +-------------------------------+
                      |          Service SID          |
                      +-------------------------------+
                      |         Source node SID       |
                      +-------------------------------+
                      |        Layer-2 Payload        |
                      +-------------------------------+

        Figure 3: Data packet format for BUM traffic
 
]]></artwork>
        </figure>
</t>

<t>In order to send BUM traffic, a P2MP SR policy may be established from a given node to the rest of the nodes associated with an ELAN instance. If a dedicated P2MP SR policy is used per ELAN instance, a single SID may be used both as the replication SID for the P2MP SR policy and to identify the ELAN instance. With this approach, only two SIDs are imposed on the data packet. It is also possible to use a given P2MP SR policy for multiple ELAN instances, in which case the service SID needs to be inserted in the packet so that the egress PE can identify the ELAN instance for the BUM traffic.</t>
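<t>The two cases can be sketched with a single helper. The sketch is illustrative: it models the SID stack as a list with the top of the stack first, and the function and parameter names are hypothetical.</t>

<figure align="left">
          <preamble/>

          <artwork align="left"><![CDATA[
```python
from typing import List, Optional


def build_bum_stack(tree_sid: int, source_node_sid: int,
                    service_sid: Optional[int] = None) -> List[int]:
    """Return the SID stack, top first, for a BUM packet.

    With a dedicated P2MP SR policy per ELAN instance, tree_sid
    also identifies the instance and service_sid is left as None.
    """
    if service_sid is None:
        return [tree_sid, source_node_sid]
    return [tree_sid, service_sid, source_node_sid]
```
]]></artwork>
        </figure>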

</section>

<section title="Data Plane MAC learning">
	
<t>With the proposed approach, MAC addresses can be learned in the data plane using packets formatted as shown in Figure 4.</t>

<t>The source MAC address of the received Layer 2 packet is learned against the source node SID, which is placed directly below the service SID in the data plane.</t>
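<t>A minimal sketch of such a learning table is shown below. It assumes that entries are keyed by the pair (service SID, MAC address) and that a later packet from a different source node SID simply overwrites the entry; the class and method names are hypothetical, and aging is not modeled.</t>

<figure align="left">
          <preamble/>

          <artwork align="left"><![CDATA[
```python
from typing import Dict, Optional, Tuple


class MacTable:
    """Per-PE MAC table keyed by (Service SID, MAC address)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[int, str], int] = {}

    def learn(self, service_sid: int, src_mac: str,
              source_node_sid: int) -> None:
        """Record that src_mac is reachable via source_node_sid."""
        self._entries[(service_sid, src_mac.lower())] = source_node_sid

    def lookup(self, service_sid: int, dst_mac: str) -> Optional[int]:
        """Return the node SID to use, or None for unknown unicast."""
        return self._entries.get((service_sid, dst_mac.lower()))
```
]]></artwork>
        </figure>

<t>A lookup that returns no entry corresponds to unknown unicast, which is handled as BUM traffic.</t>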

<section title="Single Home CE">

	<t>In Figure 1, node 3 (PE3) learns a MAC address from CE3 and floods it to all nodes configured with the same service SID. Nodes 1, 2, 4, 5, and 6 learn the MAC address as reachable via the source node SID of node 3.</t>
  
	<figure align="left">
          <preamble/>

          <artwork align="left"><![CDATA[
                      
                      +-----------------------------+
                      | Tree SID/Broadcast Node SID |
                      +-----------------------------+
                      |  Service SID                |
                      +-----------------------------+
                      |  Node SID of node 3         |
                      +-----------------------------+
                      |  Layer-2 Packet             |
                      +-----------------------------+
                 
        Figure 4: Packet format used for flooding
]]></artwork>
        </figure>

	</section>

	<section title="Multi-Home CE">

<t>Referring to Figure 1, let's assume that node 5 learns a MAC address from MH CE5 and floods it in the data plane to all nodes, including node 6, using the SID stack shown in Figure 5. The receiving nodes learn the MAC address as reachable via the anycast SID belonging to node 5 and node 6. Node 6 applies Split-Horizon (SH) filtering and hence does not send the packet back to CE5; it treats the MAC address as reachable via CE5 and also floods the packet to CE6.</t>

    	<t>The following diagram shows the SID label stack for a Broadcast and Multicast MAC frame sent by a Multi-Home PE. Note the presence of the source SID after the service SID. This combination and order are necessary for the receiver to learn the source MAC address (from the L2 packet) as associated with the ingress PE (i.e., the source node SID).</t>

  	<figure align="left">
          <preamble/>

          <artwork align="left"><![CDATA[

                      +-----------------------------+
                      | Tree SID/Broadcast Node SID |
                      +-----------------------------+
                      |  Service SID                |
                      +-----------------------------+
                      |  Source Node SID            |
                      +-----------------------------+
                      |  Layer-2 Packet             |
                      +-----------------------------+

        Figure 5: Data packet format for traffic sent by a MH PE 
	]]></artwork>
        </figure>
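<t>One way to picture the SH decision at node 6 is sketched below. The SH procedure is not specified in this document; the sketch assumes that a PE suppresses flooding toward a local ES whose anycast SID equals the source node SID carried in the packet. The function name, the CE-to-SID mapping, and the SID values are hypothetical.</t>

<figure align="left">
          <preamble/>

          <artwork align="left"><![CDATA[
```python
from typing import Dict, List, Optional


def local_flood_targets(source_sid: int,
                        attachments: Dict[str, Optional[int]]
                        ) -> List[str]:
    """Return the local CEs that should receive a flooded packet.

    attachments maps a CE name to the anycast SID of its Ethernet
    Segment, or to None when the CE is single-homed.
    """
    return sorted(ce for ce, es_sid in attachments.items()
                  if es_sid is None or es_sid != source_sid)
```
]]></artwork>
        </figure>

<t>With CE5 attached to node 6 through an ES that uses the anycast SID shared with node 5, and CE6 single-homed, a packet carrying that anycast SID as its source is flooded to CE6 only.</t>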

	</section>


</section>

<section title="ARP suppression">
<t>ARP requests and replies are gleaned to learn IP/MAC bindings for ARP suppression. ARP replies are unicast; however, flooding ARP replies allows all nodes to learn the MAC/IP bindings for the destinations as well.</t>
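<t>The gleaning step can be sketched as a small cache. This is an illustration under stated assumptions: bindings are taken from the sender fields of any ARP packet seen by the PE, entries do not age out, and the class and method names are hypothetical.</t>

<figure align="left">
          <preamble/>

          <artwork align="left"><![CDATA[
```python
from typing import Dict, Optional


class ArpSuppressionCache:
    """IP-to-MAC bindings gleaned from ARP requests and replies."""

    def __init__(self) -> None:
        self._bindings: Dict[str, str] = {}

    def glean(self, sender_ip: str, sender_mac: str) -> None:
        """Record the sender binding carried in any ARP packet."""
        self._bindings[sender_ip] = sender_mac

    def answer_locally(self, target_ip: str) -> Optional[str]:
        """Return the MAC for a local reply, or None to flood."""
        return self._bindings.get(target_ip)
```
]]></artwork>
        </figure>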

</section>

<section title="Distributed Anycast Gateway">
<t>Distributed Anycast Gateway (GW) (aka inter-subnet IRB function) can be realized as follows:

     <list style="symbols">
		<t>All PEs connected to the tenant subnets share the same GW IP/MAC per subnet.</t>
		<t>A PE MUST NOT learn its own GW IP/MAC via the tunnels connecting itself to other PE(s).</t>
		<t>ARP requests/replies from the tenant subnet are flooded via the ingress PE(s) attached to the subnet to all egress PE(s) attached to the subnet so that egress PE(s) can learn the source MAC/IP address via the ingress PE(s).</t>
		<t>ARP replies from tenants are delivered to the local PE that hosts the GW virtual MAC address. The local PE MUST flood the ARP replies over the tunnel to other PEs. Other PEs, including the PE that originated the ARP request, will learn the IP/MAC association of the tenant from the received ARP reply.</t>
	</list>
	
</t>

</section>


<section title="Multi-pathing">
<t>Packets destined to an MH CE are distributed across the PE nodes attached to the CE for load-balancing purposes. This is achieved implicitly through the use of anycast SIDs for both the ES and the PEs attached to the ES. In our example, traffic destined to CE5 is distributed via PE5 and PE6.</t>
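<t>The effect of anycast-based distribution can be pictured with a per-flow hash. The sketch below is not a procedure defined by this document: the real path selection is performed by the SR underlay, and the CRC32 hash, the flow key, and the function name are assumptions made for illustration.</t>

<figure align="left">
          <preamble/>

          <artwork align="left"><![CDATA[
```python
import zlib
from typing import List


def pick_anycast_member(flow_key: bytes, members: List[str]) -> str:
    """Map a flow to one PE of the anycast group, stable per flow."""
    if not members:
        raise ValueError("anycast group has no members")
    # Sort so the choice does not depend on the input order.
    ordered = sorted(members)
    return ordered[zlib.crc32(flow_key) % len(ordered)]
```
]]></artwork>
        </figure>

<t>Packets of the same flow always map to the same PE, while different flows toward CE5 can be spread over PE5 and PE6.</t>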
</section>

<section title="E-Tree Support">
<t>To be covered in the next revision of this document.</t>
</section>

</section>

   <section title="Benefits of ELAN over SR">
	
	<t>The proposed approach eliminates the need for establishing and maintaining PWs as with legacy VPLS technology. This yields a significant reduction in control plane overhead. Also, due to MAC learning in the data plane (conversational MAC learning), the proposed approach provides benefits such as fast convergence and fast MAC movement. Finally, using anycast SIDs, the proposed approach provides All-Active multihoming as well as multipathing and ARP suppression.
	</t>


   </section>

   <section title="Security Considerations">

   	<t>The mechanisms in this document use the Segment Routing control plane. The security considerations described for the Segment Routing control plane are equally applicable.</t>

    </section>
   <section title="IANA Considerations">
   <t>TBD. </t>
  </section>

  <section title="Acknowledgements">

  </section>
  </middle>
  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119"?>
      <?rfc include="reference.RFC.8402"?>
      <?rfc include="reference.RFC.8660"?>
      <?rfc include="reference.RFC.8754"?>
    </references> 

    <references title="Informative References">
	<?rfc include="reference.RFC.4761"?>
	<?rfc include="reference.RFC.4762"?>
      <?rfc include="http://xml.resource.org/public/rfc/bibxml3/reference.I-D.ietf-spring-segment-routing-policy"?>
	<?rfc include="http://xml.resource.org/public/rfc/bibxml3/reference.I-D.voyer-pim-sr-p2mp-policy"?>
    </references> 

  </back>
</rfc>

