<?xml version="1.0" encoding="utf-8"?>
  <?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
  <!-- generated by https://github.com/cabo/kramdown-rfc version 1.6.17 (Ruby 3.0.4) -->


<!DOCTYPE rfc  [
  <!ENTITY nbsp    "&#160;">
  <!ENTITY zwsp   "&#8203;">
  <!ENTITY nbhy   "&#8209;">
  <!ENTITY wj     "&#8288;">

]>

<?rfc comments="yes"?>

<rfc ipr="trust200902" docName="draft-mackenzie-bess-evpn-l3mh-proto-01" category="std" consensus="true" tocInclude="true" sortRefs="true" symRefs="true">
  <front>
    <title abbrev="EVPN L3MH">EVPN multi-homing support for L3 services</title>

    <author initials="M." surname="MacKenzie" fullname="Michael MacKenzie" role="editor">
      <organization abbrev="Cisco">Cisco Systems</organization>
      <address>
        <email>mimacken@cisco.com</email>
      </address>
    </author>
    <author initials="P." surname="Brissette" fullname="Patrice Brissette" role="editor">
      <organization abbrev="Cisco">Cisco Systems</organization>
      <address>
        <email>pbrisset@cisco.com</email>
      </address>
    </author>
    <author initials="S." surname="Matsushima" fullname="Satoru Matsushima">
      <organization abbrev="Softbank">Softbank</organization>
      <address>
        <email>satoru.matsushima@g.softbank.co.jp</email>
      </address>
    </author>
    <author fullname="Wen Lin" initials="W." surname="Lin">
     <organization>Juniper</organization>
     <address>
       <email>wlin@juniper.com</email>
     </address>
    </author>
    <author fullname="Jorge Rabadan" initials="J." surname="Rabadan">
     <organization>Nokia</organization>
     <address>
       <email>jorge.rabadan@nokia.com</email>
     </address>
    </author>
    <date year="2022"/>

    <area>Routing</area>
    <workgroup>BESS Working Group</workgroup>
    <keyword>Internet-Draft</keyword>

    <abstract>
<t>This document describes the machinery that extends the higher network
availability and load-balancing benefits of EVPN Multi-Chassis Link Aggregation
Group (MC-LAG) to various L3 services delivered by EVPN.</t>
    </abstract>

    <note title="Requirements Language">

<t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL
NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED",
"MAY", and "OPTIONAL" in this document are to be interpreted as
described in BCP 14 <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when, they
appear in all capitals, as shown here.</t>
    </note>

  </front>

  <middle>


<section anchor="Introduction"><name>Introduction</name>

<t>Resilient L3VPN service to a CE requires multiple service PEs to run a
Multi-Chassis Link Aggregation Group mechanism, which previously required a 
proprietary ICL control plane link between them.</t>

<t>This extension to <xref target="RFC9135"/> and to <xref target="RFC9136"/> 
brings EVPN based MC-LAG all-active multi-homing load-balancing to various 
services (L2 and L3) delivered by EVPN.
Although this solution is also applicable to some L2 service use cases
(for example, Centralized Gateway), this document focuses on
the L3VPN <xref target="RFC4364"/> use case to provide examples.</t>

<t>EVPN ESI-LAG is completely transparent to a CE device,
and provides link and node level redundancy with load-balancing using the
existing BGP control plane required by the L3 services.</t>

<t>For example, the L3VPN service can be MPLS, VXLAN or SRv6 based, and does not
require EVPN signaling to remote neighbors.
The EVPN signaling is limited to the redundant service PEs sharing
an Ethernet Segment Identifier (ESI).
It is used to synchronize ARP/ND, multicast Join/Leave, and IGP routes,
replacing the need for an ICL link.</t>

<figure title="EVPN MC-LAG Topology" anchor="fig-mclag-bundle"><artwork><![CDATA[
                    +-----+
                    | PE3 |
                    +-----+
                 +-----------+
                 |  MPLS/IP  |
                 |  CORE     |
                 +-----------+
               +-----+   +-----+
               | PE1 |   | PE2 |
               +-----+   +-----+
                  |         |
                  I1       I2
                    \     /
                     \   /
                     +---+
                     |CE1|
                     +---+
]]></artwork></figure>

<t><xref target="fig-mclag-bundle"/> shows an MC-LAG multi-homing topology where PE1 and PE2 are
part of the same redundancy group providing multi-homing to CE1 via
interfaces I1 and I2. PE1, PE2 and PE3 are attached to the same L3VPN through the
core (running <xref target="RFC4364"/> and/or <xref target="RFC9136"/> procedures).
Interfaces I1 and I2 are Bundle-Ethernet interfaces running LACP.
The CE device can be a layer-2 or layer-3 device connecting to the redundant
PEs over a single LACP LAG port.  In the case of a layer-3 CE device, this
document addresses an IGP adjacency between the PEs and the CE.
Further study is needed to support BGP as the PE-to-CE protocol.
The core, shown as IP or MPLS enabled, provides a wide range of L3 services.
MC-LAG multi-homing functionality is decoupled from those services in
the core and focuses on providing multi-homing to the CE.</t>

<t>To deliver resilient layer-3 services and provide traffic load-balancing towards
the access, the two service PEs advertise layer-3
reachability towards the layer-3 core, and both are eligible to receive
traffic and forward it towards the access.</t>

<section anchor="problem-unicast"><name>Problems with unicast load-balancing from core to CE</name>

<t>The layer-2 hashing performed by the CE over its LAG port means that it is possible
for only one service PE to populate its ARP/ND cache.
Take for example PE1 and PE2 from <xref target="fig-mclag-bundle"/>.  If CE1's ARP/ND responses
always hash over I1 towards PE1, then PE2's ARP/ND table remains empty.
Since unicast traffic from remote PEs can be received by either service PE,
traffic that reaches service PE2 does not find an ARP/ND entry matching
the host IP address and is dropped until PE2's ARP/ND table is updated.</t>

<t>If the CE's hash implementation always sends the ARP/ND response towards
PE1, the resolution on PE2 never succeeds and traffic load-balanced to
PE2 is permanently dropped.</t>
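
<t>The failure mode above can be illustrated with a small model. The following
Python sketch is not part of the proposed procedures: the hash function, the
host address and the MAC address are assumptions chosen only to show that a
deterministic CE hash leaves one PE without an adjacency.</t>

<sourcecode type="python"><![CDATA[
# Toy model of the unicast problem; names and addresses are hypothetical.
LAG_MEMBERS = ["I1", "I2"]                  # CE1 LAG members
MEMBER_TO_PE = {"I1": "PE1", "I2": "PE2"}   # I1 faces PE1, I2 faces PE2


def lag_member_for(src_mac):
    """Stand-in for the CE layer-2 hash: same source MAC, same member."""
    key = int(src_mac.replace(":", ""), 16)
    return LAG_MEMBERS[key % len(LAG_MEMBERS)]


def learn_from_ce(arp_tables, host_ip, host_mac):
    """Deliver the CE ARP/ND response to the PE selected by the hash."""
    pe = MEMBER_TO_PE[lag_member_for(host_mac)]
    arp_tables[pe][host_ip] = host_mac
    return pe


def forward_from_core(arp_tables, pe, dst_ip):
    """A PE can only forward towards the CE if it holds an adjacency."""
    return "forward" if dst_ip in arp_tables[pe] else "drop"


arp_tables = {"PE1": {}, "PE2": {}}
# Every response from this host hashes to the same member, so only one
# PE learns the adjacency, however many responses are sent.
for _ in range(3):
    learned_on = learn_from_ce(arp_tables, "192.0.2.10",
                               "00:00:5e:00:53:02")
other_pe = "PE2" if learned_on == "PE1" else "PE1"
]]></sourcecode>

<t>In this model, traffic arriving from the core at the PE named by other_pe is
dropped for as long as its table stays empty, which is the gap the route
synchronization is meant to close.</t>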

<t>The route sync solution is described in <xref target="solution-route-sync"/>.</t>

</section>
<section anchor="problem-multicast"><name>Problems with multicast from core to CE</name>

<t>As with the unicast behavior above, multicast IGMP/MLD join messages from the CE
over the LAG link may always hash to a single PE.</t>

<t>When PIM runs on both redundant layer-3 PEs, both serving
multicast for the same access segment, PIM hello messages <xref target="RFC7761"/> issued on
I1 (<xref target="fig-mclag-bundle"/>) are not received on I2, and vice versa:
PIM hello messages issued on I2 are not received on I1. This is because the CE cannot
switch traffic between two members of the same LAG. Both PEs
therefore become PIM Designated Router (DR). The PIM DR is responsible for
tracking local multicast listeners and forwarding traffic to those listeners.
The PIM DR is also responsible for sending local Join/Prune messages towards
the RP or source. However, due to the CE hashing, a particular IGMP join for
a given multicast group is received by only one of the PEs. Only that PE programs the
multicast route for the group and issues a PIM join message.</t>
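
<t>The reason both PEs become DR can be sketched as follows. This Python
fragment uses a deliberately simplified election (highest address wins, DR
priority ignored) and hypothetical PE addresses; it only illustrates that a PE
which sees no PIM hello from its peer elects itself.</t>

<sourcecode type="python"><![CDATA[
import ipaddress


def is_pim_dr(own_addr, neighbor_addrs):
    """Simplified DR election: highest address wins, priority ignored.

    neighbor_addrs holds only the neighbors whose PIM hellos were seen.
    """
    own = ipaddress.ip_address(own_addr)
    return all(own > ipaddress.ip_address(n) for n in neighbor_addrs)


# Hellos do not cross the CE LAG, so each PE sees no neighbor at all.
pe1_is_dr = is_pim_dr("192.0.2.1", [])
pe2_is_dr = is_pim_dr("192.0.2.2", [])
# On a shared LAN, PE1 would have seen PE2 and yielded the DR role.
pe1_is_dr_on_lan = is_pim_dr("192.0.2.1", ["192.0.2.2"])
]]></sourcecode>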

<t>The multicast route sync solution is described in <xref target="solution-igmp-route-sync"/>.</t>

</section>
<section anchor="problem-adj"><name>Problems with IGP adjacencies over the LAG port</name>

<t>A layer-3 CE device/router that connects to the redundant PEs may
establish an IGP adjacency on the bundle port. In this case, the adjacency is
formed with only one of the PEs, and the IGP customer route(s) are only present on
that PE.</t>

<t>This prevents the use case from benefiting from the load balancing offered by
the redundant PEs, as only one PE is aware of the customer routes and advertises
them to the core.</t>

<figure title="IGP Adjacency over LAG Port" anchor="fig-igp-adjacency"><artwork><![CDATA[
                  <---------+
                            | IGP Adj
    +-------+               |
    |       | 1.1.1.1/24    |
    | PE1   +-----------+   |
    |       |           |   |
    |       |           |   +
    +-------+           |
                        |
        +               |  +------+
  RT5   |             L |  | CE   +------>H1
  Sync  |             A +->+      |
        v             G |  |      |
                        |  |      +------>R1
    +-------+           |  +------+
    |       |           |    1.1.1.2/2
    | PE2   +-----------+
    |       | 1.1.1.1/24
    |       |
    +-------+

]]></artwork></figure>

<t><xref target="fig-igp-adjacency"/> provides an example of this use case, where the CE forms
an IGP adjacency with PE1 (for example, IS-IS or OSPF),
and advertises its H1 and R1 routes into the IP-VRF
of PE1. PE1 may then redistribute these IGP routes into the core as an L3
service. Any remote PEs are only aware of the service from PE1, and cannot
load balance through PE2 as well.</t>

<t>Further study is required to support the case of BGP PE to CE protocols.</t>

<t>A solution to this is described in <xref target="solution-subnet-sync"/></t>

</section>
<section anchor="problem-ac-aware"><name>Problems with supporting multiple subnets on the same ES in all-active mode</name>

<t>In the case where the L3 service is L3VPN such as <xref target="RFC4364"></xref>, it is likely
the CE device could be a layer-2 switch supporting multiple subnets through
the use of VLANs. In addition, each VLAN may be associated with a different
customer VRF.</t>

<t>When ARP/ND routes are synchronized between the PEs for ARP proxy support using
RT-2, a similar problem is encountered as described by Section 1.1
of <xref target="I-D.sajassi-bess-evpn-ac-aware-bundling"/>.  The PE receiving RT-2 is
unable to determine which sub-interface the ARP/ND entry is associated with.</t>

<t>When IGMP/MLD routes are synchronized between the PEs using RT-7 and RT-8, a similar
problem is encountered as described by Section 1.2
of <xref target="I-D.sajassi-bess-evpn-ac-aware-bundling"/>.  The PE receiving RT-7 and RT-8
is unable to determine which sub-interface the IGMP join is associated with.</t>

<t>This document proposes to use the solution defined by Section 4
of <xref target="I-D.sajassi-bess-evpn-ac-aware-bundling"/> to solve both these cases.
All route sync messages (RT-2, RT-5, RT-7, RT-8) carry an Attachment Circuit
Identifier Extended Community to signal which sub-interface the routes
were learnt on.</t>

<t>This document focuses on configuration models over access-facing interfaces
with L3 sub-interfaces. Models with both L2 and L3 sub-interfaces on an
interface are left for future study.</t>

</section>
<section anchor="acronyms"><name>Acronyms</name>

<dl>
  <dt>BD:</dt>
  <dd>
    <t>Broadcast Domain</t>
  </dd>
  <dt>BE:</dt>
  <dd>
    <t>Bundle Ethernet</t>
  </dd>
  <dt>DF:</dt>
  <dd>
    <t>Designated Forwarder</t>
  </dd>
  <dt>DR:</dt>
  <dd>
    <t>Multicast Designated Router</t>
  </dd>
  <dt>EC:</dt>
  <dd>
    <t>BGP Extended Community</t>
  </dd>
  <dt>ES:</dt>
  <dd>
    <t>Ethernet Segment. When a customer site (device or network) is connected to
 one or more PEs via a set of Ethernet links, then that set of links is
 referred to as an 'Ethernet Segment'.</t>
  </dd>
  <dt>ESI:</dt>
  <dd>
    <t>Ethernet Segment Identifier.
 A unique non-zero identifier that identifies an Ethernet Segment is
 called an 'Ethernet Segment Identifier'.</t>
  </dd>
  <dt>ESI-LAG:</dt>
  <dd>
    <t>Refers to a multi-homing scenario in which two, three, or more peering PEs
 are connected to the same CE.</t>
  </dd>
  <dt>ETAG:</dt>
  <dd>
    <t>Ethernet Tag. An Ethernet tag identifies a particular broadcast domain, e.g., a VLAN.
 An EVPN instance consists of one or more broadcast domains.</t>
  </dd>
  <dt>EVI:</dt>
  <dd>
    <t>An EVPN instance spanning the Provider Edge (PE) devices
 participating in that EVPN. It is used to assist an L3 VRF with route synchronization.</t>
  </dd>
  <dt>GRT:</dt>
  <dd>
    <t>Global Routing Table</t>
  </dd>
  <dt>ICL:</dt>
  <dd>
    <t>Inter Chassis Link</t>
  </dd>
  <dt>IGMP:</dt>
  <dd>
    <t>Internet Group Management Protocol</t>
  </dd>
  <dt>IP-VRF:</dt>
  <dd>
    <t>A VPN Routing and Forwarding table for IP routes on a PE.
 The IP routes could be populated by EVPN and IP-VPN address
 families.  An IP-VRF is also an instantiation of a layer 3 VPN in a PE.</t>
  </dd>
  <dt>L3AA:</dt>
  <dd>
    <t>All-Active Redundancy Mode for Layer 3 services.
 When all PEs attached to an Ethernet segment are allowed to forward known
 unicast traffic to/from that Ethernet segment for a given VLAN,
 then the Ethernet segment is defined to be operating in All-Active redundancy mode.</t>
  </dd>
  <dt>MAC-VRF:</dt>
  <dd>
    <t>A Virtual Routing and Forwarding table for Media Access Control (MAC)
 addresses on a PE. A MAC-VRF is also an instantiation of an EVI in a PE.</t>
  </dd>
  <dt>MC-LAG:</dt>
  <dd>
    <t>Multi-Chassis Link Aggregation Group (MC-LAG).</t>
  </dd>
  <dt>MLD:</dt>
  <dd>
    <t>Multicast Listener Discovery.</t>
  </dd>
  <dt>PE:</dt>
  <dd>
    <t>Provider Edge.</t>
  </dd>
  <dt>PIM:</dt>
  <dd>
    <t>Protocol Independent Multicast.</t>
  </dd>
  <dt>RD:</dt>
  <dd>
    <t>Route Distinguisher used in BGP.</t>
  </dd>
  <dt>RP:</dt>
  <dd>
    <t>Multicast Rendezvous Point.</t>
  </dd>
  <dt>RT:</dt>
  <dd>
    <t>Route Target used in BGP.</t>
  </dd>
  <dt>RT-2:</dt>
  <dd>
    <t>EVPN route type 2, i.e., MAC/IP advertisement route, as defined
in <xref target="RFC7432"/>.</t>
  </dd>
  <dt>RT-5:</dt>
  <dd>
    <t>EVPN route type 5, i.e., IP Prefix route, as defined in
Section 3 of <xref target="RFC9136"/>.</t>
  </dd>
  <dt>RT-7:</dt>
  <dd>
    <t>EVPN route type 7, i.e., Multicast Membership Report Synch Route (used here for Join synchronization), as defined in
Section 9.2 of <xref target="RFC9251"/>.</t>
  </dd>
  <dt>RT-8:</dt>
  <dd>
    <t>EVPN route type 8, i.e., Multicast Leave Synch Route, as defined in
Section 9.3 of <xref target="RFC9251"/>.</t>
  </dd>
</dl>

</section>
<section anchor="requirements"><name>Requirements</name>

<t><list style="numbers">
  <t>The multi-homing solution MUST support Layer-3 access interface</t>
  <t>The multi-homing solution MUST support Layer-3 access sub-interface</t>
  <t>The solution MUST support unicast and multicast VPN services</t>
  <t>The solution SHOULD support IGP synchronization</t>
  <t>The solution SHOULD support unicast and multicast GRT services</t>
  <t>The solution MUST support all-active load-balancing mode</t>
  <t>The solution MAY support single-active load-balancing mode</t>
  <t>The solution MUST support port-active load-balancing mode</t>
</list></t>

</section>
</section>
<section anchor="solution"><name>Solution</name>

<figure title="ARP/ND synchronization over different VRF(s)" anchor="fig-esi-to-vrf"><artwork><![CDATA[
+------
|     +-------+ BE1.1 (10.0.0.1/24)
| PE1 || BE1  +---------------------------------+
|     || ESI-1|                                 |
|     ||      | BE1.2 (10.0.0.1/24)             |
|     ||      +-------------------------+       |
|     +-------+                         |       |
|     |                                 |       |
|     +-------+ BE2 (10.0.1.1/24)       |       |
|     || BE2  +------------------+      |       |
|     || ESI-2|                  |      |       |
|     ||      |                 +v----+ |       |
|     ||      |                 |CE1  | |       |
|     +-------+                 |.2   | |       |
+------                         |CUST1| |       |
                                +^----+ |       |
+------                          |     +v-----+-v----+
|     +-------+ BE2 (10.0.1.1/24)|     |SW1   |      +-->H1(.2)
| PE2 || BE2  +------------------+     |CUST2 |CUST1 |
|     || ESI-2|                        +^-----+-^----+
|     ||      |                         |       |
|     ||      |                         |       |
|     +-------+                         |       |
|     |                                 |       |
|     +-------+ BE1.2 (10.0.0.1/24)     |       |
|     || BE1  +-------------------------+       |
|     || ESI-1|                                 |
|     ||      | BE1.1 (10.0.0.1/24)             |
|     ||      +---------------------------------+
|     +-------+
+------

PE(1,2):
CUST1-VRF (IP-VRF1)
CUST2-VRF (IP-VRF2)

SW1:
CUST1-Subnet1: 10.0.0.2/24 (VLAN 1)
CUST2-Subnet1: 10.0.0.2/24 (VLAN 2)

CE1:
CUST1-Subnet2 10.0.1.2/24

]]></artwork></figure>

<t>Consider the topology in <xref target="fig-esi-to-vrf"/>, where two AC-aware
bundling service interfaces are supported.
On the first bundling interface, BE1, PE1 and PE2 share a LAG interface with
switch 1 (SW1) and have two separate (but overlapping) customer 1 and customer 2
subnets.  CUST1 Subnet 1 resolves over sub-interface VLAN 1 (.1),
and CUST2 Subnet 1 resolves over sub-interface VLAN 2 (.2).</t>

<t>On the second bundling interface, BE2, both PEs share a LAG interface with Customer
Edge device 1 (CE1) and only a single customer (CUST1) subnet on the native VLAN.</t>

<t>Main interface BE1 on PE1 and PE2 is shared by customer 1 and 2, and represented
by ESI-1.</t>

<t>Main interface BE2 on PE1 and PE2 is only used by customer 1, and represented
by ESI-2.</t>

<t>If we focus on CUST1, there are two cases.</t>

<t>Case 1:
For CE1, if its ARP requests hash towards PE2, then PE1 is
unaware of its presence. For PE2 to synchronize this information to PE1,
in addition to CE1 IP address (10.0.1.2) and MAC address (m1), two additional
unique identifiers are needed:</t>

<t><list style="numbers">
  <t>IP-VRF. CUST 1 VRF is represented by associated L3 route targets (IP-VRF RT(s))</t>
  <t>Interface. BE2 Interface is represented by ESI-2</t>
</list></t>

<t>Case 2:
For Host 1 (H1), if its ARP requests hash towards PE2, then PE1 is
unaware of its presence.  For PE2 to synchronize this information to PE1,
in addition to H1 IP address (10.0.0.2) and MAC address (m2), three additional
unique identifiers are required:</t>

<t><list style="numbers">
  <t>IP-VRF. CUST 1 VRF is represented by corresponding L3 route target (IP-VRF RT(s))</t>
  <t>Main Interface. BE1 Interface is represented by ESI-1</t>
  <t>Sub-Interface. Subnet/VLAN 1 is represented by Attachment Circuit ID 1.</t>
</list></t>
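
<t>The two cases can be summarized as a lookup key. The Python sketch below is
illustrative only: the route target values, the MAC label m3 and the AC-ID used
for CUST2 are assumptions. It shows that the combination of IP-VRF, ESI and
AC-ID keeps the overlapping 10.0.0.2 entries of CUST1 and CUST2 apart.</t>

<sourcecode type="python"><![CDATA[
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SyncContext:
    ip_vrf_rt: str                # identifies the IP-VRF
    esi: str                      # identifies the main interface
    ac_id: Optional[int] = None   # identifies the sub-interface, if any


# Hypothetical route target values; ESIs follow the figure.
# Case 1: CE1 behind main interface BE2 (no sub-interface).
case1 = SyncContext(ip_vrf_rt="65000:1", esi="ESI-2")
# Case 2: H1 behind sub-interface BE1.1.
case2 = SyncContext(ip_vrf_rt="65000:1", esi="ESI-1", ac_id=1)
# CUST2 host behind sub-interface BE1.2 (AC-ID 2 is an assumption).
cust2 = SyncContext(ip_vrf_rt="65000:2", esi="ESI-1", ac_id=2)

adjacencies = {
    (case1, "10.0.1.2"): "m1",
    (case2, "10.0.0.2"): "m2",
    (cust2, "10.0.0.2"): "m3",   # m3 is an assumed MAC for CUST2
}
]]></sourcecode>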

<section anchor="l3vrf-route target"><name>Usage of L3VRF route target</name>

<t>The synchronization of information between peering PEs is done via various
EVPN route types. For instance, adjacencies in ARP/ND tables are synchronized
by leveraging EVPN route type-2. When dealing with Layer-3 interfaces, the basic
principles described in <xref target="RFC9136"/> are leveraged. By default,
any routes used for synchronization are advertised with IP-VRF route targets.
</t>

<t>Alternatively, EVPN routes may be advertised with ES-Import route targets
along with an EVI-RT EC equal to the associated IP-VRF route target. This allows BGP
to distribute the route(s) only to the PEs attached to the associated ESI,
and also allows routes to be applied to the respective IP-VRF(s) at the receiving end.</t>

<t>In the example in <xref target="fig-esi-to-vrf"/>, route synchronization for CUST1
carries IP-VRF1 RT(s) and route synchronization for CUST2 carries IP-VRF2 RT(s).
As an optimization, route synchronization instead uses
ES-Import RT(s). In that case, the CUST1 routes carry an
EVI-RT BGP Extended Community (EC) set to the IP-VRF1 RT(s), and the CUST2 routes carry an
EVI-RT EC set to the IP-VRF2 RT(s).
</t>
</section>

<section anchor="usage-evpn-instance"><name>Usage of EVPN instance</name>

<t><xref target="RFC7432"/> eases the auto-generation of BGP constructs such as
route distinguishers and route targets per MAC-VRF, based on a unique value
for the Broadcast Domain that, in this document, is referred to as the EVI.
As in <xref target="RFC9136"/>, the usage of an EVI is not required when
dealing with L3VPN multi-homing scenarios. The RD may be auto-generated locally
with a unique ID, and the associated RT(s) may be taken from the IP-VRF.</t>

<t>The synchronization over GRT is different. In that specific situation, an EVPN instance
may be assigned to support non-VPN layer-3 services. The assignment only serves
the purpose of providing route targets as required by <xref target="RFC7432"/>,
where RT(s) are mandatory for each EVPN route.</t>

<t>EVPN enhances the multi-homing layer 3 service with the following 
synchronization routes:</t>

<t><list style="symbols">
  <t>ARP/ND</t>
  <t>IGMP/MLD</t>
  <t>IP (for customer subnets learned from IGP adjacency)</t>
</list></t>

</section>
<section anchor="mapping-for-l3-interface-to-esi"><name>Mapping for L3 Interface to ESI</name>

<t>The ESI represents the L3 LAG interface between PE and CEs.
This ESI is signaled using RT-4 with the ES-Import Route Target as described
in Section 8.1.1 of <xref target="RFC7432"/> so that the service PE peers can discover each
other's common ES.</t>

<t>In the example in <xref target="fig-esi-to-vrf"/>, route syncs from interface BE1
carry ESI-1 together with either the IP-VRF RT(s) or, as an optimization, the ES-Import RT and EVI-RT EC.</t>

</section>
<section anchor="mapping-for-l3-sub-interface-to-attachment-circuit-id"><name>Mapping for L3 Sub-Interface to Attachment Circuit ID</name>

<t>The Attachment Circuit ID represents the sub-interface subnet on the L3
LAG interface between PE and CEs.
The AC-ID is signaled using RT-2, RT-5, RT-7 and RT-8 by attaching
Attachment Circuit ID Extended community as described in Section 6.1 of
<xref target="I-D.sajassi-bess-evpn-ac-aware-bundling"/>.</t>

<t>In the example in <xref target="fig-esi-to-vrf"/>, route syncs from sub-interface BE1.1 (VLAN 1)
carry an Attachment Circuit ID EC with ID 1.</t>

</section>
<section anchor="solution-route-sync"><name>Route sync for ARP/ND</name>

<t>This document proposes solving the issue described in <xref target="problem-unicast"/>
using RT-2 IP/MAC route sync as described in Section 10 of <xref target="RFC7432"/> with a
modification described below.</t>

<section anchor="local-adjacency-arpnd-learning"><name>Local adjacency (ARP/ND) learning</name>

<t>In EVPN and/or EVPN-IRB (<xref target="RFC7432"/> and/or
<xref target="RFC9135"/>), where multi-homing is enabled through L2 access interfaces,
peering PEs learn local adjacencies upon receiving ARP and/or ND
messages. Using EVPN route type-2 (MAC/IP), adjacencies are
synchronized between peering PEs sharing common Ethernet Segments.
This allows the layer-2 forwarding chain to be established properly based on the configured
load-balancing mode. Locally learned MACs may also be synchronized for some Layer-2 services.</t>

<t>Similarly, with L3 interfaces, local ARP/ND learning triggers an EVPN route
type-2 synchronization to any peer PE. However, there is no need for local MAC
learning or synchronization since there is no layer-2 service being offered.
The MAC-only RT-2 route is NOT advertised to the peer PE, and L2 forwarding chains
should not be programmed.</t>

<t>Section 9.1 of <xref target="RFC7432"/> describes different mechanisms to learn adjacency
routes locally.</t>

<t>ARP/ND route synchronization (referred to as the ARP/ND Sync route in this document)
uses EVPN type-2 (MAC/IP) routes with a non-zero ESI to exchange all locally learned
adjacencies between peering PEs. A few additions are needed to allow proper
behavior:</t>

<t><list style="symbols">
  <t>An ARP/ND Sync route SHOULD carry the IP-VRF Route Target of the associated VRF.</t>

  <t>Optionally, an ARP/ND Sync route MAY carry exactly one ES-Import Route Target extended
community, the one that corresponds to the ES on which the ARP or ND was
received. This is in replacement of the IP-VRF RT(s) mentioned previously. Moreover, 
if an ES-Import Route Target extended community is used instead of
the IP-VRF Route target, the ARP/ND Sync route MUST also carry exactly one
EVI-RT extended community corresponding to the associated IP-VRF on which the
ARP or ND was received. See Section 9.5 of <xref target="RFC9251"/> for details on how to
construct the EVI-RT extended community.</t>

  <t>In the case where the PE supports AC-aware bundling, the route MUST also carry one
Attachment Circuit ID Extended Community. The circuit ID maps to the
sub-interface (or subnet) where this route was received. For details on how to
encode and construct this Extended Community, see Section 6.1 of
<xref target="I-D.sajassi-bess-evpn-ac-aware-bundling"/>.</t>
</list></t>
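
<t>The rules above can be sketched as a small route builder. This is a minimal
Python illustration, not an implementation of any BGP stack: the class, the
function, the string encodings of the ESI and of the communities, and the
route target values are all hypothetical.</t>

<sourcecode type="python"><![CDATA[
from dataclasses import dataclass, field
from typing import List, Optional

ZERO_ESI = ":".join(["00"] * 10)   # assumed textual form of ESI 0


@dataclass
class ArpNdSyncRoute:
    ip: str
    mac: str
    esi: str
    route_targets: List[str] = field(default_factory=list)
    es_import_rt: Optional[str] = None
    evi_rt: Optional[str] = None
    ac_id: Optional[int] = None


def build_arp_nd_sync_route(ip, mac, esi, ip_vrf_rt,
                            es_import_rt=None, ac_id=None):
    """Attach the communities listed above to a locally learned entry."""
    if esi == ZERO_ESI:
        raise ValueError("ARP/ND Sync routes use a non-zero ESI")
    route = ArpNdSyncRoute(ip=ip, mac=mac, esi=esi, ac_id=ac_id)
    if es_import_rt is None:
        # Default: scope the route with the IP-VRF Route Target.
        route.route_targets = [ip_vrf_rt]
    else:
        # Optional mode: the ES-Import RT replaces the IP-VRF RT, and
        # the EVI-RT identifies the IP-VRF at the receiving PE.
        route.es_import_rt = es_import_rt
        route.evi_rt = ip_vrf_rt
    return route


# H1 learned on sub-interface BE1.1, default mode.
default_route = build_arp_nd_sync_route(
    "10.0.0.2", "m2", "ESI-1", "65000:1", ac_id=1)
# CE1 learned on main interface BE2, ES-Import mode.
optimized_route = build_arp_nd_sync_route(
    "10.0.1.2", "m1", "ESI-2", "65000:1",
    es_import_rt="es-import:ESI-2")
]]></sourcecode>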

</section>
<section anchor="remote-arpnd-learning"><name>Remote ARP/ND learning</name>

<t>When consuming a remote EVPN route type-2 synchronization route:</t>

<t><list style="symbols">
  <t>BGP only imports layer-3 sync route(s) based on IP-VRF Route Targets or, optionally, when
  both the ES-Import and EVI-RT extended communities match those locally configured.</t>
  <t>The main interface is derived from the ESI.</t>
  <t>The VLAN / sub-interface is derived from the AC-ID provided in the
  Attachment-Circuit-ID extended community.</t>
</list></t>
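
<t>The import steps above can be sketched as follows. The Python fragment is a
hedged illustration: the dictionary layout used for the local configuration and
for the route, and all route target values, are assumptions made for the
example.</t>

<sourcecode type="python"><![CDATA[
def import_sync_route(route, local):
    """Return (vrf, interface) for an accepted sync route, else None."""
    vrf = None
    for rt in route.get("route_targets", []):
        if rt in local["ip_vrf_rts"]:
            vrf = local["ip_vrf_rts"][rt]
            break
    if vrf is None:
        # Optional mode: both ES-Import RT and EVI-RT must match.
        es_ok = route.get("es_import_rt") in local["es_import_rts"]
        evi_rt = route.get("evi_rt")
        if es_ok and evi_rt in local["ip_vrf_rts"]:
            vrf = local["ip_vrf_rts"][evi_rt]
    if vrf is None:
        return None
    # The main interface is derived from the ESI.
    main_if = local["interfaces"].get(route["esi"])
    if main_if is None:
        return None
    # The sub-interface, if any, is derived from the AC-ID.
    ac_id = route.get("ac_id")
    if ac_id is None:
        return vrf, main_if
    sub_if = local["sub_interfaces"].get((main_if, ac_id))
    return (vrf, sub_if) if sub_if is not None else None


# Hypothetical local configuration of PE1, loosely following the figure.
pe1 = {
    "ip_vrf_rts": {"65000:1": "CUST1-VRF", "65000:2": "CUST2-VRF"},
    "es_import_rts": {"es-import:ESI-1", "es-import:ESI-2"},
    "interfaces": {"ESI-1": "BE1", "ESI-2": "BE2"},
    "sub_interfaces": {("BE1", 1): "BE1.1", ("BE1", 2): "BE1.2"},
}
h1_route = {"route_targets": ["65000:1"], "esi": "ESI-1", "ac_id": 1}
ce1_route = {"es_import_rt": "es-import:ESI-2", "evi_rt": "65000:1",
             "esi": "ESI-2"}
]]></sourcecode>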

</section>
</section>
<section anchor="solution-igmp-route-sync"><name>Route sync for IGMP/MLD</name>

<t>This document proposes solving the issue described in <xref target="problem-multicast"/>
using RT-7 and RT-8 route sync as described by <xref target="RFC9251"/>.</t>

<t>A local IGMP/MLD join or leave triggers an RT-7/8 route sync to the peer PE.</t>

<section anchor="local-igmp-joinleave-learning"><name>Local IGMP/MLD Join/Leave learning</name>

<t>An IGMP/MLD Join or Leave triggers an RT-7/8 route sync to any peer PE.</t>

<t>Section 9.1 of <xref target="RFC7432"/> describes different mechanisms to learn adjacency
routes locally.</t>

<t><list style="symbols">
  <t>As with unicast, multicast sync routes SHOULD carry the associated IP-VRF route targets.</t>
  <t>Optionally, a Multicast Join or Leave Sync route MAY carry exactly one ES-Import Route
  Target extended community, the one that corresponds to the ES on which the IGMP/MLD
  Join or Leave was received.</t>
  <t>It MAY also carry exactly one EVI-RT EC, the one that corresponds to the
  associated VRF on which the IGMP/MLD Join or Leave was received.
  See Section 9.5 of <xref target="RFC9251"/> for details on how to
  encode and construct the EVI-RT EC.</t>
  <t>In the case where the PE supports multiple sub-interfaces within the same Ethernet
  Segment, the Multicast Sync routes MUST also carry one Attachment Circuit ID
  extended community. The circuit ID maps to the
  sub-interface (or subnet) where this route was received. For details on how to
  encode and construct this Extended Community, see Section 6.1 of
  <xref target="I-D.sajassi-bess-evpn-ac-aware-bundling"/>.</t>

</list></t>

</section>
<section anchor="remote-igmp-joinleave-learning"><name>Remote IGMP/MLD Join/Leave learning</name>

<t>When consuming a remote multicast RT-7 or RT-8 sync route:</t>

<t><list style="symbols">
  <t>A PE only imports Multicast Sync routes received with either a Route Target
or an EVI-RT that matches one of the local IP-VRF(s) (assuming the ES-import
Route Target matches the Route Target of one of the local Ethernet Segments).</t>
  <t>The layer-3 VRF is derived from the matching EVI.</t>
  <t>The main interface is derived from the ESI.</t>
  <t>The VLAN / sub-interface is derived from the AC-ID provided in the
  Attachment-Circuit-ID extended community.</t>
</list></t>
</section>

<section anchor="upstream-pim-joinprune"><name>Upstream PIM Join/Prune</name>
  <t>With the IGMP join/leave sync routes, both PEs have the membership
  request from a multi-homed receiver. Both PEs are DR and send a
  PIM join/prune message to the RP. Both PEs are added as leaf nodes
  in the multicast distribution tree. Hence, both PEs receive the traffic.
  The PE that is the DF for the multicast flow sends the traffic on the
  Ethernet Segment to the receiver. The non-DF (NDF) PE drops the traffic.</t>
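
  <t>The forwarding decision can be sketched as below. This document does not
  state which DF election applies, so the per-flow hash in this Python fragment
  is purely an assumption used to show that exactly one of the PEs receiving
  the flow sends it on the Ethernet Segment.</t>

<sourcecode type="python"><![CDATA[
import zlib


def df_for_flow(pes, source, group):
    """Toy per-flow DF choice: hash (S,G) onto the ordered PE list."""
    ordered = sorted(pes)
    key = zlib.crc32(f"{source},{group}".encode())
    return ordered[key % len(ordered)]


def forwards_to_receiver(pe, pes, source, group):
    """Both PEs receive the flow; only the DF sends it on the ES."""
    return pe == df_for_flow(pes, source, group)


pes = ["PE1", "PE2"]
flow = ("192.0.2.100", "233.252.0.1")   # documentation addresses
senders = [pe for pe in pes if forwards_to_receiver(pe, pes, *flow)]
]]></sourcecode>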
</section>
</section>
<section anchor="solution-subnet-sync"><name>Customer Subnet Route sync using Route-type(5)</name>

   <t>Section 3 of <xref target="RFC9136"/> provides a mechanism to synchronize layer-3
   customer subnets between the PEs in order to solve the problem described
   in <xref target="problem-adj"/>.</t>
 
   <t>Using <xref target="fig-igp-adjacency"/> as an example, if PE1 forms the IGP adjacency with the CE, it
   is the only PE with knowledge of the customer subnet R1.  BGP on PE1
   advertises R1 to remote PEs using L3-VPN signaling, either based on
   <xref target="RFC4364"/> IP-VPN routes or <xref target="RFC9136"/> EVPN IP Prefix routes.</t>
 
   <t>Although PE2 has the same ES connection to the CE, and could provide
   load balancing to remote PEs, it has not formed an IGP
   adjacency with the CE and so is not aware of the customer subnet R1.</t>
 
   <t>This is solved by PE1 signaling R1 to PE2 using an RT-5
   synchronization route. PE2 can then advertise this customer
   subnet R1 towards the core as if it was locally learned through IGP, and
   provide load-balancing from the remote PEs. There are two possible
   options to achieve synchronization:
   <list style="numbers">
     <t>ESI-based approach.</t>
     <t>IP Gateway-based approach.</t>
   </list>
   </t>

 <section anchor="ESI-based-approach"><name>ESI based approach</name>  
   <t>The procedures differ depending on whether the core is running 
   <xref target="RFC4364"/> IP-VPN or the <xref target="RFC9136"/> 
   EVPN IP-VRF-to-IP-VRF model:</t>
 
   <t><list style="symbols">
     <t>If the core is running <xref target="RFC4364"/> IP-VPN, the PE receiving the R1 IGP route
     from the CE advertises R1 in an RT-5 with the ESI of the Ethernet
     Segment, and also in an IP-VPN route. Both routes carry the IP-VRF Route Target(s).
     The peer PE attached to the same Ethernet Segment (PE2 in <xref target="fig-igp-adjacency"/>) imports
     both routes for R1, but treats the non-zero ESI RT-5 as if it was a local route
     associated with the local Ethernet Segment. Therefore the RT-5 route is
     selected over the IP-VPN route for R1, and PE2 advertises a
     new IP-VPN route for R1 so that the remote PEs in the IP-VPN network can load
     balance R1 traffic to both PE1 and PE2.</t>
 
     <t>If the core is running <xref target="RFC9136"/> EVPN (IP-VRF-to-IP-VRF model), the PE with the
     IGP adjacency (PE1) advertises R1 in an RT-5 with the corresponding ESI as before,
     and PE2 synchronizes the route as per Section 4.2 of <xref target="I-D.sajassi-bess-evpn-ip-aliasing"/>.
     The advertisement of the IP A-D routes (for the ESI) from PE1 and PE2 guarantees
     that the remote EVPN PEs load balance the R1 traffic to both PEs attached to the
     Ethernet Segment (Section 4 of <xref target="I-D.sajassi-bess-evpn-ip-aliasing"/>).</t>
	</list> </t>
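
   <t>The route selection on the peer PE in the IP-VPN core case can be sketched
   as follows. This Python fragment is a simplified illustration with a
   hypothetical route representation and a made-up prefix for R1; normal BGP
   best-path selection is not modeled.</t>

<sourcecode type="python"><![CDATA[
def select_synced_route(candidates, local_esis):
    """Pick the route a PE installs for a synced customer prefix.

    An RT-5 whose non-zero ESI is attached locally is treated like a
    local route and wins over the IP-VPN route for the same prefix.
    Returns (route, treated_as_local).
    """
    for route in candidates:
        if route["kind"] == "RT-5" and route.get("esi") in local_esis:
            return route, True     # re-advertise as a new IP-VPN route
    for route in candidates:
        if route["kind"] == "IP-VPN":
            return route, False    # ordinary remote route
    return None, False


# R1 as advertised by PE1; the prefix value is an assumption.
r1_routes = [
    {"kind": "IP-VPN", "prefix": "198.51.100.0/24", "from": "PE1"},
    {"kind": "RT-5", "prefix": "198.51.100.0/24", "from": "PE1",
     "esi": "ESI-1"},
]
pe2_choice = select_synced_route(r1_routes, local_esis={"ESI-1"})
pe3_choice = select_synced_route(r1_routes, local_esis=set())
]]></sourcecode>

   <t>In this sketch only PE2, which is attached to the Ethernet Segment, treats
   R1 as local; PE3 keeps using the IP-VPN route.</t>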
 </section>

 <section anchor="IP-GW-based-approach"><name>IP Gateway based approach</name>  
   <t>The procedures are very similar and again depend on whether the core is running <xref target="RFC4364"/> IP-VPN or the
   <xref target="RFC9136"/> EVPN IP-VRF-to-IP-VRF model:</t>

   <t><list style="symbols">
      <t>If the core is running <xref target="RFC4364"/> IP-VPN, the PE receiving the R1
      IGP route from the CE advertises R1 in an RT-5 with the IP gateway field equal to the
      R1 next hop, and also a corresponding IP-VPN route.  Both routes carry
      the IP-VRF Route Target(s).  The peer PE imports both routes for R1, and
      the RT-5 route is selected over the IP-VPN route for R1.
      Thanks to the adjacency synchronization done via EVPN RT-2, the peer PE resolves
      R1 over the IP gateway pointing to the local interface. The peer PE advertises a new
      IP-VPN route for R1 so that the remote PEs in the IP-VPN network
      can load balance R1 traffic to both PE1 and PE2.</t>

      <t>If the core is running <xref target="RFC9136"/> EVPN (IP-VRF-to-IP-VRF model), the mechanism
      works exactly as before, without the need to select the EVPN RT-5 over the IP-VPN route.
      Furthermore, there is no need to generate an IP-VPN route; only an EVPN RT-5 is generated for R1 so that the
      remote PEs can load balance R1 traffic to both PE1 and PE2.</t>
	</list> </t>
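
   <t>The dependency on the RT-2 adjacency sync can be sketched as follows. In
   this Python fragment the route representation, the prefix used for R1 and
   the interface name are assumptions; the gateway address is the CE address
   shown in the figure.</t>

<sourcecode type="python"><![CDATA[
def resolve_rt5_next_hop(rt5, synced_adjacencies):
    """Resolve the RT-5 gateway IP over a local interface, if known.

    synced_adjacencies maps an IP address to the local interface on
    which the RT-2 sync installed it.
    """
    return synced_adjacencies.get(rt5["gw_ip"])


rt5_for_r1 = {"prefix": "198.51.100.0/24", "gw_ip": "1.1.1.2"}
pe2_adjacencies = {"1.1.1.2": "LAG"}   # CE address learned via RT-2 sync

resolved = resolve_rt5_next_hop(rt5_for_r1, pe2_adjacencies)
unresolved = resolve_rt5_next_hop(rt5_for_r1, {})
]]></sourcecode>

   <t>Without the synchronized adjacency the gateway cannot be resolved locally,
   which is why this approach relies on the ARP/ND route sync.</t>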
 </section>

</section>
<section anchor="mapping-for-vlan-to-etag"><name>Mapping for VLAN to ETAG</name>
<t><xref target="I-D.sajassi-bess-evpn-ac-aware-bundling"/> proposes the use of
an Attachment Circuit ID Extended Community to carry specific VLAN identification.
To avoid the usage of this EC, the Ethernet Tag field may instead be used to
signal VLAN/sub-interface identification between service PE peers in RT-2, RT-5, RT-7 and RT-8.</t>

</section>
</section>
<section anchor="extensions-to-rt-2-rt-5-rt-7-and-rt-8"><name>Extensions to RT-2, RT-5, RT-7 and RT-8</name>

<t>This document proposes extending the use of Extended Communities
already defined in other documents to the route types
RT-2, RT-5, RT-7 and RT-8:</t>

<t><list style="symbols">
  <t>EVI-RT Extended Community as defined in Section 9.5 of
<xref target="RFC9251"/>.</t>
  <t>Attachment Circuit ID Extended Community as defined in Section 6.1 of
<xref target="I-D.sajassi-bess-evpn-ac-aware-bundling"/>.</t>
</list></t>

</section>

<section anchor="convergence-considerations"><name>Convergence Considerations</name>
<t> Left for future study.</t>
</section>

<section anchor="overall-advantages"><name>Overall Advantages</name>

<t>The use of EVPN ESI-LAG all-active multi-homing brings the following benefits
to L3 BGP services:</t>

<t><list style="symbols">
  <t>Open standards based per interface all-active redundancy
mechanism that eliminates the need to run ICCP and LDP.</t>
  <t>Agnostic of underlay technology (MPLS, VXLAN, SRv6) and
associated services (L3, L3-VPN).</t>
  <t>Replaces the legacy MC-LAG ICCP-based solution, and offers the following
additional benefits:
  <list style="symbols">
      <t>Fast convergence with mass-withdraw is possible with EVPN.</t>
      <t>Avoids the need for a dedicated ICCP channel between peering PEs.</t>
  </list></t>

  <t>Requires signaling already defined in existing EVPN RFCs
  <xref target="RFC7432"/>, <xref target="RFC9136"/>, <xref target="RFC9251"/> 
  and draft 
  <xref target="I-D.sajassi-bess-evpn-ac-aware-bundling"/>. 
  </t>
  <t>Removes the need for an ICL link and any proprietary
   protocols.</t>
</list></t>
</section>

<section anchor="security-considerations"><name>Security Considerations</name>

<t>The Security Considerations described in <xref target="RFC7432"/> apply to this
document.</t>

</section>
<section anchor="iana-considerations"><name>IANA Considerations</name>

<t>There are no IANA considerations.</t>

</section>
  </middle>

  <back>

    <references title='Normative References'>

<?rfc include="reference.I-D.draft-sajassi-bess-evpn-ac-aware-bundling-06.xml"?>
<?rfc include="reference.I-D.draft-sajassi-bess-evpn-ip-aliasing-05.xml"?>
<?rfc include="reference.RFC.2119.xml"?>
<?rfc include="reference.RFC.8174.xml"?>
<?rfc include="reference.RFC.9135.xml"?>
<?rfc include="reference.RFC.9136.xml"?>
<?rfc include="reference.RFC.9251.xml"?>

    </references>

    <references title='Informative References'>
<?rfc include="reference.RFC.4364.xml"?>
<?rfc include="reference.RFC.7432.xml"?>
<?rfc include="reference.RFC.7761.xml"?>

    </references>

    <section title="Contributors">
        <t>The following people have contributed substantially to this document:
        </t>
        <t>Jiri Chaloupka<br/>Cisco</t>
        <t>EMail: jichalou@cisco.com</t>
        <t>Jayashree Subramanian<br/>Cisco</t>
        <t>EMail: jays@cisco.com</t>
    </section>

  </back>
</rfc>

