<?xml version="1.0" encoding="US-ASCII"?>
<?xml-stylesheet type="text/xsl" href="rfc2629.xslt" ?>
<!-- generated by https://github.com/cabo/kramdown-rfc2629 version 1.2.6 -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd">
<?rfc toc="yes"?>
<?rfc sortrefs="yes"?>
<?rfc symrefs="yes"?>
<?rfc comments="yes"?>
<rfc category="info" docName="draft-ietf-teas-nrp-scalability-03"
     ipr="trust200902">
  <front>
    <title abbrev="NRP Scalability Considerations">Scalability Considerations
    for Network Resource Partition</title>

    <author fullname="Jie Dong" initials="J." surname="Dong">
      <organization>Huawei Technologies</organization>

      <address>
        <postal>
          <street>Huawei Campus, No. 156 Beiqing Road</street>

          <city>Beijing</city>

          <code>100095</code>

          <country>China</country>
        </postal>

        <email>jie.dong@huawei.com</email>
      </address>
    </author>

    <author fullname="Zhenbin Li" initials="Z." surname="Li">
      <organization>Huawei Technologies</organization>

      <address>
        <postal>
          <street>Huawei Campus, No. 156 Beiqing Road</street>

          <city>Beijing</city>

          <code>100095</code>

          <country>China</country>
        </postal>

        <email>lizhenbin@huawei.com</email>
      </address>
    </author>

    <author fullname="Liyan Gong" initials="L." surname="Gong">
      <organization>China Mobile</organization>

      <address>
        <postal>
          <street>No. 32 Xuanwumenxi Ave., Xicheng District</street>

          <city>Beijing</city>

          <country>China</country>
        </postal>

        <email>gongliyan@chinamobile.com</email>
      </address>
    </author>

    <author fullname="Guangming Yang" initials="G." surname="Yang">
      <organization>China Telecom</organization>

      <address>
        <postal>
          <street>No.109 West Zhongshan Ave., Tianhe District</street>

          <city>Guangzhou</city>

          <country>China</country>
        </postal>

        <email>yangguangm@chinatelecom.cn</email>
      </address>
    </author>

    <author fullname="Gyan Mishra" initials="G." surname="Mishra">
      <organization>Verizon Inc.</organization>

      <address>
        <postal>
          <street/>

          <city/>

          <region/>

          <code/>

          <country/>
        </postal>

        <email>gyan.s.mishra@verizon.com</email>
      </address>
    </author>

    <author fullname="Fengwei Qin" initials="F." surname="Qin">
      <organization>China Mobile</organization>

      <address>
        <postal>
          <street>No. 32 Xuanwumenxi Ave., Xicheng District</street>

          <city>Beijing</city>

          <country>China</country>
        </postal>

        <email>qinfengwei@chinamobile.com</email>
      </address>
    </author>

    <date day="21" month="October" year="2023"/>

    <workgroup>TEAS Working Group</workgroup>

    <abstract>
      <t>A network slice offers connectivity services to a network slice
      customer with specific Service Level Objectives (SLOs) and Service Level
      Expectations (SLEs) over a common underlay network.</t>

      <t>RFC XXXX describes a framework for network slices built using
      networks that use IETF technologies. As part of that framework, the
      Network Resource Partition (NRP) is introduced as a set of network
      resources that are allocated from the underlay network to carry a
      specific set of network slice service traffic and meet specific SLOs and
      SLEs.</t>

      <t>As the demand for network slices increases, scalability becomes an
      important factor. Although the scalability of network slices can be
      improved by mapping a group of network slices to a single NRP, that
      design may not be suitable or possible for all deployments, thus there
      are concerns about the scalability of NRPs themselves.</t>

      <t>This document discusses some considerations for NRP scalability in
      the control and data planes. It also investigates a set of optimization
      mechanisms.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="introduction" title="Introduction">
      <t>RFC Editor Note: Please replace "RFC XXXX" in this document with the
      RFC number assigned to draft-ietf-teas-ietf-network-slices, and remove
      this note.</t>

      <t><xref target="I-D.ietf-teas-ietf-network-slices"/> defines network
      slicing in networks built using IETF technologies. These network slices
      may be referred to as RFC XXXX Network Slices, but in this document we
      simply use the term "network slice" to refer to this concept: this
      document only applies to the type of network slice described in <xref
      target="I-D.ietf-teas-ietf-network-slices"/>.</t>

      <t>The network slice aims to offer a connectivity service to a network
      slice customer with specific Service Level Objectives (SLOs) and Service
      Level Expectations (SLEs) over a common underlay network. <xref
      target="I-D.ietf-teas-ietf-network-slices"/> defines the terminology
      and the characteristics of network slices. It also discusses the general
      framework, the components and interfaces for requesting and operating
      network slices. The concept of a Network Resource Partition (NRP) is
      introduced by <xref target="I-D.ietf-teas-ietf-network-slices"/> as part
      of the realization of network slices. An NRP is a collection of network
      resources in the underlay network, which can be used to ensure the
      requested SLOs and SLEs of network slice services are met.</t>

      <t><xref target="I-D.ietf-teas-enhanced-vpn"/> describes a layered
      architecture and the candidate technologies in different layers for
      delivering enhanced VPN (VPN+) services. VPN+ aims to meet the needs of
      customers or applications which require connectivity services with
      advanced characteristics, such as the assurance of SLOs and specific
      SLEs. VPN+ services can be delivered by mapping one or a group of
      overlay VPNs to a virtual underlay network which is allocated with a set
      of network resources. The VPN+ architecture and technologies could be
      used for the realization of network slices and, in the context of
      network slicing, an NRP could be used to instantiate the virtual
      underlay network construct for VPN+.</t>

      <t>As the demand for network slice services increases, scalability (the
      number of network slices a network can support) becomes an important
      factor. Although the scalability of network slices can be improved by
      mapping a group of network slices to a single NRP, that design may not
      be suitable or possible for all deployments, thus there are concerns
      about the scalability of NRPs themselves.</t>

      <t>This document discusses some considerations for NRP scalability in
      the control and data planes. It also investigates a set of optimization
      mechanisms.</t>
    </section>

    <section title="Network Resource Partition Scalability Requirements">
      <t>As described in <xref target="I-D.ietf-teas-ietf-network-slices"/>,
      the connectivity constructs of network slices may be grouped together
      according to their characteristics (including SLOs and SLEs) and mapped
      to a given NRP. The grouping and mapping of network slices are
      policy-based and under the control of the operator. For example, an
      operator may adopt a policy of hosting a large number of network slices
      on a relatively small number of NRPs to reduce the amount of state
      information to be maintained in the underlay network. On the other
      hand, a one-to-one mapping between network slices and NRPs gives more
      fine-grained control of the network slices but comes at the cost of
      increased (per network slice) state in the underlay network.</t>

      <t>With the introduction of various services that require enhanced
      connectivity, it is expected that the number of network slices will
      increase. The potential numbers of network slices and underlying NRPs
      can be estimated by classifying network slice deployments into three
      typical scenarios:</t>

      <t><list style="numbers">
          <t>Network slices can be used by a network operator to deliver
          different types of services. For example, in a multi-service
          network, different network slices can be created to carry, e.g.,
          mobile transport services, fixed broadband services, and enterprise
          services respectively. Each type of service could be managed by a
          separate team. Some other types of service, such as multicast
          services, may also be deployed in a separate virtual underlay
          network. Then a separate NRP may be created for each service type.
          It is also possible that a network infrastructure operator provides
          network slice services to other network operators as wholesale
          services, and an NRP may also be needed for each wholesale service
          operator. In this scenario, the number of NRPs in a network could be
          relatively small, perhaps on the order of 10.</t>

          <t>Network slice services can be requested by customers of
          industrial verticals, where the assurance of SLOs and the fulfilment
          of SLEs are contractually defined between the customer and the slice
          service provider, possibly including financial penalties in case the
          service provider fails to honor the contract (SLO or SLE). At the
          early stage of vertical industry deployment, a few customers in
          some industries will start using network slices to address the
          connectivity and performance-assurance requirements raised by their
          businesses, such as smart grids, manufacturing, public safety, and
          online gaming. The realization of such network slices may require
          provision of different NRPs for different industries, and some
          customers may require dedicated NRPs for strict service performance
          guarantees. Considering the number of vertical industries and the
          number of customers in each industry, the number of NRPs needed may
          be on the order of 100.</t>

          <t>With the advances in 5G and cloud networks, RFC XXXX network
          slice services could be widely used by customers of various
          vertical industries and enterprises who require guaranteed or
          predictable network service performance. The number of network
          slices may increase to the order of thousands. Accordingly, the
          number of NRPs needed may be on the order of 1000.</t>
        </list></t>

      <t>In <xref target="TS23501"/>, the 3GPP defines a 32-bit identifier for
      a 5G network slice with an 8-bit Slice/Service Type (SST) and a 24-bit
      Slice Differentiator (SD). This allows mobile networks (the Radio Access
      Networks (RANs) and mobile core networks) to potentially support a large
      number of 5G network slices. It is likely that multiple 5G network
      slices may be mapped to a single RFC XXXX network slice, but in some
      cases (for example, for specific SST or SD) the mapping may be closer to
      one-to-one. This may require an increasing number of RFC XXXX network
      slices, and the number of required NRPs may increase as well.</t>

      <t>Thus the question of scalable network slice services arises. Mapping
      multiple network slices to a single NRP presents a significant scaling
      benefit, but a large number of NRPs may still be required, which raises
      its own scalability challenges.</t>
    </section>

    <section title="Scalability Design Principles">
      <t>Network slicing uses a hierarchy of aggregation to achieve
      scalability. Multiple slices can be supported by a single NRP; multiple
      NRPs can be enabled on a filtered (logical) topology; and multiple
      filtered (logical) topologies utilize a single underlying network. The
      hierarchy, at any stage, may be made trivial (i.e., collapsed to a
      one-to-one mapping) according to the deployment objectives of the
      operator and the capabilities of the network technology.</t>

      <t>To recap, and in general terms: <ul>
          <li>The network slice is an edge-to-edge service.</li>

          <li>The NRP is a set of network resources (e.g., buffers, bandwidth,
          queues) and assigned per-packet behaviors.</li>

          <li>The filtered topology defines a set of network resources (it
          may be thought of as a virtual network) on which path computation
          or traffic steering can be performed.</li>
        </ul></t>

      <t>Scalability concerns exist at multiple points in the solution: <ul>
          <li>The control protocols must be able to handle the distribution of
          information necessary to support the slices, NRPs, and filtered
          topologies.</li>

          <li>The network nodes must be able to handle the computational load
          of determining paths.</li>

          <li>The forwarding engines must be able to access the information in
          packets and make forwarding decisions at line speed.</li>

          <li>Path selection tools must be able to process network information
          and determine paths on demand.</li>
        </ul></t>

      <t>Assuming that it is achievable, it is desirable for NRPs to have no
      more than a small impact (zero being preferred) on the IGP information
      that is propagated today, and not to require additional SPF
      computations beyond those that are already required.</t>

      <t>Assuming that external mechanisms can deal with path selection, NRP
      identification should be decoupled from forwarding decisions for
      packets.</t>

      <t>Given all of these considerations, we can set out the following
      design principles: </t>

      <ol spacing="normal" type="1">
        <li>A filtered topology is a subset of the underlying physical
        topology. Thus, it defines which links (and nodes) are eligible to be
        used by the NRPs. It may be selected as a set of links with particular
        characteristics, or it may be a set of forwarding paradigms applied to
        the topology. Thus, a filtered topology may be realized through
        multi-topology techniques (such as colored links), as a virtual TE
        topology, or using flex-algo.</li>

        <li>It is not envisaged that many filtered topologies would be active
        at once, so running SPF per filtered topology is not a high
        burden.</li>

        <li>Multiple NRPs can run on a single filtered topology, meaning that
        the NRPs can be associated with the same filtered topology and use
        that topology's SPF computation results.</li>

        <li>Three separate things need to be identified by information carried
        within a packet: <ul>
            <li>path</li>

            <li>NRP</li>

            <li>topology (i.e., filtered topology)</li>
          </ul> How this information is encoded (separate fields, same field,
        overloading existing fields) forms part of the solution work.</li>

        <li>NRP IDs should have domain-wide scope, and must be unique within a
        filtered topology.</li>

        <li>Configuration mechanisms are used to set up packet/resource
        treatments on nodes.</li>

        <li>Configuration mechanisms (such as southbound protocols from a
        controller) are used to install bindings on network nodes between
        domain-wide resource treatment identifiers (NRP IDs) and configured
        packet treatment as per (6).</li>

        <li>The path selection performed by or within a traffic engineering
        process, within or external to the head-end node (in particular, the
        topology selection and path computation within that topology), may
        consider the characteristics of the filtered topology and the
        attributes of the NRP, but is agnostic to the resource treatment that
        the packets will receive within the network. Ensuring that the
        selected components of the path are configured to be capable of
        supporting the resource treatments identified by the NRP ID is a
        separate matter.</li>

        <li>The selected path is indicated in the packets using existing or
        new mechanisms. Whether that is an SR Policy (for some variety of SR)
        or Flex-Algo (for whatever flex-algo expression is chosen) is out of
        scope for now, but it will obviously form part of the full set of
        solution specifications.</li>

        <li>The components or mechanisms that are responsible for deciding
        what path to select, for deciding how to mark the packets to follow
        the selected path, and for determining what resource treatment
        identifier (NRP ID) to apply to packets are also responsible for
        ensuring sufficient consistency so that the whole solution works.</li>
      </ol>

      <t>The result of this is that different operators can choose to deploy
      things at different scales, and while we may have opinions about what
      scales are sensible, workable, or desirable, we do not have to get
      working group agreement on that aspect.</t>

      <t>The routing protocols (IGP or BGP) do not need to be involved in any
      of these points, and it is important to isolate them from these aspects
      in order that there is no impact on scaling or stability. Furthermore,
      the complexity of SPF in the control plane is unaffected by this.</t>

      <t>Note that there is always a trade-off between optimal solutions and
      scalable solutions. <ul>
          <li>We need to achieve a scalable solution that can be deployed in
          all circumstances. We should acknowledge that: <ul>
              <li>We may need some extensions to the data/control/management
              plane to achieve this result. I.e., it may be that this cannot
              be done today with existing tools.</li>

              <li>The scalable solution might not be optimal everywhere.</li>
            </ul></li>

          <li>We must understand that optimal solutions are good for specific
          environments, but: <ul>
              <li>Might not work in other environments</li>

              <li>May have scalability issues.</li>
            </ul></li>
        </ul></t>

      <t>We should allow for both of these approaches, but we need to be clear
      about the costs and benefits in all cases in order that: <ul>
          <li>We support significant optimizations</li>

          <li>We do not let non-scalable solutions creep into wider
          deployment.</li>
        </ul></t>

      <t>In particular, we should be open to the use of approaches that do not
      require control plane extensions and that can be applied to deployments
      with limited scope. Included in this are: <ul>
          <li>Resource-aware SIDs</li>

          <li>L3VPN</li>
        </ul></t>
    </section>

    <section title="Network Resource Partition Scalability Considerations">
      <t>This section analyzes the scalability of NRPs in the control plane
      and data plane to understand the possible gaps in meeting the
      scalability requirements.</t>

      <section title="Control Plane Scalability">
        <t>The control plane for establishing and managing NRPs could be based
        on the combination of a centralized controller and a distributed
        control plane. The following subsections consider the scalability
        properties of both the distributed and centralized control planes in
        such a design.</t>

        <section title="Distributed Control Plane">
          <t>In some networks, multiple NRPs may need to be created for the
          delivery of network slice services. Each NRP is associated with a
          logical topology. The network resource attributes and the associated
          topology information of each NRP may need to be exchanged among the
          network nodes. The scalability of the distributed control plane used
          for the distribution of NRP information needs to be considered from
          the following aspects:</t>

          <t><list style="symbols">
              <t>The number of control protocol instances maintained on each
              node</t>

              <t>The number of control protocol sessions maintained on each
              link</t>

              <t>The number of control messages advertised by each node</t>

              <t>The number of attributes associated with each message</t>

              <t>The number of computations (e.g., SPF computation) executed
              by each node</t>
            </list> As the number of NRPs increases, it is expected that at
          least in some of the above aspects, the overhead in the control
          plane may increase in proportion to the number of the NRPs. For
          example, the overhead of maintaining separate control protocol
          instances (e.g., IGP instances) for each NRP is considered higher
          than maintaining the information of multiple NRPs in the same
          control protocol instance with appropriate separation, and the
          overhead of maintaining separate protocol sessions for different
          NRPs is considered higher than using a shared protocol session for
          exchanging the information of multiple NRPs. To meet the scalability
          and performance requirements as the number of NRPs increases, it is
          suggested to select control plane mechanisms which have better
          scalability while still providing the required functionality,
          isolation, and security for the NRPs.</t>
        </section>

        <section title="Centralized Control Plane">
          <t>The use of centralized network controllers may help to reduce
          the computation overhead in the distributed control plane, but it
          may also transfer some of the scalability concerns from network
          nodes to the network controllers; thus, the scalability of the
          controller also needs to be considered.</t>

          <t>A centralized controller can have a global view of the network,
          and is usually used for Traffic Engineering (TE) path computation
          with various constraints, or the global optimization of TE paths in
          the network. To provide TE path computation and optimization for
          multiple NRPs, the controller needs to keep the topology and
          resource information of all the NRPs up to date. For some events,
          such as link or node failures, the resulting updates to the NRPs
          may need to be distributed to the controller in real time and may
          affect the planning and operation of some NRPs. When there is a
          significant change in the network which impacts multiple NRPs, or
          multiple NRPs require global optimization concurrently, there may be
          a heavy processing burden at the controllers, and a large amount of
          signaling traffic to be exchanged between the controller and the
          corresponding NRP components. These need to be taken into
          consideration from a scalability and performance standpoint.</t>
        </section>
      </section>

      <section title="Data Plane Scalability">
        <t>To provide different network slice services with the required SLOs
        and SLEs, it is important to allocate as many different subsets of
        network resources as there are different NRPs to avoid or reduce the
        risk of interference both between different network slice services and
        between slice services and other services in the network. As both the
        number of use cases and the number of NRPs increase, the underlay
        network is required to provide finer-granularity network resource
        partitioning for more network slice services, which means the amount
        of state about the partitioned network resources to be maintained on
        the network nodes is likely to increase.</t>

        <t>Network slice service traffic needs to be processed and forwarded
        by network nodes according to a forwarding policy that is associated
        with the topology and the resource attributes of the NRP it is mapped
        to. This means that some fields in the data packet need to be used to
        identify the NRP and its associated topology and resources, either
        directly or implicitly. Different approaches for encapsulating the
        NRP information in data packets may have different scalability
        implications.</t>

        <t>One practical approach is to reuse some of the existing fields in
        the data packet to additionally indicate the NRP the packet belongs
        to. For example, the destination IP address or an MPLS forwarding
        label may be reused to identify the NRP. This avoids the complexity
        of introducing new fields in the data packet, while the additional
        semantics introduced to the existing fields may require additional
        processing. Moreover, introducing NRP-specific semantics to existing
        identifiers in the packet may cause the number of those identifiers
        to increase in proportion to the number of NRPs. For example, if an
        IP address is reused to further identify an NRP, then for a node
        which participates in M NRPs, the number of IP addresses needed for
        reaching that node in different NRPs would increase from 1 to M. This
        may cause scalability problems in networks where a relatively large
        number of NRPs is in operation.</t>

        <t>An alternative approach is to introduce a new dedicated field in
        the data packet for identifying an NRP. If this new field carries a
        network-wide unique NRP identifier (NRP ID), it could be used
        together with the existing fields to determine the packet forwarding
        behavior. The potential issue with this approach lies in the
        difficulty of introducing a new field in some data plane
        technologies.</t>

        <t>In addition, the introduction of NRP-specific packet forwarding
        impacts the number of forwarding entries maintained by the network
        nodes.</t>
      </section>

      <section title="Gap Analysis of Existing Mechanisms">
        <t>This section provides a gap analysis of existing mechanisms which
        may be used to provide NRP identification in the data plane and the
        distribution of NRP-related information using control plane
        protocols.</t>

        <t>One existing mechanism for building NRPs is to use resource-aware
        Segment Identifiers (either SR-MPLS or SRv6) <xref
        target="I-D.ietf-spring-resource-aware-segments"/> to identify the
        allocated network resources in the data plane based on the mechanisms
        described in <xref target="I-D.ietf-spring-sr-for-enhanced-vpn"/>, and
        then distribute the resource attributes and the associated logical
        topology information in the control plane using mechanisms based on
        Multi-topology <xref target="I-D.ietf-lsr-isis-sr-vtn-mt"/> or
        Flex-Algo <xref target="I-D.zhu-lsr-isis-sr-vtn-flexalgo"/>. This
        mechanism is suitable for networks where a relatively small number of
        NRPs are needed. As the number of NRPs increases, there may be several
        scalability challenges with this approach:</t>

        <t><list style="numbers">
            <t>The number of SR SIDs will increase in proportion to the number
            of NRPs in the network, which will bring challenges both to the
            distribution of SR SIDs and the related information in the control
            plane, and to the installation of forwarding entries for
            resource-aware SIDs in the data plane.</t>

            <t>If each NRP is associated with an independent logical topology
            or algorithm, the number of route computations (e.g., SPF
            computations) will increase in proportion to the number of NRPs in
            the network, which may introduce significant overhead to the
            control plane of network nodes.</t>

            <t>The maximum number of logical topologies supported by OSPF
            <xref target="RFC4915"/> is 128, the maximum number of logical
            topologies supported by IS-IS <xref target="RFC5120"/> is 4096,
            and the maximum number of Flexible Algorithms <xref
            target="RFC9350"/> is 128. Some of these technologies may not
            support the required number of NRPs in some network scenarios.</t>
          </list></t>
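        <t>As a simple illustration of the last point, one can check which of
        these mechanisms could host a required number of NRPs when each NRP
        needs its own logical topology or algorithm. The limits below are the
        ones quoted above; the helper function itself is purely hypothetical
        and only sketches the comparison.</t>

        <t><figure title="">
            <artwork><![CDATA[
```python
# Topology/algorithm limits quoted in the gap analysis above; the helper
# function and its name are illustrative only.
LIMITS = {"OSPF MT": 128, "IS-IS MT": 4096, "Flex-Algo": 128}

def feasible(required_nrps: int) -> list:
    """Mechanisms whose limit can accommodate the required NRP count,
    assuming one logical topology or algorithm per NRP."""
    return sorted(name for name, limit in LIMITS.items()
                  if limit >= required_nrps)

assert feasible(100) == ["Flex-Algo", "IS-IS MT", "OSPF MT"]
assert feasible(1000) == ["IS-IS MT"]   # only IS-IS MT reaches 1000
```
]]></artwork>
          </figure></t>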
      </section>
    </section>

    <section title="Proposed Scalability Optimizations">
      <t>To support more network slice services while keeping the amount of
      network state at a reasonable scale, one basic approach is to classify
      a set of network slice services (e.g., services which have similar
      service characteristics and performance requirements) into a group, and
      to map that group of network slice services to one NRP, which is
      allocated an aggregated set of network resources and the combination of
      the required logical topologies to meet the service requirements of the
      whole group of network slice services. Different groups of network
      slice services may be mapped to different NRPs, each of which is
      allocated a different set of network resources from the underlay
      network. According to the operator's deployment policy, appropriate
      grouping of network slice services and mapping them to a set of NRPs
      with proper network resource allocation could still meet the network
      slice service requirements. However, in some network scenarios, such an
      aggregation mechanism may not be applicable. The following subsections
      propose further optimizations in the control plane and data plane
      respectively.</t>

      <section title="Control Plane Optimization">

        <section title="Distributed Control Plane Optimization">
          <t>Several optimization mechanisms can be considered to reduce the
          distributed control plane overhead and improve its scalability.</t>

          <t>The first control plane optimization consists of reducing the
          number of control plane sessions used for the establishment and
          maintenance of the NRPs. When multiple NRPs have the same connection
          relationship between two adjacent network nodes, it is proposed
          that a single control protocol session be used for these NRPs. The
          information specific to the different NRPs can be exchanged over
          the same control protocol session, with the necessary
          identification information to distinguish the information of
          different NRPs in the control messages. This could reduce the
          overhead on each node of creating and maintaining a separate
          control protocol session for each NRP, and could also reduce the
          number of control plane messages.</t>

          <t>The second control plane optimization is to decouple the resource
          information of an NRP from the associated logical topology
          information, so that the resource attributes and the topology
          attributes of the NRP can be advertised and processed separately. In
          a network, it is possible that multiple NRPs are associated with the
          same logical topology, or that multiple NRPs share the same set of
          network resources hosted by a specific set of network nodes and
          links. With topology sharing, it is more efficient to advertise only
          one copy of the topology information, which all of the NRPs deployed
          over that topology can use. More importantly, with this approach,
          the result of topology-based route computation can also be shared by
          multiple NRPs, so that the overhead of per-NRP route computation is
          avoided. Similarly, in the resource sharing case, information about
          a set of network resources allocated on a particular network node or
          link can be advertised in the control plane only once and then be
          referenced by the multiple NRPs that share that set of resources.</t>
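          <t>As a non-normative illustration, the following Python sketch
          shows the topology-sharing case: a single shortest-path computation
          is run over the shared logical topology, and each NRP only binds
          its own resource information to the common result. The topology,
          the node names, and the "gold"/"silver" resource profiles are
          assumptions made for illustration only.</t>

          <t><figure>
              <artwork><![CDATA[
```python
import heapq

def spf(adj, src):
    # A single shortest-path computation over the shared topology
    # (adj: node -> {neighbor: cost}); returns distance and
    # predecessor maps that all NRPs on this topology can reuse.
    dist, prev = {src: 0}, {}
    pq = [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue
        for v, cost in adj[u].items():
            if d + cost < dist.get(v, float("inf")):
                dist[v], prev[v] = d + cost, u
                heapq.heappush(pq, (d + cost, v))
    return dist, prev

adj = {"A": {"B": 1, "C": 4}, "B": {"A": 1, "C": 1}, "C": {"A": 4, "B": 1}}
dist, prev = spf(adj, "A")

# One SPF result is shared: each NRP only attaches its own
# (illustrative) resource profile to the common predecessors.
nrp_fib = {nrp: {dst: (prev[dst], profile) for dst in prev}
           for nrp, profile in (("NRP-1", "gold"), ("NRP-2", "silver"))}
assert dist["C"] == 2 and nrp_fib["NRP-1"]["C"][0] == "B"
```
]]></artwork>
            </figure></t>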

          <t><figure title="">
              <artwork align="center"><![CDATA[            # O #### O #### O        * O **** O **** O
           #  #      #      #       *  *      *      *
          O   #      #      #      O   *      *      *
           #  #      #      #       *  *      *      *
            # O #### O #### O        * O **** O **** O

                  NRP-1                    NRP-2
                   ^^                       ^^
                   ||      O-----O-----O    ||
                   <<<<  / | \ / |     |  >>>>
                        O  |  X  |     |
                         \ | / \ |     |
                           O-----O-----O

                      Underlay Network Topology

    Legend

    O     Virtual node
    ###   Virtual links with a set of reserved resources
    ***   Virtual links with another set of reserved resources

             Figure 1. Topology Sharing between NRPs
]]></artwork>
            </figure></t>

          <t>Figure 1 gives an example of two NRPs that share the same logical
          topology. NRP-1 and NRP-2 are associated with the same logical
          topology, while the resource attributes of each NRP are different.
          In this case, the information of the shared network topology can be
          advertised using either MT or Flex-Algo; the two NRPs are then
          associated with the same MT or Flex-Algo, and the outcome of
          topology-based route computation can be shared by the two NRPs to
          generate the corresponding NRP-specific routing and forwarding
          entries.</t>

          <t><figure>
              <artwork align="center"><![CDATA[             # O #### O #### O         * O ***** O #### O
            #  #      #      #        *     * *  #      #
           O   #      #      #       O       *   #      #
            #  #      #      #        *     * *  #      #
             # O #### O #### O         * O ***** O #### O

                   NRP-1                      NRP-2
                    ^^                         ^^
                    ||       O-----O-----O     ||
                    <<<<   / | \ / |     |   >>>>
                          O  |  X  |     |
                           \ | / \ |     |
                             O-----O-----O

                       Underlay Network Topology

    Legend

    O     Virtual node
    ###   Virtual links with a set of reserved resources
    ***   Virtual links with another set of reserved resources

              Figure 2. Resource Sharing between NRPs]]></artwork>
            </figure></t>

          <t>Figure 2 gives another example of two NRPs which have different
          logical topologies, while they share the same set of network
          resources on a subset of the links. In this case, the information
          about the shared resources allocated on those links needs to be
          advertised only once; then both NRP-1 and NRP-2 can refer to the
          common set of allocated link resources for constraint-based path
          computation.</t>
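          <t>As a non-normative illustration, the following Python sketch
          shows the resource-sharing case: a set of link resources is
          advertised once under an identifier, and each NRP carries only a
          reference to it when checking constraints. The identifier value,
          node names, and bandwidth figures are assumptions made for
          illustration only.</t>

          <t><figure>
              <artwork><![CDATA[
```python
# Shared resource information is advertised once under an
# identifier; NRPs that share the allocation carry only a
# reference to it.  All values below are illustrative.
resource_sets = {101: {"bandwidth_mbps": 500}}        # advertised once
link_resource_ref = {("R1", "R2"): 101, ("R2", "R3"): 101}

def link_usable(link, nrp_resource_ref, min_bw):
    # Constraint-based check: resolve the shared reference instead
    # of keeping per-NRP, per-link resource state.
    ref = link_resource_ref.get(link)
    return (ref == nrp_resource_ref
            and resource_sets[ref]["bandwidth_mbps"] >= min_bw)

assert link_usable(("R1", "R2"), 101, 200)
assert not link_usable(("R1", "R2"), 101, 600)
```
]]></artwork>
            </figure></t>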

          <t>The control protocol extensions for support of scalable NRP are
          out of the scope of this document and are specified in relevant
          documents such as <xref target="I-D.dong-lsr-sr-enhanced-vpn"/>.</t>
        </section>

        <section title="Centralized Control Plane Optimization">
          <t>For the optimization of the centralized control plane, it is
          suggested that the centralized controller be used as a complementary
          computational facility to the distributed control plane rather than
          as a replacement, so that the workload of NRP-specific path
          computation can be shared between the centralized controller and the
          network nodes. In addition, the centralized controller may be
          realized as multiple network entities, each of which is
          responsible for one subset or region of the network. This is the
          typical approach for scaling out a centralized controller.</t>
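          <t>As a non-normative illustration, the following Python sketch
          shows this scale-out pattern: each controller instance owns one
          region, and NRP-specific computations are dispatched by the region
          of the node concerned, so no single instance handles the whole
          network. The region names and assignments are assumptions made for
          illustration only.</t>

          <t><figure>
              <artwork><![CDATA[
```python
# Scale-out sketch: each controller instance owns one region and
# only handles computations for nodes in that region.
region_of_node = {"R1": "east", "R2": "east", "R3": "west"}

class RegionController:
    def __init__(self, region):
        self.region = region
        self.pending = []

    def submit(self, node, request):
        self.pending.append((node, request))

controllers = {r: RegionController(r) for r in ("east", "west")}

def dispatch(node, request):
    # Route the computation to the controller owning the node's region.
    controllers[region_of_node[node]].submit(node, request)

dispatch("R1", "path to R3 in NRP-1")
dispatch("R3", "path to R1 in NRP-1")
assert len(controllers["east"].pending) == 1
assert len(controllers["west"].pending) == 1
```
]]></artwork>
            </figure></t>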
        </section>
      </section>

      <section title="Data Plane Optimization">
        <t>One optimization in the data plane consists of decoupling the
        identifiers used for topology-based forwarding from the identifier
        used for NRP-inferred, resource-specific processing. One possible
        mechanism is to introduce a dedicated network-wide NRP Identifier (NRP
        ID) in the packet header to uniquely identify the set of local network
        resources allocated to an NRP on each participating network node and
        link for the processing of packets. The existing identifiers in
        the packet header used for topology-based forwarding (e.g., the
        destination IP address or the MPLS forwarding labels) are then kept
        unchanged. The benefit is that the number of existing
        topology-specific identifiers is not impacted by the increasing
        number of NRPs. Since this new NRP ID field is used together with
        other existing fields of the packet to determine the packet
        forwarding behavior, network nodes may need to maintain a
        hierarchical forwarding table in the data plane. Figure 3 shows the
        concept of using separate data plane identifiers for
        topology-specific and resource-specific packet forwarding and
        processing purposes.</t>

        <t><figure>
            <artwork><![CDATA[                        +--------------------------+
                        |       Packet Header      |
                        |                          |
                        | +----------------------+ |
                        | | Topology-specific IDs| |
                        | +----------------------+ |
                        |                          |
                        | +----------------------+ |
                        | |        NRP ID        | |
                        | +----------------------+ |
                        +--------------------------+

   Figure 3. Decoupled Topology and Resource Identifiers in data packet]]></artwork>
          </figure></t>
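        <t>As a non-normative illustration, the following Python sketch shows
        the hierarchical lookup implied by Figure 3: stage one uses the
        topology-specific identifier (here a destination prefix) to select
        the outgoing interface, and stage two uses the NRP ID to select the
        NRP's resources (here a queue) on that interface. All table contents
        are assumptions made for illustration only.</t>

        <t><figure>
            <artwork><![CDATA[
```python
# Hierarchical lookup sketch: topology-based forwarding first,
# then NRP-specific resource selection.  Contents are illustrative.
fib = {"2001:db8::/32": "eth0"}                 # topology-specific entry
nrp_resources = {("eth0", 7): {"queue": 3},
                 ("eth0", 8): {"queue": 5}}     # per-NRP resources

def forward(dst_prefix, nrp_id):
    out_if = fib[dst_prefix]                          # stage 1: topology
    queue = nrp_resources[(out_if, nrp_id)]["queue"]  # stage 2: NRP ID
    return out_if, queue

assert forward("2001:db8::/32", 7) == ("eth0", 3)
```
]]></artwork>
          </figure></t>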

        <t>In an IPv6 <xref target="RFC8200"/> network, this could be achieved
        by introducing a dedicated field in either the IPv6 base header or the
        extension headers to carry the NRP ID for the resource-specific
        forwarding, while keeping the destination IP address field used for
        routing towards the destination prefix in the corresponding topology.
        Note that the NRP ID needs to be parsed by every node along the path
        that is capable of NRP-aware forwarding. <xref
        target="I-D.ietf-6man-enhanced-vpn-vtn-id"/> introduces a mechanism
        for carrying the VTN resource ID (which is equivalent to the NRP ID
        in the context of network slicing) in the IPv6 Hop-by-Hop Options
        extension header.</t>
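        <t>As a non-normative illustration, the following Python sketch
        builds and parses a Hop-by-Hop Options extension header carrying a
        4-byte NRP ID option. The option type value (0x3e) and the option
        layout are placeholders assumed for illustration; they are not the
        codepoints or encoding of the cited draft.</t>

        <t><figure>
            <artwork><![CDATA[
```python
import struct

NRP_ID_OPT_TYPE = 0x3e   # placeholder codepoint, not an IANA allocation

def build_hbh_with_nrp_id(next_header, nrp_id):
    # Illustrative Hop-by-Hop Options header carrying a 4-byte NRP
    # ID option (option type, option length, then the ID).  With a
    # 6-byte option the header is exactly 8 octets, so no padding
    # is needed and the Hdr Ext Len field is 0.
    opt = struct.pack("!BBI", NRP_ID_OPT_TYPE, 4, nrp_id)
    return struct.pack("!BB", next_header, (2 + len(opt)) // 8 - 1) + opt

def parse_nrp_id(hbh):
    # Walk the type-length options and return the NRP ID if present
    # (Pad1 handling omitted; this sketch never generates Pad1).
    off = 2
    while off < len(hbh):
        opt_type, opt_len = hbh[off], hbh[off + 1]
        if opt_type == NRP_ID_OPT_TYPE:
            return struct.unpack_from("!I", hbh, off + 2)[0]
        off += 2 + opt_len
    return None

hdr = build_hbh_with_nrp_id(6, 1234)   # next header 6 = TCP
assert len(hdr) == 8 and parse_nrp_id(hdr) == 1234
```
]]></artwork>
          </figure></t>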

        <t>In an MPLS <xref target="RFC3032"/> network, this may be achieved
        by inserting a dedicated NRP ID either in the MPLS label stack or a
        specific field that follows the MPLS label stack. Thus the existing
        MPLS forwarding labels are used for topology-specific packet
        forwarding purposes, and the NRP ID is used to determine the set of
        network resources for packet processing. This requires that both the
        forwarding label and the NRP ID are parsed by nodes along the
        forwarding path of the packet, and the forwarding behavior may depend
        on the position of the NRP ID in the packet. The detailed extensions
        to MPLS are currently under discussion as part of the work conducted
        by the MPLS Working Group and are out of the scope of this document.</t>
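        <t>As a non-normative illustration, the following Python sketch
        encodes a label stack in which a normal forwarding label is followed
        by a hypothetical entry whose 20-bit label field carries the NRP ID.
        The label-stack-entry format is that of RFC 3032; the placement of
        the NRP ID in the stack is an assumption for illustration, since the
        actual encoding is still under discussion in the MPLS Working
        Group.</t>

        <t><figure>
            <artwork><![CDATA[
```python
import struct

def label_entry(value, tc=0, s=0, ttl=64):
    # One 32-bit MPLS label stack entry (RFC 3032):
    # label(20) | TC(3) | S(1) | TTL(8).
    return struct.pack("!I", (value << 12) | (tc << 9) | (s << 8) | ttl)

def label_value(entry):
    return struct.unpack("!I", entry)[0] >> 12

# Illustrative stack: a forwarding label for topology-based
# forwarding, then a hypothetical bottom-of-stack entry whose
# label field carries the NRP ID.
nrp_id = 300
stack = label_entry(16001) + label_entry(nrp_id, s=1)

assert label_value(stack[0:4]) == 16001
assert label_value(stack[4:8]) == 300
```
]]></artwork>
          </figure></t>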
      </section>
    </section>

    <section title="Solution Evolution Perspectives">
      <t>Based on the analysis provided by this document, the control and data
      plane for NRP need to evolve to support the increasing number of network
      slice services and the increasing number of NRPs in the network. This
      section describes the foreseeable solution evolution, taking the
      SR-based NRP solutions as an example, while the analysis and
      optimizations in this document are generic and not specific to SR.</t>

      <t>First, by introducing resource-awareness with specific SR SIDs <xref
      target="I-D.ietf-spring-resource-aware-segments"/> and using the
      Multi-Topology or Flex-Algo mechanisms to define the logical topology
      of an NRP, it is possible to provide a limited number of NRPs in the
      network, which can meet the requirements of a relatively small number
      of network slice services. This mechanism is called the "basic SR-based
      NRP".</t>

      <t>As the required number of network slice services increases, more
      NRPs may be needed. The control plane scalability can then be improved
      by decoupling the topology attributes from the resource attributes, so
      that multiple NRPs can share the same topology or resource attributes
      to reduce the overhead. The data plane can still rely on the
      resource-aware SIDs. This mechanism is called the "scalable SR-based
      NRP". Both the basic and the scalable SR-based NRP mechanisms are
      described in <xref
      target="I-D.ietf-spring-sr-for-enhanced-vpn"/>.</t>

      <t>Whenever the data plane scalability becomes a concern, a dedicated
      NRP ID can be introduced in the data packet to decouple the
      resource-specific identifiers from the topology-specific identifiers in
      the data plane, so as to reduce the number of IP addresses or SR SIDs
      needed to support a large number of NRPs. This is called the
      NRP-ID-based mechanism.</t>
    </section>

    <section title="Operational Considerations">
      <t>The instantiation of an NRP requires NRP-specific configuration of
      the participating network nodes and links. There can also be cases
      where the topology or the set of network resources allocated to an
      existing NRP needs to be modified. The amount of configuration required
      for NRP instantiation and modification increases with the number of
      NRPs.</t>

      <t>For the management and operation of NRPs and the optimization of
      paths within the NRPs, the status of NRPs needs to be monitored and
      reported to the network controller. The increasing number of NRPs would
      require additional NRP status information to be monitored.</t>
    </section>

    <section anchor="security-considerations" title="Security Considerations">
      <t>This document discusses scalability considerations about the network
      control plane and data plane of NRPs in the realization of network slice
      services, and investigates some mechanisms for scalability optimization.
      As the number of NRPs supported in the data plane and control plane of
      the network can be limited, this may be exploited as an attack vector:
      requesting a large number of network slice services, which would then
      result in the creation of a large number of NRPs.</t>

      <t>One protection against this is to improve the scalability of the
      system so that more NRPs can be supported. Another possible mitigation
      is to make the network slice controller aware of the scaling
      constraints of the system, dampen the arrival rate of new network slice
      and NRP requests, and raise alarms when the thresholds are crossed.</t>
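      <t>As a non-normative illustration, the following Python sketch models
      the damping described above as a token bucket: the controller admits
      at most a bounded burst of NRP creation requests and refills slowly,
      rejecting (and potentially raising an alarm for) requests beyond the
      threshold. The capacity and refill values are illustrative
      assumptions.</t>

      <t><figure>
          <artwork><![CDATA[
```python
class NrpRequestDamper:
    # Token-bucket sketch of controller-side request damping.
    # All thresholds are illustrative.
    def __init__(self, capacity=10, refill_per_tick=1):
        self.capacity = capacity
        self.tokens = capacity
        self.refill = refill_per_tick

    def tick(self):
        # Called periodically to replenish the bucket.
        self.tokens = min(self.capacity, self.tokens + self.refill)

    def admit(self):
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # threshold crossed: reject and raise an alarm

d = NrpRequestDamper(capacity=2)
assert d.admit() and d.admit()
assert not d.admit()      # burst exhausted
d.tick()
assert d.admit()          # refilled
```
]]></artwork>
        </figure></t>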

      <t>The security considerations in <xref
      target="I-D.ietf-teas-ietf-network-slices"/> and <xref
      target="I-D.ietf-teas-enhanced-vpn"/> also apply to this
      document.</t>
    </section>

    <section anchor="iana-considerations" title="IANA Considerations">
      <t>This document makes no request of IANA.</t>
    </section>

    <section title="Contributors">
      <t><figure>
          <artwork><![CDATA[
Jim Guichard
Email: james.n.guichard@futurewei.com

Pavan Beeram
Email: vbeeram@juniper.net

Tarek Saad
Email: tsaad.net@gmail.com

Zhibo Hu
Email: huzhibo@huawei.com

Hongjie Yang
Email: hongjie.yang@huawei.com
          ]]></artwork>
        </figure></t>
    </section>

    <section anchor="acknowledgments" title="Acknowledgments">
      <t>The authors would like to thank Adrian Farrel, Dhruv Dhody, Donald
      Eastlake, Kenichi Ogaki, Mohamed Boucadair, Christian Jacquenet and
      Kiran Makhijani for their review and valuable comments to this
      document.</t>

      <t>Thanks, also, to the ad hoc design team of Les Ginsberg, Pavan
      Beeram, John Drake, Tarek Saad, Francois Clad, Tony Li, Adrian Farrel,
      Joel Halpern, and Peter Psenak who contributed substantially to
      establishing the design principles for scaling network slices.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include='reference.RFC.3032'?>

      <?rfc include='reference.RFC.8200'?>

      <?rfc include='reference.I-D.ietf-teas-ietf-network-slices'?>

      <?rfc include='reference.I-D.ietf-teas-enhanced-vpn'?>
    </references>

    <references title="Informative References">
      <reference anchor="TS23501"
                 target="https://portal.3gpp.org/desktopmodules/Specifications/SpecificationDetails.aspx?specificationId=3144">
        <front>
          <title>3GPP TS23.501</title>

          <author>
            <organization/>
          </author>

          <date year="2016"/>
        </front>
      </reference>

      <?rfc include='reference.I-D.ietf-spring-resource-aware-segments'?>

      <?rfc include='reference.I-D.ietf-spring-sr-for-enhanced-vpn'?>

      <?rfc include='reference.I-D.ietf-6man-enhanced-vpn-vtn-id'?>

      <?rfc include='reference.I-D.ietf-lsr-isis-sr-vtn-mt'?>

      <?rfc include='reference.I-D.zhu-lsr-isis-sr-vtn-flexalgo'?>

      <?rfc include='reference.I-D.dong-lsr-sr-enhanced-vpn'?>

      <?rfc include='reference.RFC.4915'?>

      <?rfc include='reference.RFC.5120'?>

      <?rfc include='reference.RFC.9350'?>
    </references>
  </back>
</rfc>
