<?xml version="1.0" encoding="US-ASCII"?>
<!-- This template is for creating an Internet Draft using xml2rfc,
     which is available here: http://xml.resource.org. -->
<!DOCTYPE rfc SYSTEM "rfc2629.dtd" [
<!-- One method to get references from the online citation libraries.
     There has to be one entity for each item to be referenced.
     An alternate method (rfc include) is described in the references. -->
<!ENTITY RFC2119 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.2119.xml">
<!ENTITY RFC6241 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.6241.xml">
<!ENTITY RFC7950 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7950.xml">
<!ENTITY RFC7149 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7149.xml">
<!ENTITY RFC7426 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.7426.xml">
<!ENTITY RFC8299 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8299.xml">
<!ENTITY RFC8309 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8309.xml">
<!ENTITY RFC8340 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8340.xml">
<!ENTITY RFC8453 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8453.xml">
<!ENTITY RFC8174 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8174.xml">
<!ENTITY RFC8345 SYSTEM "http://xml.resource.org/public/rfc/bibxml/reference.RFC.8345.xml">
]>
<?xml-stylesheet type='text/xsl' href='rfc2629.xslt' ?>
<!-- used by XSLT processors -->
<!-- For a complete list and description of processing instructions (PIs),
     please see http://xml.resource.org/authoring/README.html. -->
<!-- Below are generally applicable Processing Instructions (PIs) that most I-Ds might want to use.
     (Here they are set differently than their defaults in xml2rfc v1.32) -->
<?rfc strict="yes" ?>
<!-- give errors regarding ID-nits and DTD validation -->
<!-- control the table of contents (ToC) -->
<?rfc toc="yes"?>
<!-- generate a ToC -->
<?rfc tocdepth="4"?>
<!-- the number of levels of subsections in ToC. default: 3 -->
<!-- control references -->
<?rfc symrefs="yes"?>
<!-- use symbolic references tags, i.e, [RFC2119] instead of [1] -->
<?rfc sortrefs="yes" ?>
<!-- sort the reference entries alphabetically -->
<!-- control vertical white space
     (using these PIs as follows is recommended by the RFC Editor) -->
<?rfc compact="yes" ?>
<!-- do not start each main section on a new page -->
<?rfc subcompact="no" ?>
<!-- keep one blank line between list items -->
<!-- end of list of popular I-D processing instructions -->
<rfc category="std" docName="draft-he-netconf-adaptive-collection-usecases-01"
     ipr="trust200902">
  <front>
    <title abbrev="Adaptive Traffic Data Collection">Problem Statement and Use
    Cases of Adaptive Traffic Data Collection</title>

    <author fullname="Xiaoming He" initials="X." surname="He">
      <organization>China Telecom</organization>

      <address>
        <email>hexm4@chinatelecom.cn</email>
      </address>
    </author>

    <author fullname="Dongfeng Mao" initials="D." surname="Mao">
      <organization>China Telecom</organization>

      <address>
        <email>maodf@chinatelecom.cn</email>
      </address>
    </author>

    <author fullname="Qiufang Ma" initials="Q." surname="Ma">
      <organization>Huawei</organization>

      <address>
        <postal>
          <street>101 Software Avenue, Yuhua District</street>

          <city>Nanjing</city>

          <region>Jiangsu</region>

          <code>210012</code>

          <country>China</country>
        </postal>

        <email>maqiufang1@huawei.com</email>
      </address>
    </author>

    <author fullname="Tianran Zhou" initials="T." surname="Zhou">
      <organization>Huawei</organization>

      <address>
        <email>zhoutianran@huawei.com</email>
      </address>
    </author>

    <date year="2022"/>

    <area>ops</area>

    <workgroup>NETCONF Working Group</workgroup>

    <keyword>Adaptive Telemetry</keyword>

    <abstract>
      <t>IP carrier networks need to provide real-time traffic visibility to
      help network operators quickly and accurately locate network congestion
      and packet loss, and to make timely path adjustments for deterministic
      services in order to avoid congestion. It is therefore essential to
      explore an adaptive traffic data collection mechanism that captures
      real-time network state with minimal resource consumption.</t>

      <t>This document summarizes the problems currently faced by network
      operators when attempting to provide timely traffic data collection for
      various scenarios that require real-time network state and traffic
      visibility, and it aggregates the requirements for an adaptive traffic
      data collection mechanism from a variety of deployment scenarios.</t>
    </abstract>
  </front>

  <middle>
    <section anchor="Introduction" title="Introduction">
      <t>With the advent of cloud computing, big data, and Artificial
      Intelligence (AI), as well as the large-scale deployment of 5G mobile
      communication technology, a large number of Ultra-Reliable and
      Low-Latency Communication (uRLLC) services, such as Augmented Reality
      (AR)/Virtual Reality (VR), the Industrial Internet, and the Computing
      Power Network (CPN), have emerged. These services place higher demands
      on the service quality of IP carrier networks. IP carrier networks need
      to provide real-time traffic visibility to help network operators
      quickly and accurately locate network congestion and packet loss, and
      to make timely path adjustments for services with deterministic delay
      requirements in order to avoid congested nodes and links. For such
      business scenarios, the network needs to support traffic sampling
      intervals of sub-second or even millisecond granularity so as to
      obtain the real-time network state.</t>

      <t>For decades, SNMP <xref target="RFC3416"/> and Management Information
      Bases (MIBs) have been widely deployed and have been the de facto choice
      for many monitoring solutions, especially for collecting interface
      traffic. Arguably the biggest shortcoming of SNMP for such applications
      is its reliance on periodic polling, which introduces additional load
      on the network and devices and is brittle if polling cycles are missed.
      As a result, SNMP cannot realize real-time traffic sampling at
      sub-second or millisecond intervals. Telemetry, a push-based data
      acquisition technique that is able to deliver object changes as they
      happen, overcomes limitations of SNMP such as low sampling rate,
      inefficiency, and high processing resource consumption. Nevertheless,
      for the sake of capturing real-time network state, persistent sampling
      of interface traffic at millisecond intervals generates a considerable
      amount of data, which may consume too much transport bandwidth and
      overload the servers used for data collection, storage, and analysis.
      Increasing the data handling capacity is technically feasible but
      expensive, and it is difficult to deploy at large scale in operators'
      networks. It is therefore essential to explore an adaptive traffic data
      collection mechanism that captures real-time network state with minimal
      resource consumption.</t>

      <t>This document summarizes the problems currently faced by network
      operators when attempting to provide timely traffic data collection for
      the aforementioned new services and applications that require real-time
      network state and traffic visibility. This document also aggregates the
      requirements for an adaptive traffic data collection mechanism from a
      variety of deployment scenarios.</t>

      <section title="Abbreviations">
        <t><list style="hanging">
            <t hangText="AI: ">Artificial Intelligence<vspace
            blankLines="1"/></t>

            <t hangText="AR: ">Augmented Reality<vspace blankLines="1"/></t>

            <t hangText="VR: ">Virtual Reality<vspace blankLines="1"/></t>

            <t hangText="CPN: ">Computing Power Network<vspace
            blankLines="1"/></t>

            <t hangText="gNMI: ">gRPC Network Management Interface<vspace
            blankLines="1"/></t>

            <t hangText="IP RAN: ">IP Radio Access Network<vspace
            blankLines="1"/></t>

            <t hangText="DetNet: ">Deterministic Networking<vspace
            blankLines="1"/></t>

            <t hangText="QoE: ">Quality of Experience<vspace
            blankLines="1"/></t>

            <t hangText="SLA: ">Service Level Agreement<vspace
            blankLines="1"/></t>

            <t hangText="uRLLC: ">Ultra-Reliable and Low-Latency
            Communication<vspace blankLines="1"/></t>

            <t hangText="NMS: ">Network Management System<vspace
            blankLines="1"/></t>

            <t hangText="IDC: ">Internet Data Center<vspace
            blankLines="1"/></t>

            <t hangText="SNMP: ">Simple Network Management Protocol<vspace
            blankLines="1"/></t>

            <t hangText="MIB: ">Management Information Base<vspace
            blankLines="1"/></t>
          </list></t>
      </section>
    </section>

    <section anchor="terminology" title="Terminology">
      <t>The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
      "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
      "OPTIONAL" in this document are to be interpreted as described in BCP 14
      <xref target="RFC2119"/> <xref target="RFC8174"/> when, and only when,
      they appear in all capitals, as shown here.</t>

      <t>The following terms are defined in this document:<list
          style="hanging">
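          <t hangText="adaptive traffic data collection: ">A mechanism that
          allows a server (e.g., a network device publishing telemetry data)
          to automatically switch between different telemetry sampling
          periods when collecting traffic data, according to whether
          monitored metrics cross configured thresholds.<vspace
          blankLines="1"/></t>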
        </list></t>
    </section>

    <section title="Problem Statement">
      <t>IP networks are based on statistical multiplexing and therefore
      exhibit bursty traffic. In order to avoid congestion, network operators
      have been keeping network utilization at a rather low level. For a long
      time, operators have obtained traffic visibility from the Network
      Management System (NMS) and have been satisfied with the 30-40%
      bandwidth utilization shown by traffic statistics curves. In spite of
      such low link usage, many complaints have still been received about
      poor Quality of Experience (QoE) for applications that are sensitive to
      delay and packet loss. The fundamental cause is that SNMP, which is
      widely employed in operators' networks, collects interface traffic at
      5-minute intervals, and the average traffic observed over each sampling
      cycle masks traffic bursts. With such a low sampling rate, SNMP is
      unable to capture the bursty characteristics of the traffic.</t>

      <t>A large quantity of laboratory data and network operational data
      indicates that microbursts occur frequently in operators' carrier
      networks, such as IP-based Radio Access Networks (IP RAN), IP
      metropolitan networks, IP backbone networks, and Internet Data Centers
      (IDC). The typical duration of a microburst is tens to hundreds of
      milliseconds, which is enough to cause instantaneous congestion of an
      interface output queue. Network congestion amplifies queuing delay and
      jitter and, in severe cases, may even cause packet loss. Congestion
      caused by microbursts is therefore harmful to deterministic-delay
      applications. Congestion is a major challenge for IP networks, and
      although congestion caused by microbursts is difficult to eliminate, it
      must be avoided.</t>

      <t>Although the mechanisms behind microbursts are not well understood,
      this does not prevent them from being detected. Fortunately, telemetry
      (e.g., YANG-Push <xref target="RFC8639"/> <xref target="RFC8641"/>,
      gNMI <xref target="gNMI"/>) is capable of collecting interface traffic
      at a much higher frequency, e.g., at millisecond intervals. By means of
      telemetry, a microburst can therefore be captured in its entirety.
      However, it is impractical to gain real-time traffic visibility at the
      cost of persistent sampling at millisecond intervals. For example,
      capturing microburst traffic on an interface requires a sampling cycle
      of 10 milliseconds or shorter; as a result, the resources required for
      data storage and analysis increase by a factor of 30,000 (5 minutes =
      300,000 milliseconds, divided by 10 milliseconds) compared with the
      5-minute SNMP sampling cycle that is widely employed today.</t>
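      <t>The factor of 30,000 follows from comparing how many samples one
      interface produces per day at the two sampling periods. The Python
      sketch below only illustrates that arithmetic; it assumes that storage
      and analysis costs grow linearly with the number of samples, which is
      a simplification.</t>

      <figure>
        <artwork><![CDATA[
```python
# Assumes cost grows linearly with the number of stored samples.
MS_PER_DAY = 24 * 60 * 60 * 1000   # 86,400,000 ms in one day


def samples_per_day(period_ms: int) -> int:
    """Number of samples one interface yields per day."""
    return MS_PER_DAY // period_ms


snmp_period_ms = 5 * 60 * 1000     # 5-minute SNMP polling cycle
burst_period_ms = 10               # 10 ms telemetry sampling cycle

snmp_samples = samples_per_day(snmp_period_ms)     # 288
burst_samples = samples_per_day(burst_period_ms)   # 8,640,000
ratio = burst_samples // snmp_samples              # 30,000

print(snmp_samples, burst_samples, ratio)
```
]]></artwork>
      </figure>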

      <t>It is essential to investigate an adaptive traffic data collection
      mechanism that captures real-time network state with minimal resource
      consumption. Generally speaking, under normal, non-congested network
      conditions, which account for more than 95% of the time, a
      minute-level sampling cycle is sufficient, because interface forwarding
      delay is almost constant and jitter is low. However, when a congestion
      state or congestion trend is detected, the sampling period must be
      promptly reduced to milliseconds to capture microburst traffic on the
      interface. A congestion state or congestion trend of an interface
      manifests itself as packet loss due to queue overflow, queue depth
      exceeding a threshold, or excessively high link utilization; these
      indications can be defined as event-triggered data. Such event data can
      be actively pushed through a subscription or passively polled through a
      query. Although microbursts occur frequently, each one is transient, so
      a real-time detection tool is preferable for pinpointing them promptly.
      The traditional method of querying through the CPU on the main control
      board consumes considerable processing resources; therefore, the
      network device needs built-in hardware specifically designed to monitor
      these events.</t>
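      <t>The switching behavior described above can be sketched as a small
      controller that selects the sampling period from event-triggered data.
      In this Python sketch, the threshold values, the hypothetical
      InterfaceEvents and AdaptiveSampler names, and the rule for returning
      to the slow period (a hold-down of several consecutive event-free
      checks) are all assumptions; this document does not specify them.</t>

      <figure>
        <artwork><![CDATA[
```python
from dataclasses import dataclass

SLOW_PERIOD_MS = 60_000   # minute-level cycle, normal state
FAST_PERIOD_MS = 10       # millisecond-level cycle, congestion

# Hypothetical thresholds; real values would be operator-configured.
QUEUE_DEPTH_THRESHOLD = 0.8   # fraction of output queue capacity
UTILIZATION_THRESHOLD = 0.9   # fraction of link bandwidth
HOLD_DOWN_CHECKS = 3          # quiet checks before slowing down


@dataclass
class InterfaceEvents:
    """Hypothetical event-triggered data for one interface."""
    dropped_packets: int   # drops caused by queue overflow
    queue_depth: float     # queue occupancy, 0.0 to 1.0
    utilization: float     # link utilization, 0.0 to 1.0


def congested(ev: InterfaceEvents) -> bool:
    """True if any congestion indication is present."""
    return (ev.dropped_packets > 0
            or ev.queue_depth > QUEUE_DEPTH_THRESHOLD
            or ev.utilization > UTILIZATION_THRESHOLD)


class AdaptiveSampler:
    """Selects the telemetry sampling period for one interface."""

    def __init__(self) -> None:
        self.period_ms = SLOW_PERIOD_MS
        self.quiet_checks = 0

    def update(self, ev: InterfaceEvents) -> int:
        """Feed the latest event data; return the period to use."""
        if congested(ev):
            # Switch to fast sampling as soon as congestion is seen.
            self.period_ms = FAST_PERIOD_MS
            self.quiet_checks = 0
        elif self.period_ms == FAST_PERIOD_MS:
            # Return to slow sampling only after a quiet hold-down.
            self.quiet_checks += 1
            if self.quiet_checks >= HOLD_DOWN_CHECKS:
                self.period_ms = SLOW_PERIOD_MS
                self.quiet_checks = 0
        return self.period_ms
```
]]></artwork>
      </figure>

      <t>For example, one update with a queue depth of 0.95 switches the
      sampler to the 10 ms period, and the third consecutive event-free
      update afterwards returns it to the 60 s period. The hold-down is a
      design assumption intended to avoid flapping between periods when an
      indication hovers near its threshold.</t>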

      <t>To reduce the excessive resource consumption caused by exporting
      each millisecond-level sample individually, a batch of samples (e.g.,
      hundreds of traffic samples from one interface) can be packaged into a
      single telemetry packet and sent to the collector. Each sample requires
      a timestamp so that the collector can visualize the interface traffic
      trend as a curve. The collector must also perform traffic visualization
      in real time so that operators can observe it immediately.</t>
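      <t>A minimal Python sketch of the batching idea follows. It groups
      timestamped samples from one interface into fixed-size batches, each
      of which would be carried in one telemetry packet. The Sample and
      TelemetryBatch structures and the batch size of 100 are hypothetical;
      they are not taken from YANG-Push, gNMI, or any other encoding.</t>

      <figure>
        <artwork><![CDATA[
```python
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class Sample:
    timestamp_ms: int   # sampling time, needed to plot the curve
    in_octets: int      # hypothetical counters carried per sample
    out_octets: int


@dataclass
class TelemetryBatch:
    interface: str
    samples: List[Sample] = field(default_factory=list)


def batch_samples(interface: str, samples: Iterable[Sample],
                  batch_size: int = 100) -> Iterator[TelemetryBatch]:
    """Yield batches of at most batch_size samples, in order."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    current = TelemetryBatch(interface)
    for s in samples:
        current.samples.append(s)
        if len(current.samples) == batch_size:
            yield current
            current = TelemetryBatch(interface)
    if current.samples:
        # Flush the final, possibly partial, batch.
        yield current
```
]]></artwork>
      </figure>

      <t>With 10 ms sampling and the default batch size of 100, one batch
      covers one second of samples, so one packet per second is exported
      instead of 100.</t>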
    </section>

    <section title="Scenarios of Adaptive Traffic Data Collection">
      <t>This section presents several typical scenarios that require
      adaptive traffic data collection to obtain real-time network state and
      traffic visibility with minimal resource consumption.</t>

      <section title="Multi-dimensional real-time portrait of interface traffic characteristics">
        <t>Interface traffic data collection is one of the most important
        functions of an NMS. Today, more and more applications are
        latency-sensitive and loss-sensitive, and real-time traffic
        visibility can help operators better understand network performance
        so as to guarantee SLAs. In addition, obtaining a holistic and
        accurate picture of interface traffic characteristics is a basic
        requirement of the statistical multiplexing model of IP networks, and
        it is of great significance for traffic prediction, network planning,
        network capacity expansion, network optimization, etc. For example, a
        high long-term average utilization indicates a need for capacity
        expansion; a high peak-to-average ratio, together with frequently
        detected microbursts, implies intense traffic burstiness and suggests
        timely path adjustment for key traffic flows that require
        deterministic delay. However, a traditional NMS based on SNMP is
        unable to depict the actual characteristics of interface traffic, so
        interface traffic data collection based on telemetry techniques is
        preferable.</t>
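        <t>The two indicators mentioned above, long-term average
        utilization and the peak-to-average ratio, can be computed from
        timestamped counter samples. The Python sketch below assumes
        monotonically increasing octet counters without wraparound, equally
        spaced samples, and a known link speed; these are simplifying
        assumptions, and the function names are hypothetical.</t>

        <figure>
          <artwork><![CDATA[
```python
from typing import List, Tuple


def utilizations(samples: List[Tuple[int, int]],
                 link_bps: float) -> List[float]:
    """Per-interval utilization from (timestamp_ms, octets) samples.

    Assumes the octet counter only increases (no wraparound).
    """
    result = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        seconds = (t1 - t0) / 1000.0
        bits = (c1 - c0) * 8
        result.append(bits / seconds / link_bps)
    return result


def average_and_peak_ratio(
        util: List[float]) -> Tuple[float, float]:
    """Return (average utilization, peak-to-average ratio).

    A plain mean is used, which assumes equally spaced samples.
    """
    avg = sum(util) / len(util)
    peak = max(util)
    return avg, (peak / avg if avg > 0 else 0.0)
```
]]></artwork>
        </figure>

        <t>The peak-to-average ratio depends strongly on the sampling
        period: averaging over 5-minute intervals flattens peaks, which is
        why SNMP-based curves understate burstiness.</t>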

        <t>It is essential to exploit adaptive traffic data collection
        techniques to depict a multi-dimensional real-time portrait of
        interface traffic characteristics with minimal resource consumption.
        That is to say, under normal, non-congested network conditions, which
        account for more than 95% of the time, a minute-level sampling cycle
        is sufficient. However, when a congestion state or congestion trend
        is detected, the sampling cycle must be promptly reduced to
        milliseconds to capture microburst traffic on the interface. Such an
        adaptive traffic data collection technique can not only reflect
        coarse-grained interface traffic characteristics, but also capture
        the congestion state of an interface with finer time granularity.
        Because very-high-rate collection is rare (i.e., triggered only by
        detected microbursts), this portrait can be obtained with minimal
        resource consumption, and because of this lower cost, the technique
        can be deployed at large scale in operators' networks.</t>
      </section>

      <section title="Microburst traffic detection">
        <t>Microburst traffic, an instantaneous congestion phenomenon that
        occurs frequently in IP carrier networks, causes severe delay jitter
        and even packet loss, which seriously affects the QoE of
        latency-sensitive and loss-sensitive applications. The ability to
        detect microburst traffic on an interface helps network operators
        quickly and accurately locate network congestion and packet loss, and
        make timely path adjustments for deterministic-delay services in
        order to avoid congested nodes and links. To gain a comprehensive
        understanding of microbursts, interface traffic must be collected as
        soon as a microburst occurs, in order to answer questions such as how
        often microbursts occur and how long they last. Event data indicating
        a microburst, such as packet loss and queue length exceeding a
        threshold, is not enough on its own to describe its
        characteristics.</t>
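        <t>Once millisecond-level utilization samples are available, the
        questions above (how often microbursts occur and how long they last)
        can be answered by finding runs of consecutive samples above a
        utilization threshold. The Python sketch below is illustrative only;
        the hypothetical find_microbursts function, the 0.9 threshold, and
        the treatment of each above-threshold run as one microburst are
        assumptions, not definitions from this document.</t>

        <figure>
          <artwork><![CDATA[
```python
from typing import List, Optional, Tuple


def find_microbursts(
    samples: List[Tuple[int, float]], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Return (start_ms, duration_ms) for each above-threshold run.

    samples: (timestamp_ms, utilization) pairs at a fixed period.
    A run lasts from its first high sample to the first sample that
    falls back to or below the threshold.
    """
    bursts = []
    start: Optional[int] = None
    for ts, util in samples:
        if util > threshold and start is None:
            start = ts                      # a burst begins
        elif util <= threshold and start is not None:
            bursts.append((start, ts - start))
            start = None                    # the burst has ended
    if start is not None:
        # Burst still in progress; duration truncated at last sample.
        bursts.append((start, samples[-1][0] - start))
    return bursts
```
]]></artwork>
        </figure>

        <t>Dividing the number of detected runs by the length of the
        observation window gives the microburst frequency, and the returned
        durations show how long each one lasted. Durations are quantized to
        the sampling period, so a 10 ms period resolves bursts only to within
        about 10 ms.</t>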

        <t>Because the typical duration of a microburst is tens to hundreds
        of milliseconds, a sampling cycle of 10 milliseconds or shorter is
        necessary. Although microbursts occur frequently, they occupy only a
        small fraction of each day, so observing them through persistent
        millisecond-level sampling is not a good approach. Preferably, a
        microburst is captured as soon as it occurs, to ensure that important
        diagnostic data is not missed. Because microbursts are transient, an
        online detection tool based on dedicated hardware is required to
        pinpoint them promptly. Triggered by promptly detected events, such
        as packet loss or queue depth exceeding a threshold, the sampling
        period must be reduced to milliseconds to capture microburst traffic
        on the interface. In short, it is of practical significance to
        explore microburst detection techniques that minimize resource
        consumption.</t>
      </section>

      <section title="Congestion avoidance for deterministic services">
        <t>Network congestion rapidly increases queuing delay and jitter, and
        may even give rise to packet loss, which seriously affects the QoE of
        delay-sensitive and loss-sensitive applications. The goal of network
        optimization is to reduce the occurrence of network congestion as
        much as possible.</t>

        <t>Accurately predicting the trend of network congestion and
        adjusting the network in advance is a complicated problem for network
        operators. Real-time traffic visibility based on adaptive traffic
        data collection techniques can help predict long-term congestion
        accurately and quickly capture instantaneous congestion (i.e.,
        microbursts) on an interface. Using this real-time traffic
        visibility, automatic optimization tools (e.g., AI-based tools) can
        make timely path adjustments for key traffic flows. For example,
        based on the real-time traffic visibility and the collected
        microburst events (e.g., packet loss, queue depth), the controller
        can predict the congestion trend of an interface and promptly
        redirect the traffic of deterministic-delay applications to
        non-congested interfaces.</t>
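        <t>A controller's redirection decision could, for instance, prefer
        candidate interfaces with no recent microburst events and the lowest
        current utilization. The Python sketch below is a deliberately
        simple, hypothetical selection rule; the Candidate fields, the 0.7
        utilization limit, and the use of the latest utilization in place of
        a congestion-trend prediction are assumptions, not a method defined
        by this document.</t>

        <figure>
          <artwork><![CDATA[
```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical view of one alternative egress interface."""
    name: str
    utilization: float       # latest utilization, 0.0 to 1.0
    microburst_events: int   # recent drop/queue-depth events


def pick_redirect_target(candidates: List[Candidate],
                         max_util: float = 0.7) -> Optional[str]:
    """Pick the least-loaded interface without recent microbursts.

    Returns None when no candidate qualifies, in which case the
    traffic would stay on its current path.
    """
    eligible = [c for c in candidates
                if c.microburst_events == 0
                and c.utilization < max_util]
    if not eligible:
        return None
    return min(eligible, key=lambda c: c.utilization).name
```
]]></artwork>
        </figure>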
      </section>

      <section title="On-path telemetry based on adaptive traffic sampling">
        <t>On-path telemetry (e.g., In situ Operations, Administration, and
        Maintenance (IOAM) <xref target="RFC9197"/>) is useful for
        application-aware network operations. For example, operators who
        offer high-bandwidth, latency-sensitive, and loss-sensitive services,
        such as video streaming and online gaming, need to closely monitor
        the relevant flows in real time as the basis for any further
        optimization. Applying on-path telemetry to all packets of the
        selected flows consumes considerable resources. A sampling rate
        should therefore be set for these flows, with telemetry enabled only
        on the sampled packets. However, an overly high rate would exhaust
        network resources and may even cause packet drops; an overly low
        rate, on the contrary, would result in loss of information and
        inaccurate measurements.</t>

        <t>An adaptive approach can be used to dynamically adjust the
        sampling rate based on network conditions. In the normal network
        state, a low sampling rate is enough to reflect network performance
        (i.e., almost constant forwarding delay and low interface jitter).
        However, in the case of network congestion, the controller becomes
        aware of it from the real-time traffic visibility and the collected
        event data (e.g., packet loss, queue depth), and promptly raises the
        packet sampling rate to a very high level. All packets of the
        selected flows may even be sampled so as to acquire actual
        measurement data such as latency, jitter, and packet loss.</t>
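        <t>The adaptive sampling-rate behavior can be sketched as
        deterministic 1-in-N packet selection, where N is large in the normal
        state and drops to 1 (every packet of the selected flow) under
        congestion. The Python sketch below is a hypothetical illustration;
        the values of N, the FlowSampler name, and the use of a simple packet
        counter rather than another selection method are assumptions.</t>

        <figure>
          <artwork><![CDATA[
```python
NORMAL_ONE_IN_N = 1000   # hypothetical: 1 of every 1000 packets
CONGESTED_ONE_IN_N = 1   # every packet of the selected flow


class FlowSampler:
    """Decides which packets of one flow carry on-path telemetry."""

    def __init__(self) -> None:
        self.one_in_n = NORMAL_ONE_IN_N
        self.count = 0

    def set_congested(self, congested: bool) -> None:
        # The controller calls this when congestion starts or ends.
        if congested:
            self.one_in_n = CONGESTED_ONE_IN_N
        else:
            self.one_in_n = NORMAL_ONE_IN_N

    def should_sample(self) -> bool:
        """Return True if the current packet should be instrumented."""
        self.count += 1
        if self.count >= self.one_in_n:
            self.count = 0
            return True
        return False
```
]]></artwork>
        </figure>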

        <t>Similarly, such an adaptive approach can be applied to traditional
        active measurement methods (e.g., the Two-Way Active Measurement
        Protocol (TWAMP) <xref target="RFC5357"/>) to improve measurement
        accuracy with minimal resource consumption. Under normal,
        non-congested conditions, probe packets are sent at longer intervals.
        However, in the case of network congestion caused by microbursts, the
        controller becomes aware of it from the real-time traffic visibility
        and promptly shortens the probing interval, which allows the
        microburst traffic to be captured and the congestion state to be
        measured accurately.</t>
      </section>
    </section>

    <section anchor="IANA" title="IANA Considerations">
      <t>This document has no IANA actions.</t>
    </section>

    <section anchor="scecurity" title="Security Considerations">
      <t>This document describes use cases for an adaptive telemetry
      mechanism that minimizes resource consumption. The increased complexity
      of network telemetry may give rise to some security concerns. For
      example, persistent traffic collection at a very high rate (e.g., at
      millisecond intervals) induced by misconfiguration or spurious
      triggering might exhaust the resources of network devices as well as
      the collector. Inappropriate threshold settings that trigger high
      sampling rates should also be avoided. Therefore, access control for
      enabling and disabling adaptive telemetry is required, and rate control
      for collecting telemetry data is recommended so as to avoid degradation
      of network performance.</t>
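      <t>One way to realize the recommended rate control is a token bucket
      that caps how many telemetry messages are exported per second, so that
      a spurious trigger cannot produce unbounded high-rate export. The
      Python sketch below uses the classic token-bucket algorithm as an
      illustrative choice; the TokenBucket name and the rate and burst values
      are hypothetical, and this document does not mandate any particular
      rate-control method.</t>

      <figure>
        <artwork><![CDATA[
```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Classic token bucket limiting telemetry export rate."""

    def __init__(self, rate_per_s: float, burst: int,
                 clock: Optional[Callable[[], float]] = None) -> None:
        self.rate = rate_per_s            # tokens added per second
        self.capacity = float(burst)      # maximum stored tokens
        self.tokens = float(burst)        # start with a full bucket
        self.clock = clock or time.monotonic
        self.last = self.clock()

    def allow(self) -> bool:
        """Consume one token if available; False means drop/defer."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.rate)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```
]]></artwork>
      </figure>

      <t>The clock parameter can be replaced with a fake clock so that the
      limiting behavior can be checked deterministically.</t>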

      <t>In addition, the telemetry management interface, such as NETCONF or
      gNMI, must provide authentication, data integrity, confidentiality,
      and replay protection. The lowest NETCONF layer is the secure transport
      layer, and the mandatory-to-implement secure transport is Secure Shell
      (SSH) <xref target="RFC6242"/>. gNMI is carried over gRPC on HTTP/2,
      and its secure transport is TLS <xref target="RFC5246"/>. Further study
      of these security issues is required.</t>
    </section>
  </middle>

  <back>
    <references title="Normative References">
      <?rfc include="reference.RFC.2119.xml"?>

      <?rfc include="reference.RFC.5246.xml"?>

      <?rfc include="reference.RFC.6242.xml"?>

      <?rfc include="reference.RFC.8174.xml"?>

      <?rfc include="reference.RFC.8639.xml"?>

      <?rfc include="reference.RFC.8641.xml"?>
    </references>

    <references title="Informative References">
      <?rfc include="reference.RFC.3416.xml"?>

      <?rfc include="reference.RFC.5357.xml"?>

      <?rfc include="reference.RFC.9197.xml"?>

      <reference anchor="gNMI" target="https://github.com/openconfig/gnmi">
        <front>
          <title>gRPC Network Management Interface (gNMI)</title>

          <author>
            <organization>OpenConfig</organization>
          </author>

          <date/>
        </front>
      </reference>
    </references>
  </back>
</rfc>
