<?xml version="1.0" encoding="UTF-8"?>
<rfc category="std" docName="draft-zeng-opsawg-applicability-mcp-a2a-00"
     ipr="trust200902" submissionType="IETF" consensus="true"
     xml:lang="en" version="3" xmlns:xi="http://www.w3.org/2001/XInclude">
  <front>
    <title abbrev="MCP-A2A-Gap">When NETCONF Is Not Enough: Applicability of MCP and A2A for Advanced Network Management Scenarios</title>
    <author fullname="Guanming Zeng" initials="G." surname="Zeng">
      <organization>Huawei</organization>
      <address><email>zengguanming@huawei.com</email></address>
    </author>
    <author fullname="Jianwei Mao" initials="J." surname="Mao">
      <organization>Huawei</organization>
      <address><email>maojianwei@huawei.com</email></address>
    </author>
    <author fullname="Bing Liu" initials="B." surname="Liu">
      <organization>Huawei</organization>
      <address><email>leo.liubing@huawei.com</email></address>
    </author>
    <author fullname="Nan Geng" initials="N." surname="Geng">
      <organization>Huawei</organization>
      <address><email>gengnan@huawei.com</email></address>
    </author>
    <author fullname="Xiaotong Shang" initials="X." surname="Shang">
      <organization>Huawei</organization>
      <address><email>shangxiaotong@huawei.com</email></address>
    </author>
    <author fullname="Qiangzhou Gao" initials="Q." surname="Gao">
      <organization>Huawei</organization>
      <address><email>gaoqiangzhou@huawei.com</email></address>
    </author>
    <author fullname="Zhenbin Li" initials="Z." surname="Li">
      <organization>Huawei</organization>
      <address><email>robinli314@163.com</email></address>
    </author>
    <date year="2025"/>
    <abstract>
      <t>NETCONF provides robust configuration transactions and YANG-based data models, but falls short in scenarios requiring AI-driven semantic translation, long-lived cross-domain orchestration, multi-agent consensus, rapid DevOps iteration, or delivery of large non-configuration artifacts. This document systematically analyzes the functional gaps and presents Model Context Protocol (MCP) and Agent-to-Agent (A2A) as complementary solutions. Implementation guidance and coexistence models are also provided.</t>
    </abstract>
  </front>

  <middle>
    <section title="Introduction">
      <t>NETCONF <xref target="RFC6241"/> remains the gold standard for network configuration transactions. However, five emerging scenarios expose its fundamental limitations: </t>
      <ol spacing="normal">
        <li>AI natural-language intent</li>
        <li>Long-flow cross-controller orchestration</li>
        <li>Multi-agent consensus</li>
        <li>Weekly DevOps release cycles</li>
        <li>Multi-modal artifact delivery</li>
      </ol>
      <t>This document identifies objective gaps and specifies when and how MCP <xref target="I-D.yang-nmrg-mcp-nm"/> and A2A <xref target="I-D.google-agent2agent"/> should be engaged.</t>
    </section>

<section title="Gap Analysis Summary" anchor="gap">
  <t>This section enumerates the fundamental gaps between NETCONF and the advanced management scenarios introduced in Section 1.  For each gap, the table below identifies:</t>
  <ul spacing="normal">
    <li>the missing capability,</li>
    <li>its root cause in NETCONF design, and</li>
    <li>the protocol (MCP or A2A) that natively provides it.</li>
  </ul>

  <table anchor="gap-table">
    <name>NETCONF Gaps and the MCP/A2A Capabilities That Address Them</name>
    <thead>
      <tr>
        <th align="left">Gap</th>
        <th align="left">Root Cause in NETCONF</th>
        <th align="left">MCP/A2A Solution</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>AI Semantic Layer</td>
        <td>XML-centric, no function registry</td>
        <td>MCP tools/list + JSON-Schema</td>
      </tr>
      <tr>
        <td>Long-Flow Orchestration</td>
        <td>No Task life-cycle or human-in-the-loop</td>
        <td>A2A Task state machine</td>
      </tr>
      <tr>
        <td>Multi-Agent Consensus</td>
        <td>Client-server only; no peer negotiation</td>
        <td>A2A AgentCard + Message</td>
      </tr>
      <tr>
        <td>Weekly DevOps Iteration</td>
        <td>YANG module revision takes 6-9 months; modules are locked to firmware releases</td>
        <td>MCP Tool hot-register</td>
      </tr>
      <tr>
        <td>Large Artifact Delivery</td>
        <td>Payloads carried inline in XML; no MIME type, hash or URL reference</td>
        <td>MCP/A2A Artifact (cloud URL)</td>
      </tr>
    </tbody>
  </table>

    <t>The gaps are not implementation defects but consequences of the architecture defined in <xref target="RFC6241"/>.  They become blocking only in the five advanced scenarios identified.  Outside these scenarios, NETCONF continues to provide the most robust configuration transactions and should remain the south-bound protocol of choice.</t>
</section>

    <section title="When MCP Must Be Used">
      <section title="AI Natural-Language Intent" anchor="ai-intent">
  <t>Operators increasingly expect to issue instructions in natural language: “Raise MTU to 9000 for all Beijing core switches” or “Block source 1.2.3.4 for 30 minutes”.  NETCONF requires an edit-config XML blob with exact leaf names and namespaces; even experienced engineers make syntax mistakes under time pressure. </t>

  <t>The root cause is architectural:</t>
  <ul spacing="normal">
    <li>XML requests must carry exact namespaces and case-sensitive leaf names; a forgotten namespace or a misspelled leaf causes the server to reject the whole RPC with an rpc-error.</li>
    <li>There is no machine-discoverable “function catalogue”, so an LLM must rely on static prompt examples, which drift as models evolve.</li>
    <li>Multi-vendor differences (OpenConfig vs. proprietary YANG) force the LLM to choose between vendor-specific branches inside the XML, which inflates the prompt.</li>
  </ul>

  <t>MCP solves these issues with three primitives:</t>
  <ul spacing="normal">
    <li><tt>tools/list</tt>: returns a JSON array of callable Tools, each carrying a human-readable description and a JSON-Schema for its input.</li>
    <li>JSON-Schema: strongly typed, no namespaces, direct mapping to primitive types (string, integer, enum, array).</li>
    <li>JSON-RPC 2.0: compact request/response objects, easily parsed by LLMs and by controller gateways.</li>
  </ul>


  <t>Example MCP Tool Descriptor (simplified):</t>
  <figure>
    <artwork><![CDATA[{
  "name": "batch_set_mtu",
  "description": "Set interface MTU on multiple devices",
  "inputSchema": {
    "type": "object",
    "properties": {
      "device_group": {"type": "string"},
      "mtu": {"type": "integer", "minimum": 576, "maximum": 9216}
    },
    "required": ["device_group", "mtu"]
  }
}]]></artwork>
  </figure>

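  <t>The LLM now only has to produce a <tt>tools/call</tt> request that names the Tool and supplies its arguments:</t>
  <figure>
    <artwork><![CDATA[{
  "jsonrpc": "2.0",
  "method": "tools/call",
  "params": {
    "name": "batch_set_mtu",
    "arguments": {"device_group": "beijing-core", "mtu": 9000}
  },
  "id": 1
}]]></artwork>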
  </figure>
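  <t>Before such a request is translated into NETCONF, a gateway would typically check the supplied arguments against the Tool's inputSchema.  The following Python sketch illustrates that check for the descriptor shown above.  It is a minimal illustration rather than a complete JSON-Schema validator: the function name <tt>validate_arguments</tt> is hypothetical, and only the keywords used in this document (type, minimum, maximum, required) are handled.</t>
  <figure>
    <artwork><![CDATA[
```python
# Descriptor copied from the example above (description omitted).
BATCH_SET_MTU = {
    "name": "batch_set_mtu",
    "inputSchema": {
        "type": "object",
        "properties": {
            "device_group": {"type": "string"},
            "mtu": {"type": "integer", "minimum": 576, "maximum": 9216},
        },
        "required": ["device_group", "mtu"],
    },
}

# Only the JSON-Schema types used in this document are mapped.
_TYPES = {"string": str, "integer": int}


def validate_arguments(tool, arguments):
    """Return a list of human-readable errors; an empty list means valid."""
    schema = tool["inputSchema"]
    errors = []
    for name in schema.get("required", []):
        if name not in arguments:
            errors.append(f"missing required argument: {name}")
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            errors.append(f"unknown argument: {name}")
            continue
        expected = _TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            errors.append(f"{name}: expected {spec['type']}")
            continue
        if "minimum" in spec and value < spec["minimum"]:
            errors.append(f"{name}: below minimum {spec['minimum']}")
        if "maximum" in spec and value > spec["maximum"]:
            errors.append(f"{name}: above maximum {spec['maximum']}")
    return errors
```
]]></artwork>
  </figure>
  <t>An out-of-range MTU or a missing device group is then caught at the gateway, before any edit-config is generated.</t>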

</section>
      
      <section title="Rapid Model Iteration (DevOps Week-Release)" anchor="devops">
  <t>Cloud-era value-added services must be deployed within days, not months.  The YANG module revision cycle (IETF draft → RFC: 6-9 months) and firmware upgrade windows (≤ 1 per year) are incompatible with weekly release trains.  The blocking points are:</t>
  <ul spacing="normal">
    <li>The YANG module must be built into device firmware before the first configuration leaf is usable;</li>
    <li>The controller regression suite recompiles the entire YANG tree even for a single new leaf;</li>
    <li>Backward-compatibility review (new revisions must not break old devices) stretches internal QA to weeks.</li>
  </ul>

  <t>MCP breaks the deadlock by treating "intent" as a hot-swappable Tool rather than a permanent YANG node:</t>
  <ul spacing="normal">
    <li>Private YANG is compiled to JSON-Schema in the controller (milliseconds);</li>
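    <li>The Tool is registered with the MCP server (written here as <tt>/tools/register</tt>, a deployment-specific operation rather than a method defined by MCP) and is immediately callable; connected clients can be told to refresh their Tool list via <tt>notifications/tools/list_changed</tt>;</li>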
    <li>Gray-list rollout (10 % → 30 % → 100 %) and instant rollback (re-register previous Tool) are done without touching device flash.</li>
  </ul>
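  <t>The hot-register, staged rollout and rollback steps listed above can be sketched as a small in-memory registry.  This is an illustrative sketch under stated assumptions, not an MCP-defined interface: the class <tt>ToolRegistry</tt>, its method names, and the hash-based bucketing of devices into rollout percentages are all hypothetical.</t>
  <figure>
    <artwork><![CDATA[
```python
import hashlib


class ToolRegistry:
    """Keeps every registered version so rollback needs no device change."""

    def __init__(self):
        self._versions = {}  # tool name -> descriptors, oldest first
        self._rollout = {}   # tool name -> percent of devices on newest

    def register(self, descriptor, percent=100):
        """Add a new version and expose it to `percent` of devices."""
        name = descriptor["name"]
        self._versions.setdefault(name, []).append(descriptor)
        self._rollout[name] = percent

    def set_rollout(self, name, percent):
        """Widen or narrow the staged rollout, e.g. 10 -> 30 -> 100."""
        self._rollout[name] = percent

    def rollback(self, name):
        """Drop the newest version; the previous one is active again."""
        versions = self._versions[name]
        if len(versions) < 2:
            raise ValueError(f"no previous version of {name}")
        versions.pop()
        self._rollout[name] = 100

    def resolve(self, name, device_id):
        """Pick the version a given device sees during the rollout."""
        versions = self._versions[name]
        if len(versions) == 1:
            return versions[0]
        # Stable bucket in [0, 100) derived from the device identifier.
        digest = hashlib.sha256(device_id.encode()).digest()
        bucket = int.from_bytes(digest[:4], "big") % 100
        if bucket < self._rollout[name]:
            return versions[-1]
        return versions[-2]
```
]]></artwork>
  </figure>
  <t>Because the bucket is derived from the device identifier, a device stays on the same side of the rollout as the percentage grows, and a rollback only changes controller state.</t>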

  <t>Example: Cloud-Shield DDoS Cleaning Service</t>
  <figure>
    <artwork><![CDATA[Private YANG (120 lines)
  +--rw start-cleanse
  | +--rw target-ip      inet:ipv4-address
  | +--rw bandwidth-Mbps uint32
  | +--rw duration-min   uint16]]></artwork>
  </figure>

  <t>Compiled to JSON-Schema and registered in 30 s:</t>
  <figure>
    <artwork><![CDATA[{
  "name": "start_cleanse",
  "version": "1.0.0",
  "description": "Start DDoS cleanse for target IP",
  "inputSchema": {
    "type": "object",
    "properties": {
      "target_ip": {"type": "string", "format": "ipv4"},
      "bandwidth_Mbps": {"type": "integer", "minimum": 100, "maximum": 100000},
      "duration_min": {"type": "integer", "minimum": 5, "maximum": 1440}
    },
    "required": ["target_ip", "bandwidth_Mbps"]
  }
}]]></artwork>
  </figure>
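  <t>The compilation step from YANG leaves to a Tool descriptor can be sketched as follows.  The sketch assumes the leaves have already been extracted from the private module as (name, type, mandatory) tuples; the function <tt>leaves_to_tool</tt> and the type table are hypothetical.  The generated ranges are the natural ranges of the YANG types, whereas the tighter bounds in the descriptor above would have to come from range statements in the module, which this sketch does not model.</t>
  <figure>
    <artwork><![CDATA[
```python
# YANG types used in the example, mapped to JSON-Schema fragments.
YANG_TO_JSON_SCHEMA = {
    "inet:ipv4-address": {"type": "string", "format": "ipv4"},
    "uint16": {"type": "integer", "minimum": 0, "maximum": 65535},
    "uint32": {"type": "integer", "minimum": 0, "maximum": 4294967295},
}


def leaves_to_tool(name, description, leaves):
    """Build a Tool descriptor from (leaf, yang_type, mandatory) tuples."""
    properties = {}
    required = []
    for leaf, yang_type, mandatory in leaves:
        # The descriptor above uses underscores where YANG uses hyphens.
        key = leaf.replace("-", "_")
        properties[key] = dict(YANG_TO_JSON_SCHEMA[yang_type])
        if mandatory:
            required.append(key)
    return {
        "name": name,
        "description": description,
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


# The mandatory flags mirror the "required" list of the descriptor above.
tool = leaves_to_tool(
    "start_cleanse",
    "Start DDoS cleanse for target IP",
    [
        ("target-ip", "inet:ipv4-address", True),
        ("bandwidth-Mbps", "uint32", True),
        ("duration-min", "uint16", False),
    ],
)
```
]]></artwork>
  </figure>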

  <t>Thus MCP is mandatory for any management surface that must support weekly or daily release cycles without waiting for firmware or standards body timelines.</t>
</section>


    </section>

    <section title="When A2A Must Be Used">
      <section title="Cross-Controller Long-Flow Orchestration" anchor="long-flow">
  <t>Maintenance windows for core-network upgrades often exceed 30 minutes and span multiple vendor domains.  NETCONF provides atomic configuration on a single controller, but lacks:</t>
  <ul spacing="normal">
    <li>a cross-vendor task life-cycle,</li>
    <li>human-in-the-loop approval gates, and</li>
    <li>delivery of large artifacts (firmware, images, diff reports).</li>
  </ul>

  <t>A2A fills these gaps with three primitives:</t>
  <ul spacing="normal">
    <li><tt>Task</tt> — state machine (pending → working → completed/failed/cancelled) persisting across agent restarts;</li>
    <li><tt>Artifact</tt> — hash-signed object store (≤ 2 GB, resumable upload);</li>
    <li><tt>Message</tt> — multi-round negotiation (JSON or natural language).</li>
  </ul>

  <table anchor="long-flow-tasks">
    <name>A2A Task States vs. NETCONF Operations</name>
    <thead>
      <tr>
        <th align="left">State</th>
        <th align="left">Meaning</th>
        <th align="left">NETCONF Equivalent</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>pending</td>
        <td>Waiting for resources or approval</td>
        <td>None (an RPC is a single request/reply with no queued state)</td>
      </tr>
      <tr>
        <td>awaiting-human-approval</td>
        <td>Human must click approve/cancel</td>
        <td>None</td>
      </tr>
      <tr>
        <td>working</td>
        <td>Agents executing sub-tasks</td>
        <td>edit-config (local only)</td>
      </tr>
      <tr>
        <td>completed</td>
        <td>All agents report success</td>
        <td>commit</td>
      </tr>
      <tr>
        <td>failed</td>
        <td>Any agent reports failure</td>
        <td>rollback-on-error</td>
      </tr>
      <tr>
        <td>cancelled</td>
        <td>Operator or policy cancelled</td>
        <td>discard-changes</td>
      </tr>
    </tbody>
  </table>
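  <t>The task life-cycle in the table above can be modelled as a small state machine.  The Python sketch below uses the state names from the table; the set of allowed transitions and the class <tt>Task</tt> are assumptions made for illustration, not definitions taken from the A2A specification.</t>
  <figure>
    <artwork><![CDATA[
```python
# Allowed transitions between the states listed in the table above.
# The transition set itself is an assumption made for this sketch.
TRANSITIONS = {
    "pending": {"awaiting-human-approval", "working", "cancelled"},
    "awaiting-human-approval": {"working", "cancelled"},
    "working": {"completed", "failed", "cancelled"},
    "completed": set(),
    "failed": set(),
    "cancelled": set(),
}


class Task:
    """Tracks one long-running task and refuses illegal state changes."""

    def __init__(self, task_id, goal):
        self.task_id = task_id
        self.goal = goal
        self.state = "pending"
        self.history = ["pending"]

    def advance(self, new_state):
        """Move to new_state, or raise if the transition is not allowed."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(
                f"illegal transition {self.state} -> {new_state}"
            )
        self.state = new_state
        self.history.append(new_state)

    def is_terminal(self):
        """Terminal states have no outgoing transitions."""
        return not TRANSITIONS[self.state]
```
]]></artwork>
  </figure>
  <t>Persisting <tt>history</tt> to durable storage is what would let such a task survive an agent restart; the sketch keeps it in memory only.</t>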

  <t>Real-World Example: Five-City Core MTU Migration</t>
  <figure>
    <artwork><![CDATA[Controllers: Huawei NCE (3 cities) + Cisco NSO (2 cities)
Devices: 312 PE routers
Window: 120 min (02:00-04:00)
Artifact: 2 GB firmware image + 40 MB diff-report]]></artwork>
  </figure>

  <t>Step-wise A2A Flow (time-stamps):</t>
  <ul spacing="normal">
    <li>T+0 min: Orchestrator creates Task T100, goal="Raise MTU to 9000 on core links".</li>
    <li>T+5 min: Each controller Agent posts Artifact pre-check.csv (link health KPI).</li>
    <li>T+10 min: Orchestrator verifies the Artifact hashes; human approval card sent to WeChat.</li>
    <li>T+15 min: Engineer clicks "approve"; Task state → working.</li>
    <li>T+20-90 min: Controllers download 2 GB image via Artifact URL; local NETCONF edit-config issued; progress Artifacts streamed every 5 min.</li>
    <li>T+95 min: Last Artifact post-upgrade-verification.csv uploaded.</li>
    <li>T+100 min: All agents report success; Task → completed.  Total human intervention: 1 click.</li>
  </ul>
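  <t>The hash verification performed on each Artifact in the flow above can be sketched as follows.  The sketch assumes SHA-256 and a hex-encoded digest announced alongside the Artifact; both the algorithm choice and the function names are assumptions for illustration.  Reading in fixed-size chunks keeps memory use constant even for a multi-gigabyte image.</t>
  <figure>
    <artwork><![CDATA[
```python
import hashlib


def sha256_of_stream(stream, chunk_size=1024 * 1024):
    """Hash a binary stream in fixed-size chunks and return hex digest."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_artifact(stream, expected_sha256):
    """Return True only if the stream matches the announced digest."""
    return sha256_of_stream(stream) == expected_sha256.lower()
```
]]></artwork>
  </figure>
  <t>A controller agent would open the downloaded file in binary mode, pass it to <tt>verify_artifact</tt>, and refuse to proceed to the NETCONF step on a mismatch.</t>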


  <t>A2A is mandatory for any multi-vendor, multi-hour workflow that demands task persistence, human gates, and multi-gigabyte artifact delivery—scenarios where NETCONF's single-controller, single-RPC paradigm is insufficient.</t>
</section>

<section title="Multi-Agent Consensus (Agent-to-Agent)" anchor="multi-agent">
  <t>Fault recovery, security mitigation and resource optimisation often require multiple autonomous agents (monitoring, security, controller, human) to reach a common decision.  NETCONF's strict client-server model provides no peer-to-peer capability advertisement, multi-round negotiation or voting primitives.</t>

  <t>A2A introduces three building blocks:</t>
  <ul spacing="normal">
    <li><tt>AgentCard</tt>: JSON advertisement of skills and endpoint;</li>
    <li><tt>Message</tt>: multi-round negotiation (JSON or natural language);</li>
    <li><tt>Consensus Engine</tt>: policy-based scoring, voting, human-in-the-loop.</li>
  </ul>

  <table anchor="agent-card-fields">
    <name>AgentCard Mandatory Fields</name>
    <thead>
      <tr>
        <th align="left">Field</th>
        <th align="left">Description</th>
        <th align="left">Example Value</th>
      </tr>
    </thead>
    <tbody>
      <tr>
        <td>id</td>
        <td>globally unique agent identifier</td>
        <td>monitor-sh-01</td>
      </tr>
      <tr>
        <td>skills</td>
        <td>array of skill objects (name, description)</td>
        <td>{"name": "threat_analyze", "description": "Return 0-10 threat score"}</td>
      </tr>
      <tr>
        <td>endpoint</td>
        <td>HTTPS URL for A2A messages</td>
        <td>https://mon-sh.example:9443/a2a</td>
      </tr>
      <tr>
        <td>authentication</td>
        <td>supported authentication schemes (mTLS + OIDC)</td>
        <td>{"type": "mTLS", "sha256": "8f66..."}</td>
      </tr>
    </tbody>
  </table>
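  <t>Peer discovery based on these fields can be sketched as a lookup over received AgentCards.  The card below reuses the example values from the table (the truncated fingerprint is kept as given); the helper names <tt>missing_fields</tt> and <tt>find_agents</tt> are hypothetical and the sketch performs no network access.</t>
  <figure>
    <artwork><![CDATA[
```python
# Field names follow the table above.
MANDATORY_FIELDS = ("id", "skills", "endpoint", "authentication")

AGENT_CARDS = [
    {
        "id": "monitor-sh-01",
        "skills": [
            {"name": "threat_analyze",
             "description": "Return 0-10 threat score"},
        ],
        "endpoint": "https://mon-sh.example:9443/a2a",
        "authentication": {"type": "mTLS", "sha256": "8f66..."},
    },
]


def missing_fields(card):
    """List the mandatory fields that a received card does not carry."""
    return [field for field in MANDATORY_FIELDS if field not in card]


def find_agents(cards, skill_name):
    """Return the ids of complete cards that advertise the skill."""
    return [
        card["id"]
        for card in cards
        if not missing_fields(card)
        and any(skill["name"] == skill_name for skill in card["skills"])
    ]
```
]]></artwork>
  </figure>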

  <t>Consensus Flow Example: DDoS Port Shutdown</t>
  <figure>
    <artwork><![CDATA[Agents: Monitor, Security, Controller, Human
Decision: shutdown port 10/1 ?
Scoring: threat_level×0.6 + impact×0.4
Threshold: ≥ 5.0 → shutdown]]></artwork>
  </figure>

  <t>Message Sequence (time-stamps):</t>
  <ul spacing="normal">
    <li>T+0 s: Monitor Agent posts threat_score=9.0 via Message.</li>
    <li>T+5 s: Security Agent confirms attack signature; score unchanged.</li>
    <li>T+10 s: Controller Agent reports that 300 VPNs would go down, mapped to an impact score of 3; computed score = 9×0.6 + 3×0.4 = 6.6 (above the 5.0 threshold).</li>
    <li>T+12 s: Task state → awaiting-human-approval; WeChat card sent.</li>
    <li>T+135 s: Human clicks "approve".</li>
    <li>T+140 s: Controller Agent issues a NETCONF edit-config that shuts down the port; Artifact post-action.log uploaded.</li>
    <li>T+180 s: All agents report success; Task → completed.</li>
  </ul>
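  <t>The scoring rule and threshold used in this example can be written out as a short Python sketch.  The weights and threshold are those given above; the function names and the returned labels are hypothetical, and the mapping from raw impact (300 VPNs) to an impact score of 3 is taken as an input rather than computed.</t>
  <figure>
    <artwork><![CDATA[
```python
THREAT_WEIGHT = 0.6
IMPACT_WEIGHT = 0.4
SHUTDOWN_THRESHOLD = 5.0


def consensus_score(threat_level, impact):
    """Weighted sum used by the example: threat x 0.6 + impact x 0.4."""
    return threat_level * THREAT_WEIGHT + impact * IMPACT_WEIGHT


def decide(threat_level, impact, human_approved):
    """Shutdown needs a score at or above threshold plus approval."""
    score = consensus_score(threat_level, impact)
    if score < SHUTDOWN_THRESHOLD:
        return "no-action"
    if not human_approved:
        return "awaiting-human-approval"
    return "shutdown"
```
]]></artwork>
  </figure>
  <t>With threat_level 9 and impact 3 the score is 6.6, so the decision waits for the human gate and becomes a shutdown once approval arrives.</t>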



  <t>Therefore A2A is mandatory whenever multiple autonomous agents must discover, negotiate, vote and reach a binding decision — scenarios that NETCONF's unidirectional client-server paradigm cannot emulate.</t>
</section>

    </section>

<section title="Coexistence Model" anchor="coexist">
  <t>This section describes how MCP and A2A can be deployed <em>without</em> forcing a redesign of the existing NETCONF ecosystem.  The architecture keeps NETCONF as the configuration authority and allows <em>either</em> controller-hosted <em>or</em> device-hosted MCP servers — the latter avoids a central gateway bottleneck while preserving operator investment in controllers.</t>

  <section title="Design Choices at a Glance" anchor="choices">
    <t>TBD</t>
  </section>

  <section title="Common Layering" anchor="layer">
    <t>TBD</t>
  </section>

  <section title="Controller-Gateway Model" anchor="ctrl-gw">
    <t>TBD</t>
  </section>

  <section title="Device-Embedded Model" anchor="dev-embed">
    <t>TBD</t>

  </section>


  <section title="Migration Roadmap" anchor="roadmap">
    <t>TBD</t>
  </section>

  <!-- <t>In summary, the coexistence model offers two plug-and-play options: keep the gateway inside the controller for brown-field sites, or embed MCP in the device for green-field deployments.  Both preserve NETCONF as the configuration authority while adding AI semantics, long-flow orchestration and large-artifact delivery without disrupting existing OSS, AAA or firmware pipelines.</t> -->
</section>

    <section title="Security Considerations">
      <t>MCP and A2A introduce OAuth2/JWT and long-lived Tasks.</t>
    </section>

  </middle>

  <back>
    <references title="Normative References">
  <reference anchor="RFC6241">
    <front>
      <title>Network Configuration Protocol (NETCONF)</title>
      <author initials="R." surname="Enns" fullname="Rob Enns">
        <organization/>
      </author>
      <author initials="M." surname="Bjorklund" fullname="Martin Bjorklund">
        <organization/>
      </author>
      <author initials="J." surname="Schoenwaelder" fullname="Juergen Schoenwaelder">
        <organization/>
      </author>
      <author initials="A." surname="Bierman" fullname="Andy Bierman">
        <organization/>
      </author>
      <date year="2011" month="June"/>
    </front>
    <seriesInfo name="RFC" value="6241"/>
    <seriesInfo name="DOI" value="10.17487/RFC6241"/>
  </reference>

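  <reference anchor="RFC9200">
    <front>
      <title>Authentication and Authorization for Constrained Environments Using the OAuth 2.0 Framework (ACE-OAuth)</title>
      <author initials="L." surname="Seitz" fullname="Ludwig Seitz">
        <organization/>
      </author>
      <author initials="G." surname="Selander" fullname="Goran Selander">
        <organization/>
      </author>
      <author initials="E." surname="Wahlstroem" fullname="Erik Wahlstroem">
        <organization/>
      </author>
      <author initials="S." surname="Erdtman" fullname="Samuel Erdtman">
        <organization/>
      </author>
      <author initials="H." surname="Tschofenig" fullname="Hannes Tschofenig">
        <organization/>
      </author>
      <date year="2022" month="August"/>
    </front>
    <seriesInfo name="RFC" value="9200"/>
    <seriesInfo name="DOI" value="10.17487/RFC9200"/>
  </reference>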
</references>

    <references title="Informative References">
      <reference anchor="I-D.yang-nmrg-mcp-nm">
        <front>
          <title>Applicability of MCP for the Network Management</title>
          <author initials="Y." surname="Yang" fullname="Yuanyuan Yang"><organization>Huawei</organization></author>
          <date year="2025" month="July"/>
        </front>
        <seriesInfo name="Internet-Draft" value="draft-yang-nmrg-mcp-nm-00"/>
      </reference>
      <reference anchor="I-D.google-agent2agent">
        <front>
          <title>Agent-to-Agent (A2A) Protocol</title>
          <author><organization>Google</organization></author>
          <date year="2025" month="May"/>
        </front>
        <seriesInfo name="Internet-Draft" value="draft-google-agent2agent-00"/>
      </reference>
    </references>
  </back>
</rfc>