Gap Analysis of Network Configuration Protocols in LLM-Driven Intent-Based Networking

The Model Context Protocol (MCP) was designed to give a single Large Language Model (LLM) a "plug-and-play" tool-calling capability. When it is deployed directly between a network controller and the devices, however, the following structural gaps are exposed.

MCP's tools/call is a stateless, one-shot JSON-RPC invocation. Network changes normally require the multi-stage semantics "candidate → validate → commit → rollback." MCP has neither a candidate datastore nor two-phase-commit primitives. Consequently, cross-device bulk deployments cannot guarantee "all-or-nothing" atomicity. When partial failures occur, the controller must supply its own compensation logic, lengthening the LLM's reasoning chain and increasing uncertainty.
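The compensation logic a controller would have to supply can be sketched roughly as follows. This is a minimal illustration, not an MCP feature: `FakeDevice` and `apply_all_or_nothing` are hypothetical names, and the device is an in-memory stand-in for a real candidate datastore.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDevice:
    """Hypothetical in-memory stand-in for a device with a candidate datastore."""
    name: str
    running: dict = field(default_factory=dict)
    candidate: dict = field(default_factory=dict)
    fail_validate: bool = False  # test hook to simulate a validation error

    def stage(self, change: dict) -> None:
        # Build the candidate from running plus the change; running is untouched.
        self.candidate = {**self.running, **change}

    def validate(self) -> bool:
        return not self.fail_validate

    def commit(self) -> None:
        self.running = dict(self.candidate)

    def discard(self) -> None:
        self.candidate = dict(self.running)


def apply_all_or_nothing(devices, change) -> bool:
    """Phase 1: stage and validate everywhere. Phase 2: commit only if all passed."""
    for dev in devices:
        dev.stage(change)
    if not all(dev.validate() for dev in devices):
        for dev in devices:
            dev.discard()  # no device's running config was modified
        return False
    for dev in devices:
        dev.commit()
    return True
```

The sketch assumes that commit never fails once validation has passed. A production controller would also need a saved rollback point per device to undo commits that already succeeded.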

Today MCP tool descriptors are written by hand. Network-device capabilities are authoritatively defined by YANG models; whenever a model is updated, the tool list must be manually re-synchronized. Without an automated pipeline "YANG → JSON-Schema → tool descriptor," maintaining the tool catalogue in a large multi-vendor environment becomes a bottleneck.
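One possible shape for such a pipeline is sketched below. It assumes the YANG leaves have already been parsed into plain dictionaries (a real pipeline would use a YANG parser), and it covers only a handful of types. The function name `leaves_to_tool` and the leaf dictionary layout are illustrative assumptions; the output uses the `name`, `description`, and `inputSchema` fields of an MCP tool descriptor.

```python
# Partial, illustrative mapping from YANG built-in types to JSON Schema types.
YANG_TO_JSON_TYPE = {
    "string": "string",
    "boolean": "boolean",
    "uint16": "integer",
    "uint32": "integer",
}


def leaves_to_tool(tool_name: str, description: str, leaves: list) -> dict:
    """Turn pre-parsed YANG leaves into a tool descriptor with a JSON Schema input."""
    properties = {}
    required = []
    for leaf in leaves:
        schema = {"type": YANG_TO_JSON_TYPE[leaf["type"]]}
        if "range" in leaf:
            # A YANG range such as "68..9216" is assumed to arrive as a (min, max) pair.
            schema["minimum"], schema["maximum"] = leaf["range"]
        properties[leaf["name"]] = schema
        if leaf.get("mandatory"):
            required.append(leaf["name"])
    return {
        "name": tool_name,
        "description": description,
        "inputSchema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }
```

Regenerating descriptors this way whenever a model revision changes would remove the manual re-synchronization step, at the cost of maintaining the type mapping.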

Network-ops scenarios often involve high-frequency telemetry (5–60 s sampling, 10 k metrics per node). MCP encodes every message as textual JSON-RPC, whether it is carried over stdio or HTTP, so messages are highly redundant, and it offers no compact binary encoding or primitive designed for high-rate sample streams. When an LLM needs real-time anomaly detection, frequent polling consumes excessive bandwidth and CPU, violating data-center goals of low latency and high throughput.
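To make the redundancy concrete: at 10 k metrics every 5 s, one node produces 2,000 samples per second. The sketch below compares the size of one 10 k-sample batch encoded as JSON with a hypothetical fixed binary layout (a 4-byte metric id plus an 8-byte float). The layout is an assumption chosen for illustration, not an encoding defined by any of the protocols discussed here.

```python
import json
import struct


def json_size(samples: list) -> int:
    """Bytes needed to send the batch as UTF-8 JSON text."""
    return len(json.dumps(samples).encode("utf-8"))


def packed_size(samples: list) -> int:
    """Bytes needed with a fixed layout: uint32 id + float64 value, network byte order."""
    return len(b"".join(struct.pack("!Id", s["id"], s["value"]) for s in samples))


# One sampling interval for one node: 10,000 metrics.
samples = [{"id": i, "value": float(i)} for i in range(10_000)]
```

With this layout each sample costs exactly 12 bytes, so the batch is 120,000 bytes. The JSON form repeats the key names in every sample and is correspondingly larger.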

MCP's invocation context is confined to a single connection; it cannot natively carry network-level intent such as "change the same VLAN across three leaf switches while keeping the STP root bridge unchanged." The LLM must repeat the constraints in the prompt, wasting tokens and raising the error rate.
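If such intent were carried as structured data rather than repeated in the prompt, a controller could check a proposed plan against it mechanically. The following sketch shows the idea. `VlanIntent`, `check_plan`, and the plan layout are hypothetical, and the check deliberately treats any STP priority change as a possible root-bridge change.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VlanIntent:
    intent_id: str
    vlan_id: int
    targets: tuple  # devices that must all receive the VLAN
    stp_root: str   # device that must remain the STP root bridge


def check_plan(intent: VlanIntent, plan: dict) -> list:
    """Return human-readable violations; an empty list means the plan fits the intent.

    plan maps a device name to a dict of proposed changes.
    """
    problems = []
    for dev in intent.targets:
        if plan.get(dev, {}).get("vlan") != intent.vlan_id:
            problems.append(f"{dev}: VLAN {intent.vlan_id} missing from plan")
    for dev, change in plan.items():
        if "stp_priority" in change:
            problems.append(
                f"{dev}: STP priority change may move the root away from {intent.stp_root}"
            )
    return problems
```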

Network operations require audit logs that "trace down to the leaf node." MCP's tool return body contains only a JSON result; there are no standardized fields for rollback-point, commit-id, or syslog-severity. Root-cause analysis and compliance audits therefore require additional integration with device syslog or NETCONF logs, increasing cost.
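A result envelope that carries these audit fields could look like the sketch below. The class `AuditedResult` and its field names are assumptions for illustration, not part of MCP. The severity range follows the syslog convention of 0 (emergency) through 7 (debug).

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AuditedResult:
    """Hypothetical tool-result envelope with the audit fields named in the text."""
    result: dict
    commit_id: str
    rollback_point: str
    syslog_severity: int  # 0 (emergency) .. 7 (debug)

    def __post_init__(self):
        if not 0 <= self.syslog_severity <= 7:
            raise ValueError("syslog severity must be in 0..7")


def to_wire(r: AuditedResult) -> str:
    """Serialize with stable key order so audit records are easy to diff."""
    return json.dumps(asdict(r), sort_keys=True)
```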

Devices commonly enforce certificate-based mutual-TLS plus NACM path-level permissions. MCP currently defines only a Bearer-token header and offers no mapping between a tool call and the read/write/exec permissions on a YANG node. If a tool-descriptor file leaks, an LLM could combine calls to bypass existing ACLs, creating a privilege-escalation risk.
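The missing mapping can be illustrated with a simplified, NACM-inspired check that runs before a tool call is forwarded. This is a sketch under stated assumptions: `Rule`, `is_allowed`, and `authorize_tool_call` are hypothetical, matching is by path prefix, the first matching rule wins, and anything unmatched is denied. Real NACM has separate defaults for read, write, and exec and richer matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    path_prefix: str
    operations: frozenset  # e.g. {"read", "update", "exec"}
    action: str            # "permit" or "deny"


def is_allowed(rules: list, path: str, operation: str) -> bool:
    """First matching rule decides; no match means deny (a simplification)."""
    for rule in rules:
        if path.startswith(rule.path_prefix) and operation in rule.operations:
            return rule.action == "permit"
    return False


def authorize_tool_call(rules: list, touched: list) -> bool:
    """touched: (yang_path, operation) pairs the tool call would perform."""
    return all(is_allowed(rules, path, op) for path, op in touched)
```

Declaring the touched paths per tool is itself an assumption; it presumes the tool descriptor was generated from the YANG model rather than written by hand.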

Network changes often last several minutes (waiting for BGP convergence or MAC migration). Once an MCP call completes, its context is discarded immediately, so there is no way to stream intermediate updates such as "convergence 90%." The LLM has no choice but to poll repeatedly, increasing load on both itself and the device while still failing to achieve a true state-machine-driven closed loop.
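An upper layer could hide the polling from the LLM and surface only state changes. The sketch below shows that idea with a Python generator. `watch_convergence` and the update dictionary layout are hypothetical, and a real implementation would wait between reads instead of reading back to back.

```python
from typing import Callable, Iterator


def watch_convergence(read_progress: Callable[[], int], max_polls: int = 100) -> Iterator[dict]:
    """Yield an update only when the reported percentage changes.

    read_progress is a caller-supplied function returning 0..100.
    """
    last = None
    for _ in range(max_polls):
        pct = read_progress()
        if pct != last:
            state = "completed" if pct >= 100 else "working"
            yield {"state": state, "progress": pct}
            last = pct
        if pct >= 100:
            return
```

The consumer sees one event per change instead of one response per poll, which is the behaviour a native push primitive would provide.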

For MCP to serve as "the universal glue between LLMs and devices" in production networks, an upper layer must supply a transactional state machine, a YANG self-description channel, streaming encodings, and fine-grained audit semantics. Without these additions, MCP will remain confined to labs or single-device scripting and will be unable to close the loop on production-grade intent.

Positioned as a "multi-agent collaboration layer," the Agent-to-Agent (A2A) protocol was created so that any two LLM agents can discover each other, negotiate, and jointly complete long-running tasks. When it is dropped straight into "network-controller ↔ network-device" or "controller ↔ controller" settings, however, the following deep gaps surface:

A2A Tasks target macro-level business goals (e.g., "relocate a DC"). The smallest deliverable is an Artifact. Network changes, by contrast, must touch a single YANG leaf (e.g., "set interface X MTU = 9216"). The spec offers no "micro-task" primitive, so one Task either carries thousands of configuration lines and becomes bloated, or is split into hundreds of Tasks that explode the state machine and raise LLM-orchestration complexity.

A2A's state machine is limited to pending → working → completed/failed. On failure the controller only gets a free-text Task.statusMessage. Network ops demand cross-device atomic commit plus a rollback tag. The protocol today defines no:

two-phase-commit token (transaction-id),
distributed lock or conflict detection,
unified rollback API (rollback-on-failure).

Controllers must therefore implement compensation themselves, forcing the LLM to reason about "how to write a rollback script," which violates intent-based principles.
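Of the three missing pieces, conflict detection is the simplest to illustrate. The sketch below uses optimistic versioning: a writer must name the version it read, and a stale version is rejected. `VersionedConfig` and `ConflictError` are hypothetical names, and nothing here is defined by A2A.

```python
class ConflictError(Exception):
    """Raised when a writer's view of the configuration is stale."""


class VersionedConfig:
    """In-memory configuration guarded by a monotonically increasing version."""

    def __init__(self):
        self.version = 0
        self.data = {}

    def update(self, expected_version: int, change: dict) -> int:
        # Reject the write if another agent committed since this one last read.
        if expected_version != self.version:
            raise ConflictError(
                f"expected version {expected_version}, current is {self.version}"
            )
        self.data.update(change)
        self.version += 1
        return self.version
```

Two agents that both read version 0 cannot both write: the second update fails and must re-read before retrying, which is the behaviour a transaction-id in the protocol would make explicit.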

A2A mandates JSON for Artifact payloads and runs over HTTP/1.1. For high-frequency telemetry (5 s interval, 10 k metrics/node) or bulk config pushes, JSON's textual redundancy causes:

controller-device link congestion,
wasted LLM-context tokens,
repetitive header parsing and higher CPU load.

The protocol lacks a binary or streaming encoding option and offers no back-pressure mechanism.
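Back-pressure in its simplest form is a bounded buffer that refuses new samples when the consumer falls behind. The sketch below uses Python's standard `queue` module. The helper `offer` is a hypothetical name, and the choice between dropping, sampling down, or pausing the producer is left open.

```python
import queue


def offer(buf: queue.Queue, sample) -> bool:
    """Try to enqueue without blocking.

    Returns False when the buffer is full, which is the signal for the
    producer to slow down or drop the sample.
    """
    try:
        buf.put_nowait(sample)
        return True
    except queue.Full:
        return False
```

For example, with `queue.Queue(maxsize=2)` the third consecutive `offer` returns `False` until the consumer removes an item.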

A2A Task context is scoped to a single "conversation"; there is no standard field to express topology-level constraints such as "change the same VLAN on three leaf switches while keeping the STP root bridge unchanged." The LLM must repeat inter-device relations in the prompt, burning tokens and risking truncation that produces configurations which are syntactically valid but topologically wrong.

Devices generally enforce certificate-based mutual TLS plus NACM path-level access control. A2A currently specifies only an OAuth2 delegation token and provides no mapping from "Task-level role" to YANG node read/write/exec permissions, nor per-Artifact fine-grained ACLs. Once an Artifact is cached or forwarded it may bypass the certificate chain, leading to privilege escalation or configuration pollution.

Skills are advertised in the Agent Card, but the Card is free text. There are no standard fields saying "I support OpenConfig BGP 4.0 YANG" or "I manage AS 65001-65500." LLMs must rely on fuzzy matching, often selecting the wrong partner and raising Task failure rates.
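With structured capability fields, partner selection becomes an exact match instead of fuzzy text matching. The sketch below assumes two hypothetical card fields, `yang_models` and `asn_ranges`, which are not part of the A2A Agent Card today.

```python
def asn_in_ranges(asn: int, ranges) -> bool:
    """True if asn falls inside any inclusive (low, high) range."""
    return any(low <= asn <= high for low, high in ranges)


def select_agents(cards: list, model: str, asn: int) -> list:
    """Return names of agents that declare both the YANG model and the AS number."""
    return [
        card["name"]
        for card in cards
        if model in card.get("yang_models", ())
        and asn_in_ranges(asn, card.get("asn_ranges", ()))
    ]
```

A card without the extension fields simply never matches, so agents that do not advertise structured capabilities are skipped rather than guessed at.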

Network changes can last minutes (waiting for BGP convergence, MAC moves). After a Task enters "working," A2A only mandates a final Artifact; there is no standard way to push interim states such as "convergence 70 %" or "MTU changed, waiting for LLDP neighbor re-discovery." The LLM must poll or wait until timeout, increasing load and preventing a true state-machine-driven closed loop.

To act as a "multi-agent collaboration bus" in network environments, A2A must be systematically extended in task granularity, transaction semantics, binary encoding, topology context, security mapping, life-cycle management, and intermediate-state push. Otherwise it will remain suited only to macro business flows and will be unable to close the fine-grained, reliable, rollback-capable network-intent loop required in production.

NETCONF provides transactional, XML-encoded RPCs over SSH. It lacks:

Semantic discovery: YANG models are not self-describing for LLMs; no runtime tool list.
Session context: no standard place to store intent-id, LLM prompt, or multi-device correlation.
Streaming telemetry: <notification> is push-style but insufficient for high-frequency KPIs.
Function-level audit: <commit> is atomic, but per-leaf authorization is out of scope.

TBD

RESTCONF maps YANG to HTTP URIs. Gaps include:

No candidate datastore: every PUT/PATCH takes effect immediately.
No server-side discovery document for LLMs.
Stateless: no place to store multi-request intent.
Encoding flexibility (XML or JSON) may undermine LLM prompt consistency.
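The YANG-to-URI mapping itself is mechanical, as the sketch below suggests. It builds a data-resource path in the RESTCONF style, with list keys appended as `list=key` and percent-encoded. The function name `restconf_data_uri` is hypothetical, and `/restconf` is only a commonly used API root that a client would normally discover rather than hard-code.

```python
from urllib.parse import quote


def restconf_data_uri(root: str, segments: list) -> str:
    """Build a data-resource path from (node_name, keys) pairs.

    keys is a tuple of list-key values and is empty for containers and leaves.
    """
    parts = []
    for name, keys in segments:
        if keys:
            # Encode every reserved character in a key, including "/".
            name += "=" + ",".join(quote(str(k), safe="") for k in keys)
        parts.append(name)
    return root.rstrip("/") + "/data/" + "/".join(parts)
```

An interface key such as `GigabitEthernet0/0/1` shows why the encoding matters: without it, the slashes in the key would be read as path separators.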

TBD

gNMI delivers high-speed telemetry, but it has the following gaps:

No semantic metadata for LLMs.
Set() is transactional only within a single SetRequest to one target; there is no transaction spanning multiple devices.
No multi-agent signalling: gNMI is strictly client-to-target (1:1).
No standardized error ontology.

TBD