Large Model based Agents for Network Operation and Maintenance

Large models are AI systems based on deep learning that contain a massive number of parameters (typically billions to trillions). They are trained on large-scale datasets and can capture complex patterns and associations, demonstrating strong abilities in natural language processing, image generation, decision-making, and reasoning. Recent breakthroughs in models such as GPT-4 and DeepSeek have continued to push technical boundaries and improve model performance. Users can draw on these capabilities by accessing or deploying inference models and combining them with techniques such as fine-tuning and prompt learning.

Large models have been applied in multiple vertical domains, for example:

*  Research: AlphaFold for protein structure prediction; Galactica for scientific paper assistance.

*  Industry: Generative design (e.g., automotive or chip architecture optimization); automated code development (GitHub Copilot).

*  Finance: Risk prediction; automated report generation.

In the future, large models are also expected to move towards embodied AI, embedding model capabilities into physical terminals such as robots and autonomous vehicles, while continuing to build an open-source developer ecosystem, open up some model capability interfaces, and promote collaborative innovation across the industry.

An intelligent agent, an important concept in artificial intelligence, is a system that can autonomously perceive its environment, make decisions, and execute actions. It has basic characteristics such as autonomy, interactivity, reactivity, and adaptability, and can complete tasks independently in complex and changing environments. Intelligent agents can learn and make decisions: through learning algorithms and data analysis, they extract useful information from massive amounts of data and build their own knowledge base. When making decisions, they can weigh multiple factors and use methods such as logical reasoning and probability and statistics to choose the best available option. This ability gives intelligent agents a significant advantage in solving complex problems.

There are four design patterns for intelligent agent workflows:

*  Reflection: The agent reviews and revises the output it has generated.

*  Tool Use: The LLM generates code, calls APIs, and performs practical operations.

*  Planning: The agent decomposes complex tasks and executes them according to the plan.

*  Multi-agent Collaboration: Multiple agents play different roles and collaborate to complete tasks.

At present, intelligent agents are used in the following scenarios:

*  Personal assistant:

   -  Cross-platform task agent: Automatically organizes emails, schedules meetings, and manages calendars (such as Microsoft Copilot).

   -  Life butler: Adjusts smart home devices according to user habits and recommends personalized health plans.

*  Industry intelligence:

   -  Financial advisory: Analyzes market data in real time, generates investment portfolio recommendations, and executes trades automatically.

   -  Medical diagnosis: Provides dynamic treatment recommendations based on the patient's medical history and real-time monitoring data.

   -  Industrial operation and maintenance: Predicts equipment failures and schedules maintenance resources to optimize production line efficiency.

*  Virtual world interaction:

   -  Game NPC: Intelligent characters with emotions and memories (such as AI-driven open-world NPCs).

   -  Metaverse guide: Helps users explore virtual spaces and provides personalized content recommendations.

*  Scientific research:

   -  Laboratory assistant: Automatically designs experiments, analyzes data, and proposes hypotheses (such as chemical synthesis agents).

   -  Climate simulation: Coordinates multidimensional data models to predict extreme weather and generate response plans.
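Of the four design patterns, Reflection is the simplest to illustrate. The following is a minimal sketch, not a definitive implementation: the function `reflect` and the two stand-in callables `toy_generate` and `toy_critique` are hypothetical names introduced here for illustration. In a real agent they would be backed by model calls, which this document does not specify.

```python
from typing import Callable, Optional


def reflect(task: str,
            generate: Callable[[str, Optional[str]], str],
            critique: Callable[[str], Optional[str]],
            max_rounds: int = 3) -> str:
    """Generate a draft, then revise it until the critic has no objection."""
    feedback: Optional[str] = None
    draft = ""
    for _ in range(max_rounds):
        draft = generate(task, feedback)   # produce or revise the output
        feedback = critique(draft)         # review the output just produced
        if feedback is None:               # nothing left to fix
            break
    return draft


# Hypothetical stand-ins for model calls, kept deterministic for illustration.
def toy_generate(task: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "interface eth0 mtu 900"    # first draft contains a typo
    return "interface eth0 mtu 9000"       # revised after feedback


def toy_critique(draft: str) -> Optional[str]:
    mtu = int(draft.split()[-1])
    return None if mtu >= 1500 else "MTU below 1500; re-check the value"


result = reflect("set jumbo MTU on eth0", toy_generate, toy_critique)
print(result)
```

The `max_rounds` bound is a design choice assumed here so that an agent whose critic never approves still terminates.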

A large model is a machine learning model with large-scale parameters and computing power requirements. It is typically constructed from deep neural networks, contains billions or even hundreds of billions of parameters, can understand text, images, speech, and other content, and can perform tasks such as text generation, image generation, question answering with reasoning, and scientific prediction. An AI agent is an intelligent entity with autonomous perception, decision-making, and execution capabilities, driven by goals in dynamic environments.

Current networks undergo a large number of service migrations or device switchovers every day or month. These operations are highly similar in their steps and processes, and they involve querying and filling in a large amount of data and configuration. There are two typical types of migration: service provisioning (configuration of external service data) and migration change (internal tasks such as route publishing and network optimization). Large models are well suited to processing and recognizing massive amounts of data, and intelligent agents can guide each step of the process like an experienced expert. Automation through large models and agents can reduce errors and free up human resources. Key tasks include:

*  Migration Plan Generation: Designing workflows and deployment strategies.

*  Plan Auditing: Checking configurations and compliance, and correcting errors (e.g., typos, hallucinations).

*  Automated Execution: Replacing manual configuration with AI-generated scripts and calling the corresponding systems to finish tasks.

Take the service provisioning scenario as an example. Previously, a migration required operators to log in to devices manually and configure parameters. Now, through interaction with the large model, the large model generates a script to be delivered to the device, and also configures and audits it. The agent can call other systems, such as a digital twin platform, to test the script, view the impact of the changed parameters, and return the results to the designated system, which reduces manual errors. Finally, based on the analysis of the results, the script can be delivered automatically when no problems are found.
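The sequence described above (generate a plan, audit it, test it on a digital twin, then deliver it automatically only if nothing fails) can be sketched as a small gated pipeline. This is an illustrative sketch under stated assumptions: `MigrationPlan`, `audit`, `run_migration`, the forbidden-keyword rule, and the device and command strings are all hypothetical, and the `twin_test` and `push` callables stand in for a digital twin platform and a delivery system whose interfaces this document does not define.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class MigrationPlan:
    device: str
    commands: List[str]                      # e.g. a script generated by a large model
    issues: List[str] = field(default_factory=list)


def audit(plan: MigrationPlan,
          forbidden: Sequence[str] = ("reload", "erase")) -> bool:
    """Record an issue for every command containing a forbidden keyword."""
    plan.issues = [
        f"forbidden command: {cmd}"
        for cmd in plan.commands
        if any(word in cmd.split() for word in forbidden)
    ]
    return not plan.issues


def run_migration(plan: MigrationPlan,
                  twin_test: Callable[[MigrationPlan], bool],
                  push: Callable[[MigrationPlan], None]) -> str:
    """Deliver the plan only if both the audit and the twin test pass."""
    if not audit(plan):
        return "rejected by audit"
    if not twin_test(plan):
        return "failed on digital twin"
    push(plan)
    return "delivered"


pushed: List[MigrationPlan] = []
good = MigrationPlan("pe-router-1",
                     ["interface ge-0/0/1", "description migrated-service"])
bad = MigrationPlan("pe-router-1", ["erase startup-config"])

status_good = run_migration(good, twin_test=lambda p: True, push=pushed.append)
status_bad = run_migration(bad, twin_test=lambda p: True, push=pushed.append)
print(status_good, "|", status_bad, "|", bad.issues)
```

Placing the audit before the twin test is an assumption made here so that obviously unsafe scripts are rejected before any test resources are used.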

(Content to be expanded)

Intelligent agents based on large models can automate network operations by coordinating system scheduling and leveraging diverse capabilities of large models. This process involves multiple interactions with systems such as large models and network management systems. Each agent has specialized functions, such as agents for intent understanding or agents dedicated to fault localization and demarcation in specific network scenarios. Current operational systems already provide basic data support, foundational atomic capabilities, and well-defined orchestration workflows for task execution. However, most processes are manually connected, involve repetitive mechanical work, and lack an intelligent coordination "brain". See Figure 1.

                         Agents                                     Network
+------------------------------------------------------+ +---------------------------+
|                                                      | |                           |
|                   +------------+                     | |Network Systems & Platforms|
|                   | Perception |                     | |                           |
|                   +------+-----+                     +-> AI Models                 |
|                          |                           | |                           |
|                 +--------v--------+                  | | Atomic Capabilities       |
| +----------+    |      Brain      |    +----------+  | |                           |
| | Planning <+-+-+                 +-+-+> Action   |  | | Tools                     |
| +----------+    | LLM | LVM | LSM |    +----------+  <-+                           |
|                 +------+--^-------+                  | | Data                      |
|                        |  |                          | |                           |
|                   +----v--+----+                     | |                           |
|                   |   Memory   |                     | |                           |
|                   +------------+                     | |                           |
+------------------------------------------------------+ +---------------------------+

Functions of Agents:

*  Intent Recognition:

   -  Understand and interpret user input intentions.

   -  Determine whether subsequent tasks require identifying suitable agents or multi-turn dialogues to complete intent recognition and parsing.

*  Intent Classification and Analysis:

   -  Decompose tasks based on recognized user intent.

   -  Categorize tasks according to different functional requirements.

*  Perception:

   -  Proactively receive alarms, threshold-exceeding notifications, or environmental change information, issuing warnings when necessary.

   -  Accept task requests from other systems, potentially involving multimodal data processing.

*  Memory:

   -  Long-term memory: Stores user habits and domain-specific processing experiences (e.g., failure/success cases, encountered faults) in knowledge bases.

   -  Short-term memory: Caches temporary processing data (e.g., context).

   -  Agents perform reflection and error correction by interacting with long-term memory and contextual information.

*  Planning:

   -  Analyze and decompose intent based on task objectives and learned knowledge.

   -  Orchestrate subtasks (e.g., breaking complex problems into simpler ones).

   -  Identify required system components (other agents, large models, APIs, etc.).

*  Decision-Making:

   -  Finalize execution plans and match workflows to current tasks.

   -  Generate instantiated, executable solutions by aligning system components, data, and model strategies.

*  Execution:

   -  Convert orchestrated results into network-understandable commands.

   -  Execute tasks by mobilizing resources and dynamically adjusting based on feedback.

*  Multi-Agent Collaboration:

   -  Team Collaboration: Enable coordinated teamwork among multiple agents.

   -  Competitive Collaboration: Manage competitive relationships to avoid efficiency loss.
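How Perception, Memory, Planning, and Execution fit together can be illustrated with a minimal sketch. Everything below is hypothetical and chosen only for illustration: the `Memory` and `Agent` classes, the event fields `type` and `device`, and the tool names `collect_logs` and `check_optics` are assumptions, not interfaces defined by this document. The planning step is a plain lookup rather than a large model call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Memory:
    # Long-term: steps that worked before, keyed by event type (assumed layout).
    long_term: Dict[str, List[str]] = field(default_factory=dict)
    # Short-term: context cached while handling the current task.
    short_term: List[str] = field(default_factory=list)


class Agent:
    def __init__(self, tools: Dict[str, Callable[[str], str]], memory: Memory):
        self.tools = tools
        self.memory = memory

    def perceive(self, event: Dict[str, str]) -> str:
        """Receive an alarm or request and cache it as context."""
        self.memory.short_term.append(f"event:{event['type']}@{event['device']}")
        return event["type"]

    def plan(self, intent: str) -> List[str]:
        """Reuse steps from long-term memory, else fall back to a default."""
        return self.memory.long_term.get(intent, ["collect_logs"])

    def act(self, steps: List[str], device: str) -> List[str]:
        """Run each planned step through the matching tool."""
        results = []
        for step in steps:
            output = self.tools[step](device)
            self.memory.short_term.append(f"{step}:{output}")
            results.append(output)
        return results

    def handle(self, event: Dict[str, str]) -> List[str]:
        intent = self.perceive(event)
        return self.act(self.plan(intent), event["device"])


tools = {
    "collect_logs": lambda dev: f"logs({dev})",
    "check_optics": lambda dev: f"optics-ok({dev})",
}
memory = Memory(long_term={"link_down": ["check_optics", "collect_logs"]})
agent = Agent(tools, memory)
results = agent.handle({"type": "link_down", "device": "sw1"})
print(results)
```

Keeping tools in a dictionary keyed by step name is one possible way to let the planning output refer to capabilities by name only.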

The data that an agent can learn or perceive includes expert knowledge in operation and maintenance processes, logs, configuration rules, policy knowledge, case manuals, alarms, network topologies, fault reports, and more.

Atomic capability refers to a series of orchestrated workflows designed to accomplish a subtask. It encapsulates various APIs, exposes a unified interface and capabilities externally, and serves as the minimal functional unit for achieving specific subtasks. Atomic capabilities can be defined with standardized inputs and outputs to facilitate cross-system communication and calls.
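One way to picture an atomic capability with standardized inputs and outputs is a small wrapper that runs a fixed sequence of API calls behind a single entry point. The sketch below is an assumption for illustration only: `CapabilityRequest`, `CapabilityResult`, `AtomicCapability`, the step functions, and the optical power values are hypothetical and do not come from any existing system or standard.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class CapabilityRequest:          # standardized input (assumed shape)
    target: str
    params: Dict[str, Any] = field(default_factory=dict)


@dataclass
class CapabilityResult:           # standardized output (assumed shape)
    ok: bool
    data: Dict[str, Any] = field(default_factory=dict)
    error: str = ""


Step = Callable[[CapabilityRequest, Dict[str, Any]], None]


class AtomicCapability:
    """Wrap an ordered list of API calls behind one unified interface."""

    def __init__(self, name: str, steps: List[Step]):
        self.name = name
        self.steps = steps

    def invoke(self, request: CapabilityRequest) -> CapabilityResult:
        data: Dict[str, Any] = {}
        try:
            for step in self.steps:
                step(request, data)       # each step reads and extends `data`
        except Exception as exc:          # report failures in the same output shape
            return CapabilityResult(ok=False, data=data, error=str(exc))
        return CapabilityResult(ok=True, data=data)


# Hypothetical steps; a real one would call a management system API.
def query_interface(req: CapabilityRequest, data: Dict[str, Any]) -> None:
    data["rx_power_dbm"] = -7.5           # made-up reading for illustration


def check_threshold(req: CapabilityRequest, data: Dict[str, Any]) -> None:
    data["within_limit"] = data["rx_power_dbm"] >= req.params["min_rx_dbm"]


optical_check = AtomicCapability("optical_power_check",
                                 [query_interface, check_threshold])
result = optical_check.invoke(
    CapabilityRequest("sw1:ge-0/0/1", {"min_rx_dbm": -10.0}))
print(result.ok, result.data)
```

Returning errors inside `CapabilityResult` instead of raising them is a design choice assumed here so that callers in other systems always receive the same output structure.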