Open access peer-reviewed chapter - ONLINE FIRST

Cutting-Edge Technologies in Agent Architecture for Use at the Microgrid Protection Level

Written By

Andres Mauricio Diaz Caicedo

Submitted: 09 February 2025 Reviewed: 06 March 2025 Published: 17 April 2025

DOI: 10.5772/intechopen.1009974

Multi-agent Systems - From Basic Concepts to Cutting-Edge Technologies IntechOpen

From the Edited Volume

Multi-agent Systems - From Basic Concepts to Cutting-Edge Technologies [Working Title]

Dr. Nohaidda Sariff, Dr. Zool H Ismail, Dr. Denesh Sooriamoorthy and Dr. Puteri Nor Aznie Fahsyar


Abstract

This chapter discusses cutting-edge technologies in agent architecture for microgrid protection, focusing on agent-based hierarchical control. It introduces adaptive multi-agent systems (MAS) that enhance protection schemes by assigning agents to intelligent electronic devices (IEDs) for tailored solutions in dynamic environments. The proposed methodologies aim to improve operational efficiency and fault tolerance through structured communication, validation mechanisms, and real-time control, ultimately redefining energy management in microgrids.

Keywords

  • microgrid
  • multi-agent system
  • wavelet indicators
  • edge computing
  • hierarchical protection

1. Introduction

In the evolving landscape of microgrid management, the need for robust and adaptive protection schemes has become increasingly critical. Traditional methodologies often fall short in addressing the complexities of dynamic topologies and power fluctuations, which leaves system resilience vulnerable. This chapter introduces agent-based hierarchical control as an alternative: adaptive multi-agent systems (MAS) enhance protection strategies by assigning dedicated agents to intelligent electronic devices (IEDs), which enables efficient task execution and localized decision-making in diverse application zones.


2. Agent-based hierarchical control: From white-box dynamics to management strategies

Traditional protection methodologies often fall short in addressing dynamic topologies and power fluctuations, which leaves system resilience vulnerable [1]. This section develops agent-based hierarchical control, in which adaptive multi-agent systems (MAS) assign dedicated agents to intelligent electronic devices (IEDs) so that tasks are executed and decisions are made locally within each application zone.

The purpose of structured protection schemes is to formulate a multistage solution tailored to diverse application zones. The methodologies in the literature, typically applied to networks with dynamic topologies and significant power fluctuations, often lack adaptive protection features. The proposed method introduces adaptive multi-agent systems (MAS) in which each agent is assigned to one IED and its protection zone. These agents function as execution units that carry out specific tasks through logical selective mapping in their local environments. The method distinguishes two environments: serialized branches (localities) [2] and parallelized acquisition, which arranges the distributed energy resources (DERs) according to their internal dynamics and their connection to the bus bar for state extraction [3].

The system defines two types of agents: the parametric-selective agent and the validation agent. The main goal is to create overlapping zones based on short-circuit grouping indices, so that the parametric agent can gather feedback from a local dynamic detector. This detector separates logical outputs by fault type and conveys them to network equipment, including DERs, relays, and loads, to classify protective functions across multiple levels. The validation agent verifies fault clearance, adjusts settings, and maintains protection coordination. The process runs in two hierarchical tiers. The first tier establishes an updatable quick-reference table of operational scenarios, from which the adjustments chosen in the power system transient analysis program are loaded [4, 5].

Figure 1 illustrates a hierarchical approach to integrating communication interfaces in power systems, with three levels: local, parallel, and higher. At the local level, distributed energy resources (DERs) are emulated or simulated using environments such as MATLAB and IED test devices. The parallel level handles concurrent tasks related to extracting substation signal data and integrating it into frameworks such as ETAP/PowerFactory [7]. Finally, the higher level optimizes operational management in Python, using tools such as the Numba just-in-time compiler to validate and analyze protection signals. This framework ensures the transitivity of the dynamics and facilitates efficient command distribution and real-time data mining.

Figure 1.

Index of structured programming (start-up simulation multi-stage) [6].

Incorporating edge computing into these schemes allows real-time decision-making to occur directly at the periphery of the network. By enabling intelligent electronic devices (IEDs) and distributed controllers to process data locally, edge computing minimizes latency and improves the responsiveness of the system under fault conditions. Architecturally, edge computing relies on highly distributed processing nodes equipped with hardware accelerators, such as ARM-based SoCs, GPUs, and FPGAs, to handle computationally intensive tasks, including fault classification and adaptive relay coordination. These devices employ microservice-oriented software designs in which containerized modules encapsulate protection algorithms. Kubernetes and lightweight Kubernetes distributions such as K3s manage these services dynamically across the edge nodes, ensuring high availability and scalability even under dynamic system conditions.

The proposed scheme employs distributed intelligence, where each agent embedded in the edge infrastructure processes local signals using pre-trained neural networks optimized for on-device inference. These models are compressed via techniques like quantization and pruning to minimize computational overhead, enabling their deployment on devices with constrained resources. Communication between agents is facilitated using high-efficiency protocols like MQTT and CoAP, which allow low-latency data exchange across the protection zones. The distributed nature of this architecture ensures resilience against single-point failures, as localized decision-making remains functional even if remote communication links are disrupted.
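As a minimal sketch of the compression step, the following Python fragment applies symmetric 8-bit quantization to a list of weights. The function names `quantize_int8` and `dequantize`, the single per-tensor scale, and the sample weights are illustrative assumptions; the chapter does not specify which quantization scheme the agents use.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]


# Hypothetical weights of one small layer.
weights = [0.5, -1.0, 0.25, 0.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
```

Each restored weight differs from the original by at most half a quantization step, which is the accuracy traded for the smaller memory footprint on a constrained device.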

Furthermore, edge computing strengthens the integration of parallel processing capabilities into structured protection schemes. By segmenting data acquisition and decision-making tasks across multiple edge devices, the system achieves substantial improvements in fault clearance times. For instance, edge clusters responsible for coordinating distributed energy resources (DERs) employ frameworks such as Dask to parallelize data processing workloads. These clusters aggregate measurements from serially connected DERs, mapping fault impacts dynamically across interconnected zones. Such architectures also utilize decentralized ledger technologies to securely record protection actions, ensuring traceability and compliance with operational requirements.

A critical aspect of this approach is the synchronization of edge processing nodes with centralized SCADA systems. Edge devices periodically validate their operational states by querying simulation datasets hosted in SCADA. This synchronization is achieved through the establishment of shared memory structures and reference models that adapt to changing system conditions. Fault scenarios are simulated in near real-time at the edge, allowing for rapid adjustments to relay settings and protection logic. These simulations utilize transient analysis libraries encoded in high-performance languages, with JIT compilers like Numba optimizing the execution of computational kernels for dynamic system modeling.

Despite these advancements, structured protection schemes must align with rigorous regulatory frameworks to ensure their safety, interoperability, and cybersecurity compliance [8]. Standards such as IEC 61850 play a pivotal role in defining the communication protocols and data models required for seamless integration of edge-based agents with traditional protection infrastructure. Additionally, adherence to IEEE 1547 ensures that distributed energy resources operate within permissible voltage and frequency limits during fault conditions. These standards provide the baseline for designing interoperable systems that can adapt to both centralized and decentralized operational paradigms.

2.1 Hierarchical communication and data storage

This protection scheme uses distributed communication for adjacent agents and remote links for distant ones. Local devices update via memory, confirming faults through mapping tests and relaying event data to the central device for state transitions. All events are logged in the control center for offline analysis [9].

The fault detection framework using Kolmogorov–Smirnov chains ensures completeness and safety for nine-cycle records at up to 50 kHz sampling, focusing on impedance transitions in bus couplers for rapid fault classification. Time-sensitive networking (TSN) enables deterministic communication with ultralow latency and bounded jitter, ensuring timely transmission of critical fault data and precise synchronization across local agents, essential for high-speed decision-making in dynamic networks.
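To make the statistical comparison concrete, the sketch below computes a two-sample Kolmogorov–Smirnov statistic between a pre-event and a post-event window of samples. It illustrates only the basic statistic, not the chained procedure of the chapter, which is not detailed here. The 60 Hz fundamental, the synthetic sinusoidal windows, and the names `ks_statistic` and `window` are assumptions; the 50 kHz sampling rate and the nine-cycle record length come from the text.

```python
import math
from bisect import bisect_right


def ks_statistic(a, b):
    """Largest distance between the empirical CDFs of samples a and b."""
    sa, sb = sorted(a), sorted(b)
    distance = 0.0
    for x in sa + sb:
        cdf_a = bisect_right(sa, x) / len(sa)
        cdf_b = bisect_right(sb, x) / len(sb)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


def window(f_hz, fs_hz, cycles, amplitude):
    """Synthetic sinusoidal record covering the given number of cycles."""
    n = int(round(cycles * fs_hz / f_hz))
    return [amplitude * math.sin(2 * math.pi * f_hz * k / fs_hz) for k in range(n)]


FS = 50_000  # sampling rate from the text, in Hz
F0 = 60      # assumed fundamental frequency, in Hz

pre_event = window(F0, FS, 1, 1.0)
post_event = window(F0, FS, 1, 5.0)  # assumed fault current at five times nominal

d_same = ks_statistic(pre_event, window(F0, FS, 1, 1.0))
d_fault = ks_statistic(pre_event, post_event)
record_length = len(window(F0, FS, 9, 1.0))  # samples in a nine-cycle record
```

Under these assumptions a nine-cycle record holds 7500 samples, and the statistic is zero for identical windows and clearly larger when the amplitude distribution changes.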

The protection system is designed to address the increasing complexity arising from the expansion of sub-areas at a national level, requiring advanced tools for distributed data management. Fault logs and operational metrics are distributed across multiple nodes to ensure high availability and reliability, even during hardware failures. The system employs block-level redundancy to efficiently process high-frequency sampled measured values (SMV), supporting both offline analytics and real-time historical queries. By integrating parallel computing frameworks, it enables the analysis of large protection event datasets while maintaining scalability to meet the demands of a growing network of sub-areas.

The proposed relay agents also encompass prioritization recognition functions, operating based on weighted event ordinal functions. In this scheme, overcurrent protections take precedence, followed by voltage, frequency, and, finally, internal islanding prevention protections. These agents relay information in the form of voltage and current block data across various domains, contrasting with the local coordination map. The hierarchical design allows each agent to act as an independent decision-making node while remaining synchronized with the system’s overarching coordination protocols. This multilevel synchronization ensures that local protections remain effective even during partial network outages, achieving a balance between resilience and flexibility.
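The precedence described above can be sketched as an ordinal weighting over event types. The weights, the `ProtectionEvent` fields, and the tie-break by timestamp are illustrative assumptions; only the order (overcurrent, voltage, frequency, islanding prevention) is taken from the text.

```python
from dataclasses import dataclass

# Lower ordinal weight means higher priority.
PRIORITY = {"overcurrent": 0, "voltage": 1, "frequency": 2, "anti_islanding": 3}


@dataclass(frozen=True)
class ProtectionEvent:
    kind: str         # one of the PRIORITY keys
    zone: str         # hypothetical zone label
    timestamp: float  # seconds since the start of the record


def order_events(events):
    """Sort by protection priority, then by time of occurrence."""
    return sorted(events, key=lambda e: (PRIORITY[e.kind], e.timestamp))


events = [
    ProtectionEvent("frequency", "Z2", 0.010),
    ProtectionEvent("overcurrent", "Z1", 0.020),
    ProtectionEvent("voltage", "Z1", 0.005),
    ProtectionEvent("overcurrent", "Z3", 0.015),
]
ordered = order_events(events)
```

With this ordering, both overcurrent events are handled before the voltage event even though the voltage event occurred first.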

The communication protocol adheres to IEC 61850, with enhancements to support dynamic data streaming and advanced configuration of relays through a centralized controller. For faster and more reliable coordination, sampled measured values (SMV) are transmitted directly to relay agents, which utilize real-time transient data to adjust their protection settings dynamically. This configuration significantly reduces the reaction time to transient faults and enhances the system’s capacity to adapt to changing conditions.

Local agents handle the observability and classification of dynamics for event selectivity, subsequently forming a logical chain of triggers. These triggers are passed to a decision layer, where the digital twin platform simulates various scenarios to validate and optimize the chosen protection strategies. The integration of a digital twin with transient simulation capabilities not only improves operational reliability but also minimizes the risk of unnecessary tripping in complex grid scenarios. The hierarchical nature of the system ensures that decisions validated at the digital twin level are implemented seamlessly, with adjustments communicated back to local agents and controllers via TSN-enabled channels.

Resilience is further enhanced through the implementation of distributed edge computing capabilities. Each agent is equipped with sufficient computational power to execute critical protection algorithms locally, ensuring that decisions are made even in the absence of central controller connectivity. The memory system embedded in local devices is designed with modularity in mind, allowing for easy updates and capacity expansions as the network evolves. These edge nodes also incorporate AI-based anomaly detection algorithms to proactively identify potential system instabilities and initiate pre-emptive protective actions.

Flexibility is achieved by enabling agents to dynamically redefine their operational zones based on real-time network conditions. For instance, an agent detecting a significant voltage sag in its area of influence can autonomously expand its protection zone to cover adjacent areas, coordinating with nearby agents to ensure comprehensive fault coverage. This capability is critical for modern grids with high penetration of distributed energy resources (DER), where traditional fixed-zone protections are insufficient.
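A minimal sketch of this zone redefinition follows, assuming that zones are sets of bus names and that a per-unit voltage below 0.9 counts as a sag. The function name `expand_zone`, the threshold, and the bus adjacency map are hypothetical and serve only to illustrate the idea.

```python
def expand_zone(zones, adjacency, agent, voltage_pu, sag_threshold=0.9):
    """Return updated zones; on a sag, `agent` also covers adjacent buses."""
    updated = {name: set(buses) for name, buses in zones.items()}
    if voltage_pu >= sag_threshold:
        return updated  # no sag detected, zones stay as they are
    for bus in zones[agent]:
        updated[agent] |= adjacency.get(bus, set())
    return updated


zones = {"A1": {"b1", "b2"}, "A2": {"b3"}}
adjacency = {"b2": {"b3"}}  # bus b2 is electrically adjacent to bus b3

after_sag = expand_zone(zones, adjacency, "A1", voltage_pu=0.7)
no_sag = expand_zone(zones, adjacency, "A1", voltage_pu=0.95)
```

After the sag, agent A1 overlaps with agent A2 on bus b3, so the two agents would still need to coordinate their settings for the shared bus.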

As shown in Figure 2, the temporary process outlines the transfer of master control across various management stages in response to measurement failures, non-optimal conditions, and external disturbances such as low power quality. This operational framework emphasizes the dynamic adaptability of the system, where events like abrupt changes in distributed energy resources (DERs) or undimensioned failures trigger protective responses.

Figure 2.

Temporary process in the presence of changes in guidance agents due to measurement failures, non-optimality, and external insertion of low power quality [6].

A key component of this process involves the validation framework between the server and simulation layers. This step ensures that the optimization function is recalibrated to accommodate the specific requirements of local stages. Moreover, as the system transitions from an impaired state to a stabilized one, mechanisms such as transitive impedance changes and master coordination are implemented to maintain system resilience. Such coordinated actions underscore the importance of hierarchical control and the role of agents in sustaining operational continuity across interconnected grid zones.

2.2 Hardware and software protection-management agents

Protection agents manifest in two forms: hardware and software. The local aspect employs a distributed architecture anchored in single board computers (SBCs), where dynamic detection algorithms are hosted. These algorithms facilitate the activation of a master switch mechanism during communication or acquisition agent failures. Such agents are designed to operate autonomously, leveraging network data and their direct coupling with the microgrid to make informed decisions. Communication protocols enable robust data exchange through feedback bimodal systems, which utilize probabilistic operational functions to validate state transitions between hierarchical levels. Effective communication among agents remains a cornerstone, enabling precise evaluation of the network’s status and adaptive decision-making under rapidly changing conditions.

In microgrids dominated by inverter-based configurations, the intrinsic observational framework tasked with state assessment faces challenges in processing complex waveforms. Traditional sensitivity sweeps often fail to extract actionable insights, particularly when addressing high-frequency disturbances or irregularities in power quality. To overcome these limitations, cost-effective embedded cards are deployed on a per-board basis, acting as dynamic sensing nodes (Figure 3). These embedded devices are instrumental in enhancing the granularity of local measurements, enabling precise fault detection and state estimation. At the bay level, retaining and integrating adaptive schemes is crucial for system validation, especially in topologies where diverse command structures and fluctuating operational scenarios demand resilient and redundant architectures. This redundancy is achieved through hybrid communication standards that combine staggered feedback mechanisms with comparative assessments across distinct strategies, thus bolstering system reliability and responsiveness during contingencies.

Figure 3.

Control responses in inverters due to controlled injection during faults.

The resolution of coordination phases necessitates rigorous validation processes. This begins with analyzing the physical controller’s primary layer via a real-time complementary detection stage, ensuring rapid response times in transient simulations. Persistent branch disconnections, for instance, are categorized as operational deviations rather than systemic failures. This distinction ensures that the system maintains operability even amidst communication link disruptions, effectively preventing cascading failures. All events, including transient anomalies and operational adjustments, are meticulously logged for post-event analysis, providing operators with actionable insights to refine system performance further.

Flexibility in managing distributed energy resource (DER) states is another critical attribute of the proposed system. By dynamically moderating converter impacts on overcurrents and high-frequency voltage fluctuations, the system ensures seamless coordination among protection devices. This adaptability mitigates risks associated with fluctuating power quality while maintaining the integrity of interconnected network components. The operational dynamics are complemented by a layered approach to decision-making, where local agents execute first-level classifications, while higher-order algorithms validate and refine these decisions based on comprehensive system-wide metrics [10].

A pivotal aspect of the proposed framework is the validation loop established between the physical server and simulation environments. This loop not only ensures real-time readjustment of optimization functions tailored to local stages but also facilitates impedance recalibrations during system recovery from a downed state. For example, changes in transitive impedance are executed to accommodate dynamic power flow adjustments and maintain operational stability. Concurrently, ID containers play an essential role in recalibrating command and control parameters, ensuring continuity and robustness under varying operational conditions.

The integration of redundancy within the system architecture is underscored by the dual-layered protection strategy, wherein hardware agents operate as the primary safeguards, and virtual layers provide backup mechanisms. This dual approach significantly enhances system resilience against unforeseen events, such as hardware failures or cyber intrusions. Furthermore, the optimization of resource allocation and the continuous adaptation of control algorithms directly contribute to mitigating economic losses by minimizing downtime and ensuring consistent energy delivery.

In summary, the orchestration of hardware and software protection-management agents within this multi-layered framework combines real-time data processing, hierarchical communication, and adaptive optimization to improve the reliability, flexibility, and economic efficiency of microgrid and DER management.

2.3 System identification for insertion (DERs by branch and coupling by converter)

This process plays a pivotal role in the hierarchical protective structure, as it establishes a modular sequence for each agent incorporated into the system [3]. The step lays the foundation for analysis, validation of adjustments, and protection coordination. Notably, the topological data should be based on the pre-event structure: synchronizing it during the event would be resource-intensive and would lead to characterization problems due to dynamic overlap. Therefore, a global mapping of the microgrid is organized as a concatenated digital mapping of three main structures. The first delineates the insertion ID, which specifies the order in which the element enters the microgrid bus. The second specifies the type of white-box block used for the internal dynamics of the serialized DER branches. The third details the black box of the parallel groupings, highlighting the type of control used for coupling to the bus. These dynamics are loaded onto a shared SIL coupling frame, entered as INT_DYN and IMP_Models, and extracted from the program’s SQL database through the Python connection interface [9, 11].

Given the ongoing changes in the system’s topology, the operational mapping remains variable and requires consistent reordering. To accommodate this, a matrix of sub-indexes and states (Figure 4) is created and adjusts automatically with cycle variations. Registration information must not be concatenated in the agent; the system therefore stores only the size of an event frame. A hash table assigns a management value to the grouping function based on the type of failure, its location, and the respective response in each case, which supports organized searches by characteristic within the simulation program. To transmit information to a higher tier, a compartmentalized chain of the phenomenon is dispatched, comprising a three-level identification ID (branch, serial stage, and type of control). This information is loaded into the physical environment, forming the basis for real-time verification of power system variations.
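The hash table and the three-level ID can be sketched as follows. The class names `EventID` and `EventRegistry`, the dotted string encoding, and the use of a running counter as the management value are assumptions made for illustration; the chapter does not define these details.

```python
from typing import NamedTuple


class EventID(NamedTuple):
    """Three-level ID sent to the higher tier (field names are assumptions)."""
    branch: int
    serial_stage: int
    control_type: str

    def encode(self) -> str:
        return f"{self.branch}.{self.serial_stage}.{self.control_type}"


class EventRegistry:
    """Hash table keyed by (failure type, location, response)."""

    def __init__(self):
        self._table = {}       # key -> management value
        self.frame_sizes = []  # only frame sizes are kept, not the frames

    def register(self, fault_type, location, response, frame: bytes) -> int:
        key = (fault_type, location, response)
        # Reuse the value of a known key, otherwise assign the next one.
        value = self._table.setdefault(key, len(self._table))
        self.frame_sizes.append(len(frame))
        return value

    def find(self, fault_type=None, location=None, response=None):
        """Search by any subset of the three characteristics."""
        wanted = (fault_type, location, response)
        return {
            key: value
            for key, value in self._table.items()
            if all(w is None or w == k for w, k in zip(wanted, key))
        }


registry = EventRegistry()
registry.register("phase-to-ground", "branch-2", "trip", b"\x00" * 128)
registry.register("three-phase", "branch-2", "trip", b"\x00" * 256)
matches = registry.find(location="branch-2")
message = EventID(branch=2, serial_stage=1, control_type="grid-following").encode()
```

Keeping only the frame size in the agent keeps its memory use bounded, while the tuple key allows lookups by failure type, location, or response without scanning stored waveforms.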

Figure 4.

Operation of proposed protection scheme for power restoration (Resilient Int/Ext Failure) (ID) [6].

Although this architecture is primarily tailored for decentralized networks, it also holds potential for systemic-level formulation. Globally, distribution system operator (DSO) models with energy distribution grid edge (EDGE) boundaries are under development, aiming to integrate decentralized networks into a coordinated infrastructure capable of dynamically managing distributed resources with high flexibility [12]. However, in regions like Colombia, regulatory and technical constraints have hindered significant progress in DSO implementation. A critical challenge is that older inverters lack modern control capabilities, which prevents effective controllability during events. This limitation is especially significant when mitigating high-frequency oscillations or transient overcurrents that could compromise network stability. As a temporary solution, some functionalities in the initial preset configurations are disabled to prevent high-frequency oscillations. While this approach mitigates short-term risks, it restricts the dynamic response capability of the network.

Upgrading existing plants and retrofitting legacy inverters with advanced controls are essential to meet modern technical requirements. Countries like Australia and Germany have led this transition by enforcing stricter inverter standards and implementing retrofit programs, providing valuable references for evolving frameworks in regions like Colombia.

After ensuring stability with transient safety measures, the system is partitioned by load significance and power quality needs. Local agents connect only within the same tier, requiring distinct classifications. Each DER is managed by a dedicated master agent that handles requests and regulates power for DERs with the same matrix code [13]. This reduces communication links and costs while enabling autonomous zone management through decentralized master agents.

Agents operate in a matrix array, communicating via rows (agent type), columns (location and insertion order), and depth (controller-driven variability). Column-based interactions require pre-validation to ensure compliance with coordination IDs and power quality under national standards. Optimization focuses on low-power operations, minimizing control wear through rapid impedance transitions in bus bar coupling. Cross-zone communication occurs via the master agent of the requesting zone.
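One way to picture the matrix arrangement and the cross-zone rule is the sketch below. The `Agent` fields mirror the row, column, and depth roles described above; the dictionary index, the `route` helper, and the sample agents are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agent:
    name: str
    row: int    # agent type
    col: int    # location and insertion order
    depth: int  # controller-driven variability
    zone: str


def build_matrix(agents):
    """Index agents by their (row, col, depth) position."""
    return {(a.row, a.col, a.depth): a for a in agents}


def route(src, dst, masters):
    """Return the hop sequence for a message from src to dst."""
    if src.zone == dst.zone:
        return [src.name, dst.name]
    master = masters[src.zone]  # master agent of the requesting zone
    hops = [src.name, master, dst.name]
    # Drop the duplicate hop when the sender is itself the zone master.
    return [h for i, h in enumerate(hops) if i == 0 or h != hops[i - 1]]


local = Agent("a1", row=0, col=0, depth=0, zone="Z1")
master = Agent("m1", row=1, col=0, depth=0, zone="Z1")
remote = Agent("b1", row=0, col=1, depth=0, zone="Z2")

matrix = build_matrix([local, master, remote])
masters = {"Z1": "m1", "Z2": "b1"}
cross_zone = route(local, remote, masters)
same_zone = route(local, master, masters)
```

Routing every cross-zone request through one master per zone is what keeps the number of communication links low in this arrangement.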

In the feedback stage, mutual ID vectors couple with the ordering matrix to track transitions per electrical cycle [14]. This creates a hash-based assurance search engine for operator traceability. Neighboring master agents share ID tables for DER coupling, ensuring updates. New zones announce their connection and share ID tables with neighboring master agents, maintaining a cohesive system.

2.4 Fault-tolerant network adaptation

Algorithm 1 (alternative route selection in distributed networks) dynamically adapts the topology of a distributed network when a node fails. It identifies the direct neighbors of the failed node, removes the failed node from the graph, and evaluates potential replacement nodes based on their load and sensitivity. Sensitivity is calculated as an inverse function of node degree, so that less congested nodes are prioritized for re-routing. Parallelized evaluations by local agents reduce computation time and enable real-time responsiveness. The selected node is then integrated into the network by creating a new link with a neighbor of the failed node, which minimizes disruption to the system’s overall operation. This approach is particularly relevant for power grids and communication networks, where fault tolerance and dynamic adaptability are critical.

The topology-based fault location detection-CV multistage algorithm (Algorithm 2) and the faulted section isolation algorithm (Algorithm 3) support fault-tolerant network adaptation when implemented in agents within the power system. Algorithm 2 functions as an accumulator at the bus level, collecting and processing data from field devices, protection relays, and system stability monitors. These agents detect faults by analyzing the network’s topology and its transition states. When a fault is detected, the agent identifies the location of the faulted section by applying topological methods and persistent homology, considering the statuses of protection devices, switches, and other critical components. The agent then adapts the network by following the demand of the next microgrid in the serial cluster, so that fault detection and isolation are carried out efficiently while system stability is maintained.

Algorithm 1. Alternative route selection in distributed networks.

Require: Graph $G = (V, E)$, failed node $S_f$

Ensure: Updated graph $G$

 1: Identify the direct neighbors $N_f$ of $S_f$ in the graph

 2: Remove $S_f$ from the graph: $V \leftarrow V \setminus \{S_f\}$, $E \leftarrow E \setminus \{(S_f, v) : v \in N_f\}$

 3: Find candidate nodes $C$: nodes in $V$ that do not belong to $N_f$ and are not $S_f$

 4: Evaluate nodes in $C$ based on load and sensitivity

   (1) Compute load $\mathrm{load}(v) = \sum_{u \in \mathrm{neighbors}(v)} w_{uv}$

   (2) Compute sensitivity $\mathrm{sensitivity}(v) = \frac{1}{\deg(v) + 1}$

 5: Parallelize evaluation for all nodes in $C$ using local agents

 6: Select the node $S_{best} \in C$ with the minimum $\mathrm{load}$ and maximum $\mathrm{sensitivity}$

 7: Create a new link $(S_{best}, N_f^{1})$, where $N_f^{1}$ is the first neighbor of $S_f$

 8: return $G$
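The steps of Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the author's implementation: the graph is an undirected weighted adjacency dictionary, a thread pool stands in for the local agents, ties between minimum load and maximum sensitivity are resolved lexicographically (load first), and the new link receives a hypothetical default weight of 1.0.

```python
from concurrent.futures import ThreadPoolExecutor


def reroute_after_failure(adj, failed, new_weight=1.0):
    """Return (updated graph, selected node) after removing `failed`.

    adj maps each node to a dict {neighbor: edge weight}.
    """
    # Step 1: direct neighbors of the failed node, in insertion order.
    neighbors = list(adj[failed])
    # Step 2: remove the failed node and its edges (the input is not modified).
    graph = {
        v: {u: w for u, w in nbrs.items() if u != failed}
        for v, nbrs in adj.items()
        if v != failed
    }
    # Step 3: candidates are the surviving nodes that were not neighbors.
    candidates = [v for v in graph if v not in neighbors]
    if not candidates or not neighbors:
        return graph, None

    # Step 4: load is the sum of incident weights, sensitivity is 1 / (deg + 1).
    def evaluate(v):
        load = sum(graph[v].values())
        sensitivity = 1.0 / (len(graph[v]) + 1)
        return v, load, sensitivity

    # Step 5: evaluate all candidates in parallel.
    with ThreadPoolExecutor() as pool:
        scores = list(pool.map(evaluate, candidates))

    # Step 6: minimum load first, then maximum sensitivity as tie-break.
    best, _, _ = min(scores, key=lambda s: (s[1], -s[2]))

    # Step 7: link the selected node to the first neighbor of the failed node.
    first = neighbors[0]
    graph[best][first] = new_weight
    graph[first][best] = new_weight
    return graph, best


# Hypothetical five-node ring in which node B fails.
adj = {
    "A": {"B": 2.0, "E": 4.0},
    "B": {"A": 2.0, "C": 1.0},
    "C": {"B": 1.0, "D": 3.0},
    "D": {"C": 3.0, "E": 1.0},
    "E": {"D": 1.0, "A": 4.0},
}
updated, selected = reroute_after_failure(adj, "B")
```

In this example the candidates are D and E; D carries the smaller load, so it is selected and linked to A, the first neighbor of the failed node.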

Faulted section isolation is handled locally by agents that manage the isolation of faulted sections within the microgrid. They monitor circuit breakers and switches, issuing open commands to isolate affected areas during faults. These agents adjust the power output of distributed energy resources (DERs), ensure proper switch operation, and prevent fault propagation to maintain system stability. Additionally, they validate the operability of isolation systems, addressing failures promptly to stabilize the network. These monitoring nodes (signal trains) are integrated through a Pareto front within the resource optimization framework, considering the number of nodes (orchestrator and agents) and coordinating the transition during transient periods (ensuring correct control mode and compliance with inertia zones).

The algorithms in this section for transient detection, when integrated with system observability, call for an architecture grounded in edge computing. This approach enhances the responsiveness and efficiency of protection schemes by enabling localized processing and decision-making. The subsequent section illustrates the deployment of this architecture within a service-based computing framework, further emphasizing its role in advancing microgrid management.

The integration of renewable energy sources and their diverse control strategies introduces dynamic shifts in the static security regions of power systems, significantly impacting protection schemes and reserve configurations. To address these challenges, advanced algorithms are proposed to ensure optimal PMU transitions during faults under a hierarchical communication standard. This approach enhances information reliability for control centers, aligning with Northern Europe’s sectionalized system methodology, which emphasizes distributed agent-based management.

Algorithm 2. Topology based fault location detection-CV multistage.

Input: Measurement and statuses of field devices, information from protective IEDs, DFIs from switches, state of latent space of energetic Wavelet and system stability monitoring (Resilient Self-Healing)(SQL)

Output: Location of the faulted section in microgrid (Modular internal fault in the converter, power filter coupling or external fault in the feeder)

STATUS: Field devices (TWEjl,TWEjlPmax,TWEthr,SWcjt,SWvjt,SWstjt,

CBcjt,CBvjt,CBstjt,CBdfijt)Nctr,jm; Switching conditions manager (p, o list(DMS,FRT,FCM)) Fcon/comprev_state; HIL conditions (Digpriority,BusLocal_parametric_insertion,OPCcontrollers) Changes_process_container.

N = len(M_ID.row)

m Number of agents

fc Forecast of generative NN (follow up ϰ(FCM register, MATLAB event framework), test (Mutli-CE), optimizer(ADAM), hyperparameters validation(RandomSearchCV)

Previous _state = M_ID(Class)Zone(STATUSpo),t Map of fault transitions

whileDigpriority=trueProjection_latent_classifierTWEjlPmaxTWEthr=Faultdo

ifIEDtrjt=truethen

  Calculate topology-based fault location detection, Nctr,j   HomologyPersistent(Vietoris.rips,n.ring.follow

  =N_transitories_training,SWset,CBset)

  Calculate information failure trip in HIL, FRT_Zone(STATUS-UDM models)

 end if

ifCBj=r''#j_False_External" in Previous _state then ⊳ Contrast (Manager.list,  field)

  For 1: N do

    Dpre= get(before the event, STATUS(t))

    Dpost= get(after the event, STATUS(t+))

  EndFor

  SQL_Ext = SQL manager extraction (Fcon/comprev_state)

  Check DFIs (Dpre,Dpost)

  if [DFIs.STATUS(Direct)(Dpre=Dpost) Stability(BusLocal_param_insertion=true)  OPC.Dll= fcgen] then

    Fault at the next section of the last DFI

    OPCcontrollerst+.list = Digital map(m, Controlled external harmonic  insertion)

  else if DFIs.STATUS(Direct)(DpreDpost) OPC.Dll= fcgenthen

    Fault at the pointing section

    execute (SIPS conceptual design)-Class Python

  else if OPC.Dll fcgen Stability(BusLocal_param_insertion=false) then

    Failure in (IEC 61850 metrics, Tuning-dynamic detector, Internal board  failure) DMS virtualized

    Execute (Changes_process_containerguidance_agents)

  end if

end if

end while.
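The contrast of DFI statuses before and after the event, which drives the three branches of Algorithm 2, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: `DfiSnapshot`, `FaultLocation`, `locate_fault`, and the two boolean flags are hypothetical names, and the branch conditions follow one reading of the listing (unchanged or changed DFI status combined with the stability check and the OPC forecast check).

```python
from dataclasses import dataclass
from enum import Enum


class FaultLocation(Enum):
    NEXT_SECTION_AFTER_LAST_DFI = "fault at the next section of the last DFI"
    POINTING_SECTION = "fault at the pointing section"
    INTERNAL_FAILURE = "internal failure (DMS virtualized)"
    UNDETERMINED = "undetermined"


@dataclass(frozen=True)
class DfiSnapshot:
    """DFI direction flags at one instant (hypothetical structure)."""
    directions: tuple  # e.g. (("SW1", "forward"), ("SW2", "forward"))


def locate_fault(d_pre, d_post, stability_ok, opc_matches_forecast):
    """Classify the fault from pre-event and post-event DFI snapshots.

    stability_ok stands for Stability(BusLocal_param_insertion) and
    opc_matches_forecast for the comparison of OPC.Dll with fcgen;
    both are simplified to booleans for this sketch.
    """
    unchanged = d_pre == d_post
    if unchanged and stability_ok and opc_matches_forecast:
        return FaultLocation.NEXT_SECTION_AFTER_LAST_DFI
    if not unchanged and opc_matches_forecast:
        return FaultLocation.POINTING_SECTION
    if not opc_matches_forecast and not stability_ok:
        return FaultLocation.INTERNAL_FAILURE
    # Combinations not covered by the listing are left unclassified.
    return FaultLocation.UNDETERMINED


pre = DfiSnapshot((("SW1", "forward"), ("SW2", "forward")))
post = DfiSnapshot((("SW1", "forward"), ("SW2", "reverse")))
print(locate_fault(pre, post, True, True).value)
```

Returning an explicit `UNDETERMINED` value for combinations the listing does not cover is a design choice of the sketch; a deployed agent would more likely escalate such cases to the zone manager.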

Algorithm 3. Faulted section isolation.

Input: Measurements and statuses of field devices, and status feedback from the switches and DERs

Output: Isolation of the faulted section within the microgrid

STATUS: SWcjt, SWstjt, CBcjt, CBstjt, TMstjt Nctr,jm, Mj

Check the status of the target CB, switches SWstjt,CBstjt

if SWstjt = close & CBstjt = close then

 Send open command to SWj and CBj

 Change TMstjt=true

For 1:j do

Mj=αMj+1αMj1, αsignjTMstj+1FrameworkDERcLoadpqj

TMstj+1=TMstmjeVhighFinjection

 EndFor

 Validation of switchover operability in manager.

end if

if DER exists within the faulted section & SWstjt = close then

 Send open command to SW j

else

 Reduce power order of DER j to min. Set DERpowerjt = minimum

Check API (physical/virtual) (register_movements.sdf = SWcjt ∧ Switch.oti = CBcjt)

end if
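The switching logic of Algorithm 3 can be illustrated with a short Python sketch. It covers only the status checks, the open commands, and the DER handling; the update of Mj and TMst inside the loop is omitted. The names `FaultedSection` and `isolate` are hypothetical, commands are assumed to take effect immediately, and the power reduction is applied only when a DER is present, which is an assumption about the intent of the `else` branch.

```python
from dataclasses import dataclass, field


@dataclass
class FaultedSection:
    switch_closed: bool            # SWst status feedback
    breaker_closed: bool           # CBst status feedback
    has_der: bool = False
    der_power_kw: float = 0.0
    der_min_power_kw: float = 0.0
    transition_mode: bool = False  # TMst flag
    commands: list = field(default_factory=list)


def isolate(section):
    """Apply the isolation steps and return the list of issued commands."""
    # Step 1: open the switch and the circuit breaker if both are closed.
    if section.switch_closed and section.breaker_closed:
        section.commands += ["open SW", "open CB"]
        section.switch_closed = False
        section.breaker_closed = False
        section.transition_mode = True

    # Step 2: handle a DER located inside the faulted section.
    if section.has_der:
        if section.switch_closed:
            section.commands.append("open SW")
            section.switch_closed = False
        else:
            section.der_power_kw = section.der_min_power_kw
            section.commands.append("set DER power to minimum")
    return section.commands


section = FaultedSection(True, True, has_der=True,
                         der_power_kw=50.0, der_min_power_kw=5.0)
print(isolate(section))
```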

3. Kubernetes architecture for energy management: Agent-based flexibility systems

As the demand for efficient energy management systems grows, the integration of advanced technologies becomes essential for optimizing microgrid operations. This section delves into the architecture of Kubernetes for energy management, highlighting its role in facilitating agent-based flexibility systems. By leveraging Kubernetes, the proposed framework aims to enhance scalability, resilience, and real-time responsiveness in managing complex energy systems, ultimately driving improvements in operational efficiency and system stability.

The increasing complexity of modern energy systems necessitates robust and scalable architectures to manage flexibility effectively. Kubernetes, as a container orchestration platform, provides a powerful foundation for deploying and managing agent-based systems within energy management environments. By leveraging a cluster-based architecture, energy systems can achieve heightened levels of automation, adaptability, and fault tolerance. This capability enables the efficient deployment of agents tasked with critical operations such as demand response, resource optimization, and real-time grid balancing, ensuring the system remains flexible and responsive to dynamic conditions.

In the evolving paradigm of flexible power system operations, three core aspects demand attention: observability, response speed, and failure or downtime rates. These factors are pivotal in designing an optimal architecture tailored to the incremental rate of the Independent System Operator (ISO). This incremental rate, alongside the corresponding data pools and databases, drives the need for increasingly granular and scalable architectures. With the progressive proliferation of low-voltage systems, such as microgrids, the necessity for hybrid managers based on distribution system operators (DSOs) has become evident. These developments underscore the growing volumes of information and the escalating functionality requirements of Energy Management Systems (EMS).

Traditional monolithic architectures can no longer meet the low coupling and high-reliability demands of modern EMSs. Gartner’s proposed architecture from 1996 introduced a new operational paradigm by integrating functionalities such as five-minute forecasting and short-term hourly offers, emphasizing grid security, reliability, and resilience. Current control centers predominantly utilize web-based service applications to manage EMS functionalities. Service-Oriented Architectures (SOA) offer ease of replacement and high reliability, making them a mainstream choice for software development. SOA has thus become a standard approach in the implementation of modern EMSs.

However, SOA-based energy management systems (S-EMS) face several limitations. First, the coarse-grained decoupling in SOA results in a lack of fine-grained fault isolation. Service failures within an application can propagate, causing the application to fail entirely. This issue complicates fault localization, requiring prolonged system downtimes and jeopardizing the safety and stability of the power system. Second, the absence of robust application isolation increases the risk of cascading failures triggered by individual application malfunctions. Third, a sudden surge in system access can lead to performance degradation due to the enterprise service bus (ESB). Failures or performance bottlenecks in the ESB compromise the robustness of the entire system.

In response to these limitations, Kubernetes supports a microservices-based architecture that addresses the drawbacks of traditional SOA. Through fine-grained decoupling, Kubernetes-based EMS architectures provide better fault isolation and resilience. Each microservice operates independently, allowing the system to remain operational even during localized failures. Furthermore, Kubernetes' ability to scale resources dynamically improves system robustness during high-demand scenarios and helps keep services available. This shift towards a Kubernetes-based architecture represents a significant advance in the design and deployment of flexible and resilient EMSs, enabling them to meet the evolving demands of modern energy systems.

Complementing Kubernetes, a physical deployment based on intelligent electronic devices (IEDs) and single board computers (SBCs) is proposed to establish a foundational layer at the local level. The IEC 61850 standard provides the necessary framework for the physical layer and the implementation of protection and control functionalities. This standard ensures interoperability and reliability in the communication and coordination of devices within the power system. On the other hand, a RESTful API serves as a lightweight communication method for transmitting global indicators, enabling efficient data exchange between local and central systems. This dual-layered approach enhances both localized control and global coordination, offering a robust and scalable solution for modern EMS architectures.

  • Lighter communication method: REST manages and accesses resources through URIs, which gives better scalability and a clearer system structure. Services communicate by calling the specific API of the target service, which removes the ESB as a performance bottleneck and improves reliability.

  • Excellent application isolation: The Kubernetes-based EMS (K-EMS) decouples EMS applications into services and deploys them in containers. Services are effectively isolated, and cascading failure is avoided.

  • Upgraded expansion method: Expansion is refined from the system level to the application level, so the system can achieve higher reliability under limited resources.
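To make the URI-based access to resources concrete, the following Python sketch resolves a GET request for a global indicator. It is only an illustration: the path layout `/zones/<zone>/indicators/<name>`, the host `ems.example`, the indicator names, and the stored values are all hypothetical and are not taken from the chapter.

```python
from urllib.parse import urlparse

# Hypothetical in-memory store of global indicators (illustrative values).
INDICATORS = {
    ("zone-1", "wavelet-energy"): 0.42,
    ("zone-2", "wavelet-energy"): 0.17,
}


def handle_get(uri):
    """Return (status_code, body) for GET /zones/<zone>/indicators/<name>."""
    parts = [p for p in urlparse(uri).path.split("/") if p]
    if len(parts) == 4 and parts[0] == "zones" and parts[2] == "indicators":
        zone, name = parts[1], parts[3]
        if (zone, name) in INDICATORS:
            return 200, {"zone": zone, "indicator": name,
                         "value": INDICATORS[(zone, name)]}
        return 404, {"error": "unknown indicator"}
    return 400, {"error": "malformed resource path"}


print(handle_get("http://ems.example/zones/zone-1/indicators/wavelet-energy"))
```

Because each indicator is addressed directly by its URI, a consumer calls the owning service without passing through a shared bus, which is the property the first bullet relies on.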

To validate the communication system’s reliability and synchronization under real-world conditions, physical tests were conducted using a Raspberry Pi-based setup. These tests focused on offset calibration between local agents and higher-level controllers, ensuring precise timing and seamless coordination in distributed tasks. The results confirmed the system’s ability to dynamically adjust offsets and maintain synchronization across hierarchical layers, even under variable operating conditions. Figure 5 illustrates the physical test setup and the offset calibration process, emphasizing the accuracy achieved during these trials.

Figure 5.

Offset detection for minimum trigger level adjustment to local sequences.
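If the calibrated offset is read as a clock offset between a local agent and a higher-level controller, one common way to estimate it is the two-way timestamp exchange used by NTP-style protocols. The sketch below shows that textbook calculation only; the chapter does not state which method was used in the Raspberry Pi tests, and the function name and timestamp values are hypothetical.

```python
import math


def estimate_offset(t0, t1, t2, t3):
    """Estimate clock offset and round-trip delay from four timestamps.

    t0: request sent (agent clock)       t1: request received (controller clock)
    t2: reply sent (controller clock)    t3: reply received (agent clock)
    Assumes the network delay is the same in both directions.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2.0
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Illustrative values: controller clock 50 ms ahead, 10 ms one-way delay.
offset, delay = estimate_offset(100.000, 100.060, 100.061, 100.021)
print(round(offset, 3), round(delay, 3))
```

The symmetric-delay assumption is the main limitation of this estimate; on an asymmetric link the error is bounded by half of the round-trip delay.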

Figure 6 illustrates the information flow dynamics between hierarchical services implementing master transition through zonification, with particular emphasis on scenarios involving high memory consumption and critical importance cases as zone masters. The diagram presents three distinct states of the distributed network: the original configuration with full connectivity (Figure 6a), followed by two post-failure scenarios that apply Algorithm 1 (Alternative Route Selection in Distributed Networks). This algorithm reassigns masters dynamically based on local network metrics and zonification parameters, preserving service continuity by recalculating paths.

Figure 6.

Information flow between hierarchical services with master transition through zonification, focusing on high memory consumption and critical importance cases as zone masters: (a) original state of distributed agent processes, (b) decision-making path after service failure in node 7, and (c) decision-making path after service failure in node 10.

When analyzing the transition states after service failures in nodes 7 and 10 (Figure 6b and 6c, respectively), we observe how the algorithm orchestrates master transitions by evaluating local performance metrics within each zone. The network's adaptive response is evident in the modified connection patterns (red paths in the state after the node 7 failure and blue paths in the state after the node 10 failure), which show how the algorithm adapts the network topology while maintaining the hierarchical service structure. This implementation is particularly efficient in high memory consumption scenarios, where master selection decisions are critical for maintaining service distribution across the zonified architecture.
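The two mechanisms described here, path recalculation around a failed node and master reassignment within a zone, can be sketched in Python. This is not Algorithm 1 itself: the breadth-first search, the use of free memory as the local metric, and the four-node example topology are assumptions chosen for illustration.

```python
from collections import deque


def shortest_path(adjacency, src, dst, failed=frozenset()):
    """Breadth-first search that skips failed nodes; returns a node list or None."""
    if src in failed or dst in failed:
        return None
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, ()):
            if neighbour not in previous and neighbour not in failed:
                previous[neighbour] = node
                queue.append(neighbour)
    return None


def elect_master(zone_nodes, free_memory_mb, failed=frozenset()):
    """Pick the healthy node with the most free memory (assumed metric)."""
    candidates = [n for n in zone_nodes if n not in failed]
    if not candidates:
        return None
    return max(candidates, key=lambda n: free_memory_mb[n])


# Hypothetical topology: node 2 is the current zone master and then fails.
adjacency = {1: [2, 3], 2: [1, 4], 3: [1, 4], 4: [2, 3]}
free_memory_mb = {2: 512, 3: 256}
print(shortest_path(adjacency, 1, 4), elect_master([2, 3], free_memory_mb))
print(shortest_path(adjacency, 1, 4, failed={2}),
      elect_master([2, 3], free_memory_mb, failed={2}))
```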

3.1 Resource allocation in distributed process management systems

This structure is disruptive compared to conventional methods for identifying natural constraints. By integrating the communication topology with the power system, uncertainties related to demand variations and topological changes are significantly reduced. This approach narrows the analysis framework to short-term events, shifting the maintenance perspective towards dynamic disconnection management. Such a shift not only requires regulatory improvements but also facilitates flexibility markets, effectively eliminating the need to explore incremental pessimistic scenarios.

This technological variation necessitates the subdivision of operational areas into smaller, compact elements, where DSOs (distribution system operators) are tasked with managing system outputs and making critical control decisions. This subdivision aims to address challenges in grid robustness and stability, ensuring that localized issues can be managed efficiently while maintaining overall system performance. As demonstrated in Figure 7, the evaluation of resource depletion sensitivity in relation to function scalability reveals a significant pattern: every twelve processes correspond to a microgrid process that exhibits decremental behavior. This observation is particularly evident in the memory and sensitivity curves (12–24), where the resource utilization pattern shows distinct cyclical variations, suggesting an inherent relationship between process allocation and system efficiency in managing distributed resources.

Figure 7.

Evaluation of resource depletion sensitivity in relation to function scalability, considering the involvement of resilient agents (12–24).

Furthermore, under a single board computer (SBC) framework, physical constraints inevitably emerge. These include limitations related to memory capacity, response times, and communication distances between agents. If this data structure were integrated into an optimization problem involving power flow analysis, an incremental variant of constraints at the n-gram level would appear. Therefore, it is recommended that the communication structure be expanded significantly. Such scaling would allow tasks managed by agents to be progressively subdivided with greater precision, rendering the Pareto front unnecessary, as system supportability would be overestimated.

3.2 Structural hierarchy for service functions in intelligent microgrids

Figures 8 and 9 illustrate a hierarchical multi-agent system using hybrid control, organized into upper, middle, and lower levels. The upper level handles global decision-making with optimization strategies and energy management, supported by Kubernetes containers for scalable agent deployment and cloud databases for centralized knowledge management.

Figure 8.

SIPS diagram (information system for planning and supervision).

Figure 9.

Architecture of a hierarchical multi-agent system based on hybrid control [15].

The process bus links the physical layer (sensors, actuators, and field devices) to agents deployed in Kubernetes containers. These agents, especially at the lower level, perform local control, event recognition, and perception, leveraging Kubernetes for low-latency processing and rapid response to environmental changes.

At the middle level, agents coordinate strategies between the upper and lower levels, ensuring optimized models are validated and adjusted for grid dynamics. Kubernetes enables scalable container orchestration for efficient evaluation, learning, and communication across levels, while cloud databases manage information flow with redundancy and failover capabilities.

Knowledge management integrates with system infrastructure through a cloud-hosted repository for control strategies and system integrity protection scheme (SIPS) results. Kubernetes facilitates modular agent implementation for protection logic and fast grid response, redefining traditional approaches by virtualizing protection and control layers.

Integration studies ensure grid stability during transitions using pseudo-random generation and operational strategies. Kubernetes dynamically scales agents based on computational needs, optimizing resources for system demands.
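As an illustration of scaling agents with computational need, the sketch below applies the replica rule documented for the Kubernetes Horizontal Pod Autoscaler, where the desired replica count is the ceiling of the current count multiplied by the ratio of the observed metric to its target. Whether the chapter's deployment uses this autoscaler is an assumption; the function name, the replica bounds, and the example values are hypothetical.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10):
    """Replica count from the metric ratio, clamped to the configured bounds."""
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, raw))


# Four agent replicas at 90% average CPU against a 60% target.
print(desired_replicas(4, 90, 60))
```

The real autoscaler also applies a tolerance band and stabilization windows before acting, which this sketch leaves out.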

Finally, the process bus ensures robust communication between the physical layer and agents, enabling synchronized operation and dynamic real-time control. This architecture, managed by Kubernetes, provides a scalable, resilient platform for systematic security studies and efficient grid operation.

4. Conclusions

The integration of cutting-edge technologies, such as adaptive multi-agent systems (MAS), edge computing, and hierarchical control, has significantly advanced microgrid protection. These methodologies enhance fault tolerance, operational efficiency, and real-time decision-making by enabling localized control through dedicated agents assigned to intelligent electronic devices (IEDs). Edge computing, in particular, minimizes latency, improves fault clearance times, and ensures system resilience by allowing autonomous operation even during communication disruptions.

4.1 Future recommendations

Future work should focus on finding the Pareto optimal point between the electrical proximity model with state estimation and the unit-based resource allocation model. This approach will balance the trade-offs between accuracy, computational efficiency, and resource utilization, enabling more precise fault detection, load balancing, and grid management. By integrating this optimization into edge computing frameworks, microgrids can achieve enhanced responsiveness and reliability, reducing dependency on traditional state estimators and improving real-world operational performance.
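The notion of a Pareto optimal trade-off can be made concrete with a small non-dominated filter in Python. The candidate configurations and their two objectives (estimation error and resource cost, both to be minimized) are invented for illustration and do not come from the chapter's experiments.

```python
def pareto_front(candidates):
    """Return the names of candidates not dominated by any other candidate.

    Each candidate is (name, error, cost); lower is better for both objectives.
    """
    front = []
    for name, error, cost in candidates:
        dominated = any(
            other_error <= error and other_cost <= cost
            and (other_error < error or other_cost < cost)
            for _, other_error, other_cost in candidates
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical configurations: (name, estimation error, resource cost).
configs = [("A", 0.02, 9.0), ("B", 0.05, 4.0), ("C", 0.06, 6.0), ("D", 0.10, 2.0)]
print(pareto_front(configs))
```

In this example configuration C is dropped because B is better on both objectives, while A, B, and D remain as alternative trade-offs from which an operating point would be selected.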

Acknowledgments

This research was funded by Minciencias and Universidad del Valle under the project “Estrategias para el desarrollo de sistemas energéticos sostenibles, confiables, eficientes y accesibles para el futuro de Colombia” (Minciencias Code: 1150-852-70378, Hermes Code: 46771).

Notes

Note from author: The structural basis of the architecture can be found in article [6], which was developed with the Colombian Ministry of Science and tested in a laboratory, with international consultancy for implementation at the microgrid level. For questions regarding the types of tests, safety conditions, cascade simulations, and master reconnection, please refer to that article. For inquiries about real-time setup, definition of the mother wavelet criteria, or creation of virtualized IEDs, contact the author at andres.diaz.caicedo@correounivalle.edu.co.

Thanks

I would like to express my profound gratitude to my family for their unwavering support throughout my academic journey, which has been a constant source of strength and motivation. I am also deeply thankful to the directors of my research group for their trust and mentorship, enabling me to lead the development of new algorithms in microgrids. Additionally, I acknowledge Universidad del Valle for providing an exceptional intellectual environment and essential resources that have been instrumental in advancing my research.

Abbreviations

ADAM

adaptive moment estimation (optimization algorithm used in neural networks).

AN

agent node.

API

application programming interface.

ARM

advanced RISC machine (processor architecture).

BAT

battery.

CE

control event or efficiency condition.

CIP

critical infrastructure protection.

CLUSTER

group of interconnected computing nodes.

CV

cross-validation.

DE

data event.

DER

distributed energy resources.

DFI

distributed fault indicator.

DMS

distribution management system.

DSO

distribution system operator.

DYN

dynamic processes or states.

EMS

energy management system.

ESB

enterprise service bus.

ETAP

electrical transient analyzer program.

FCM

fault current management.

FRT

fault ride through (grid fault tolerance).

GOOSE

generic object-oriented substation event (IEC 61850).

HIL

hardware-in-the-loop.

ICDE

international conference on data engineering.

IED

intelligent electronic device.

INT

interaction model.

ISO

independent system operator.

JIT

just-in-time compiler.

MAS

multi-agent systems.

MCC

microgrid central controller.

MQTT

message queuing telemetry transport (protocol).

NERC

North American Electric Reliability Corporation.

OPC

open platform communications (industrial protocol).

PEDG

power electronics for distributed generation.

PV

photovoltaic system.

SBC

single board computer.

SCADA

supervisory control and data acquisition.

SCL

substation configuration language.

SIGPL

special interest group on programming languages.

SIL

software-in-the-loop.

SIPS

system integrity protection scheme.

SMV

sampled measured values (IEC 61850).

SNR

signal-to-noise ratio.

SOA

service-oriented architecture.

TSN

time-sensitive networking.

UDM

universal dynamic models.

URI

uniform resource identifier.

VLDB

very large data bases.

References

  1. Cuadrado N, Gutierrez R, Zhu Y, Takac M. Mahtm: A Multi-Agent Framework for Hierarchical Transactive Microgrids. arXiv; 2023
  2. Zhang X, Zhong Q-C, Ming W-L. Adaptive series-virtual-impedance control strategy for load converters to improve the stability of the cascaded system. In: 2016 IEEE 7th International Symposium on Power Electronics for Distributed Generation Systems (PEDG). Piscataway, NJ, USA: IEEE; 2016. pp. 1-5
  3. Liu C, Jiang B, Patton RJ, Zhang K. Hierarchical-structure-based fault estimation and fault-tolerant control for multiagent systems. IEEE Transactions on Control of Network Systems. 2019;6(2):586-597
  4. Fkaier S, Khalgui M, Frey G. Meta-model for control applications of microgrids. In: 2020 6th IEEE International Energy Conference (ENERGYCon). Piscataway, NJ, USA: IEEE; 2020. pp. 945-950
  5. Shobole AA, Wadi M. Multiagent systems application for the smart grid protection. Renewable and Sustainable Energy Reviews. 2021;149:111352
  6. Diaz Caicedo AM, Mejía ÉF, Gómez-Luna E. Revolutionizing protection dynamics in microgrids: Local validation environment and a novel global management control through multi-agent systems. Computers and Electrical Engineering. 2024;120:109748
  7. Alasali F, Hayajneh AM, Ghalyon SA, El-Naily N, AlMajali A, Itradat A, et al. Enhancing resilience of advanced power protection systems in smart grids against cyber–physical threats. IET Renewable Power Generation. 2024;18(5):837-862
  8. Sahu A, Wlazlo P, Mao Z, Huang H, Goulart A, Davis K, Zonouz S. Design and Evaluation of a Cyber-Physical Resilient Power System Testbed. arXiv; 2020
  9. Abbaspour E, Fani B, Sadeghkhani I, Alhelou HH. Multi-agent system-based hierarchical protection scheme for distribution networks with high penetration of electronically-coupled dgs. IEEE Access. 2021;9:102998-103018
  10. Taveras-Cruz AJ, Mariano-Hernández D, Jiménez-Matos E, Aybar-Mejia M, Mendoza-Araya PA, Molina-García A. Adaptive protection based on multi-agent systems for ac microgrids: A review. Applied Energy. 2025;377:124673
  11. Waraphok P, Saengsuwan T. Database development for power quality in pea’s distribution system. In: 2007 9th International Conference on Electrical Power Quality and Utilisation (EPQU). Piscataway, NJ, USA: IEEE; 2007. pp. 1-6
  12. Galantino S, Risso F, Cazzaniga A, Garrone F, Terruggia R, Lazzari R. An edge-based architecture for phasor measurements in smart grids. In: 2022 AEIT International Annual Conference. Piscataway, NJ, USA: IEEE; 2022. pp. 1-6
  13. Mahmoudian A, Garmabdari R, Bai F. Adaptive power-sharing strategy in hybrid ac/dc microgrid for enhancing voltage and frequency regulation. International Journal of Electrical Power and Energy Systems. 2024;156:109696
  14. Duan J, Wang C, Hao X, Liu W, Xue Y, Peng J-c, et al. Distributed control of inverter-interfaced microgrids based on consensus algorithm with improved transient performance. IEEE Transactions on Smart Grid. 2019;10(2):1303-1312
  15. Dou C-X, Liu B. Multi-agent based hierarchical hybrid control for smart microgrid. IEEE Transactions on Smart Grid. 2013;4(2):771-778
