Open access peer-reviewed chapter - ONLINE FIRST

Smart Detection: Reinforcement Learning for Network Intrusion Defense

Written By

Faheem Yar Khuhawar, Tayyaba Shaikh, Abdul Latif Memon, Irfan Halepoto, Fahim Umrani, Rizwan Ali Shah, Hyder Bux Mangrio and Omar Bani Fayyad

Submitted: 15 December 2024 Reviewed: 31 December 2024 Published: 29 April 2025

DOI: 10.5772/intechopen.1008896


From the Edited Volume

Mastering Intrusion Detection for Cybersecurity [Working Title]

Dr. Akashdeep Bhardwaj


Abstract

As cyber threats grow in complexity, the demand for intelligent and adaptive intrusion detection systems (IDS) is more critical than ever. Traditional machine learning models, while effective, often struggle to keep up with the dynamic and evolving nature of cyberattacks. This chapter presents an advanced approach to network intrusion detection using reinforcement learning (RL), a machine learning paradigm that enables systems to learn optimal actions through trial and error without the need for extensive retraining. Specifically, the proposed IDS leverages Q-learning, enhanced by dueling deep Q-learning (DQL) and double deep Q-networks (DDQN), to autonomously monitor and protect networks. By learning from its environment and making decisions based on real-time feedback, the system continuously improves its detection capabilities, even as new threats emerge. When tested on the CIC-IDS 2018 dataset, the DQL-based IDS achieved an impressive accuracy of 94%, significantly outperforming traditional machine learning algorithms such as Decision Trees, Random Forest, and XGBoost. Unlike conventional models constrained by static feature sets and predefined learning, the RL-driven IDS adapts dynamically to changing environments, offering robust detection of sophisticated intrusions. Despite its strong performance in simulated environments, the practical application of this approach to real-world scenarios presents challenges, such as ensuring scalability and handling diverse network conditions. Nonetheless, this research demonstrates the transformative potential of reinforcement learning in network security, paving the way for systems capable of autonomously detecting and responding to complex cyber threats in an efficient and adaptive manner.

Keywords

  • intrusion detection system (IDS)
  • reinforcement learning (RL)
  • Q-learning
  • dueling deep Q-learning (DQL)
  • double deep Q-network (DDQN)

1. Introduction

1.1 Rise of big data

Big data has become a popular concept thanks to the technological advances of the past decade. Before then, storing data was extremely costly because acquiring the needed disk space was not as easy as it is today. Cloud systems were expensive and uncommon; to store data, companies needed their own storage facilities, which required far more time, space, and money than they do today. At present, virtually unlimited storage, provided in particular by cloud services, is at our disposal. With competition between service providers, the prices of these services keep decreasing while their quality keeps rising. Moreover, many different frameworks and tools now exist to handle different types of big data. All these advancements have made big data one of the most popular concepts in computer science.

The capability of storing and retrieving data in real time sheds light on what big data can offer: it can be of great help in detecting attacks on systems as they happen. We worked on a big data analytics approach for network intrusion detection using deep reinforcement learning, incorporating the system's time feature during anomaly detection, which aligns with the goals of most intrusion detection tasks. Since the system is intended to act in near real time, it consumes streaming big data from the network. Because streaming data is processed in real time, real-time queries on the side are also possible. For this reason, stream processing was selected as the relevant mode of processing in our case.

1.2 Machine learning for intrusion detection

Over the last few years, machine learning has made a huge leap and has begun to be employed in several areas. One popular area that utilizes machine learning-based solutions is cybersecurity, for the detection of malicious activities. Because signature-based approaches are now considered inadequate for detecting current cyber threats, machine learning approaches have attained enormous relevance in addressing this shortfall.

In the survey [1], the machine learning techniques used in intrusion detection systems fall into three broad categories: supervised learning, unsupervised learning, and reinforcement learning.

  • Supervised learning: This technique, also referred to as classification, presents the model with a dataset containing labeled instances for training. Supervised learning algorithms seek to estimate the output values of new data points by building the dependencies and relations between the input features and the dataset labels, which become the outputs once the model is trained. Commonly used supervised techniques include ensemble classifiers (bagging, boosting), linear classifiers (logistic regression, Fisher linear discriminant, naive Bayes, perceptron, SVM), and quadratic classifiers.

  • Unsupervised learning: In contrast to supervised learning, the model is provided with an unlabeled dataset. Algorithms built on pattern detection and descriptive modeling are the most common methods in this strategy. Since there are no labels to learn from, algorithms in this category explore the unlabeled input data by clustering, summarizing, and detecting patterns in order to extract useful information for making predictions. Commonly used unsupervised techniques include cluster analysis (k-means clustering, fuzzy clustering), hierarchical clustering, self-organizing maps, the Apriori and Eclat algorithms, and outlier detection (e.g., Local Outlier Factor).

  • Reinforcement learning: While belonging to the machine learning family of techniques, reinforcement learning can also be considered a form of artificial intelligence. In this strategy, the algorithm learns in cycles from the environment it operates in, guided by a reward framework. The overarching objective is to maximize the cumulative reward over the states the agent can reach. Commonly used reinforcement learning algorithms include Q-learning, temporal-difference (TD) learning, and deep adversarial networks. A minimal sketch of the core Q-learning update follows this list.
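To make the reward-driven update concrete, the sketch below implements the tabular Q-learning rule that this family of algorithms builds on. It is illustrative only, not taken from this chapter's implementation; the state/action counts and hyperparameter values are toy assumptions.

```python
import numpy as np

# Tabular Q-learning update (toy sizes and hyperparameters for illustration):
# Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
n_states, n_actions = 10, 2
Q = np.zeros((n_states, n_actions))   # Q-table initialized to zero
alpha, gamma = 0.1, 0.9               # learning rate and discount factor

def q_update(s, a, r, s_next):
    """Move Q(s, a) one temporal-difference step toward the bootstrapped target."""
    target = r + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])
```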

Mishra et al. [2] presented a survey on patterns of analysis in machine learning for intrusion detection, discussing several machine learning approaches with examples of related work. From this survey, a general machine learning framework for network anomaly detection emerges. First, a labeled or unlabeled dataset is chosen as input to the selected technique. Then, in the data processing phase, the dataset is reorganized and processed for convenient operation. Depending on the selected technique, anomaly detection is performed using either supervised or unsupervised learning, and the process produces a prediction from the model. Finally, the calculated scores or labels determine the model's efficiency and effectiveness.

Recent advances in network adoption mean that critical applications are executed over computer networks, heightening the need for network security [3]. With continually increasing network traffic, attacks on military, government, and commercial networks have intensified [4, 5]. Intrusion detection is the practice of watching some or all of the operations that take place within a network environment for the first signs of security threats. The various attacks launched over the internet present a real menace to computer users and organizations. Intrusion detection is the process of identifying unauthorized attempts to break into a system and compromise the CIA triad of confidentiality, integrity, and availability [3].

1.3 Framing the problem

The detection of network intrusions is important for protecting networks. Current research with deep learning-based IDS has achieved significant detection performance; however, it faces two important limitations: imbalanced datasets and the inability to effectively detect minority and unknown intrusion types. Numerous studies employ various machine learning algorithms to identify network intrusions, yet these algorithms often fall short against intelligent attackers. Traditional IDS, whether signature-based or anomaly-based, necessitate continuous updates to their databases to remain effective.

To address these challenges, we develop a reinforcement learning (RL) model capable of automatically adapting to new traffic patterns, thereby improving intrusion detection capabilities. This RL-based approach is designed to improve on current solutions to the above-mentioned problems by offering a better model for network security.

The proposed model, an intrusion detection method with feature selection based on deep reinforcement learning, is a solution to issues currently faced by intrusion detection systems (IDS), such as heavy computation and poor recognition of unknown network attacks. At its core, the DDQN model is a deep reinforcement learning method for detecting intrusions.

Using a correlation-based feature selection method, we first pick the feature subset that best captures the deep information of the original dataset, and we then use a mini-batch module to generate the data consumed by the DDQN model. From this, we create an effective network intrusion detection model. We use the CSE-CIC-IDS2018 dataset to train and evaluate the DDQN model, and the experimental results are compared against other well-known machine learning models, demonstrating that our proposed strategy can successfully pick the best feature subset of the original dataset and further enhance the model's performance.


2. Review of previous relevant work

2.1 Intrusion detection in cloud networks

Deshpande et al. [6] proposed an intrusion detection model for cloud environments. The model consists of a data logging module, a preprocessing module, an analysis and decision engine, and a management module. Logs are obtained using the Linux audit framework; once the logs are ready for processing, a k-nearest neighbor classifier decides whether there is an anomaly. In Ref. [7], Maiero and Miculan suggested that, using virtualization techniques, intrusion detection monitors can be deployed in a guest VM or in the virtual machine monitor. Beyond host or network device monitoring, distributed collaborative monitoring approaches are also utilized to catch system-wide attacks, as described by Bharadwaja et al. [8].

2.2 Big data approaches for IDS

In Ref. [9] Mahmood and Afzal stated that threat detection and monitoring is the largest field in security analytics for financial and defense institutions. Big data analytics helps in this area by predicting and detecting malicious or dangerous network traffic patterns, as well as unusual user behaviors. Additionally, it will help unveil sudden changes—which typically are suspicious incidents—in network servers. Recent works have proposed using big data processing approaches to solve the problem of intrusion detection in cloud environments [10].

One of these solutions was introduced by Casas et al. [11]. They developed a system called Big-DAMA, which utilized Apache Spark for both batch data processing and streaming data processing. Then, they combined their solution with five different supervised machine learning algorithms. To detect a possible attack using intrusion detection systems (IDS), Alavizadeh et al. [12] stated that basically two techniques can be used: In misuse detection, the IDS knows about previous attack patterns and tries to catch an attack by comparing the collected data to previous patterns. In anomaly detection, the IDS does not know about any previous attacks and tries to find anomalies in the network data, which could be possible signs of attacks. In recent years, machine learning approaches have been used successfully for both of these techniques [13].

2.3 Machine learning solutions for intrusion detection

With the advancements in machine learning in recent years, most anomaly-based intrusion detection systems have started benefiting from machine learning algorithms. One successful solution, Beehive, was introduced by Badr [14]. It uses logs to detect network intrusions, separating features into four categories:

  • Destination-Based Features

  • Host-Based Features

  • Policy-Based Features

  • Traffic-Based Features.

By using these four feature types, they were able to apply an unsupervised learning algorithm, k-means clustering, to detect suspicious activities. Although the Beehive solution is simple yet effective, it does not work in real time, whereas the solution described in this chapter works in near real time.

A combination of k-means clustering and k-nearest neighbors (KNN) was proposed by Sharifi et al. [15]. They first applied k-means clustering to define clusters and their centers, repeating the clustering multiple times to achieve the best structure. This structure is then used to classify the data with KNN. Their solution is somewhat similar to Razaq et al.'s: rather than tweaking k-means, they combined it with KNN. Their overall accuracy was around 90%, which needs improvement in order to establish a secure system.

Another deep learning solution, implemented using convolutional neural networks (CNN), was proposed by Behera et al. [16]. In a CNN, neurons have learnable weights and biases, and there are five types of layers: input layer, convolution layer, rectified linear unit, pooling layer, and output layer.

Unlike standard neural networks, the convolution layer uses the dot product of weights and local regions to calculate inputs for the next layer; the rectified linear unit is used for better gradient propagation and efficient processing. The authors reported successful results in their experiments on the NSL-KDD dataset, proving the usability of deep learning for network intrusion detection. The solution proposed in this chapter combines deep learning with reinforcement learning to create a system that can adapt to zero-day attacks.

2.4 Deep reinforcement learning solutions in different fields

Deep reinforcement learning is being used in many different fields. Although it is especially common in AI solutions such as robots and game-playing agents, various approaches implement it for distinct purposes. Playing Atari is one of the classic examples, implemented by Mnih et al. [17]. In their solution, similar to the one in this chapter, a convolutional neural network is combined with reinforcement learning, and a modified Q-learning algorithm is used to train the network. In Ref. [18], Cuayahuitl et al. implemented a DRL solution for playing a strategic board game (Settlers of Catan); it had significant success over random, rule-based, or supervised solutions. Giraffe, a chess engine developed by Lai [19], implements deep reinforcement learning to play chess. MathDQN, proposed by Wang et al. [20], used DRL to solve arithmetic word problems; similarly, they used a two-layer feed-forward neural network to estimate the potential Q-value.

DRL is used in the biology field as mentioned by Mahmud et al. [21] in their paper. It is being used to extract features from biological sequence data (DNA, RNA, and amino acids) and perform predictions on them. Also, it is mentioned that DRL is used for bioimaging as well for pixel-level, cell-level, and tissue-level analyses.

Additionally, it is stated that DRL is implemented in many medical imaging applications for analyzing medical images obtained from different scans (MRI, CT, PET, etc.) [22].


3. Proposed work

The following section presents the work carried out in relation to the aims of this chapter, progressing in a methodical sequence corresponding to the process flowchart outlined below (Figure 1). The flowchart illustrates the key steps in developing a reinforcement learning (RL) model, from dataset preprocessing and feature engineering to data splitting, RL model development, training and testing, and finally model evaluation.

Figure 1.

Process flow diagram: (1) dataset preprocessing, (2) feature engineering, (3) data splitting, (4) RL model development, (5) training and testing, (6) model evaluation.

3.1 Overview

The main contributions of our work are summarized as follows:

  1. We present a new generation of network intrusion detection methods arising from the integration of Q-learning-based reinforcement learning with a deep feed-forward neural network. Our proposed model is equipped with ongoing auto-learning capability for the network environment it interacts with and can detect different types of network intrusions. Its self-learning capability allows the model to continuously enhance its detection performance.

  2. We provide intrinsic details of the best approaches for tuning the various hyperparameters of deep RL algorithms, such as the learning rate and the discount factor, to facilitate optimal self-learning and interaction with the underlying network environment for more effective network intrusion detection.

  3. The empirical findings on the CSE-CIC-IDS 2018 dataset reveal that our proposed DDQN is effective in identifying various intrusion classes, yielding more than 90% accuracy in classification tasks across multiple network intrusion classes and comparing favorably with other machine learning techniques.

3.2 Reinforcement learning-based intrusion detection system (IDS)

3.2.1 Reinforcement learning

An agent is designed to carry out a specific task independently, without guidance. The agent engages in trial and error through interaction with its environment to reach its ultimate objective. After each action, the agent receives a reward representing how well it performed. The agent adapts its strategy in light of these incentives and focuses on raising the reward value (Figure 2).

Figure 2.

The reinforcement learning concept.

Starting with an analogy, say our goal is to teach a child to return items after usage. Instead of punishing the child when they do not, or directly instructing the child on how to do this, we reinforce the child's behavior whenever they use an item and return it, with the reinforcement "good girl" plus a clap. The behavior may initially be unstable, as it may take the child several trials to meet the reinforcement contingencies. Any time the child does not return items after use, they are ignored; that is, no reinforcement follows. Eventually, the child realizes that putting things back in their rightful place after use produces reinforcement ("good girl" plus a clap), and so begins to return items after use. In an RL setting, the agent likewise is not taught what to do or how to do it, but is instead rewarded for each action it takes: a reasonable action is rewarded positively; a wrong one is not. As a result, reinforcement learning can be compared to a learning process in which the agent tries out several behaviors and finds the one that produces a favorable reward. In the child example, the child is the agent; clapping when the child successfully returns an item after usage is a positive reward, while failing to do so yields none [23].

3.2.2 Deep Q-network (DQN) and its variants

DQN is a variation of the Q-learning technique that approximates the long-term returns of the various actions in a given state using a neural network instead of a Q-table. Because a Q-table is impractical for expressing high-dimensional and continuous state spaces, DQN is equipped to handle them. A replay buffer is used to store and sample previous experiences for training, and the neural network is trained with stochastic gradient descent. This enhances the stability and convergence of the learning process and enables the agent to learn from a broad range of events.
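As a minimal illustration of the replay-buffer mechanism just described (a sketch under our own naming, not the chapter's exact implementation):

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (state, action, reward, next_state) transitions for training."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)   # oldest transitions are evicted

    def store(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between consecutive
        # transitions, which is what stabilizes the gradient updates.
        return random.sample(self.buffer, batch_size)
```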

3.2.3 Dataset and data preprocessing

3.2.3.1 CIC-IDS 2018

Different benchmark datasets have been used to compare intrusion detection models, with work on the various datasets aiming to demonstrate high classification accuracy and detection rates. The CIC-IDS 2018 dataset contains 15 classes: 14 representing attack types and 1 representing benign traffic, for a total of 16,233,002 instances (approximately 16 million). The CSE-CIC-IDS2018 dataset used in this study was obtained from the Canadian Institute for Cybersecurity at the University of New Brunswick. Data was emulated in the CIC test environment with 50 attacking machines, 420 victim PCs, and 30 victim servers between February 14 and March 2, 2018. Fourteen fresh attacks were executed, and the dataset is labeled and comes with anonymized PCAP files in one place. Altogether, 80 network traffic features were identified and computed from packets using CICFlowMeter. Ten CSV files are provided for machine learning, comprising 16,232,943 records. Table 1 summarizes the class representation of this dataset.

File/day | Normal instances | Attack instances
Wednesday-14-02-2018 | 667,626 | FTP-BruteForce (193,360), SSH-BruteForce (187,589)
Thursday-15-02-2018 | 996,077 | DoS attacks-GoldenEye (41,508), DoS attacks-Slowloris (10,990)
Friday-16-02-2018 | 446,772 | DoS attacks-SlowHTTPTest (139,890), DoS attacks-Hulk (461,912)
Thursday-20-02-2018 | 7,372,557 | DDoS attacks-LOIC-HTTP (576,191)
Wednesday-21-02-2018 | 360,833 | DDOS attacks-LOIC-UDP (1730), DDOS attack-HOIC (686,012)
Thursday-22-02-2018 | 1,048,213 | Brute-Force-XSS (79), Brute-Force-Web (249), SQL Injection (34)
Friday-23-02-2018 | 1,048,009 | Brute-Force-XSS (151), Brute-Force-Web (362), SQL Injection (53)
Wednesday-28-02-2018 | 544,200 | Infiltration (68,871)
Thursday-01-03-2018 | 238,037 | Infiltration (93,063)
Friday-02-03-2018 | 762,384 | Bot (286,191)

Table 1.

Data distribution of CSE-CIC-IDS2018.

Attack instances for each file/day with corresponding normal instance counts.

The diverse environment was developed on the AWS computing platform. The overall structure was bifurcated into an attacking organization and a victim organization: the attacking organization consisted of 50 machines, while the victim organization had 420 machines and 30 servers. The dataset was formed from the accumulated network traffic and system logs of those machines. The following six attack scenarios were performed during the experiments:

  • DDoS

  • DoS

  • Brute-force

  • Botnet

  • Infiltration

  • Web Attack

Thus, as illustrated in Figure 3 from Ref. [24], in order to emulate all the attacks, various operating systems with different versions and services (Windows, Ubuntu, and Mac) were integrated into the environment: Ubuntu 12.04 and 16.04; Windows Vista, 7, 8.1, and 10; etc.

Figure 3.

CSE-CIC-IDS2018 network attack topology.

The information regarding the distribution of the network traffic on this dataset is illustrated in Table 2 from Ref. [24]. All 14 attack types are associated with the attack scenarios.

Attack scenario | Attack name | Distribution (%)
Benign | None | 83.07
DDoS | DDoS attacks-LOIC-HTTP, DDOS-LOIC-UDP, DDOS-HOIC | 7.786
DoS | DoS-GoldenEye, DoS-Slowloris, DoS-SlowHTTPTest, DoS-Hulk | 4.031
Brute-force | FTP-BruteForce, SSH-BruteForce | 2.347
Botnet | Bot | 1.763
Infiltration | Infiltration | 0.997
Web Attack | Brute-Force-Web, Brute-Force-XSS, SQL Injection | 0.006

Table 2.

Attack scenario distribution.

This collected raw network traffic data is distributed day by day among 10 CSV files, which have been collected, stored in the AWS cloud, and can be downloaded. Using CICFlowMeter-V3 as a network traffic flow generator and analyzer, over 80 features can be derived from the network traffic data. CICFlowMeter generates bidirectional network flows, which is very advantageous for the detection of cyberattacks since it provides results such as duration, number of packets, number of bytes, and packet length in both the forward and backward directions (from destination to source). The attack tools and victim environments used to execute each attack are presented in Table 3 from Ref. [24], detailing the types of attacks in the CSE-CIC-IDS2018 dataset on AWS.

Attack | Tools | Victim
Brute-force attack | FTP-Patator, SSH-Patator | Ubuntu 16.4 (Web Server)
DoS attack | Hulk, GoldenEye, Slowloris, Slowhttptest | Apache
DoS attack | Heartbleed | Ubuntu 12.4 (OpenSSL)
Web attack | Damn Vulnerable Web App (DVWA); in-house selenium framework (XSS and Brute-force) | Ubuntu 16.4 (Web Server)
Infiltration attack | First level: Dropbox download in a Windows machine; second level: Nmap and portscan | Windows Vista, Macintosh
Botnet attack | Ares (Python): remote shell, file upload/download, capturing screenshots, keylogging | Windows Vista, 7, 8.1, 10 (32-bit, 64-bit)
DDoS+PortScan | Low Orbit Ion Cannon (LOIC) for UDP, TCP, or HTTP requests | Windows Vista, 7, 8.1, 10 (32-bit, 64-bit)

Table 3.

Tools and victims for different attacks.

The distribution of the classes in the dataset can be seen in the figure below; refer to Figure 4 in Ref. [25]. For data capturing and feature selection, we use the CIC-IDS-2018 dataset, composed of 16,232,943 instances: 13,484,708 normal instances and 2,748,235 attack instances.

Figure 4.

Bar charts showing the class distribution in the dataset.

3.2.3.2 Data preprocessing

Data is preprocessed and normalized before any training is carried out on it. This phase filters out data noise and retains only meaningful and important information. In the proposed model, preprocessing involves the following main tasks, summarized below and sketched in code after the list (Figure 5):

  • A new feature set was produced without NaN values.

  • The Timestamp column and corresponding duplicate records were deleted, as no time series-dependent machine learning methods were selected in this study. Subsequently, eight features beginning with 'Bwd' and 'Fwd' were eliminated because they were entirely empty and provided no useful information, namely 'Bwd URG Flags,' 'Bwd Pkts/b Avg,' 'Bwd PSH Flags,' 'Bwd Blk Rate Avg,' 'Fwd Byts/b Avg,' 'Fwd Pkts/b Avg,' 'Fwd Blk Rate Avg,' and 'Bwd.'

  • A normalization procedure was carried out to level out all collected data values. This research applied min-max normalization so that each attribute takes values in the range [0, 1]. After data cleaning and normalization, we obtained 16,137,183 instances with 70 attributes (CSE-CIC-IDS2018).

  1. Fixing Data Types and Handling Missing Values

    • Fixing data types: Ensure each column in the dataset has the correct data type (e.g., integers for labels, floats for continuous variables).

    • Handling infinity values: Replace any infinity values with null values to avoid inconsistencies during data analysis.

    • Dropping null values: Eliminate all rows containing null values, ensuring that only complete data points are used for analysis.

    • Dropping unnecessary columns: Remove the Timestamp column as it is not needed for the model.

  2. Label Generation and Attack Mapping

    • Generating binary labels for threat column: Transform the Threat column into binary labels (0 for Benign and 1 for Malicious).

    • Mapping multi-label attacks into six main classes: Attack categories are consolidated into the following classes:

    • Brute-Force, Web Attack, DoS Attack, DDoS Attack, Botnet, Benign.

    • Creating label column: Assign labels to each class to reflect the transformation.

    • Train-test split and balancing the data.

    • Preparing data for training and testing (X, y): Split the dataset into feature matrix (X) and label vector (y) for training and testing.

    • Using RandomUnderSampler: Apply undersampling to balance the class distribution in the training set.

    • Sanity check: Evaluate the label distribution before and after applying the RandomUnderSampler to ensure balance in the training data.

  3. Combining Data and Removing Unnecessary Features

    • Combining all data frames: Merge multiple data frames into a single one for a unified dataset.

    • Removing constant features: Calculate the variance of each feature and drop those with zero variance (constant values).

    • Dropping duplicates: Eliminate duplicate rows (using axis = 0), as they may distort model performance.

  4. Feature Correlation and Selection

    • Plotting Heatmap: Generate a Pearson correlation heatmap to visualize correlations between features.

    • Implementing correlation-based feature selection (CFS):

    • Calculate the correlation matrix and select highly correlated features (threshold greater than 90%).

    • Drop highly correlated features from both the training and testing datasets.

    • Feature selection logic: Ensure that correlation-based feature removal is done only after the train-test split and before applying the RandomUnderSampler.

    • MinMaxScaler: Use the MinMaxScaler to scale features between 0 and 1 based on the training set, and apply the same transformation to the testing data.

  5. Label Encoding and Class Weight Calculation

    • Generating a List of Unique Labels: Extract unique class labels from the dataset, such as [‘Benign,’ ‘Brute-force,’ ‘Infiltration,’ ‘Web attack’].

    • Computing Class Weights: Compute class weights to account for imbalanced data, assigning higher weights to underrepresented classes.

    • Mapping Class Weights to Class Indices: Create a dictionary where each class label is mapped to its corresponding class weight.
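A condensed sketch of these steps is given below. The input file name and exact column names are illustrative assumptions; the ordering (correlation-based removal after the train-test split, undersampling afterward, then min-max scaling fitted on the training set) follows the steps above.

```python
import numpy as np
import pandas as pd
from imblearn.under_sampling import RandomUnderSampler
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Load one day of traffic (hypothetical file name) and clean it.
df = pd.read_csv("cicids2018_day.csv")
df = df.replace([np.inf, -np.inf], np.nan).dropna()
df = df.drop(columns=["Timestamp"]).drop_duplicates()

# Binary labels: 0 = benign, 1 = malicious.
y = (df["Label"] != "Benign").astype(int)
X = df.drop(columns=["Label"]).select_dtypes(include=[np.number])
X = X.loc[:, X.var() > 0]                     # drop constant features

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Correlation-based feature selection: drop one of each pair of features whose
# absolute Pearson correlation exceeds 0.9 (computed on the training set only).
corr = X_train.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape), k=1).astype(bool))
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
X_train = X_train.drop(columns=to_drop)
X_test = X_test.drop(columns=to_drop)

# Undersample the majority class, then scale everything to [0, 1].
X_train, y_train = RandomUnderSampler(random_state=42).fit_resample(X_train, y_train)
scaler = MinMaxScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
```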

Figure 5.

Data preprocessing methodology.

3.2.4 Feature engineering

When performing feature selection, SelectKBest acts in favor of the largest classes; hence, the pipeline can be enhanced by first selecting the most significant features for the class with the fewest samples and then using those selected features for the other classes. Feature selection accounts for the proper performance of the classifier and generally helps improve IDS performance; additional, non-contributing features only add overhead and complicate the classifier [26].

3.2.4.1 Feature selection using pre-trained model

Feature selection is done by feeding the pre-trained model to SelectFromModel and using the learned feature importances to select features (a code sketch follows the list below). Some of the selected features include 'Dst Port,' 'Flow Byts/s,' 'Bwd IAT Tot,' 'Bwd IAT Min,' 'Bwd Pkts/s,' 'Init Fwd Win Byts,' and 'Fwd Seg Size Min.'

  • Feature importance: The feature importances from the model are extracted and ordered, and only the high-priority features are employed for the final training.

  • Feature reduction: This helps eliminate extraneous features that do not positively contribute to the training process or improve the final model.
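A sketch of this selection step, assuming a Random Forest stands in for the pre-trained estimator and that the (unscaled) train/test frames come from the preprocessing stage; `feature_names` is a hypothetical list of column names:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

# A Random Forest stands in here for whichever pre-trained estimator is used.
pretrained = RandomForestClassifier(n_estimators=100, random_state=42)
pretrained.fit(X_train, y_train)

# SelectFromModel keeps features whose learned importance exceeds the
# (default) mean-importance threshold.
selector = SelectFromModel(pretrained, prefit=True)
X_train_sel = selector.transform(X_train)
X_test_sel = selector.transform(X_test)

# Rank features by importance, highest first (feature_names is hypothetical).
ranking = sorted(zip(pretrained.feature_importances_, feature_names), reverse=True)
```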

3.2.5 Training/testing ML models

3.2.5.1 Decision tree classifier

A decision tree is a supervised, non-parametric model in which internal nodes represent features and leaf nodes represent classes or labels. It is used here to classify DDoS attacks by partitioning data based on the most informative features. The tree structure is built to predict outcomes, taking probabilities and factors such as costs into account. In this study, the decision tree classifier is trained on 85 features, with feature importance ranked using ExtraTreesClassifier; to enhance efficiency, 20 features are selected for training. The model is evaluated on the test set using accuracy, precision, recall, F1-score, and the confusion matrix.

Decision trees are effective, interpretable classifiers, commonly used in tasks like fraud detection, disease diagnosis, and credit risk assessment (Figure 6).

Figure 6.

Example of decision tree classifier.

3.2.5.2 Random Forest classifier

The Random Forest classifier employed in this research comes from the scikit-learn ensemble module. Random Forest is an ensemble learning method that builds many decision trees, each trained on randomly selected features and data points. In this implementation, the Random Forest classifier is created with its default attributes, using the Gini impurity measure for feature importance and maintaining 100 decision trees in the forest; hyperparameters such as the number of trees and the maximum depth of each tree are left at their defaults. The trained model is then used to predict the class of the test data, and we measure the model's accuracy, precision, recall, and F1-score. The model is fit to the training data through the fit() method, after which it is ready for testing (Figure 7).

Figure 7.

Example of Random Forest Classifier.

3.2.5.3 Extreme gradient boosting (XGB) classifier

XGBoost is one of the most used machine learning algorithms for classification, regression, and ranking. It belongs to the family of boosting algorithms, which are learning algorithms built iteratively from weak models. XGBoost differs from traditional boosting algorithms in two main ways: first, a stricter model formalization to prevent overfitting, attained by adding a penalty term to the loss function optimized during training; second, a second-order approximation of the objective function, which enhances the rate of convergence to the optimal solution and the performance of the resulting model.

This approach has high computational efficiency in low-memory environments and is scalable in all cases. The quality of a split was inspected with the 'friedman_mse' criterion, viz., Friedman's improvement score added to the mean squared error (MSE) of each split's effectiveness; this criterion provided the most reliable approximation in this study. Moreover, probabilistic outputs are obtained by adopting deviance as the loss function, which is, in effect, logistic regression.

All in all, XGBoost is a very efficient and versatile machine learning tool that demonstrates good results on different datasets and tasks. It is widely used due to its capability of dealing with large-scale and complicated data and its regularization and second-order optimization approaches. Figure 8 shows a schematic illustration of the XGBoost model; a combined training sketch for the three baseline classifiers follows the figure.

Figure 8.

Example of XGBoost Classifier.
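The fit/predict/report procedure shared by the three baseline models can be sketched as follows. This is a minimal sketch assuming the preprocessed X_train/X_test arrays from Section 3.2.3; hyperparameters beyond those stated in the text are library defaults.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.tree import DecisionTreeClassifier
from xgboost import XGBClassifier

# One training/testing loop for all three baselines.
models = {
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "XGBoost": XGBClassifier(n_estimators=100, random_state=42),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print(name)
    print(classification_report(y_test, y_pred, digits=4))
```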

3.2.6 DDQN model description

This section outlines the DRL model used in this study and provides further details on it. The agent interacts with its environment by executing actions, observing their outcomes in terms of rewards, and predicting subsequent states. These experiences, consisting of the state, action, reward, and ensuing state, are stored in a replay buffer. This buffer enables random sampling of batches of experience for training, preventing the agent from becoming overly dependent on recent events.

A Q-function is employed to estimate the expected maximum reward achievable from a given state by taking a specific action. The Q-value, represented as Q(s, a), therefore depends on the state s and the action a. Once the Q-function is defined, the policy function, which prescribes the action to take in a given state, can be fixed. The policy is computed directly from the learned Q-values:

π(s) = argmax_a Q(s, a)    (1)

The epsilon-greedy policy supports exploration while exploiting known information at the same time: the agent makes a random move with probability ε and otherwise chooses the move it considers best for high expected reward. A policy designates a stochastic choice mechanism for an action at each state. Once a policy has been set, one can predict what follows in terms of rewards: the action-value function estimates the expected future cumulative reward when an action is taken in a state and the policy is then pursued. The optimal policy is the one that leads to the maximum expected cumulative reward over the states; it is a fixed point of the Bellman optimality equation, whose properties are considered in the next section.

Q(s, a) = R(s, a) + γ Σ_{s′} T(s′ | s, a) max_{a′} Q(s′, a′)    (2)

The Q-value is updated iteratively in order to optimize it. Each training iteration starts with a sample of the current state, its label, and the following state. This sample belongs to a random sub-sampling of a group of mini-batches; in other words, a mini-batch is a random selection of samples from the dataset. To diversify the space, the dataset is randomly permuted before each mini-batch is generated.

The action-value function is then approximated by a neural network containing three hidden layers. Softmax and tanh activations constrain the range of the outputs, so the output layer herein employs a linear function so that Q-values are not artificially restricted. The network is trained with the Huber loss function, which measures the loss between the predicted Q-value and a reference value. This reference value is obtained by accumulating the current reward and the discounted Q-value of the successive state with the correct label.

In this way, to obtain the expected value of the current state, we evaluate the Q-function with the labels set to all available label values. This process results in a vector of values, depicted as a thick arrow in the graphical presentation of the model in Figure 9.

Figure 9.

Proposed model.

Q(s_v, a) = [Q(s_v, a_0), Q(s_v, a_1), …, Q(s_v, a_p)]    (3)

The universal set of actions is denoted a, with its size denoted p. The best action in this set, the one with the highest Q-value, is chosen using the argmax function. This chosen action is then fed into the epsilon-greedy algorithm, which decides whether to take the greedy action (with probability 1 - ε) or a random action (with probability ε). The action produced by this decision-making process is referred to as the expected action.
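In code, the epsilon-greedy selection just described amounts to a few lines (a sketch; the function name is ours):

```python
import numpy as np

def epsilon_greedy(q_values, epsilon):
    """Return a random action with probability epsilon, else the greedy one.

    q_values: 1-D array holding one Q-value per action for the current state.
    """
    if np.random.rand() < epsilon:
        return np.random.randint(len(q_values))   # explore
    return int(np.argmax(q_values))               # exploit
```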

Using a similar approach, the next action for the next state is predicted without the inclusion of epsilon-greedy exploration.

The reference Q-value is computed from the predicted action and the next state and is then used to derive the action during the learning process. This is done by direct maximization of the Q-value of the next state over all possible actions.

To stabilize training, two neural networks are employed in the DDQN framework: one for the current Q-function and a separate target network, which is updated periodically. The target network is an outdated copy of the primary network and thus slows the drift of the Q-value targets; this mechanism helps solve the shifting-target problem. The model was trained for 300 epochs, where an epoch is one complete pass of all the data through the model.
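As a concrete illustration of the two-network arrangement, the following sketch (our own minimal reconstruction, not the authors' code) shows how DDQN targets can be computed in Keras. The layer sizes, gamma, and learning rate are taken from Table 4; the feature count and the training loop around it are assumptions.

```python
import numpy as np
import tensorflow as tf

def build_q_network(n_features, n_actions=2, hidden=124):
    # Three hidden layers of 124 units and a linear output, per the text/Table 4.
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_features,)),
        tf.keras.layers.Dense(hidden, activation="relu"),
        tf.keras.layers.Dense(hidden, activation="relu"),
        tf.keras.layers.Dense(hidden, activation="relu"),
        tf.keras.layers.Dense(n_actions, activation="linear"),
    ])

online = build_q_network(n_features=20)       # feature count is an assumption
target = build_q_network(n_features=20)
target.set_weights(online.get_weights())      # start the two networks in sync
online.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
               loss=tf.keras.losses.Huber())  # Huber loss, as in the text

def ddqn_targets(states, actions, rewards, next_states, gamma=0.001):
    """Double DQN: the online net picks the next action, the target net rates it."""
    next_a = np.argmax(online.predict(next_states, verbose=0), axis=1)
    next_q = target.predict(next_states, verbose=0)[np.arange(len(next_a)), next_a]
    q = online.predict(states, verbose=0)
    q[np.arange(len(actions)), actions] = rewards + gamma * next_q
    return q   # train with: online.fit(states, q, verbose=0)
```

Periodically copying the online weights into the target network (for example every few iterations) is what slows the drift of the targets described above.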


4. DDQN analysis and results

The states (st) represent random samples from the input data that the RL agent uses to make predictions, and the actions (at) correspond to the RL agent's forecasts for the binary classification task. The environment selects a new state (st+1) randomly from the dataset, such that it has the same intrusion label as the action chosen by the agent, and the agent chooses an action based on its policy and the new state. The reward (rt+1) is computed from the agent's action and the true intrusion label of the new state. Using the transition (st, at, st+1, rt+1), the environment and the agent update the Q-function, and the new state (st+1) becomes the current state (st) for the next iteration. In our case, the input data is the dataset with the features chosen by our feature selection process. Actions take the values 0 or 1, where 0 represents a prediction of 'no attack' or 'normal' and 1 a prediction of 'attack.' The reward variable represents the immediate reward the RL agent receives for its actions (i.e., based on correct and incorrect predictions): if the prediction is correct (the predicted activity matches the actual label), the reward is 1; otherwise, it is 0. The total reward by episode accumulates the rewards obtained during an episode and represents how well the agent is performing in terms of correct predictions.

The agent and environment interact as the agent chooses actions (predictions) depending on its current state (input data) and collects rewards based on the correctness of its predictions. The agent then uses these rewards to update its Q-values and improve its prediction accuracy.
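A minimal sketch of one training episode under the interaction scheme just described; the agent interface (act/remember/replay) and the variable names are assumptions, and X, y stand for the preprocessed features and binary labels:

```python
import numpy as np

def run_episode(agent, X, y, n_iterations=100):
    """One episode of the loop described above; the agent API is assumed."""
    total_reward = 0
    s = np.random.randint(len(X))                  # random initial state
    for _ in range(n_iterations):
        action = agent.act(X[s])                   # 0 = normal, 1 = attack
        reward = 1 if action == y[s] else 0        # reward correct predictions
        # The environment draws the next state so that its true label matches
        # the label the agent just predicted.
        s_next = int(np.random.choice(np.where(y == action)[0]))
        agent.remember(X[s], action, reward, X[s_next])
        agent.replay()                             # mini-batch Q-update
        total_reward += reward
        s = s_next
    return total_reward
```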

4.1 Setting up the experiment and parameters used

First, the dataset was preprocessed and features were selected using a Kaggle Notebook; the selected features were then used by the model code. The DDQN was implemented in Python on the Kaggle Notebook T4 GPU with the TensorFlow and Keras frameworks, versions 2.13.0 and 2.6.0, respectively. The DQN implementation has three main components: a data class, a neural network class, and a DQN agent class. The data class is responsible for loading, formatting, and preprocessing the dataset. The neural network class is responsible for constructing, training, and evaluating the model, processing input data, and generating Q-values for different actions. The DQN agent class is in charge of the DQN algorithm that plays the game and learns from rewards and penalties. The neural network requires a specific arrangement of layers; in this case, two dense layers make up the hidden portion, each with hidden_size neurons and rectified linear unit (ReLU) activation to impart nonlinearity. The dataset's features correspond to the neural network's input layer, whose shape is determined by the number of input features. The model outputs two Q-values, one for each possible action (normal or attack), so there are two neurons in the output layer. The final layer employs sigmoid activation to normalize the outputs to a probability distribution over the actions, assisting in selecting the best Q-value action and quantifying the model's uncertainty.
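The three components described here can be skeletonized as follows; the class and method names are illustrative, not the authors' exact code, and the two-hidden-layer, sigmoid-output configuration follows this paragraph's description:

```python
import tensorflow as tf

class Data:
    """Loads, formats, and preprocesses the dataset (features and labels)."""

class QNetwork:
    """Builds the model mapping a flow's features to two Q-values."""

    def __init__(self, n_features, hidden_size=124):
        self.model = tf.keras.Sequential([
            tf.keras.layers.Input(shape=(n_features,)),
            tf.keras.layers.Dense(hidden_size, activation="relu"),
            tf.keras.layers.Dense(hidden_size, activation="relu"),
            # Two output neurons (normal/attack); sigmoid squashes the outputs,
            # following the description in this paragraph.
            tf.keras.layers.Dense(2, activation="sigmoid"),
        ])

class DQNAgent:
    """Drives the DQN loop: acts, stores transitions, learns from rewards."""

    def act(self, state):
        ...   # epsilon-greedy choice over QNetwork outputs
```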

Several significant parameters were examined during training to obtain optimum values for the model. An exploration rate, epsilon (ε), of 0.1 was selected to allow the agent a certain amount of random exploration, with a 0.7 decay rate used to decrease the exploration rate (ε) over time. The tables below provide the selected features and the values of other parameters, including batch size and discount factor (γ).

The following figure illustrates the features that are crucial for attack detection (Figure 10).

Figure 10.

Names of selected features.

Table 4 below outlines the parameters set for the environment creation, which will guide the Neural Network and DQN Agent in optimizing their performance for the given task.

Parameters | Values
Number_Episode | 100
Hidden_layers | 3
Number_iteration | 100
Number_units | 3 × 124
Activation_function | ReLU
Initial Weight Value | Normal
Epsilon | 1
Minimum Epsilon | 0
Gamma | 0.001
Decay rate | 0.99
Learning_rate | 0.001
Batch-size | 100
Optimizer | Adam

Table 4.

List of selected parameters.

4.2 Metrics evaluation

The metrics for measuring the performance of our model are derived from the characteristics of the dataset and our performance indicators. First, looking at the distribution of the dataset, shown in percentage terms in Figure 11, we conclude that there is a major class imbalance problem in our dataset.

Figure 11.

Class distribution of the dataset in terms of percentage.

The ratio of the majority class (benign in our case) to the rest (malicious in our case) is as follows:

Imbalance Ratio = 13,413,990 / 2,263,160 = 5.927

This part frames the model's learning needs using the confusion matrix to illustrate the effectiveness of our model's categorization. The confusion matrix highlights which predictions were accurate and which were false. In machine learning models, predicting the existence of malicious traffic has four possible outcomes.

The following defines the evaluation metrics:

True Positive (TP): The number of DDoS attacks that are correctly identified as attacks.

True Negative (TN): The number of legitimate network traffic instances correctly classified as normal.

False Positive (FP): The number of normal network traffic instances incorrectly classified as attacks.

False Negative (FN): The number of DDoS attacks that are mistakenly identified as normal traffic.

Our proposed model is also evaluated using several common metrics used in intrusion detection systems. Below are the formulas for accuracy, precision, recall, and F1-score:

Accuracy: Accuracy measures the overall percentage of correct predictions (both true positives and true negatives) in the IDS model. It indicates the model’s general performance and is calculated as:

Accuracy (A) = (TP + TN) / (TP + TN + FP + FN)

TP = True Positive, TN = True Negative, FP = False Positive, and FN = False Negative.

Precision: Precision, also known as the positive predictive value, calculates the ratio of correctly identified attacks to all instances labeled as attacks. It reveals the accuracy of positive predictions:

Precision (P) = TP / (TP + FP)

Recall: Also called the detection rate (DR) or true positive rate (TPR), recall measures the proportion of actual attacks that are correctly identified by the model. It indicates the effectiveness of detecting harmful events:

Recall (R) = TP / (TP + FN)

F1-score: The F1-score is a crucial metric that balances precision and recall, especially in situations where the data is imbalanced. It reflects the model’s performance by considering both false positives and false negatives.

F1-Score = 2 × (Precision × Recall) / (Precision + Recall)

Support Score: In the scikit-learn module, support is the number of occurrences of each class in the dataset; it gives context for interpreting the other per-class metrics.

Confusion Matrix: The confusion matrix is a key tool for assessing the model's performance. After training and predicting with the model, the confusion matrix is calculated using the confusion_matrix function from sklearn.metrics, which compares the true labels (y_test) and the predicted labels (y_pred).

The resulting confusion matrix is visualized using a heatmap generated with the heatmap() function from the seaborn library. The heatmap shows the numbers of true positives, false positives, false negatives, and true negatives for each class.

The diagonal elements represent the number of correct predictions, while the off-diagonal elements highlight misclassifications.
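The evaluation just described can be sketched as follows; y_test, y_pred, and class_names are names assumed from the surrounding text:

```python
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# y_test, y_pred, and class_names are assumed from the evaluation above.
cm = confusion_matrix(y_test, y_pred)   # rows: true labels, columns: predicted
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=class_names, yticklabels=class_names)
plt.xlabel("Predicted label")
plt.ylabel("True label")
plt.show()
```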

4.3 Analyzing the results

A comprehensive analysis of the results obtained from the experiment is presented in Table 5, comparing the performance of traditional machine learning models with the proposed reinforcement learning model. This comparison is based on key evaluation metrics, including accuracy, precision, recall, and F1-score.

Metric | Random Forest classifier | Decision Tree classifier | XGBoost classifier | Double deep Q-network
Accuracy | 93.48% | 94.39% | 95.90% | 94.68%
Precision | 93.48% | 94.40% | 95.91% | 94.13%
Recall | 93.48% | 94.39% | 95.91% | 94.13%
F1-Score | 94.47% | 94.39% | 94.91% | 93.74%

Table 5.

Comparison of machine learning model and proposed reinforcement learning model.

4.3.1 Machine learning models

The following tables present the performance measures and classification reports of the machine learning models, including the Decision Tree, Random Forest, and XGBoost classifiers. The per-day binary classification results are shown in Table 6, the confusion matrix for binary classification in Table 7, and the performance details of the Decision Tree, Random Forest, and XGBoost classifiers in Tables 8-10, respectively.

Dataset files | Normal | Malicious/Attack
2/14/2018 | 663,808 | 380,943
2/15/2018 | 988,050 | 52,498
2/16/2018 | 446,772 | 601,802
2/20/2018 | 7,313,104 | 576,191
2/21/2018 | 360,833 | 687,742
2/22/2018 | 1,042,301 | 362
2/23/2018 | 1,042,301 | 566
2/28/2018 | 538,666 | 68,236
3/1/2018 | 235,778 | 92,403
3/2/2018 | 758,334 | 286,191

Table 6.

Binary classification results.

Confusion matrix for binary classification
 | Normal | Attacks | Total
Actual Normal | 2,676,761 | 2058 | 2,678,819
Actual Attacks | 34,180 | 514,438 | 548,618
Total | 2,710,941 | 516,496 | 3,227,437

Table 7.

Confusion matrix for binary classification.

Decision Tree Classifier
Classes | Precision | Recall | F1-Score | Support
Normal | 0.94 | 0.99 | 0.96 | 452,999
DDoS attack | 1 | 0.93 | 0.96 | 57,238
DoS attack | 0.78 | 1 | 0.88 | 76,189
Brute-force | 0.99 | 1 | 0.99 | 187,405
Botnet | 1 | 0.79 | 0.88 | 99,854
Infiltration | 0.75 | 0.19 | 0.3 | 32,128
Web attack | 0 | 0 | 0 | 185
Accuracy |  |  | 0.94 | 905,998
Macro Avg | 0.78 | 0.7 | 0.71 | 905,998
Weighted Avg | 0.94 | 0.94 | 0.93 | 905,998

Table 8.

Performance measures of Decision Tree Classifier.

Random Forest classifier
Classes | Precision | Recall | F1-Score | Support
Normal | 0.94 | 0.99 | 0.96 | 452,999
DDoS attack | 1 | 0.93 | 0.96 | 57,238
DoS attack | 0.78 | 1 | 0.88 | 76,189
Brute-force | 0.99 | 1 | 0.99 | 187,405
Botnet | 1 | 0.79 | 0.88 | 99,854
Infiltration | 0.75 | 0.19 | 0.3 | 32,128
Web attack | 0 | 0 | 0 | 185
Accuracy |  |  | 0.94 | 905,998
Macro Avg | 0.78 | 0.7 | 0.71 | 905,998
Weighted Avg | 0.94 | 0.94 | 0.93 | 905,998

Table 9.

Performance measures of random forest classifier.

XGBoost Classifier
Classes | Precision | Recall | F1-Score | Support
Normal | 0.94 | 0.99 | 0.96 | 452,999
DDoS attack | 1 | 0.93 | 0.96 | 57,238
DoS attack | 0.78 | 1 | 0.88 | 76,189
Brute-force | 0.99 | 1 | 0.99 | 187,405
Botnet | 1 | 0.79 | 0.88 | 99,854
Infiltration | 0.75 | 0.19 | 0.3 | 32,128
Web attack | 0 | 0 | 0 | 185
Accuracy |  |  | 0.94 | 905,998
Macro Avg | 0.78 | 0.7 | 0.71 | 905,998
Weighted Avg | 0.94 | 0.94 | 0.93 | 905,998

Table 10.

Performance measures of XGBoost Classifier.

4.4 DDQN results

This section presents the results and analysis of the DDQN model, highlighting its performance and effectiveness in the given task. Detailed insights and evaluations are provided based on the experimental outcomes.

4.4.1 DDQN reward and loss values

Figure 12 below shows the reward and loss values obtained during the DQN training procedure. The total episode reward in the top graph sums all the rewards the DQN agent earns in each episode; the reward is 1 for each correct action and -1 for each incorrect action. The total reward by episode reflects how well the DQN agent performs the network intrusion detection task: performance improves as the overall reward increases. The loss graph in Figure 12 shows the loss by episode, that is, the mean squared error between the target Q-values and the Q-values predicted by the constructed neural network in each episode. Since Q-values represent the expected future rewards of an action in a state, the loss by episode reflects how well the neural network approximates the Q-values; the lower the loss, the better the approximation.

Figure 12.

Reward-Loss Graph by each episode.
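Assuming per-episode rewards and losses are collected in lists during training, curves like those in Figure 12 can be produced with a few lines of matplotlib (a sketch, not the authors' plotting code):

```python
import matplotlib.pyplot as plt

# rewards and losses are assumed to be per-episode lists collected in training.
fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(rewards)
ax1.set_ylabel("Total reward by episode")
ax2.plot(losses)
ax2.set_ylabel("Loss by episode")
ax2.set_xlabel("Episode")
plt.show()
```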

4.4.2 DDQN analysis and results

The evaluation metrics of the proposed DDQN model using the CIC-IDS 2018 dataset are analyzed to assess its effectiveness in detecting network intrusions (Table 11). The test set score distribution across various classes of normal and attack traffic illustrates the model’s classification performance (Figure 13), while a detailed classification report highlights key metrics such as accuracy, precision, recall, and F1-score (Table 12).

Class | Estimated | Correct | Total | Accuracy (%)
Benign | 475,394 | 448,664 | 452,999 | 99.043044
Botnet | 57,691 | 57,188 | 57,238 | 99.912645
Brute-force | 86,079 | 72,787 | 76,189 | 95.534788
DDoS attack | 188,281 | 186,994 | 187,405 | 99.780689
DoS attack | 90,838 | 86,730 | 99,854 | 86.856811
Infiltration | 7715 | 5441 | 32,128 | 16.935383
Web attack | 0 | 0 | 185 | 0

Table 11.

Evaluation metrics for DDQN model.

Figure 13.

Distribution over six classes.

DDQN algorithm
Classes | Precision | Recall | F1-Score | Support
Normal | 0.95 | 0.96 | 0.95 | 452,999
DDoS attack | 0.99 | 1 | 0.99 | 57,238
DoS attack | 0.76 | 1 | 0.86 | 76,189
Brute-force | 0.98 | 1 | 0.99 | 187,405
Botnet | 0.98 | 0.78 | 0.87 | 99,854
Infiltration | 0.4 | 0.25 | 0.31 | 32,128
Web attack | 0 | 0 | 0 | 185
Accuracy |  |  | 0.93 | 905,998
Macro Avg | 0.72 | 0.71 | 0.71 | 905,998
Weighted Avg | 0.93 | 0.93 | 0.92 | 905,998

Table 12.

Classification report for DDQN model.

4.4.2.1 Confusion matrix for all models

The following presents the confusion matrices for all models, including Decision Tree, Random Forest, XGBoost, and DDQN (Figure 14).

Figure 14.

Confusion matrix for all models.

4.4.3 Summary of results

The DDQN model (the DRL model that presents the best results in this chapter) achieves detection performance (accuracy, F1, precision, recall, etc.) almost equal to, or slightly better than, benchmark state-of-the-art ML models such as Random Forest, decision trees, and XGBoost. The results were obtained on the CIC-IDS 2018 dataset, and the proposed DDQN-based model ranks among the best solution models.

Thus, we conclude that our model offers a more effective solution in a range of circumstances. Of particular importance is the recall metric, on which DDQN performs exceptionally well, since for an intrusion detection algorithm false negatives must be kept to a minimum. Besides showing that DRL models can be applied to intrusion detection problems given a labeled dataset, the derived classifiers (once trained) are simple, fast neural networks that can run efficiently on modern high-performance or distributed computing environments such as TensorFlow. For example, at test time the DDQN model yields much smaller prediction times than the best state-of-the-art models explored in this study.

Therefore, when applying DRL models to the described scenario with a labeled dataset, performance depends strongly on the choice of the discount factor. This is a notable finding: when one does not engage with a live environment (meaning the feedback loop between actions and the environment is severed), the policy function must be updated more cautiously, making convergence slower but more stable.


5. Conclusions

Due to enhancements in big data technologies, big data analytics is highly popular in many fields of computer science. As cybersecurity has become such an essential element of our lives, incorporating these analytics into security solutions is unavoidable and rewarding. Moreover, because many cyberattacks are internal and easily camouflaged, large volumes of data can be pulled from the network by querying traffic data, system logs, and the like [27]. As they become involved in ever more learning tasks, neural networks are employed extensively as a solution to these problems (or parts thereof). The use of a learning mechanism is thus a must for handling zero-day attacks: since almost all networks today are exposed to zero-day vulnerabilities, a solution able to recognize potential threats of previously unseen types is essential.

In summary, the contributions of this chapter are as follows: (1) a new algorithm that produces better intrusion detection results than conventional machine learning and deep learning techniques; (2) an intrusion detection algorithm based on an extremely simple and fast policy network, particularly suitable for fast-paced advanced applications in modern data networks; (3) a model that can be applied to online learning, which is required for data networks with changing conditions; and (4) a novel application of DRL to supervised learning, motivated by a reward function that need not be differentiable and can be used uniformly across all kinds of optimization (a sketch of such a reward follows below). Accordingly, we perform a comparative analysis of a DRL algorithm (DDQN) implemented on a dataset containing intrusions, rather than in interaction with a live network environment. Additional analysis compares this algorithm with several alternative machine learning models on the CIC-IDS 2018 dataset, considering three performance aspects: (1) prediction scores, (2) training time, and (3) model prediction time.
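The chapter does not print the exact reward function, so the following is only a plausible sketch of point (4): when supervised classification is cast as a one-step RL problem, a simple label-match reward suffices, and it requires no differentiability.

```python
def classification_reward(action, label, correct=1.0, wrong=-1.0):
    """Reward for treating classification as a one-step RL problem.

    `action` is the class the agent predicts for the current flow and `label`
    is the ground truth. The +1/-1 values are an assumed parameterization,
    not the chapter's; the key property is that no gradient of the reward
    is ever needed.
    """
    return correct if action == label else wrong
```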

The best DRL algorithm (DDQN) achieves detection performance that ranks similar to, or even higher than, a full range of state-of-the-art ML models, including Random Forest, Decision Tree, and XGBoost, on the same metrics: accuracy, F1-score, precision, and recall.

Furthermore, DDQN, and DRL methods in general, offer significantly lower prediction times, making them suitable for online detection and for new, highly demanding network service types such as IoT networks.

As future work, we plan to explore new DRL architectures, specifically multi-agent and adversarial DRL models, which can be practical for intrusion detection systems. The minority classes in our dataset can also be expanded by feeding CICFlowMeter our own generated traffic captures.

References

  1. Haq N, Avishek M, Shah F, Onik A, Rafni M, Md D. Application of machine learning approaches in intrusion detection system: A survey. International Journal of Advanced Research in Artificial Intelligence. 2015;4:10. DOI: 10.14569/IJARAI.2015.040302
  2. Mishra P, Varadharajan V, Tupakula U, Pilli E. A detailed investigation and analysis of using machine learning techniques for intrusion detection. IEEE Communications Surveys & Tutorials. 2018;21:1-1. DOI: 10.1109/COMST.2018.2847722
  3. Nguyen HA, Choi D. Application of data mining to network intrusion detection: Classifier selection model. In: Asia Pacific Network Operations and Management Symposium (APNOMS) 2008. Berlin, Heidelberg: Springer; 2008. pp. 399-408. Available from: https://dblp.org/db/conf/apnoms/index.html
  4. Tan. Dimensions of cybersecurity risk management. In: Advances in Cybersecurity Management. Cham, Switzerland: Springer International Publishing; 2021. pp. 21-30. Available from: https://link.springer.com/book/10.1007/978-3-030-71381-2
  5. Ahsan MK. Increasing the Predictive Potential of Machine Learning Models for Enhancing Cybersecurity [Thesis]. USA: North Dakota State University; 2021
  6. Deshpande P, Sharma SC, Peddoju SK, Junaid S. HIDS: A host based intrusion detection system for cloud computing environment. International Journal of Systems Assurance Engineering and Management. 2018;9(3):567-576. DOI: 10.1007/s13198-014-0277-7
  7. Maiero C, Miculan M. Unobservable intrusion detection based on call traces in paravirtualized systems. In: Proceedings of the International Conference on Security and Cryptography. 2012. pp. 300-306. DOI: 10.5220/0003521003000306
  8. Bharadwaja S, Sun W, Niamat M, Shen F. Collabra: A Xen hypervisor based collaborative intrusion detection system. In: Proceedings - 2011 8th International Conference on Information Technology: New Generations, ITNG 2011. Piscataway, NJ, USA: IEEE; 2011. pp. 695-700. DOI: 10.1109/ITNG.2011.123
  9. Mahmood T, Afzal U. Security analytics: Big data analytics for cybersecurity: A review of trends, techniques and tools. In: 2013 2nd National Conference on Information Assurance (NCIA), Rawalpindi. Piscataway, NJ, USA: IEEE; 2013. pp. 129-134. DOI: 10.1109/NCIA.2013.6725337
  10. Nieles M, Dempsey K, Pillitteri V. NIST Special Publication 800-12 Revision 1: An Introduction to Information Security. Gaithersburg: National Institute of Standards and Technology; 2017
  11. Casas P, Soro F, Vanerio J, Settanni G, D'Alconzo A. Network security and anomaly detection with Big-DAMA, a big data analytics framework. In: Proceedings of 2017 IEEE 6th International Conference on Cloud Networking (CloudNet). Piscataway, NJ, USA: IEEE; 2017. pp. 1-7. DOI: 10.1109/CloudNet.2017.8071525
  12. Alavizadeh H, Alavizadeh H, Jang-Jaccard J. Deep Q-learning based reinforcement learning approach for network intrusion detection. Computers. 2022;11(3):41
  13. Nygard KE, Rastogi A, Ahsan M, Satyal R. Dimensions of cybersecurity risk management. In: Advances in Cybersecurity Management. Springer; 2021. pp. 369-395
  14. Badr Y. Enabling intrusion detection systems with dueling double deep Q-learning. Digital Transformation and Society. 2022;1(1):115-141. DOI: 10.1108/DTS-05-2022-0016
  15. Sharifi AM, Amirgholipour SK, Pourebrahimi A. Intrusion detection based on joint of K-means and KNN. Journal of Convergence Information Technology. 2015;10(5):42
  16. Behera S, Pradhan A, Dash R. Deep neural network architecture for anomaly based intrusion detection system. In: 2018 5th International Conference on Signal Processing and Integrated Networks, SPIN 2018. Piscataway, NJ, USA: IEEE; 2018. pp. 270-274. DOI: 10.1109/SPIN.2018.8474162
  17. Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, et al. Playing Atari with deep reinforcement learning. arXiv. 2013;abs/1312.5602:1-9. Available from: http://arxiv.org/abs/1312.5602
  18. Cuayáhuitl H, Keizer S, Lemon O. Strategic dialogue management via deep reinforcement learning. arXiv. 2015;abs/1511.08099:1-10. Available from: http://arxiv.org/abs/1511.08099
  19. Lai M. Giraffe: Using deep reinforcement learning to play chess. arXiv. 2015;abs/1509.01549. Available from: http://arxiv.org/abs/1509.01549
  20. Wang L, Zhang D, Gao L, Song J, Guo L, Shen HT. MathDQN: Solving arithmetic word problems via deep reinforcement learning. In: 32nd AAAI Conference on Artificial Intelligence, AAAI 2018. Palo Alto, CA, USA: AAAI; 2018. pp. 5545-5552
  21. Mahmud M, Kaiser MS, Hussain A, Vassanelli S. Applications of deep learning and reinforcement learning to biological data. IEEE Transactions on Neural Networks and Learning Systems. 2018;29(6):2063-2079. DOI: 10.1109/TNNLS.2018.2790388
  22. LaBar J, Chowdhury M, Jochen M, Kambhampaty K. Honeypots: Security by deceiving threats. In: The Midwest Instruction and Computing Symposium 2019. Host Institution; 2018
  23. Yen T-F, Oprea A, Onarlioglu K, Leetham T, Robertson W, Juels A, et al. Beehive: Large-scale log analysis for detecting suspicious activity in enterprise networks. In: Proceedings of the 29th Annual Computer Security Applications Conference (ACSAC '13). New York, NY, USA: Association for Computing Machinery; 2013. pp. 199-208. DOI: 10.1145/2523649.2523670
  24. Bilgisayar Ağlarında Saldırı Tespiti için Makine Öğrenme Yöntemleri: Karşılaştırmalı Bir Analiz [Machine Learning Methods for Intrusion Detection in Computer Networks: A Comparative Analysis] - Scientific figure on ResearchGate. Available from: https://www.researchgate.net/figure/Network-Topology-25fig1374694982 [Accessed: December 17, 2024]
  25. Chimphlee W, Chimphlee S. Network intrusion detector using multilayer perceptron (MLP) approach. Turkish Journal of Computer and Mathematics Education (TURCOMAT). 2022;13(03):488-499. DOI: 10.17762/turcomat.v13i03.13018
  26. Short-Term Load Forecasting Method Based on Feature Preference Strategy and LightGBM-XGboost - Scientific figure on ResearchGate. Available from: https://www.researchgate.net/figure/Schematic-illustration-of-the-XGboost-modelfig2362100649 [Accessed: December 17, 2024]
  27. Ciaravino E, Chowdhury M, Jochen M, Kambhampaty K. Security issues of SCADA systems. In: The Midwest Instruction and Computing Symposium 2019. Host Institution; 2018
