Abstract.
The DDoS attack is one of the most powerful hacking techniques on the internet. The basic weapon the attacker uses in this type of attack is network traffic, which is used to take down or crash websites.
There are various classifications of this attack. Each category defines the way a hacker tries to intrude into the network.
This research discusses an approach to detecting DDoS attack threats through an A.I. model with over 96% accuracy.
Seven different subcategories of DDoS threat are classified, along with a safe or healthy network, and explained.
Introduction
Distributed Denial-of-Service (DDoS) attacks target websites and online services. The objective of the attack is to jam the network or server with overwhelming traffic. It is effective because it uses multiple compromised systems as sources of attack traffic.
There are different subcategories of DDoS attacks based on the level of the network connection they attempt to attack, with respect to the OSI model. Some of the subcategories that SecureLayer7 has classified through their research are:
SYN Flood, UDP Flood, MSSQL, LDAP, Portmap, NetBIOS.
Machine Learning and Deep Learning are some of the most common backbones of A.I. to date. SecureLayer7 makes use of these methodologies to solve problems in various domains, with accuracy that is closer to human performance. Once again SecureLayer7 has tested the limits of A.I. in detecting threats in the domain of cybersecurity through this research.
In this research, SecureLayer7 thoroughly analyzed the logs generated during a DDoS attack, used supervised and unsupervised techniques to detect the threat, and finally used deep learning to achieve over 96% accuracy in classifying the different types of DDoS threats along with the safe connection.
Data Pre-Processing
Processing the data was one of the first challenges. The data had 88 attributes (features), and processing a dataset of that size within limited RAM was difficult. SecureLayer7 therefore downcast the data types of the attributes to reduce the memory usage of the data frame: float64 was downcast to float32, int64 to int32, int32 to uint32, and so on. This cut memory usage by almost 42% of the initial size. Some attributes still had maximum values that were infinite or close to it, so those values were also handled in the pre-processing stage.
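The downcasting and infinite-value handling described above can be sketched with pandas. This is a minimal illustration, not SecureLayer7's actual code: the helper names `downcast_frame` and `replace_infinite`, the two column names, and the choice to fill infinite values with the column median are all assumptions. Note also that `pd.to_numeric(..., downcast=...)` picks the smallest dtype that safely holds the values, which may shrink integers further than int32.

```python
import numpy as np
import pandas as pd


def replace_infinite(df: pd.DataFrame) -> pd.DataFrame:
    """Turn +/-inf into NaN, then fill numeric columns with their median.

    Median filling is an assumption; the article does not say how the
    infinite values were handled.
    """
    out = df.replace([np.inf, -np.inf], np.nan)
    for col in out.select_dtypes(include="number").columns:
        out[col] = out[col].fillna(out[col].median())
    return out


def downcast_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose numeric columns use the smallest safe dtype."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_float_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="float")
        elif pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
    return out


# Tiny synthetic frame standing in for the real flow logs (hypothetical columns).
demo = pd.DataFrame({
    "flow_duration": np.array([10, 2000, 35], dtype="int64"),
    "flow_bytes_per_s": np.array([1.5, np.inf, 3.0], dtype="float64"),
})

before = demo.memory_usage(deep=True).sum()
clean = downcast_frame(replace_infinite(demo))
after = clean.memory_usage(deep=True).sum()
print(f"memory: {before} bytes -> {after} bytes")
```

Infinite values are replaced before downcasting so that the float columns can be narrowed without carrying `inf` into the smaller dtype.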
Distribution of Target Features
SecureLayer7 tried to maintain an even distribution of the target labels across the dataset.
UDPLag is somewhat unevenly distributed compared with the other classes, but SecureLayer7 handles this case later in this research.
Exploratory data analysis
In the two analyses above, it could clearly be observed that there is a drift in the flow of bits and the flow of packets during a DDoS attack compared to a benign or safe connection.
SecureLayer7 has also analyzed the distribution of each type of threat within each type of protocol, and inbound. Below are the charts displaying the analysis for the same.
Unsupervised approach for detection of threat.
In the unsupervised approach, SecureLayer7 does not let the model learn from the target variable. Instead, the algorithm is forced to learn from the input data itself and to discover patterns and information on its own.
Pre-processing before training. SecureLayer7 removed some of the features from the data: ‘Flow ID’, ‘ Source IP’, ‘ Source Port’, ‘ Destination IP’, ‘ Destination Port’, ‘ Timestamp’, ‘ Flow Packets/s’ and ‘Flow Bytes/s’. ‘ Flow Packets/s’ and ‘Flow Bytes/s’ were removed because, after standard scaling, these features produced NaN values and values too large for float64.
The data was scaled with standard scaling and then normalized. Principal component analysis was then used to reduce the data to two dimensions.
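The scaling, normalization and PCA steps can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions: the article does not say which normalization was used, so row-wise L2 normalization via `sklearn.preprocessing.normalize` is assumed here, and the function name `project_to_2d` and the random demo data are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler, normalize


def project_to_2d(features: np.ndarray, seed: int = 0) -> np.ndarray:
    """Standard-scale, L2-normalize each row, then reduce to 2 components."""
    scaled = StandardScaler().fit_transform(features)  # zero mean, unit variance per column
    unit_rows = normalize(scaled)                      # assumption: L2 norm per sample
    return PCA(n_components=2, random_state=seed).fit_transform(unit_rows)


# Random data standing in for the pre-processed flow features.
rng = np.random.default_rng(0)
demo_features = rng.normal(size=(200, 10))
embedded = project_to_2d(demo_features)
print(embedded.shape)
```

The two resulting columns are what would be plotted or passed to a clustering algorithm; the article does not name the clustering method, so none is shown here.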
From the above two visualizations, it was clearly observed that SecureLayer7’s algorithm could, to some extent, successfully cluster the different threats in the data.
Let’s have a look at how SecureLayer7’s unsupervised model labeled the generated clusters.
It looks like SecureLayer7’s unsupervised model found the pattern in the data and could, to some extent, segment the target variable on its own.
Supervised approach for detection of threat.
This is the opposite of the unsupervised approach: here SecureLayer7 lets the model learn from the target variable, so the target labels guide the model in learning the patterns in the data. The same pre-processing was applied as for the unsupervised approach.
In this case, deep learning was used to train the model.
Structure of our DL model
Because the target variable was imbalanced, SecureLayer7 used Stratified K-fold to train and validate the model over each fold. Stratification does not rebalance the classes; it keeps the class proportions of the target roughly the same in every training and validation split, so each fold is representative of the whole dataset.
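A small example may make the effect of stratification concrete. The sketch below uses scikit-learn's `StratifiedKFold` on synthetic labels with a 90/10 class split (the labels, the seed and the variable names are illustrative assumptions, not the article's data) and records the share of the minority class in each validation fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Synthetic, imbalanced labels: 90 samples of class 0, 10 of class 1.
labels = np.array([0] * 90 + [1] * 10)
features = np.arange(len(labels)).reshape(-1, 1)  # placeholder feature column

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)

minority_share = []
for train_idx, val_idx in skf.split(features, labels):
    # Fraction of class 1 in this validation fold.
    minority_share.append(labels[val_idx].mean())

print(minority_share)
```

Every validation fold keeps the same 10% minority share as the full label set, which is the property that makes per-fold scores comparable on imbalanced data.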
SecureLayer7 used Adam as the base optimizer and the ROC_AUC score to evaluate the performance of the model. The ROC_AUC score is the area under the Receiver Operating Characteristic curve (ROC AUC), computed from prediction scores.
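As a rough illustration of the metric, the sketch below computes a multi-class ROC AUC with scikit-learn's `roc_auc_score`. The article does not state how the score was averaged across classes, so one-vs-rest with macro averaging is an assumption here, and the three-class labels and probabilities are made up for the example.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Made-up true labels for three classes.
y_true = np.array([0, 0, 1, 1, 2, 2])

# Made-up predicted class probabilities; each row sums to 1,
# as a softmax output layer would produce.
y_prob = np.array([
    [0.8, 0.1, 0.1],
    [0.4, 0.5, 0.1],  # a class-0 sample that the model leans towards class 1
    [0.2, 0.7, 0.1],
    [0.3, 0.5, 0.2],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
])

# Assumption: one-vs-rest AUC per class, then an unweighted (macro) mean.
auc = roc_auc_score(y_true, y_prob, multi_class="ovr", average="macro")
print(f"macro one-vs-rest ROC AUC: {auc:.3f}")
```

Because ROC AUC is computed from the ranking of scores rather than from hard predictions, it can stay high even when the accuracy of the argmax prediction is noticeably lower.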
SecureLayer7 trained and validated the model over 10 folds, achieving an average ROC_AUC score of 96% and above for threat detection and a highest accuracy of 97% and above.
Classification report from one of the 10 folds
Class | Precision | Recall | f1-score | support |
BENIGN | 0.98 | 0.99 | 0.98 | 519 |
Portmap | 0.94 | 0.97 | 0.96 | 500 |
NetBIOS | 0.86 | 0.85 | 0.86 | 500 |
LDAP | 0.60 | 0.90 | 0.72 | 500 |
UDP | 0.91 | 0.10 | 0.18 | 500 |
Syn | 0.99 | 0.99 | 0.99 | 500 |
MSSQL | 0.66 | 0.75 | 0.70 | 500 |
UDPLag | 0.55 | 0.88 | 0.67 | 188 |
accuracy | | | 0.80 | 3707 |
macro avg | 0.81 | 0.80 | 0.76 | 3707 |
weighted avg | 0.83 | 0.80 | 0.77 | 3707 |
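To see how the summary rows follow from the per-class rows, the f1-score and its macro and weighted averages can be recomputed from the precision, recall and support values in the report above. The sketch below uses only the rounded figures from the table, so the results agree with the report to two decimals; the names `report` and `f1` are illustrative.

```python
# (precision, recall, support) per class, copied from the report above.
report = {
    "BENIGN":  (0.98, 0.99, 519),
    "Portmap": (0.94, 0.97, 500),
    "NetBIOS": (0.86, 0.85, 500),
    "LDAP":    (0.60, 0.90, 500),
    "UDP":     (0.91, 0.10, 500),
    "Syn":     (0.99, 0.99, 500),
    "MSSQL":   (0.66, 0.75, 500),
    "UDPLag":  (0.55, 0.88, 188),
}


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


per_class_f1 = {name: f1(p, r) for name, (p, r, _) in report.items()}

# Macro average: every class counts equally.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)

# Weighted average: each class counts in proportion to its support.
total_support = sum(s for _, _, s in report.values())
weighted_f1 = sum(f1(p, r) * s for p, r, s in report.values()) / total_support

for name, score in per_class_f1.items():
    print(f"{name:8s} f1 = {score:.2f}")
print(f"macro f1 = {macro_f1:.2f}, weighted f1 = {weighted_f1:.2f}")
```

The UDP row shows why the f1-score is reported alongside precision: with a recall of 0.10, a precision of 0.91 still yields an f1-score of only 0.18, and that single class pulls the macro average down.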
Accuracy for classifying each class from one of the 10 folds
Class | Accuracy |
BENIGN | 0.97687861 |
Portmap | 0.206 |
NetBIOS | 0.96 |
LDAP | 0.862 |
UDP | 0.138 |
Syn | 0.982 |
MSSQL | 0.498 |
UDPLag | 0.91489362 |
Conclusion
- Whether the approach is supervised or unsupervised, SecureLayer7 could detect the threat ahead of time through A.I.
- The supervised approach does have the upper hand over the unsupervised one, but the unsupervised performance on unlabeled data was still remarkable.
- Even if you have very little labelled data compared to unlabeled data in a real-life scenario, there are techniques like semi-supervised learning and self-supervised learning to achieve remarkable performance.
- Fairness Indicators, one of the TensorFlow tools, could also be used for better model evaluation and performance scaling.
Credits
- The University of New Brunswick, for making the dataset available.
- TensorFlow, scikit-learn and Matplotlib, for the tools used throughout this research.
- SecureLayer7 Team, for supporting this research.