Introduction
Emerging Internet technologies are trending toward decentralization. Botnets have long been one of the biggest threats to Internet security, and botmasters have adopted decentralization to develop and improve peer-to-peer (P2P) botnet tactics. This makes the botnets more sophisticated and evasive. However, bots belonging to the same botnet exhibit symmetrical behaviour, and this is what makes them detectable.
However, the literature indicates that the last decade has lacked research that explores new behavioural characteristics that could be used to identify P2P botnets. To address this gap, this study proposes two new methods to detect P2P botnets:
- Exploring Behavioural Characteristics: We explored a new set of behavioural characteristics based on network traffic flow analyses that allow network administrators to more easily recognise a botnet’s presence.
- Machine/Deep Learning-based Detection: We developed a new anomaly detection approach by adopting machine-learning (ML) and deep-learning (DL) techniques that have not yet been leveraged to detect P2P botnets using only the five-tuple static indicators as selected features.
The experimental analyses revealed new and important behavioural characteristics that can be used as Indicators of Compromise (IoCs) to identify P2P botnets. Additionally, the experimental results for the detection approach showed a high detection accuracy of 99.99% with no false alarms.
Behavioural Characteristics of P2P Botnets
Typically, the main part of a botnet is its command-and-control (C&C) channel, so our analysis of network traffic also covered the behavioural indicators of C&C activity. Bots may exhibit common features in network traffic, for example when botmasters learn, directly or indirectly, of botnet detection or analysis activities. In addition, botmasters must periodically update the bots, which forces them to use a means of communication that ultimately becomes evidence of their presence. This kind of bot activity makes the bots recognisable and detectable.
However, large-scale networks with extensive Internet bandwidth and administrative restrictions make it harder to monitor the whole network and accurately detect intrusions. Thus, this paper presents a new set of behavioural characteristics that can be used as IoCs to recognise the presence of P2P botnets in a network environment.
Flow-based Behavioural Indicators
Unlike packet-based analysis, behaviour-level analysis relies on higher-level features that are extracted from the traffic flow in order to help the network administrator recognise P2P botnets. In this study, we categorised the behavioural characteristics into two types:
- Packets per Flow (PPF): The PPF is the number of packets that belong to a single flow. The analysis revealed that the greatest numbers of packets were transmitted (Tx packets) and received (Rx packets) by the botmaster IP and the infected machines.
- Bytes per Packet (BPP): The BPP revealed that the volume of data (Tx bytes, Rx bytes) sent to/from the botmaster was the greatest, followed by that to/from the infected machines.
These flow-based indicators can work with encrypted traffic because they do not rely on the packet payload.
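As a rough illustration of the two indicators, the sketch below groups packet records by five-tuple and derives a packet count and an average packet size per flow. The record layout, the sample addresses, and the reading of BPP as total bytes divided by packet count are assumptions made for this example, not details taken from the study.

```python
from collections import defaultdict

# Hypothetical packet records (not data from the study):
# (src_ip, dst_ip, src_port, dst_port, protocol, size_in_bytes)
packets = [
    ("192.0.2.10", "192.0.2.20", 40000, 53000, 17, 120),
    ("192.0.2.10", "192.0.2.20", 40000, 53000, 17, 80),
    ("192.0.2.10", "192.0.2.20", 40000, 53000, 17, 100),
    ("192.0.2.30", "198.51.100.7", 51000, 80, 6, 500),
]


def flow_stats(packets):
    """Group packets by five-tuple and compute PPF and BPP for each flow."""
    counts = defaultdict(int)
    total_bytes = defaultdict(int)
    for src, dst, sport, dport, proto, size in packets:
        key = (src, dst, sport, dport, proto)
        counts[key] += 1
        total_bytes[key] += size
    # Assumption: BPP is taken as the mean packet size within a flow.
    return {
        key: {"ppf": counts[key], "bpp": total_bytes[key] / counts[key]}
        for key in counts
    }


stats = flow_stats(packets)
for flow, values in stats.items():
    print(flow, values)
```

Only header-level fields are used here, which is consistent with the point that these indicators do not depend on the payload. Per-host Tx/Rx totals could be obtained by summing the same counters by source or destination address.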
Deviations from Standard Behavioural Indicators
The analysis of deviations from standard behavioural indicators is also known as protocol-based analysis. This analysis is based directly on the packet’s payload and has a low false-positive rate compared to other analyses.
The experimental findings showed deviations in two network layers: the transport layer (UDP protocol) and the application layer (HTTP protocol).
Deviations in the Transport Layer (UDP):
The botnet utilized the UDP protocol as the main carrier channel to infect computers. Compared to other protocols, UDP accomplishes this process in a simple fashion: it sends packets directly to a target computer without establishing a connection first, without indicating the order of the packets, and without checking whether they have arrived as intended. This is unlike the TCP protocol, which relies on a handshake to establish a connection.
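The connectionless nature of UDP described above can be seen in a minimal loopback sketch using Python's standard `socket` module. The function name and payload are hypothetical, and the example only demonstrates the protocol property; it says nothing about how any particular botnet uses UDP.

```python
import socket


def udp_roundtrip(payload: bytes) -> bytes:
    """Send one datagram over loopback and return what the receiver got."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        receiver.settimeout(2.0)         # do not block forever if the datagram is lost
        # No connect()/accept() handshake takes place: the datagram is simply
        # addressed to the receiver and sent. UDP itself adds no ordering
        # information and no delivery acknowledgement.
        sender.sendto(payload, receiver.getsockname())
        data, _sender_address = receiver.recvfrom(4096)
        return data
    finally:
        sender.close()
        receiver.close()


received = udp_roundtrip(b"hello")
```

A TCP equivalent would need `listen()`, `connect()`, and `accept()` before any data could be exchanged.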
Deviations in the Application Layer (HTTP):
Botmasters of P2P botnets might publish the commands on a certain website to update the bots. This process continues regularly at intervals predefined by the botmasters. With the HTTP protocol, bots hide their communication flows within the normal HTTP flows, making them stealthy and difficult to detect. Monitoring and inspecting HTTP packets can reveal valuable information that can help network administrators analyse botnets’ behaviour better and, ultimately, detect their presence in the network.
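Because the bots fetch updates at predefined intervals, one possible way to surface such traffic is to check how regular the gaps between a host's HTTP requests are. The sketch below is an illustration under stated assumptions: the function name, the coefficient-of-variation test, and both thresholds are hypothetical choices for this example and are not the method used in the study.

```python
from statistics import mean, pstdev


def is_periodic(timestamps, max_cv=0.1, min_requests=4):
    """Return True if the gaps between requests are nearly constant.

    max_cv and min_requests are illustrative thresholds, not values
    taken from the study.
    """
    if len(timestamps) < min_requests:
        return False
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap == 0:
        return False
    # Coefficient of variation: spread of the gaps relative to their mean.
    return pstdev(gaps) / average_gap <= max_cv


# Hypothetical request times in seconds for two hosts.
bot_like = [0, 60, 120, 181, 240, 300]   # roughly one request per minute
human_like = [0, 5, 7, 90, 95, 400]      # irregular browsing

print(is_periodic(bot_like), is_periodic(human_like))
```

Legitimate software such as update checkers also polls on a schedule, so a check like this would at most flag candidates for closer inspection.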
P2P Botnet Detection using ML/DL Techniques
The rapid growth of network bandwidth is one of the most significant challenges for botnet detection systems. Thus, one of the critical criteria for Intrusion Detection System (IDS) researchers is the processing capability of IDSs. Well-known IDSs, such as Bro and Snort, consume large amounts of resources when they process large volumes of payload data over a high-speed network.
Research trends show the effectiveness of data mining and the adaptation of ML/DL techniques for detecting botnets. For many reasons, such as the growing volume of payload information streaming on the network and increasing network speeds, solutions that rely on learning-based techniques are preferable because these techniques can automate the processing of huge amounts of data.
In this study, we experimentally examined one ML technique (NBTree) and one DL technique (MLP) that have not previously been evaluated for the detection of P2P botnets using only the five-tuple features (source and destination IP addresses, source and destination port numbers, and protocol identifier number) as selected features.
The proposed approach consists of three major stages:
- Data Preparation: This stage involves preparing the selected dataset for the next stages through various steps that make it readable by the ML and DL algorithms.
- Feature Selection: We considered the five-tuple features as the selected features for the detection of the P2P botnets.
- ML/DL-based Detection: We used NBTree as an ML classifier and MLP as a DL classifier to detect the P2P botnets.
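The first two stages can be sketched as follows: one flow record is reduced to its five-tuple and converted into numbers that a learning algorithm can read. The record layout, the column names, the `encode_five_tuple` helper, and the integer encoding of IP addresses are assumptions for illustration; the study's actual preparation steps are not specified here.

```python
import ipaddress

# Hypothetical column names for the five-tuple in a flow record.
FIVE_TUPLE = ("src_ip", "dst_ip", "src_port", "dst_port", "protocol")


def encode_five_tuple(src_ip, dst_ip, src_port, dst_port, protocol):
    """Turn one five-tuple into a list of integers a classifier can consume."""
    return [
        int(ipaddress.ip_address(src_ip)),  # IP address as a single integer
        int(ipaddress.ip_address(dst_ip)),
        int(src_port),
        int(dst_port),
        int(protocol),                      # e.g. 6 = TCP, 17 = UDP
    ]


# Hypothetical flow record; "duration" stands in for any extra column
# that feature selection discards.
record = {
    "src_ip": "192.0.2.10",
    "dst_ip": "198.51.100.7",
    "src_port": "40000",
    "dst_port": "80",
    "protocol": "6",
    "duration": "1.5",
    "label": "botnet",
}

features = encode_five_tuple(*(record[name] for name in FIVE_TUPLE))
label = 1 if record["label"] == "botnet" else 0
print(features, label)
```

Feature vectors of this kind, paired with their labels, are what the third stage would pass to a classifier such as NBTree or MLP.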
The experimental results showed that the proposed approach using NBTree and MLP achieved higher detection accuracy compared to the related works by using only the five-tuple features. Specifically, NBTree achieved a detection accuracy of 99.99%, and MLP achieved a detection accuracy of 99.86%. The proposed approach outperformed the related works in terms of standard evaluation metrics, such as recall, precision, F-score, and false-positive rate.
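For reference, the evaluation metrics named above follow from the four confusion-matrix counts. The sketch below uses the standard definitions; the function name is hypothetical, and the counts are made-up numbers chosen for easy arithmetic, not results from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification metrics from confusion-matrix counts."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_score = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_score": f_score,
        "false_positive_rate": false_positive_rate,
    }


# Made-up counts for illustration only (not the study's results).
metrics = classification_metrics(tp=90, fp=10, tn=890, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

A false-positive rate of zero, as in a result with no false alarms, corresponds to `fp == 0` in this formulation.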
Conclusion and Future Work
In this paper, we proposed two methods to detect P2P botnets. The first method explored a new set of behavioural characteristics as IoCs to recognise the presence of P2P botnets in a network environment. The second method proposed a new anomaly detection approach using ML/DL techniques that have not yet been leveraged to detect P2P botnets.
The experimental results showed that these two methods are effective security countermeasures for recognising and detecting P2P botnets. The explored behavioural characteristics could be adopted as IoCs to detect P2P botnets, and the proposed ML/DL-based detection approach achieved high accuracy using only the five-tuple features.
To build upon this study, potential extensions of this research include:
- Dynamic Analysis Integration: Incorporate the analysis of dynamic indicators alongside the static indicators to create a hybrid detection approach.
- Feature Engineering Enhancement: Investigate more sophisticated feature selection or feature ranking techniques to identify the most relevant indicators for ML/DL techniques.
- Real-time Detection and Response: Optimize the detection approach for real-time operation to allow an immediate response to emerging threats.
By continuing to explore innovative approaches to detect and mitigate P2P botnets, we can enhance the security and resilience of internet-connected systems and protect against these persistent threats.