IoT Network Intrusion Detection Analysis
INFO 526 - Fall 2024 - Project 01
Abstract
The rapid adoption of Internet of Things (IoT) devices has brought unprecedented connectivity but also significant cybersecurity challenges. This project analyzes IoT network traffic using the RT-IoT2022 dataset to uncover patterns in attack behavior, focusing on protocol vulnerabilities, service exploitation, and resource usage. The dataset captures traffic from various IoT devices under normal and attack conditions, including DoS, DDoS, and Brute-Force attacks. Exploratory data analysis (EDA) and visualizations identified distinct patterns in bandwidth, inter-arrival times, and packet metrics that differentiate normal from malicious traffic. Key findings revealed TCP and UDP as the most exploited protocols, with services like MQTT frequently targeted. Correlation analysis and interactive animations further highlighted variable relationships and temporal trends. These insights contribute to the development of robust intrusion detection systems tailored to IoT networks.
Introduction and Data
Introduction
The Internet of Things (IoT) has changed modern technology by connecting devices and automating tasks in areas like healthcare, smart homes, and industry. However, the growing use of IoT has also made these networks a target for cyberattacks. IoT networks are especially at risk because devices often have limited computing power, lack strong security measures, and run on heterogeneous network setups. To tackle these challenges, it is important to understand how these networks behave and to identify patterns linked to malicious activity.
This project focuses on studying and visualizing IoT network traffic using the RT-IoT2022 dataset. This dataset is specifically designed to capture real-world IoT network behaviors. It includes data from devices like ThingSpeak-LED, Wipro Bulb, and MQTT-Temp, as well as attack simulations such as Brute-Force SSH attacks, DDoS attacks using tools like Hping and Slowloris, and Nmap scans. The dataset provides a detailed view of both normal and malicious activities, recorded using the Zeek network monitoring tool and analyzed through the Flowmeter plugin for detailed traffic information.
The primary goal of this study is to uncover unique patterns and anomalies in metrics such as protocol usage, bandwidth, payload size, and inter-arrival times that distinguish various attack types. This analysis will address the following research questions:
Protocol Vulnerabilities: Which protocols, services, and port numbers are most exploited during different types of attacks?
Attack Characteristics: How do attacks differ in terms of bandwidth, payload, inter-arrival time, and flow patterns? Are these distinctions reliable for detecting specific attack types?
Dimensional Interactions: Which combinations of variables most significantly impact network behavior during attacks?
By answering these questions, this project aims to contribute to the development of robust intrusion detection systems (IDS) for IoT networks, enhancing their ability to identify and mitigate cyber threats.
This work is based on the RT-IoT2022 dataset, sourced from the UCI Machine Learning Repository, and is aligned with the methodologies outlined in the foundational paper, "Quantized Autoencoder (QAE) Intrusion Detection System for Anomaly Detection in Resource-Constrained IoT Devices Using RT-IoT2022 Dataset" by B. S. Sharmila and Rohini Nagapadma, published in Cybersecurity (2023).
Dataset Description
The RT-IoT2022 dataset comprises 123,117 rows and 77 columns of network traffic flow data. It represents both normal and malicious activity across various IoT devices, providing insights into real-world IoT network behaviors.
The dataset was collected using a real-time IoT infrastructure that included devices such as ThingSpeak-LED, Wipro Bulb, MQTT-Temp, and Amazon Alexa. To simulate real-world scenarios, both normal network traffic and various attack patterns were recorded. Attack scenarios included Brute-Force SSH attacks (executed using Metasploit), DDoS attacks (generated using tools like Hping and Slowloris), and network scans (conducted with Nmap).
Network traffic was captured using the Zeek network monitoring tool, which recorded bidirectional traffic flows. The data was then processed using the Flowmeter plugin, which extracted key flow attributes and converted the raw packet captures into a structured tabular format. This preprocessing ensured that the dataset provides a comprehensive view of network behavior, enabling in-depth analysis of IoT cybersecurity threats.
Key features of the dataset include:
Network Identifiers: Columns like id.orig_p (originating port) and id.resp_p (responding port) capture network communication endpoints.
Protocol and Service Information: The proto column records the communication protocol, while the service column indicates services like MQTT.
Traffic Statistics: Metrics such as packet counts (fwd_pkts_tot, bwd_pkts_tot), rates (fwd_pkts_per_sec), and header sizes provide insights into traffic behavior.
Flow Characteristics: Features like flow_duration and flags (flow_FIN_flag_count, flow_ACK_flag_count, etc.) describe the nature of network communication.
Attack Type: The Attack_type column labels each flow's traffic class, covering both normal patterns (e.g., MQTT_Publish) and attacks (e.g., DDoS Slowloris).
Payload Information: Columns like fwd_pkts_payload.avg and fwd_pkts_payload.max capture packet size details.
Bandwidth Information: Variables such as fwd_pkts_tot and bwd_pkts_per_sec help analyze data flow during attacks.
Inter-arrival Time Information: Columns like fwd_iat.avg and flow_iat.min measure the time gap between packets.
Idle vs. In-use Information: Metrics like active.avg and idle.avg indicate whether IoT devices are actively transmitting data.
The RT-IoT2022 dataset is already cleaned and ready for analysis: it has no missing values, and its creators handled the necessary preprocessing, such as removing irrelevant features and converting categorical data into numerical form. No additional cleaning steps were needed.
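The completeness claim above is easy to check before starting the analysis. The following is a minimal sketch using only Python's standard library. The inline SAMPLE_CSV rows are made-up illustrative values (not taken from the dataset), and the helper name summarize_rows is hypothetical. With the real file, one would pass an open file handle for the downloaded CSV instead of the in-memory sample.

```python
import csv
import io

# Tiny made-up sample that reuses column names described above.
# The values are illustrative only and are NOT real dataset rows.
SAMPLE_CSV = """id.orig_p,id.resp_p,proto,service,flow_duration,Attack_type
38667,1883,tcp,mqtt,32.01,MQTT_Publish
51143,1883,tcp,mqtt,31.88,MQTT_Publish
2049,21,tcp,-,0.000004,DOS_SYN_Hping
"""


def summarize_rows(handle):
    """Return (row_count, column_names, missing_per_column) for a CSV file handle."""
    reader = csv.DictReader(handle)
    columns = list(reader.fieldnames or [])
    missing = {name: 0 for name in columns}
    row_count = 0
    for row in reader:
        row_count += 1
        for name in columns:
            value = row.get(name)
            # Empty cells, or short rows that DictReader pads with None, count as missing.
            if value is None or value.strip() == "":
                missing[name] += 1
    return row_count, columns, missing


row_count, columns, missing = summarize_rows(io.StringIO(SAMPLE_CSV))
print(f"rows={row_count}, columns={len(columns)}")
print("columns with missing values:", [c for c, n in missing.items() if n > 0])
```

Run against the full file, the row and column counts should match the 123,117 rows and 77 columns reported above, and the list of columns with missing values should be empty if the dataset is as complete as described.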
Ethical Considerations
The dataset does not include sensitive information, ensuring minimal ethical concerns. It is suitable for academic research and model development for intrusion detection systems.
Exploratory Data Analysis (EDA)
Overview
The EDA visually explores the RT-IoT2022 dataset to uncover patterns and relationships relevant to IoT attack behaviors. Visualizations highlight the distribution of attack types, protocol and service usage, and key metrics like bandwidth and inter-arrival times.
Key Findings
Attack Distribution: A pie chart shows that DOS_SYN_Hping accounts for 76.9% of all attacks, making it by far the most prevalent class and the dominant threat in the captured traffic.
Protocol Usage: A ranked bar chart reveals TCP and UDP as the most exploited protocols across different attack types, indicating potential areas of vulnerability.
Service Distribution: Stacked bar charts highlight MQTT and HTTP as common targets for attacks, demonstrating service-specific trends.
Variable Relationships: Heatmaps reveal strong correlations between forward and backward packet totals, while boxplots highlight variations in payload size and inter-arrival times, showing attack-specific traffic patterns.
Visual Focus
The use of clear, engaging visualizations ensures that insights are quickly understood, emphasizing the relationships between variables and their relevance to IoT network behaviors. This approach minimizes reliance on extensive text and maximizes reader engagement.
Methodology
Protocol, Service, and Port Analysis (Question 1)
This analysis examined the distribution and characteristics of network attacks across protocols, services, and ports to identify trends and vulnerabilities. The dataset was grouped by attack types, services, and protocols, and logarithmic scaling was applied to handle wide-ranging attack counts for improved interpretability.
Visualization Techniques:
Pie Chart: Illustrated the proportional distribution of attack types, highlighting DOS_SYN_Hping as the dominant attack (76.9%).
Figure: Attack Type Distribution
Ranked Bar Chart: Grouped attack types by protocols (TCP, UDP, ICMP) to demonstrate protocol dominance across attacks.
Figure: Ranked Bar Chart of Protocols
Stacked Bar Chart: Showed service distributions within each attack type for a clear view of overall and individual contributions.
Figure: Service Distribution by Attack Type
Tools: Python libraries, including Matplotlib and Seaborn, enabled precise, customizable visualizations to effectively address research questions on protocol and service vulnerabilities.
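The grouping and logarithmic scaling described in this section can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the flows list holds made-up (Attack_type, proto) pairs with invented counts, the "Nmap scan" label is a placeholder, and the helper names attack_shares and protocol_counts are hypothetical. It uses only the standard library so that it runs without the dataset.

```python
import math
from collections import Counter

# Made-up (Attack_type, proto) pairs standing in for dataset rows.
# The counts are invented for illustration and do not reflect the real data.
flows = (
    [("DOS_SYN_Hping", "tcp")] * 1000
    + [("MQTT_Publish", "tcp")] * 40
    + [("Nmap scan", "udp")] * 25
    + [("Nmap scan", "icmp")] * 5
)


def attack_shares(pairs):
    """Return (label, count, percent) tuples ranked from most to least frequent."""
    counts = Counter(label for label, _ in pairs)
    total = sum(counts.values())
    return [(label, n, 100.0 * n / total) for label, n in counts.most_common()]


def protocol_counts(pairs):
    """Return row counts keyed by (attack label, protocol)."""
    return Counter(pairs)


for label, n, share in attack_shares(flows):
    # log10 compresses the wide range of counts so small classes stay visible.
    print(f"{label:16s} n={n:5d} share={share:5.1f}% log10(n)={math.log10(n):.2f}")

for (label, proto), n in sorted(protocol_counts(flows).items()):
    print(f"{label:16s} {proto:5s} {n}")
```

With Matplotlib, a log-scaled axis (for example ax.set_yscale("log")) achieves the same compression at plot time without transforming the counts themselves.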
Advanced Analysis Using Animated Visualizations (Question 2)
This section focused on analyzing attack characteristics and bandwidth usage through interactive animations. Logarithmic transformations of bandwidth and flow duration ensured better interpretability of data spanning multiple scales.
Visualization Techniques:
Ridge Plot Animation: Highlighted bandwidth distribution variations across attack types dynamically.
Scatter Plot Animation: Explored payload vs. flow duration relationships to uncover unique attack patterns.
Bar Chart Race Animation: Showcased bandwidth dominance by attack types over time.
Time-Series Animation: Captured temporal bandwidth variations during attacks.
Tools: R Shiny dashboards enhanced interactivity, while ggplot2 and gganimate provided dynamic, engaging animations for deeper insights into attack behaviors.
Justification: The animations effectively communicated temporal and behavioral patterns, aligning with research goals to analyze attack characteristics and resource usage trends.
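The logarithmic transformation mentioned above can be illustrated with a short sketch. The exact transform used in the project is not stated, so log10(1 + x) is an assumption here, chosen because it stays defined when a rate or duration is zero. The sample values are made up, and the helper name log_scale is hypothetical.

```python
import math
from statistics import median

# Made-up fwd_pkts_per_sec values per traffic class, for illustration only.
samples = {
    "DOS_SYN_Hping": [250000.0, 500000.0, 1000000.0],
    "DDoS Slowloris": [0.0, 0.5, 12.0],
    "MQTT_Publish": [0.3, 0.3, 0.4],
}


def log_scale(value):
    """Return log10(1 + value), which is defined at zero and compresses large magnitudes."""
    if value < 0:
        raise ValueError("rates and durations are expected to be non-negative")
    return math.log10(1.0 + value)


# Median of the transformed values per class: the quantity a ridge plot centers on.
medians = {
    label: median(log_scale(v) for v in values)
    for label, values in samples.items()
}

for label, value in medians.items():
    print(f"{label:16s} median log10(1 + rate) = {value:.3f}")
```

On the raw scale the flood traffic would flatten every other class against the axis; after the transform, classes that differ by several orders of magnitude can share one plot.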
Dimensional Interactions in Attack Behavior (Question 3)
This analysis investigated variable interactions influencing IoT network behavior during attacks, focusing on metrics like payload, inter-arrival times, and packet flows. Correlation analysis identified significant relationships among key variables.
Visualization Techniques:
Correlation Heatmap: Highlighted variable relationships for attack types, such as strong correlations between forward and backward packet totals, providing insights into interaction patterns.
Boxplot - Forward Packets Payload: Showed variations in average packet payload sizes across attacks, revealing differences in data transfer behavior.
Boxplot - Forward Inter-Arrival Times: Displayed inter-arrival time distributions, capturing temporal variations between different attack types.
Boxplot - Flow Packets per Second: Illustrated bandwidth usage by analyzing flow packets per second, emphasizing resource variations across attacks.
Tools: ggplot2 enabled detailed visualizations, while correlation matrices provided insights into critical variable interactions.
Justification: The chosen visual methods effectively addressed how variable combinations impact network behavior and temporal attack patterns, aiding in intrusion detection system improvements.
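The correlation matrix behind a heatmap like the one described above can be sketched with the Pearson coefficient, r = cov(x, y) / (std(x) * std(y)). The source does not say which coefficient was used, so Pearson is an assumption. The column values below are made up for illustration, and the helper name pearson is hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values each")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for three columns named in the dataset description.
columns = {
    "fwd_pkts_tot": [2, 4, 6, 8, 10],
    "bwd_pkts_tot": [1, 3, 5, 9, 11],
    "fwd_iat.avg": [9.0, 7.5, 4.0, 2.5, 1.0],
}

names = list(columns)
matrix = {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

for a in names:
    print(f"{a:14s}", "  ".join(f"{matrix[a][b]:+.2f}" for b in names))
```

Each cell of the heatmap corresponds to one entry of this matrix; values near +1 or -1 mark the variable pairs worth examining further.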
Results
Findings from Protocol Vulnerabilities (Question 1)
Protocol Dominance: TCP and UDP emerged as the most exploited protocols across attack types, as demonstrated by the ranked bar chart. DOS_SYN_Hping dominated with 76.9% of all attacks, clearly visible in the pie chart.
Service Distribution: The stacked bar chart revealed significant usage of MQTT and HTTP services in attacks, with clear distinctions across various attack types, supporting insights into protocol vulnerabilities.
Findings from Attack Characteristics (Question 2)
Bandwidth Distribution: Ridge plot animations showed DOS_SYN_Hping attacks had the highest bandwidth usage, with other attacks like DDoS Slowloris showing distinct but lower patterns.
Temporal Patterns: Scatter plot and time-series animations illustrated how flow duration and bandwidth vary dynamically for each attack type, uncovering resource consumption trends over time.
Findings from Dimensional Interactions (Question 3)
Variable Relationships: Correlation heatmaps identified strong positive relationships, particularly between forward and backward packet totals, emphasizing their impact on network behavior.
Resource Utilization: Boxplots highlighted that ARP poisoning and DDoS Slowloris showed higher forward inter-arrival times and bandwidth variations, pointing to their unique network signatures.
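The boxplot comparisons above rest on quartile summaries computed per attack type. The following is a minimal sketch of that summary, assuming the common 1.5 x IQR rule for flagging outliers. The inter-arrival values are made up for illustration and the helper name box_stats is hypothetical.

```python
from statistics import quantiles


def box_stats(values):
    """Return the five-number summary plus points outside the 1.5 * IQR fences."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "min": data[0],
        "q1": q1,
        "median": med,
        "q3": q3,
        "max": data[-1],
        "outliers": outliers,
    }


# Made-up fwd_iat.avg values per attack type, for illustration only.
iat_by_attack = {
    "DOS_SYN_Hping": [0.1, 0.2, 0.2, 0.3, 0.4],
    "DDoS Slowloris": [500.0, 800.0, 1200.0, 1500.0, 9000.0],
}

for label, values in iat_by_attack.items():
    stats = box_stats(values)
    print(label, {k: v for k, v in stats.items() if k != "outliers"}, "outliers:", stats["outliers"])
```

Classes whose interquartile ranges do not overlap on a given metric are the ones most likely to be separable by that metric alone.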
Discussion
Summary of Insights
The analysis provided critical insights into IoT network attack patterns. Key findings revealed that TCP and UDP are the most exploited protocols, with attacks like DOS_SYN_Hping dominating bandwidth usage and network traffic. Service-level patterns showed MQTT and HTTP as frequent targets, reflecting vulnerabilities in widely used IoT protocols. Temporal and behavioral variations, captured through visualizations, highlighted resource utilization trends and attack-specific characteristics such as bandwidth, payload, and inter-arrival times.
Implications
These findings emphasize the need for targeted security measures focusing on frequently exploited protocols and services. The distinct patterns identified in inter-arrival times and packet behaviors can help enhance intrusion detection systems, enabling them to recognize attack signatures more effectively.
Limitations
The dataset, while comprehensive, is limited to specific IoT devices and attack types, which may not fully represent broader IoT network environments. Additionally, while logarithmic scaling and animations improved interpretability, they may obscure subtle variations in smaller attack patterns. Because the analysis relies on a previously recorded capture rather than live traffic, and on a small set of device types, the findings may not generalize.
Suggestions for Improvement
Future analyses could benefit from integrating more diverse datasets to capture a wider range of devices and attack types. Improving real-time data collection methods and incorporating additional metrics, such as encrypted traffic behavior, could enhance analysis depth. Exploring advanced machine learning techniques for automated anomaly detection based on these visualized patterns would also be valuable.
Future Directions
Future work should focus on real-time visualization dashboards to monitor evolving attack patterns dynamically. Expanding the scope to include encrypted protocols and large-scale IoT ecosystems could provide more comprehensive insights. Additionally, integrating these findings with machine learning-based models could lead to adaptive intrusion detection systems tailored to complex IoT environments.