IoT Network Intrusion Detection Analysis

Proposal

Analysis and visualisation of various types of attacks in an IoT network

Author: ViZZards

Affiliation: School of Information, University of Arizona

# Install pacman if it is not already available
if (!require(pacman)) 
  install.packages("pacman")

# Install (if needed) and load the packages used in this proposal
pacman::p_load(tidyverse,
               glue,
               scales,
               ggplot2,
               countdown,
               grid,
               readr,
               dplyr) 

Dataset

data <- read_csv("data/data.csv")

glimpse(data)
Rows: 123,117
Columns: 77
$ id.orig_p                <dbl> 38667, 51143, 44761, 60893, 51087, 48579, 540…
$ id.resp_p                <dbl> 1883, 1883, 1883, 1883, 1883, 1883, 1883, 188…
$ proto                    <chr> "tcp", "tcp", "tcp", "tcp", "tcp", "tcp", "tc…
$ service                  <chr> "mqtt", "mqtt", "mqtt", "mqtt", "mqtt", "mqtt…
$ flow_duration            <dbl> 32.01160, 31.88358, 32.12405, 31.96106, 31.90…
$ fwd_pkts_tot             <dbl> 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, 9, …
$ bwd_pkts_tot             <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, …
$ fwd_data_pkts_tot        <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, …
$ bwd_data_pkts_tot        <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, …
$ fwd_pkts_per_sec         <dbl> 0.281148, 0.282277, 0.280164, 0.281593, 0.282…
$ bwd_pkts_per_sec         <dbl> 0.156193, 0.156821, 0.155647, 0.156440, 0.156…
$ flow_pkts_per_sec        <dbl> 0.437341, 0.439097, 0.435811, 0.438033, 0.438…
$ down_up_ratio            <dbl> 0.555556, 0.555556, 0.555556, 0.555556, 0.555…
$ fwd_header_size_tot      <dbl> 296, 296, 296, 296, 296, 296, 296, 296, 296, …
$ fwd_header_size_min      <dbl> 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 3…
$ fwd_header_size_max      <dbl> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 4…
$ bwd_header_size_tot      <dbl> 168, 168, 168, 168, 168, 168, 168, 168, 168, …
$ bwd_header_size_min      <dbl> 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 32, 3…
$ bwd_header_size_max      <dbl> 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 40, 4…
$ flow_FIN_flag_count      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ flow_SYN_flag_count      <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
$ flow_RST_flag_count      <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ fwd_PSH_flag_count       <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, …
$ bwd_PSH_flag_count       <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, …
$ flow_ACK_flag_count      <dbl> 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 13, 1…
$ fwd_URG_flag_count       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ fwd_pkts_payload.min     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ fwd_pkts_payload.max     <dbl> 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 33, 3…
$ fwd_pkts_payload.avg     <dbl> 8.444444, 8.444444, 8.222222, 8.222222, 8.444…
$ fwd_pkts_payload.std     <dbl> 13.11594, 13.11594, 12.85280, 12.85280, 13.11…
$ bwd_pkts_payload.min     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ bwd_pkts_payload.max     <dbl> 23, 23, 21, 21, 23, 23, 23, 23, 23, 23, 23, 2…
$ bwd_pkts_payload.tot     <dbl> 32, 32, 30, 30, 32, 32, 32, 32, 32, 32, 32, 3…
$ bwd_pkts_payload.avg     <dbl> 6.4, 6.4, 6.0, 6.0, 6.4, 6.4, 6.4, 6.4, 6.4, …
$ bwd_pkts_payload.std     <dbl> 9.555103, 9.555103, 8.689074, 8.689074, 9.555…
$ flow_pkts_payload.avg    <dbl> 7.714286, 7.714286, 7.428571, 7.428571, 7.714…
$ flow_pkts_payload.std    <dbl> 11.61848, 11.61848, 11.22987, 11.22987, 11.61…
$ fwd_iat.min              <dbl> 761.9858, 247.0016, 283.9565, 288.9633, 387.9…
$ fwd_iat.max              <dbl> 29729183, 29855277, 29842149, 29913775, 29814…
$ fwd_iat.tot              <dbl> 32011598, 31883584, 32124053, 31961063, 31902…
$ fwd_iat.avg              <dbl> 4001450, 3985448, 4015507, 3995133, 3987795, …
$ fwd_iat.std              <dbl> 10403074, 10463456, 10442378, 10482528, 10447…
$ bwd_iat.min              <dbl> 4438.87711, 4214.04839, 2456.90346, 3933.9065…
$ bwd_iat.max              <dbl> 1511694, 1576436, 1476049, 1551892, 1632083, …
$ bwd_iat.tot              <dbl> 2026391, 1876261, 2013770, 1883784, 1935984, …
$ bwd_iat.avg              <dbl> 506597.8, 469065.2, 503442.5, 470946.0, 48399…
$ bwd_iat.std              <dbl> 680406.1, 741351.7, 660344.4, 724569.3, 76854…
$ flow_iat.min             <dbl> 761.98578, 247.00165, 283.95653, 288.96332, 3…
$ flow_iat.max             <dbl> 29729183, 29855277, 29842149, 29913775, 29814…
$ flow_iat.tot             <dbl> 32011598, 31883584, 32124053, 31961063, 31902…
$ flow_iat.avg             <dbl> 2462431, 2452583, 2471081, 2458543, 2454028, …
$ flow_iat.std             <dbl> 8199747, 8242459, 8230593, 8257786, 8230584, …
$ payload_bytes_per_second <dbl> 3.373777, 3.387323, 3.237450, 3.253959, 3.385…
$ fwd_subflow_pkts         <dbl> 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, …
$ bwd_subflow_pkts         <dbl> 1.666667, 1.666667, 1.666667, 1.666667, 1.666…
$ fwd_subflow_bytes        <dbl> 25.33333, 25.33333, 24.66667, 24.66667, 25.33…
$ bwd_subflow_bytes        <dbl> 10.66667, 10.66667, 10.00000, 10.00000, 10.66…
$ fwd_bulk_bytes           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ bwd_bulk_bytes           <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ fwd_bulk_packets         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ bwd_bulk_packets         <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ fwd_bulk_rate            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ bwd_bulk_rate            <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ active.min               <dbl> 2282415, 2028307, 2281904, 2047288, 2087657, …
$ active.max               <dbl> 2282415, 2028307, 2281904, 2047288, 2087657, …
$ active.tot               <dbl> 2282415, 2028307, 2281904, 2047288, 2087657, …
$ active.avg               <dbl> 2282415, 2028307, 2281904, 2047288, 2087657, …
$ active.std               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ idle.min                 <dbl> 29729183, 29855277, 29842149, 29913775, 29814…
$ idle.max                 <dbl> 29729183, 29855277, 29842149, 29913775, 29814…
$ idle.tot                 <dbl> 29729183, 29855277, 29842149, 29913775, 29814…
$ idle.avg                 <dbl> 29729183, 29855277, 29842149, 29913775, 29814…
$ idle.std                 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ fwd_init_window_size     <dbl> 64240, 64240, 64240, 64240, 64240, 64240, 642…
$ bwd_init_window_size     <dbl> 26847, 26847, 26847, 26847, 26847, 26847, 268…
$ fwd_last_window_size     <dbl> 502, 502, 502, 502, 502, 502, 502, 502, 502, …
$ Attack_type              <chr> "MQTT_Publish", "MQTT_Publish", "MQTT_Publish…
summary(data)
   id.orig_p       id.resp_p        proto             service         
 Min.   :    0   Min.   :    0   Length:123117      Length:123117     
 1st Qu.:17702   1st Qu.:   21   Class :character   Class :character  
 Median :37221   Median :   21   Mode  :character   Mode  :character  
 Mean   :34639   Mean   : 1014                                        
 3rd Qu.:50971   3rd Qu.:   21                                        
 Max.   :65535   Max.   :65389                                        
 flow_duration       fwd_pkts_tot       bwd_pkts_tot      fwd_data_pkts_tot 
 Min.   :    0.00   Min.   :   0.000   Min.   :    0.00   Min.   :   0.000  
 1st Qu.:    0.00   1st Qu.:   1.000   1st Qu.:    1.00   1st Qu.:   1.000  
 Median :    0.00   Median :   1.000   Median :    1.00   Median :   1.000  
 Mean   :    3.81   Mean   :   2.269   Mean   :    1.91   Mean   :   1.471  
 3rd Qu.:    0.00   3rd Qu.:   1.000   3rd Qu.:    1.00   3rd Qu.:   1.000  
 Max.   :21728.34   Max.   :4345.000   Max.   :10112.00   Max.   :4345.000  
 bwd_data_pkts_tot  fwd_pkts_per_sec    bwd_pkts_per_sec    flow_pkts_per_sec  
 Min.   :    0.00   Min.   :      0.0   Min.   :      0.0   Min.   :      0.0  
 1st Qu.:    0.00   1st Qu.:     74.5   1st Qu.:     72.9   1st Qu.:    149.1  
 Median :    0.00   Median : 246723.8   Median : 246723.8   Median : 493447.5  
 Mean   :    0.82   Mean   : 351806.3   Mean   : 351762.0   Mean   : 703568.3  
 3rd Qu.:    0.00   3rd Qu.: 524288.0   3rd Qu.: 524288.0   3rd Qu.:1048576.0  
 Max.   :10105.00   Max.   :1048576.0   Max.   :1048576.0   Max.   :2097152.0  
 down_up_ratio    fwd_header_size_tot fwd_header_size_min fwd_header_size_max
 Min.   :0.0000   Min.   :    0.00    Min.   : 0.00       Min.   : 0.00      
 1st Qu.:1.0000   1st Qu.:   20.00    1st Qu.:20.00       1st Qu.:20.00      
 Median :1.0000   Median :   20.00    Median :20.00       Median :20.00      
 Mean   :0.8546   Mean   :   53.89    Mean   :19.78       Mean   :20.65      
 3rd Qu.:1.0000   3rd Qu.:   20.00    3rd Qu.:20.00       3rd Qu.:20.00      
 Max.   :6.0879   Max.   :69296.00    Max.   :44.00       Max.   :52.00      
 bwd_header_size_tot bwd_header_size_min bwd_header_size_max
 Min.   :     0.0    Min.   : 0.0        Min.   : 0.00      
 1st Qu.:    20.0    1st Qu.:20.0        1st Qu.:20.00      
 Median :    20.0    Median :20.0        Median :20.00      
 Mean   :    46.6    Mean   :17.7        Mean   :18.43      
 3rd Qu.:    20.0    3rd Qu.:20.0        3rd Qu.:20.00      
 Max.   :323592.0    Max.   :40.0        Max.   :44.00      
 flow_FIN_flag_count flow_SYN_flag_count flow_RST_flag_count fwd_PSH_flag_count
 Min.   : 0.0000     Min.   :0.0000      Min.   : 0.0000     Min.   :  0.0000  
 1st Qu.: 0.0000     1st Qu.:1.0000      1st Qu.: 1.0000     1st Qu.:  0.0000  
 Median : 0.0000     Median :1.0000      Median : 1.0000     Median :  0.0000  
 Mean   : 0.1156     Mean   :0.9509      Mean   : 0.7965     Mean   :  0.3513  
 3rd Qu.: 0.0000     3rd Qu.:1.0000      3rd Qu.: 1.0000     3rd Qu.:  0.0000  
 Max.   :10.0000     Max.   :8.0000      Max.   :10.0000     Max.   :864.0000  
 bwd_PSH_flag_count  flow_ACK_flag_count fwd_URG_flag_count
 Min.   :   0.0000   Min.   :    0.000   Min.   :0.00000   
 1st Qu.:   0.0000   1st Qu.:    1.000   1st Qu.:0.00000   
 Median :   0.0000   Median :    1.000   Median :0.00000   
 Mean   :   0.3936   Mean   :    2.678   Mean   :0.01629   
 3rd Qu.:   0.0000   3rd Qu.:    1.000   3rd Qu.:0.00000   
 Max.   :1446.0000   Max.   :11772.000   Max.   :1.00000   
 fwd_pkts_payload.min fwd_pkts_payload.max fwd_pkts_payload.avg
 Min.   :   0.00      Min.   :   0.0       Min.   :   0.0      
 1st Qu.: 120.00      1st Qu.: 120.0       1st Qu.: 120.0      
 Median : 120.00      Median : 120.0       Median : 120.0      
 Mean   :  96.26      Mean   : 120.7       Mean   : 100.5      
 3rd Qu.: 120.00      3rd Qu.: 120.0       3rd Qu.: 120.0      
 Max.   :1097.00      Max.   :1420.0       Max.   :1319.4      
 fwd_pkts_payload.std bwd_pkts_payload.min bwd_pkts_payload.max
 Min.   :  0.000      Min.   :   0.000     Min.   :   0.00     
 1st Qu.:  0.000      1st Qu.:   0.000     1st Qu.:   0.00     
 Median :  0.000      Median :   0.000     Median :   0.00     
 Mean   :  8.108      Mean   :   3.817     Mean   :  52.41     
 3rd Qu.:  0.000      3rd Qu.:   0.000     3rd Qu.:   0.00     
 Max.   :731.579      Max.   :1357.000     Max.   :5124.00     
 bwd_pkts_payload.tot bwd_pkts_payload.avg bwd_pkts_payload.std
 Min.   :       0     Min.   :   0.00      Min.   :   0.00     
 1st Qu.:       0     1st Qu.:   0.00      1st Qu.:   0.00     
 Median :       0     Median :   0.00      Median :   0.00     
 Mean   :     513     Mean   :  18.79      Mean   :  20.55     
 3rd Qu.:       0     3rd Qu.:   0.00      3rd Qu.:   0.00     
 Max.   :13610415     Max.   :1457.05      Max.   :1506.01     
 flow_pkts_payload.avg flow_pkts_payload.std  fwd_iat.min       
 Min.   :   0.00       Min.   :  0.00        Min.   :        0  
 1st Qu.:  60.00       1st Qu.: 50.22        1st Qu.:        0  
 Median :  60.00       Median : 84.85        Median :        0  
 Mean   :  65.01       Mean   : 76.04        Mean   :     8843  
 3rd Qu.:  60.00       3rd Qu.: 84.85        3rd Qu.:        0  
 Max.   :1156.08       Max.   :924.65        Max.   :300252571  
  fwd_iat.max         fwd_iat.tot         fwd_iat.avg       
 Min.   :        0   Min.   :0.000e+00   Min.   :        0  
 1st Qu.:        0   1st Qu.:0.000e+00   1st Qu.:        0  
 Median :        0   Median :0.000e+00   Median :        0  
 Mean   :  1721566   Mean   :3.780e+06   Mean   :   237357  
 3rd Qu.:        0   3rd Qu.:0.000e+00   3rd Qu.:        0  
 Max.   :300252571   Max.   :2.173e+10   Max.   :300252571  
  fwd_iat.std         bwd_iat.min        bwd_iat.max         bwd_iat.tot       
 Min.   :        0   Min.   :       0   Min.   :        0   Min.   :0.000e+00  
 1st Qu.:        0   1st Qu.:       0   1st Qu.:        0   1st Qu.:0.000e+00  
 Median :        0   Median :       0   Median :        0   Median :0.000e+00  
 Mean   :   577557   Mean   :    3765   Mean   :   407727   Mean   :1.780e+06  
 3rd Qu.:        0   3rd Qu.:       0   3rd Qu.:        0   3rd Qu.:0.000e+00  
 Max.   :212296532   Max.   :43196220   Max.   :300028179   Max.   :1.876e+10  
  bwd_iat.avg         bwd_iat.std         flow_iat.min       flow_iat.max      
 Min.   :        0   Min.   :        0   Min.   :       0   Min.   :        0  
 1st Qu.:        0   1st Qu.:        0   1st Qu.:       1   1st Qu.:        1  
 Median :        0   Median :        0   Median :       4   Median :        4  
 Mean   :    87652   Mean   :   147480   Mean   :    4283   Mean   :  1725999  
 3rd Qu.:        0   3rd Qu.:        0   3rd Qu.:       5   3rd Qu.:        5  
 Max.   :150148934   Max.   :211961260   Max.   :43510042   Max.   :299999988  
  flow_iat.tot        flow_iat.avg       flow_iat.std      
 Min.   :0.000e+00   Min.   :       0   Min.   :        0  
 1st Qu.:1.000e+00   1st Qu.:       1   1st Qu.:        0  
 Median :4.000e+00   Median :       4   Median :        0  
 Mean   :3.811e+06   Mean   :  139654   Mean   :   450136  
 3rd Qu.:5.000e+00   3rd Qu.:       5   3rd Qu.:        0  
 Max.   :2.173e+10   Max.   :72835758   Max.   :134122073  
 payload_bytes_per_second fwd_subflow_pkts  bwd_subflow_pkts  
 Min.   :        0        Min.   :  0.000   Min.   :   0.000  
 1st Qu.:     2581        1st Qu.:  1.000   1st Qu.:   1.000  
 Median : 29606852        Median :  1.000   Median :   1.000  
 Mean   : 41053452        Mean   :  1.552   Mean   :   1.338  
 3rd Qu.: 55924053        3rd Qu.:  1.000   3rd Qu.:   1.000  
 Max.   :125829120        Max.   :276.833   Max.   :1685.333  
 fwd_subflow_bytes bwd_subflow_bytes   fwd_bulk_bytes     bwd_bulk_bytes   
 Min.   :    0.0   Min.   :      0.0   Min.   :     0.0   Min.   :      0  
 1st Qu.:  120.0   1st Qu.:      0.0   1st Qu.:     0.0   1st Qu.:      0  
 Median :  120.0   Median :      0.0   Median :     0.0   Median :      0  
 Mean   :  136.5   Mean   :    217.5   Mean   :    19.2   Mean   :    155  
 3rd Qu.:  120.0   3rd Qu.:      0.0   3rd Qu.:     0.0   3rd Qu.:      0  
 Max.   :52067.8   Max.   :2268402.5   Max.   :465095.0   Max.   :6805208  
 fwd_bulk_packets   bwd_bulk_packets   fwd_bulk_rate      bwd_bulk_rate     
 Min.   :  0.0000   Min.   :   0.000   Min.   :       0   Min.   :       0  
 1st Qu.:  0.0000   1st Qu.:   0.000   1st Qu.:       0   1st Qu.:       0  
 Median :  0.0000   Median :   0.000   Median :       0   Median :       0  
 Mean   :  0.0241   Mean   :   0.131   Mean   :    3836   Mean   :   48415  
 3rd Qu.:  0.0000   3rd Qu.:   0.000   3rd Qu.:       0   3rd Qu.:       0  
 Max.   :343.0000   Max.   :5052.500   Max.   :46336283   Max.   :28300874  
   active.min          active.max          active.tot       
 Min.   :        0   Min.   :        0   Min.   :0.000e+00  
 1st Qu.:        1   1st Qu.:        1   1st Qu.:1.000e+00  
 Median :        4   Median :        4   Median :4.000e+00  
 Mean   :   133155   Mean   :   178590   Mean   :2.929e+05  
 3rd Qu.:        5   3rd Qu.:        5   3rd Qu.:5.000e+00  
 Max.   :312507974   Max.   :848097909   Max.   :2.945e+09  
   active.avg          active.std           idle.min        
 Min.   :        0   Min.   :        0   Min.   :        0  
 1st Qu.:        1   1st Qu.:        0   1st Qu.:        0  
 Median :        4   Median :        0   Median :        0  
 Mean   :   148135   Mean   :    23536   Mean   :  1616655  
 3rd Qu.:        5   3rd Qu.:        0   3rd Qu.:        0  
 Max.   :437493062   Max.   :477486236   Max.   :299999988  
    idle.max            idle.tot            idle.avg        
 Min.   :        0   Min.   :0.000e+00   Min.   :        0  
 1st Qu.:        0   1st Qu.:0.000e+00   1st Qu.:        0  
 Median :        0   Median :0.000e+00   Median :        0  
 Mean   :  1701956   Mean   :3.518e+06   Mean   :  1664985  
 3rd Qu.:        0   3rd Qu.:0.000e+00   3rd Qu.:        0  
 Max.   :299999988   Max.   :2.097e+10   Max.   :299999988  
    idle.std         fwd_init_window_size bwd_init_window_size
 Min.   :        0   Min.   :    0        Min.   :    0       
 1st Qu.:        0   1st Qu.:   64        1st Qu.:    0       
 Median :        0   Median :   64        Median :    0       
 Mean   :    45502   Mean   : 6119        Mean   : 2740       
 3rd Qu.:        0   3rd Qu.:   64        3rd Qu.:    0       
 Max.   :120802871   Max.   :65535        Max.   :65535       
 fwd_last_window_size Attack_type       
 Min.   :    0.0      Length:123117     
 1st Qu.:   64.0      Class :character  
 Median :   64.0      Mode  :character  
 Mean   :  751.6                        
 3rd Qu.:   64.0                        
 Max.   :65535.0                        
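
For the character column Attack_type, summary() reports only Length, Class and Mode, so it says nothing about how the classes are distributed. A minimal sketch of a class-balance check is given here. It uses a small hand-made tibble in place of the real data; the label Attack_X and the object names toy and class_balance are illustrative assumptions, not values taken from the dataset.

```r
library(dplyr)

# Toy stand-in for the real data; "Attack_X" is a hypothetical label
toy <- tibble(
  Attack_type = c("MQTT_Publish", "MQTT_Publish", "Attack_X", "MQTT_Publish")
)

# Count flows per attack type and express each count as a share of all flows
class_balance <- toy %>%
  count(Attack_type, sort = TRUE) %>%
  mutate(share = n / sum(n))

print(class_balance)
```

On the real data the same two calls could be applied to the data object loaded above; a strongly unbalanced result would argue for comparing shares rather than raw counts in later plots.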

Description

The dataset consists of 123,117 rows and 77 columns, capturing network traffic flow data. Key features include:

  • Network Identifiers: Columns like id.orig_p and id.resp_p capture originating and responding port IDs.

  • Protocol and Service Information: Columns such as proto (protocol) and service (e.g., MQTT) identify the communication protocol and services in use.

  • Traffic Statistics: These include metrics like packet counts (fwd_pkts_tot, bwd_pkts_tot), packet rates (fwd_pkts_per_sec), and header sizes.

  • Flow Characteristics: Features such as flow duration (flow_duration) and TCP flag counts (flow_FIN_flag_count, flow_ACK_flag_count, flow_SYN_flag_count, flow_RST_flag_count) capture communication patterns.

  • Attack Type: The Attack_type column labels the type of attack or event detected (e.g., MQTT_Publish).

  • Payload Information: These variables describe the size of the packets flowing through the network, for example fwd_pkts_payload.avg, fwd_pkts_payload.min and fwd_pkts_payload.max. We expect them to be higher during an attack than in normal traffic.

  • Bandwidth Information: The amount of data flowing through the IoT infrastructure varies with the type of attack. For example, during a denial-of-service attack such as DDoS Slowloris, we expect the data flow to differ markedly from normal traffic. The variables used for bandwidth information are fwd_pkts_tot, bwd_pkts_tot, fwd_data_pkts_tot, bwd_data_pkts_tot, fwd_pkts_per_sec, bwd_pkts_per_sec and flow_pkts_per_sec.

  • Inter-arrival time information: Variables like fwd_iat.min, fwd_iat.avg and flow_iat.min measure the time difference between two consecutive packets. We expect this to track the payload information: as payloads get bigger, the inter-arrival time (IAT) of the packets and of the flow should grow.

  • Idle vs In-use information: active.avg and idle.avg indicate whether an IoT device is forwarding network traffic or sitting idle.

The dataset appears to be useful for studying network behavior, identifying attacks, and analyzing flow-based communication statistics.
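
The feature families listed above can be picked out by column-name pattern instead of typing every variable by hand. The sketch below is one possible way to do that, shown on a toy tibble that reuses a few real column names. The patterns, the label Attack_X and the list name feature_groups are assumptions made for illustration.

```r
library(dplyr)

# Toy stand-in with a handful of the real column names; values are made up
toy <- tibble(
  fwd_pkts_tot         = c(9, 1),
  bwd_pkts_tot         = c(5, 1),
  flow_pkts_per_sec    = c(0.44, 493447.5),
  fwd_pkts_payload.avg = c(8.4, 120),
  fwd_iat.min          = c(762, 0),
  flow_SYN_flag_count  = c(2, 1),
  active.avg           = c(2282415, 4),
  idle.avg             = c(29729183, 0),
  Attack_type          = c("MQTT_Publish", "Attack_X")  # "Attack_X" is hypothetical
)

# One name pattern per feature family described in the text
feature_groups <- list(
  bandwidth = names(select(toy, matches("_pkts_tot$|_pkts_per_sec$"))),
  payload   = names(select(toy, contains("payload"))),
  iat       = names(select(toy, contains("_iat."))),
  flags     = names(select(toy, ends_with("_flag_count"))),
  activity  = names(select(toy, starts_with(c("active.", "idle."))))
)

str(feature_groups)
```

Keeping the groups in a named list means each later analysis step can refer to, say, feature_groups$iat rather than repeating the variable names.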

Source of the data

The RT-IoT2022 dataset, available from the UCI Machine Learning Repository, is designed for research on detecting attacks in IoT (Internet of Things) systems. It contains network flow data from various IoT devices and captures both normal and malicious traffic, making it valuable for studying cybersecurity in IoT environments. The dataset includes features such as packet counts, traffic flow statistics, and communication protocols, which are essential for intrusion detection and anomaly analysis in smart systems. Researchers often use it to train machine learning models to detect cyberattacks.

Dataset Generation

The RT-IoT2022 dataset was created specifically to train and test an intrusion detection system (IDS). The dataset comprises normal and attack traffic, captured from real-time IoT devices such as ThingSpeak-LED, MQTT-Temp, Amazon Alexa, and Wipro Bulb. The authors used a router to connect both the victim (IoT) devices and the attacker devices, and captured network traffic with the open-source tool Wireshark, which recorded the traces and saved them as PCAP files.

Attack Simulation:

  • SSH Brute-Force Attack: Metasploit’s modules were used to launch SSH brute-force attacks after scanning for open ports with Nmap.

  • DDoS Attack: The Hping3 tool from Kali Linux was used to generate DDoS attacks, transmitting thousands of packets to simulate high traffic.

Feature Engineering: The collected PCAP files were processed using the CICFlowmeter tool, which converted the network traffic data into bidirectional flow features for analysis. Irrelevant information such as source and destination addresses was removed, and categorical features were numerically encoded to prevent overfitting. This method produced a realistic and comprehensive dataset, encompassing both benign and malicious IoT traffic, which was critical for developing and testing the QAE IDS model.
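
In the file as loaded in the Dataset section, proto and service are still character columns, so any numeric encoding would have to be applied on our side. The sketch below shows one common option, integer codes derived from a factor. It is an illustration only and not necessarily the encoding the dataset authors used; the values "udp" and "icmp" and the column name proto_code are assumptions.

```r
# Toy stand-in; "tcp" appears in the real data, the other values are illustrative
toy <- data.frame(
  proto = c("tcp", "udp", "tcp", "icmp"),
  stringsAsFactors = FALSE
)

# Fix the level order explicitly so the codes are reproducible
proto_levels <- sort(unique(toy$proto))

# Each distinct protocol gets an integer code that follows the level order
toy$proto_code <- as.integer(factor(toy$proto, levels = proto_levels))

print(toy)
```

Integer codes imply an ordering that the protocols do not really have, so for plots it is usually clearer to keep the original labels and use the codes only where a numeric input is required.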

Questions

The following questions will guide our project:

  • Which protocols, services and port numbers are used in different attack scenarios, and how can that knowledge help prevent future network attacks?

  • How do different types of attack show unique patterns across bandwidth, inter-arrival time, payload and flow characteristics? Do these patterns reliably distinguish one attack from another?

  • Which combinations of dimensions best characterise the different types of attack?

Analysis plan

Question 1:

  • The variables proto, service and id.resp_p will be compared across the different attack types and against normal device traffic that uses the same protocol, service and port number.
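
A minimal sketch of the comparison for Question 1, assuming a simple count of flows per combination is a reasonable starting point. The toy rows reuse the tcp/mqtt/1883 combination seen in the data preview; the label Attack_X, the http and dns rows and the names toy and q1_counts are hypothetical.

```r
library(dplyr)

# Toy stand-in for the real data; only the first two rows mirror values from the preview
toy <- tibble(
  proto       = c("tcp", "tcp", "tcp", "udp"),
  service     = c("mqtt", "mqtt", "http", "dns"),
  id.resp_p   = c(1883, 1883, 80, 53),
  Attack_type = c("MQTT_Publish", "MQTT_Publish", "Attack_X", "Attack_X")
)

# Count flows for every protocol/service/port combination within each attack type,
# then express each count as a share of that attack type's flows
q1_counts <- toy %>%
  count(Attack_type, proto, service, id.resp_p, name = "flows") %>%
  group_by(Attack_type) %>%
  mutate(share = flows / sum(flows)) %>%
  ungroup() %>%
  arrange(Attack_type, desc(flows))

print(q1_counts)
```

A table of this shape maps directly onto a faceted bar chart or heatmap, with one panel per attack type.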

Question 2:

  • We will determine the relationship between attack type and the variables that carry bandwidth information, e.g. fwd_pkts_tot, bwd_pkts_tot, fwd_data_pkts_tot, bwd_data_pkts_tot, fwd_pkts_per_sec, bwd_pkts_per_sec and flow_pkts_per_sec. This will show how bandwidth behaves during an attack.

  • Inter-arrival time variables such as fwd_iat.min, fwd_iat.avg and flow_iat.min will be used to draw the same comparison between attack and normal operation.

  • Every attack type exhibits different payload behaviour. We will use the variables fwd_pkts_payload.avg, fwd_pkts_payload.min and fwd_pkts_payload.max to find extremely large packets and empty packets.

  • A DDoS Slowloris attack opens many TCP connections to the server. Using the flow characteristics, we will compare the number of TCP SYN messages with the number of other TCP messages.
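
The four comparisons for Question 2 can be sketched as a single per-attack summary. Medians are used here as an assumption, because the summary() output shows most variables with a median far below a very large maximum. The toy values, the label Attack_X and the summary column names are hypothetical.

```r
library(dplyr)

# Toy stand-in for the real data; values are made up for illustration
toy <- tibble(
  Attack_type          = c("MQTT_Publish", "MQTT_Publish", "Attack_X", "Attack_X"),
  flow_pkts_per_sec    = c(0.44, 0.44, 1000, 3000),
  fwd_iat.avg          = c(4e6, 4e6, 10, 30),
  fwd_pkts_payload.avg = c(8, 8, 120, 120),
  flow_SYN_flag_count  = c(2, 2, 1, 1),
  flow_ACK_flag_count  = c(13, 13, 1, 0),
  flow_FIN_flag_count  = c(0, 0, 0, 0),
  flow_RST_flag_count  = c(1, 1, 0, 1)
)

q2_summary <- toy %>%
  group_by(Attack_type) %>%
  summarise(
    flows               = n(),
    median_pkts_per_sec = median(flow_pkts_per_sec),     # bandwidth
    median_fwd_iat      = median(fwd_iat.avg),           # inter-arrival time
    median_payload      = median(fwd_pkts_payload.avg),  # payload
    # SYN messages as a share of the SYN, ACK, FIN and RST flags counted per flow
    syn_share = sum(flow_SYN_flag_count) /
      sum(flow_SYN_flag_count + flow_ACK_flag_count +
            flow_FIN_flag_count + flow_RST_flag_count),
    .groups = "drop"
  )

print(q2_summary)
```

The PSH and URG counts are left out of the denominator only to keep the example short; they could be added in the same way.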

Question 3:

  • By comparing different variables across attack scenarios, we will determine which dimensions, and how many of them, are affected during attacks.
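
The plan does not name a method for Question 3. One possible approach, offered here purely as an assumption, is a principal component analysis (PCA): the variance explained suggests how many dimensions matter, and the loadings suggest which variables drive them. The toy values and object names below are hypothetical, and log1p() is applied because the real variables are heavily skewed.

```r
# Toy stand-in: three low-rate flows followed by three high-rate flows (made-up values)
toy <- data.frame(
  flow_pkts_per_sec    = c(0.4, 0.5, 0.4, 1000, 3000, 2000),
  fwd_iat.avg          = c(4.0e6, 3.9e6, 4.1e6, 10, 30, 20),
  fwd_pkts_payload.avg = c(8, 9, 8, 120, 110, 120)
)

# log1p tames the skew; centring and scaling put the variables on a common footing
pca <- prcomp(log1p(toy), center = TRUE, scale. = TRUE)

# Share of total variance captured by each principal component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Contribution of each original variable to the first component
loadings_pc1 <- pca$rotation[, "PC1"]

print(round(var_explained, 3))
print(round(loadings_pc1, 3))
```

On the real data the component scores in pca$x could be plotted and coloured by Attack_type to see whether attack types separate along the leading components.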

Ethical concerns

Our dataset does not raise any ethical concerns because it contains no identifying information about the internal organization. We analyse and visualise the data without source and destination addresses, which has its pros and cons.

Citations