Over the last couple of weeks I did some more research for my project. First, I evaluated the machine learning algorithms I’d be using. I had intended to use a neural network to classify the network intrusion packets because of its current popularity and strong performance, but I have now decided to use a linear support vector machine instead. Neural networks are very difficult to interpret, and I would like to be able to investigate which features the learning methods deem most indicative of a network intrusion. My analysis will therefore use two algorithms: a random forest and a linear support vector machine.
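To illustrate why a linear support vector machine is easier to interpret than a neural network, here is a minimal sketch in plain Python. It is not my project code: the data is synthetic, the feature names are made up, and the training loop is a simple stochastic subgradient descent on the hinge loss rather than a tuned library solver. The point is only that a linear model ends up with one weight per feature, so the features can be ranked by the size of their weights (assuming the features are on comparable scales).

```python
import random


def make_toy_data(n_samples=200, seed=42):
    """Synthetic stand-in for flow records (an assumption, not real data):
    feature 0 tracks the label, features 1 and 2 are pure noise.
    Labels are -1 (normal) or +1 (attack)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = 1 if i % 2 == 0 else -1
        X.append([3.0 * label + rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)])
        y.append(label)
    return X, y


def train_linear_svm(X, y, lam=0.01, learning_rate=0.01, epochs=50, seed=0):
    """Stochastic subgradient descent on the L2-regularized hinge loss."""
    rng = random.Random(seed)
    weights = [0.0] * len(X[0])
    bias = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(w * x for w, x in zip(weights, X[i])) + bias
            # Regularization step: shrink every weight slightly.
            weights = [w * (1.0 - learning_rate * lam) for w in weights]
            # Hinge-loss step: only samples inside the margin move the weights.
            if y[i] * score < 1.0:
                weights = [w + learning_rate * y[i] * x for w, x in zip(weights, X[i])]
                bias += learning_rate * y[i]
    return weights, bias


def rank_features(weights, names):
    """Sort feature names by absolute weight, largest first."""
    return sorted(zip(names, weights), key=lambda pair: abs(pair[1]), reverse=True)


if __name__ == "__main__":
    X, y = make_toy_data()
    weights, bias = train_linear_svm(X, y)
    # Hypothetical feature names, used only for this demo.
    for name, weight in rank_features(weights, ["informative", "noise_a", "noise_b"]):
        print(name, round(weight, 3))
```

On this toy data the feature that actually carries the label should come out on top of the ranking, which is the kind of inspection I would like to be able to do on the real features.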

I have also decided that, instead of writing a packet-reading script to build a CSV, I will use the pre-existing CSVs for my analysis. I have two reasons for this change. First, the packet-reading script would be time-consuming to create, and I believe that implementing the algorithms will be very time-consuming on its own. Second, the CSVs from the dataset I’ll be using contain features that cannot be derived from the packets alone, and I expect that more information will lead to better-performing models.

During these last two weeks I have researched the algorithms I’ll be using and written the code to read the CSV files, as well as the code for running k-fold cross-validation in my analysis.
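For readers unfamiliar with k-fold cross-validation, the idea can be sketched with only the Python standard library. This is an illustrative sketch, not the code I wrote for the project: the function names `read_rows` and `k_fold_indices`, the column names in the sample, and the assumption that the label column holds 0 or 1 are all choices made for this example.

```python
import csv
import io
import random


def read_rows(file_obj, label_column="label"):
    """Read a CSV with a header row; return (feature_dicts, labels).
    Feature values are left as strings; the label is parsed as an int."""
    features, labels = [], []
    for row in csv.DictReader(file_obj):
        labels.append(int(row.pop(label_column)))
        features.append(row)
    return features, labels


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) once for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


if __name__ == "__main__":
    # Tiny in-memory CSV standing in for a real dataset file.
    sample = io.StringIO(
        "dur,sbytes,label\n0.1,200,0\n0.5,900,1\n0.2,150,0\n0.9,1200,1\n"
    )
    X, y = read_rows(sample)
    for fold, (train, test) in enumerate(k_fold_indices(len(X), k=2)):
        print("fold", fold, "train", sorted(train), "test", sorted(test))
```

Every record lands in the test split of exactly one fold, so each model is always evaluated on data it was not trained on, and averaging the k scores gives a steadier estimate than a single split.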

Project Logbook – Feb 21

Feb 5 – Researched project ideas – 2 hours

Feb 7 – Wrote project proposal – 3 hours

Feb 17 – Further research of learning algorithms – 1 hour

Feb 19 – Wrote code for the tasks below – 3 hours

Reading the dataset
Partitioning the data into training and testing sets
Running k-fold cross-validation

Feb 21 – Wrote biweekly update – 2 hours

Overview

As many students in our CSC 446 class discussions have agreed, network security is a top priority now that internet use keeps growing and an increasing amount of personal information is stored on online servers.

Network intrusion detection is widely accepted as an effective method for dealing with network threats[1], though traditional rule-based intrusion detection systems (IDS) such as Snort struggle with new attack patterns. During my project I will investigate the effectiveness of machine learning methods on intrusion detection datasets and attempt to train a model that can effectively detect advanced threats, in order to gain a better understanding of both network attacks and machine learning.

Project Plan

I plan to use the UNSW-NB15 dataset[2], created by the University of New South Wales (UNSW), which contains 9 types of attacks and 49 features. The set contains over 2 million records, and a subset of it has conveniently been partitioned into a training set and a test set.

For the first part of my project, I will design a program that can read pcap files and output their features. In the second part of my project, I will train two models, using random forests and neural networks respectively, then evaluate their performance and show the capabilities of my final model in my final report.

References

[1] Zhen Yang et al., “A systematic literature review of methods and datasets for anomaly-based network intrusion detection”, Computers & Security, url: network intrusion detection – an overview | ScienceDirect Topics, accessed: Feb 7, 2025

[2] UNSW-NB15 Dataset, University of New South Wales (UNSW), url: The UNSW-NB15 Dataset | UNSW Research, accessed: Feb 7, 2025

Month: February 2025

Bi-Weekly Update #1

Project Proposal – Network Intrusion Detection with Machine Learning
