Over the last couple of weeks I did more research for my project. First, I evaluated the machine learning algorithms I would be using. I had intended to use a neural network to classify the network intrusion packets because of its current popularity and strong performance. I have now decided to use a linear support vector machine instead, because neural networks are very difficult to interpret and I would like to investigate which features the learning methods deem most indicative of a network intrusion. My analysis will therefore use two algorithms: a random forest and a linear support vector machine.
I have also decided that, instead of writing a packet-reading script to build a CSV, I will use the dataset's pre-existing CSVs for my analysis. I have two reasons for this change. First, the packet-reading script would be time-consuming to create, and I believe implementing the algorithms will be very time-consuming on its own. Second, the CSVs from the dataset I will be using contain features that cannot be derived from the packets alone, and I expect that more information will lead to better-performing models.
During these last two weeks I have researched the algorithms I will be using, and I have written the code to read the CSV files and to run k-fold cross-validation for my analysis.
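As a rough illustration of those two pieces, here is a minimal sketch using only the Python standard library. It is not the project's actual code: the function names, the column names, and the tiny in-memory sample are all made up for the example, and a real run would open one of the dataset's CSV files instead.

```python
import csv
import io
import random


def read_csv_rows(handle, label_column):
    """Read a CSV with a header row into (features, labels).

    Values are kept as strings; converting them to numbers is left out here.
    """
    reader = csv.DictReader(handle)
    features, labels = [], []
    for row in reader:
        labels.append(row.pop(label_column))  # separate the class label
        features.append(row)                  # remaining columns are features
    return features, labels


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at i,
    # so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical stand-in for a dataset file (column names are invented).
SAMPLE = """duration,bytes,label
1,100,benign
2,250,attack
3,80,benign
4,900,attack
5,120,benign
6,640,attack
"""

features, labels = read_csv_rows(io.StringIO(SAMPLE), label_column="label")
splits = list(k_fold_indices(len(features), k=3))
for train_idx, test_idx in splits:
    print(len(train_idx), "training rows,", len(test_idx), "test rows")
```

With a real file, the `io.StringIO(SAMPLE)` argument would be replaced by a handle from `open(path, newline="")`. Shuffling before splitting matters if the rows in the file are grouped by class, since otherwise a fold could contain only one kind of traffic.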
Project Logbook – Feb 21
Feb 5 – Researched project ideas – 2 hours
Feb 7 – Wrote project proposal – 3 hours
Feb 17 – Further research of learning algorithms – 1 hour
Feb 19 – Wrote code for: – 3 hours
- Reading the dataset
- Partitioning the data into training and testing sets
- Using k-fold cross-validation
Feb 21 – Wrote biweekly update – 2 hours