During these last two weeks, I focused on implementing SVMs. I again used Tom Mitchell's "Machine Learning" [1] as a base for my research and understanding of SVMs. After some research I decided a linear kernel would be appropriate for my purposes: the dataset is quite large, so training multiple Gaussian-kernel SVMs, as would be necessary for analyzing performance, would be extremely time-consuming. I chose scikit-learn's SGDClassifier [2] as the base for my model, as it was the fastest option I found. I have now implemented the model, though I have yet to finish the analysis portion of my code.
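A minimal sketch of the setup described above, using SGDClassifier with hinge loss (which yields a linear SVM). The synthetic dataset and the hyperparameter values here are placeholder assumptions, not my project's actual data or configuration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# loss="hinge" makes SGDClassifier a linear SVM; alpha is the
# regularization strength (one hyperparameter to tune)
model = make_pipeline(
    StandardScaler(),
    SGDClassifier(loss="hinge", alpha=1e-4, random_state=0),
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```

Scaling the features first matters here, since SGD-based training is sensitive to feature magnitudes.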
During the next two weeks I will focus on completing the analysis portion of my code and then preparing the demo video, which will likely feature an explanation of my code and my reasoning behind my hyperparameter configurations.
Project Logbook – Mar 21
Feb 5 – Researched project ideas – 2 hours
Feb 7 – Wrote project proposal – 3 hours
Feb 17 – Further research of learning algorithms – 1 hour
Feb 19 – Wrote code for: – 3 hours
- Reading the dataset
- Partitioning the data into training and testing sets
- Using K-fold cross validation
Feb 21 – Wrote biweekly update – 2 hours
Feb 28 – Researched Random Forests and their implementation in scikit-learn – 4 hours
March 3 – Implementation of code for running Random Forests – 3 hours
March 7 – Wrote biweekly update – 2 hours
March 9 – Read and researched SVMs in [1] – 3 hours
March 14 – Researched possible libraries to use for my base models – 2 hours
March 20 – Implemented SVMs through scikit-learn – 2 hours
March 21 – Wrote biweekly update and responded to feedback on my project – 2 hours
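The data-handling and Random Forest steps recorded in the logbook (partitioning the data, K-fold cross-validation, and a scikit-learn Random Forest) can be sketched as follows. The synthetic data, K=5, and the forest size are illustrative assumptions, not my project's actual configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))      # stand-in for the real features
y = rng.integers(0, 2, size=200)   # stand-in for the real labels

# Partition the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# K-fold cross-validation on the training portion
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in kf.split(X_train):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train[train_idx], y_train[train_idx])
    scores.append(clf.score(X_train[val_idx], y_train[val_idx]))

mean_cv_accuracy = np.mean(scores)
```

Holding the test set out before cross-validation keeps it untouched for the final comparison between models.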
References
[1] Machine Learning, Tom Mitchell, url: https://www.cs.cmu.edu/~tom/files/MachineLearningTomMitchell.pdf, accessed March 21, 2025
[2] SGDClassifier, scikit-learn 1.6.1 documentation, url: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html, accessed March 21, 2025