During these last two weeks, I focused on implementing SVMs. I again used Tom Mitchell's "Machine Learning" [1] as a base for my research and understanding of SVMs. After some research I decided a linear kernel would be appropriate for my purposes: the dataset is quite large, so training multiple Gaussian-kernel SVMs, as would be necessary for analyzing performance, would be extremely time-consuming. I chose scikit-learn's SGDClassifier [2] as the base for my model, as it was the fastest option I found. I have now implemented the model, though I have yet to finish the analysis portion of my code.
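A minimal sketch of the setup described above, using SGDClassifier with hinge loss (which yields a linear SVM). The synthetic dataset and the hyperparameter values here are placeholder assumptions, not my project's actual data or configuration:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real dataset
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# loss="hinge" makes SGDClassifier a linear SVM; alpha is the
# regularization strength (one hyperparameter to tune)
model = make_pipeline(
    StandardScaler(),
    SGDClassifier(loss="hinge", alpha=1e-4, random_state=0),
)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
```

Scaling the features first matters here, since SGD-based training is sensitive to feature magnitudes.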
During the next two weeks I will focus on completing the analysis portion of my code and then preparing the demo video, which will likely feature an explanation of my code and my reasoning behind my hyperparameter configurations.
Project Logbook – Mar 21
Feb 5 – Researched project ideas – 2 hours
Feb 7 – Wrote project proposal – 3 hours
Feb 17 – Further research of learning algorithms – 1 hour
Feb 19 – Wrote code for: – 3 hours
- Reading the dataset
- Partitioning the data into training and testing sets
- Using K-fold cross validation
Feb 21 – Wrote biweekly update – 2 hours
Feb 28 – Researched Random Forests and their implementation in scikit-learn – 4 hours
March 3 – Implementation of code for running Random Forests – 3 hours
March 7 – Wrote biweekly update – 2 hours
March 9 – Read and researched SVMs in [1] – 3 hours
March 14 – Researched possible libraries to use for my base models – 2 hours
March 20 – Implemented SVMs through scikit-learn – 2 hours
March 21 – Wrote biweekly update and responded to feedback on my project – 2 hours
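The data-handling and Random Forest steps recorded in the logbook (partitioning the data, K-fold cross-validation, and a scikit-learn Random Forest) can be sketched as follows. The synthetic data, K=5, and the forest size are illustrative assumptions, not my project's actual configuration:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))      # stand-in for the real features
y = rng.integers(0, 2, size=200)   # stand-in for the real labels

# Partition the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# K-fold cross-validation on the training portion
kf = KFold(n_splits=5, shuffle=True, random_state=0)
scores = []
for train_idx, val_idx in kf.split(X_train):
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train[train_idx], y_train[train_idx])
    scores.append(clf.score(X_train[val_idx], y_train[val_idx]))

mean_cv_accuracy = np.mean(scores)
```

Holding the test set out before cross-validation keeps it untouched for the final comparison between models.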
References
[1] Machine Learning, Tom Mitchell, url: https://www.cs.cmu.edu/~tom/files/MachineLearningTomMitchell.pdf, accessed March 21, 2025
[2] SGDClassifier, scikit-learn 1.6.1 documentation, url: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html, accessed March 21, 2025