Machine Learning - Sachitt Arora - Level 1 Module 3

Sachitt_Arora · July 7, 2021, 6:18am

Technical

I created a new column that combined all of the data.
I used cosine similarity to build a basic recommender system, which uses the cosine similarity matrix to determine how similar two posts are.
I used this recommender function to find the 10 posts most similar to any input post.
I then moved on to more advanced modeling, building several pipelines and determining which would reach the highest accuracy on my data.
I experimented several times by changing the input data in order to get the highest accuracy.
I abstracted my functions and data files, both in this module and in other modules, in order to simplify the process.
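
For illustration, here is a minimal sketch of a cosine similarity recommender like the one described above. It assumes scikit-learn's CountVectorizer is used to build the bag of words. The toy posts, the column names `title` and `combined`, and the helper names `build_similarity` and `recommend` are hypothetical and are not taken from the actual project code.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical toy data: the real posts and column names are not shown in this post.
posts = pd.DataFrame({
    "title": [
        "Intro to pandas",
        "Pandas dataframes tips",
        "Baking sourdough bread",
        "Numpy arrays basics",
    ],
    "combined": [
        "pandas dataframe tutorial python data",
        "pandas dataframe tips python data tricks",
        "sourdough bread baking flour oven",
        "numpy array python data basics",
    ],
})


def build_similarity(texts):
    """Turn each text into word counts and return the pairwise cosine similarity matrix."""
    counts = CountVectorizer().fit_transform(texts)
    return cosine_similarity(counts)


def recommend(index, similarity, top_n=10):
    """Return the row positions of the top_n posts most similar to the post at `index`."""
    scores = similarity[index]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Skip the query post itself, which always has similarity 1 with itself.
    return [i for i in ranked if i != index][:top_n]


similarity = build_similarity(posts["combined"])
top = recommend(0, similarity, top_n=2)
print(posts.loc[top, "title"].tolist())
```

With a real dataset, `top_n=10` would return the ten most similar posts, as in the module.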

Tools

VS Code, NumPy, scikit-learn, pandas

Soft Skills

One problem I had in this module came from not structuring my functions well enough in module 2. I went back and edited many parts of module 2 so that it could be used more efficiently in the future, and did the same for module 3. In the process I learned how to keep my code neat and organized. I also watched YouTube videos about the math and other functions involved to better understand what I was doing.

Achievement Highlights

I was able to create a basic recommender system using cosine similarity
I was able to analyze different models on my data
I eventually reached almost 90% accuracy with my models after experimenting with many different inputs
I scraped even more data and restructured my code from both modules

Detailed Statement of Tasks Completed

I first created two new CSV files, each containing a bag of words. One was built only from keywords, whereas the other still included stopwords and was only lightly cleaned, so that I could see whether one would work better than the other in modeling.
I then constructed a cosine similarity matrix for a basic recommender system. I reviewed a little linear algebra at this point because I wanted to better understand what was going on. I then tried inputting a few posts and looking at the top 10 most similar posts returned for each.
Next I moved on to modeling and constructed a pipeline for each type of model. I fed these models different types of data (fully cleaned or semi-cleaned), different amounts of data, and different model constraints, and then decided on the best one.
During my data analysis and model training, I plotted graphs to better understand how the data was structured.
I scraped more data and reorganized my code structure and layout.
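
As a rough sketch of the pipeline comparison described above, the example below scores a few scikit-learn pipelines with cross-validation, once with English stopwords removed and once with them kept. Everything specific here is an assumption made for illustration: the post does not say which models were tried or what label was predicted, so the two classifiers, the toy texts and labels, and the helper name `build_pipeline` are all hypothetical.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data: the real texts and target labels are not shown in this post.
texts = [
    "how do I merge two dataframes in pandas",
    "the model is overfitting on my training data",
    "what is the best way to clean text data",
    "my classifier accuracy is stuck at sixty percent",
    "the oven temperature for sourdough bread is too low",
    "how long should I knead the pizza dough",
    "which flour is best for baking a cake",
    "my cookies are burning at the bottom of the oven",
]
labels = ["tech", "tech", "tech", "tech", "food", "food", "food", "food"]


def build_pipeline(model, remove_stopwords):
    """Bag-of-words vectorizer followed by a classifier."""
    stop_words = "english" if remove_stopwords else None
    return Pipeline([
        ("bow", CountVectorizer(stop_words=stop_words)),
        ("model", model),
    ])


# Assumed candidate models, chosen only as common text-classification baselines.
candidates = {
    "naive_bayes": lambda: MultinomialNB(),
    "logistic_regression": lambda: LogisticRegression(max_iter=1000),
}

results = {}
for name, make_model in candidates.items():
    for remove_stopwords in (True, False):
        pipeline = build_pipeline(make_model(), remove_stopwords)
        # Mean accuracy over 2 stratified folds (tiny cv only because the toy data is tiny).
        scores = cross_val_score(pipeline, texts, labels, cv=2, scoring="accuracy")
        results[(name, remove_stopwords)] = scores.mean()

best = max(results, key=results.get)
print("best configuration:", best, "accuracy:", results[best])
```

Keeping the vectorizer inside the pipeline means it is refit on each training fold, so the held-out fold does not leak into the vocabulary.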

Problems Faced

The biggest problem I faced in this module was time constraints. I had an extremely busy schedule and was forced to do most of the work in a very short period of time. However, I remained motivated even though the module looked tougher than earlier ones, and I was able to complete it before the deadline. I solved the technical problems I faced by using the mentor's guide or by looking on websites such as Stack Overflow for general solutions. I will continue trying to get even greater accuracy by changing the data and attempting new techniques. One more thing I am slowly getting better at is teamwork. Since I became a PM late in the program, I have needed to catch up and rise to the challenge despite the late start. I have been looking through the website and discussing with fellow PMs what could be done to maximize my potential as a project manager.