Aalap_Parimalkumar_R - Machine Learning Pathway

Concise overview of things learned. Break it up into Technical Area, Tools, Soft Skills:

Technical: Learned how to scrape a website to gather meaningful data using Selenium and BeautifulSoup in Python. Learned about the waterfall model.

Tools: Learned to collaborate on Slack. Used GitHub for version control in project development. Installed and set up the development environment, including Jupyter Notebook and the VS Code IDE.

Soft Skills: Learned to communicate effectively within the sub-team. Learned to collaborate with teammates in different time zones and with different backgrounds. Learned teamwork and how to allocate work within the sub-team.

Three achievement highlights:

Successfully scraped a Discourse forum and organized the data efficiently in a pandas DataFrame.

Collaborated with my team and sub-teams to debug errors.

Optimized the code to reduce runtime and memory usage.

Cleaned the data and stored it in CSV files.

Used Git concepts such as branching to manage team development.

List of meetings/trainings attended, including social team events:

All team meetings: 6/1, 6/2, 6/9, 6/13

STEM casts: Overview of ML and project, Data Mining, Recommendation Models, Git

Goals for the upcoming week. Next self-assessment will be due on the following Tuesday 06/23

My goals for next week are to study the research paper on BERT and learn how to use the scraped data to train a model using BERT. I also plan to analyze the different layers found in BERT.

Detailed statement of tasks done. State each task, hurdles faced if any, and how you solved the hurdle. You need to mark whether the hurdles were solved with the help of training webinars, some help from project leads or significant help from project leads:

Task 1 (Completed): Environment setup, sub-team formation, and project introduction

Create accounts in Slack, Asana, and G Suite.

State the project goals and introduction.

Form sub-teams within team7.

Hurdles:

Faced issues signing up for a G Suite account.

Was unaware of the other teammates' skills, which made it difficult to select teammates.

Hurdles solution:

The leads got it set up for me and I was able to log in by the next day.

The leads split up the whole team into skill-balanced sub-teams.

Task 2 (Completed): Script to extract data from the Amazon Seller Discourse forum.

Created a Python script to extract data from the “Fulfillment-by-amazon” category. Modeled two DataFrames: the first with the columns “title”, “category”, “sub-category”, “original post content”, and “URL”; the second with the URL and all responses for each post.

Hurdles:

I was new to web scraping, so I had to refer to books on the subject.

I also had to learn how a web page is designed (HTML and CSS).

Hurdles solution:

The technical lead did a demo at one of our meetings where she showed us a scraping example using Selenium. I used her code as a reference and built on it.

Watched a tutorial on web design on YouTube.
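
The two-table layout described in Task 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the script written for the project: the sample records, the example URLs, the `build_frames` helper, and the choice of one row per response are all hypothetical stand-ins for what the Selenium and BeautifulSoup scraper would actually collect.

```python
import pandas as pd

# Hypothetical records standing in for scraped posts; a real script
# would fill these from the forum pages.
scraped_posts = [
    {
        "title": "Example post A",
        "category": "Fulfillment-by-amazon",
        "sub-category": "Example sub-category",
        "original post content": "Example question text.",
        "URL": "https://example.com/t/example-post-a/1",
        "responses": ["First reply.", "Second reply."],
    },
    {
        "title": "Example post B",
        "category": "Fulfillment-by-amazon",
        "sub-category": "Example sub-category",
        "original post content": "Another question.",
        "URL": "https://example.com/t/example-post-b/2",
        "responses": ["Only reply."],
    },
]

POST_COLUMNS = ["title", "category", "sub-category", "original post content", "URL"]


def build_frames(records):
    """Split scraped records into a posts table and a responses table."""
    posts = pd.DataFrame(
        [{col: rec[col] for col in POST_COLUMNS} for rec in records],
        columns=POST_COLUMNS,
    )
    # Assumed layout: one row per response, keyed by the URL of its post.
    responses = pd.DataFrame(
        [
            {"URL": rec["URL"], "response": reply}
            for rec in records
            for reply in rec["responses"]
        ],
        columns=["URL", "response"],
    )
    return posts, responses


posts_df, responses_df = build_frames(scraped_posts)
```

Keeping the URL in both tables means the responses can later be joined back to their original post.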

Task 3 (Completed): Cleaned the data by removing non-English characters.

Hurdles:

Some posts contained non-text data, such as images, which were scraped and caused errors.

Hurdles solution:

I found that it can be removed using the extract() method in BeautifulSoup.

Teammate (Keerthi) suggested an approach of forming a string by running a for loop over the tags.
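
A rough sketch of how these two ideas might fit together is shown below. It is not the project code. The `clean_post_html` helper and the sample HTML are hypothetical, the loop is assumed to run over `<p>` tags (the report does not name the tag), and "non-English characters" is approximated here as "anything outside ASCII".

```python
import re

from bs4 import BeautifulSoup


def clean_post_html(html):
    """Return plain ASCII text from a post body, ignoring embedded images."""
    soup = BeautifulSoup(html, "html.parser")

    # Remove image tags from the tree so they cannot end up in the text.
    for img in soup.find_all("img"):
        img.extract()

    # Build one string by looping over the paragraph tags (assumed: <p>).
    parts = []
    for tag in soup.find_all("p"):
        text = tag.get_text(" ", strip=True)
        if text:
            parts.append(text)
    joined = " ".join(parts)

    # Drop every non-ASCII character, then collapse leftover whitespace.
    ascii_only = joined.encode("ascii", errors="ignore").decode("ascii")
    return re.sub(r"\s+", " ", ascii_only).strip()


# Hypothetical post body with images and a non-ASCII character.
sample_html = (
    '<div><p>Hello <img src="a.png" alt="x"> world</p>'
    "<p>Prix: 10€ ok</p>"
    '<p><img src="b.png"></p></div>'
)
cleaned = clean_post_html(sample_html)
print(cleaned)
```

Dropping all non-ASCII characters is a blunt filter: it also removes accented letters and currency symbols, so a stricter or looser rule may suit the data better.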

Task 4 (Completed): Stored the clean data in two separate CSV files in the required format.

Task 5 (Completed): Created a branch for the sub-team and posted the team code with all required data.
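
Writing the two tables to separate CSV files might look something like the sketch below. The file names, the `save_tables` helper, and the sample rows are assumptions made for illustration; the required format mentioned above is not specified in this report, so only the general pandas pattern is shown.

```python
import os
import tempfile

import pandas as pd

# Hypothetical cleaned tables standing in for the real scraped data.
posts_df = pd.DataFrame(
    {
        "title": ["Example post A", "Example, with comma"],
        "URL": ["https://example.com/t/a/1", "https://example.com/t/b/2"],
    }
)
responses_df = pd.DataFrame(
    {
        "URL": ["https://example.com/t/a/1", "https://example.com/t/b/2"],
        "response": ["First reply.", "Only reply."],
    }
)


def save_tables(posts, responses, out_dir):
    """Write each table to its own CSV file and return the two paths."""
    posts_path = os.path.join(out_dir, "posts.csv")
    responses_path = os.path.join(out_dir, "responses.csv")
    # index=False keeps the pandas row index out of the files.
    posts.to_csv(posts_path, index=False, encoding="utf-8")
    responses.to_csv(responses_path, index=False, encoding="utf-8")
    return posts_path, responses_path


out_dir = tempfile.mkdtemp()
posts_path, responses_path = save_tables(posts_df, responses_df, out_dir)

# Reading a file back is a quick check that nothing was lost on the way.
posts_check = pd.read_csv(posts_path)
```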

Request change of role if it applies. You may request to become a task lead. Or switch between participant and observer roles.

I would like to be an active participant for now.