Shahkushal97 - Machine Learning Pathway

shahkushal97 · June 16, 2020, 9:01pm

Assessment 1

My accomplishments

Technical

Web Scraping with Beautiful Soup
EDA
Data Cleaning

Tools

GitHub
Slack
Asana
Jupyter

Soft Skills

Teamwork

Three achievement highlights

Worked through the web scraping tutorial and ran it successfully.
Created a web scraping script to collect data from a website.
Performed data cleaning and EDA to produce a clean CSV file.
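Below is a minimal, self-contained sketch of the scrape, clean, and export-to-CSV flow. It swaps Beautiful Soup for Python's built-in `html.parser` so it runs without extra packages, and it parses a hard-coded HTML snippet instead of a live page. The page structure (`div.article`, `h2`, `span.category`) and the cleaning rules are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical page snippet standing in for a downloaded article listing.
SAMPLE_HTML = """
<div class="article"><h2> Stocks rally </h2><span class="category">Business</span></div>
<div class="article"><h2>New phone released</h2><span class="category">Tech</span></div>
<div class="article"><h2>Stocks rally</h2><span class="category">Business</span></div>
<div class="article"><h2></h2><span class="category">Sports</span></div>
"""


class ArticleParser(HTMLParser):
    """Collects title/category pairs from each <div class="article">."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # field that incoming text belongs to, if any
        self._current = {}

    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("class", "article") in attrs:
            self._current = {"title": "", "category": ""}
        elif tag == "h2":
            self._field = "title"
        elif tag == "span" and ("class", "category") in attrs:
            self._field = "category"

    def handle_endtag(self, tag):
        if tag in ("h2", "span"):
            self._field = None
        elif tag == "div" and self._current:
            self.rows.append(self._current)
            self._current = {}

    def handle_data(self, data):
        # Only record text that sits inside a tracked tag of an article block.
        if self._field and self._field in self._current:
            self._current[self._field] += data


def clean(rows):
    """Normalize whitespace, drop rows without a title, and drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        title = " ".join(row["title"].split())
        category = row["category"].strip()
        if not title or (title, category) in seen:
            continue
        seen.add((title, category))
        cleaned.append({"title": title, "category": category})
    return cleaned


def to_csv(rows):
    """Serialize the cleaned rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "category"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


parser = ArticleParser()
parser.feed(SAMPLE_HTML)
rows = clean(parser.rows)
print(to_csv(rows))
```

With Beautiful Soup, the parser class would typically be replaced by `BeautifulSoup(html, "html.parser").find_all("div", class_="article")` plus `get_text(strip=True)` on each child tag; the cleaning and CSV steps stay the same.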

List of meeting/training sessions attended

STEMCast: Overview of ML and project
Git Webinar
Weekly team meeting (06/08)

Goals for upcoming weeks

Study text preprocessing, NLP, and text classification.
Explore data visualization.

shahkushal97 · June 26, 2020, 8:49pm

THINGS LEARNED:

Technical: BERT basics, TF-IDF analysis

ACHIEVEMENTS:

Obtained a basic understanding of the encoders in BERT
Understood how TF-IDF is implemented
Calculated Term Frequency (TF) and Inverse Document Frequency (IDF) for various article categories
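For reference, one common formulation of TF-IDF, for a term t, a document d, and a collection D of N documents, is sketched below. Libraries vary on the details; for example, scikit-learn's `TfidfVectorizer` smooths the IDF by default, so its numbers will differ from this plain version.

```latex
\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t'} f_{t',d}}, \qquad
\mathrm{idf}(t, D) = \log \frac{N}{\lvert \{ d \in D : t \in d \} \rvert}, \qquad
\mathrm{tfidf}(t, d, D) = \mathrm{tf}(t, d) \cdot \mathrm{idf}(t, D)
```

Worked example with a base-10 logarithm (hypothetical counts): a term that occurs 3 times in a 100-word article, and appears in 10 of 1000 articles, gets tf = 3/100 = 0.03, idf = log10(1000/10) = 2, so tf-idf = 0.03 × 2 = 0.06.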

MEETINGS ATTENDED:

Week 4 Team Meeting (Monday)

GOALS:

Implement a full-scale BERT model that outperforms Word2Vec
Run tests to determine which categories work best for classifying articles

TASKS COMPLETED:

This week I computed TF-IDF scores for the scraped data. First, I went through various sources to understand how to implement TF-IDF and what form its output needs to take for our project.
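As a minimal sketch of per-category TF-IDF, assuming each category's articles are concatenated into one "document" so that IDF favors words distinctive to a category, the code below uses tf = count / document length and idf = ln(N / df). The category texts and the simple regex tokenizer are hypothetical stand-ins for the scraped data.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for scraped articles, already grouped by category.
CATEGORY_TEXT = {
    "business": "stocks rally as markets rise stocks gain",
    "tech": "new phone released with faster chip and new camera",
    "sports": "team wins final as fans cheer the team",
}


def tokenize(text):
    """Lowercase and keep alphabetic tokens only (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf(docs):
    """Return {doc_name: {term: score}} with tf = count / length and idf = ln(N / df)."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for toks in tokens.values() for term in set(toks))
    scores = {}
    for name, toks in tokens.items():
        counts = Counter(toks)
        total = len(toks)
        scores[name] = {
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores


scores = tf_idf(CATEGORY_TEXT)
for category, term_scores in scores.items():
    top = sorted(term_scores.items(), key=lambda kv: kv[1], reverse=True)[:3]
    print(category, [term for term, _ in top])
```

A term that appears in every category gets idf = ln(1) = 0, which is why shared filler words drop out of the top terms.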

I also researched BERT. I went through multiple blogs and YouTube videos to understand its concepts and presented my understanding to the group. I am currently studying how BERT is implemented so that I can use it in our project.
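BERT's encoder is a stack of Transformer encoder layers, and the core operation in each layer is scaled dot-product self-attention, softmax(QKᵀ / √d_k)V. The sketch below implements a single attention head in plain Python with toy 2-dimensional embeddings and identity projection matrices (hypothetical values, not BERT's trained weights). It leaves out multi-head attention, residual connections, layer normalization, and the feed-forward sublayer.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of rows."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def transpose(m):
    return [list(col) for col in zip(*m)]


def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over token embeddings x."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    scores = matmul(q, transpose(k))  # similarity of every token to every token
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights  # each output is a weighted mix of values


# Toy input: 3 tokens, each a 2-dimensional embedding (hypothetical values).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
out, weights = self_attention(x, identity, identity, identity)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` sums to 1, so every output vector is a context-dependent blend of all token vectors; this is what lets BERT build contextual embeddings.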

shahkushal97 · July 8, 2020, 1:50am

THINGS LEARNED:

Technical: BERT, Training and Validation

Tools: Model training, BertTokenizer
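`BertTokenizer` splits text into WordPiece sub-word units, matching the longest piece found in the vocabulary and marking continuation pieces with `##`. The function below is a simplified sketch of that greedy longest-match-first step for a single, already lowercased word. The toy vocabulary is hypothetical, and the real tokenizer also normalizes text and splits on punctuation first.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first WordPiece split of a single word.

    Continuation pieces carry the "##" prefix, as in BERT's vocabulary.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate substring until it is found in the vocabulary.
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no valid split: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


# Toy vocabulary (hypothetical); bert-base-uncased ships roughly 30k entries.
VOCAB = {"play", "##ing", "##ed", "un", "##afford", "##able", "the"}

for w in ["playing", "played", "unaffordable", "xyz"]:
    print(w, "->", wordpiece(w, VOCAB))
```

Splitting rare words into known sub-words is how BERT keeps a fixed vocabulary while still producing tokens for words it never saw whole.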

ACHIEVEMENTS:

Implemented a BERT model.
Gained insights from the accuracy results table and identified what modifications are needed.

GOALS:

Refine the BERT model to improve its prediction accuracy
Implement one-hot encoding to ensure the dataset is not biased, then run the BERT model again
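One-hot encoding turns each category label into a vector with a single 1 in that category's position and 0 elsewhere. How it was planned to be applied to this dataset isn't specified, so the sketch below only shows the encoding step itself, on hypothetical labels. Library equivalents include `pandas.get_dummies` and scikit-learn's `OneHotEncoder`.

```python
def one_hot_encode(labels):
    """Map each category label to a one-hot vector; returns (vectors, ordered classes)."""
    classes = sorted(set(labels))  # fixed, reproducible column order
    index = {c: i for i, c in enumerate(classes)}
    vectors = []
    for label in labels:
        vec = [0] * len(classes)
        vec[index[label]] = 1
        vectors.append(vec)
    return vectors, classes


labels = ["business", "tech", "business", "sports"]  # hypothetical article labels
vectors, classes = one_hot_encode(labels)
print(classes)
for label, vec in zip(labels, vectors):
    print(label, vec)
```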

TASKS COMPLETED:

After running into some problems executing BERT, I was finally able to implement the BERT model, analyze the results, and see where I need to improve. I realized that the dataset is biased, so the accuracy of the current model cannot be considered reliable. I went through resources to study how to remove this bias. Next, I plan to apply one-hot encoding and run BERT on the new dataset.
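One way to see why accuracy on a skewed dataset can be misleading is to check the label distribution and compare the model against a majority-class baseline, looking at per-class recall as well as overall accuracy. The labels below are hypothetical, and the "model" is a deliberately degenerate one that always predicts the most common class.

```python
from collections import Counter


def class_distribution(labels):
    """Return each class's share of the dataset, largest first."""
    counts = Counter(labels)
    total = len(labels)
    return {c: n / total for c, n in counts.most_common()}


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_recall(y_true, y_pred):
    """Fraction of each true class that was predicted correctly."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


# Hypothetical skewed validation labels: 80% of articles are "business".
y_true = ["business"] * 8 + ["tech", "sports"]
# A degenerate model that always predicts the majority class.
majority = Counter(y_true).most_common(1)[0][0]
y_pred = [majority] * len(y_true)

print(class_distribution(y_true))
print("accuracy:", accuracy(y_true, y_pred))
print("recall per class:", per_class_recall(y_true, y_pred))
```

Here the baseline reaches 80% accuracy while never predicting "tech" or "sports" correctly, so a trained model's accuracy is only meaningful relative to this baseline.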