Things learned:
Technical: Refreshed data visualization skills; learned web scraping with BeautifulSoup, plus Requests and Selenium for collecting data from multiple pages; understood and implemented TF-IDF, and learned about word embeddings. Gained experience inspecting web pages and locating the tags that hold the needed information.
Tools: GitHub, Google Colab.
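As an illustration of the scraping workflow described above, here is a minimal sketch of parsing posts from several pages with BeautifulSoup. The markup (a `div` with class `post` containing an `h2` and a `p`), the `SAMPLE_PAGES` list, and the helper names `parse_posts` and `collect` are all hypothetical and not taken from the actual project; in practice the HTML for each page would come from something like `requests.get(url).text` or a Selenium page source rather than inline strings.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each post is a <div class="post"> with a title and a body.
# Inline strings stand in for pages that would normally be downloaded.
SAMPLE_PAGES = [
    '<div class="post"><h2>First</h2><p>Hello world</p></div>'
    '<div class="post"><h2>Second</h2><p>More text</p></div>',
    '<div class="post"><h2>Third</h2><p>Last page</p></div>',
]


def parse_posts(html):
    """Extract one dict per post from a single page of HTML."""
    soup = BeautifulSoup(html, "html.parser")
    posts = []
    for div in soup.find_all("div", class_="post"):
        posts.append({
            "title": div.find("h2").get_text(strip=True),
            "text": div.find("p").get_text(strip=True),
        })
    return posts


def collect(pages):
    """Loop over all pages and tag every post with its page number."""
    rows = []
    for page_number, html in enumerate(pages, start=1):
        for post in parse_posts(html):
            post["page"] = page_number
            rows.append(post)
    return rows


rows = collect(SAMPLE_PAGES)
```

Each dict in `rows` corresponds to one row of a dataset, and its keys play the role of features; the list could then be written out with `csv.DictWriter` or loaded into a pandas DataFrame.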
Achievements:
- Collected data from 4000+ posts, 50+ webpages and made a CSV dataset with 7 features
- Implemented TF-IDF on the data
- Visualized the findings using matplotlib
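To make the TF-IDF step concrete, here is a minimal sketch using only the Python standard library. It is not the implementation used on the dataset: the whitespace tokenizer, the unsmoothed formula tf × ln(N / df), the `tf_idf` function name, and the three example sentences are all assumptions chosen for illustration. Libraries such as scikit-learn use a smoothed variant, so their numbers would differ.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    tf  = term count / number of tokens in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical example posts, not taken from the real dataset.
posts = ["the cat sat", "the dog sat", "the bird flew"]
scores = tf_idf(posts)
```

In this toy example "the" appears in every post and therefore scores zero, while a word unique to one post, such as "cat", scores highest in that post. The top-scoring terms per post are the kind of result that could then be shown as a matplotlib bar chart.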
Meetings attended:
Week 1 Team meetings, 2 hours (x2)
Week 2 Team meeting, 1 hour
Office hour, 1 hour
Week 3 Team meeting, 1 hour
Goals:
- Implement word embeddings on the dataset and plot them so that the layout reflects the meaning of the corresponding words.
- Implement BERT.
- Get started with variations of BERT.
Tasks completed:
This week I worked on improving the quality of the dataset I put together last week, researched BERT, word embeddings, and TF-IDF, and implemented TF-IDF on the dataset. The biggest challenges were finding an appropriate format for visualizing the results and scraping a few thousand additional posts. With the help of the coding demos, office hours, and patience, I overcame these obstacles. I also learned to take initiative and meet extremely tight deadlines.