Week: 7/27 – 8/01
Overview of Things Learned:
Technical Area:
Learned a lot about web scraping with BeautifulSoup. I am still trying to understand how to use Scrapy. I also learned how to save data to a CSV file.
Tools: Requests, BeautifulSoup, re, pandas, and Scrapy libraries
Achievement Highlights
- Scraped a website
- Learning Python
- Learned how to work with HTML and JSON
- Learned how to use pandas
Meetings Attended
7/20 – ML team kick-off meeting
7/27 – Team-4 meeting
7/29 – web scraping check-in
7/31 – web scraping check-in
Goals for the Upcoming Week
- Learning TFIDF
- Exploring BERT library
Tasks Done
Scraped titles, usernames, and latest updates from the Hopscotch forum, then preprocessed the data and stored it in CSV files. Still need to push it to GitHub.
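A minimal sketch of the scrape-and-save step described above, assuming BeautifulSoup's built-in "html.parser" backend. The sample HTML, the class names (topic, title, user, updated), and the helper names parse_topics and topics_to_csv are hypothetical and only for illustration; the real forum markup is likely different.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical markup standing in for a forum topic list; the real
# Hopscotch forum pages may use different tags and class names.
SAMPLE_HTML = """
<table>
  <tr class="topic"><td class="title">Welcome thread</td>
      <td class="user">alice</td><td class="updated">7/28</td></tr>
  <tr class="topic"><td class="title">Share your project</td>
      <td class="user">bob</td><td class="updated">7/30</td></tr>
</table>
"""

FIELDS = ["title", "username", "latest_update"]


def parse_topics(html):
    """Return one dict per topic row found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr", class_="topic"):
        rows.append({
            "title": tr.find("td", class_="title").get_text(strip=True),
            "username": tr.find("td", class_="user").get_text(strip=True),
            "latest_update": tr.find("td", class_="updated").get_text(strip=True),
        })
    return rows


def topics_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


topics = parse_topics(SAMPLE_HTML)
csv_text = topics_to_csv(topics)
```

In a real run the HTML would come from the page fetched with Requests instead of SAMPLE_HTML, and the CSV text would be written to a file rather than kept in memory.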
Week: 8/10
Overview of Things Learned:
Technical Area: TF-IDF, BERT
Tools: transformers, torch, pandas
Achievement Highlights
Meetings Attended
8/10 - Present Pre-Processing and TF-IDF
8/12 - Check-in on implementing the BERT model
8/14 - Present BERT model implementations
Goals for the Upcoming Week
Tasks Done
- Created a TF-IDF representation and finished implementing BERT
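For reference, a rough from-scratch sketch of what a TF-IDF computation does. This is not necessarily how the team built it: it assumes the plain textbook formula (tf = count / document length, idf = log(N / df)), whereas libraries often use smoothed variants. The function name tf_idf and the sample posts are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF weights for a list of tokenized documents.

    Uses tf = count / document length and idf = log(N / df), the plain
    textbook variant; libraries often apply smoothing instead.
    """
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical forum post texts, already lowercased and cleaned.
posts = [
    "how do i share my project",
    "my project will not load",
    "welcome to the forum",
]
scores = tf_idf([post.split() for post in posts])
```

Words that appear in only one post (such as "share") get a higher weight than words shared across posts (such as "my"), and a word present in every post would get a weight of zero.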
Week: 8/3
Overview of Things Learned:
Technical Area: Pre-processing
Tools: Beautiful Soup, pandas, copy, re, io, markdown, string, requests, csv
Achievement Highlights:
- Learned how to clean raw data by getting rid of foreign characters, emoji, and symbols
- Created a new CSV file with the updated text
Meetings Attended
8/5 - Pre-processing check-in
8/7 - Presenting pre-processing and TF-IDF
Goals for the Upcoming Week
Tasks Done
Completed TF-IDF and preprocessing. Cleaned the raw data and saved it to a CSV file, which is now ready for use with BERT.
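The cleaning step might look roughly like the sketch below, which uses only the re, string, csv, and io modules from the tools list. It assumes that "foreign characters" means anything outside ASCII and that lowercasing is wanted; the function name clean_text and the sample posts are hypothetical.

```python
import csv
import io
import re
import string


def clean_text(raw):
    """Drop non-ASCII characters (emoji, foreign letters) and punctuation."""
    # Encoding with errors="ignore" discards anything outside ASCII.
    ascii_only = raw.encode("ascii", errors="ignore").decode("ascii")
    # Remove every symbol listed in string.punctuation.
    no_symbols = ascii_only.translate(str.maketrans("", "", string.punctuation))
    # Collapse runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", no_symbols).strip().lower()


# Hypothetical raw posts containing emoji, non-ASCII text, and symbols.
raw_posts = ["Hello 🌍!!  Great   project 你好 #1", "Thanks :) <3"]
cleaned = [clean_text(post) for post in raw_posts]

# Write the cleaned text to CSV (in memory here; a file works the same way).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["text"])
writer.writerows([text] for text in cleaned)
csv_text = buffer.getvalue()
```

Dropping all non-ASCII characters is a blunt choice: it also removes accented letters, so a gentler filter may be preferable if the posts contain non-English words worth keeping.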