Week: 7/27
Overview of Things Learned:
- Technical Area: Web Scraping, Data Cleaning
- Tools: Scrapy, Requests, Pandas, Beautiful Soup
- Soft Skills: #communication #teamwork #internationalcollaboration
Achievement Highlights
- Used Beautiful Soup, then Scrapy, to scrape over 13,000 posts from the Community CarTalk forum.
- Familiarised myself with collaboration tools such as Jupyter and Google Colab.
- Pre-processed and cleaned the scraped data in preparation for applying machine learning algorithms.
- Spent hours debugging, consulting dozens of sources to track down the errors in my code. Now comfortable with data scraping.
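A minimal Beautiful Soup sketch of the kind of post extraction described in the highlights above. The HTML snippet, the `div.post` / `span.author` / `div.body` selectors and the `extract_posts` helper are all hypothetical stand-ins; the real CarTalk markup will differ and the selectors would need to be read off the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for one downloaded forum page;
# the real CarTalk markup and class names will differ.
SAMPLE_HTML = """
<div class="post" data-id="1">
  <span class="author">alice</span>
  <div class="body"><p>My engine makes a <b>ticking</b> noise.</p></div>
</div>
<div class="post" data-id="2">
  <span class="author">bob</span>
  <div class="body"><p>Check the oil level first.</p></div>
</div>
"""


def extract_posts(html: str) -> list[dict]:
    """Return one dict per post (the CSS selectors are assumptions)."""
    soup = BeautifulSoup(html, "html.parser")  # built-in parser, no lxml needed
    posts = []
    for node in soup.select("div.post"):
        author = node.select_one("span.author")
        body = node.select_one("div.body")
        posts.append({
            "id": node.get("data-id"),
            "author": author.get_text(strip=True) if author else None,
            # A space separator stops words in adjacent tags from gluing together
            "text": body.get_text(" ", strip=True) if body else "",
        })
    return posts


if __name__ == "__main__":
    for post in extract_posts(SAMPLE_HTML):
        print(post)
```

Scrapy wraps the same idea (selectors applied to each downloaded page) in a crawler that also handles following links, throttling and retries, which matters once the crawl reaches thousands of posts.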
Meetings Attended
- Introduction to Web Scraping
- Web Scraping Check-in
- Web Scraping and Preprocessing presentations
Goals for the Upcoming Week
- Refine my processed data
- Learn about TF-IDF and BERT
- Collaborate on ideas and techniques with fellow team members
Tasks Done
- Web Scraping: Scraped data from the Community CarTalk forum. Had some trouble with XPath expressions, but resolved it by switching to the JSON module, which made fetching the tags much easier and more convenient. Pushed the resulting CSV files to the team repository on GitHub.
- Pre-processing: Cleaning the scraped data using pandas and other libraries such as re. Still refining the data.
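The JSON-based alternative to XPath in the Web Scraping task can be sketched roughly as below. This assumes the forum runs on Discourse, which can serve a topic as JSON (for example at a `/t/<topic-id>.json` URL) with the posts under `post_stream.posts`. The sample payload, the field names and the helpers `posts_from_topic` and `rows_to_csv` are illustrative assumptions, not the code actually used, and should be checked against a real response.

```python
import csv
import io
import json

# Trimmed, hypothetical payload shaped like a Discourse topic response;
# verify the field names against a real response before relying on them.
SAMPLE_JSON = """
{
  "id": 123,
  "title": "Strange noise when braking",
  "post_stream": {
    "posts": [
      {"id": 1, "username": "alice", "created_at": "2020-07-20T10:00:00Z",
       "cooked": "<p>Hear a squeal when I brake.</p>"},
      {"id": 2, "username": "bob", "created_at": "2020-07-20T11:30:00Z",
       "cooked": "<p>Probably worn pads.</p>"}
    ]
  }
}
"""


def posts_from_topic(payload: str) -> list[dict]:
    """Flatten one topic's JSON into one row per post."""
    topic = json.loads(payload)
    rows = []
    for post in topic.get("post_stream", {}).get("posts", []):
        rows.append({
            "topic_id": topic.get("id"),
            "title": topic.get("title"),
            "post_id": post.get("id"),
            "author": post.get("username"),
            "created_at": post.get("created_at"),
            "html": post.get("cooked", ""),  # post body is still HTML; clean later
        })
    return rows


def rows_to_csv(rows: list[dict]) -> str:
    """Serialise the rows as CSV text, header first."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

In a live run the payload would typically come from something like `requests.get(url, timeout=10).text`. Reading named fields from JSON avoids writing XPath expressions against HTML that may change between pages.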
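A minimal sketch of pandas-plus-`re` cleaning of the kind described under Pre-processing. The specific steps (stripping tags and URLs, lowercasing, removing punctuation, dropping empty and duplicate posts), the `text` column name and the helpers `clean_text` and `clean_posts` are assumptions for illustration. The regex tag stripper is deliberately crude; Beautiful Soup's `get_text` is more robust for messy HTML.

```python
import re

import pandas as pd

TAG_RE = re.compile(r"<[^>]+>")             # crude HTML tag stripper
URL_RE = re.compile(r"https?://\S+")        # bare links add little signal
NON_ALNUM_RE = re.compile(r"[^a-z0-9\s']")  # keep letters, digits, apostrophes
SPACE_RE = re.compile(r"\s+")


def clean_text(text: str) -> str:
    """Normalise one post: drop tags and URLs, lowercase, strip punctuation."""
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    text = text.lower()
    text = NON_ALNUM_RE.sub(" ", text)
    return SPACE_RE.sub(" ", text).strip()


def clean_posts(df: pd.DataFrame, column: str = "text") -> pd.DataFrame:
    """Clean a text column, then drop empty and duplicate posts."""
    out = df.copy()
    out[column] = out[column].fillna("").astype(str).map(clean_text)
    out = out[out[column] != ""]               # nothing left after cleaning
    out = out.drop_duplicates(subset=column)   # keep the first copy only
    return out.reset_index(drop=True)


if __name__ == "__main__":
    raw = pd.DataFrame({"text": ["<p>Check the <b>OIL</b>!</p>", "check the oil", None]})
    print(clean_posts(raw))
```

Keeping `clean_text` as a plain function makes it easy to test on single strings before applying it to the whole column.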