Overview of Things Learned:
Technical:
- Web scraping fundamentals
- Data pre-processing techniques
- ML models for NLP applications
Tools:
- Web scraping tools: BeautifulSoup and Selenium
- Pandas, JSON
- scikit-learn ML library
- GitHub
Soft Skills:
- Improved presentation skills
- Got better at finding resources online
Achievement Highlights:
- Successfully scraped the Amazon and Flowster Discourse forums
- Implemented a Logistic Regression classifier that assigns Discourse topics to their categories, reaching around 90% accuracy on the Amazon dataset
List of Meetings/Training Attended:
- All team meetings except one, and the GitHub workshop
Goals for the Coming Week:
- Continue investigating ways to improve the Logistic Regression classifier's accuracy on the Flowster dataset
- Learn more about data augmentation
Detailed Statement of Tasks Done:
- Used BeautifulSoup and Selenium to scrape the Flowster and Amazon forums, collecting 260 samples from Flowster and around 16k from Amazon.
- Trained a one-vs-rest Logistic Regression classifier on the Flowster dataset, which reached only 48% accuracy. To check whether the small dataset size explained the low accuracy, I scraped the much larger Amazon forum and trained the same model on it, reaching around 90% accuracy.
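The parsing half of the scraping task can be illustrated with a minimal sketch. To stay dependency-free it swaps in Python's standard-library `html.parser` for BeautifulSoup, and the markup it expects (topic titles in `<a class="title">`, categories in `<span class="category-name">`) is a hypothetical assumption, not the real structure of the Flowster or Amazon pages. When Selenium drives the browser, the rendered HTML would typically come from `driver.page_source` before being fed to a parser like this one.

```python
from html.parser import HTMLParser


class TopicParser(HTMLParser):
    """Collect (title, category) pairs from a forum listing page.

    Hypothetical markup assumption: each topic title is the text of an
    <a class="title"> element, followed by its category in a
    <span class="category-name"> element. Nested tags inside those
    elements are not handled.
    """

    def __init__(self):
        super().__init__()
        self.rows = []        # collected (title, category) tuples
        self._field = None    # "title", "category", or None
        self._title = None    # last title still waiting for its category

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "a" and "title" in classes:
            self._field = "title"
        elif tag == "span" and "category-name" in classes:
            self._field = "category"

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if not text or self._field is None:
            return
        if self._field == "title":
            self._title = text
        elif self._title is not None:
            # Pair this category with the most recent title.
            self.rows.append((self._title, text))
            self._title = None


# Made-up sample markup for illustration only.
SAMPLE_HTML = """
<table>
  <tr><td><a class="title" href="/t/1">Order never arrived</a>
      <span class="category-name">Shipping</span></td></tr>
  <tr><td><a class="title" href="/t/2">Refund for damaged item</a>
      <span class="category-name">Returns</span></td></tr>
</table>
"""

parser = TopicParser()
parser.feed(SAMPLE_HTML)
print(parser.rows)
# [('Order never arrived', 'Shipping'), ('Refund for damaged item', 'Returns')]
```

The collected rows could then be loaded with `pandas.DataFrame(parser.rows, columns=["title", "category"])` or saved with `json.dump` for the training step.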
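The one-vs-rest training described above can be sketched as follows. One-vs-rest fits one binary logistic regression per category (that category versus all others) and predicts the category whose binary classifier is most confident. This is a from-scratch, standard-library illustration of the technique using a plain bag-of-words representation and batch gradient descent; the toy documents, labels, learning rate and epoch count are made up, and it is not the pipeline behind the accuracies reported above. In scikit-learn the same idea is usually expressed by wrapping `LogisticRegression` in `sklearn.multiclass.OneVsRestClassifier`, typically on features from `CountVectorizer` or `TfidfVectorizer`.

```python
import math
from collections import Counter


def tokenize(text):
    # Lowercase and split on whitespace; real pre-processing would also
    # handle punctuation, stop words, stemming, etc.
    return text.lower().split()


def build_vocab(docs):
    # Map each distinct token to a feature index.
    vocab = {}
    for doc in docs:
        for tok in tokenize(doc):
            vocab.setdefault(tok, len(vocab))
    return vocab


def vectorize(doc, vocab):
    # Sparse bag-of-words {feature index: count}; unseen tokens are dropped.
    counts = Counter(tok for tok in tokenize(doc) if tok in vocab)
    return {vocab[tok]: n for tok, n in counts.items()}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary(X, y, n_features, lr=0.5, epochs=200):
    # Batch gradient descent on the mean log-loss of one binary problem.
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, target in zip(X, y):
            err = sigmoid(b + sum(w[j] * v for j, v in x.items())) - target
            for j, v in x.items():
                grad_w[j] += err * v
            grad_b += err
        for j in range(n_features):
            w[j] -= lr * grad_w[j] / len(X)
        b -= lr * grad_b / len(X)
    return w, b


def fit_one_vs_rest(docs, labels):
    vocab = build_vocab(docs)
    X = [vectorize(d, vocab) for d in docs]
    models = {}
    for cls in sorted(set(labels)):
        # This class is the positive label; every other class is negative.
        y = [1 if lab == cls else 0 for lab in labels]
        models[cls] = train_binary(X, y, len(vocab))
    return vocab, models


def predict(doc, vocab, models):
    x = vectorize(doc, vocab)
    scores = {cls: sigmoid(b + sum(w[j] * v for j, v in x.items()))
              for cls, (w, b) in models.items()}
    # Choose the class whose binary classifier is most confident.
    return max(scores, key=scores.get)


def accuracy(docs, labels, vocab, models):
    hits = sum(predict(d, vocab, models) == lab for d, lab in zip(docs, labels))
    return hits / len(labels)


# Made-up toy data for illustration only.
train_docs = [
    "package never arrived tracking stuck",
    "where is my package tracking",
    "refund for damaged item",
    "return and refund request",
    "cannot log in to my account",
    "reset account password",
]
train_labels = ["shipping", "shipping", "returns", "returns", "account", "account"]

vocab, models = fit_one_vs_rest(train_docs, train_labels)
print(predict("my tracking number shows no package", vocab, models))  # shipping
```

Each binary model sees every document but treats only its own category's documents as positives, so when a small dataset is split across categories, each classifier has few positive examples to learn from.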
Thanks a lot to Sara and Rohit for being great mentors and leaders!