Week: Weeks 1-3, 7/20-8/10
Overview of Things Learned
- Web Scraping using BeautifulSoup
- Data Analysis
- Data Pre-Processing
Technical Area
- Determining a forum suitable for scraping
- Web scraping data from a forum
Tools
- BeautifulSoup
- Slack
Soft Skills
- Communication and working with a team using Slack
Achievement Highlights
- Performed web scraping on the Codecademy forum and created a dataset containing titles, comments, categories, tags, and posts
- Worked together with the Codecademy team
- Analyzed the scraped data to prepare it for pre-processing
Meetings Attended
- Weekly ML Team 6 Meetings: 7/20, 7/23, 7/27, 7/29, 8/3, 8/5
Goals for the Upcoming Week
- Finish data pre-processing/cleaning on the Stack Exchange data
- Gain a better understanding of the NLP and machine learning techniques to be applied to the dataset
Detailed Statement of Completed Tasks
- Worked with the team to determine that the Codecademy online forum provided suitable data for training the recommender system
- Created a plan with the team for scraping the website
- Scraped titles, comments, categories, tags, and posts from the Codecademy online forum and stored the data in a CSV file
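The scraping step above can be sketched as follows. This is a minimal illustration, not the team's scraper: the HTML snippet and its class names (title, category, tag, post, comment) are hypothetical, since the report does not describe the forum's markup, and it uses Python's standard-library html.parser in place of BeautifulSoup so that it runs without third-party packages. Each topic becomes one CSV row, with multi-valued fields such as tags joined by "|".

```python
import csv
import io
from html.parser import HTMLParser

# Hypothetical markup for one forum topic; the real page structure is not
# described in the report, so these class names are illustrative assumptions.
SAMPLE_HTML = """
<div class="topic">
  <h1 class="title">How do I reverse a list?</h1>
  <span class="category">Python</span>
  <a class="tag">lists</a><a class="tag">beginner</a>
  <div class="post">I tried list.reverse() but it returned None.</div>
  <div class="comment">reverse() works in place; use reversed() or slicing.</div>
</div>
"""

FIELDS = ("title", "comment", "category", "tag", "post")


class TopicParser(HTMLParser):
    """Collects the text of every element whose class is one of FIELDS."""

    def __init__(self):
        super().__init__()
        self.data = {field: [] for field in FIELDS}
        self._current = None  # field of the element we are currently inside

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class")
        if cls in FIELDS:
            self._current = cls
            self.data[cls].append("")

    def handle_endtag(self, tag):
        # Assumes field elements contain no nested tags (true for the sample).
        self._current = None

    def handle_data(self, text):
        if self._current is not None:
            self.data[self._current][-1] += text


def scrape_topic(html):
    """Parse one topic page into a dict with one string per field."""
    parser = TopicParser()
    parser.feed(html)
    parser.close()
    # Multi-valued fields (e.g. several tags) are joined with "|".
    return {f: "|".join(s.strip() for s in vals) for f, vals in parser.data.items()}


def write_csv(rows, fileobj):
    """Write scraped topics as CSV with a header row."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    buffer = io.StringIO()
    write_csv([scrape_topic(SAMPLE_HTML)], buffer)
    print(buffer.getvalue())
```

With BeautifulSoup, the per-field lookups would typically be written as soup.find_all(class_="tag") instead of a hand-written parser class; a real scraper would also fetch each topic page over HTTP before parsing it.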
Week: Weeks 4-6, 8/10-8/31
Overview of Things Learned
- Data Pre-processing
- Machine Learning Models: BERT and Simple Transformers
- Data Modeling
- Analyzing Results
Technical Area
- Data Pre-processing: tokenization, stopword removal, stemming, and lemmatization
- Implementing BERT, Simple Transformers, and TF-IDF
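The pre-processing steps listed above can be illustrated with a small, dependency-free sketch. It is not the team's code: the stopword list and lemma table are tiny hypothetical stand-ins (real pipelines usually rely on a library such as NLTK for stopwords, stemming, and WordNet-based lemmatization), and crude_stem is a toy suffix stripper, not the Porter algorithm. Stemming and lemmatization are shown as alternatives so their outputs can be compared.

```python
import re

# Tiny illustrative stopword list (assumption); real pipelines usually use a
# fuller list, e.g. NLTK's English stopwords.
STOPWORDS = {"a", "an", "the", "is", "are", "to", "of", "and", "i", "it", "in",
             "how", "do", "my", "but"}

# Hypothetical lemma table standing in for a dictionary-based lemmatizer.
LEMMAS = {"ran": "run", "running": "run", "was": "be", "lists": "list", "better": "good"}


def clean(text):
    """Lowercase and replace unwanted characters (anything not a-z, 0-9, or whitespace)."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def tokenize(text):
    """Whitespace tokenization of already-cleaned text."""
    return text.split()


def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def crude_stem(token):
    """Strip one common suffix, keeping at least 3 characters (toy stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def lemmatize(token):
    """Look the token up in the lemma table; unknown tokens pass through."""
    return LEMMAS.get(token, token)


def preprocess(text, use_lemmas=True):
    """clean -> tokenize -> drop stopwords -> lemmatize (or stem)."""
    tokens = remove_stopwords(tokenize(clean(text)))
    normalize = lemmatize if use_lemmas else crude_stem
    return [normalize(t) for t in tokens]


if __name__ == "__main__":
    sample = "How do I sort my lists? It was running slowly!"
    print(preprocess(sample))                    # ['sort', 'list', 'be', 'run', 'slowly']
    print(preprocess(sample, use_lemmas=False))  # ['sort', 'list', 'was', 'runn', 'slowly']
```

The stemmer turns "running" into the non-word "runn", while the lemma lookup maps it to "run" and "was" to "be"; this is the usual trade-off between cheap rule-based stemming and dictionary-based lemmatization.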
Tools
- Google Colaboratory
- Jupyter Notebook
- Microsoft Excel
Soft Skills
- Learned new software and machine learning concepts at a fast pace
- Problem solving
- Communicating research and results to a team
- Presenting technical research and results
Achievement Highlights
- Implemented and worked with machine learning models
- Researched methods of data modeling
- Collaborated with a team to successfully conduct data modeling
- Collaborated on a research presentation that displayed and analyzed results
Meetings Attended
- Weekly ML Team 6 Meetings: 8/10, 8/12, 8/13, 8/14, 8/15, 8/17, 8/19, 8/21, 8/24, 8/26, 8/27, 8/28
Detailed Statement of Completed Tasks
- Data Pre-processing: cleaned the data using tokenization, stopword removal, stemming, lemmatization, and removal of unwanted characters
- Data Modeling: researched and implemented BERT, Simple Transformers, and TF-IDF models to build a recommender system
- Analyzed and presented the final results of our NLP topic recommender system
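The TF-IDF part of the recommender can be sketched as below: each topic is weighted by term frequency times inverse document frequency, and topics are ranked by cosine similarity to a query. This is an illustrative assumption, not the team's implementation: the three-topic corpus is made up, the inputs are assumed to be already pre-processed tokens, the IDF uses the smoothed form ln((1 + N) / (1 + df)) + 1 that scikit-learn applies by default, and the BERT/Simple Transformers models from the report are not covered.

```python
import math
from collections import Counter

# Hypothetical, already pre-processed topics (tokens) and their titles.
TITLES = ["Reversing a list in Python", "CSS flexbox alignment", "Sorting a Python list"]
DOCS = [["reverse", "list", "python"], ["css", "flexbox", "align"], ["sort", "list", "python"]]


def weigh(tokens, idf):
    """TF-IDF weights for one token list: raw term count times IDF.
    Terms absent from the corpus vocabulary are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    return {t: c * idf[t] for t, c in counts.items()}


def tfidf_vectors(docs):
    """Smoothed IDF over the corpus, plus one sparse TF-IDF dict per document."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    return idf, [weigh(doc, idf) for doc in docs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


def recommend(query_tokens, docs, titles, k=2):
    """Return up to k titles ranked by similarity, skipping zero-score topics."""
    idf, vectors = tfidf_vectors(docs)
    query = weigh(query_tokens, idf)
    scores = (cosine(query, v) for v in vectors)
    ranked = sorted(zip(titles, scores), key=lambda pair: pair[1], reverse=True)
    return [title for title, score in ranked[:k] if score > 0]


if __name__ == "__main__":
    print(recommend(["sort", "python", "list"], DOCS, TITLES))
```

In practice, scikit-learn's TfidfVectorizer together with sklearn.metrics.pairwise.cosine_similarity would replace the hand-written functions; spelling them out here only makes the weighting explicit.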