Module 1 Self Assessment
Week 1 and 2: 28th June 2021
Overview of Things Learned:
Technical Area:
- I have prior experience in machine learning, so I used the module resources to brush up on the concepts.
- For the NLP part specifically, I went through the mentors' webinars in addition to the linked resources.
- I am comfortable working through the ML project workflow.
- This module helped me revise concepts such as word embeddings and logistic regression in more depth.
Tools:
- Learned the basics of parsing HTML text using Beautiful Soup and Selenium WebDriver
- I have experience with Google Colab and Jupyter Notebook, but because the dataset is large I will use Google Colab for the later modules.
- Learned how to use the Python library spaCy, which was new to me.
- The sentence_transformers and transformers libraries helped me use BERT encodings in Python.
- I know the basics of PyTorch and TensorFlow 2.0, but revising them again really helped.
Soft Skills:
- As the Project Lead, I met with the other leads and we discussed the plan for our team and how to go about it.
- Set up the Discord server for team communication.
- Tried to talk with all the participants, not just the leads.
- Planned the first meeting with the participants.
- Going to host the meeting and explain the modules, self-assessments, and deadlines.
Achievements Highlights:
- Gained knowledge of more Python libraries
- Successfully scraped some web pages and got data from them
- Became familiar with how NLP is used today and how to apply it in projects
Meetings attended
- Leads-only meeting with 3 fellow leads, where we planned the strategy for the upcoming week and a rough outline of how things would go
- ML Level 1 meeting with Sara and a fellow Project Lead, where I got insights on how we can progress as a team and had my queries resolved.
- Team Meeting for ML Level 1 with all the participants, including an icebreaking session
- Team Meeting 2 with the complete team, where I described Module 2
Goals for the upcoming week:
- Start the Module 2 resources and tutorials
- Choose the forum to work on as a team
- Attending 5-10 minute scrum meetings to check in on team progress.
Tasks Completed:
- Successfully scraped a webpage for data using Beautiful Soup
- Used Selenium for the first time, so it was a bit difficult.
- Used Transformers in Python to classify text as either negative or positive
- Used a logistic regression model to train this classifier
- Hosted the 2 weekly meetings with the fellow leads and introduced the participants to the modules
Module 2 Self Assessment
Week 3 and 4: 12th July 2021
Overview of Things Learned:
Technical Area:
- Familiarized myself with fundamentals of EDA
- Learned how to navigate HTML and extract specific components of a web page
- Got more practice with scraping: extracting text and dates
- Performed basic data cleaning and analysis
- I chose the PyTorch forum as the source to scrape.
- I scraped the data from this forum using the Beautiful Soup and Selenium libraries and stored it in a CSV file.
- I used different data cleaning and EDA techniques to explore the scraped data.
- Understood the logic behind recommender systems and ML algorithms
Tools:
- Beautiful Soup, Selenium WebDriver, NumPy, pandas, Matplotlib, scikit-learn, WordCloud, NLTK, GitHub, spaCy, CSV, Requests
- Git / GitHub
- STEM-Away Platform, Discord, Jira
- Jupyter Notebook, Google Colaboratory
Soft Skills:
- Communication with participants and the leads of other teams
- Attended the Office Hours and updated Sara on the team's progress, future goals, and deadlines
- Going to host the meeting and explain Module 3
Achievement highlights
- Developed a good understanding of the web crawler.
- Experimented with Beautiful Soup on HTML structure in the Colab environment. Learned how to perform EDA and data cleaning before the machine learning process.
- Successfully scraped the data from the PyTorch forum using the Beautiful Soup and Selenium libraries
- Learned the basics of how to scrape a website and ways data can be formatted
Meetings attended
- Attended the Lead Help session to provide a status update on my team
- ML Level 1 meeting with Sara and a fellow Project Lead
- Team Meeting 3 with the complete team
- Team Meeting 4 which was a Game Night
Goals for the Upcoming Week
- Experiment on the dataset with basic machine learning models to see the classification results.
- Start the Module 3 resources and tutorials
Tasks Completed
- Set up the Git repo for the team and invited all the participants to contribute.
- Got thoroughly familiar with the Discourse platform. Chose the forum I will be working on, i.e. the PyTorch forum
- Learned how to write the data crawled from the website to a CSV file.
- Scraped the messages from one of the boards into a CSV and removed the HTML tags
- Completed basic data analysis and visualization
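The "strip the HTML tags and write to CSV" task above can be sketched roughly as follows. This is not the notebook code from the submission: it swaps Beautiful Soup for the standard library's `html.parser` so it runs without extra installs, and the two sample posts, the `TagStripper` class, and the `strip_tags` helper are invented for illustration.

```python
import csv
import io
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for every run of text between tags.
        self.parts.append(data)

    def text(self):
        # Join the pieces as they appeared, then collapse repeated whitespace.
        return " ".join("".join(self.parts).split())


def strip_tags(html_fragment):
    """Return the visible text of an HTML fragment with all tags removed."""
    stripper = TagStripper()
    stripper.feed(html_fragment)
    stripper.close()
    return stripper.text()


# Invented sample data standing in for scraped forum posts.
posts = [
    {"date": "2021-07-01", "html": "<p>How do I <b>freeze</b> layers?</p>"},
    {"date": "2021-07-02", "html": "<p>Use <code>requires_grad = False</code>.</p>"},
]

# Write to an in-memory buffer; a real run would open a .csv file instead.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["date", "text"])
writer.writeheader()
for post in posts:
    writer.writerow({"date": post["date"], "text": strip_tags(post["html"])})

csv_text = buffer.getvalue()
print(csv_text)
```

Joining the text nodes without a separator keeps punctuation attached to the preceding word, but it can fuse words across adjacent block tags, so a real cleaner would likely need per-tag handling.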
Module 2 code submission - ml-session2-team5/DeepikaRana at Module2 · mentorchains/ml-session2-team5 · GitHub
Module 3 Self Assessment
Week 5 and 6: 26th July 2021 - 6th August 2021
Overview of Things Learned:
Technical Area:
- Learned about basic machine learning models such as Naive Bayes, Linear SVM, Logistic Regression, Decision Tree, Random Forest, XGBoost, and LightGBM
- Incorporated cross-validation with the Linear SVM, Random Forest, XGBoost, and LightGBM models to generate the results.
- Built the machine learning model and pipeline, and used doc2vec and TF-IDF embeddings with 3 machine learning models (Random Forest, XGBoost, and Logistic Regression). I found that these 3 models gave better accuracy.
- Ran the TF-IDF text embedding with different models, achieved higher accuracy with logistic regression, and pushed the results to the GitHub repo
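As a rough illustration of the TF-IDF plus logistic regression setup with cross-validation described above, here is a minimal scikit-learn sketch. The texts, labels, fold count, and `max_iter` value are all invented placeholders, not the data or settings from the team notebooks.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Invented toy posts standing in for the scraped forum data.
texts = [
    "how to load an image dataset for a cnn",
    "resize images before feeding the conv layer",
    "image augmentation with random crops",
    "pretrained resnet gives wrong image predictions",
    "convert image tensor to numpy for plotting",
    "object detection bounding box on image",
    "tokenize text for an lstm language model",
    "padding sentences in a text batch",
    "word embeddings for text classification",
    "transformer attention mask for text input",
    "vocabulary size for text tokenizer",
    "translate text with a seq2seq model",
]
labels = ["vision"] * 6 + ["nlp"] * 6

# The vectorizer sits inside the pipeline so it is re-fitted on each
# training fold and never sees the held-out fold.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# With an integer cv and a classifier, scikit-learn uses stratified folds.
scores = cross_val_score(model, texts, labels, cv=3, scoring="accuracy")
print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())
```

Swapping `LogisticRegression` for another estimator, for example `LinearSVC` or `RandomForestClassifier`, leaves the rest of the pipeline unchanged, which makes it easy to compare models on identical folds.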
Tools:
- NumPy, pandas, Matplotlib, scikit-learn, WordCloud, NLTK, GitHub, spaCy, CSV, Git / GitHub, Google Colaboratory
Soft Skills:
- Communication with participants and fellow leads to plan the team presentation
- Attended a meeting with the mentor Anubhav and asked queries about data imbalance
- Going to co-host the presentation this coming Tuesday, so I planned for that.
- Prepared the basic PPT presentation for the ML Level 1 team presentation
Achievement highlights
- Experimented on the dataset with basic machine learning models (Naive Bayes, Linear SVM, Logistic Regression, Decision Tree, Random Forest, XGBoost, LightGBM) to see the classification results.
- Used TF-IDF and doc2vec embeddings with 3 machine learning models (Random Forest, XGBoost, and Logistic Regression) to see the classification results.
- Read papers on the BERT neural architecture and understood its theoretical foundations.
- Successfully learned how to perform text embedding alongside machine learning models.
- Learned deep learning frameworks and applied them to this classifier project.
Meetings attended
- Attended the mentor's ML Level 1 meeting with Anubhav to discuss and resolve queries regarding the final project
- Team Meeting 5 with the complete team to discuss Module 3
- Team Meeting 6 to discuss the results of our models
- Team Meeting 7 to plan the team presentation.
Goals for the Upcoming Week
- Experiment with the dataset to achieve higher accuracy and try more classification models
- Deploy the PyTorch Forum Classification System as a web application.
- Start the Module 4 resources and tutorials
Tasks Done
- Reduced data imbalance across the categories by using only the top 15 categories for the classifier system
- Achieved an accuracy of 64% using the Linear SVM without any hyperparameter tuning
- Applied basic modeling and advanced embedding methods, and pushed the corresponding notebooks to the team repo.
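Restricting the classifier to the most frequent categories, as in the first task above, could look something like this. It is a small standard-library sketch; the `keep_top_categories` helper and the sample rows are hypothetical, and the example uses `top_n=2` only to keep the data short (the project used the top 15).

```python
from collections import Counter


def keep_top_categories(rows, top_n):
    """Keep only the (text, category) rows whose category is among the top_n most frequent."""
    counts = Counter(category for _, category in rows)
    top = {category for category, _ in counts.most_common(top_n)}
    return [(text, category) for text, category in rows if category in top]


# Invented sample rows: (post text, forum category).
rows = [
    ("post a", "vision"),
    ("post b", "vision"),
    ("post c", "vision"),
    ("post d", "nlp"),
    ("post e", "nlp"),
    ("post f", "mobile"),
]

filtered = keep_top_categories(rows, top_n=2)
print(Counter(category for _, category in filtered))
```

Dropping rare categories removes the classes with too few examples to learn from, but the remaining classes can still be unevenly sized, so class weights or resampling may still be worth trying.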
Module 3 Code Submission - ml-session2-team5/DeepikaRana at Module3 · mentorchains/ml-session2-team5 · GitHub
Module 4 Self Assessment
Week 7 and 8: 7th August 2021 - 20th August 2021
Overview of Things Learned
- Learned how to train BERT, XLNet, RoBERTa, and DistilBERT models using the Simple Transformers library.
- Learned how to combine an advanced model like BERT with a simple ML model like Logistic Regression.
- Read papers on the BERT neural architecture and understood its theoretical foundations.
- Read articles on the optimised versions of BERT to find out which would be most suitable for our classifier for the PyTorch forum.
Tools:
- Simple Transformers, tokenizers==0.9.4, sklearn, tarfile, HTML, CSS, Flask API, Docker
- Google Colaboratory
Soft Skills:
- Delivered the ML Level 1 Team 5 presentation with the complete team
- Improved the earlier presentation based on the feedback from Debaleena and the mentors.
- Communication with participants and fellow leads to plan the team's final presentation
- Going to co-host the final presentation on Friday, so I planned for that.
Achievement highlights
- Read papers on the BERT neural architecture and understood its theoretical foundations.
- Read articles on the optimised versions of BERT to find out which would be most suitable for our classifier for the PyTorch forum.
- Successfully trained BERT, RoBERTa, XLNet, and DistilBERT models using the Simple Transformers library.
Meetings attended
- ML Level 1 Presentation for our Team 5
- Team Meeting 8 with the complete team to discuss Module 4
- Team Meeting 9 to plan the team's final presentation.
Goals for the Upcoming Week
- Present the work and the final presentation to the mentors with the team
Tasks Done
- Compared the accuracy, evaluation loss, F1 score, and MCC of the 4 advanced models.
- Reduced data imbalance across the categories by using only the top 15 categories for the classifier system
- Achieved the highest accuracy, 80.5%, using XLNet
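For reference, three of the metrics compared above (accuracy, F1, and MCC) can be computed from true and predicted labels as in the sketch below. It is a plain-Python illustration, not the Simple Transformers evaluation code; the function names are hypothetical, the F1 shown is the macro-averaged variant (an assumption, since the averaging used in the project is not stated), and evaluation loss is omitted because it needs model probabilities rather than labels.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = set(y_true) | set(y_pred)
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient."""
    s = len(y_true)                                   # total samples
    c = sum(t == p for t, p in zip(y_true, y_pred))   # correct predictions
    t_counts = Counter(y_true)                        # true count per class
    p_counts = Counter(y_pred)                        # predicted count per class
    labels = set(t_counts) | set(p_counts)
    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    denom = math.sqrt(
        (s * s - sum(v * v for v in p_counts.values()))
        * (s * s - sum(v * v for v in t_counts.values()))
    )
    # Return 0.0 when the denominator vanishes (e.g. only one class predicted).
    return numerator / denom if denom else 0.0


# Invented labels for illustration only.
y_true = [1, 1, 1, 0]
y_pred = [1, 0, 1, 1]
print(accuracy(y_true, y_pred), macro_f1(y_true, y_pred), mcc(y_true, y_pred))
```

MCC is useful alongside accuracy on imbalanced categories because it stays near zero for a model that mostly predicts the majority class, even when that model's accuracy looks high.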
Module 4 Code Submission - ml-session2-team5/DeepikaRana at Module4 · mentorchains/ml-session2-team5 · GitHub