Jeffrey Weng - Machine Learning (Level 1) Pathway

Things I Learned

Technical Area:

  • Independently examined simple and multivariable linear regression, including weights, bias, mean squared error (MSE), and gradient descent (a minimal gradient-descent sketch follows this list)
  • Examined the weaknesses of one-hot encoding, regex, N-gram counting, and pretrained word vectors, as well as the implications of RNNs and LSTMs as the field transitions to attention-oriented systems
  • Strengthened understanding of machine learning fundamentals such as deep learning, similarity measures, and attention models through the webinars and the NLP Basics series
  • Examined some of the applications of linear algebra and calculus in machine learning
  • Studied and employed web scraping, consulting the documentation of the specific libraries used
  • Studied how PyTorch stores embeddings as matrices through torch.nn.Embedding, applied to N-gram language modeling and CBOW (a CBOW sketch also appears after this list)
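
The regression bullet above is the kind of material a short worked example clarifies. The following is a minimal sketch of fitting a one-variable linear model by gradient descent on MSE; the data is made up for illustration and is not project code.

```python
import numpy as np

# Minimal gradient-descent sketch on toy data (y ≈ 3x + 2 plus noise); illustrative only.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3 * x + 2 + rng.normal(0, 1, size=100)

w, b = 0.0, 0.0   # weight and bias
lr = 0.01         # learning rate
for _ in range(2000):
    y_hat = w * x + b                 # predictions
    error = y_hat - y
    mse = np.mean(error ** 2)         # mean squared error
    grad_w = 2 * np.mean(error * x)   # dMSE/dw
    grad_b = 2 * np.mean(error)       # dMSE/db
    w -= lr * grad_w
    b -= lr * grad_b

print(f"w={w:.2f}, b={b:.2f}, mse={mse:.3f}")  # w and b should approach 3 and 2
```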

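For the torch.nn.Embedding bullet, here is a minimal CBOW-style sketch; the toy vocabulary, dimensions, and single training step are assumptions for illustration, not the project's actual model.

```python
import torch
import torch.nn as nn

# Minimal CBOW-style sketch: nn.Embedding stores one learnable vector per word
# as a row of a matrix. Vocabulary and sizes are toy assumptions.
vocab = ["the", "car", "engine", "makes", "a", "noise"]
word_to_ix = {w: i for i, w in enumerate(vocab)}

class CBOW(nn.Module):
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, embed_dim)  # (vocab_size, embed_dim) matrix
        self.linear = nn.Linear(embed_dim, vocab_size)

    def forward(self, context_ids):
        avg = self.embeddings(context_ids).mean(dim=0)  # average of context vectors
        return self.linear(avg)                         # scores over the vocabulary

model = CBOW(len(vocab), embed_dim=8)
context = torch.tensor([word_to_ix[w] for w in ["the", "engine", "a", "noise"]])
target = torch.tensor([word_to_ix["makes"]])

loss = nn.CrossEntropyLoss()(model(context).unsqueeze(0), target)
loss.backward()  # gradients flow back into the embedding matrix
print(loss.item())
```
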
Tools

  • VS Code programming environment
  • BeautifulSoup4/Selenium/raw urllib for web scraping in Python
  • Git and GitHub Desktop for distributing/organizing programs into repositories
  • Google Colab / Jupyter Notebooks for data visualization

Soft Skills

  • Coordinated group progress across the STEM-Away forum, Slack, Trello, WhatsApp, and Google Forms, strengthening communication skills
  • Developed logistical organization skills by categorizing announcements and changes
  • Read official documentation and public discussion boards in order to debug programs effectively
  • Examined the ethics of web scraping and the robots.txt files of different webpages in order to understand what can and cannot be scraped

Achievements

  • Established a Discord server to augment Slack communication with more intuitive voice channel accessibility
  • Experimented with BeautifulSoup4 (bs4) and urllib to scrape data from DiscourseHub communities (https://github.com/JeffreyW2468/ML2); a minimal scraping sketch follows this list
  • Utilized Selenium to scrape the DiscourseHub forums and return reply counts
  • Organized the program into a GitHub repository
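
A minimal sketch of the bs4 + urllib approach referenced above. The forum URL, request header, and CSS selector are assumptions for illustration (real Discourse markup may differ), and the target forum's robots.txt should be checked before scraping.

```python
import urllib.request
from bs4 import BeautifulSoup

url = "https://community.example.com/latest"  # hypothetical Discourse forum
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
with urllib.request.urlopen(req) as response:
    html = response.read()

soup = BeautifulSoup(html, "html.parser")
for link in soup.select("a.title"):  # assumed selector for topic-title links
    print(link.get_text(strip=True), link.get("href"))
```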

Tasks Completed

  • Integrated Discord with team communication and consolidated member feedback through a Google Forms check-in
  • Studied the webinars, NLP Basics series, and other materials to solidify understanding of machine learning
  • Installed required packages via the terminal and from source pages
  • Scraped Discourse forums with Python
  • Practiced implementation of Git/GitHub

Outcome

  • Implemented a more intuitive workspace for team members, established a clear picture of where each member stood in their understanding, and gained critical insight into machine learning theory and application

Jeffrey Weng - ML Module 2, Level 1

Things I Learned

Technical Area:

  • Honed proficiency in integrating and implementing bs4, Selenium, and pandas, and expanded from chromedriver to geckodriver (a Selenium sketch follows this list)
  • Continued practicing finding uniquely identifiable tags with the browser's HTML inspector for scraping
  • Established technical familiarity with writing scraped data to CSV and JSON files
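
A minimal sketch of the Selenium + geckodriver workflow with CSV output described above. The URL and selector are assumptions, and Selenium 4+ is assumed so the Firefox driver is located automatically.

```python
import csv
from selenium import webdriver
from selenium.webdriver.common.by import By

rows = []
driver = webdriver.Firefox()  # Selenium 4+ finds geckodriver on the PATH
try:
    driver.get("https://community.example.com/latest")  # hypothetical forum
    for topic in driver.find_elements(By.CSS_SELECTOR, "a.title"):  # assumed selector
        rows.append({"title": topic.text, "url": topic.get_attribute("href")})
finally:
    driver.quit()

# Write the scraped rows to CSV, matching the csv/json step above.
with open("topics.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["title", "url"])
    writer.writeheader()
    writer.writerows(rows)
```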

Tools

  • VS Code
  • BeautifulSoup4/Selenium/pandas/chromedriver/geckodriver
  • Git/GitHub Desktop
  • Jupyter Notebook

Soft Skills

  • Developed project management skills, both logistical and technical, by thoroughly addressing team member questions on Slack/WhatsApp and directing members to appropriate resources

Achievements

Tasks Completed

  • Explored data from cartalk (the team's designated DiscourseHub forum)
  • Implemented and examined the provided resource programs using both the chromedriver and geckodriver approaches
  • Streamlined and troubleshot the programming environment via Python interpreter configuration

Outcome

  • Streamlined personal and team workspaces in both technical and logistical respects, and obtained useful data by scraping the team-designated forum

Jeffrey Weng - ML Module 3, Level 1

Things I Learned

Technical Area:

  • Leveraged the pandas library to load and integrate the scraped CSV data from the cartalk community
  • Learned how to preprocess textual data with TextBlob, PorterStemmer, stopwords, lemmatization, n-grams, etc. (a preprocessing sketch follows this list)
  • Visualized data with matplotlib, seaborn, and wordcloud
  • Strengthened understanding of term frequency and inverse document frequency (TF-IDF) by writing the calculations in Jupyter (.ipynb) notebooks (a worked TF-IDF sketch also appears below)
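
A minimal preprocessing sketch for the bullet above, run on a made-up sentence rather than the cartalk data; it assumes the usual one-time nltk downloads (punkt, stopwords, wordnet).

```python
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from nltk.tokenize import word_tokenize

# Requires nltk.download("punkt"), nltk.download("stopwords"), nltk.download("wordnet").
text = "The engines were making strange noises on cold mornings."

tokens = [t.lower() for t in word_tokenize(text) if t.isalpha()]
stop = set(stopwords.words("english"))
tokens = [t for t in tokens if t not in stop]

print([PorterStemmer().stem(t) for t in tokens])           # e.g. 'engines' -> 'engin'
print([WordNetLemmatizer().lemmatize(t) for t in tokens])  # e.g. 'engines' -> 'engine'
print(list(nltk.bigrams(tokens)))                          # n-grams (here, bigrams)
```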

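And a worked TF-IDF sketch in the same spirit as the notebook calculations, on a toy corpus; note that sklearn's TfidfVectorizer adds smoothing, so its numbers differ slightly.

```python
import math

# Hand-rolled TF-IDF on a toy corpus (illustrative only):
# tf(t, d) = count(t in d) / len(d), idf(t) = log(N / df(t)), tf-idf = tf * idf.
docs = [
    ["engine", "noise", "cold", "start"],
    ["brake", "noise", "when", "turning"],
    ["engine", "oil", "leak"],
]

def tf(term, doc):
    return doc.count(term) / len(doc)

def idf(term, corpus):
    df = sum(1 for d in corpus if term in d)
    return math.log(len(corpus) / df)

for term in ("engine", "noise", "oil"):
    print(term, [round(tf(term, d) * idf(term, docs), 3) for d in docs])
```
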
Tools

  • VS Code
  • Pandas/numpy/nltk/sklearn/matplotlib/TextBlob/seaborn/wordcloud/TfidfVectorizer/pyLDAvis
  • Git
  • Jupyter Notebook

Soft Skills

  • Continued PM coordination through Google Form surveys, check-ins, and troubleshooting

Achievements

Tasks Completed

  • Visualized data with matplotlib/seaborn/wordcloud
  • Implemented a bag-of-words model with sklearn (see the sketch after this list)
  • Performed sentiment analysis on the data
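
A minimal sketch of the bag-of-words and sentiment steps above, using made-up comments rather than the cartalk data.

```python
from sklearn.feature_extraction.text import CountVectorizer
from textblob import TextBlob

comments = [
    "The engine makes a terrible noise on cold mornings.",
    "Great advice, the fix worked perfectly!",
]

# Bag of words: each comment becomes a vector of token counts.
vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(comments)
print(vectorizer.get_feature_names_out())
print(X.toarray())

# Sentiment: TextBlob polarity ranges from -1 (negative) to +1 (positive).
for c in comments:
    print(round(TextBlob(c).sentiment.polarity, 2), c)
```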

Outcome

  • Maintained team organization moving forward
  • Thoroughly explored category-specific data, particularly the Leading Comments

Jeffrey Weng - ML Module 3, Level 1 (Continued)

Things I Learned

Technical Area:

  • Ran four different classification models on the data: Naive Bayes, decision tree, linear support vector machine, and logistic regression (a comparison sketch follows this list)
  • Refined combinations of data cleaning methodologies for the combined CSV of scraped data, testing five different strategies (e.g. lowercasing + removal of special symbols)
  • Tested different feature selections and observed their respective effects on model accuracy → moved forward with the second feature selection strategy (author + topic title + leading comment + other comments + tags), as it produced the highest overall accuracy (with logistic regression performing best in all cases)
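
A minimal sketch of the four-model comparison; `texts` and `labels` are placeholders for the cleaned cartalk feature text and category labels, and the vectorizer and split settings are assumptions rather than the project's configuration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

def compare_models(texts, labels):
    # Vectorize the text, then hold out 20% for testing.
    X = TfidfVectorizer(stop_words="english").fit_transform(texts)
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.2, random_state=42, stratify=labels
    )
    models = {
        "Naive Bayes": MultinomialNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=42),
        "Linear SVM": LinearSVC(),
        "Logistic Regression": LogisticRegression(max_iter=1000),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        print(name, round(accuracy_score(y_test, model.predict(X_test)), 3))
```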

Tools

  • VS Code
  • Pandas, numpy, nltk, sklearn, etc.
  • Git + GitHub
  • Jupyter Notebook (individual code cell testing with .ipynb files in VSCode)

Soft Skills

  • Monitored and managed the group across Slack, Trello, and Google Forms for presentation preparation and technical troubleshooting

Achievements

  • Trained basic ML models and recorded how accuracy varied with data cleaning methodology and model type
  • Determined the most appropriate model and strategy for moving forward with the project

Tasks Completed

  • Implemented four separate classification models for the cartalk dataset
  • Identified the best data cleaning method for optimizing model accuracy
  • Calculated F1 score, recall, and precision (see the metrics sketch below)
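
A small sketch of the metric calculations on placeholder labels (not the project's results); macro averaging is one reasonable choice for multi-class data.

```python
from sklearn.metrics import classification_report, f1_score, precision_score, recall_score

# Placeholder labels, illustrative only.
y_true = ["repair", "repair", "advice", "advice", "repair"]
y_pred = ["repair", "advice", "advice", "advice", "repair"]

print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall:   ", recall_score(y_true, y_pred, average="macro"))
print("F1:       ", f1_score(y_true, y_pred, average="macro"))
print(classification_report(y_true, y_pred))
```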

Outcome

  • Streamlined project results and prepared data for further analysis

Jeffrey Weng - ML Module 4, Level 1

Things I Learned

Technical Area:

  • Augmented the initial four classifiers with an additional three → Random Forest, XGBoost, and LightGBM (a training sketch follows this list)
  • Tested and recorded accuracy for the new models under each data cleaning strategy and feature selection, with an emphasis on feature selection strategy 2
  • Examined the implications and implementation of the BERT model
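
A minimal sketch of the three added models; X_train/X_test/y_train/y_test stand in for the TF-IDF features and integer-encoded labels produced earlier, and the hyperparameters are assumptions.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier
from lightgbm import LGBMClassifier

# X_* / y_* are placeholders for vectorized features and integer-encoded labels;
# on macOS, XGBoost/LightGBM need libomp (brew install libomp).
def run_added_models(X_train, X_test, y_train, y_test):
    models = {
        "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
        "XGBoost": XGBClassifier(eval_metric="mlogloss"),
        "LightGBM": LGBMClassifier(),
    }
    for name, model in models.items():
        model.fit(X_train, y_train)
        print(name, round(accuracy_score(y_test, model.predict(X_test)), 3))
```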

Tools

  • VS Code
  • Pandas, numpy, nltk, sklearn, etc.
  • Homebrew → libomp (OpenMP runtime needed by LightGBM/XGBoost on macOS)
  • Git + GitHub
  • Jupyter Notebook

Soft Skills

  • Strengthened project coordination across platforms such as Google Slides for presentation purposes
  • Helped troubleshoot minor technical difficulties and library implementation issues
  • Strengthened skills for independently studying new concepts

Achievements

  • Studied and implemented classification models of greater complexity; developed tentative methodologies for optimizing model runtime in a Jupyter coding environment
  • Familiarized myself with the concept of BERT
  • Studied implementations of XLNet, XLM, RoBERTa, and DistilBERT models (a loading sketch follows this list)
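
A minimal loading sketch, assuming the Hugging Face transformers library; the checkpoint name and label count are placeholders, not the project's configuration.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "distilbert-base-uncased"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=4)

inputs = tokenizer("My engine makes a clicking noise",
                   return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1))  # predicted class index (untrained head, so arbitrary)
```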

Tasks Completed

  • Added three additional classification models to the NLP project → observed their implications through analysis of their respective accuracies
  • Identified and resolved runtime inefficiencies arising from the increased complexity of the new models

Outcome

  • Gained important insight into how the different approaches (model, cleaning strategy, feature selection) compare with regard to accuracy and data analysis
  • Realized the importance of balanced data as opposed to imbalanced data (a class-weighting sketch follows below)
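
A small sketch of what acting on that observation can look like, assuming a pandas dataframe `df` with a hypothetical "category" column and pre-vectorized training data; class_weight="balanced" is one common way to compensate for imbalance.

```python
from sklearn.linear_model import LogisticRegression

# `df`, its "category" column, and X_train/y_train are placeholders.
def check_and_balance(df, X_train, y_train):
    print(df["category"].value_counts(normalize=True))  # shows the class imbalance
    # Reweight classes inversely to their frequency instead of leaving them unbalanced.
    model = LogisticRegression(max_iter=1000, class_weight="balanced")
    model.fit(X_train, y_train)
    return model
```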