Sunnie_Wang - Machine Learning (Level 3) Pathway

Module1 - Overview:

  • Technical skills:
    • Understood and practiced how to extract data from Medline and prepare the raw data with PubMed Parser.
    • Read and understood scientific papers: deepened my understanding of biomedical relationships between entities such as drugs, genes, and phenotypes.
    • Python: learned Python coding and how to use Python to access web data.
    • Dependency parsing: learned what dependency parsing is and how PubMed Parser differs from Stanford Parser.
  • Tools/Libraries:
    • Medline: learned about the Medline database and how to extract data from it.
    • EBC: learned what Ensemble Biclustering for Classification (EBC) and hierarchical clustering algorithms are and how they work, as well as the advantage of EBC over classifiers that don’t account for the semantic relatedness of different dependency paths.
    • STEM-AWAY website: found more information for further study.
  • Soft Skills:
    • Became more familiar with the STEM-AWAY website, practiced communication skills, and interacted more with other team members.
    • Independence: Finished the presentation for Journal club prompts along with other team members, and gained more confidence.
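
As a rough illustration of the raw-data preparation listed under Technical skills, here is a minimal sketch that pulls PMIDs and abstracts out of a Medline-style XML record. It uses Python's standard `xml.etree.ElementTree` as a stand-in for PubMed Parser, and the sample record and the `extract_abstracts` helper are made up for this example.

```python
import xml.etree.ElementTree as ET

# Made-up records in the Medline/PubMed XML layout; not real data.
SAMPLE_XML = """
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000001</PMID>
      <Article>
        <ArticleTitle>Hypothetical title</ArticleTitle>
        <Abstract>
          <AbstractText>DrugA inhibits GeneB in vitro.</AbstractText>
          <AbstractText>GeneB expression falls after treatment.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>00000002</PMID>
      <Article>
        <ArticleTitle>Record without an abstract</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""


def extract_abstracts(xml_text):
    """Return {'pmid', 'title', 'abstract'} dicts, skipping records with no abstract."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID")
        title = article.findtext("MedlineCitation/Article/ArticleTitle")
        parts = article.findall("MedlineCitation/Article/Abstract/AbstractText")
        # An abstract can be split into several AbstractText sections; join them.
        abstract = " ".join("".join(p.itertext()).strip() for p in parts)
        if abstract:
            records.append({"pmid": pmid, "title": title, "abstract": abstract})
    return records


records = extract_abstracts(SAMPLE_XML)
for rec in records:
    print(rec["pmid"], rec["abstract"])
```

Records without an abstract are dropped here because later steps work on abstract sentences; that filtering choice is an assumption of the sketch.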

Achievement Highlights:

  • Learned how to use Medline to get the raw data and PubMed Parser to prepare it.
  • Finished Journal club prompts and built up some fundamentals.
  • Learned how to use GitHub and Git.
  • Became more interested in bioinformatics after reading and studying the papers listed in the Prerequisites forum.
  • Deepened my understanding of the background of bioinformatics and data visualization.

Tasks Completed:

  • Finished Journal club prompts.
  • Completed raw data preparation for the next module to use.

Goals for The Upcoming Week:

  • Raw data parsing.

Module2 - Overview:

  • Technical skills:
    • Understood and practiced how to use the Stanford Parser.
    • Read and understood scientific papers: deepened my understanding of biomedical relationships between entities such as drugs, genes, and phenotypes.
    • Dependency parsing: learned more foundational knowledge of dependency parsing and why it’s so important to our project, as well as the mathematical basics of the transition parser.
  • Tools/Libraries:
    • Java: downloaded it and used it to parse the .txt file.
    • Pandas & Docker: installed them and worked toward a basic understanding of how to use them.
  • Soft Skills:
    • Teamwork: Did a presentation on Stanford Parser within the team, and had an interactive discussion on it with team members.
    • Virtual-collaboration: Actively participated in training/Q&A sessions held by Colin.
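
To make the transition-parser basics mentioned under Technical skills more concrete, here is a minimal sketch of arc-standard transitions on a toy sentence. The sentence, the hand-written action sequence, and the `arc_standard` helper are assumptions for illustration; in a neural transition parser a trained classifier would choose each action, and this is not how the Stanford Parser is implemented internally.

```python
def arc_standard(n_tokens, actions):
    """Apply arc-standard transitions and return (head, dependent) index pairs.

    Index 0 is an artificial ROOT; real tokens are numbered 1..n_tokens.
    """
    stack = [0]
    buffer = list(range(1, n_tokens + 1))
    arcs = []
    for action in actions:
        if action == "SHIFT":
            # Move the next word from the buffer onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            # The top of the stack becomes the head of the item beneath it.
            head = stack[-1]
            dependent = stack.pop(-2)
            arcs.append((head, dependent))
        elif action == "RIGHT-ARC":
            # The item beneath the top becomes the head of the top item.
            dependent = stack.pop()
            head = stack[-1]
            arcs.append((head, dependent))
        else:
            raise ValueError(f"unknown action: {action}")
    return arcs


# Toy sentence with made-up entity names.
tokens = ["ROOT", "DrugA", "inhibits", "GeneB"]
actions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]
arcs = arc_standard(len(tokens) - 1, actions)
for head, dep in arcs:
    print(f"{tokens[head]} -> {tokens[dep]}")
```

Each arc points from a head word to its dependent, so the verb "inhibits" ends up as the head of both entities and ROOT as the head of the verb.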

Achievement Highlights:

  • Learned how dependency parsing works and the foundations of the Neural Transition Parser.
  • Finished reading the Stanford Parser Manual to gain a deep understanding of the grammatical relationships between words and the different output formats/styles.
  • Learned from other team members about how to use Docker to combine the code to save time.

Tasks Completed:

  • Completed running the Stanford Parser on the text.

Goals for The Upcoming Week:

  • Feed the output of the PubMed Parser into the Stanford Parser and integrate the result with EBC.

Module3 - Overview:

  • Technical skills:
    • Helped build a pipeline for extracting abstracts from Medline and improved the algorithms that collect all sentences containing drug-gene pairs, which are passed to the Stanford Parser to extract the dependency path connecting each pair.
    • Used the Stanford Parser to extract the dependency paths for use in the next step, EBC.
    • Gained a better understanding of how the EBC and ITCC algorithms work. Read the EBC files to learn how to use EBC and made sure it works in my environment.
    • Read up on the Dask library, learned how to use Dask, especially Dask bags, and got it working in my environment.
  • Tools/Libraries:
    • Dask, AWS, Stanford Parser, EBC
  • Soft Skills:
    • Teamwork: Helped improve algorithms on pre-processing.
    • I felt overwhelmed when it came to Module3. I learned that understanding the concepts is far from enough; knowing how to implement them is more critical, and that was really hard for me at first since I didn’t have any experience with ML or Python. I am glad I didn’t give up. Colin’s meetings and videos were super helpful, and team members helped each other out. I appreciate it.
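
The dependency-path step listed under Technical skills can be sketched roughly as follows: given the dependency edges of one parsed sentence, find the shortest path that connects the drug to the gene. The edges below are hand-written for a made-up sentence rather than real Stanford Parser output, and `dependency_path` is a hypothetical helper, not part of the team's pipeline.

```python
from collections import deque


def dependency_path(edges, start, end):
    """Return the shortest token path between start and end, or None.

    edges is an iterable of (head, relation, dependent) triples. Tokens are
    plain strings here for readability; real code would use token indices so
    that repeated words stay distinct.
    """
    # Treat the dependency tree as an undirected graph.
    neighbours = {}
    for head, _relation, dependent in edges:
        neighbours.setdefault(head, []).append(dependent)
        neighbours.setdefault(dependent, []).append(head)

    # Breadth-first search finds the shortest path first.
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for nxt in neighbours.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


# Made-up sentence: "DrugA strongly inhibits the expression of GeneB"
edges = [
    ("inhibits", "nsubj", "DrugA"),
    ("inhibits", "advmod", "strongly"),
    ("inhibits", "obj", "expression"),
    ("expression", "det", "the"),
    ("expression", "nmod", "GeneB"),
    ("GeneB", "case", "of"),
]
path = dependency_path(edges, "DrugA", "GeneB")
print(" -> ".join(path))
```

Words that are off the path, such as "strongly" and "the", drop out, which is why the path is a compact description of how the two entities relate in the sentence.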

Achievement Highlights:

  • Documentation.
  • Worked as a team to get the task done.
  • Connected the dots.

Tasks Completed:

  • Dependency matrices extraction.
  • Implemented the pre-processing and the unsupervised EBC process.
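
As a rough picture of the dependency matrix extraction above, the sketch below counts how often each drug-gene pair occurs with each dependency path and lays the counts out as a pair-by-path matrix. The triples, the path string notation, and the `build_matrix` helper are made up for illustration; the exact input format EBC expects is not described here, so treat the layout as an assumption.

```python
from collections import Counter

# Hypothetical (drug, gene, dependency path) triples; not real extraction output.
triples = [
    ("DrugA", "GeneB", "nsubj<-inhibits->obj"),
    ("DrugA", "GeneB", "nsubj<-inhibits->obj"),
    ("DrugA", "GeneB", "nsubj<-binds->obj"),
    ("DrugC", "GeneD", "nsubj<-binds->obj"),
]


def build_matrix(triples):
    """Return (pairs, paths, matrix) where matrix[i][j] counts pair i with path j."""
    counts = Counter(((drug, gene), path) for drug, gene, path in triples)
    pairs = sorted({pair for pair, _path in counts})
    paths = sorted({path for _pair, path in counts})
    # A Counter returns 0 for combinations that were never observed.
    matrix = [[counts[(pair, path)] for path in paths] for pair in pairs]
    return pairs, paths, matrix


pairs, paths, matrix = build_matrix(triples)
for pair, row in zip(pairs, matrix):
    print(pair, row)
```

Rows are drug-gene pairs and columns are dependency paths, so pairs that share many paths end up with similar rows, which is the kind of structure a biclustering method can group.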

Goals for The Upcoming Week:

  • Data visualizations.