Wafaa - Machine Learning (Level 3) Pathway

Module 1- Week 1

Things I have Learned:

  • Technical skills:

    • I was introduced to the idea of a journal club: a discussion of a scientific paper in which each participant is responsible for analyzing the paper from a different perspective. My team and I held our journal club, and it was very informative and engaging.
  • Soft Skills:

    • I was able to lead 3 meetings this week and get the team members to participate and share their ideas confidently.
    • I made sure to acknowledge the team members' efforts, since people tend to do better when they feel appreciated.
    • I tried to keep everyone engaged and to encourage team spirit, so whenever we had a lot of tasks to do, I delegated some of them to the team members.
    • I tried to handle all the technical issues and report them as needed, since the onboarding process is usually not easy.

Three Achievement Highlights:

  • Conducting a journal club.
  • Leading meetings.
  • Accommodating members’ needs and issues.

Goals for The Upcoming Week:

  • Setting up a GitHub repository.
  • Raw data parsing.

Tasks Done:

  • Journal club.
  • Trello board creation.
  • Google Group creation.

Module 1- Week 2:

Things I have learned:

  • Technical skills:

    • I learned how to use Pubmed Parser to parse the Medline XML files into Python data frames in preparation for further preprocessing.
  • Soft skills:

    • The importance of delegating tasks among team members. By splitting the work into simple tasks and dividing them among us, we were able to get things done quickly and efficiently.
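
The parsing step noted under technical skills above can be sketched roughly as follows. This is a minimal stand-in that uses only Python's standard library rather than Pubmed Parser itself. The sample XML is invented and trimmed to a few Medline-style fields, and the function name `parse_medline` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Invented two-record sample in a simplified Medline-like layout.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <ArticleTitle>Example title one</ArticleTitle>
        <Abstract><AbstractText>Drug A inhibits gene B.</AbstractText></Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article>
        <ArticleTitle>Example title two</ArticleTitle>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def parse_medline(xml_text):
    """Return one dict per article with pmid, title, and abstract (may be empty)."""
    root = ET.fromstring(xml_text)
    records = []
    for article in root.iter("PubmedArticle"):
        # An abstract can be split across several AbstractText nodes.
        abstract_parts = [
            (node.text or "").strip() for node in article.iter("AbstractText")
        ]
        records.append({
            "pmid": article.findtext(".//PMID", default=""),
            "title": article.findtext(".//ArticleTitle", default=""),
            "abstract": " ".join(part for part in abstract_parts if part),
        })
    return records


records = parse_medline(SAMPLE_XML)
```

A list of dicts in this shape can be handed to a data-frame constructor for the later preprocessing steps.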

Three Achievement Highlights:

  • Learning new skills quickly
  • Working with data without prior knowledge of its theoretical details
  • Holding weekly meetings for my team and getting everyone to participate and share their progress

Goals for The Upcoming Week:

  • Extract entities from the sentences using Stanford parser

Tasks Done:

  • Learning about the format of Medline publications
  • Learning to use Pubmed Parser
  • Writing optimized and generic code to perform the parsing

Module 2- Week 3:

Things I have learned:

  • Technical skills:

    • I filtered the publications to keep only those that contain abstracts, because I only need to process the abstracts.
    • I learned to use string matching to extract the sentences that contain drug-gene pairs, which are then passed as input to the Stanford parser.
    • I learned how to use the Stanford parser to extract the dependency paths of the drug-gene pairs.
  • Soft skills:

    • It’s okay to ask for help when you are stuck on something. This is one of the benefits of working in teams, especially with people from different backgrounds.
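
The abstract filtering and string matching steps listed under technical skills above could look something like the sketch below. The drug and gene lexicons, the record layout, and all function names are assumptions made for illustration; the sentence splitter is deliberately naive.

```python
import re

# Hypothetical lexicons; real drug and gene name lists would be far larger.
DRUGS = {"warfarin", "aspirin"}
GENES = {"cyp2c9", "vkorc1"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_pairs(sentence):
    """Return sorted (drug, gene) pairs whose names both occur as whole words."""
    tokens = set(re.findall(r"[a-z0-9]+", sentence.lower()))
    return sorted((d, g) for d in DRUGS & tokens for g in GENES & tokens)


def candidate_sentences(records):
    """Yield (pmid, sentence, drug, gene) for records that have an abstract."""
    for record in records:
        if not record.get("abstract"):
            continue  # skip publications without an abstract
        for sentence in split_sentences(record["abstract"]):
            for drug, gene in find_pairs(sentence):
                yield record["pmid"], sentence, drug, gene


records = [
    {"pmid": "1", "abstract": "Warfarin is metabolized by CYP2C9. Dosing varies."},
    {"pmid": "2", "abstract": ""},
    {"pmid": "3", "abstract": "Aspirin was well tolerated."},
]
matches = list(candidate_sentences(records))
```

Only the first sentence of the first record mentions both a drug and a gene, so it is the only one that would be sent on to the parser.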

Three Achievement Highlights:

  • Collaboration with other teams
  • Still maintaining the team spirit
  • Documenting progress

Goals for The Upcoming Week:

  • Building a pipeline to process the entire database in parallel using Dask

Tasks Done:

  • Data filtering
  • String matching
  • Dependency path extraction
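
To illustrate what a dependency path between a drug and a gene is, here is a minimal sketch that finds the shortest path between two tokens in a dependency parse. The edges are written by hand for one example sentence and are not real Stanford parser output; the relation labels and the function name `dependency_path` are assumptions.

```python
from collections import deque

# Hand-written (head, relation, dependent) triples for the sentence
# "Warfarin is metabolized by CYP2C9". Illustrative only, not parser output.
EDGES = [
    ("metabolized", "nsubjpass", "Warfarin"),
    ("metabolized", "auxpass", "is"),
    ("metabolized", "nmod", "CYP2C9"),
    ("CYP2C9", "case", "by"),
]


def dependency_path(edges, start, end):
    """Shortest path between two tokens, treating the parse as an undirected graph.

    Returns a list of (token, relation, direction, token) steps, where direction
    is "up" when moving from dependent to head and "down" otherwise, or None if
    the tokens are not connected.
    """
    neighbours = {}
    for head, rel, dep in edges:
        neighbours.setdefault(head, []).append((dep, rel, "down"))
        neighbours.setdefault(dep, []).append((head, rel, "up"))

    # Breadth-first search guarantees the first path found is a shortest one.
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        token, path = queue.popleft()
        if token == end:
            return path
        for nxt, rel, direction in neighbours.get(token, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(token, rel, direction, nxt)]))
    return None


path = dependency_path(EDGES, "Warfarin", "CYP2C9")
```

Tokens are keyed by their surface string here, so a sentence with repeated words would need token indices instead.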

Module 2- Week 4:

Things I have learned:

  • Technical skills:

    • I was introduced to Dask, a library used for parallel processing of data.
    • We used Dask bags to create a pipeline that takes the data files in chunks and parses them; this way we do not have to hold all the data in RAM.
    • I learned how to use an AWS cluster and how to run Jupyter notebooks on it. We moved everything to the AWS cloud.
    • I used the parallel processing pipeline to extract the final dependency matrix, which consists of rows of drug-gene pairs and columns of dependency paths.
  • Soft skills:

    • I got to know my team members and we have a friendly relationship now, which is very important for motivation and encouragement.

Three Achievement Highlights:

  • Learned about a new library (Dask)
  • Learned about AWS clusters
  • A solid team bond

Goals for The Upcoming Week:

  • Passing the processed data into the EBC algorithm, the core machine learning part of the project

Tasks Done:

  • Creating Dask bags
  • Moving things to AWS
  • Generating the final dependency matrix

Module 3- Week 5:

Things I have learned:

  • Technical skills:

    • I have gone through the paper thoroughly to understand the theoretical concepts of the Ensemble Biclustering Algorithm
    • I learned that the algorithm has two steps (unsupervised, then supervised) and decided to dedicate this week to the first one, i.e. the unsupervised step.
    • I used the ITCC algorithm to co-cluster the drug-gene pairs and ran it 100 times. The result is a co-occurrence matrix whose rows and columns are drug-gene pairs and whose values record how often two pairs were assigned to the same cluster.
  • Soft skills:

    • I experienced the importance of sharing knowledge between team members, and of having people with different levels of experience help each other.
    • I also learned the importance of documenting work, both for team members and for an external audience.
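
The co-occurrence matrix from the unsupervised step can be sketched as below. ITCC itself is not implemented here: a random label assignment stands in for each clustering run, purely to show how repeated runs are turned into counts. The function names and the toy sizes are assumptions.

```python
import random


def co_occurrence(runs):
    """Count, for every pair of items, the runs that put both in the same cluster.

    `runs` is a list of label lists; runs[r][i] is the cluster of item i in run r.
    """
    n = len(runs[0])
    counts = [[0] * n for _ in range(n)]
    for labels in runs:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return counts


def random_clustering(n_items, n_clusters, rng):
    """Placeholder for one ITCC run: assigns each item a random cluster label."""
    return [rng.randrange(n_clusters) for _ in range(n_items)]


rng = random.Random(0)
runs = [random_clustering(n_items=5, n_clusters=2, rng=rng) for _ in range(100)]
matrix = co_occurrence(runs)
```

Each diagonal entry equals the number of runs, since an item always shares a cluster with itself, and the matrix is symmetric.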

Three Achievement Highlights:

  • Understanding the intuition of the EBC algorithm
  • Implementing the unsupervised step of EBC
  • Generating the co-occurrence matrix

Goals for The Upcoming Week:

  • Implementing the supervised step of EBC

Tasks Done:

  • Unsupervised step of EBC
  • Team members’ progress tracking

Module 3- Week 6:

Things I have learned:

  • Technical skills:

    • I implemented the second step of the biclustering algorithm, which is the supervised step.
    • I obtained known drug-gene relationships from DrugBank and constructed seed sets and test sets.
    • I used the EBC scoring rule to determine how often each drug-gene pair is co-clustered with pairs that have a ground-truth relationship.
  • Soft skills:

    • Started preparing for the final presentation of our work
    • Assigned tasks to team members to work on the presentation
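
One way to read the scoring idea in the supervised step is sketched below: each test pair is scored by summing its co-occurrence counts with the seed pairs, and the test pairs are then ranked by score. This is an assumption about the rule, not a copy of the project code, and the co-occurrence values, index choices, and function name are invented for illustration.

```python
def score_pairs(co_occurrence, seed_indices, test_indices):
    """Score each test item by its summed co-occurrence with the seed items."""
    return {
        t: sum(co_occurrence[t][s] for s in seed_indices)
        for t in test_indices
    }


# Hypothetical co-occurrence counts for four drug-gene pairs over 100 runs.
co_occurrence = [
    [100, 80, 75, 10],
    [80, 100, 70, 20],
    [75, 70, 100, 15],
    [10, 20, 15, 100],
]
seed = [0, 1]  # pairs with a known relationship, e.g. taken from DrugBank
test = [2, 3]  # held-out pairs to be ranked

scores = score_pairs(co_occurrence, seed, test)
ranking = sorted(test, key=lambda t: scores[t], reverse=True)
```

A test pair that is often clustered with the seed pairs ends up near the top of the ranking, which is what makes it a candidate for sharing the seed relationship.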

Three Achievement Highlights:

  • Implementing the supervised step of EBC
  • Generating the final co-occurrence scores
  • Initiating the work on the final presentation

Goals for The Upcoming Week:

  • Showcasing the entire workflow

Tasks Done:

  • Supervised step of EBC

Module 4- Week 7:

Things I have learned:

  • Technical skills:

    • I created a final version of the code with comments clarifying what each part does.
    • I prepared the team's final presentation and presented it to our mentors.
  • Soft skills:

    • Organizing a team presentation with smooth transitions between slides and efficient use of the allocated time

Three Achievement Highlights:

  • Showcasing our entire workflow
  • Getting very helpful comments from mentors
  • Giving credit to each team member on what they did

Goals for The Upcoming Week:

  • WE ARE DONE :smiley:

Tasks Done:

  • Final presentation