Tanishk - Machine Learning (Level 3) Pathway

Items Learned

  1. Technical Skills
  • Understanding of Drug-Gene Relationships
  • Reading on Vector Spaces and uses in Semantics
  • Handling large amounts of Data
  2. Tools
  • Google Suite, Discord, PyCharm
  3. Soft Skills
  • Team Communication and Collaboration
  • Setting Up Deadlines and executing tasks before them
  • Presentation Skills
  • Reading/Understanding Academic Papers

Three Achievement Highlights

  1. Familiarized myself with the provided paper: "Learning the Structure of Biomedical Relationships from Unstructured Text"
  2. Presented on the medical aspect of Drug-Gene Relationships
  3. Set up Google email, Discord, and Google Colaboratory

Goals for Next Week

  1. Prepare for journal club
  2. Begin the process of downloading .xml files

Tasks Completed

  • Watched/read provided materials and presented on the biological aspect of drug-gene relationships
  • Familiarized myself with the Stem-Away website and profile
  • Participated in Team Meetings

Items Learned

  1. Technical Skills
  • Downloading large amounts of data
  • Data Type conversions
  • Writing good code → well-documented, intuitive, easy-to-understand, etc.
  2. Tools
  • Google Suite, Discord, PyCharm, Medline
  3. Soft Skills
  • Team Communication and Collaboration
  • Setting Up Deadlines and executing tasks before them
  • Reading/Understanding Academic Papers
  • Documenting Code for collaboration

Three Achievement Highlights

  1. Wrote Python Script to pull .xml files from Medline and select abstracts
  2. Continued to read through related paper
  3. Understood basics/framework for data cleaning

Goals for Next Week

  1. Prepare for journal club
  2. Continue working on the data cleaning

Tasks Completed

  • Python script → pulling multiple .xml files, converting to .csv/Pandas dataframe, and filtering abstracts
  • Participated in Team Meetings
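
As a rough illustration of the script described above, the sketch below parses a small inline sample in the PubMed/MEDLINE XML layout and keeps only records that have an abstract. It is not the original script: the sample record, the function names `extract_abstracts` and `rows_to_csv`, and the output column names are assumptions, and the standard-library `csv` module stands in for the Pandas dataframe step to keep the example dependency-free.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical two-record sample in the PubMed/MEDLINE XML layout.
SAMPLE_XML = """<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>1</PMID>
      <Article>
        <Abstract>
          <AbstractText>Warfarin response varies with CYP2C9 genotype.</AbstractText>
        </Abstract>
      </Article>
    </MedlineCitation>
  </PubmedArticle>
  <PubmedArticle>
    <MedlineCitation>
      <PMID>2</PMID>
      <Article />
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>"""


def extract_abstracts(xml_text):
    """Return (pmid, abstract) pairs, skipping records with no abstract."""
    root = ET.fromstring(xml_text)
    rows = []
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID", default="")
        # An abstract may be split into several AbstractText sections.
        parts = [
            "".join(node.itertext()).strip()
            for node in article.iter("AbstractText")
        ]
        abstract = " ".join(p for p in parts if p)
        if abstract:
            rows.append((pmid, abstract))
    return rows


def rows_to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["pmid", "abstract"])
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_abstracts(SAMPLE_XML)
csv_text = rows_to_csv(rows)
```

For real files, `ET.parse(path).getroot()` could replace `ET.fromstring`, and the rows could be passed to a dataframe constructor instead of the CSV writer.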

Items Learned

  1. Technical Skills
  • Understanding basics of Stanford parser → input, output, and methods
  • Continued to work on abstract filter using the parser → finding sentences that are between 4 and 50 words, etc.
  • Handling large amounts of Data
  2. Tools
  • Google Suite, Discord, PyCharm, Medline, Stanford Parser, Terminal
  3. Soft Skills
  • Team Communication and Collaboration
  • Setting Up Deadlines and executing tasks before them
  • Presentation Skills
  • Reading/Understanding Academic Papers
  • Troubleshooting

Three Achievement Highlights

  1. Got Stanford Parser to give appropriate output using Terminal and Jython
  2. Set up a preliminary filter for abstracts
  3. Began to understand EBC algorithm
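
The sentence-length rule in the preliminary filter (keep sentences of 4 to 50 words) can be sketched as follows. This is a minimal sketch under assumptions: a naive regular-expression splitter stands in for the Stanford Parser, the bounds are treated as inclusive, and the names `split_sentences`, `keep_sentence`, and `filter_abstract` are hypothetical.

```python
import re

MIN_WORDS = 4   # lower bound from the filter description (assumed inclusive)
MAX_WORDS = 50  # upper bound from the filter description (assumed inclusive)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def keep_sentence(sentence):
    """True when the whitespace-separated word count is within bounds."""
    n_words = len(sentence.split())
    return MIN_WORDS <= n_words <= MAX_WORDS


def filter_abstract(text):
    """Return the sentences of an abstract that pass the length rule."""
    return [s for s in split_sentences(text) if keep_sentence(s)]


# Hypothetical abstract text used only to exercise the filter.
abstract = "Background. Warfarin dosing is influenced by CYP2C9 variants. See text."
kept = filter_abstract(abstract)
```

Here the one-word and two-word fragments are dropped and only the middle sentence survives. A real parser would handle abbreviations and decimal points that this splitter would cut incorrectly.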

Goals for Next Week

  1. Migrate the Stanford Parser to Python such that it can be integrated with the rest of the pipeline
  2. Continue to work on filter

Tasks Completed

  • Worked with Terminal, Jython, and Python to utilize the Stanford Parser to break down sentences from abstracts and extract drug-gene relationships
  • Used GitHub repository of common English words to remove insignificant terms in sentences
  • Participated in Team Meetings
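
The common-word removal step might look roughly like the sketch below. The word set here is a tiny illustrative stand-in, not the list from the GitHub repository mentioned above, and `remove_common_words` is a hypothetical helper name.

```python
import string

# Tiny illustrative stand-in for the common-English-word list (assumption).
COMMON_WORDS = {"the", "of", "is", "by", "a", "an", "and", "in", "to", "with"}


def remove_common_words(sentence, common_words=COMMON_WORDS):
    """Return the tokens of a sentence that are not in the common-word set."""
    kept = []
    for token in sentence.split():
        # Strip surrounding punctuation so "CYP2C9." matches as "CYP2C9".
        core = token.strip(string.punctuation)
        if core and core.lower() not in common_words:
            kept.append(core)
    return kept


tokens = remove_common_words("The metabolism of warfarin is mediated by CYP2C9.")
```

The comparison is case-insensitive, but the original casing of kept tokens is preserved, since gene symbols such as CYP2C9 are case-sensitive.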

Items Learned

  1. Technical Skills
  • Understood EBC algorithm → input/co-occurrence matrix, output, and scoring function
  • Read supplementary materials on hierarchical clustering
  • Handling large amounts of Data
  2. Tools
  • Google Suite, Discord, PyCharm, Stanford Parser, Terminal, Jython, Python, spaCy
  3. Soft Skills
  • Team Communication and Collaboration
  • Setting Up Deadlines and executing tasks before them
  • Presentation Skills
  • Reading/Understanding Academic Papers
  • Troubleshooting

Three Achievement Highlights

  1. Understood EBC algorithm
  2. Began to use outputs from EBC for dendrogram creation
  3. Prepared R for next week
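
To make the co-occurrence matrix input to EBC more concrete, here is a small sketch that counts how often each drug-gene pair appears with each dependency path. The row/column layout (pairs as rows, paths as columns) is my reading of the paper and should be treated as an assumption, as should the toy observations and the name `build_cooccurrence`. The sketch covers only the input matrix, not the EBC clustering or scoring itself.

```python
from collections import Counter

# Hypothetical (drug, gene, dependency-path) observations for illustration.
observations = [
    ("warfarin", "CYP2C9", "metabolized_by"),
    ("warfarin", "CYP2C9", "metabolized_by"),
    ("warfarin", "VKORC1", "inhibits"),
    ("clopidogrel", "CYP2C19", "metabolized_by"),
]


def build_cooccurrence(observations):
    """Return (row labels, column labels, count matrix) for the observations."""
    counts = Counter(((drug, gene), path) for drug, gene, path in observations)
    pairs = sorted({pair for pair, _ in counts})
    paths = sorted({path for _, path in counts})
    # Counter returns 0 for pair/path combinations that were never observed.
    matrix = [[counts[(pair, path)] for path in paths] for pair in pairs]
    return pairs, paths, matrix


pairs, paths, matrix = build_cooccurrence(observations)
```

With real data this matrix is very sparse, so a sparse representation would be a more practical choice than nested lists.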

Goals for Next Week

  1. Finish Dendrograms
  2. Set up a GitHub repository for all work

Tasks Completed

  • Read supplementary materials on EBC and hierarchical clustering
  • Began researching/preparing R for dendrograms
  • Exploration of AWS and Dask for computing
  • Participated in Team Meetings
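
For readers new to hierarchical clustering, the sketch below shows the basic agglomerative idea behind a dendrogram: repeatedly merge the two closest clusters and record the distance at which they merge. It uses single linkage on a toy distance matrix purely for illustration. It is not the linkage or tooling used for the project's dendrograms, and the labels, distances, and function name `single_linkage` are assumptions.

```python
def single_linkage(labels, dist):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    clusters = [(label,) for label in labels]
    index = {label: i for i, label in enumerate(labels)}
    merges = []

    def cluster_distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(dist[index[x]][index[y]] for x in a for y in b)

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = cluster_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        a, b = clusters[i], clusters[j]
        merges.append((a, b, d))
        # Replace the two merged clusters with their union.
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(a + b)
    return merges


# Toy symmetric distance matrix over four hypothetical items.
labels = ["A", "B", "C", "D"]
dist = [
    [0, 1, 4, 5],
    [1, 0, 3, 6],
    [4, 3, 0, 2],
    [5, 6, 2, 0],
]
merges = single_linkage(labels, dist)
```

Each entry in `merges` corresponds to one junction in a dendrogram, with the recorded distance giving the height of that junction.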

Items Learned

  1. Technical Skills
  • Finished creating sets of dendrograms for preliminary and finished data (two from a few .xml files and two from our own results)
  • Set up the GitHub repository to exhibit our work
  • Worked on running some .xml files locally
  2. Tools
  • Google Suite, Discord, PyCharm, Medline, Stanford Parser, Terminal, R, GitHub, RStudio, ape, purrr, data.table, protoclust
  3. Soft Skills
  • Team Communication and Collaboration
  • Setting Up Deadlines and executing tasks before them
  • Presentation Skills
  • Reading/Understanding Academic Papers
  • Troubleshooting

Three Achievement Highlights

  1. Created 4 separate dendrograms
  2. Set up group GitHub
  3. Used pipeline from AWS to run a couple of files locally

Goals for Next Week

  1. Improve dendrograms → add frequency bars/dots
  2. Put finishing touches on group GitHub

Tasks Completed

  • Watched/read provided materials
  • Familiarized myself with R and RStudio. Used aforementioned tools to build dendrograms
  • Set up GitHub and used Markdown to make it visually appealing
  • Participated in Team Meetings

Paper Data - Dendrogram #1

Paper Data - Dendrogram #2

Group Data - Dendrogram #1

Group Data - Dendrogram #2 (This dendrogram looks slightly odd because the EBC algorithm works better with larger datasets)