Items Learned
- Technical Skills
  - Understanding of drug-gene relationships
  - Reading on vector spaces and their uses in semantics
  - Handling large amounts of data
- Tools
  - Google Suite, Discord, PyCharm
- Soft Skills
  - Team communication and collaboration
  - Setting deadlines and completing tasks before them
  - Presentation skills
  - Reading/understanding academic papers
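As a small illustration of the vector-space idea from this week's reading, the sketch below represents words as count vectors over a few context words and compares them with cosine similarity. The words, context words, and counts are hypothetical and chosen only for illustration.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical counts of how often each word appears near the context words
# ["inhibits", "binds", "dose", "tumor"]; the values are made up.
vectors = {
    "drug_a": [5, 3, 4, 2],
    "drug_b": [4, 3, 5, 2],
    "meeting": [0, 0, 1, 0],
}

print(round(cosine_similarity(vectors["drug_a"], vectors["drug_b"]), 3))
print(round(cosine_similarity(vectors["drug_a"], vectors["meeting"]), 3))
```

Words used in similar contexts (the two hypothetical drugs) end up pointing in similar directions, which is the intuition behind using vector spaces for semantics.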
Three Achievement Highlights
- Familiarized myself with the provided paper, "Learning the Structure of Biomedical Relationships from Unstructured Text"
- Presented on the medical aspects of drug-gene relationships
- Set up Google email, Discord, and Google Colaboratory
Goals for Next Week
- Prepare for journal club
- Begin the process of downloading .xml files
Tasks Completed
- Watched/read the provided materials and presented on the biological aspects of drug-gene relationships
- Got acclimated to the STEM-Away website and set up my profile
- Participated in Team Meetings
Items Learned
- Technical Skills
  - Downloading large amounts of data
  - Data type conversions
  - Writing good code → well-documented, intuitive, easy to understand, etc.
- Tools
  - Google Suite, Discord, PyCharm, Medline
- Soft Skills
  - Team communication and collaboration
  - Setting deadlines and completing tasks before them
  - Reading/understanding academic papers
  - Documenting code for collaboration
Three Achievement Highlights
- Wrote a Python script to pull .xml files from Medline and select abstracts
- Continued reading the related paper
- Learned the basic framework for data cleaning
Goals for Next Week
- Prepare for journal club
- Continue working on data cleaning
Tasks Completed
- Python script → pulls multiple .xml files, converts them to .csv/Pandas DataFrames, and filters abstracts
- Participated in Team Meetings
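The XML-to-DataFrame step described above can be sketched roughly as follows. This is a minimal sketch, not the script the team wrote: it assumes the standard PubMed/Medline layout (`PubmedArticle` > `MedlineCitation` > `PMID` and `Article/Abstract/AbstractText`), and the helper names `abstracts_from_medline_xml` and `filter_abstracts` as well as the keyword filter are hypothetical illustrations.

```python
import re
import xml.etree.ElementTree as ET

import pandas as pd

def abstracts_from_medline_xml(root):
    """Extract PMID, title, and abstract text from a PubmedArticleSet element."""
    rows = []
    for article in root.iter("PubmedArticle"):
        pmid = article.findtext("MedlineCitation/PMID", default="")
        title = article.findtext("MedlineCitation/Article/ArticleTitle", default="")
        # An abstract may be split into several (often labeled) AbstractText sections.
        parts = [
            "".join(node.itertext()).strip()
            for node in article.findall("MedlineCitation/Article/Abstract/AbstractText")
        ]
        rows.append({"pmid": pmid, "title": title, "abstract": " ".join(p for p in parts if p)})
    return pd.DataFrame(rows, columns=["pmid", "title", "abstract"])

def filter_abstracts(df, keywords):
    """Hypothetical filter: keep non-empty abstracts that mention any keyword (case-insensitive)."""
    has_text = df["abstract"].str.len() > 0
    pattern = "|".join(re.escape(k) for k in keywords)
    mentions = df["abstract"].str.contains(pattern, case=False, regex=True)
    return df[has_text & mentions].reset_index(drop=True)

# Tiny inline sample; for a downloaded file use ET.parse("some_file.xml").getroot().
SAMPLE = """<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID>1</PMID><Article>
    <ArticleTitle>Drug A and gene X</ArticleTitle>
    <Abstract><AbstractText>Drug A inhibits gene X in vitro.</AbstractText></Abstract>
  </Article></MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><PMID>2</PMID><Article>
    <ArticleTitle>No abstract here</ArticleTitle>
  </Article></MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""

root = ET.fromstring(SAMPLE)
df = abstracts_from_medline_xml(root)
kept = filter_abstracts(df, ["inhibit"])
csv_text = kept.to_csv(index=False)  # pass a path instead to write a .csv file
print(csv_text)
```

Keeping extraction and filtering in separate functions makes each step easy to document and test on its own, which matches the "well-documented, intuitive" goal listed above.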
Items Learned
- Technical Skills
  - Understanding the basics of the Stanford Parser → input, output, and methods
  - Continued working on the abstract filter using the parser → e.g., keeping sentences that are between 4 and 50 words long
  - Handling large amounts of data
- Tools
  - Google Suite, Discord, PyCharm, Medline, Stanford Parser, Terminal
- Soft Skills
  - Team communication and collaboration
  - Setting deadlines and completing tasks before them
  - Presentation skills
  - Reading/understanding academic papers
  - Troubleshooting
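The sentence-length rule mentioned above (keeping sentences of 4 to 50 words) can be sketched as below. This is an illustration under assumptions: the team split sentences with the Stanford Parser, whereas this sketch substitutes a naive regex splitter, counts words by whitespace, and treats both bounds as inclusive.

```python
import re

def split_sentences(text):
    """Naive splitter (stand-in for the parser): break after ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def filter_sentences(text, min_words=4, max_words=50):
    """Keep sentences whose whitespace-token count lies in [min_words, max_words]."""
    kept = []
    for sentence in split_sentences(text):
        n_words = len(sentence.split())
        if min_words <= n_words <= max_words:
            kept.append(sentence)
    return kept

abstract = ("Drug A inhibits gene X in tumor cells. Results. "
            "Expression of gene X decreased after treatment with drug A.")
print(filter_sentences(abstract))
```

Very short fragments (such as section labels) and very long sentences tend to parse poorly, which is one reason to drop them before extracting relationships.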
Three Achievement Highlights
- Got the Stanford Parser to produce the expected output using Terminal and Jython
- Set up a preliminary filter for abstracts
- Began learning the EBC algorithm
Goals for Next Week
- Migrate the Stanford Parser to Python so that it can be integrated with the rest of the pipeline
- Continue working on the filter
Tasks Completed
- Used Terminal, Jython, and Python to run the Stanford Parser, breaking down sentences from abstracts to extract drug-gene relationships
- Used a GitHub repository of common English words to remove insignificant terms from sentences
- Participated in Team Meetings
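Removing common words with a word list, as described above, might look like the sketch below. The specific GitHub word list is not named here, so `COMMON_WORDS` is a small hypothetical stand-in, and the tokenization rule (letters, digits, and hyphens) is an assumption.

```python
import re

# Hypothetical stand-in for the downloaded list of common English words.
COMMON_WORDS = {"the", "a", "an", "of", "in", "and", "to", "was", "with", "after"}

def load_word_list(lines):
    """Build a lowercase set from an iterable of lines (e.g. an open word-list file)."""
    return {line.strip().lower() for line in lines if line.strip()}

def remove_common_words(sentence, common_words):
    """Tokenize on letters/digits/hyphens and drop tokens found in common_words."""
    tokens = re.findall(r"[A-Za-z0-9-]+", sentence)
    return [t for t in tokens if t.lower() not in common_words]

print(remove_common_words(
    "The expression of BRCA1 was reduced after treatment with the drug.", COMMON_WORDS))
```

With a real word-list file, `load_word_list(open(path))` would replace the hard-coded set; a set is used so each membership check is constant time.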
Items Learned
- Technical Skills
  - Understood the EBC algorithm → input/co-occurrence matrix, output, and scoring function
  - Read supplementary materials on hierarchical clustering
  - Handling large amounts of data
- Tools
  - Google Suite, Discord, PyCharm, Stanford Parser, Terminal, Jython, Python, spaCy
- Soft Skills
  - Team communication and collaboration
  - Setting deadlines and completing tasks before them
  - Presentation skills
  - Reading/understanding academic papers
  - Troubleshooting
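The co-occurrence matrix that serves as EBC's input can be illustrated with a short sketch. It assumes each parsed sentence yields a (drug, gene, dependency path) triple, with drug-gene pairs as rows and paths as columns; the path strings below are hypothetical placeholders, and the clustering and scoring steps of EBC are not reproduced here.

```python
from collections import Counter

def build_cooccurrence(triples):
    """Count how often each (drug, gene) pair occurs with each dependency path.

    Returns (rows, cols, matrix) where matrix[i][j] is the count for the
    drug-gene pair rows[i] and the path cols[j].
    """
    counts = Counter(((drug, gene), path) for drug, gene, path in triples)
    rows = sorted({pair for pair, _ in counts})
    cols = sorted({path for _, path in counts})
    row_index = {pair: i for i, pair in enumerate(rows)}
    col_index = {path: j for j, path in enumerate(cols)}
    matrix = [[0] * len(cols) for _ in rows]
    for (pair, path), n in counts.items():
        matrix[row_index[pair]][col_index[path]] = n
    return rows, cols, matrix

# Hypothetical triples extracted from parsed sentences.
triples = [
    ("drugA", "geneX", "inhibits"),
    ("drugA", "geneX", "inhibits"),
    ("drugA", "geneX", "binds"),
    ("drugB", "geneY", "binds"),
]
rows, cols, matrix = build_cooccurrence(triples)
print(rows, cols, matrix)
```

Real corpora produce a very sparse matrix, so a sparse format would be preferable at scale; the dense list of lists here just keeps the example readable.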
Three Achievement Highlights
- Understood the EBC algorithm
- Began using EBC outputs to create dendrograms
- Set up R in preparation for next week's dendrogram work
Goals for Next Week
- Finish the dendrograms
- Set up a GitHub repository for all work
Tasks Completed
- Read supplementary materials on EBC and hierarchical clustering
- Began researching R and setting it up for dendrograms
- Explored AWS and Dask for computing
- Participated in Team Meetings
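Going from EBC output to hierarchical clustering could look roughly like the sketch below. It assumes the EBC output has been summarized as a symmetric similarity matrix in [0, 1] (for example, how often two drug-gene pairs were clustered together); the labels and values are hypothetical, and it uses SciPy's average linkage as a stand-in for whatever linkage the team chose in R.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def similarity_to_linkage(similarity, method="average"):
    """Turn a symmetric similarity matrix with values in [0, 1] into a SciPy linkage matrix."""
    sim = np.asarray(similarity, dtype=float)
    if sim.shape[0] != sim.shape[1] or not np.allclose(sim, sim.T):
        raise ValueError("similarity matrix must be square and symmetric")
    dist = 1.0 - sim  # high similarity -> small distance
    np.fill_diagonal(dist, 0.0)  # squareform expects a zero diagonal
    condensed = squareform(dist, checks=False)
    return linkage(condensed, method=method)

# Hypothetical similarities between four drug-gene pairs.
labels = ["drugA-geneX", "drugB-geneX", "drugC-geneY", "drugD-geneY"]
similarity = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
Z = similarity_to_linkage(similarity)
clusters = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into two flat clusters
print(dict(zip(labels, clusters)))
```

The linkage matrix `Z` is the same structure a dendrogram is drawn from, so cutting it with `fcluster` is a quick sanity check before plotting.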
Items Learned
- Technical Skills
  - Finished creating sets of dendrograms for the preliminary and finished data (two from a few .xml files and two from our own results)
  - Set up the GitHub repository to showcase our work
  - Worked on running some .xml files locally
- Tools
  - Google Suite, Discord, PyCharm, Medline, Stanford Parser, Terminal, R, GitHub, RStudio, ape, purrr, data.table, protoclust
- Soft Skills
  - Team communication and collaboration
  - Setting deadlines and completing tasks before them
  - Presentation skills
  - Reading/understanding academic papers
  - Troubleshooting
Three Achievement Highlights
- Created four separate dendrograms
- Set up the group GitHub repository
- Used the pipeline from AWS to run a couple of files locally
Goals for Next Week
- Improve dendrograms → add frequency bars/dots
- Put the finishing touches on the group GitHub repository
Tasks Completed
- Watched/read the provided materials
- Familiarized myself with R and RStudio and used them to build dendrograms
- Set up GitHub and used Markdown formatting to make it visually appealing
- Participated in Team Meetings
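The dendrograms above were built in R (ape, protoclust). As a rough Python analogue, and a possible starting point for next week's frequency bars, the sketch below computes a dendrogram layout with SciPy and lines up per-leaf frequencies in leaf order. It swaps in average linkage (protoclust uses minimax linkage), and the labels, distances, and frequency counts are all hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

labels = ["drugA-geneX", "drugB-geneX", "drugC-geneY", "drugD-geneY"]
# Hypothetical pairwise distances between drug-gene pairs (symmetric, zero diagonal).
distances = np.array([
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.8, 0.9],
    [0.9, 0.8, 0.0, 0.2],
    [0.8, 0.9, 0.2, 0.0],
])
# Hypothetical number of sentences mentioning each pair, used for frequency bars.
frequencies = {"drugA-geneX": 12, "drugB-geneX": 3, "drugC-geneY": 7, "drugD-geneY": 1}

Z = linkage(squareform(distances), method="average")
# no_plot=True computes the layout without drawing, so no plotting library is needed.
tree = dendrogram(Z, labels=labels, no_plot=True)
leaf_order = tree["ivl"]  # leaf labels from left to right
bar_heights = [frequencies[name] for name in leaf_order]
for name, height in zip(leaf_order, bar_heights):
    print(f"{name:12s} {'#' * height}")
```

Ordering the frequencies by the dendrogram's leaf order is what lets bars or dots sit directly beneath the matching leaves when plotted.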
Paper Data - Dendrogram #1
Paper Data - Dendrogram #2
Group Data - Dendrogram #1
Group Data - Dendrogram #2
(This dendrogram looks slightly odd because the EBC algorithm works best with larger datasets.)