Bioinformatics (Level 1)- Shreya Vora

Module 1:

Technical Area:

  • Downloading R and RStudio, along with the associated packages
  • Understanding why R is an important language
  • Reading a research article, and learning how to understand the figures in a research paper

Tools:

  • 7-Zip
  • R
  • RStudio

Soft Skills:

  • Time management
  • Communication with my teammates for the journal club.

Hurdles:

  • I had a hard time figuring out how to download R and which version to download.
  • I had a difficult time navigating the STEM-Away site to figure out what I had to do, but attending the help sessions really helped.

Achievements:

  • Downloaded R and RStudio
  • Read a research paper and did a journal club

Module 2:

Technical:

  • I created a metadata file using the GEO database.
  • I used Excel to extract some data from the larger database and then imported it into RStudio
  • I downloaded several packages for module 3
  • I also built an app in R Studio using R Shiny

Soft Skills:

  • Communication
  • Debugging and problem solving

Tools:

  • R/RStudio
  • 7-Zip
  • Excel
  • GEO Database
  • YouTube (for help)
  • R Shiny

Achievements:

  • Successfully downloaded data from the GEO database
  • Created a metadata file
  • Imported my metadata into RStudio using read.csv()
  • I created a very simple R Shiny app
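
As a rough illustration of the import step, the sketch below writes a tiny metadata table to a temporary CSV file and reads it back with read.csv(). The column names (sample_id, tissue) and the values are made up for the example; the real metadata columns depend on what was kept from the GEO record.

```r
# Hypothetical metadata: two columns standing in for fields pulled from GEO.
metadata <- data.frame(
  sample_id = c("sample1", "sample2", "sample3", "sample4"),
  tissue    = c("normal", "normal", "tumor", "tumor")
)

# Save it as a CSV file (the format Excel can export), then import it.
csv_path <- tempfile(fileext = ".csv")
write.csv(metadata, csv_path, row.names = FALSE)

imported <- read.csv(csv_path, stringsAsFactors = FALSE)

# Quick checks that the import looks right.
str(imported)
table(imported$tissue)
```

Calling str() right after the import is a cheap way to confirm that the columns came in with the expected names and types.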

Hurdles:

  • I had a difficult time figuring out which data to include in my metadata. I ended up looking at all the different variables and working out which ones were the most useful and necessary.
  • I also had a hard time figuring out R Shiny, but watching some videos on YouTube really helped, and I was able to create something on my own with that knowledge.

Module 3:

Technical Skills:

  • Importing the raw data and the metadata into RStudio.
  • Normalizing the raw data using rma() and extracting the expression matrix using exprs()
  • Learning how to use prcomp() to get the PCA for the data
  • Learning how to plot the PCA using ggplot()
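
The PCA step can be sketched roughly as follows. This is only an illustration on simulated numbers, not the actual GEO dataset: the matrix expr stands in for the output of exprs(), and the sample and probe names are invented. The ggplot2 part runs only if that package is installed.

```r
set.seed(42)

# Simulated stand-in for an expression matrix:
# 100 probes (rows) x 6 samples (columns). Real data would come from exprs().
expr <- matrix(rnorm(100 * 6, mean = 8, sd = 1), nrow = 100,
               dimnames = list(paste0("probe", 1:100), paste0("sample", 1:6)))

# prcomp() treats rows as observations, so transpose to put samples in rows.
pca <- prcomp(t(expr), center = TRUE, scale. = FALSE)

# Proportion of variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Collect the first two components in a data frame for plotting.
scores <- data.frame(sample = rownames(pca$x),
                     PC1 = pca$x[, 1],
                     PC2 = pca$x[, 2])

# Plot with ggplot2 if it is installed; otherwise fall back to base graphics.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(scores, ggplot2::aes(x = PC1, y = PC2, label = sample)) +
    ggplot2::geom_point() +
    ggplot2::geom_text(vjust = -1)
  print(p)
} else {
  plot(scores$PC1, scores$PC2, xlab = "PC1", ylab = "PC2")
}
```

Running the same code on the raw and on the normalized matrix gives the two PCA plots that can then be compared side by side.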

Soft Skills:

  • Time management
  • Collaboration with my teammates

Tools:

  • R/RStudio
  • Slack
  • YouTube
  • GEO Database
  • Excel

Hurdles:

  • I had a hard time getting my raw data into the right shape for the PCA plot, but using exprs() helped me get the plot right.
  • I also had a difficult time plotting because of the way the data was set up, but once I used cbind() it was a lot easier, and it helped organize my data for constructing the plot.
  • I had issues with the length of the data itself, which was messing up the PCA results, so I had to keep only a certain number of rows (e.g. norData[1:120, ]).
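
A minimal sketch of the row subsetting and cbind() fixes is shown below. It assumes the mismatch was between the number of rows in the PCA scores and in the metadata; the sizes, names, and group labels are invented for the example.

```r
set.seed(1)

# Hypothetical sizes: PCA scores for 6 samples, but a metadata table with 8 rows.
scores <- matrix(rnorm(6 * 2), nrow = 6,
                 dimnames = list(paste0("sample", 1:6), c("PC1", "PC2")))
metadata <- data.frame(sample = paste0("sample", 1:8),
                       group  = rep(c("control", "tumor"), each = 4))

# Keep only as many metadata rows as there are rows of PCA scores.
metadata_used <- metadata[seq_len(nrow(scores)), ]

# Combine both into one data frame that a plotting function can use directly.
plot_data <- cbind(as.data.frame(scores), group = metadata_used$group)
print(plot_data)
```

Subsetting by position only works when both tables are in the same sample order, so it is worth checking the row names before combining them.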

Achievements:

  • I was able to successfully normalize my raw data
  • I was able to get the pca for the normalized and the raw data
  • I was able to successfully plot the normalized data and raw data pca plots

Module 4:

Technical Skills:

  • Manually removed the outliers from both the raw data and the metadata
  • Annotated the data and removed duplicate probe IDs, NA values, and duplicate gene symbols
  • Filtered out genes whose mean expression fell below the 2% quantile, using quantile() and rowMeans()
  • Performed differential expression analysis with the limma package, using model.matrix() and lmFit()
  • Created top tables of the top 10 and top 50 DEGs using topTable()
  • Created heatmaps of the top 10 and top 50 DEGs using pheatmap()
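
The filtering and limma steps above can be sketched as follows. The data here is simulated and the group labels are placeholders. The eBayes() call is not mentioned in the list but is the usual limma step between lmFit() and topTable(), so it is included as an assumption. The limma part runs only if the package is installed.

```r
set.seed(7)

# Simulated expression matrix: 200 genes x 6 samples (3 control, 3 tumor).
expr <- matrix(rnorm(200 * 6, mean = 8), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))
group <- factor(rep(c("control", "tumor"), each = 3))

# Filter: drop genes whose mean expression is at or below the 2% quantile.
gene_means <- rowMeans(expr)
cutoff <- quantile(gene_means, probs = 0.02)
expr_filtered <- expr[gene_means > cutoff, ]

# limma steps, run only if the package is available.
if (requireNamespace("limma", quietly = TRUE)) {
  design <- model.matrix(~ group)        # intercept plus a tumor-vs-control column
  fit <- limma::lmFit(expr_filtered, design)
  fit <- limma::eBayes(fit)              # moderated statistics for topTable()
  top10 <- limma::topTable(fit, coef = 2, number = 10)
  print(top10)
}
```

Changing number = 10 to number = 50 gives the larger table; either table's row names can then be used to pick the rows of the expression matrix for a heatmap.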

Soft Skills:

  • Time management
  • Collaboration with my teammates
  • Asking leads for help

Tools:

  • R/RStudio
  • Slack
  • Packages: hgu133plus2.db, limma, pheatmap

Hurdles:

  • I ran into a lot of errors when making the heatmap with pheatmap(). I asked Anya for help, and one of the problems was with the dimensions of the data. Once I debugged the errors, I was able to plot the heatmap.
  • Another problem I faced was removing the outliers, but eventually we decided to delete them manually.

Achievements:

  • We were able to successfully remove the outliers
  • Successfully problem-solved my way through plotting the heatmap
  • Understood what the R Shiny app project is really about and started to work on it (B1 and B2 teams)

Module 5:

Technical Skills:

  • Created a vector from the logFC column and set the threshold to 2 so that we could get the upregulated genes
  • Combined the logFC values and the gene symbols into a new data frame and added the Entrez IDs
  • Made three different bar plots for Cellular Component, Biological Process, and Molecular Function

[Figures: GO bar plots for BP (Biological Process), CC (Cellular Component), and MF (Molecular Function)]
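
The thresholding step can be illustrated with a small sketch. The gene symbols and logFC values below are placeholders, not results from the project, and the Entrez ID mapping and enrichment steps are not shown.

```r
# Hypothetical DEG table; symbols and values are placeholders.
deg <- data.frame(
  symbol = c("GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"),
  logFC  = c(3.1, -2.5, 0.4, 2.2, -0.1)
)

# Keep only genes whose log fold change exceeds the chosen threshold.
threshold <- 2
upregulated <- deg[deg$logFC > threshold, ]

# Named numeric vector (names = symbols), sorted from largest to smallest logFC.
up_vector <- sort(setNames(upregulated$logFC, upregulated$symbol),
                  decreasing = TRUE)
print(up_vector)
```

Using deg$logFC < -threshold instead would select the downregulated genes with the same cutoff.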

Soft Skills:

  • Time management
  • Collaboration with my teammates

Tools:

  • R/RStudio
  • Slack
  • Packages: clusterProfiler, enrichplot, dplyr, org.Hs.eg.db, hgu133plus2.db
  • GitHub

Hurdles:

  • My teammate and I struggled to decide how to set the threshold and what value to use. We were trying to figure out which genes to use, because the downregulated genes were more significant. Eventually we settled on a threshold value of 2.
  • While working on the R Shiny application I ran into various errors. One of them involved the permissions on a file. Eventually I figured out how to solve the problem and ran my app successfully!
  • We also had trouble creating the DEG vector, but after talking it through we were able to resolve the problem.

Achievements:

  • Made a simple R Shiny application in which the user can import any Excel file and see their data in a table format
  • Successfully created three bar plots for gene ontology.
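
A file-upload app of this kind might look roughly like the sketch below. It is not the project's actual code: to avoid an extra package for Excel files it reads CSV uploads instead, and the input and output IDs ("file", "contents") are arbitrary names chosen for the example.

```r
library(shiny)

# User interface: a file picker and a placeholder for the table.
ui <- fluidPage(
  titlePanel("Table viewer"),
  fileInput("file", "Choose a CSV file", accept = ".csv"),
  tableOutput("contents")
)

# Server logic: read the uploaded file and render it as a table.
server <- function(input, output, session) {
  output$contents <- renderTable({
    req(input$file)                    # wait until a file has been uploaded
    read.csv(input$file$datapath)
  })
}

app <- shinyApp(ui, server)

# Launch the app only in an interactive session.
if (interactive()) runApp(app)
```

The uploaded file lives at a temporary path (input$file$datapath), so the app never needs write access to the user's original file.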

Module 6:

Technical Skills:

  • After importing the necessary parameters and data into the applications, we were able to visualize the KEGG pathways they reported
  • Understanding how each application collected its data and what the results meant was important, and this differed a little from one application to the next

Soft Skills:

  • Time management
  • Collaboration with my teammates

Tools:

  • Slack
  • EnrichR
  • DAVID
  • Metascape
  • YouTube

Hurdles:

  • While using the various tools I had a difficult time because the file I was importing contained additional information besides the gene symbols. However, I was able to solve the issue by extracting the gene symbols alone and then analysing the data.
  • Analysing the results presented by each application was difficult, but watching various YouTube videos and discussing the results with Ananya helped.
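
Extracting only the gene symbols from a larger table can be done along these lines. The table below is a made-up example, and the column name symbol is an assumption about how the file was laid out.

```r
# Hypothetical DEG table with extra columns besides the gene symbols.
deg <- data.frame(
  symbol    = c("GENE_A", "GENE_B", NA, "GENE_B", ""),
  logFC     = c(3.1, 2.4, 2.9, 2.4, 2.2),
  adj.P.Val = c(0.001, 0.004, 0.010, 0.004, 0.020),
  stringsAsFactors = FALSE
)

# Keep the symbol column only, dropping missing, empty, and duplicate entries.
keep <- !is.na(deg$symbol) & deg$symbol != ""
symbols <- unique(deg$symbol[keep])

# One symbol per line, ready to upload or paste into a web tool.
out_path <- tempfile(fileext = ".txt")
writeLines(symbols, out_path)
print(readLines(out_path))
```

A plain list with one symbol per line is a simple format to paste into web-based enrichment tools.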

Achievements:

  • We were able to successfully understand and analyse the KEGG analysis results from the three different applications
  • R Shiny app: I finished the data import part of the application so that it handles the series matrix and the .csv/.txt files. Disha and I also worked on the normalization of the data. We also got started on visualizing the data using volcano plots and a heatmap.

Week 8: R Shiny App → Group B1

Technical:

  • I fixed the data import code so that it is clearer
  • Disha and I worked on the normalization and were able to complete it successfully
  • I wrote the documentation for Group 1 and B2 for the R Shiny project, covering all the different functions of our app and the different plots the user can see
  • I added more details and descriptions to the app so that the user can see what is going on in the background on the server side

Soft Skills:

  • Communication
  • Debugging and problem solving
  • Time Management

Tools:

  • R/RStudio
  • Excel
  • GEO Database
  • YouTube (for help)
  • R Shiny
  • GitHub

Achievements:

  • I was able to successfully add data import for .csv and .txt files
  • I was able to successfully write the documentation for Group A and B1
  • I added more details and descriptions to the user interface of the R Shiny app
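
One way to handle both file types is to pick the reader from the file extension, as in the sketch below. The function name read_table_file is hypothetical, and it assumes that .txt files are tab-delimited, which may not match every input the app accepted.

```r
# Hypothetical helper: choose the reader based on the file extension.
read_table_file <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(ext,
    csv = read.csv(path, stringsAsFactors = FALSE),
    txt = read.delim(path, stringsAsFactors = FALSE),  # assumes tab-delimited text
    stop("Unsupported file type: ", ext)
  )
}

# Small demonstration with temporary files.
example <- data.frame(symbol = c("GENE_A", "GENE_B"), logFC = c(2.5, -1.2))

csv_path <- tempfile(fileext = ".csv")
txt_path <- tempfile(fileext = ".txt")
write.csv(example, csv_path, row.names = FALSE)
write.table(example, txt_path, sep = "\t", row.names = FALSE, quote = FALSE)

from_csv <- read_table_file(csv_path)
from_txt <- read_table_file(txt_path)
```

In a Shiny app the same helper could be called on the uploaded file's temporary path, with the stop() message surfacing as an error for unsupported files.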

Hurdles:

  • One of the biggest hurdles was creating the heatmap. For some reason it was not working for me, and my team could not figure it out either. So eventually we decided to move on and work on the functional analysis.

My contributions are all on GitHub under Individual Contributions.