Maryam - Bioinformatics (Level 2) Pathway

Module 1:

Technical Area

  • Installing R and RStudio
  • Learning the fundamentals of working with R, such as variables, data structures, functions, and packages
  • Drawing plots with base R functions, ggplot2, and EnhancedVolcano
  • Getting familiar with papers, projects, and tools in the field of transcriptome analysis
  • Expanding knowledge about bioinformatics
  • Learning how to read and understand scientific papers

Tools:

  • R and RStudio
  • ggplot2 package
  • EnhancedVolcano package
  • CRAN and Bioconductor repositories

Soft Skills

  • Self-learning: I learned to work with R on my own.
  • Problem-solving: I learned to find the cause of errors in R by searching for the problem on YouTube, Stack Overflow, RStudio Community, and so on.
  • Communication: I communicated with Anya to resolve my questions.
  • Learning to work with the STEM-AWAY website

Achievement Highlights

  • Installing and using R packages and functions
  • Understanding the relation between transcriptome analysis and finding biomarkers and hub genes of diseases
  • Understanding how this internship is run

Tasks Completed

  • I installed R and RStudio.
  • I installed ggplot2 and EnhancedVolcano and drew different plots with them.
  • I created a GitHub account.

Module 2:

Tools:

  • GEO Database
  • GitHub
  • Git

Technical Area:

I was able to:

  • Download raw data from GEO
  • Acquire the metadata file
  • Install Bioconductor packages and general packages in R
  • Load data into the R environment
  • Work with GitHub

Soft skills:

  • Communicating in spoken and written English
  • Building teamwork skills
  • Developing self-study skills

Three achievement highlights:

  • Communicating with people internationally
  • Fetching data from databases and loading them into R
  • Using GitHub

Module 3:

Technical area:

  • Learning how to work with different quality control packages and find outliers
  • Learning how to normalize, background-correct, and log-transform data
  • Becoming familiar with the biomaRt annotation package, in collaboration with group A, for the R Shiny app
  • Learning to add professional features to arrayQualityMetrics results for the R Shiny app
  • Learning how to work with GitHub from RStudio
  • Trying to build my first Shiny app
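
As a rough illustration of two of the preprocessing steps listed above, here is a minimal sketch of a log2 transform and quantile normalization. It is written in plain Python so that it runs without any Bioconductor packages, and it is not the code used in this module. The function names and the small matrix are made up for the example. RMA includes a quantile normalization step of this general kind, but it also does background correction and probe summarization, which are not shown here.

```python
import math

def log2_transform(matrix, offset=1.0):
    """Apply log2(x + offset) to every value; the offset avoids log2(0)."""
    return [[math.log2(v + offset) for v in row] for row in matrix]

def quantile_normalize(matrix):
    """Give every sample (column) the same distribution of values.

    matrix is a list of rows (probes); each row holds one value per sample.
    Ties are broken by row order, a simplification of what real tools do.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = [[matrix[r][c] for r in range(n_rows)] for c in range(n_cols)]
    # Mean of the k-th smallest value across samples, for each rank k.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [sum(col[k] for col in sorted_cols) / n_cols
                  for k in range(n_rows)]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for c, col in enumerate(columns):
        # Row indices of this sample ordered from smallest to largest value.
        order = sorted(range(n_rows), key=lambda r: col[r])
        for rank, r in enumerate(order):
            result[r][c] = rank_means[rank]
    return result

# Hypothetical intensities: 4 probes (rows) measured in 3 samples (columns).
raw = [[5.0, 4.0, 3.0],
       [2.0, 1.0, 4.0],
       [3.0, 4.0, 6.0],
       [4.0, 2.0, 8.0]]
normalized = quantile_normalize(raw)
logged = log2_transform(raw)
```

After normalization, every column contains the same set of values, only in a different order, which is what makes boxplots of the samples line up.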

Soft Skills:

  • Improving written and spoken English skills in communicating with mentors and teammates
  • Attending several meetings and reporting work progress
  • Time management
  • Improving self-learning skills

Tools:

  • RMA (from the affy package)
  • sva package
  • arrayQualityMetrics package
  • simpleaffy package
  • affyPLM package
  • QCReport package
  • Git in RStudio
  • Shiny package

Achievements:

  • Finding outliers with a high degree of confidence
  • Successfully removing the batch effect and preprocessing microarray data
  • Becoming proficient with Git commands
  • Identifying the best quality control packages for outlier detection in the R Shiny app

Hurdles:

  • The annotation colors in the heatmap were not shown. After several searches and reading different tutorials, I found that the row names of the data frame that we build for annotation must be identical to the column names of the correlation matrix.
  • Some quality control packages need a lot of memory. I fixed this by raising the limit with the memory.limit() function.
  • We didn’t know where to put the deliverables of module 3. After speaking with Dravie (my teammate) about it, we asked Anya, and she added us to the mentorchains repository on GitHub for that purpose.

Deliverables:

  • Boxplot comparing raw, preprocessed, and batch-corrected data
  • Heatmap comparing raw, preprocessed, and batch-corrected data
  • PCA comparing raw, preprocessed, and batch-corrected data
  • Outlier report
  • Quality control codes
  • Results of Quality control packages

The above deliverables are available here: https://github.com/mentorchains/bioinformatics-pathway/tree/main/Maryam%20Momeni/Module%203

Module 4:

Technical area:

  • Collaborating in testing the Shiny app
  • Removing outliers from datasets
  • Getting the annotations from databases
  • Removing duplicated probe IDs and probes that have no corresponding gene symbol
  • Selecting a representative row from the rows with duplicated symbols
  • Filtering out genes below the 2nd percentile of the expression distribution of the dataset
  • Differential expression analysis with the limma package
  • Plotting a volcano plot in which the differentially expressed genes are identified
  • Plotting a heatmap with expression values of the top 50 most significant genes
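
The annotation and filtering steps above can be sketched roughly as follows. This is an illustration in plain Python, not the R code from this module, and it rests on assumptions. The probe IDs, gene symbols, and values are made up. The representative row is taken to be the probe with the highest mean expression, which is one common convention but may not be the rule actually used. The percentile cutoff uses a simple nearest-rank definition.

```python
from statistics import mean

def collapse_to_symbols(expression, probe_to_symbol):
    """Keep one probe per gene symbol: the one with the highest mean expression.

    Probes with a missing or empty symbol are dropped.
    """
    best = {}
    for probe, values in expression.items():
        symbol = probe_to_symbol.get(probe)
        if not symbol:
            continue  # probe is not mapped to any gene symbol
        if symbol not in best or mean(values) > mean(best[symbol]):
            best[symbol] = values
    return best

def filter_low_expression(genes, percentile=2.0):
    """Drop genes whose mean expression falls below the given percentile."""
    if not genes:
        return {}
    means = sorted(mean(v) for v in genes.values())
    # Nearest-rank cutoff; other percentile definitions interpolate instead.
    index = min(int(len(means) * percentile / 100.0), len(means) - 1)
    cutoff = means[index]
    return {g: v for g, v in genes.items() if mean(v) >= cutoff}

# Hypothetical data: two samples per probe, two probes mapping to GENE_A.
expression = {
    "probe_1": [8.0, 9.0],
    "probe_2": [6.0, 7.0],
    "probe_3": [2.0, 2.5],
    "probe_4": [5.0, 5.5],
    "probe_5": [10.0, 11.0],
    "probe_6": [7.0, 7.5],
}
probe_to_symbol = {
    "probe_1": "GENE_A", "probe_2": "GENE_A", "probe_3": "GENE_B",
    "probe_4": "", "probe_5": "GENE_C", "probe_6": "GENE_D",
}

genes = collapse_to_symbols(expression, probe_to_symbol)
# A 25th-percentile cutoff is used here only because the toy dataset is tiny.
kept = filter_low_expression(genes, percentile=25.0)
```

With only a handful of genes a 2nd-percentile cutoff removes nothing, which is why the example call uses a larger value; on a full array the default of 2.0 matches the step described above.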

Soft skills:

  • Giving a presentation that walked the others through the code in detail
  • Troubleshooting the errors I got when running the code

Tools:

  • hgu133plus2.db
  • limma package
  • EnhancedVolcano package

Achievements:

  • Obtaining the list of differentially expressed genes in colorectal cancer
  • Plotting a volcano plot and a heatmap of the expression values of the differentially expressed genes
  • Preparing a presentation on module 4

Hurdles:

I had some problems with removing duplicated probe IDs, removing probes that were not mapped to any symbol, and selecting a representative for rows with duplicated symbols. I fixed them by searching through different tutorials and websites.

Deliverables:

  • Heatmap of 50 top DEGs
  • Volcano plot
  • Differential expression analysis, annotation, and filtering code

The above deliverables are available here: https://github.com/mentorchains/bioinformatics-pathway/tree/main/Maryam%20Momeni/Module%204

Module 5:

Technical area:

  • Running Gene Ontology (GO) enrichment analysis for colorectal cancer and plotting the results
  • Running KEGG pathway enrichment analysis for the up-regulated genes of colorectal cancer and showing the results in a bar plot and a dot plot
  • Plotting a gene-concept network for KEGG pathways
  • Plotting a gene-concept network for transcription factors (TFs)
  • Running GSEA
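
Over-representation analysis of GO terms and KEGG pathways is typically based on a hypergeometric test: given how many genes in the background belong to a pathway, how surprising is the overlap with the list of up-regulated genes? The sketch below shows that calculation in plain Python. It is an illustration of the statistic only, not clusterProfiler's implementation; the function name and the numbers are made up, and real tools also correct the p-values for multiple testing.

```python
from math import comb

def enrichment_p_value(universe_size, pathway_size, gene_list_size, overlap):
    """One-sided hypergeometric p-value: P(X >= overlap).

    universe_size:  number of background genes
    pathway_size:   background genes annotated to the pathway
    gene_list_size: genes in the list being tested (e.g. up-regulated DEGs)
    overlap:        genes in the list that belong to the pathway
    """
    total = comb(universe_size, gene_list_size)
    upper = min(pathway_size, gene_list_size)
    tail = sum(
        comb(pathway_size, k)
        * comb(universe_size - pathway_size, gene_list_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 20 background genes, 5 of them in the pathway,
# 4 up-regulated genes, 3 of which fall in the pathway.
p = enrichment_p_value(universe_size=20, pathway_size=5,
                       gene_list_size=4, overlap=3)
```

A small p-value means the pathway contains more of the up-regulated genes than would be expected by chance.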

Soft skills:

  • Finding a study design and suitable datasets for the case study in the R Shiny app’s article
  • Troubleshooting the errors I got when running the code

Tools:

  • enrichplot
  • org.Hs.eg.db
  • msigdbr
  • magrittr
  • clusterProfiler
  • tidyr

Achievements:

  • Obtaining the pathways involved in colorectal cancer
  • Obtaining GSEA plot

Hurdles:

I was getting an error when I tried to perform the gene-concept network analysis. I searched for the problem and found the answer in one of the forums: after I updated the clusterProfiler package, the problem was fixed.

Deliverables:

  • Gene ontology plots of up-regulated DEGs
  • GSEA plot
  • Gene-concept network of transcription factors
  • Gene-concept network of enriched pathways of up-regulated genes
  • KEGG pathway analysis plot of up-regulated DEGs
  • Module 5 R code

The above deliverables are available here: https://github.com/mentorchains/bioinformatics-pathway/tree/main/Maryam%20Momeni/Module%205

Module 6:

Technical Area

  • Export the up-regulated genes from R
  • Further analysis on up-regulated genes

Tools

  • Enrichr
  • Metascape
  • GEPIA

Soft Skills

  • Analyzing the differentially expressed genes, the pathways enriched in them, PPI networks, and survival, and exploring the pathways
  • We are also writing a case study for the R Shiny app’s (sMAP) article, so I am improving my scientific reporting skills as well as my teamwork skills.

Tasks completed

I performed an enrichment analysis with Enrichr and obtained all the pathways, gene ontologies, TFs, drugs, and diseases that were enriched for the up-regulated genes. Then I imported all those genes into Metascape and compared the pathways enriched in Metascape with those enriched in Enrichr.

After that, I did a survival analysis for five genes in GEPIA and ran some further analyses, such as a normal/cancer comparison and an analysis of the genes by cancer stage.

Deliverables:

  • GO of up-regulated genes from Enrichr
  • KEGG pathways of up-regulated genes from Enrichr
  • Survival analysis of MEF2C in GEPIA
  • WikiPathways of up-regulated genes from Enrichr

The above deliverables are available here: https://github.com/mentorchains/bioinformatics-pathway/tree/main/Maryam%20Momeni/Module%206

Module 7 (Capstone project):

Technical Area

  • Reading datasets and metadata into R
  • Quality control
  • Normalization and background correction
  • Batch effect removal
  • Annotation and gene filtering
  • KEGG pathway and GO analysis
  • GSEA
  • Gene-concept network analysis
  • TF analysis
  • PPI network analysis with Cytoscape
  • Survival analysis

Tools

  • R packages: affy, arrayQualityMetrics, sva, ggplot2, pheatmap, WGCNA, limma, EnhancedVolcano, hgu133plus2.db, enrichplot, org.Hs.eg.db, msigdbr, magrittr, clusterProfiler, tidyr, Rcpp
  • Cytoscape (STRING and cytoHubba plugins)
  • GEPIA

Soft Skills

  • I prepared a presentation of my capstone project, so I worked on my presentation skills.
  • Preparing PowerPoint slides for the presentation

Tasks completed

I merged two datasets containing 70 samples of lung cancer, removed the batch effect between them, and then ran a differential expression analysis on them. After obtaining the DEGs, I found the enriched KEGG pathways and GO terms for them. Then I plotted a gene-concept network and a TF network for them and performed a GSEA. To find hub genes, I plotted a PPI network and identified the 10 key genes in that network. After that, I ran a survival analysis for those 10 genes and found 5 genes whose expression level was associated with the survival of patients with lung cancer.

The presentation video of the capstone project, the R code, and all deliverables are available here: https://github.com/mentorchains/bioinformatics-pathway/upload/main/Maryam%20Momeni/Capstone%20project