Maryam - Bioinformatics (Level 2) Pathway

Maryam · July 8, 2021, 5:48am

Module 1:

Technical Area

Installing R and RStudio
Learning the fundamentals of R: variables, data structures, functions, and packages
Drawing plots with general plotting functions, ggplot2, and EnhancedVolcano in R
Getting familiar with papers, projects, and tools in the field of transcriptome analysis
Expanding knowledge about bioinformatics
Learning how to read and understand scientific papers

Tools:

R and RStudio
ggplot2 package
EnhancedVolcano package
CRAN and Bioconductor repositories

Soft Skills

Self-learning: I learned to work with R on my own.
Problem-solving: I learned to find the cause of errors in R by searching for the problem on YouTube, Stack Overflow, RStudio Community, and similar sites.
Communication: I communicated with Anya to resolve my questions.
Learning to work with the STEM-Away website

Achievement Highlights

Installing and using R packages and functions
Understanding the relation between transcriptome analysis and finding biomarkers and hub genes of diseases
Understanding how this internship is run

Tasks Completed

I installed R and RStudio.
I installed ggplot2 and EnhancedVolcano and drew different plots with them.
I created a GitHub account.

Maryam · September 7, 2021, 6:03pm

Module 2:

Tools:

GEO Database
GitHub
Git

Technical Area:

I was able to:

Download raw data from GEO
Acquire the metadata file
Install Bioconductor packages and general packages in R
Load data into the R environment
Work with GitHub

Soft skills:

Communicating in spoken and written English
Obtaining teamwork skills
Developing self-study skills

Three achievement highlights:

Communicating with people internationally
Fetching data from databases and loading it into R
Using GitHub

Maryam · September 7, 2021, 6:06pm

Module 3:

Technical area:

Learning how to work with different quality control packages and find outliers.
Learning how to normalize, background-correct, and log-transform data.
Becoming familiar with the biomaRt annotation package, in collaboration with group A, for the R Shiny app.
Learning to add professional features to arrayQualityMetrics results for the R Shiny app.
Learning how to work with GitHub from RStudio
Trying to build my first Shiny app

Soft Skills:

Improving written and spoken English skills through communicating with mentors and teammates.
Attending several meetings and reporting work progress
Time management
Improving self-learning skills

Tools:

RMA (robust multi-array average) normalization
sva package
arrayQualityMetrics package
simpleaffy package
affyPLM package
QCReport package
Git in RStudio
R Shiny package

Achievements:

Finding outliers with a high degree of confidence
Successfully removing batch effect and preprocessing microarray data
Mastering Git commands
Identifying the most suitable quality control packages for outlier detection in the R Shiny app

Hurdles:

The annotation colors were not shown in the heatmap. After several searches and reading different tutorials, I found that the row names of the data frame built for the annotation must be identical to the column names of the correlation matrix.
Some quality control packages need a lot of memory; this was fixed by raising the limit with the memory.limit() function.
We didn’t know where to put the Module 3 deliverables. After discussing it with Dravie (my teammate), we asked Anya, and she added us to the mentorchains repository on GitHub for that purpose.
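
The heatmap hurdle above comes down to a name-matching check. Below is a minimal sketch of that check, written in Python purely for illustration (the module itself was done in R). The sample names and the `annotation_mismatches` helper are hypothetical and are not taken from the original code.

```python
def annotation_mismatches(matrix_columns, annotation_rows):
    """Return the sample names that appear on only one side."""
    cols, rows = set(matrix_columns), set(annotation_rows)
    return {
        "missing_annotation": sorted(cols - rows),  # columns with no annotation row
        "unused_annotation": sorted(rows - cols),   # annotation rows with no column
    }


# Hypothetical names: the matrix columns carry a file suffix, the annotation does not.
matrix_columns = ["GSM1.CEL", "GSM2.CEL", "GSM3.CEL"]
annotation_rows = ["GSM1", "GSM2", "GSM3"]

report = annotation_mismatches(matrix_columns, annotation_rows)

# One possible fix: strip the suffix so both sides use identical names.
fixed_columns = [
    name[: -len(".CEL")] if name.endswith(".CEL") else name
    for name in matrix_columns
]
fixed_report = annotation_mismatches(fixed_columns, annotation_rows)
```

When both lists in the report are empty, the names line up and the annotation colors should be drawn.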

Deliverables:

Boxplot comparing raw, preprocessed, and batch-corrected data
Heatmap comparing raw, preprocessed, and batch-corrected data
PCA comparing raw, preprocessed, and batch-corrected data
Outlier report
Quality control codes
Results of Quality control packages

The above deliverables can be found here: https://github.com/mentorchains/bioinformatics-pathway/tree/main/Maryam%20Momeni/Module%203

Maryam · September 7, 2021, 6:07pm

Module 4:

Technical area:

Collaborating in testing the Shiny app
Removing outliers from datasets
Getting the annotations from Databases
Removing duplicated Probe IDs and the genes which don’t have any corresponding Probe IDs
Selecting a representative row from the rows with duplicated Symbols
Filtering out genes below the 2nd percentile of the expression distribution of the dataset
Differential expression analysis with the limma package
Plotting a volcano plot in which the differentially expressed genes are identified
Plotting a Heatmap with expression values of the top 50 most significant genes
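
The annotation and filtering steps in the list above can be sketched as follows. This is an illustrative Python sketch, not the R code used in the module. The probe IDs, gene names, and values are hypothetical, and choosing the probe with the highest mean expression as the representative is an assumption, since the rule actually used is not stated here.

```python
from statistics import mean


def collapse_and_filter(rows, percentile=2):
    """rows: list of (probe_id, symbol_or_None, [expression values]).

    Returns {symbol: representative_probe_id} after filtering.
    """
    # 1. Drop probes that are not mapped to any gene symbol.
    mapped = [row for row in rows if row[1]]

    # 2. Keep one probe per symbol. Assumption: highest mean expression wins.
    best = {}
    for probe, symbol, values in mapped:
        avg = mean(values)
        if symbol not in best or avg > best[symbol][1]:
            best[symbol] = (probe, avg)

    # 3. Drop genes whose mean lies below the percentile cutoff
    #    (nearest-rank percentile, computed with integer arithmetic).
    ordered = sorted(avg for _, avg in best.values())
    rank = max(1, (percentile * len(ordered) + 99) // 100)
    cutoff = ordered[rank - 1]
    return {s: probe for s, (probe, avg) in best.items() if avg >= cutoff}


# Hypothetical toy data; a large percentile is used so the filter is visible.
rows = [
    ("p1", "GENE_A", [8.0, 9.0]),
    ("p2", "GENE_A", [6.0, 7.0]),   # duplicate symbol, lower mean
    ("p3", None, [5.0, 5.0]),       # unmapped probe
    ("p4", "GENE_B", [10.0, 11.0]),
    ("p5", "GENE_C", [1.0, 2.0]),   # lowest-expressed gene
    ("p6", "GENE_D", [7.0, 7.0]),
    ("p7", "GENE_E", [4.0, 5.0]),
]
kept = collapse_and_filter(rows, percentile=40)
```

On a real dataset with thousands of genes, `percentile=2` would remove only the lowest-expressed tail, which is the intent of the filtering step.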

Soft skills:

Giving a presentation that walked the others through the code in detail
Troubleshooting the errors I got while running the code

Tools:

hgu133plus2.db
limma package
EnhancedVolcano package

Achievements:

Obtaining the list of differentially expressed genes of colorectal cancer
Plotting a volcano plot and a heatmap of the expression values of differentially expressed genes
Preparing a presentation on module 4

Hurdles:

I had some problems removing duplicated Probe IDs, removing genes that were not mapped to any symbols, and selecting a representative for rows with duplicated symbols. I resolved them by consulting different tutorials and websites.

Deliverables:

Heatmap of 50 top DEGs
Volcano plot
Differential expression analysis, annotation, and filtering code

The above deliverables can be found here: https://github.com/mentorchains/bioinformatics-pathway/tree/main/Maryam%20Momeni/Module%204

Maryam · September 7, 2021, 6:11pm

Module 5:

Technical area:

Performing Gene Ontology (GO) enrichment analysis for colorectal cancer and plotting the results
Performing KEGG pathway enrichment analysis for the up-regulated genes of colorectal cancer and showing the results in a bar plot and a dot plot
Plotting a gene-concept network for KEGG pathways
Plotting a gene-concept network for TFs
Running GSEA
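
Over-representation analyses such as the GO and KEGG enrichment listed above are commonly based on a hypergeometric test. Below is a minimal Python sketch of that test, for illustration only: the actual analysis was done with R packages, and the `enrichment_p_value` helper and the numbers in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(universe, in_pathway, selected, overlap):
    """Probability of seeing at least `overlap` pathway genes by chance.

    universe   -- number of background genes
    in_pathway -- background genes annotated to the pathway
    selected   -- size of the gene list being tested (e.g. up-regulated DEGs)
    overlap    -- genes in the list that belong to the pathway
    """
    upper = min(in_pathway, selected)
    total = comb(universe, selected)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(in_pathway, k) * comb(universe - in_pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total


# Hypothetical numbers: 20 background genes, 5 in the pathway,
# 4 genes in the tested list, 3 of which fall in the pathway.
p = enrichment_p_value(universe=20, in_pathway=5, selected=4, overlap=3)
```

A small p-value means the pathway contains more of the tested genes than expected by chance. In practice the p-values are also adjusted for the number of pathways tested.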

Soft skills:

Finding a study design and suitable datasets for the case study in the R Shiny app’s article
Troubleshooting the errors I got while running the code

Tools:

enrichplot
org.Hs.eg.db
msigdbr
magrittr
clusterProfiler
tidyr

Achievements:

Obtaining the pathways involved in colorectal cancer
Obtaining a GSEA plot

Hurdles:

I got an error when I tried to run the gene-concept network analysis. I searched for the problem and found the answer in a forum: updating the clusterProfiler package fixed it.

Deliverables:

Gene ontology plots of up-regulated DEGs
GSEA plot
Gene-concept network of transcription factors
Gene-concept network of enriched pathways of up-regulated genes
KEGG pathway analysis plot of up-regulated DEGs
Module 5 R code

The above deliverables can be found here: https://github.com/mentorchains/bioinformatics-pathway/tree/main/Maryam%20Momeni/Module%205

Maryam · September 7, 2021, 6:17pm

Module 6:

Technical Area

• Export the up-regulated genes from R

• Further analysis of the up-regulated genes

Tools

• Enrichr

• Metascape

• GEPIA

Soft Skills

• Analyzing the differentially expressed genes and the pathways enriched for them, analyzing PPI networks, performing survival analysis, and exploring pathways.

• Also, we are writing a case study for the R Shiny app’s (sMAP) article. I am improving my scientific reporting skills as well as my teamwork skills.

Tasks completed

• I performed an enrichment analysis with Enrichr and obtained the pathways, gene ontologies, TFs, drugs, and diseases enriched for the up-regulated genes. Then I imported those genes into Metascape and compared the pathways enriched in Metascape with those from Enrichr.

• After that, I did a survival analysis for five genes and carried out further analyses in GEPIA, such as a normal/cancer comparison and an analysis of genes by cancer stage.

Deliverables:

GO of up-regulated genes from Enrichr
KEGG pathways of up-regulated genes from Enrichr
Survival analysis of MEF2C in GEPIA
WikiPathways of up-regulated genes from Enrichr

The above deliverables can be found here: https://github.com/mentorchains/bioinformatics-pathway/tree/main/Maryam%20Momeni/Module%206

Maryam · September 7, 2021, 6:19pm

Module 7: (Capstone project)

Technical Area

• Reading datasets and meta-data in R

• Quality control

• Normalization and background correction

• Batch effect removal

• Annotation and gene filtering

• KEGG pathway and GO analysis

• GSEA

• Gene concept network analysis

• TFs analysis

• PPI network analysis by Cytoscape

• Survival analysis

Tools

• R packages: affy, arrayQualityMetrics, sva, ggplot2, pheatmap, WGCNA, limma, EnhancedVolcano, hgu133plus2.db, enrichplot, org.Hs.eg.db, msigdbr, magrittr, clusterProfiler, tidyr, Rcpp

• Cytoscape (STRING and cytoHubba plugins)

• GEPIA

Soft Skills

• I prepared a presentation of my capstone project, which helped me work on my presentation skills

• Preparing PowerPoint slides for the presentation

Tasks completed

I merged two datasets containing 70 samples of lung cancer, removed the batch effect between them, and then ran a differential expression analysis. After obtaining the DEGs, I found the enriched KEGG pathways and GO terms for them. Then I plotted a gene-concept network and a TF network for them and performed a GSEA. To find hub genes, I plotted a PPI network and identified the 10 key genes in it. After that, I ran a survival analysis for those 10 genes and found 5 genes whose expression level was associated with the survival of patients with lung cancer.
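
The hub-gene step can be illustrated with a small sketch. It ranks the nodes of a PPI network by degree (number of interaction partners), which is one of several possible ranking methods. Which method was used in Cytoscape is not stated here, so degree is an assumption. The sketch is in Python for illustration only, and the edge list is hypothetical.

```python
from collections import Counter


def top_hub_genes(edges, top_n=10):
    """Rank genes by number of distinct interaction partners."""
    # Treat the network as undirected: drop self-loops and duplicate edges.
    unique = {tuple(sorted(edge)) for edge in edges if edge[0] != edge[1]}
    degree = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1
    # Highest degree first; ties are broken alphabetically for stable output.
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical interactions between placeholder genes.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("C", "A"), ("D", "D")]
hubs = top_hub_genes(edges, top_n=2)
```

In the capstone, `top_n=10` would correspond to the 10 key genes that were then taken into the survival analysis.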

The presentation video of the capstone project, the R code, and all deliverables can be found here: https://github.com/mentorchains/bioinformatics-pathway/upload/main/Maryam%20Momeni/Capstone%20project