Bioinformatics - Level 1 - Module 7 (Weeks 7-8) - Session 1 - Kelly Zhang


Technical Area:

  • Performed the following on GEO dataset GSE4107:
    • Statistical analysis
    • Quality control report analysis
    • DGE analysis


  • R/RStudio
  • GitHub
  • R packages (Bioconductor and CRAN): affy packages, ggplot2, pheatmap, limma, EnhancedVolcano, clusterProfiler, enrichplot, msigdbr
  • GSEA database
  • GEO/GEOquery

Soft Skills:

  • Project Management/Task Management - During our meetings, I recorded the tasks we had to finish and assigned them to team members.
  • Time Management - We had a little over a week to perform analyses on the GSE4107 samples, so I managed my time by dividing up my work and scheduling time for specific tasks (e.g., literature review, plot analysis, code review).
  • Teamwork - I worked with my team members @ivanlam27 @veyssi @Ananya_Kaushik @Roman_Ramirez @Leila to analyze GSE4107 for genes significantly associated with colorectal cancer.
  • Virtual Collaboration - My teammates and I met over Zoom calls, shared ideas and papers over Slack, and worked together on a Google Document report and our Google Slides final presentation. We shared/collaborated on our code via GitHub.
  • Literature review - I looked over multiple papers to research any correlations between the FOS gene and colorectal cancer.
  • Presentation - My capstone project team and I presented a full report of our analysis of GSE4107, including the genes of interest we identified in relation to colorectal cancer, to mentors Anya and Ali.

Achievement Highlights (3):

  • I am very proud of my teammates and myself for completing a review/report on the significance of the GSE4107 samples to colorectal cancer within a week. We worked together to create data visualization plots, analyze our output, and draw meaningful conclusions.
  • During my literature review of FOS, I found an interesting paper reporting two polymorphisms that enhanced expression of the FOS gene, leading to cell differentiation/tumor formation and a higher risk of colorectal cancer (Chen et al. 2019).
  • I developed a strong understanding of how to read data analysis plots: normalization boxplots, PCA plots, heatmaps, and volcano plots.

Difficulties Completing Tasks:

  • Working around time zones was difficult, as this was a highly collaborative project and our team was spread across 4 different time zones.
  • Without the structure of the modules, I had difficulty figuring out what to do for my final project/presentation. Luckily, I had a great team working with me, and they helped me find the motivation and urgency to complete my tasks for our capstone project.
