Data Curation and Processing Assessment
Things Learned
- Because no exact script was provided this week, I learned to troubleshoot issues on my own using the skills from the program's training.
- I learned how to analyze the plots in both a technical and scientific manner.
- I was also able to collaborate with my team members to complete the project and present the results.
Achievement Highlights
- Working collaboratively with my team members
- Learning how to work with different packages in R
- Troubleshooting issues that arose during the project
List of Meetings
- The team meeting that I attended was a great introduction to the program since I could talk to and hear from other members.
- The biology webinar was extremely helpful as it helped put the project into perspective from a biological point of view since the portion that I did was mainly technical.
- However, I was unable to attend the Happy Hour due to poor internet connection.
Goal for the Week
- My goal for the upcoming week is to have more collaboration with all of my team members and to see how the tasks performed can be applied to other projects.
Tasks and Challenges
- Most of the code that I ran took a long time to process which slowed down my progress to a certain extent.
- In addition, since this was my first time coding in R on my own, I often had to look up what each step did and the reasoning behind it.
- The data was loaded in and merged.
- Quality control using simpleaffy and arrayQualityMetrics was done on the raw data.
- The results were visualized using various plots to identify possible outliers and to determine the quality of the data.
- The data was then normalized using RMA.
- The normalized data was visualized again to identify the outliers, which were then removed.
- The data without the outliers was normalized again, and batch correction was done on the clean data to prevent samples from clustering by batch.
- Once again, the results were visualized using plots. The plots used included heatmaps, PCA plots, boxplots, RLE plots, and NUSE plots.
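RMA includes a quantile normalization step, which forces every array to share the same intensity distribution. The following is only a minimal sketch of that one step, written in Python for illustration rather than the R used in the project. It is not the affy implementation, it skips background correction and probe summarization, and the function name and toy values are hypothetical.

```python
def quantile_normalize(columns):
    """Quantile-normalize equal-length columns (one list of intensities per array)."""
    n_rows = len(columns[0])
    # Sort each array's values, then average across arrays at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[i] for col in sorted_cols) / len(columns) for i in range(n_rows)
    ]
    normalized = []
    for col in columns:
        # Positions of this array's values in ascending order (ties broken by position).
        order = sorted(range(n_rows), key=lambda i: col[i])
        out = [0.0] * n_rows
        for rank, idx in enumerate(order):
            # Replace each value with the cross-array mean for its rank.
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


# Hypothetical toy intensities: three arrays, four probes each.
arrays = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 4.5, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(arrays)
for col in normalized:
    print(col)
```

After this step every array contains the same set of values, just in a different order, which is why boxplots of normalized arrays line up.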
Differential Gene Analysis Assessment
First, I just wanted to say sorry about the late submission of the self-assessment. There was a hurricane and we lost internet right before the morning meeting ended, so I’m using my mobile data to upload this.
Things Learned
- This week I learned how to annotate genes, perform limma analysis, and map genes. I also learned how to use EnhancedVolcano to visualize the results.
- I also learned the importance of networking in the talk we had on 07/28 which was very helpful to me.
Achievement Highlights
- Cross checking the code with my team and making an effort to work with them by checking in on Slack.
- Reading external research on colorectal cancer cells to further understand the biological side of the project.
- Completed the weekly project quickly and efficiently
List of Meetings
07/27: Team Meeting, 07/28: Group Presentations, Watched Recording for 07/29: BI GitHub Webinar, 07/30: Office Hours, 07/31: Team Discussion of Deliverables
Goals for the Week
- Working efficiently as the task lead with my team members on the weekly deliverables
- I’m hoping to make more of an effort to network and get to know my team members
Tasks Completed
- I annotated, mapped, and filtered the genes for differential gene analysis. I then visualized the results using the packages pheatmap and EnhancedVolcano. I then worked with my team to add that information into the presentation
- I only had one technical issue which I was able to resolve by going to office hours.
Functional Analysis Assessment
Things I Learned
- I learned how to utilize external databases for analysis
- I was also able to improve my leadership skills by being task lead
Achievement Highlights
- One achievement was networking more with my team members and other people in the group
- Another highlight was being task lead and being able to work with my team members effectively to help one another
- Understanding the biological meaning behind the analysis performed
List of Meetings
08/03: GeneTech Meeting, 08/04: GeneTech Presentations, watched the recording for 08/05: Deliverables Webinar, 08/06: Office Hours, 08/07: Happy Hour
Goals for the Week
- One goal that I have for the week is to network with more people
- Another goal that I have for the week is to get a head start on the final project.
Tasks and Challenges
- Performed Gene Ontology analysis by defining the significant DEG vector and using enrichGO and setReadable.
- Visualized the results from Gene Ontology using a barplot
- Used groupGO to group genes by GO term and see where they were located in the cell
- Performed the Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis using enrichKEGG
- Visualized the results from KEGG analysis using dotplot
- Utilized cnetplot to visualize the results from KEGG
- Used the STRING database to find the hub genes
- I had some problems creating the vectors for the upregulated and downregulated genes. However, I plan on attending office hours to solve the problem.
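One common way to build the upregulated and downregulated vectors is to filter the differential expression table by adjusted p-value and by the sign of the log fold change. The sketch below shows that logic in Python for illustration (the project itself used R). The cutoffs, the column names `symbol`, `logFC`, and `adj_p`, and the gene names are all hypothetical placeholders, not values from the project.

```python
def split_degs(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Return (upregulated, downregulated) lists of gene symbols."""
    up, down = [], []
    for row in results:
        # Skip genes with no adjusted p-value or that fail the significance cutoff.
        if row["adj_p"] is None or row["adj_p"] >= padj_cutoff:
            continue
        if row["logFC"] >= lfc_cutoff:
            up.append(row["symbol"])
        elif row["logFC"] <= -lfc_cutoff:
            down.append(row["symbol"])
    return up, down


# Hypothetical toy results table.
toy_results = [
    {"symbol": "GENE_A", "logFC": 2.3, "adj_p": 0.001},
    {"symbol": "GENE_B", "logFC": -1.8, "adj_p": 0.010},
    {"symbol": "GENE_C", "logFC": 0.4, "adj_p": 0.002},   # significant but small change
    {"symbol": "GENE_D", "logFC": -2.5, "adj_p": 0.200},  # large change but not significant
    {"symbol": "GENE_E", "logFC": 1.2, "adj_p": None},    # missing p-value
]
up_genes, down_genes = split_degs(toy_results)
print("up:", up_genes)
print("down:", down_genes)
```

Keeping the two lists separate makes it possible to run the enrichment analysis on each direction on its own.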
Final Project Assessment
Things I Learned
- I learned how to choose different databases based on the datasets
- I learned how to create a metadata file from Series Matrix Files
- I learned how to determine which analysis tools would work best with my pipeline
Achievement Highlights
- In addition to my project, I tried to analyze breast cancer datasets that used the oligo package instead of the affy package
- Completed a bioinformatics pipeline on breast cancer successfully
- I reached out for help when I needed it but made sure to first try and solve the issue on my own
List of Meetings
08/10: GeneTech Group Meeting, 08/11: Office Hours, 08/12: Functional Analysis Webinar, 08/14: GeneTech Group Presentations, 08/17: GeneTech Team Meeting, 08/17: Office Hours, 08/18: Webinar on How to Make a Professional Presentation, 08/19: Office Hours, 08/21: Final Presentation, 08/24: GeneTech Final Meeting
Goals for the Future
I look forward to applying the technical skills I learned to different projects.
Tasks and Challenges
- I had trouble choosing and using the correct database for my datasets. After going to office hours I was able to get the code for the database to work. However, after removing the NAs, I was left with only 143 of the 30,000 observations, which gave the rest of my project poor results. I attended office hours again, and the leads helped me find a database that was better suited to my datasets, which gave me better results.
- Performed quality control using affyPLM package
- Visualized the results from quality control using boxplots and histograms
- Performed normalization using affy package
- Performed batch correction on the datasets
- Visualized the differences that occurred in data in the process of data curation and processing using boxplots
- Visualized the data using principal component analysis plots
- Performed gene filtering and limma analysis on the data
- Performed gene ontology analysis using enrichGO and groupGO
- Used the STRING database to locate the hub genes.
- Performed Global Gene Set Enrichment Analysis on data
- Performed transcription factor analysis on data to determine a target gene of interest
- Used the website GEPIA for Survival Analysis in which the potential target gene was studied
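The annotation problem described in the first bullet of this list (143 of 30,000 observations left after removing NAs) can be caught early by checking how many probe IDs an annotation database actually maps before running the rest of the pipeline. This is a minimal sketch of that check in Python for illustration (the project used R). The function name, probe IDs, gene names, and both toy annotation tables are hypothetical.

```python
def map_probes(probe_ids, annotation):
    """Map probe IDs to gene symbols; return the mapping and the fraction retained."""
    mapped = {}
    for probe in probe_ids:
        symbol = annotation.get(probe)
        if symbol:  # drops probes that are absent or annotated as None or ""
            mapped[probe] = symbol
    coverage = len(mapped) / len(probe_ids) if probe_ids else 0.0
    return mapped, coverage


# Hypothetical probe IDs and two toy annotation tables.
probes = ["p1", "p2", "p3", "p4"]
poorly_matched_db = {"p1": "GENE_A"}
better_matched_db = {"p1": "GENE_A", "p2": "GENE_B", "p3": None, "p4": "GENE_C"}

_, poor_coverage = map_probes(probes, poorly_matched_db)
mapped, better_coverage = map_probes(probes, better_matched_db)
print(f"poorly matched database keeps {poor_coverage:.0%} of probes")
print(f"better matched database keeps {better_coverage:.0%} of probes")
```

A very low retained fraction suggests the annotation database does not match the array platform, which is worth fixing before any filtering or enrichment steps.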