Overview
This workshop introduces the R programming language with a focus on biological data analysis. Its purpose is to teach scientists (students, postdocs, PIs) in the biological and medical sciences to use R for the data analysis tasks they routinely encounter, including sequence analysis and other bioinformatics tasks. No prior knowledge of R is expected, and attendees can expect to come away with a skill set that is immediately applicable to their own data.
Learning Objectives
At the end of the workshop you will be able to:
- Install and update R
- Use the RStudio IDE
- Understand what CRAN and Bioconductor are and what the differences are between them
- Install and update R packages from CRAN and Bioconductor
- Import a wide variety of data types into R
- Understand the basic data types: integer, numeric, logical, character
- Understand R’s basic data structures: vector, matrix, list, data.frame
- Understand basic programming concepts: functions, objects, loops, vectorization, conditionals
- Manipulate data structures by subsetting and indexing
- Understand key base R functions: seq, apply (and friends)
- Manipulate data with dplyr and friends
- Make plots with ggplot2
- Find help about any function
- Understand some common R errors and how to deal with them
- Find and evaluate R packages needed for a particular analysis
- Understand the difference between `<-` and `=` and make your own choice about which one to use
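As a minimal sketch of that last objective (the variable names here are illustrative and not taken from the workshop materials): at the top level `<-` and `=` both assign, but inside a function call `=` only names an argument, while `<-` still creates or overwrites a variable in the calling environment.

```r
# At the top level, both operators create a variable.
a <- 5
b = 5
same_at_top_level <- identical(a, b)

# Inside a call, `=` matches the argument named `x`;
# the variable `x` defined here is left untouched.
x <- 0
mean_named <- mean(x = c(1, 2, 3))
x_after_equals <- x

# Inside a call, `<-` assigns `x` in the calling environment
# and then passes the assigned value as the argument.
mean_arrow <- mean(x <- c(1, 2, 3))
x_after_arrow <- x
```

Both calls to `mean()` return the same value; the only difference is whether `x` in the workspace changes as a side effect.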
Preparation
Attendees are expected to bring their own laptops with R and RStudio already installed, and to have completed at least one of the following online tutorials.
This small bit of preparation will allow us to move quickly through the basics and get to the good stuff.
Course Materials
Understanding R
- R History
- Packages
- Assignment
- Environments
- Other key R functions, including basic statistics
- Errors and getting help
Data Structures
- Vectors
- Lists
- Factors
- Matrices
- Data frames
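The structures listed above can be sketched in base R as follows. The object names and values are made up for illustration and are not part of the course data.

```r
# Atomic vector: every element has the same type.
gene_lengths <- c(1200L, 850L, 2300L)

# List: elements may differ in type and length.
sample_info <- list(id = "S1", reads = 1.5e6, passed_qc = TRUE)

# Factor: categorical data stored as integer codes plus levels.
condition <- factor(c("control", "treated", "control"))

# Matrix: a vector with two dimensions, filled column by column.
counts <- matrix(1:6, nrow = 2)

# Data frame: a list of equal-length columns, one row per observation.
genes <- data.frame(
  name = c("geneA", "geneB", "geneC"),
  length = gene_lengths,
  condition = condition
)
```

Calling `str()` on any of these objects is a quick way to inspect its type and structure.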
Subsetting
- Ways to subset
- Subsetting operators
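As a rough illustration of the topics above, the sketch below shows the common ways to index a vector and the three subsetting operators `[`, `[[` and `$` applied to a list. The objects `scores` and `record` are hypothetical examples.

```r
scores <- c(a = 10, b = 20, c = 30)

by_position <- scores[c(1, 3)]      # positive integers select positions
by_name     <- scores["b"]          # character strings select by name
by_logical  <- scores[scores > 15]  # logical vectors keep TRUE positions
dropped     <- scores[-1]           # negative integers drop positions

record <- list(id = "S1", values = c(2.5, 3.5))

sub_list  <- record["values"]    # `[` returns a smaller list
element   <- record[["values"]]  # `[[` extracts the element itself
same_elem <- record$values       # `$` extracts by name, like `[[`
```

The distinction between `[` and `[[` matters most for lists and data frames: the first keeps the container, the second returns its contents.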
Programming Concepts
- Functions
- Conditionals
- Loops
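A small sketch tying these three concepts together, and contrasting a loop with a vectorized call, might look like this. The function `classify_gc`, its threshold, and the input values are assumptions made up for the example.

```r
# Function with a conditional: label a single GC fraction.
# The 0.5 threshold is arbitrary and only for illustration.
classify_gc <- function(gc, threshold = 0.5) {
  if (gc > threshold) {
    "high"
  } else {
    "low"
  }
}

gc_values <- c(0.35, 0.62, 0.50)

# Loop: fill a preallocated vector one element at a time.
labels_loop <- character(length(gc_values))
for (i in seq_along(gc_values)) {
  labels_loop[i] <- classify_gc(gc_values[i])
}

# Vectorized equivalent: ifelse() handles the whole vector at once.
labels_vec <- ifelse(gc_values > 0.5, "high", "low")
```

Both approaches give the same labels; the vectorized form is usually shorter and is the more idiomatic choice in R when one exists.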
Practical data management
- Tidy data
- Pipes: Ceci n’est pas un pipe
- Intro to dplyr and tidyr
- Restructuring data and doing stuff to it
- Regular expressions and stringr
Data Visualization
- ggplot2
- Heatmaps
- What makes effective visualizations
- Building up a complex visualization
Introduction to Bioconductor
- Finding and installing Bioconductor packages
- Learning what packages do and how to evaluate them
- Intro to some key data structures: XStringSet, IRanges, ExpressionSet, etc.
Reproducibility
- Rapid introduction to managing and reproducing your analysis: R Markdown, Git and GitHub, best practices
- Writing your own functions