Overview

This course introduces the R programming language with a focus on biological data analysis. It aims to teach scientists (students, postdocs, PIs) in the biological and medical sciences to use R for the data analysis tasks they routinely encounter, including sequence analysis and other bioinformatics tasks. No prior knowledge of R is expected, and workshop attendees can expect to come away with skills they can apply immediately to their own data.

Learning Objectives

At the end of the workshop you will be able to:

  • Install and update R
  • Use the RStudio IDE
  • Understand what CRAN and Bioconductor are and how they differ
  • Install and update R packages from CRAN and Bioconductor
  • Import a wide variety of data types into R
  • Understand the basic data types: integer, numeric, logical, character
  • Understand R’s basic data structures: vector, matrix, list, data.frame
  • Understand basic programming concepts: functions, objects, loops, vectorization, conditionals
  • Manipulate data structures by subsetting and indexing
  • Understand key base R functions: seq, apply (and friends)
  • Manipulate data with dplyr and friends
  • Make plots with ggplot2
  • Find help about any function
  • Understand some common R errors and how to deal with them
  • Find and evaluate R packages needed for a particular analysis
  • Understand the difference between <- and = and make your own choice about which one to use
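As a small preview of the last objective, here is a minimal sketch of how the two operators behave. The variable names (a, b, v) are illustrative only and are not taken from the course materials.

```r
# At the top level, <- and = both assign a value to a name.
a <- 1:10
b = 1:10
identical(a, b)

# Inside a function call, = matches a value to the argument named x;
# it does not create a variable called x in your workspace.
mean(x = a)

# Inside a function call, <- still assigns: v is created in the
# calling environment and its value is then passed to mean().
mean(v <- a)
identical(v, a)
```

Which operator to prefer at the top level is largely a matter of style; the difference only matters in places such as function calls, where = already has another meaning.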

Preparation

Attendees are expected to bring their own laptops with R and RStudio already installed, and to have completed at least one of the following online tutorials.

This small bit of preparation will allow us to move quickly through the basics and get to the good stuff.

Course reference

Main text:
R for Data Science by Garrett Grolemund and Hadley Wickham

Secondary reading:
Advanced R by Hadley Wickham

Course Materials

Understanding R

  • R History
  • Packages
  • Assignment
  • Environments
  • Other key R functions, including basic statistics
  • Errors and getting help

Data Structures

  • Vectors
  • Lists
  • Factors
  • Matrices
  • Data frames

Subsetting

  • Ways to subset
  • Subsetting operators

Programming Concepts

  • Functions
  • Conditionals
  • Loops

Practical data management

  • Tidy data
  • Pipes: Ceci n’est pas un pipe
  • Intro to dplyr and tidyr
  • Restructuring and transforming data
  • Regular expressions and stringr

Data Visualization

  • ggplot2
  • Heatmaps
  • What makes a visualization effective
  • Building up a complex visualization

Introduction to Bioconductor

  • Finding and installing Bioconductor packages
  • Learning what packages do and how to evaluate them
  • Intro to some key data structures: XStringSet, IRanges, ExpressionSet, etc.

Reproducibility

  • Rapid introduction to managing and reproducing your analysis: R Markdown, Git and GitHub, best practices
  • Writing your own functions