Learn @ HSLS: Applying Probability and Data with R to All of Us Datasets

Enjoy this self-paced module at a time and place that works best for you.
This module was developed by Alexis Cenname.
Access Applying Probability and Data with R to All of Us Datasets*

Are you interested in using All of Us datasets, and perhaps in enrolling in other HSLS-run All of Us workshops, but need an introduction to, or a review of, probability and R? You’re in the right place!

We built this HSLS module as a companion to the Massive Open Online Course (MOOC) “Introduction to Probability and Data with R” offered through Coursera and developed by Professor Mine Çetinkaya-Rundel of Duke University. The Duke course is well suited to those looking to learn R programming skills, with a focus on probability and data analysis. However, it was not designed specifically for All of Us workflows – that’s where this companion module comes in. By working through the module in parallel with the Duke course, you will learn to apply the course’s concepts to All of Us datasets.

Upon completing this module, you should be able to:

  • Recall simple functions in R for exploring any dataset.
  • Define the five main data domains in the All of Us database.
  • Classify All of Us variables as categorical or numerical.
  • Calculate summary statistics for both categorical and numerical variables.
  • Apply data cleaning procedures to datasets in R, including filtering, removing missing values (NA), and managing duplicates.
  • Derive new variables from existing datasets.
  • Generate a new dataframe by merging various datasets.
  • Use random sampling techniques to accurately represent target distributions.
  • Describe the data preparation processes involved in the All of Us code templates developed by HSLS.

Level: Novice

This module is part of our lineup of HSLS self-paced modules. Visit Learn @ HSLS to explore upcoming live class offerings and find more learning opportunities that you can complete on your own time.

*HSLS self-paced modules require a Pitt username and password (to log in via Pitt Passport). UPMC residents and fellows who do not have a Pitt username can request access to a self-paced module. HSLS live classes are open to University of Pittsburgh faculty, staff, and students, as well as UPMC residents and fellows.