1  Course syllabus

Reproducibility and open scientific practices are increasingly in demand and needed by scientists and researchers in modern research environments. Our work often require or involve a high level of hands-on collaboration on scientific projects throughout all stages of the research lifecycle. We are faced with obstacles that we have no training for nor knowledge on how to address. Obstacles, that can be as simple as not having a shared way of writing code or managing files, can impede effective collaboration. Or they can be complex, like documenting software dependencies or steps in an analysis pipeline in a way that makes it easy to resolve issues and get the most recent set of results after collaborators have worked on a project. Aside from the impact on collaboration, these barriers can even affect projects with just one primary researcher. Ultimately, this can have consequences on the reliability and reproducibility of scientific results, especially considering that the measures taken to address these barriers are often not explicitly shared in classical science output (like a publication).

With this course, we aim to begin addressing this gap. By using a highly practical, hands-on approach that revolves around code-along sessions (instructor and learner coding together), reading activities, and hands-on exercises, our overarching learning outcome is that at the end of the course, participants will be able to:

Describe what an open, collaborator-friendly, and nearly-automated reproducible data analysis pipeline and workflow looks like, and then create a project that follows these concepts by using R.

Our specific learning objectives are to:

  1. Identify potential actions to streamline collaboration on a data analysis project and create projects that apply many of these actions using R.

  2. Describe and define the distinct steps involved in a pipeline that goes from raw data to final results, and to use R to build this pipeline in an automated and explicit way.

  3. Apply functional programming concepts to run statistical analyses that fit within the conceptual framework of automated pipelines and that can be used regardless of what statistical method is used.

And we will not learn:

Because learning and coding is ultimately not just a solo activity, during this course we also aim to provide opportunities to chat with fellow participants, learn about their work and how they do analyses, and to build networks of support and collaboration.

The specific software and technologies we will cover in this course are R, RStudio, Git, GitHub, and Quarto, while the specific R packages are styler, targets, and several of the tidymodels packages.

One goal of the course is to teach about open science, and true to our mission, we practice what we preach. The course material is publicly accessible (all on this website) and openly licensed so you can use and re-use it for free!

1.1 Is this course for you?

This course is designed in a specific way and is ideal for you if:

  • You are a researcher, preferably working in the biomedical field (ranging from experimental to epidemiological). Specifically, this course targets those working on topics in diabetes and metabolism.
  • You currently do quantitative data analysis.
  • You preferably:

Considering that this is a natural extension of the introductory and intermediate r-cubed courses, this course builds on the knowledge and skills learned during those courses, including Git, RStudio R Projects, functions, functional programming, and Quarto / R Markdown. If you do not have familiarity with these tools, you will need to go over the material from the introductory and intermediate courses beforehand (more details about pre-course tasks will be sent out a couple of weeks before the course).

While having these assumptions help to focus the content of the course, if you have an interest in learning R but don’t fit any of the above assumptions, you are still welcome to attend the course! We welcome everyone, that is until the course capacity is reached.