1 Syllabus
Reproducibility and open scientific practices are increasingly demanded of and needed by scientists and researchers in modern research environments. Our work often requires or involves a high level of hands-on collaboration on scientific projects throughout all stages of the research lifecycle. We face obstacles that we have no training for and little knowledge of how to address. Some obstacles are as simple as not having a shared way of writing code or managing files, which can impede effective collaboration. Others are complex, like documenting software dependencies or the steps in an analysis pipeline in a way that makes it easy to resolve issues and get the most recent set of results after collaborators have worked on a project.
Aside from the impact on collaboration, these obstacles can affect even research projects with just one primary researcher. Compounding this issue is that researchers’ main incentivised output, publications, doesn’t easily accommodate other forms of output that could help address these obstacles. For example, a research group’s workflow or procedures are valuable information that other groups could use and learn from, but “publishing” this type of information isn’t rewarded, so it is rarely done. Likewise, the code used in analyses is rarely shared, even though sharing it would improve the reproducibility of the research results and allow others to learn from it. Ultimately, all of this leads to less reliable and less reproducible scientific results. With this workshop, we aim to begin addressing this gap.
This workshop lasts 3 days and is split into the following sessions, listed in the schedule, which will be covered in order:
- Introduction to the workshop
- Dependency management for smoother collaboration
- Setting up automatic analysis pipelines
- Using the pipeline to build up research output
- Design your analysis: Build up one model first
- Extend your analysis to run many models
- Visualising the results of many models
1.1 Learning outcome and objectives
The overall aim of this workshop is to enable you to:
- Describe what an open, collaboration-friendly, and nearly-automated reproducible data analysis pipeline and workflow looks like.
- Design your code and analysis using simple principles and concepts that allow you to write more flexible and robust code that does more with less and that is friendlier to both you and your collaborators.
- Create an R project that follows these practices.
Broken down into specific objectives, we’ve designed the workshop to enable you to do the following in each session:
Dependency management for smoother collaboration
- Identify potential actions to take that streamline collaboration on a data analysis project.
- Explain what a “project-oriented” workflow is, what project-level R dependency management is, and why these concepts are important for collaborative and reproducible analyses.
- Describe the difference between workflow dependencies and build dependencies, and apply functions in the usethis R package to implement these dependency management concepts.
- Explain how following a style guide helps build a common approach to reading and writing R code, and thereby improves project-level collaboration.
- Use styler and RStudio’s canonical markdown mode to programmatically check and apply style guides to your project files.
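For a rough sense of what these dependency management and styling steps might look like in practice, here is a minimal sketch. It assumes the common convention of recording build dependencies under Imports and workflow dependencies under Suggests in the DESCRIPTION file; the package names are only illustrative examples, not the workshop's required list.

```r
# Build dependency: a package the analysis code itself needs to run
# (recorded under Imports in the DESCRIPTION file).
usethis::use_package("dplyr")

# Workflow dependency: a package used while developing the project,
# not by the analysis code (recorded under Suggests here).
usethis::use_package("styler", type = "Suggests")

# Programmatically apply the style guide to all R files in the project.
styler::style_dir()
```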
Setting up automatic analysis pipelines
- Describe the computational meaning of pipeline and how pipelines are often used in research.
- Explain why a well-designed pipeline can streamline collaboration, reduce time spent on an analysis, make the analysis steps explicit and easier to work with, and ultimately contribute to more fully reproducible research.
- Explain the difference between a “function-oriented” workflow and a “script-oriented” workflow, and why the function-based approach has multiple advantages from a time- and effort-efficiency point of view.
- Set up an analysis pipeline using targets that clearly defines each step of your analysis, from raw data to finished manuscript, and that makes updating your analysis by you or your collaborators as simple as running a single function.
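As a rough idea of what such a pipeline might look like, below is a minimal sketch of a `_targets.R` file. The file paths and the `clean_data()` and `fit_model()` functions are hypothetical placeholders, not the workshop's actual code.

```r
library(targets)
library(tarchetypes) # provides tar_quarto()

# Load the functions the pipeline steps call (hypothetical location).
tar_source("R/functions.R")

list(
  # Track the raw data file so changes to it trigger a rebuild.
  tar_target(raw_data_file, "data/raw-data.csv", format = "file"),
  tar_target(cleaned_data, clean_data(raw_data_file)),
  tar_target(model_results, fit_model(cleaned_data)),
  # Render the manuscript that uses the results above.
  tar_quarto(manuscript, "doc/manuscript.qmd")
)
```

With a file like this, running `targets::tar_make()` rebuilds only the steps whose code or upstream data have changed.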
Using the pipeline to build up research output
- Build an analysis pipeline using targets that clearly defines each step of your analysis, from raw data to finished manuscript, and that makes updating your analysis by you or your collaborators as simple as running a single function.
- Apply a design-driven approach to building the pipeline step by step, to help manage complexity and help you focus on testing and building the pipeline at each stage.
- Use the principle of “start at the end”, working backwards from your desired final outputs to the raw data, to help design, plan, and build your pipeline effectively.
Design your analysis: Build up one model first
- Design a rough outline of an analysis workflow where data flows through multiple functions to produce a desired output.
- Apply functional programming concepts to run statistical analyses that fit within the targets pipeline framework, regardless of what statistical method is used.
- Decompose a complex statistical analysis into smaller functions, where each function achieves a part of the larger output so that the function can later be used flexibly and scalably with any number of inputs.
- Use the broom package to extract the model results in a tidy form.
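To illustrate these ideas, the sketch below fits a single model and tidies its results. It uses the built-in mtcars data and an arbitrary linear model as stand-ins, not the workshop's actual data or analysis.

```r
library(broom)

# A small function that fits one model to a data frame, so it can
# later be reused on any subset of the data.
fit_model <- function(data) {
  lm(mpg ~ wt + hp, data = data)
}

# Build up one model first, then extract its results in tidy form.
model <- fit_model(mtcars)
tidy(model, conf.int = TRUE)
```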
Extend your analysis to run many models
- Recall principles of functional programming.
- Design analyses to make use of the principles of functional programming and the split-apply-combine technique.
- Apply these principles to write concise code that does many things, in particular by using the purrr package.
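For instance, a split-apply-combine step with purrr could look roughly like the sketch below, which reuses the hypothetical `fit_model()` function and mtcars stand-in data from the previous sketch: split the data by cylinder count, fit the same model to each subset, and combine the tidy results.

```r
library(purrr)
library(broom)

model_results <- mtcars |>
  # Split: one data frame per number of cylinders.
  split(mtcars$cyl) |>
  # Apply: fit the same model to each subset.
  map(fit_model) |>
  map(tidy, conf.int = TRUE) |>
  # Combine: stack the tidy results, keeping the group as a column.
  list_rbind(names_to = "cyl")

model_results
```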
Visualising the results of many models
- Use an effective way to visualise the results from multiple statistical models fitted to different subsets of data using the ggplot2 R package.
- Identify what a fully reproducible analysis pipeline looks like, from start to finish, and how this workflow might make a data analysis project easier for you and your collaborators to work on together.
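One possible way to plot such results, continuing from the hypothetical `model_results` data frame in the earlier sketch, is shown below; the exact plot used in the workshop may differ.

```r
library(ggplot2)

model_results |>
  # Drop the intercept so the coefficient estimates are comparable.
  dplyr::filter(term != "(Intercept)") |>
  ggplot(aes(x = estimate, y = term)) +
  geom_pointrange(aes(xmin = conf.low, xmax = conf.high)) +
  # One panel per subset of the data (cylinder count here).
  facet_wrap(vars(cyl))
```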
And we will not learn:
- Any details about specific statistical methods or models (these are already covered by most university curricula). We cover how to run statistical methods, but not which methods to use for your data or project.
Tangibly, during the workshop you will:
- Style your code using styler to improve readability and consistency.
- Track package dependencies in the DESCRIPTION file as either workflow or build dependencies.
- Build automated pipelines with targets.
- Structure your analysis as a set of functions that piece together into a cohesive series of steps.
- Run multiple statistical analyses efficiently using purrr and functional programming.
- Manage and collaborate on your projects using Git and GitHub.
- Write reproducible reports using Quarto.
Because learning and coding are ultimately not just solo activities, during this workshop we also aim to provide opportunities to chat with fellow participants, learn about their work and how they do their analyses, and build networks of support and collaboration.
The specific software and technologies we will cover in this workshop are R, RStudio, Quarto, Git, and GitHub, while the specific R packages are styler, targets, and, for more advanced uses, purrr combined with dplyr.