Advanced Reproducible Research in R

An advanced workshop on creating collaborative and automated analysis pipelines

Authors

Luke W. Johnston

Anders Askeland

Welcome!


Scientists and researchers working in modern research environments are increasingly expected to make their work reproducible and to follow open scientific practices. At the same time, our work as researchers more and more often involves extensive collaboration on scientific projects. This brings new challenges that we frequently lack the training or knowledge to resolve.

These challenges include:

Establishing common coding styles and standards to make it easier to read or review each other’s code;
Documenting the software dependencies of a project to synchronize computing environments among collaborators and, potentially, with servers; and
Documenting (and automating) the steps taken to process, analyze, and present data and findings in a way that allows collaborators to regenerate the most recent results.

Researchers still receive far too little training in, and have limited awareness of, the skills and knowledge needed to create reproducible and transparent data analysis pipelines. Partly because of this gap, scientific studies often describe poorly, if at all, exactly how an analysis (including data processing and wrangling) was done to produce a given result. This can seriously undermine the reproducibility and, ultimately, the reliability of those studies.

This course is designed to address these issues through code-along sessions (instructor and learners coding together), reading activities, discussion activities, and exercises using a real-world dataset. This website contains all the course materials, from reading content to exercises. It is structured like a book, with "chapters" serving as lessons presented in order of appearance. We use the website extensively throughout the course, and the code-along sessions closely follow the website material (with slight modifications to fit time constraints or to give more detailed explanations).

The course material was created using Quarto to write the lessons and create the book format, GitHub to host the Git repository of the material, and GitHub Actions with Netlify to create and host the website. The original source material for this course is found on the r-cubed-advanced GitHub repository.

Want to contribute to this course? Check out the README and CONTRIBUTING files on the GitHub repository for more details. The main way to contribute is to open a new Issue on GitHub to comment on or give feedback about the material.

Re-use and licensing

The course is licensed under the Creative Commons Attribution 4.0 International License, so the material can be used, re-used, and modified, provided it is attributed to this source.

Acknowledgements

Illustration cover is by Storyset.

The Danish Diabetes and Endocrinology Academy hosts, organizes, and sponsors this course. A huge thanks to them for their involvement and support! Steno Diabetes Center Aarhus and Aarhus University employ Luke Johnston, who is the lead instructor and curriculum developer.