Advanced Reproducible Research in R

An advanced workshop on creating collaborative and automated analysis pipelines

Authors

Luke W. Johnston

Anders Askeland

Welcome!

Three people working together to brainstorm, design, and develop a project.

Reproducibility and open scientific practices are increasingly in demand and needed by scientists and researchers in our modern research environments. More frequently, our work involves a high level of collaboration on scientific projects, and consequently many new challenges arise that we as researchers have neither the training nor the knowledge to resolve. Challenges include:

  • Establishing common coding styles and standards to make it easier to read or review each other's code;

  • Documenting the software dependencies of a project in order to synchronize computing environments among the collaborators and potentially servers;

  • Documenting (and automating) the steps taken to process, analyze, and present the data and findings in a way that allows collaborators to re-generate the most recent results.

Training in, and awareness of, the skills and knowledge necessary to create reproducible and transparent data analysis pipelines are still very much lacking for researchers. Partly due to this gap, how exactly an analysis is done to produce a given result is poorly, if at all, described in scientific studies. This could have a major impact on the reproducibility and ultimately the reliability of studies.

This course is designed to address these issues through code-along sessions (instructor and learner coding together), reading activities, some discussion activities, and exercises using a real-world dataset. This website contains all of the material for the course, from reading material to exercises to images. It is structured as a book, with “chapters” as lessons, given in order of appearance. We make heavy use of the website throughout the course: the code-along sessions follow the material on the website nearly exactly (with slight modifications for time or more detailed explanations).

The course material was created using Quarto to write the lessons and create the book format, GitHub to host the Git repository of the material, and GitHub Actions with Netlify to create and host the website. The original source material for this course is found on the r-cubed-advanced GitHub repository.

Want to contribute to this course? Check out the README file as well as the CONTRIBUTING file on the GitHub repository for more details. The main way to contribute is by using GitHub and creating a new Issue to make comments and give feedback for the material.

Re-use and licensing

The course is licensed under the Creative Commons Attribution 4.0 International License, so the material can be used, re-used, and modified, as long as there is attribution to this source.

Acknowledgements

The cover illustration is by Storyset.

The Danish Diabetes and Endocrinology Academy hosts, organizes, and sponsors this course. A huge thanks to them for their involvement and support! Steno Diabetes Center Aarhus and Aarhus University employ Luke Johnston, who is the lead instructor and curriculum developer.
