14  Introduction to workshop

Introduction slides

The slides contain speaker notes that you can view by pressing ‘S’ on the keyboard.

14.1 📖 Reading task: The big picture

Time: ~10 minutes.

This section provides a bigger-picture view of what we will be doing, why we want to do it, how we will go about doing it, and what it will look like in the end.

Our big-picture aim is to create a data analysis project that:

  1. Makes it easier for collaborators and others to contribute directly
  2. Explicitly includes the processing and analysis steps (as code), so they are reproducible
  3. Incorporates general-purpose tools that make it easier to use, or switch between, statistical analysis methods

All of this will be exemplified through a simple analysis of a lipidomics dataset during the workshop.

Where will we start and where will we end, in a more “tangible” way? The most tangible things are the folders and files on our computers. The folder and file structures below show where we start and where we end, so you can hopefully get a better understanding of how things will look.

Initial project structure

Right now, your initial project structure should look like this:

LearnR3
├── data/
│   ├── lipidomics.csv
│   └── README.md
├── data-raw/
│   ├── README.md
│   ├── nmr-omics/
│   │   ├── lipidomics.xlsx
│   │   └── README.txt
│   └── nmr-omics.R
├── doc/
│   ├── README.md
│   ├── learning.qmd
│   └── report.Rmd
├── R/
│   ├── functions.R
│   └── README.md
├── .gitignore
├── DESCRIPTION
├── LearnR3.Rproj
├── README.md
└── TODO.md

Final project structure

At the end of this workshop, it should look something like:

LearnR3
├── _targets/
│   ├── meta/
│   │   └── meta
│   ├── objects/
│   ├── user/
│   └── workspaces/
├── data/
│   ├── lipidomics.csv
│   └── README.md
├── data-raw/
│   ├── README.md
│   ├── nmr-omics/
│   │   ├── lipidomics.xlsx
│   │   └── README.txt
│   └── nmr-omics.R
├── doc/
│   ├── _targets.yaml
│   ├── README.md
│   ├── learning.html
│   └── learning.qmd
├── R/
│   ├── functions.R
│   └── README.md
├── .gitignore
├── _targets.R
├── DESCRIPTION
├── LearnR3.Rproj
└── README.md

Why do we structure it this way?

  • To follow “project-oriented” workflows (covered in ).
  • To follow some standard conventions in R, like having a DESCRIPTION file (which is important for ).
  • To keep types of files separate, like raw data in the data-raw/ folder, R scripts/functions in the R/ folder, and documents like R Markdown / Quarto files in the doc/ folder.

This structure also supports our workflow and processes throughout the workshop, which will be to:

  • Track package dependencies in the DESCRIPTION file.
  • Follow a “function-oriented” workflow, where we use R Markdown / Quarto (doc/learning.qmd) to write and test out code, convert it into a function, test it, and then move it into R/functions.R.
    • We develop functions in the learning.qmd file to make it a bit easier to quickly test the code and make sure it works before moving it over into a more formal location and structure. Think of this file as a sandbox to test out and play with code, without fear of messing things up.
    • We also test the code in learning.qmd because, from a teaching and learning perspective, it’s easier to integrate text and comments with the code during the code-alongs in Markdown files.
    • We keep functions in a separate functions.R file because we will frequently source() from it as we prototype and test out code in the learning.qmd file. This also creates a clear separation between “finalized” code and prototype code.
  • Use a combination of restarting R (with Ctrl-Shift-F10, with the Palette by pressing Ctrl-Shift-P and typing “restart”, or with Session -> Restart R) and source() (with Ctrl-Shift-S while in R/functions.R, or with the Palette by pressing Ctrl-Shift-P and typing “source”) to load the functions in R/functions.R into a fresh R session.
    • We restart R to ensure that the R workspace is completely clear. For reproducibility, we should always aim to work from a “clean slate”.
  • Keep code readable by having the formatting/styling of our code fixed automatically.
  • For each “output” (like a figure or a table) in a paper, write one or more functions to generate it and include each function as a step, or “target”, in a pipeline. Use the pipeline to track and order the steps in the data analysis.
  • Write accompanying text (which outside this workshop could be a full paper) for the analysis in Markdown so we can easily and quickly regenerate reports for rapid dissemination.
  • Automatically reformat Markdown text into a standard, more readable format.
  • Whenever we complete a task, add and commit the file changes to save them in the Git history, using Ctrl-Alt-M or the Palette (Ctrl-Shift-P, then type “commit”).
    • We use Git to keep track of what changes were made to the files, when, and why. This keeps our work transparent and makes it easier to share the code by uploading it to GitHub. Version control aligns with the philosophy of reproducible science and should be standard practice (it usually isn’t, which is why we practice it here).

Many of these “project setup” tasks can be time-consuming, difficult, and confusing, and this is before you’ve even gotten to the analysis phase of your work.

A good analogy for these first steps is the construction of a skyscraper. Watching it being built, it feels like it takes forever before the building finally starts going up and adding floors. But once the floors start going up, they go up fast! That’s because much of the main work goes into building the foundation, so that the building is strong and stable. The same is true for analysis projects: the first phase feels like nothing is “moving”, but you are building the foundation for everything that comes after.

Over the many times we’ve taught this and other workshops, we’ve been asked a lot of questions. We keep track of some of them on a Frequently Asked Questions page. Check out this page; maybe your question has already been answered!

Important

During the workshop, we will be writing and coding mostly in the doc/learning.qmd file. We will also regularly delete the content of this file to keep things clean and easier for you and, importantly, for the instructors. If you encounter an error or problem while a lot of code remains in the file, the cause is often the leftover code rather than the code you are currently writing. Less code also means less for the helpers and instructors to look through and debug when they come to help you.

We know you may want to keep notes as you work through the workshop, so we suggest you create a new file called notes.qmd (or something similar) in the doc/ folder and either:

  • Write notes in the doc/learning.qmd file and then copy them over to your notes.qmd file when we tell you to delete everything, or
  • Write notes directly in your doc/notes.qmd file and keep it open while you work in the doc/learning.qmd file.

Sticky/hat up!

When you’re ready to continue, place the sticky/paper hat on your computer to indicate this to the teacher 👒 🎩

14.2 💬 Discussion activity: How exactly do you collaborate or contribute to your own or others’ projects?

Time: ~10 minutes.

Think about a project you work on (for your thesis or a manuscript). How exactly do you and your collaborators contribute to it?

  • Is it mostly verbal contributions?
  • Do you keep the files in a shared folder?
  • How do you keep track of who’s changed what?
  • Do you mostly work on your own with contributions being mostly verbal or written feedback (like in a meeting or through an email)?
  • If you collaborate directly on a project, how do you coordinate things? Does one collaborator work on one section or analysis, so your files are separate?
  • Do you ever have to go in and contribute your own code to theirs (and vice versa)?

Consider these questions as we do the following steps.

  1. Take about 1 minute to reflect on these questions.
  2. For 6 minutes, discuss these questions with your neighbour, and talk about your own experiences.
  3. For the remaining time, we will briefly share with everyone.