4  Introduction to course

Introduction slides

The slides contain speaking notes that you can view by pressing ‘S’ on the keyboard.

4.1 The Big Picture

Reading task: ~10 minutes

This section provides a bigger picture view of what we will be doing, why we want to do it, how we will be going about doing it, and what it will look like in the end.

Our big picture aim is:

We want to create a data analysis project that:

  1. Makes it easier for direct contributions from collaborators and others,
  2. Allows us to write processing and analysis steps (as code) in a more explicit and ultimately reproducible way,
  3. Incorporates general-purpose tools that simplify using (or switching) statistical analysis methods, and
  4. Enables quick dissemination of results by creating a project-specific website.

All of which will be exemplified through a simple analysis of a lipidomics study.

What will it look like in the end, in a more “tangible” way? The most tangible things are the folders and files on our computers. The folder and file structures below show where we start and where we end, so you can hopefully get a better understanding of how things look. Right now, everyone’s initial project structure should look like:

LearnR3
├── data/
│   ├── lipidomics.csv
│   └── README.md
├── data-raw/
│   ├── README.md
│   ├── nmr-omics/
│   │  ├── lipidomics.xlsx
│   │  └── README.txt
│   └── nmr-omics.R
├── doc/
│   ├── README.md
│   ├── learning.qmd
│   └── report.Rmd
├── R/
│   ├── functions.R
│   └── README.md
├── .gitignore
├── DESCRIPTION
├── LearnR3.Rproj
├── README.md
└── TODO.md

Why do we structure it this way?

  • To follow “project-oriented” workflows (covered in Chapter 5).
  • To follow some standard conventions in R, like having a DESCRIPTION file (which is important for Chapter 5).
  • To keep types of files separate, such as keeping raw data raw and in the data-raw/ folder, R scripts/functions in the R/ folder, and documents like R Markdown / Quarto files in doc/.

This also supports our workflow and processes, which will be something like:

  • Install packages in a project-specific environment to track package dependencies.
  • Follow “function-oriented” workflows, where we use R Markdown / Quarto (doc/learning.qmd) to write and test out code, convert it into a function, test it, and then move it into R/functions.R.
    • Developing functions in the R Markdown document makes it a bit easier to quickly test code out and make sure it works before moving the code over into a more formal location and structure. Think of using the R Markdown file as a sandbox to test out and play with code, without fear of messing things up.
    • We also test code out in the R Markdown document from a teaching and learning perspective because it’s easier to show and weave in text and comments with code as we do the code-alongs. It also forces you to practice working within R Markdown documents, which are key components of a reproducible workflow in R.
    • We keep the functions in a separate file because we will frequently source() from it as we prototype and test out code in the R Markdown file. It also creates a clear separation between “finalized” code and prototyping code.
  • Use a combination of restarting R (with the Palette, by typing “restart”, or with Session -> Restart R) and using source() (with the Palette, by typing “source”, while in R/functions.R) to run the functions inside of R/functions.R.
    • We restart R because it is the only certain way to ensure that the R workspace is completely clear. For reproducibility, we should always aim to work from a “clean plate”.
  • Keep code readable by frequently (and automatically) fixing the formatting/styling of our code.
  • For each “output” in a paper (like a figure or table), write one or more functions to complete the output and include those functions as “targets” or steps in a pipeline. Use this explicit pipeline to track which steps need to be re-run and in which order for your data analysis.
  • Write accompanying text (or full paper) for the analysis in Markdown so we can easily and quickly build websites (or Word docs) of our work, for rapid dissemination.
  • Automatically reformat Markdown text into a standard, more readable format.
  • Whenever we complete a task, add and commit those file changes to save them into the Git history, using the Palette (type “commit”).
    • We use Git because it is the best way of keeping track of what was done to your files, when, and why. It keeps your work transparent and makes it easier to share your code by uploading to GitHub. Using version control should be a standard practice for doing better science, since it fits with the philosophy of doing science.
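
As a minimal sketch of the function-oriented workflow described above, the example below prototypes a small calculation, converts it into a function, saves it to a file, and loads it with source(). The function name scale_to_unit() and the example values are hypothetical and only for illustration. A temporary file stands in for R/functions.R so the sketch can run anywhere without touching the project.

```r
# Step 1: prototype the code interactively, as we would in doc/learning.qmd.
values <- c(2, 4, 6)
(values - min(values)) / (max(values) - min(values))

# Step 2: convert the working code into a function.
# `scale_to_unit()` is a hypothetical name used only for this sketch.
function_code <- '
scale_to_unit <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
'

# Step 3: move the function into a separate file.
# A temporary file stands in for R/functions.R here.
functions_file <- tempfile(fileext = ".R")
writeLines(function_code, functions_file)

# Step 4: load the function with source() and test it on the same values.
source(functions_file)
scaled <- scale_to_unit(values)
scaled
```

In the real project, step 3 would mean cutting the function out of doc/learning.qmd and pasting it into R/functions.R, then restarting R before running source() so that the test starts from a clean workspace.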

Many of these “project setup” tasks can take a while to set up and can often be very difficult and confusing, all before you’ve even gotten to the analysis phase of your work. A good analogy for this first step is how skyscrapers are built. Watching construction on these projects makes it feel like it takes forever for them to finally start going up and adding floors. But once they start adding floors, they go up so fast! That’s because a lot of the main work is in building up the foundation of the building, so that it is strong and stable. It is the same with analysis projects: the first phase feels like nothing is “moving”, but you are building the foundation for everything that comes after.

Tip

If you finish exercises faster than others in the course, try your hand at working through the Extra Exercises section. It is still a work-in-progress, so it is a bit small right now.

4.2 Small fixes

Before we start, let’s fix up a few things:

  1. Delete the doc/report.Rmd file, since we will be working with the Quarto doc/learning.qmd file instead.
  2. Run usethis::use_git_ignore(".Rbuildignore") to stop tracking the .Rbuildignore file. This file is related to building packages, which we aren’t doing, and is sometimes created when using the functions in {usethis}.
  3. Open up the DESCRIPTION file and add a line with Title: "Learning advanced R!" as well as Version: 0.0.1. Both of these fields are often needed by {usethis}. The {prodigenr} package should add them automatically, but doesn’t (I need to fix it).
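
If you prefer to make these fixes from the R Console, the steps above could be sketched roughly as follows. This is only an illustration under some assumptions: it builds a throwaway copy of the two relevant files in a temporary folder (so nothing in your real project changes), the "Type: Project" line is a placeholder for whatever your DESCRIPTION file already contains, and fix 2 is left as a comment because {usethis} needs to be run from inside the actual project.

```r
# Build a throwaway stand-in for the project in a temporary folder.
project <- file.path(tempdir(), "LearnR3-sketch")
dir.create(file.path(project, "doc"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(project, "doc", "report.Rmd"))
description_path <- file.path(project, "DESCRIPTION")
writeLines("Type: Project", description_path) # Placeholder content.

# Fix 1: delete the doc/report.Rmd file.
file.remove(file.path(project, "doc", "report.Rmd"))

# Fix 2: run this from the Console while inside the real project.
# usethis::use_git_ignore(".Rbuildignore")

# Fix 3: add the Title and Version fields as new lines in DESCRIPTION.
description_lines <- readLines(description_path)
description_lines <- c(
  description_lines,
  'Title: "Learning advanced R!"',
  "Version: 0.0.1"
)
writeLines(description_lines, description_path)

# Check that the fields can be read back from the file.
fields <- read.dcf(description_path, fields = c("Title", "Version"))
fields
```

In the real project, you would use paths relative to the project root (like "doc/report.Rmd" and "DESCRIPTION") instead of the temporary folder.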