Introduction to Quarto

DSA 220 - Introduction to Data Science and Analytics

Author

Andrew DiLernia

Learning Objectives

  • Render Quarto to different formats

  • Understand fundamental Markdown syntax

  • Use code chunks in Quarto

  • Customize code chunk options

  • Include plots & tables in Quarto

Quarto

Quarto is a modern, open-source data science software tool for creating dynamic documents, reports, websites, and presentations. The power of Quarto for data science is in its integration of written text and executable code in a single source document for reproducible and streamlined analyses. Quarto is also compatible with multiple languages, including R and Python, facilitating its use in data science more broadly.

An RStudio Project organizes all the files associated with a particular analysis, such as R scripts, data, and outputs, into a single folder. Projects facilitate an organized, portable, and reproducible workflow.

Let’s open RStudio and create a new project.

Tip

Click File > New Project… > New Directory > New Project. In the Directory name: field, type DSA-220. Then click the Create Project button.

Next, we will open our first Quarto file.

Tip

Open the template Quarto document from Blackboard.

Tip

Render it by clicking the Render button (top left).

  • Also render the document by using the appropriate keyboard shortcut (Mac: Command + Shift + K, Windows: Ctrl + Shift + K).

Tip

Create new documents for two of the most-common formats: HTML and Word. Render each of the documents.

  • For the two new files, does the output differ?

  • Looking at the YAML at the top of each document, how does the input differ?

Note: It is possible to render to PDF, but we will not use the PDF format in this course. If rendering to PDF causes an error, you may need to install LaTeX. To do so, first install the tinytex package by submitting install.packages("tinytex") to the Console (type the command and press return / enter). After the package has been installed successfully, submit tinytex::install_tinytex() to the Console.

For the rest of the activity, we will focus on rendering to an HTML format.

Markdown

Markdown is a lightweight set of conventions for formatting plain text files. It is designed to be easy to read and write: it offers limited customization, but it can be learned quickly.

Headings

Headings and subheadings can be included in a file to organize the document.

  • # 1st Level Header

  • ## 2nd Level Header

  • ### 3rd Level Header

Tip

Include a first-level header, second-level header, and third-level header in your document naming them Section 1, Subsection 1.1, and Subsubsection 1.1.1, respectively.

Text Formatting

Basic text formatting, such as italics, bold, superscripts, and subscripts, can be applied as well.

Raw text input      Rendered output
*italic*            italic
**bold**            bold
`code`              code
superscript^2^      superscript²
subscript~2~        subscript₂

You try

Tip
  1. Include an italicized word or phrase.

  2. Include a bold word or phrase.

  3. Include a word styled showing it is code.

  4. Include a superscript and a subscript.

Code Chunks

The real power of Quarto comes from the ability to add code chunks to the file, which allows us to run R code inside our document:

# Sophisticated calculation
sqrt(49)
[1] 7

There are several ways to insert a code chunk:

  1. The keyboard shortcut (Mac: Cmd + Option + I, Windows: Ctrl + Alt + I), which is the recommended approach.

  2. The “Insert” button icon in the editor toolbar (top right).

  3. By manually typing the chunk delimiters ```{r} and ```.

Tip

Add a new code chunk at the bottom of the Quarto document, naming the chunk uptown-chunk by including #| label: uptown-chunk as the first line inside the code chunk.

Tip

In the code chunk, use the include_graphics() function from the knitr package to include the image at the following URL: https://github.com/dilernia/DSA220/blob/main/images/uptownFunk2.png?raw=true. Note that the URL should be enclosed in quotes when calling the include_graphics() function.
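As a rough sketch of what the body of uptown-chunk might look like after the two steps above, the lines below would sit between the chunk delimiters. The variable name uptown_url is only an illustrative choice and does not come from the activity; the URL is the one given above.

```r
#| label: uptown-chunk

# Load knitr so include_graphics() is available
library(knitr)

# URL from the activity, passed as a quoted string
# (uptown_url is a hypothetical variable name)
uptown_url <- "https://github.com/dilernia/DSA220/blob/main/images/uptownFunk2.png?raw=true"

# Returns an object that knitr displays as an image when the document is rendered
include_graphics(uptown_url)
```

Storing the URL in a variable first is optional; passing the quoted URL directly to include_graphics() should behave the same way.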

Code chunks have multiple options which control how they behave. Some more commonly used code chunk options are below:

Table 1: Commonly used code chunk options
Option Description
eval Determines if the code should be evaluated (true / false)
echo Controls whether the code is shown in the output (true / false)
include Specifies if the code and its results should be included in the output (true / false)
warning Indicates if warnings should be shown (true / false)
message Indicates if messages should be shown (true / false)
error Include errors in the output. When set to true, runtime errors will not halt processing of the document. (true / false)
out-width Width of the plot in the output document
out-height Height of the plot in the output document
fig-align Figure horizontal alignment (default, left, right, or center)
Tip

Modify uptown-chunk by setting the echo chunk option to be false by including #| echo: false immediately below the line with #| label: uptown-chunk.

Specify and toggle the eval chunk option for the uptown-chunk code chunk and see what happens. Note that some chunk options for controlling figure dimensions, such as fig-width and fig-height, apply only to graphics produced by code in a chunk, not to externally included images.
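To see figure options act on a graphic produced inside a chunk, one possible chunk body is sketched below. The label squares-plot and the toy data are assumptions made for illustration, and fig-width and fig-height (measured in inches) are Quarto options that are not listed in Table 1.

```r
#| label: squares-plot
#| echo: false
#| fig-width: 6
#| fig-height: 4
#| fig-align: center

# Illustrative data (not from the activity)
x <- 1:10
squares <- x^2

# The plot is produced by code in this chunk, so the figure options above apply to it
plot(x, squares, xlab = "x", ylab = "x squared")
```

Because echo is set to false, only the plot should appear in the rendered document, not the code that created it.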

To explore more code chunk options and details, see https://quarto.org/docs/reference/cells/cells-knitr.html.

We can specify default code chunk options for the entire document by customizing the YAML (at the top of the Quarto document) as below.

---
title: "Document title"
author: "Author name"
date: today
format:
  html:
    embed-resources: true
toc: true
execute:
  eval: true
  echo: true
  warning: false
  message: false
---
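Options set under execute act as document-wide defaults, and an individual chunk can still override them with its own #| lines. The chunk body below is a small sketch of that behavior; the label override-demo and the example expression are illustrative assumptions.

```r
#| label: override-demo
#| warning: true

# Converting non-numeric text gives NA along with a coercion warning.
# The chunk-level option above lets the warning appear in the output
# even when the document default is warning: false.
converted <- as.numeric("abc")
converted
```

Removing the #| warning: true line should cause this chunk to fall back to the document default and hide the warning.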

Including Plots

Using R code inside chunks that produce plots allows graphics to be directly included in the output document.

For example, let’s explore a data set on storms ⛈️ between 1975 and 2021 from the National Oceanic and Atmospheric Administration (NOAA). In particular, let’s visualize the relationship between the pressure (in millibars) of a storm and its maximum sustained wind speed (in knots).

library(tidyverse)

# Load storms data set
data(storms)

storms |> ggplot(aes(x = pressure, y = wind)) + 
  geom_point()

Note: If this code has an error such as Error in library(tidyverse) : there is no package called ‘tidyverse’, you may need to submit install.packages('tidyverse') to the Console first to install the package. If you do not encounter an error, then it is best not to reinstall the package.

In general, you may need to install an R package by submitting install.packages("package_name") to the Console first, replacing "package_name" with the name of the R package to install in quotes. Typically, you only install R packages as you need them.
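One way to avoid reinstalling a package that is already present is to check for it first. The helper below, install_if_missing(), is a hypothetical convenience function written for this illustration; it relies only on base R's requireNamespace() and install.packages().

```r
# Hypothetical helper: install a package only when it is not already installed
install_if_missing <- function(package_name) {
  # requireNamespace() returns TRUE if the package can be loaded, FALSE otherwise
  already_installed <- requireNamespace(package_name, quietly = TRUE)

  if (!already_installed) {
    install.packages(package_name)
  }

  # Report (invisibly) whether the package was already present
  invisible(already_installed)
}
```

For example, submitting install_if_missing("tidyverse") to the Console would install tidyverse only if it is missing. As with install.packages(), this is best run in the Console rather than inside the Quarto document.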

Tip

Include the code to create a scatter plot above in a new chunk at the bottom of the Quarto document.

Inline Code

Quarto also allows you to use R code outside of code chunks for dynamic text:

xbar <- 2
se <- 1.3

The following Markdown syntax:

The 95% confidence interval for the mean is (`r xbar - 1.96*se`, `r xbar + 1.96*se`).

produces:

The 95% confidence interval for the mean is (-0.548, 4.548).
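When the values are less tidy than these, it can help to compute and round the bounds in a chunk first and then refer to the stored results in the inline code. The sketch below assumes two hypothetical object names, lower and upper, and a choice of two decimal places.

```r
xbar <- 2
se <- 1.3

# Store rounded bounds so the inline code stays short
# (lower and upper are illustrative names)
lower <- round(xbar - 1.96 * se, 2)
upper <- round(xbar + 1.96 * se, 2)

# Preview the sentence in the Console; in the document itself,
# inline code referring to lower and upper would be used instead
sprintf("The 95%% confidence interval for the mean is (%.2f, %.2f).", lower, upper)
```

With these inputs the rounded bounds are -0.55 and 4.55, and the inline code in the sentence reduces to references to lower and upper.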

Tip

At the bottom of the document, include a chunk of R code followed by a sentence with inline R code that reproduces the confidence interval sentence above.

Tables

Quarto allows you to display nicely formatted tables. Let’s explore a few of the most common R functions for nicely displaying tables in Quarto. To do so, let’s continue with the storms data set looking at the four largest sustained wind speeds recorded for different storms 💨.

# Largest sustained wind speeds recorded
storms_summary <- storms |> 
  group_by(name, year) |> 
  summarize(max_wind_knots = max(wind)) |> 
  ungroup() |> 
  slice_max(max_wind_knots, n = 4)

flextable() from the flextable package

library(flextable)

storms_summary |> 
  flextable() |> 
  colformat_double(big.mark = "", digits = 0) |> 
  autofit()
Table 2: Storm summary statistics with flextable()
name year max_wind_knots
Allen 1980 165
Dorian 2019 160
Gilbert 1988 160
Wilma 2005 160

The flextable package displays tables for all output formats in a relatively consistent manner, with customization options that work for all output formats.

There are multiple ways to customize table captions. A versatile approach is to customize the label and tbl-cap code chunk options as below.

#| label: 'tbl-a_table_name'
#| tbl-cap: 'A custom table caption'

Note that the label must begin with the tbl- prefix for the table caption to be included properly. For more details regarding customization of included tables, see https://quarto.org/docs/authoring/tables.html.
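Putting the two options together, a complete chunk body might look like the sketch below. The label tbl-max-wind, the caption text, and the object name max_wind_table are illustrative assumptions; the values are copied from Table 2 so that the example stands alone.

```r
#| label: tbl-max-wind
#| tbl-cap: "Largest sustained wind speeds recorded"

library(flextable)

# Values copied from Table 2 (max_wind_table is a hypothetical name)
max_wind_table <- data.frame(
  name = c("Allen", "Dorian", "Gilbert", "Wilma"),
  year = c(1980, 2019, 1988, 2005),
  max_wind_knots = c(165, 160, 160, 160)
)

# Display the table; the caption comes from the tbl-cap option above
max_wind_table |>
  flextable() |>
  colformat_double(big.mark = "", digits = 0) |>
  autofit()
```

Because the label starts with tbl-, Quarto should number the table and attach the caption when the document is rendered.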

gt() from the gt package

library(gt)

storms_summary |> 
  gt()
Table 3: Storm summary statistics with gt()
name year max_wind_knots
Allen 1980 165
Dorian 2019 160
Gilbert 1988 160
Wilma 2005 160

The gt package has a wide range of formatting options, including conditional formatting and group-wise summary statistics. It works most smoothly when the Quarto output format is HTML; further customizing a table is harder for PDF or Word output.

Table themes

There are many options for finely customizing aspects of tables displayed in Quarto. We can also apply complete themes to customize the appearance of tables with minimal code. The flextable package comes with preset themes.

# Displaying flextable with tron theme
storms_summary |> 
  flextable() |> 
  autofit() |> 
  theme_tron()

name year max_wind_knots
Allen 1,980 165
Dorian 2,019 160
Gilbert 1,988 160
Wilma 2,005 160

For gt tables, one must load the gtExtras package to use complete themes. Unfortunately, gtExtras does not fully customize the tables when rendering to Word or PDF.

library(gtExtras)

# Displaying gt with ESPN theme
storms_summary |> 
  gt() |> 
  gt_theme_espn()
name year max_wind_knots
Allen 1980 165
Dorian 2019 160
Gilbert 1988 160
Wilma 2005 160

You try

Tip
  1. Using the gt() function and the gtExtras R package, display the storms_summary table with a complete theme different from the one used above. The available themes are listed here: https://jthomasmock.github.io/gtExtras/reference/index.html

  2. Using the flextable() function, display the storms_summary table with a complete theme different from the one used above. The available themes are listed here: https://davidgohel.github.io/flextable/reference/index.html#flextable-themes.

Multilingual support

Quarto supports multiple programming languages within the same document including R, Python, SQL, Bash, and Julia among others. This facilitates reproducible workflows and collaborations involving multiple languages.

Below are full code chunks for R, Bash, and Python, including the code chunk headers and delimiters. R code chunks start with ```{r}, Bash with ```{bash}, and Python with ```{python}.

Bash

Bash, short for “Bourne Again SHell,” is a command-line interpreter commonly used in Unix-based systems that enables users to automate tasks, navigate file systems, and execute scripts. Bash chunks are useful for running shell commands, like checking a file’s contents or printing the current date. A Bash code chunk begins with ```{bash}.

Here’s an example of a Bash code chunk that prints a message and the current date.

```{bash}
# Print a simple message to the screen
echo "Hello from a Bash code chunk!"

# Print the current date and time
echo "The current date is:"
date
```
Hello from a Bash code chunk!
The current date is:
Thu Aug 28 14:49:01 EDT 2025

Python

Quarto can execute Python code directly in chunks that start with ```{python}. When working in an R-focused environment like RStudio, we can use the R package reticulate to help work with Python.

Setting Up Python Packages

The R code chunk below makes use of reticulate to install Python libraries for us to use. For this to work, you must have Python installed on your computer and configured correctly with RStudio. If necessary, one can install Python using reticulate::install_miniconda() or reticulate::install_python(version = "3.9.6") for a specific version of Python.

```{r}
# Load the reticulate package
library(reticulate)

# A list of Python packages we need
python_packages <- c("matplotlib", "pandas")

# Install packages if they are not already available
for(pack in python_packages) {
  if(!py_module_available(pack)) {
    py_install(pack)
  }
}
```

Running Python Code

Now that we have the necessary libraries available, we can use the Python code chunk below to make a scatter plot for the same storms data set we used previously.

```{python}
# Loading Python libraries matplotlib and pandas
import matplotlib.pyplot as plt
import pandas as pd

# Load the storms dataset using pandas
storms = pd.read_csv('https://raw.githubusercontent.com/tidyverse/dplyr/main/data-raw/storms.csv')

# Creating scatter plot using matplotlib
plt.scatter(storms['pressure'], storms['wind'])

# Adding labels
plt.xlabel('Pressure (mb)')
plt.ylabel('Wind Speed (knots)')

# Display the figure
plt.show()
```

Word Templates

  • Organizations, such as federal agencies like the US Department of Agriculture, can have weekly or monthly reports that change as data / other inputs are updated.

  • When rendering to Word, Quarto can use Word document templates for consistent formatting, headers, etc. while updating charts & tables.

Much More!

  • This activity was made using Quarto.

  • This Quarto Cheat Sheet describes additional features and fundamentals of Quarto.

  • HTML Theming is available when rendering to HTML with Quarto to customize the overall appearance of the output document.

Bonus (Optional)

Tip

Toggle the Source and Visual editor buttons in the top left of the editor pane to explore the difference between the two modalities in Quarto.

Quarto, which Posit introduced in 2020, is the most modern tool for reproducible documents in RStudio; its widely used predecessor, R Markdown, was introduced in 2012.

The syntax for Quarto is very similar to that of R Markdown, but the main differences are that Quarto has a simplified YAML, chunk options are specified slightly differently, and Quarto has more of a multilingual focus than R Markdown.