Introduction to Quarto
DSA 220 - Introduction to Data Science and Analytics
Learning Objectives
Render Quarto to different formats
Understand fundamental Markdown syntax
Use code chunks in Quarto
Customize code chunk options
Include plots & tables in Quarto
Quarto
Quarto is a modern, open-source data science software tool for creating dynamic documents, reports, websites, and presentations. The power of Quarto for data science is in its integration of written text and executable code in a single source document for reproducible and streamlined analyses. Quarto is also compatible with multiple languages, including R and Python, facilitating its use in data science more broadly.
A project in RStudio organizes all the files associated with a particular analysis, such as R scripts, data, and outputs, in a single folder. Projects facilitate an organized, portable, and reproducible workflow.
Let’s open RStudio and create a new project.
Click File > New Project… > New Directory > New Project. In the Directory name: field, type DSA-220. Then click the Create Project button.
Next, we will open our first Quarto file.
Open the template Quarto document from Blackboard.
Render it by clicking the Render button (top left).
- Also render the document by using the appropriate keyboard shortcut (Mac: Command + Shift + K, Windows: Ctrl + Shift + K).
Create new documents for two of the most common formats: HTML and Word. Render each of the documents.
For the two new files, does the output differ?
Looking at the YAML at the top of each document, how does the input differ?
Note: It is possible to render to PDF, but we will not use the PDF format in this course. If rendering to PDF causes an error, you may need to install LaTeX. To do so, submit install.packages("tinytex") to the Console (type the command and press return / enter). Once the package has been successfully installed, submit tinytex::install_tinytex() to the Console.
For the rest of the activity, we will focus on rendering to an HTML format.
Markdown
Markdown is a lightweight set of conventions for formatting plain text files. It is designed to be easy to read and write, so it is not very customizable but can be learned quickly.
Headings
Headings and subheadings can be included in a file to organize the document.
# 1st Level Header
## 2nd Level Header
### 3rd Level Header
Include a first-level header, second-level header, and third-level header in your document naming them Section 1, Subsection 1.1, and Subsubsection 1.1.1, respectively.
Links
Hyperlinks can be included in Quarto documents as well.
<http://example.com>
[linked phrase](http://example.com)
Include a hyperlink linking to the Google homepage.
Text Formatting
Basic text formatting, such as italics, bold, superscripts, and subscripts, can be applied as well.
| Raw text input | Rendered output |
|---|---|
| *italic* | italic |
| **bold** | bold |
| `code` | code |
| superscript^2^ | superscript² |
| subscript~2~ | subscript₂ |
You try
Include an italicized word or phrase.
Include a bold word or phrase.
Include a word or phrase styled as code.
Include a superscript and a subscript.
Code Chunks
The real power of Quarto comes from the ability to add code chunks to the file, which allows us to run R code inside our document:
# Sophisticated calculation
sqrt(49)
[1] 7
There are several ways to insert a code chunk:
The keyboard shortcut Cmd/Ctrl + Alt + I (recommended)
The “Insert” button icon in the editor toolbar (top right).
By manually typing the chunk delimiters ```{r} and ```.
Add a new code chunk at the bottom of the Quarto document, naming the chunk uptown-chunk by including #| label: uptown-chunk as the first line inside the code chunk.
In the code chunk, use the include_graphics() function from the knitr package to include the image at the following URL: https://github.com/dilernia/DSA220/blob/main/images/uptownFunk2.png?raw=true. Note that the URL should be enclosed in quotes when calling the include_graphics() function.
Code chunks have multiple options which control how they behave. Some more commonly used code chunk options are below:
| Option | Description |
|---|---|
| eval | Determines if the code should be evaluated (true / false) |
| echo | Controls whether the code is shown in the output (true / false) |
| include | Specifies if the result should be included in the output (true / false) |
| warning | Indicates if warnings should be shown (true / false) |
| message | Indicates if messages should be shown (true / false) |
| error | Include errors in the output. When set to true, runtime errors will not halt processing of the document. (true / false) |
| out-width | Width of the plot in the output document |
| out-height | Height of the plot in the output document |
| fig-align | Figure horizontal alignment (default, left, right, or center) |
Modify uptown-chunk by setting the echo chunk option to be false by including #| echo: false immediately below the line with #| label: uptown-chunk.
Specify and toggle the eval chunk option for the uptown-chunk code chunk and see what happens. Note that some of the code chunk options for controlling the dimensions of figures only apply for graphics produced in code chunks, but not externally included images.
To explore more code chunk options and details, see https://quarto.org/docs/reference/cells/cells-knitr.html.
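As a rough sketch of how several of the options in the table above can be combined, the chunk body below sets a label, hides the code, suppresses warnings, and centers a figure at 60% of the available width. The label squares-chunk, the variable squares, and the 60% width are illustrative assumptions, not part of the course template. In a Quarto document, this code would sit between the usual R chunk delimiters.

```r
#| label: squares-chunk
#| echo: false
#| warning: false
#| fig-align: center
#| out-width: "60%"

# Hypothetical example data: the first five square numbers
squares <- (1:5)^2

# A base R scatter plot; the figure options above apply because
# the graphic is produced inside the chunk
plot(1:5, squares, xlab = "n", ylab = "n squared")
```

Because lines starting with #| are ordinary R comments, the same code also runs unchanged in the Console. The options only take effect when the document is rendered.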
We can specify default code chunk options for the entire document by customizing the YAML (at the top of the Quarto document) as below.
---
title: "Document title"
author: "Author name"
date: today
format:
  html:
    self-contained: true
    embed-resources: true
    toc: true
execute:
  eval: true
  echo: true
  warning: false
  message: false
---
Including Plots
Using R code inside chunks that produce plots allows graphics to be directly included in the output document.
For example, let’s explore a data set on storms ⛈️ between 1975 and 2021 from the National Oceanic and Atmospheric Administration (NOAA). In particular, let’s visualize the relationship between the pressure (in millibars) of a storm and its maximum sustained wind speed (in knots).
library(tidyverse)
# Load storms data set
data(storms)
storms |> ggplot(aes(x = pressure, y = wind)) +
  geom_point()

Note: If this code has an error such as Error in library(tidyverse) : there is no package called ‘tidyverse’, you may need to submit install.packages('tidyverse') to the Console first to install the package. If you do not encounter an error, then it is best not to reinstall the package.
In general, you may need to install an R package by submitting install.packages("package_name") to the Console first, replacing "package_name" with the name of the R package to install in quotes. Typically, you only install R packages as you need them.
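One way to follow this advice is to check whether a package is already installed before deciding to install it. The sketch below uses requireNamespace() from base R. The helper name is_installed is a hypothetical name chosen for illustration, and the snippet only prints a reminder rather than installing anything.

```r
# Hypothetical helper: TRUE if the package can be found, FALSE otherwise.
# requireNamespace() loads the package's namespace but does not attach it.
is_installed <- function(package_name) {
  requireNamespace(package_name, quietly = TRUE)
}

# The stats package ships with R, so this check returns TRUE
is_installed("stats")

# Only suggest installing tidyverse when it is actually missing
if (!is_installed("tidyverse")) {
  message('Submit install.packages("tidyverse") to the Console')
}
```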
Include the code above that creates the scatter plot in a new chunk at the bottom of the Quarto document.
Inline Code
Quarto also allows you to use R code outside of code chunks for dynamic text:
xbar <- 2
se <- 1.3
The following Markdown syntax:
The 95% confidence interval for the mean is (`r xbar - 1.96*se`, `r xbar + 1.96*se`).
produces:
The 95% confidence interval for the mean is (-0.548, 4.548).
Include a chunk of R code and after that in-line R code at the bottom of the document to reproduce the sentence containing the confidence interval results above.
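Inline results often read better when rounded. As a minimal sketch, assuming two decimal places are wanted, the bounds can be computed and formatted once in a chunk so that the inline expression only has to print a single object. The names ci_lower, ci_upper, and ci_text are illustrative and not part of the original activity.

```r
xbar <- 2
se <- 1.3

# Lower and upper bounds of the 95% confidence interval
ci_lower <- xbar - 1.96 * se
ci_upper <- xbar + 1.96 * se

# Format both bounds to two decimal places in a single string
ci_text <- sprintf("(%.2f, %.2f)", ci_lower, ci_upper)
ci_text
```

The sentence could then use a single inline expression, `r ci_text`, in place of the two separate calculations.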
Tables
Quarto allows you to display nicely formatted tables. Let’s explore a few of the most common R functions for nicely displaying tables in Quarto. To do so, let’s continue with the storms data set looking at the four largest sustained wind speeds recorded for different storms 💨.
# Largest sustained wind speeds recorded
storms_summary <- storms |>
  group_by(name, year) |>
  summarize(max_wind_knots = max(wind)) |>
  ungroup() |>
  slice_max(max_wind_knots, n = 4)

flextable() from the flextable package
library(flextable)
storms_summary |>
  flextable() |>
  colformat_double(big.mark = "", digits = 0) |>
  autofit()

| name | year | max_wind_knots |
|---|---|---|
| Allen | 1980 | 165 |
| Dorian | 2019 | 160 |
| Gilbert | 1988 | 160 |
| Wilma | 2005 | 160 |
The flextable package displays tables for all output formats in a relatively consistent manner, with customization options that work for all output formats.
There are multiple ways to customize table captions. A versatile approach is to customize the label and tbl-cap code chunk options as below.
#| label: 'tbl-a_table_name'
#| tbl-cap: 'A custom table caption'
Note that the label option must begin with the "tbl-" prefix for the table caption to be included properly. For more details regarding customization of included tables, see https://quarto.org/docs/authoring/tables.html.
gt() from the gt package
library(gt)
storms_summary |>
  gt()

| name | year | max_wind_knots |
|---|---|---|
| Allen | 1980 | 165 |
| Dorian | 2019 | 160 |
| Gilbert | 1988 | 160 |
| Wilma | 2005 | 160 |
The gt package has a wide range of formatting options, including conditional formatting and group-wise summary statistics. It works most smoothly with HTML as the output format for Quarto. Further customized tables are harder to render well in PDF or Word output.
Table themes
There are many options for fine-tuning the appearance of tables displayed in Quarto. We can also apply complete themes to customize the appearance of tables with minimal code. The flextable package comes with preset themes.
# Displaying flextable with tron theme
storms_summary |>
  flextable() |>
  autofit() |>
  theme_tron()

| name | year | max_wind_knots |
|---|---|---|
| Allen | 1,980 | 165 |
| Dorian | 2,019 | 160 |
| Gilbert | 1,988 | 160 |
| Wilma | 2,005 | 160 |
For gt tables one must load the gtExtras package to use complete themes. Unfortunately gtExtras does not fully customize the tables when rendering to Word or PDF.
library(gtExtras)
# Displaying gt with ESPN theme
storms_summary |>
  gt() |>
  gt_theme_espn()

| name | year | max_wind_knots |
|---|---|---|
| Allen | 1980 | 165 |
| Dorian | 2019 | 160 |
| Gilbert | 1988 | 160 |
| Wilma | 2005 | 160 |
You try
Display the storms_summary table using the gt() function and the gtExtras R package, applying a complete theme different from the one used above. The available themes are listed here: https://jthomasmock.github.io/gtExtras/reference/index.html
Display the storms_summary table using the flextable() function, applying a complete theme different from the one used above. The available themes are listed here: https://davidgohel.github.io/flextable/reference/index.html#flextable-themes.
Multilingual support
Quarto supports multiple programming languages within the same document including R, Python, SQL, Bash, and Julia among others. This facilitates reproducible workflows and collaborations involving multiple languages.
Below are full code chunks for R, Bash, and Python, including the code chunk headers and delimiters. R code chunks start with ```{r}, Bash with ```{bash}, and Python with ```{python}.
Bash
Bash, short for “Bourne Again SHell,” is a command-line interpreter commonly used in Unix-based systems that enables users to automate tasks, navigate file systems, and execute scripts. Bash chunks are useful for running shell commands, like checking a file’s contents or printing the current date. A Bash code chunk begins with ```{bash}.
Here’s an example of a Bash code chunk that prints a message and the current date.
```{bash}
# Print a simple message to the screen
echo "Hello from a Bash code chunk!"
# Print the current date and time
echo "The current date is:"
date
```
Hello from a Bash code chunk!
The current date is:
Thu Aug 28 14:49:01 EDT 2025
Python
Quarto can execute Python code directly in chunks that start with ```{python}. When working in an R-focused environment like RStudio, we can use the R package reticulate to help work with Python.
Setting Up Python Packages
The R code chunk below makes use of reticulate to install Python libraries for us to use. For this to work, you must have Python installed on your computer and configured correctly with RStudio. If necessary, one can install Python using reticulate::install_miniconda() or reticulate::install_python(version = "3.9.6") for a specific version of Python.
```{r}
# Load the reticulate package
library(reticulate)
# A list of Python packages we need
python_packages <- c("matplotlib", "pandas")
# Install packages if they are not already available
for (pack in python_packages) {
  if (!py_module_available(pack)) {
    py_install(pack)
  }
}
```
Running Python Code
Now that we have the necessary libraries available, we can use the Python code chunk below to make a scatter plot for the same storms data set we used previously.
```{python}
# Loading Python libraries matplotlib and pandas
import matplotlib.pyplot as plt
import pandas as pd
# Load the storms dataset using pandas
storms = pd.read_csv('https://raw.githubusercontent.com/tidyverse/dplyr/main/data-raw/storms.csv')
# Creating scatter plot using matplotlib
plt.scatter(storms['pressure'], storms['wind'])
# Adding labels
plt.xlabel('Pressure (mb)')
plt.ylabel('Wind Speed (knots)')
```
Word Templates
Organizations, such as federal agencies like the US Department of Agriculture, can have weekly or monthly reports that change as data / other inputs are updated.
When rendering to Word, Quarto can use Word document templates for consistent formatting, headers, etc. while updating charts & tables.
Much More!
This activity was made using Quarto.
This Quarto Cheat Sheet describes additional features and fundamentals of Quarto.
HTML Theming is available when rendering to HTML with Quarto to customize the overall appearance of the output document.
Bonus (Optional)
Toggle between the Source and Visual editor buttons in the top left of the editor pane to explore the difference between the two editing modes in Quarto.
Quarto, which Posit introduced in 2020, is the most modern tool for reproducible documents in RStudio, whereas its widely used predecessor, R Markdown, was introduced in 2012.
The syntax for Quarto is very similar to that of R Markdown, but the main differences are that Quarto has a simplified YAML, chunk options are specified slightly differently, and Quarto has more of a multilingual focus than R Markdown.