R Lab: Exploratory Data Analysis

DSA 220 - Introduction to Data Science and Analytics

Author

Andrew DiLernia

Once completed, submit a zipped file (.zip) containing two documents via Blackboard: a .qmd Quarto file and the corresponding HTML document.

1.

First, add a new code chunk containing the code below to load the tidyverse, ggthemes, gt, and corrr R packages, along with the example clinical trial data from the [Collecting and Preparing Data](https://dilernia.github.io/DSA220/Activities/collecting-and-preparing-data.html#sec-example-trial) activity.

# Load necessary packages
library(tidyverse)
library(ggthemes)
library(gt)
library(corrr)

# Importing clinical trial data from GitHub
trial_data <- read_csv("https://raw.githubusercontent.com/dilernia/DSA220/refs/heads/main/Data/trial_data.csv")

2.

  1. Create and print the table of groupwise summary statistics below for the UPDRS improvement scores from the clinical trial data.

| treatment_group | Minimum | Q1 | Median | Q3 | Maximum | Mean | Range | SD |
|---|---|---|---|---|---|---|---|---|
| Neuroquil | -4 | 0 | 1.5 | 3 | 7 | 1.570513 | 11 | 2.167538 |
| SOC | -8 | -4 | -2.0 | -1 | 4 | -2.354167 | 12 | 2.175736 |

  2. What is the IQR for the Neuroquil group’s improvement scores?

  3. What is the 50th percentile for the improvement scores for the SOC group?
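As a rough illustration of the technique, groupwise summary statistics are commonly built with `group_by()` and `summarize()` from dplyr. The sketch below uses a small made-up tibble: the names `toy_scores`, `group`, and `score` are hypothetical and are not columns of the trial data.

```r
library(tidyverse)

# Toy data for illustration only; 'group' and 'score' are hypothetical names
toy_scores <- tibble(
  group = c("A", "A", "A", "A", "B", "B", "B", "B"),
  score = c(1, 2, 3, 4, 2, 4, 6, 8)
)

# One row of summary statistics per group
score_summary <- toy_scores |>
  group_by(group) |>
  summarize(
    Minimum = min(score),
    Q1 = quantile(score, 0.25, names = FALSE),
    Median = median(score),
    Q3 = quantile(score, 0.75, names = FALSE),
    Maximum = max(score),
    Mean = mean(score),
    Range = Maximum - Minimum,  # uses the columns created just above
    SD = sd(score)
  )

score_summary
```

Recall that the IQR is Q3 minus Q1 and that the 50th percentile is the median, so both can be read from a table of this shape. Passing the result to `gt()` from the gt package is one way to display it as a formatted table.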

3.

  1. Reproduce and display the frequency table below.

| treatment_group | sex | n |
|---|---|---|
| Neuroquil | Female | 88 |
| Neuroquil | Male | 68 |
| SOC | Female | 70 |
| SOC | Male | 74 |

  2. Are there more participants who were assigned to the Neuroquil group or the SOC group?
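A two-way frequency table of this kind can be sketched with dplyr's `count()`, which tallies rows for each combination of the variables it is given. The tibble `toy_participants` and its columns `arm` and `category` below are hypothetical stand-ins, not the trial data.

```r
library(tidyverse)

# Toy data for illustration only; 'arm' and 'category' are hypothetical names
toy_participants <- tibble(
  arm = c("A", "A", "A", "B", "B", "B", "B"),
  category = c("x", "y", "x", "x", "y", "y", "y")
)

# One row per observed (arm, category) combination, with the count in column n
freq_table <- toy_participants |>
  count(arm, category)

# Counting by a single variable gives the total per arm
arm_totals <- toy_participants |>
  count(arm)

freq_table
arm_totals
```

Counting by one variable alone, as in `arm_totals`, is a quick way to compare overall group sizes.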

4.

  1. Reproduce and display the visualization below. Make sure to match the plot exactly, including all labels and the coloring.

  2. What is the median UPDRS improvement for the SOC group?

  3. What is the 75th percentile for the improvement score for participants who received Neuroquil?

  4. What is the maximum UPDRS improvement for the SOC group?
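The questions about the median, 75th percentile, and maximum suggest a side-by-side box plot; assuming so, a minimal sketch with ggplot2 might look like the following. The data, labels, and the colorblind palette from ggthemes are all assumptions made for illustration, so the labels and colors of the target figure still need to be matched separately.

```r
library(tidyverse)
library(ggthemes)

# Toy data for illustration only; 'group' and 'score' are hypothetical names
toy_scores <- tibble(
  group = c("A", "A", "A", "A", "B", "B", "B", "B"),
  score = c(1, 2, 3, 4, 2, 4, 6, 8)
)

# One box per group, filled by group
box_plot <- ggplot(toy_scores, aes(x = group, y = score, fill = group)) +
  geom_boxplot() +
  scale_fill_colorblind() +  # palette choice is an assumption
  labs(
    title = "Scores by group",  # placeholder labels
    x = "Group",
    y = "Score"
  ) +
  theme_bw() +
  theme(legend.position = "none")

box_plot
```

In a box plot, the line inside each box marks the median and the top edge of the box marks the 75th percentile. The upper whisker reaches the maximum only when there are no high outliers, which are drawn as separate points.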

5.

  1. Reproduce and display the scatter plot below. Make sure to match the plot exactly, including all labels and the coloring of points.

  2. What do we observe in the scatter plot? Describe the relationship in terms of strength, direction, and linearity.
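For reference, a scatter plot with colored points can be sketched with `geom_point()`, and the correlation coefficient from base R's `cor()` can support a description of strength and direction. The variables `x_var`, `y_var`, and `group` below are hypothetical and do not refer to columns in the trial data.

```r
library(tidyverse)

# Toy data for illustration only; all column names are hypothetical
toy_measurements <- tibble(
  x_var = c(1, 2, 3, 4, 5, 6),
  y_var = c(2.1, 3.9, 6.2, 7.8, 10.1, 12.0),
  group = c("A", "B", "A", "B", "A", "B")
)

# Points colored by group
scatter_plot <- ggplot(toy_measurements, aes(x = x_var, y = y_var, color = group)) +
  geom_point() +
  labs(
    title = "y_var versus x_var",  # placeholder labels
    x = "x_var",
    y = "y_var",
    color = "Group"
  )

scatter_plot

# Pearson correlation: the sign gives direction, the magnitude gives strength
r_value <- cor(toy_measurements$x_var, toy_measurements$y_var)
r_value
```

A correlation near 1 or -1 indicates a strong linear relationship, while linearity itself is best judged from the plot. The corrr package's `correlate()` is a data-frame-oriented alternative to `cor()`.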

6.

Reproduce and display the faceted scatter plot below. Make sure to match the plot exactly, including all labels and the coloring of points.
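Faceting splits one plot into panels, one per level of a grouping variable. The sketch below uses `facet_wrap()` on toy data; the column names and the choice of faceting variable are assumptions, since they depend on the target figure.

```r
library(tidyverse)

# Toy data for illustration only; all column names are hypothetical
toy_measurements <- tibble(
  x_var = c(1, 2, 3, 4, 5, 6, 7, 8),
  y_var = c(2, 4, 5, 8, 3, 3, 6, 7),
  panel = c("A", "A", "A", "A", "B", "B", "B", "B")
)

# One panel per level of 'panel', with points colored by the same variable
faceted_plot <- ggplot(toy_measurements, aes(x = x_var, y = y_var, color = panel)) +
  geom_point() +
  facet_wrap(~ panel) +
  labs(
    title = "y_var versus x_var by panel",  # placeholder labels
    x = "x_var",
    y = "y_var",
    color = "Panel"
  )

faceted_plot
```

By default `facet_wrap()` gives all panels the same axis scales, which makes comparisons across panels easier.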

7.

Reproduce the bar chart below.
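As a starting point, `geom_bar()` counts the rows in each category and draws one bar per category. The toy data and labels below are hypothetical; the variables, fill colors, and labels of the target chart would need to be matched separately.

```r
library(tidyverse)

# Toy data for illustration only; 'category' is a hypothetical name
toy_categories <- tibble(
  category = c("x", "x", "y", "x", "y", "z")
)

# geom_bar() computes the count for each category
bar_chart <- ggplot(toy_categories, aes(x = category, fill = category)) +
  geom_bar() +
  labs(
    title = "Number of observations by category",  # placeholder labels
    x = "Category",
    y = "Count"
  ) +
  theme(legend.position = "none")

bar_chart
```

If the counts have already been computed, for example with `count()`, `geom_col()` with `y` mapped to the count column draws the same chart.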