R Workout 8 - R Proficiency: Delving into Data Analytics

EnterpriseDNA · September 5, 2023, 9:00am

R, with its extensive package ecosystem and powerful analytical capabilities, stands as a pillar in the data analytics world. Dive into this workout to sharpen your R skills and elevate your data analysis prowess.

Scenario:

Imagine you’re a data scientist analyzing customer feedback for a product launch. You have a dataset with customer demographics and their satisfaction scores. How would you leverage R to extract insights and understand key drivers of customer satisfaction?

Objectives:

By the end of this workout, you should be able to:

Load and inspect data in R.
Conduct basic data wrangling and transformation using R.
Perform exploratory data analysis (EDA) to glean insights.

Interactive Task:

Given your understanding of R, answer the following:

You have a CSV file named “customer_feedback.csv”. How would you load this dataset into an R dataframe named “feedback_df”?
- Your Code: ________________________
If you want to get a quick summary of each column in “feedback_df”, which function would you use?
- Your Code: ________________________
To visualize the distribution of satisfaction scores, which type of plot would you consider, and how might you code it in R?
- Your Answer: ________________________
- Your Code: ________________________

Questions:

In R, which package is widely recognized for its data wrangling capabilities, offering functions like mutate(), select(), and filter()?
- i) ggplot2
- ii) shiny
- iii) dplyr
- iv) lattice
When dealing with missing data in an R dataframe, which function can be used to remove any rows that contain NA values?
- i) drop_na()
- ii) remove_na()
- iii) na.omit()
- iv) exclude_na()

Duration: 20 minutes

Difficulty: Intermediate

Period
This workout will be released on Tuesday, September 5, 2023, and will end on Thursday, September 28, 2023, but you can always come back to any of the workouts and solve them.

Keith · October 3, 2023, 9:43pm

Hi @EnterpriseDNA,

Here is the solution to this workout.

Questions:

In R, which package is widely recognized for its data wrangling capabilities, offering functions like mutate(), select(), and filter()?
Answer:

iii) dplyr
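
For anyone who wants to see the three verbs from the question in action, here is a minimal sketch. It assumes the dplyr package is installed. The data frame and its column names (age, region, satisfaction_score) are made up for illustration, since the workout does not list the real columns of customer_feedback.csv.

```r
library(dplyr)

# Hypothetical sample data; the real column names are not given in the workout
feedback_df <- data.frame(
  age = c(23, 35, 47, 52, 31),
  region = c("North", "South", "North", "East", "South"),
  satisfaction_score = c(4, 2, 5, 3, 4)
)

high_scores <- feedback_df %>%
  # filter(): keep only rows with a score of 4 or more
  filter(satisfaction_score >= 4) %>%
  # mutate(): derive a new column from an existing one
  mutate(age_group = ifelse(age < 40, "under 40", "40 and over")) %>%
  # select(): keep only the columns of interest
  select(region, age_group, satisfaction_score)

print(high_scores)
```

The pipe passes the result of each step to the next, so the code reads top to bottom in the order the operations happen.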

When dealing with missing data in an R dataframe, which function can be used to remove any rows that contain NA values?
Answer:

iii) na.omit()
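
A side note: option i) drop_na() is also a real function, but it lives in the tidyr package, so na.omit() is the base R answer. Below is a small sketch of na.omit() on invented data (the columns are assumptions, not the workout's actual dataset), next to the complete.cases() filter that keeps the same rows.

```r
# Hypothetical sample data with missing values
feedback_df <- data.frame(
  age = c(23, NA, 47, 52),
  satisfaction_score = c(4, 2, NA, 3)
)

# na.omit() drops every row that has an NA in any column
clean_df <- na.omit(feedback_df)

# Base R alternative that keeps the same rows
clean_df2 <- feedback_df[complete.cases(feedback_df), ]

print(clean_df)

# Number of rows that were dropped
rows_dropped <- nrow(feedback_df) - nrow(clean_df)
print(rows_dropped)
```

It is worth checking how many rows are dropped before continuing, since removing every incomplete row can shrink a dataset considerably.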

Interactive Task:

You have a CSV file named “customer_feedback.csv”. How would you load this dataset into an R dataframe named “feedback_df”?

Code:
feedback_df <- read.csv("customer_feedback.csv")
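
To try this without the real file, the sketch below writes a tiny stand-in CSV to a temporary folder and reads it back. The column names and values are assumptions made for the example; only the file name comes from the workout.

```r
# Write a small stand-in file so the example runs anywhere.
# The columns are hypothetical; the workout does not list them.
csv_path <- file.path(tempdir(), "customer_feedback.csv")
writeLines(c(
  "customer_id,age,region,satisfaction_score",
  "1,23,North,4",
  "2,35,South,2",
  "3,47,North,5"
), csv_path)

# Read the file into a data frame; text columns stay as character vectors
feedback_df <- read.csv(csv_path, stringsAsFactors = FALSE)

str(feedback_df)          # column types and a preview of the values
print(head(feedback_df))  # first rows of the data
```

With the real file sitting in the working directory, the call reduces to the one-liner above.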

If you want to get a quick summary of each column in “feedback_df”, which function would you use?

Code:
summary(feedback_df)
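
A couple of companion calls are often run alongside summary(). The sketch below uses a made-up data frame (the columns are assumptions) to show summary(), str(), and a per-column count of missing values.

```r
# Hypothetical sample data; one age is missing on purpose
feedback_df <- data.frame(
  age = c(23, 35, NA, 52),
  region = c("North", "South", "North", "East"),
  satisfaction_score = c(4, 2, 5, 3)
)

# Min, quartiles, mean, and max for numeric columns, plus NA counts where present
print(summary(feedback_df))

# Type of each column and a preview of its values
str(feedback_df)

# Number of missing values in each column
na_counts <- colSums(is.na(feedback_df))
print(na_counts)
```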

To visualize the distribution of satisfaction scores, which type of plot would you consider, and how might you code it in R?

Answer:
A) Histogram
Code:

# Create a histogram of satisfaction scores
hist(feedback_df$satisfaction_score,
     main = "Distribution of Satisfaction Scores",
     xlab = "Satisfaction Score",
     ylab = "Frequency",
     col = "blue",
     border = "black")

B) Density Plot
Code:

# Create a density plot of satisfaction scores
# na.rm = TRUE is needed because density() stops with an error if the column contains NA values
plot(density(feedback_df$satisfaction_score, na.rm = TRUE),
     main = "Density Plot of Satisfaction Scores",
     xlab = "Satisfaction Score",
     ylab = "Density",
     col = "red")

You can choose between a histogram and a density plot based on the level of detail you want to convey. A histogram shows binned counts, so its shape depends on the bin width you pick, while a density plot shows a smooth estimate of the distribution, so its shape depends on the smoothing bandwidth.
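
The two views can also be combined in one plot. The sketch below uses simulated scores on an assumed 1 to 10 scale as a stand-in for feedback_df$satisfaction_score, since the real scale is not stated in the workout.

```r
set.seed(42)

# Simulated scores standing in for feedback_df$satisfaction_score (assumed 1-10 scale)
scores <- sample(1:10, size = 200, replace = TRUE)

# freq = FALSE puts the histogram on the density scale,
# so the bars and the curve share the same y-axis
h <- hist(scores,
          breaks = seq(0.5, 10.5, by = 1),  # one bin per integer score
          freq = FALSE,
          main = "Distribution of Satisfaction Scores",
          xlab = "Satisfaction Score",
          col = "lightblue",
          border = "black")

# Overlay the smooth density estimate on top of the bars
d <- density(scores)
lines(d, col = "red", lwd = 2)
```

Centering one bin on each integer score avoids the case where default breaks lump two scores into the same bar.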

Thanks for the workout.
Keith