R Workout 5 - Relational Data

Welcome in R challange 5.

This is 2 stars challange.

So far we have been working with single tables. In this task, we will practice merging datasets so that in the future we will have the power to work on complex sets.

datasets.docx (13.3 KB)

Task 1.
You have 2 tables at your disposal: in the first one you have the names of the employees and their ages. Second contains data about employees and employers.

image

image

Add company to People table. Leave all values from left table:

Expected result:

image

Task 2. Add company to People table. Leave only matches.

Expected result:

image

Task 3. Display only matches between tables. Do not add columns.

Expected result:

image

Task 4. Display mismatches. Do not add colums

image

Dataset 2.
Level: Medium

Kids table

image

Parents table:

image

Task 5. Knowing that mother is always mentioned first, bring only fathers names to table Kids.
Do not use filtering options, only parameters available in relational data functions. Solve problem of different headers with kids’ names.

Expected results:

image

Feel free to use R online: https://rdrr.io/

Submission

Simply post your code and a screenshot of your results.

Please format your R code and blur it or place it in a hidden section.

Solution to this challenge will be posted on May 28

Hello All, Please see solutions.

library(tidyverse)

#Task 1

In this task, we have two datasets, “People” and “Companies”. We want to combine these datasets based on the matching values in the columns “First_Name” and “Last_Name”. The left join operation left_join() is used to merge the datasets. The result will include all rows from the “People” dataset, and for each matching row, it will add the corresponding columns from the “Companies” dataset. If there is no match, the columns from the “Companies” dataset will have missing values.

People %>%
left_join(Companies,by = c(“First_Name”,“Last_Name”))

#Task 2

In Task 2, we have the same datasets, “People” and “Companies”. However, this time we are performing an inner join using the inner_join() function. An inner join returns only the rows where there is a match in both datasets based on the specified columns, in this case, “First_Name” and “Last_Name”. The result will include only the rows where there is a match, and it will combine the columns from both datasets.

People %>%
inner_join(Companies,by = c(“First_Name”,“Last_Name”))

#Task 3

For Task 3, we are again working with the “People” and “Companies” datasets. The semi join operation semi_join() is used. A semi join returns only the rows from the first dataset (in this case, “People”) that have a match in the second dataset (“Companies”). It keeps all columns from the first dataset and discards the columns from the second dataset. The matching is based on the specified columns, “First_Name” and “Last_Name”.

People %>% semi_join(Companies,by = c(“First_Name”,“Last_Name”))

#Task 4

In Task 4, we are performing an anti join operation using the anti_join() function. The anti join returns only the rows from the first dataset (“People”) that do not have a match in the second dataset (“Companies”). It keeps all columns from the first dataset and discards the columns from the second dataset. The matching is based on the specified columns, “First_Name” and “Last_Name”.

People %>% anti_join(Companies,by = c(“First_Name”,“Last_Name”))

#Task 5

In Task 5, we have two datasets, “Kids” and “Parents”. We want to merge these datasets based on a multiple match scenario. The left join operation left_join() is used, but with an additional argument multiple = “last”. This argument specifies that in case of multiple matches, only the last match should be included. The matching is done between the column “Kids” in the “Kids” dataset and the column “Children” in the “Parents” dataset. The result will include all rows from the “Kids” dataset, and for each matching row, it will add the corresponding columns from the “Parents” dataset, considering the last match when there are multiple matches.

Kids %>%
left_join(Parents, by = c(“Kids” = “Children”),multiple = “last”)

Answer:

Hi @KrzysztofNowak

thanks for the workout. I’m certaining learning something new.

Task 1

People %>% left_join(Companies, by = c(“First_Name”, “Last_Name”))
First_Name Last_Name Age Current_Employer
1 Emma Smith 45 Google
2 Liam Johnson 23 Apple
3 Olivia Williams 54 Amazon
4 Noah Jones 34 Microsoft
5 Ava Brown 65 Facebook
6 Isabella Davis 34 Tesla
7 Sophia Miller 21 Netflix
8 Mia Wilson 34
9 Charlotte Moore 54
10 Amelia Taylor 32

Task 2
People %>% inner_join(Companies, by = c(“First_Name”, “Last_Name”))
First_Name Last_Name Age Current_Employer
1 Emma Smith 45 Google
2 Liam Johnson 23 Apple
3 Olivia Williams 54 Amazon
4 Noah Jones 34 Microsoft
5 Ava Brown 65 Facebook
6 Isabella Davis 34 Tesla
7 Sophia Miller 21 Netflix

Task 3
People %>% semi_join(Companies, by = c(“First_Name”, “Last_Name”))
First_Name Last_Name Age
1 Emma Smith 45
2 Liam Johnson 23
3 Olivia Williams 54
4 Noah Jones 34
5 Ava Brown 65
6 Isabella Davis 34
7 Sophia Miller 21

Task 4
People %>% anti_join(Companies,by = c(“First_Name”,“Last_Name”))
First_Name Last_Name Age
1 Mia Wilson 34
2 Charlotte Moore 54
3 Amelia Taylor 32

Task 5
Note getting the right results using the R programming that was above
Kids %>% left_join(Parents, by = c(“Kids” = “Children”),multiple = “last”)
Kids Parent Description
1 John Elizabeth Jones Mother
2 John Patric Stewart Father
3 Natalie Jessica Clarke Mother
4 Natalie John Smith Father
5 Patricia Donna May Mother
6 Patricia Thomas Mann Father

Thanks for the workout.
Keith