Hello all, please see the solutions below.
library(tidyverse)
#Task 1
In this task, we have two datasets, “People” and “Companies”, and we want to combine them on the matching values in the columns “First_Name” and “Last_Name”. We use left_join() to merge them. The result keeps every row of the “People” dataset and, for each row with a match, adds the corresponding columns from the “Companies” dataset. Where there is no match, the columns from “Companies” are filled with missing values (NA).
People %>%
  left_join(Companies, by = c("First_Name", "Last_Name"))
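As a quick check of this behaviour, here is a minimal sketch on made-up data. The rows and the Age and Company columns are assumptions for illustration only, since the real People and Companies tables are not shown here.

```r
library(dplyr)

# Made-up data for illustration only (not the real People / Companies tables)
People <- data.frame(
  First_Name = c("Ann", "Bob", "Cara"),
  Last_Name  = c("Lee", "Ray", "Diaz"),
  Age        = c(31, 45, 28)
)
Companies <- data.frame(
  First_Name = c("Ann", "Bob"),
  Last_Name  = c("Lee", "Ray"),
  Company    = c("Acme", "Globex")
)

# Every row of People is kept; Company is NA where no match exists
left_result <- People %>%
  left_join(Companies, by = c("First_Name", "Last_Name"))

left_result
```

Cara has no row in the toy Companies table, so she stays in the result with NA in the Company column.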
#Task 2
In Task 2, we have the same datasets, “People” and “Companies”, but this time we perform an inner join using the inner_join() function. An inner join returns only the rows that have a match in both datasets on the specified columns, in this case “First_Name” and “Last_Name”, and combines the columns from both datasets. Rows without a match in either dataset are dropped.
People %>%
  inner_join(Companies, by = c("First_Name", "Last_Name"))
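To see that unmatched rows are dropped from both sides, here is a small sketch on made-up data. All names and values are hypothetical and only serve to illustrate the join.

```r
library(dplyr)

# Made-up data for illustration only
People <- data.frame(
  First_Name = c("Ann", "Bob", "Cara"),
  Last_Name  = c("Lee", "Ray", "Diaz")
)
Companies <- data.frame(
  First_Name = c("Ann", "Bob", "Dan"),
  Last_Name  = c("Lee", "Ray", "Moss"),
  Company    = c("Acme", "Globex", "Initech")
)

# Only rows present in both tables survive:
# Cara (no company) and Dan (not in People) are both dropped
inner_result <- People %>%
  inner_join(Companies, by = c("First_Name", "Last_Name"))

inner_result
```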
#Task 3
For Task 3, we are again working with the “People” and “Companies” datasets, now with the semi join operation semi_join(). A semi join returns only the rows from the first dataset (in this case, “People”) that have a match in the second dataset (“Companies”). It keeps all columns from the first dataset and adds none from the second, and a row of “People” appears only once even if it matches several rows of “Companies”. The matching is based on the specified columns, “First_Name” and “Last_Name”.
People %>%
  semi_join(Companies, by = c("First_Name", "Last_Name"))
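The difference from an inner join is easiest to see when one person matches more than one company row. The sketch below uses made-up data (all values are assumptions for illustration) and compares the two.

```r
library(dplyr)

# Made-up data for illustration only; Ann appears twice in Companies
People <- data.frame(
  First_Name = c("Ann", "Bob", "Cara"),
  Last_Name  = c("Lee", "Ray", "Diaz"),
  Age        = c(31, 45, 28)
)
Companies <- data.frame(
  First_Name = c("Ann", "Ann", "Bob"),
  Last_Name  = c("Lee", "Lee", "Ray"),
  Company    = c("Acme", "Globex", "Initech")
)

# Semi join: filters People, never duplicates rows, adds no columns
semi_result <- People %>%
  semi_join(Companies, by = c("First_Name", "Last_Name"))

# Inner join for comparison: Ann is repeated once per matching company
inner_result <- People %>%
  inner_join(Companies, by = c("First_Name", "Last_Name"))

nrow(semi_result)   # rows kept by the semi join
nrow(inner_result)  # rows produced by the inner join
```

So semi_join() acts as a filter on People, while inner_join() actually merges the two tables.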
#Task 4
In Task 4, we perform an anti join using the anti_join() function. The anti join returns only the rows from the first dataset (“People”) that do not have a match in the second dataset (“Companies”). It keeps all columns from the first dataset and adds none from the second. The matching is based on the specified columns, “First_Name” and “Last_Name”.
People %>%
  anti_join(Companies, by = c("First_Name", "Last_Name"))
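An anti join is the complement of a semi join: together they split the first dataset into matched and unmatched rows. A minimal sketch on made-up data (all values hypothetical):

```r
library(dplyr)

# Made-up data for illustration only
People <- data.frame(
  First_Name = c("Ann", "Bob", "Cara"),
  Last_Name  = c("Lee", "Ray", "Diaz"),
  Age        = c(31, 45, 28)
)
Companies <- data.frame(
  First_Name = c("Ann", "Bob"),
  Last_Name  = c("Lee", "Ray"),
  Company    = c("Acme", "Globex")
)

# People with no matching row in Companies
anti_result <- People %>%
  anti_join(Companies, by = c("First_Name", "Last_Name"))

# People with at least one matching row in Companies
semi_result <- People %>%
  semi_join(Companies, by = c("First_Name", "Last_Name"))

# The two results together account for every row of People
nrow(anti_result) + nrow(semi_result) == nrow(People)
```

This makes anti_join() handy for finding records that failed to match before doing the real merge.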
#Task 5
In Task 5, we have two datasets, “Kids” and “Parents”, where a row in “Kids” can match more than one row in “Parents”. We again use left_join(), this time with the additional argument multiple = "last" (available in dplyr 1.1.0 and later). This argument specifies that when a row has multiple matches, only the last match, in the row order of “Parents”, is kept. The matching is done between the column “Kids” in the “Kids” dataset and the column “Children” in the “Parents” dataset. The result includes all rows from “Kids”, each with the columns from its last matching row in “Parents”, or NA where there is no match.
Kids %>%
  left_join(Parents, by = c("Kids" = "Children"), multiple = "last")
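Here is a small sketch of what multiple = "last" does, compared with the default of keeping all matches. The data and the Parent column are made up for illustration, and the example assumes dplyr 1.1.0 or later, where the multiple argument is available.

```r
library(dplyr)

# Made-up data for illustration only; Mia has two rows in Parents
Kids <- data.frame(Kids = c("Mia", "Noah", "Omar"))
Parents <- data.frame(
  Children = c("Mia", "Mia", "Noah"),
  Parent   = c("Pat", "Quinn", "Rosa")
)

# Default behaviour: every match is returned, so Mia appears twice
all_matches <- Kids %>%
  left_join(Parents, by = c("Kids" = "Children"))

# multiple = "last": only the last matching row of Parents is kept per kid
last_match <- Kids %>%
  left_join(Parents, by = c("Kids" = "Children"), multiple = "last")

last_match
```

Note that "last" refers to row order in Parents, so if a particular parent should win, sort Parents accordingly before joining.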