Data Analysis Workout 06 - Roller Coaster Analysis

Author

Balogun Tomiwa

Published

May 7, 2023

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
dataset = pd.read_csv("C:/Users/Dechamp/Downloads/coaster_db.csv")
dataset.head(5)
coaster_name Length Speed Location Status Opening date Type Manufacturer Height restriction Model ... speed1 speed2 speed1_value speed1_unit speed_mph height_value height_unit height_ft Inversions_clean Gforce_clean
0 Switchback Railway 600 ft (180 m) 6 mph (9.7 km/h) Coney Island Removed June 16, 1884 Wood LaMarcus Adna Thompson NaN Lift Packed ... 6 mph 9.7 km/h 6.0 mph 6.0 50.0 ft NaN 0 2.9
1 Flip Flap Railway NaN NaN Sea Lion Park Removed 1895 Wood Lina Beecher NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 1 12.0
2 Switchback Railway (Euclid Beach Park) NaN NaN Cleveland, Ohio, United States Closed NaN Other NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 0 NaN
3 Loop the Loop (Coney Island) NaN NaN Other Removed 1901 Steel Edwin Prescott NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 1 NaN
4 Loop the Loop (Young's Pier) NaN NaN Other Removed 1901 Steel Edwin Prescott NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 1 NaN

5 rows × 56 columns

Q1 - How many columns and rows are in the dataset?

dataset.shape
(1087, 56)
The dataset contains 1,087 rows and 56 columns.

Q2 - Is there any missing data?

dataset.isnull().values.any()
True
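The result confirms that the dataset has missing values, but not where they are. A minimal sketch of a per-column breakdown follows, using a small made-up frame (`sample` and its values are illustrative and not taken from coaster_db.csv):

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the real dataset; all values are made up.
sample = pd.DataFrame({
    "coaster_name": ["A", "B", "C", "D"],
    "speed_mph": [6.0, np.nan, np.nan, 53.0],
    "height_ft": [np.nan, np.nan, np.nan, 111.0],
})

# Number of missing values in each column, largest first.
missing_counts = sample.isnull().sum().sort_values(ascending=False)

# Share of rows that are missing in each column, as a percentage.
missing_pct = (sample.isnull().mean() * 100).round(1)

print(missing_counts)
print(missing_pct)
```

Applied to the real frame, the same two expressions would work with `dataset` in place of `sample`.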

Q3 - Display the summary statistics of the numeric columns using the describe method

dataset.describe()
        Inversions  year_introduced     latitude    longitude  speed1_value    speed_mph  height_value    height_ft  Inversions_clean  Gforce_clean
count   932.000000      1087.000000   812.000000   812.000000    937.000000   937.000000    965.000000   171.000000       1087.000000    362.000000
mean      1.547210      1994.986201    38.373484   -41.595373     53.850374    48.617289     89.575171   101.996491          1.326587      3.824006
std       2.114073        23.475248    15.516596    72.285227     23.385518    16.678031    136.246444    67.329092          2.030854      0.989998
min       0.000000      1884.000000   -48.261700  -123.035700      5.000000     5.000000      4.000000    13.100000          0.000000      0.800000
25%       0.000000      1989.000000    35.031050   -84.552200     40.000000    37.300000     44.000000    51.800000          0.000000      3.400000
50%       0.000000      2000.000000    40.289800   -76.653600     50.000000    49.700000     79.000000    91.200000          0.000000      4.000000
75%       3.000000      2010.000000    44.799600     2.778100     63.000000    58.000000    113.000000   131.200000          2.000000      4.500000
max      14.000000      2022.000000    63.230900   153.426500    240.000000   149.100000   3937.000000   377.300000         14.000000     12.000000

Q4 - Rename various columns

dataset.rename(columns={'coaster_name': 'Coaster_Name', 'year_introduced': 'Year_Introduced',
                        'opening_date_clean': 'Opening_Date', 'speed_mph': 'Speed_mph',
                        'height_ft': 'Height_ft', 'Inversions_clean': 'Inversions',
                        'Gforce_clean': 'Gforce'}, inplace=True)
dataset
Coaster_Name Length Speed Location Status Opening date Type Manufacturer Height restriction Model ... speed1 speed2 speed1_value speed1_unit Speed_mph height_value height_unit Height_ft Inversions Gforce
0 Switchback Railway 600 ft (180 m) 6 mph (9.7 km/h) Coney Island Removed June 16, 1884 Wood LaMarcus Adna Thompson NaN Lift Packed ... 6 mph 9.7 km/h 6.0 mph 6.0 50.0 ft NaN 0 2.9
1 Flip Flap Railway NaN NaN Sea Lion Park Removed 1895 Wood Lina Beecher NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 1 12.0
2 Switchback Railway (Euclid Beach Park) NaN NaN Cleveland, Ohio, United States Closed NaN Other NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 0 NaN
3 Loop the Loop (Coney Island) NaN NaN Other Removed 1901 Steel Edwin Prescott NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 1 NaN
4 Loop the Loop (Young's Pier) NaN NaN Other Removed 1901 Steel Edwin Prescott NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN 1 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1082 American Dreier Looping 3,444 ft (1,050 m) 53 mph (85 km/h) Other NaN NaN Steel Anton Schwarzkopf 55 in (140 cm) NaN ... 53 mph 85 km/h 53.0 mph 53.0 111.0 ft NaN 3 4.7
1083 Pantheon (roller coaster) 3,328 ft (1,014 m) 73 mph (117 km/h) Busch Gardens Williamsburg Under construction 2022 Steel – Launched Intamin NaN Blitz Coaster ... 73 mph 117 km/h 73.0 mph 73.0 178.0 ft NaN 2 NaN
1084 Tron Lightcycle Power Run 3,169.3 ft (966.0 m) 59.3[1] mph (95.4 km/h) Other NaN June 16, 2016 Steel – Launched Vekoma 4[2] ft (122 cm) Motorbike roller coaster ... 59.3 mph 95.4 km/h 59.3 mph 59.3 78.1 ft NaN 0 4.0
1085 Tumbili 770 ft (230 m) 34 mph (55 km/h) Kings Dominion Under construction NaN Steel – 4th Dimension – Wing Coaster S&S – Sansei Technologies NaN 4D Free Spin ... 34 mph 55 km/h 34.0 mph 34.0 112.0 ft NaN 0 NaN
1086 Wonder Woman Flight of Courage 3,300 ft (1,000 m) 58 mph (93 km/h) Six Flags Magic Mountain Under construction 2022 Steel – Single-rail Rocky Mountain Construction NaN Raptor – Custom ... 58 mph 93 km/h 58.0 mph 58.0 131.0 ft NaN 3 NaN

1087 rows × 56 columns
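One side effect is worth checking. The summary statistics in Q3 list both an `Inversions` and an `Inversions_clean` column, so renaming `Inversions_clean` to `Inversions` leaves two columns with the same label. A minimal sketch of how such a clash could be detected, on a made-up frame; the alternative label `Inversions_Clean` is only a hypothetical choice:

```python
import pandas as pd

# Illustrative frame holding just the two relevant columns; values are made up.
sample = pd.DataFrame({
    "Inversions": [0.0, 1.0, 3.0],
    "Inversions_clean": [0, 1, 3],
})

# Renaming onto an existing label silently produces a duplicate column name.
clashing = sample.rename(columns={"Inversions_clean": "Inversions"})
duplicate_labels = clashing.columns[clashing.columns.duplicated()].tolist()

# A distinct label (hypothetical) keeps every column uniquely addressable.
safe = sample.rename(columns={"Inversions_clean": "Inversions_Clean"})

print(duplicate_labels)
print(safe.columns.is_unique)
```

With duplicate labels, selecting `clashing['Inversions']` returns a two-column DataFrame rather than a single Series, which can surprise later steps.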

Q5 - Are there any duplicated rows?

dataset.duplicated().sum()
0
No, the dataset contains no duplicated rows.

Q6 - What are the top 3 years with the most roller coasters introduced?

(dataset.groupby('Year_Introduced')
        .agg(No_of_coaster=('Year_Introduced', 'count'))
        .sort_values(by='No_of_coaster', ascending=False)
        .head(3)
        .reset_index())
Year_Introduced No_of_coaster
0 1999 49
1 2000 47
2 1998 32
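The same ranking can also be obtained with `value_counts`, which counts and sorts in one step. This is an alternative sketch rather than the method used above, shown on a made-up column of years:

```python
import pandas as pd

# Illustrative years only; not the real data.
sample = pd.DataFrame({"Year_Introduced": [1999, 1999, 1999, 2000, 2000, 1998]})

top_years = (
    sample["Year_Introduced"]
    .value_counts()                    # counts per year, sorted descending
    .head(3)                           # keep the three most frequent years
    .rename_axis("Year_Introduced")    # name the index so it becomes a column
    .reset_index(name="No_of_coaster")
)

print(top_years)
```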

Q7 - What is the average speed? Also display a plot to show its distribution.

dataset['speed1_value'].mean()
53.850373532550684
plt.hist(dataset['speed1_value'], bins=20)
plt.title('Speed Value Distribution')
plt.xlabel('speed1_value')
plt.show()
[Figure: histogram of speed1_value, titled 'Speed Value Distribution']
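A caveat on units: `speed1_value` sits next to a `speed1_unit` column, and the summary in Q3 shows a maximum of 240 for `speed1_value` against 149.1 for `speed_mph`, which is what one would expect if some `speed1_value` entries are in km/h (240 km/h is about 149.1 mph). If so, the mph-normalised column (mean 48.617289 in Q3, renamed to `Speed_mph` in Q4) is the safer basis for an average. A sketch of the check on made-up rows:

```python
import numpy as np
import pandas as pd

# Illustrative rows only; the unit mix here is an assumption for demonstration.
sample = pd.DataFrame({
    "speed1_value": [6.0, 85.0, 73.0, np.nan],
    "speed1_unit": ["mph", "km/h", "mph", np.nan],
    "Speed_mph": [6.0, 52.8, 73.0, np.nan],
})

# How many readings are recorded in each unit.
unit_counts = sample["speed1_unit"].value_counts()

# Mean over mixed units versus mean over the mph-normalised column.
mixed_mean = sample["speed1_value"].mean()
mph_mean = sample["Speed_mph"].mean()

print(unit_counts)
print(mixed_mean, mph_mean)
```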

Q8 - Explore the feature relationships. Are there any positively or negatively correlated relationships?

dataset.corr(numeric_only=True)
                 Inversions  Year_Introduced   latitude  longitude  speed1_value  Speed_mph  height_value  Height_ft  Inversions     Gforce
Inversions         1.000000         0.211003  -0.009815   0.061589      0.163419   0.252209      0.094811   0.171330    1.000000   0.356865
Year_Introduced    0.211003         1.000000  -0.070982   0.175913      0.210191   0.204853      0.087687   0.232150    0.228758  -0.066657
latitude          -0.009815        -0.070982   1.000000  -0.298488     -0.121847  -0.063757     -0.004265   0.011492   -0.014043   0.042871
longitude          0.061589         0.175913  -0.298488   1.000000      0.301179   0.051063     -0.092764   0.159733    0.087160   0.016485
speed1_value       0.163419         0.210191  -0.121847   0.301179      1.000000   0.851667      0.088761   0.815103    0.176105   0.379962
Speed_mph          0.252209         0.204853  -0.063757   0.051063      0.851667   1.000000      0.241461   0.829404    0.265763   0.489337
height_value       0.094811         0.087687  -0.004265  -0.092764      0.088761   0.241461      1.000000   1.000000    0.108199   0.337386
Height_ft          0.171330         0.232150   0.011492   0.159733      0.815103   0.829404      1.000000   1.000000    0.164246   0.475020
Inversions         1.000000         0.228758  -0.014043   0.087160      0.176105   0.265763      0.108199   0.164246    1.000000   0.345106
Gforce             0.356865        -0.066657   0.042871   0.016485      0.379962   0.489337      0.337386   0.475020    0.345106   1.000000
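Reading the matrix, the strongest positive relationship between distinct quantities is speed versus height (Speed_mph and Height_ft, 0.829404), followed more moderately by Speed_mph and Gforce (0.489337). The most negative value is latitude versus longitude (-0.298488), which is weak. A sketch of how the pairs could be ranked programmatically, on a small made-up frame (values are illustrative only):

```python
import numpy as np
import pandas as pd

# Illustrative numeric frame; not the real data.
sample = pd.DataFrame({
    "Speed_mph": [10.0, 20.0, 30.0, 40.0],
    "Height_ft": [20.0, 40.0, 60.0, 80.0],
    "Year_Introduced": [2000, 1990, 2010, 1995],
})

corr = sample.corr()

# Keep only the upper triangle so each pair appears once and the diagonal is dropped.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
upper = corr.where(mask)

# One row per column pair, ordered by absolute correlation.
ranked = upper.stack().dropna().sort_values(key=abs, ascending=False)

print(ranked)
```

Sorting by absolute value keeps strong negative relationships near the top instead of pushing them to the bottom of the list.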

Q9 - Which coasters are the most used over the years?

(dataset.groupby('Coaster_Name')
        .agg(No_of_times_used=('Coaster_Name', 'count'))
        .sort_values(by='No_of_times_used', ascending=False)
        .head(10)
        .reset_index())
Coaster_Name No_of_times_used
0 Batman: The Ride 7
1 Flight of the Hippogriff 4
2 Lego Technic Test Track 4
3 American Dreier Looping 4
4 Big Thunder Mountain Railroad 4
5 Journey to Atlantis 3
6 Flashback (Six Flags Magic Mountain) 3
7 Alpine Bobsled 3
8 Pandemonium (roller coaster) 3
9 Super Grover's Box Car Derby 3
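A count above 1 here means the same coaster name occurs in several rows, which may reflect rides sharing a name at different parks rather than repeated use of one ride. A sketch, on made-up rows, of how the number of distinct locations per name could be shown next to the row count (column names follow the renamed dataset; `No_of_rows` and `No_of_locations` are hypothetical labels):

```python
import pandas as pd

# Illustrative rows only; names and parks are made up.
sample = pd.DataFrame({
    "Coaster_Name": ["Ride A", "Ride A", "Ride A", "Ride B", "Ride B", "Ride C"],
    "Location": ["Park 1", "Park 2", "Park 3", "Park 1", "Park 1", "Park 4"],
})

name_summary = (
    sample.groupby("Coaster_Name")
    .agg(
        No_of_rows=("Location", "size"),          # rows carrying this name
        No_of_locations=("Location", "nunique"),  # distinct parks for this name
    )
    .sort_values(by="No_of_rows", ascending=False)
    .reset_index()
)

print(name_summary)
```

A name with several rows but a single location would point to repeated entries for one ride, while several locations suggest distinct rides with a shared name.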