import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
Data Analysis Workout 06 - Roller Coaster Analysis
dataset = pd.read_csv("C:/Users/Dechamp/Downloads/coaster_db.csv")
dataset.head(5)
| | coaster_name | Length | Speed | Location | Status | Opening date | Type | Manufacturer | Height restriction | Model | ... | speed1 | speed2 | speed1_value | speed1_unit | speed_mph | height_value | height_unit | height_ft | Inversions_clean | Gforce_clean |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Switchback Railway | 600 ft (180 m) | 6 mph (9.7 km/h) | Coney Island | Removed | June 16, 1884 | Wood | LaMarcus Adna Thompson | NaN | Lift Packed | ... | 6 mph | 9.7 km/h | 6.0 | mph | 6.0 | 50.0 | ft | NaN | 0 | 2.9 |
1 | Flip Flap Railway | NaN | NaN | Sea Lion Park | Removed | 1895 | Wood | Lina Beecher | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1 | 12.0 |
2 | Switchback Railway (Euclid Beach Park) | NaN | NaN | Cleveland, Ohio, United States | Closed | NaN | Other | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN |
3 | Loop the Loop (Coney Island) | NaN | NaN | Other | Removed | 1901 | Steel | Edwin Prescott | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1 | NaN |
4 | Loop the Loop (Young's Pier) | NaN | NaN | Other | Removed | 1901 | Steel | Edwin Prescott | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1 | NaN |
5 rows × 56 columns
Q1 - How many columns and rows are in the dataset?
dataset.shape
(1087, 56)
The dataset contains 1087 rows and 56 columns.
Q2 - Is there any missing data?
dataset.isnull().values.any()
True
Yes, the dataset has missing data; the preview above already shows many NaN values.
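`isnull().values.any()` only gives a yes/no answer. A per-column count shows where the gaps are (the `describe()` output below hints at this, e.g. `Gforce_clean` has only 362 non-null values out of 1087). A minimal sketch on a small hypothetical toy frame, not the real dataset:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the coaster dataset
toy = pd.DataFrame({
    "Coaster_Name": ["A", "B", "C"],
    "Speed_mph": [6.0, np.nan, 53.0],
    "Gforce": [2.9, np.nan, np.nan],
})

# Number of missing values in each column
missing_per_column = toy.isnull().sum()
# Share of missing values in each column (0.0 to 1.0)
missing_share = toy.isnull().mean()

print(missing_per_column)
print(missing_share.round(2))
```

On the real data the same calls would be `dataset.isnull().sum()` and `dataset.isnull().mean()`.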
Q3 - Display the summary statistics of the numeric columns using the describe method
dataset.describe()
| | Inversions | year_introduced | latitude | longitude | speed1_value | speed_mph | height_value | height_ft | Inversions_clean | Gforce_clean |
---|---|---|---|---|---|---|---|---|---|---|
count | 932.000000 | 1087.000000 | 812.000000 | 812.000000 | 937.000000 | 937.000000 | 965.000000 | 171.000000 | 1087.000000 | 362.000000 |
mean | 1.547210 | 1994.986201 | 38.373484 | -41.595373 | 53.850374 | 48.617289 | 89.575171 | 101.996491 | 1.326587 | 3.824006 |
std | 2.114073 | 23.475248 | 15.516596 | 72.285227 | 23.385518 | 16.678031 | 136.246444 | 67.329092 | 2.030854 | 0.989998 |
min | 0.000000 | 1884.000000 | -48.261700 | -123.035700 | 5.000000 | 5.000000 | 4.000000 | 13.100000 | 0.000000 | 0.800000 |
25% | 0.000000 | 1989.000000 | 35.031050 | -84.552200 | 40.000000 | 37.300000 | 44.000000 | 51.800000 | 0.000000 | 3.400000 |
50% | 0.000000 | 2000.000000 | 40.289800 | -76.653600 | 50.000000 | 49.700000 | 79.000000 | 91.200000 | 0.000000 | 4.000000 |
75% | 3.000000 | 2010.000000 | 44.799600 | 2.778100 | 63.000000 | 58.000000 | 113.000000 | 131.200000 | 2.000000 | 4.500000 |
max | 14.000000 | 2022.000000 | 63.230900 | 153.426500 | 240.000000 | 149.100000 | 3937.000000 | 377.300000 | 14.000000 | 12.000000 |
Q4 - Rename various columns
# Note: the raw data already has an 'Inversions' column, so renaming
# 'Inversions_clean' to 'Inversions' leaves two columns with that name
dataset.rename(columns={'coaster_name': 'Coaster_Name', 'year_introduced': 'Year_Introduced',
                        'opening_date_clean': 'Opening_Date', 'speed_mph': 'Speed_mph',
                        'height_ft': 'Height_ft', 'Inversions_clean': 'Inversions',
                        'Gforce_clean': 'Gforce'}, inplace=True)
dataset
| | Coaster_Name | Length | Speed | Location | Status | Opening date | Type | Manufacturer | Height restriction | Model | ... | speed1 | speed2 | speed1_value | speed1_unit | Speed_mph | height_value | height_unit | Height_ft | Inversions | Gforce |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Switchback Railway | 600 ft (180 m) | 6 mph (9.7 km/h) | Coney Island | Removed | June 16, 1884 | Wood | LaMarcus Adna Thompson | NaN | Lift Packed | ... | 6 mph | 9.7 km/h | 6.0 | mph | 6.0 | 50.0 | ft | NaN | 0 | 2.9 |
1 | Flip Flap Railway | NaN | NaN | Sea Lion Park | Removed | 1895 | Wood | Lina Beecher | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1 | 12.0 |
2 | Switchback Railway (Euclid Beach Park) | NaN | NaN | Cleveland, Ohio, United States | Closed | NaN | Other | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | NaN |
3 | Loop the Loop (Coney Island) | NaN | NaN | Other | Removed | 1901 | Steel | Edwin Prescott | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1 | NaN |
4 | Loop the Loop (Young's Pier) | NaN | NaN | Other | Removed | 1901 | Steel | Edwin Prescott | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 1 | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1082 | American Dreier Looping | 3,444 ft (1,050 m) | 53 mph (85 km/h) | Other | NaN | NaN | Steel | Anton Schwarzkopf | 55 in (140 cm) | NaN | ... | 53 mph | 85 km/h | 53.0 | mph | 53.0 | 111.0 | ft | NaN | 3 | 4.7 |
1083 | Pantheon (roller coaster) | 3,328 ft (1,014 m) | 73 mph (117 km/h) | Busch Gardens Williamsburg | Under construction | 2022 | Steel – Launched | Intamin | NaN | Blitz Coaster | ... | 73 mph | 117 km/h | 73.0 | mph | 73.0 | 178.0 | ft | NaN | 2 | NaN |
1084 | Tron Lightcycle Power Run | 3,169.3 ft (966.0 m) | 59.3[1] mph (95.4 km/h) | Other | NaN | June 16, 2016 | Steel – Launched | Vekoma | 4[2] ft (122 cm) | Motorbike roller coaster | ... | 59.3 mph | 95.4 km/h | 59.3 | mph | 59.3 | 78.1 | ft | NaN | 0 | 4.0 |
1085 | Tumbili | 770 ft (230 m) | 34 mph (55 km/h) | Kings Dominion | Under construction | NaN | Steel – 4th Dimension – Wing Coaster | S&S – Sansei Technologies | NaN | 4D Free Spin | ... | 34 mph | 55 km/h | 34.0 | mph | 34.0 | 112.0 | ft | NaN | 0 | NaN |
1086 | Wonder Woman Flight of Courage | 3,300 ft (1,000 m) | 58 mph (93 km/h) | Six Flags Magic Mountain | Under construction | 2022 | Steel – Single-rail | Rocky Mountain Construction | NaN | Raptor – Custom | ... | 58 mph | 93 km/h | 58.0 | mph | 58.0 | 131.0 | ft | NaN | 3 | NaN |
1087 rows × 56 columns
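The `describe()` output above lists both `Inversions` and `Inversions_clean`, so the rename in Q4 leaves two columns labelled `Inversions`, and `dataset['Inversions']` would then return a two-column DataFrame instead of a Series. A minimal sketch of the problem and one possible fix (drop the raw column before renaming), on a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy frame with both the raw and the cleaned inversion columns,
# mirroring the 'Inversions' / 'Inversions_clean' pair in the dataset
toy = pd.DataFrame({
    "Inversions": [0.0, None, 3.0],
    "Inversions_clean": [0, 1, 3],
})

# Renaming straight away leaves two columns called 'Inversions'
renamed = toy.rename(columns={"Inversions_clean": "Inversions"})
print(list(renamed.columns))  # ['Inversions', 'Inversions']

# Dropping the raw column first keeps the labels unique
fixed = toy.drop(columns="Inversions").rename(columns={"Inversions_clean": "Inversions"})
print(list(fixed.columns))  # ['Inversions']
```

Whether to keep the raw column under another name (for example `Inversions_raw`, a hypothetical label) instead of dropping it is a judgment call.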
Q5 - Are there any duplicated rows?
dataset.duplicated().sum()
0
No, there are no fully duplicated rows in the dataset.
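`duplicated()` compares entire rows, so a count of 0 does not mean every coaster name is unique (Q9 shows names appearing up to 7 times). Passing `subset` checks duplicates on selected columns only. A small sketch; the park names "Park X" and "Park Y" are hypothetical:

```python
import pandas as pd

# Hypothetical toy frame: the same name at two different parks
toy = pd.DataFrame({
    "Coaster_Name": ["Batman: The Ride", "Batman: The Ride", "Tumbili"],
    "Location": ["Park X", "Park Y", "Kings Dominion"],
})

# Whole-row duplicates: none, because Location differs
print(toy.duplicated().sum())  # 0
# Duplicates judged on the name alone
print(toy.duplicated(subset=["Coaster_Name"]).sum())  # 1
```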
Q6 - What are the top 3 years with the most roller coasters introduced?
(dataset.groupby('Year_Introduced')
        .agg(No_of_coaster=('Year_Introduced', 'count'))
        .sort_values(by='No_of_coaster', ascending=False)
        .head(3)
        .reset_index())
| | Year_Introduced | No_of_coaster |
---|---|---|
0 | 1999 | 49 |
1 | 2000 | 47 |
2 | 1998 | 32 |
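The same "count per year, sort descending, take the top 3" pattern can be written more compactly with `value_counts()`, which already sorts counts in descending order. A sketch on hypothetical toy years; `rename_axis` and `reset_index(name=...)` are used so the column names match the table above regardless of pandas version:

```python
import pandas as pd

# Hypothetical toy years
toy = pd.DataFrame({"Year_Introduced": [1999, 1999, 1999, 2000, 2000, 1998]})

# value_counts() counts each year and sorts in descending order
top_years = (toy["Year_Introduced"]
             .value_counts()
             .head(3)
             .rename_axis("Year_Introduced")
             .reset_index(name="No_of_coaster"))
print(top_years)
```

Like `count`, `value_counts()` skips missing years by default; the order of tied counts is not guaranteed to match the `groupby` version.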
Q7 - What is the average speed? Also display a plot to show its distribution.
# speed1_value keeps each source's unit (see speed1_unit); Speed_mph is the mph-normalized column
dataset['speed1_value'].mean()
53.850373532550684
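This mean is likely inflated by mixed units: `speed1_value` comes with a separate `speed1_unit` column, and in the `describe()` output its maximum is 240 while the maximum of `speed_mph` is 149.1, which matches 240 km/h converted to mph. The mph-normalized mean from `describe()` is about 48.6 mph (`dataset['Speed_mph'].mean()` after the Q4 rename). A minimal sketch of the conversion on hypothetical toy rows, assuming the unit strings are exactly `"mph"` and `"km/h"`:

```python
import pandas as pd

KMH_PER_MPH = 1.609344  # 1 mile = 1.609344 km exactly

# Hypothetical toy rows: one speed given in mph, one in km/h
toy = pd.DataFrame({
    "speed1_value": [53.0, 240.0],
    "speed1_unit": ["mph", "km/h"],
})

# Keep mph values as-is; divide km/h values by 1.609344
is_kmh = toy["speed1_unit"] == "km/h"
toy["speed_in_mph"] = toy["speed1_value"].where(~is_kmh, toy["speed1_value"] / KMH_PER_MPH)

print(toy["speed1_value"].mean())            # averages mph and km/h together
print(round(toy["speed_in_mph"].mean(), 1))  # average in mph only
```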
# Histogram of the raw speed values, split into 20 equal-width bins (NaNs dropped)
plt.hist(dataset['speed1_value'].dropna(), bins=20)
plt.title('Speed Value Distribution')
(Figure: histogram titled "Speed Value Distribution")
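`plt.hist` computes its bars with `np.histogram`, so the bin counts behind the plot can be inspected directly. A sketch on hypothetical toy speeds, using 4 bins instead of 20 so the result is easy to read:

```python
import numpy as np
import pandas as pd

# Hypothetical toy speeds in mph, with one missing value
speeds = pd.Series([6.0, 40.0, 45.0, 50.0, 58.0, np.nan, 149.1])

# Drop NaNs, then split the min..max range into 4 equal-width bins
counts, edges = np.histogram(speeds.dropna(), bins=4)
print(counts)  # number of values falling in each bin
print(edges)   # 5 edges bounding the 4 bins
```

The last bin includes its right edge, which is why the maximum value (149.1) is counted.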
Q9 - Which coaster names appear most often over the years?
(dataset.groupby('Coaster_Name')
        .agg(No_of_times_used=('Coaster_Name', 'count'))
        .sort_values(by='No_of_times_used', ascending=False)
        .head(10)
        .reset_index())
| | Coaster_Name | No_of_times_used |
---|---|---|
0 | Batman: The Ride | 7 |
1 | Flight of the Hippogriff | 4 |
2 | Lego Technic Test Track | 4 |
3 | American Dreier Looping | 4 |
4 | Big Thunder Mountain Railroad | 4 |
5 | Journey to Atlantis | 3 |
6 | Flashback (Six Flags Magic Mountain) | 3 |
7 | Alpine Bobsled | 3 |
8 | Pandemonium (roller coaster) | 3 |
9 | Super Grover's Box Car Derby | 3 |
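A repeated name may be the same design built at several parks, or the same ride listed more than once; counting distinct `Location` values per name is one rough way to tell them apart. Note that many rows in the preview use the placeholder location "Other", so this check can undercount. A sketch on hypothetical toy rows ("Park X" and "Park Y" are made up):

```python
import pandas as pd

# Hypothetical toy rows
toy = pd.DataFrame({
    "Coaster_Name": ["Batman: The Ride"] * 3 + ["Tumbili"],
    "Location": ["Park X", "Park Y", "Park Y", "Kings Dominion"],
})

# For each name: how many rows, and how many distinct locations
summary = (toy.groupby("Coaster_Name")
              .agg(No_of_rows=("Location", "count"),
                   No_of_locations=("Location", "nunique"))
              .sort_values(by="No_of_rows", ascending=False)
              .reset_index())
print(summary)
```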