Please submit your .Rmd and .html files in Sakai. If you are working together, both people should submit the files.
60 / 60 points total
The goal of the midterm project is to showcase skills that you have learned in class so far. The midterm is open note, but if you use someone else’s code, you must attribute them.
# This code came from
Pick a dataset. Ideally, the dataset should be around 2000 rows, and should have both categorical and numeric covariates.
Potential Sources for data: Tidy Tuesday: https://github.com/rfordatascience/tidytuesday
Note that most of these are .csv files. There is code to load the files from csv for each of the datasets and a short description of the variables, or you can upload the .csv file into your data folder.
You may use another dataset or your own data, but please make sure it is de-identified.
Please schedule a time with Eric or me to discuss your dataset and research question. We just want to look at the data and make sure that it is appropriate for your question.
If you’d like to work together, that is encouraged, but you must divide the work equitably and you must note who worked on what. This is probably easiest as notes in the text. Please let Eric or me know that you’ll be working together.
No acknowledgements of contributions = -10 points overall.
I will take off points (-5 points for each section) if you don’t add observations and notes in your RMarkdown document. I want you to think and reason through your analysis, even if they are preliminary thoughts.
Define your research question below. What about the data interests you? What is a specific question you want to find out about the data?
I am interested in what the most dangerous park ride is and how that varies by state. My specific research question is: Is there a difference between states in which park rides are the most dangerous?
Given your question, what is your expectation about the data?
I expect the data to show that the most dangerous park rides differ by state because of local safety regulations and each state's climate.
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(skimr)
library(ggplot2)
Load the data below and use dplyr::glimpse() or skimr::skim() on the data. You should upload the data file into the data directory.
parks <- read_csv("data/saferparks.csv")
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## .default = col_character(),
## acc_id = col_double(),
## num_injured = col_double(),
## age_youngest = col_double(),
## mechanical = col_double(),
## op_error = col_double(),
## employee = col_double()
## )
## ℹ Use `spec()` for the full column specifications.
glimpse(parks)
## Rows: 8,351
## Columns: 23
## $ acc_id <dbl> 1005813, 1004032, 1007658, 1007098, 1000094, 100…
## $ acc_date <chr> "6/12/2010", "6/12/2010", "7/10/2010", "7/10/201…
## $ acc_state <chr> "OH", "OH", "CA", "CA", "CO", "WI", "WI", "CO", …
## $ acc_city <chr> "Cleveland", "Cleveland", "Anaheim", "Carlsbad",…
## $ fix_port <chr> "F", "P", "F", "F", "F", "F", "P", "F", "P", "F"…
## $ source <chr> "Ohio Dept. of Agriculture", "United States Cons…
## $ bus_type <chr> "Sports or recreation facility", "Sports or recr…
## $ industry_sector <chr> "recreation", "recreation", "amusement ride", "w…
## $ device_category <chr> "inflatable", "inflatable", "water ride", "float…
## $ device_type <chr> "Inflatable slide", "Inflatable slide", "Boat ri…
## $ tradename_or_generic <chr> "inflatable slide", "inflatable slide", "boat ri…
## $ manufacturer <chr> "Scherba Industries / Inflatable Images", "Scher…
## $ num_injured <dbl> 9, 8, 1, 1, 1, 1, 1, 20, 1, 1, 2, 1, 1, 1, 1, 1,…
## $ age_youngest <dbl> NA, 54, 37, 37, NA, 12, 16, NA, 14, NA, 16, 36, …
## $ gender <chr> NA, "M", "F", "F", "M", "F", "F", NA, "M", NA, "…
## $ acc_desc <chr> "Inflatable slide tipped over while 7-9 patrons …
## $ injury_desc <chr> "The man who was crushed by the device died 9 da…
## $ report <chr> "https://saferparksdata.org/sites/default/files/…
## $ category <chr> "Device tipped over, blew away, or collapsed", "…
## $ mechanical <dbl> NA, NA, NA, NA, 1, NA, 1, NA, NA, NA, 1, NA, NA,…
## $ op_error <dbl> 1, 1, NA, NA, NA, 1, NA, 1, NA, NA, NA, NA, NA, …
## $ employee <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ notes <chr> "http://www.cleveland.com/metro/index.ssf/2012/1…
skim(parks)
Name | parks |
Number of rows | 8351 |
Number of columns | 23 |
_______________________ | |
Column type frequency: | |
character | 17 |
numeric | 6 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
acc_date | 0 | 1.00 | 8 | 10 | 0 | 1845 | 0 |
acc_state | 0 | 1.00 | 2 | 2 | 0 | 40 | 0 |
acc_city | 118 | 0.99 | 4 | 20 | 0 | 674 | 0 |
fix_port | 0 | 1.00 | 1 | 1 | 0 | 3 | 0 |
source | 0 | 1.00 | 12 | 57 | 0 | 30 | 0 |
bus_type | 0 | 1.00 | 4 | 29 | 0 | 17 | 0 |
industry_sector | 0 | 1.00 | 7 | 14 | 0 | 4 | 0 |
device_category | 0 | 1.00 | 7 | 23 | 0 | 21 | 0 |
device_type | 0 | 1.00 | 4 | 26 | 0 | 91 | 0 |
tradename_or_generic | 0 | 1.00 | 4 | 32 | 0 | 407 | 0 |
manufacturer | 3310 | 0.60 | 2 | 40 | 0 | 253 | 0 |
gender | 728 | 0.91 | 1 | 1 | 0 | 4 | 0 |
acc_desc | 3 | 1.00 | 4 | 1258 | 0 | 8023 | 0 |
injury_desc | 10 | 1.00 | 4 | 367 | 0 | 3985 | 0 |
report | 8273 | 0.01 | 77 | 86 | 0 | 77 | 0 |
category | 0 | 1.00 | 5 | 54 | 0 | 49 | 0 |
notes | 8290 | 0.01 | 9 | 675 | 0 | 41 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
acc_id | 0 | 1.00 | 1.005e+06 | 3126.04 | 920315 | 1002160 | 1005414 | 1007676 | 1009907 | ▁▁▁▁▇ |
num_injured | 2 | 1.00 | 1.050e+00 | 0.71 | 0 | 1 | 1 | 1 | 30 | ▇▁▁▁▁ |
age_youngest | 684 | 0.92 | 2.460e+01 | 18.28 | 0 | 10 | 18 | 38 | 92 | ▇▃▃▁▁ |
mechanical | 7977 | 0.04 | 1.000e+00 | 0.00 | 1 | 1 | 1 | 1 | 1 | ▁▁▇▁▁ |
op_error | 8192 | 0.02 | 1.000e+00 | 0.00 | 1 | 1 | 1 | 1 | 1 | ▁▁▇▁▁ |
employee | 8306 | 0.01 | 1.000e+00 | 0.00 | 1 | 1 | 1 | 1 | 1 | ▁▁▇▁▁ |
If there are any quirks that you have to deal with, such as NA coded as something else or multiple tables, please make some notes here about what you need to do before you start transforming the data in the next section.
Make sure your data types are correct!
Examining the uploaded data, it looks like NA is coded correctly, but some of the columns are not stored as the correct type. acc_date should be stored as a date, and acc_state, fix_port, industry_sector, and gender should be stored as factors. Lastly, there are two NA values for num_injured. These may create errors in later analysis, so these accidents may be removed.
If the data needs to be transformed in any way (values recoded, pivoted, etc), do it here. Examples include transforming a continuous variable into a categorical using case_when(), etc.
parks$acc_date <- as.Date(parks$acc_date, format = "%m/%d/%Y")
parks$acc_state <- factor(parks$acc_state)
parks$fix_port <- factor(parks$fix_port)
parks$industry_sector <- factor(parks$industry_sector)
parks$gender <- factor(parks$gender)
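One thing worth checking after conversions like these is that as.Date() returns NA, without raising an error, for any string that does not match the given format. The following is a minimal sketch on made-up values; the vectors raw_dates and fix_port_demo are hypothetical and are not taken from the parks data.

```r
# Hypothetical date strings; the last one is deliberately in a different format.
raw_dates <- c("6/12/2010", "7/10/2010", "2010-07-10")

# Parse with the same month/day/year format used for acc_date.
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")

# Entries that were present before parsing but are NA afterwards failed to parse.
failed <- !is.na(raw_dates) & is.na(parsed)
raw_dates[failed]

# A factor stores its distinct values as levels, which grouped summaries rely on.
fix_port_demo <- factor(c("F", "P", "F"))
levels(fix_port_demo)
```

Running the same failed check on the real acc_date column before overwriting it would show whether any accident dates were lost in the conversion.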
I also decided to recode the columns bus_type, device_category, device_type, and category as factors. I did this because I think it will make it easier to answer my research question.
parks$bus_type <- factor(parks$bus_type)
parks$device_category <- factor(parks$device_category)
parks$device_type <- factor(parks$device_type)
parks$category <- factor(parks$category)
Subset of the data and think of a question to answer the subset
To keep my data straight and to guard against mistakes, I have saved the uploaded data with the corrected data types as parks_full. The new parks dataframe is the one that will be analyzed further.
parks_full <- parks
I decided to keep the following columns for my analysis: acc_state, bus_type, device_category, device_type, num_injured, and category. I no longer need the remaining columns: acc_id, acc_date, acc_city, fix_port, source, industry_sector, tradename_or_generic, manufacturer, age_youngest, gender, acc_desc, injury_desc, report, mechanical, op_error, employee, and notes. I would have liked to keep the mechanical, op_error, and employee columns, since they show the cause of the accident, but the earlier skim() output shows that each of them is missing in roughly 8,000 of the 8,351 rows.
parks <- parks_full[c("acc_state", "bus_type", "device_category", "device_type", "num_injured", "category")]
NAs from num_injured
Since the goal of this analysis is to determine how many people are injured on park rides, I do not see a need for the two accidents that have NA recorded for num_injured. I have displayed these two accident records below and then removed them from the parks dataframe.
subset(parks, is.na(parks$num_injured))
## # A tibble: 2 x 6
## acc_state bus_type device_category device_type num_injured category
## <fct> <fct> <fct> <fct> <dbl> <fct>
## 1 NJ Amusement pa… cars & track rid… Track ride NA Derailm…
## 2 NH Carnival or … coaster Coaster - fami… NA Derailm…
parks <- subset(parks, ! is.na(parks$num_injured))
parks <- parks %>% mutate(acc_state_t5 = case_when(acc_state == "CA" ~ "CA", acc_state == "PA" ~ "PA", acc_state == "FL" ~ "FL", acc_state == "TX" ~ "TX", acc_state == "OK" ~ "OK"))
parks$acc_state_t5 <- factor(ifelse(is.na(parks$acc_state_t5), "other", parks$acc_state_t5), levels = c("CA", "PA", "FL", "TX", "OK", "other"))
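The five states above are written out by hand. As a hedged alternative, the most frequent states can be picked from the data itself, which avoids typing the list and makes the choice easy to recheck. The sketch below uses a made-up vector (acc_state_demo and n_top are hypothetical names) and ranks states by number of accident rows; ranking by the sum of num_injured would be a different choice and could give a different list. Note that the summary() output further down lists NJ with 519 accidents, ahead of OK with 404, so a count-based ranking on the real data may not reproduce the hand-written list.

```r
# Hypothetical state codes standing in for parks$acc_state.
acc_state_demo <- c("CA", "CA", "CA", "PA", "PA", "FL", "OH")

# Count rows per state and keep the names of the n most frequent states.
n_top <- 2
state_counts <- sort(table(acc_state_demo), decreasing = TRUE)
top_states <- names(state_counts)[seq_len(n_top)]

# Recode every other state as "other" and keep "other" as the last level.
acc_state_top <- factor(
  ifelse(acc_state_demo %in% top_states, acc_state_demo, "other"),
  levels = c(top_states, "other")
)
table(acc_state_top)
```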
Bonus points (5 points) for datasets that require merging of tables, but only if you reason through whether you should use left_join, inner_join, or right_join on these tables. No credit will be provided if you don’t.
Show your transformed table here. Use tools such as glimpse(), skim() or head() to illustrate your point.
skim(parks)
Name | parks |
Number of rows | 8349 |
Number of columns | 7 |
_______________________ | |
Column type frequency: | |
factor | 6 |
numeric | 1 |
________________________ | |
Group variables | None |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
acc_state | 0 | 1 | FALSE | 40 | CA: 2997, PA: 1883, FL: 974, TX: 551 |
bus_type | 0 | 1 | FALSE | 17 | Amu: 3666, Wat: 1767, Car: 700, Tra: 698 |
device_category | 0 | 1 | FALSE | 21 | wat: 1647, coa: 1183, spi: 914, tra: 698 |
device_type | 0 | 1 | FALSE | 91 | Coa: 863, Tra: 677, Go-: 620, Tub: 511 |
category | 0 | 1 | FALSE | 49 | Imp: 1114, Loa: 939, Bod: 688, Fal: 526 |
acc_state_t5 | 0 | 1 | FALSE | 6 | CA: 2997, PA: 1883, oth: 1540, FL: 974 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
num_injured | 0 | 1 | 1.05 | 0.71 | 0 | 1 | 1 | 1 | 30 | ▇▁▁▁▁ |
summary(parks)
## acc_state bus_type device_category
## CA :2997 Amusement park :3666 water slide :1647
## PA :1883 Water park :1767 coaster :1183
## FL : 974 Carnival or rental : 700 spinning : 914
## TX : 551 Trampoline park : 698 trampoline : 698
## NJ : 519 Family entertainment center: 484 go-kart : 620
## OK : 404 Go kart track : 228 cars & track rides: 585
## (Other):1021 (Other) : 806 (Other) :2702
## device_type num_injured
## Coaster - steel : 863 Min. : 0.000
## Trampoline court : 677 1st Qu.: 1.000
## Go-kart : 620 Median : 1.000
## Tube slide : 511 Mean : 1.053
## Aquatic play area: 333 3rd Qu.: 1.000
## Track ride : 302 Max. :30.000
## (Other) :5043
## category acc_state_t5
## Impact: hit something in participatory attraction:1114 CA :2997
## Load/Unload: scrape or stumble : 939 PA :1883
## Body pain (normal motion) : 688 FL : 974
## Fall: patron fell off inner tube, mat or board : 526 TX : 551
## Impact: hit something within ride vehicle : 519 OK : 404
## Illness or neurological symptoms : 427 other:1540
## (Other) :4136
Are the values what you expected for the variables? Why or Why not?
The values are somewhat what I expected. For the accident state, I am surprised that there are park accidents in only 40 states. I expected California and Florida to have the highest number of accidents. While they are both in the top 3, I am surprised that California has 3 times as many accidents as Florida and that Pennsylvania has about twice as many as Florida and two-thirds as many as California.
I was thinking that the number of accidents would be the highest for water parks and water slides. For the type of business, amusement parks have about twice as many accidents as water parks, which isn’t surprising but is not what I expected. While the most dangerous device category is the water slide, it is closely followed by coaster and spinning rides. Lastly, the more specific device type shows that steel coasters are the single most dangerous rides, which is somewhat surprising. More surprising is that trampoline courts and go-karts have only somewhat fewer accidents than steel coasters. I was also surprised that no water slide appears among the device types until the 4th highest accident count, which is for tube slides.
Lastly, the vast majority of accidents injured a single person, with a maximum of 30 injuries in one accident.
Use group_by()/summarize() to make a summary of the data here. The summary should be relevant to your research question.
totInj <- sum(parks$num_injured)
parks_summary_DCat <- parks %>% group_by(device_category) %>% summarise(sInj = sum(num_injured), pInj = round(sInj/totInj, 3) * 100) %>% arrange(desc(pInj)) %>% ungroup
## `summarise()` ungrouping output (override with `.groups` argument)
parks_summary_DCat
## # A tibble: 21 x 3
## device_category sInj pInj
## <fct> <dbl> <dbl>
## 1 water slide 1690 19.2
## 2 coaster 1218 13.9
## 3 spinning 1070 12.2
## 4 trampoline 698 7.9
## 5 cars & track rides 661 7.5
## 6 go-kart 648 7.4
## 7 water ride 540 6.1
## 8 inflatable 489 5.6
## 9 challenge activity 367 4.2
## 10 aquatic play 337 3.8
## # … with 11 more rows
The first summary shows that, across all of the data, the device categories that each cause more than 10% of park injuries are: water slide (1690; 19.2%), coaster (1218; 13.9%), and spinning (1070; 12.2%). This was somewhat expected based on the accident counts in the data.
for (state in unique(parks$acc_state_t5)) {
if (state != "other") {
parks_sub <- parks[parks$acc_state_t5 == state, ]
totInj_state <- sum(parks_sub[ , "num_injured"])
parks_plot_sum <- parks_sub %>% group_by(device_category) %>% summarise(sInj = sum(num_injured), pInj = round(sInj/totInj_state, 3) * 100) %>% arrange(desc(sInj)) %>% ungroup()
print(parks_plot_sum)
}
}
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 14 x 3
## device_category sInj pInj
## <fct> <dbl> <dbl>
## 1 coaster 737 24
## 2 water slide 550 17.9
## 3 cars & track rides 472 15.4
## 4 spinning 381 12.4
## 5 water ride 379 12.4
## 6 wave device 121 3.9
## 7 float attraction 94 3.1
## 8 aquatic play 87 2.8
## 9 vertical drop 81 2.6
## 10 other attraction 78 2.5
## 11 pendulum 47 1.5
## 12 challenge activity 34 1.10
## 13 trampoline 3 0.1
## 14 inflatable 1 0
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 16 x 3
## device_category sInj pInj
## <fct> <dbl> <dbl>
## 1 go-kart 385 38.3
## 2 water slide 202 20.1
## 3 challenge activity 191 19
## 4 spinning 83 8.3
## 5 wave device 24 2.4
## 6 other attraction 23 2.3
## 7 aquatic play 22 2.20
## 8 play equipment 18 1.8
## 9 pendulum 17 1.7
## 10 coaster 13 1.3
## 11 cars & track rides 12 1.2
## 12 water ride 5 0.5
## 13 inflatable 4 0.4
## 14 float attraction 3 0.3
## 15 unknown 1 0.1
## 16 vertical drop 1 0.1
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 17 x 3
## device_category sInj pInj
## <fct> <dbl> <dbl>
## 1 inflatable 133 31.4
## 2 spinning 56 13.2
## 3 go-kart 42 9.9
## 4 coaster 36 8.5
## 5 water slide 25 5.90
## 6 trampoline 21 5
## 7 pendulum 18 4.3
## 8 play equipment 17 4
## 9 aquatic play 16 3.8
## 10 other attraction 16 3.8
## 11 water ride 13 3.1
## 12 cars & track rides 12 2.8
## 13 float attraction 9 2.1
## 14 challenge activity 6 1.4
## 15 unknown 1 0.2
## 16 vertical drop 1 0.2
## 17 wave device 1 0.2
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 13 x 3
## device_category sInj pInj
## <fct> <dbl> <dbl>
## 1 water slide 285 50.5
## 2 coaster 82 14.5
## 3 challenge activity 52 9.2
## 4 go-kart 45 8
## 5 spinning 30 5.3
## 6 water ride 26 4.6
## 7 wave device 17 3
## 8 aquatic play 9 1.6
## 9 cars & track rides 9 1.6
## 10 pendulum 4 0.7
## 11 other attraction 2 0.4
## 12 vertical drop 2 0.4
## 13 inflatable 1 0.2
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 21 x 3
## device_category sInj pInj
## <fct> <dbl> <dbl>
## 1 trampoline 558 29.3
## 2 water slide 285 14.9
## 3 coaster 208 10.9
## 4 aquatic play 166 8.7
## 5 spinning 153 8
## 6 inflatable 88 4.6
## 7 water ride 76 4
## 8 cars & track rides 62 3.3
## 9 go-kart 56 2.9
## 10 play equipment 42 2.20
## # … with 11 more rows
rm(state)
rm(parks_sub)
rm(totInj_state)
rm(parks_plot_sum)
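The per-state tables above can also be produced without a loop by grouping on both the state and the device category. This is a minimal sketch on a made-up tibble (toy and by_state are hypothetical names), assuming dplyr 1.0.0 or later for the .groups argument. After summarise() with .groups = "drop_last" the result is still grouped by state, so sum(sInj) inside mutate() is the state total.

```r
library(dplyr)

# Hypothetical miniature of the parks data.
toy <- tibble(
  acc_state_t5    = c("CA", "CA", "CA", "FL", "FL"),
  device_category = c("coaster", "coaster", "water slide", "go-kart", "water slide"),
  num_injured     = c(2, 1, 1, 3, 1)
)

by_state <- toy %>%
  group_by(acc_state_t5, device_category) %>%
  # Sum injuries per state and category; the result stays grouped by state.
  summarise(sInj = sum(num_injured), .groups = "drop_last") %>%
  # Share of each state's injuries, as a percentage.
  mutate(pInj = round(100 * sInj / sum(sInj), 1)) %>%
  ungroup() %>%
  arrange(acc_state_t5, desc(sInj))

by_state
```

Setting .groups explicitly also silences the "ungrouping output" message, and because everything lives in one table there are no temporary objects to remove afterwards.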
What are your findings about the summary? Are they what you expected?
For each state, I defined the top injury sources as device categories with over 200 injuries or over 10% of the injuries in that state. In California, the top categories were: coaster (737; 24.0%), water slide (550; 17.9%), cars & track rides (472; 15.4%), spinning (381; 12.4%), and water ride (379; 12.4%). In Florida, the most dangerous device categories are go-kart (385; 38.3%) and water slide (202; 20.1%), with challenge activity (191; 19.0%) also above 10%. In Oklahoma, none of the categories have over 200 injuries, but the top ones are inflatable (133; 31.4%) and spinning (56; 13.2%). In Texas, only one category has over 200 injuries, the water slide (285; 50.5%), and coaster (82; 14.5%) is the only other category above 10%. Lastly, in Pennsylvania the most dangerous device categories are: trampoline (558; 29.3%), water slide (285; 14.9%), and coaster (208; 10.9%).
Make at least two plots that help you answer your question on the transformed or summarized data.
parks$device_category <- factor(parks$device_category, levels = unique(parks_summary_DCat$device_category[order(parks_summary_DCat$sInj)]))
# Map num_injured to weight so each bar is the sum of injuries rather than the number of accident rows
barplot_all <- ggplot(parks) + aes(y = device_category, weight = num_injured, fill = device_category) + geom_bar() + labs(title = "Most Dangerous Rides in all States", y = NULL) + theme(legend.position = "none")
print(barplot_all)
The bar plot above compares injury counts by device category across all states. Interestingly, with the categories ordered by injuries, the bar lengths trace a shape that looks somewhat like an exponential curve.
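When reading a bar plot like this one, it matters whether the bar length is a number of rows or a sum of a column. geom_bar() counts rows by default and sums a column only when that column is mapped to the weight aesthetic. The sketch below shows the difference on made-up data (toy, p_rows and p_sum are hypothetical names) and uses layer_data() to read back the values that ggplot2 computed for each bar.

```r
library(ggplot2)

# Hypothetical data: two accidents on coasters, one on a water slide.
toy <- data.frame(
  device_category = c("coaster", "coaster", "water slide"),
  num_injured     = c(5, 1, 2)
)

# Default geom_bar(): bar length is the number of rows (accidents).
p_rows <- ggplot(toy, aes(y = device_category)) + geom_bar()

# With the weight aesthetic: bar length is the sum of num_injured (injuries).
p_sum <- ggplot(toy, aes(y = device_category, weight = num_injured)) + geom_bar()

# layer_data() returns the computed statistics behind each bar.
layer_data(p_rows)$count
layer_data(p_sum)$count
```

An equivalent route is to plot an already summarised table with geom_col(), which draws the values as given.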
for (state in unique(parks$acc_state_t5)) {
if (state != "other") {
parks_sub <- parks[parks$acc_state_t5 == state, ]
totInj_state <- sum(parks_sub[ , "num_injured"])
parks_plot_sum <- parks_sub %>% group_by(device_category) %>% summarise(sInj = sum(num_injured), pInj = sInj/totInj_state) %>% arrange(desc(sInj)) %>% ungroup()
parks_sub$device_category <- factor(parks_sub$device_category, levels = unique(parks_plot_sum$device_category[order(parks_plot_sum$sInj)]))
print(ggplot(parks_sub) + aes(y = device_category, weight = num_injured, fill = device_category) + geom_bar() + xlim(0, 750) + labs(title = paste("Most Dangerous Rides in", state), y = NULL) + theme(legend.position = "none"))
}
}
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)
rm(state)
rm(parks_sub)
rm(totInj_state)
rm(parks_plot_sum)
The California bar plot shows that fewer device categories cause injuries there; a few categories have high injury counts and the rest are much lower.
The Florida bar plot shows that most categories fall below 50% of the maximum on the shared axis (California sets that maximum with just over 700 coaster injuries). Additionally, the injury count is surprisingly low in Florida.
The Oklahoma bar plot shows that more device categories cause injuries there than in California, Florida, or Texas.
Similar to California, the Texas bar plot shows that there are a limited number of device categories that cause injuries. Also, the plot shows just how many more injuries occur on water slides in Texas.
Lastly, the Pennsylvania plot shows why Pennsylvania made it into the top 5 states: mostly because of the number of trampoline injuries.
parks_summary <- parks[parks$acc_state_t5 != "other", ] %>% group_by(acc_state_t5, device_category) %>% summarise(sInj = sum(num_injured)) %>% ungroup
## `summarise()` regrouping output by 'acc_state_t5' (override with `.groups` argument)
ggplot(parks_summary) + aes(y = device_category, x = acc_state_t5, fill = sInj) + geom_tile() + labs(title = "Heat Map of Injuries by Device Type versus State", y = NULL, x = "States with the Top 5 Injuries", fill = "Number of Injuries")
The heat map shows the same patterns as the other summaries and plots. The water slide tiles are a lighter blue (more injuries) in California and Texas; the coaster, spinning, cars & track rides, and water ride categories only have high injury counts in California; the trampoline category only has a high injury count in Pennsylvania; go-karts are most dangerous in Florida; and challenge activities show a moderate number of injuries in Florida. Comparing all of the tiles to each other, Oklahoma is mostly dark blue (few injuries), while California has several device categories near or above ~400 injuries.
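Because tile colours are hard to read as exact numbers, the same state-by-category totals can also be printed as a cross-tabulation. This is a small sketch with base R's xtabs() on made-up data (toy and inj_table are hypothetical names). One difference from the heat map: xtabs() fills combinations that never occur with 0, whereas geom_tile() simply draws no tile for them.

```r
# Hypothetical miniature of the parks data.
toy <- data.frame(
  acc_state_t5    = c("CA", "CA", "FL", "FL", "FL"),
  device_category = c("coaster", "water slide", "go-kart", "go-kart", "water slide"),
  num_injured     = c(3, 2, 4, 1, 1)
)

# Sum num_injured for every device category (rows) and state (columns).
inj_table <- xtabs(num_injured ~ device_category + acc_state_t5, data = toy)
inj_table
```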
Summarize your research question and findings below.
My research question was to determine the most dangerous park ride and to see whether the most dangerous park rides are similar across states. My findings show that overall the device categories with the most injuries are water slides, coasters, and spinning rides. As expected, water slides were among the top injury sources in California, Texas, Pennsylvania, and Florida. Coasters were among the top injury sources in California, Texas, and Pennsylvania. My findings also show that the device categories with the most injuries are very different between states. Florida was the only state to have go-karts among its top injury sources. Similarly, Pennsylvania was the only state to have trampolines as a top injury source. Lastly, only California has a high number of injuries due to spinning rides, cars & track rides, and water rides.
Are your findings what you expected? Why or Why not?
I expected some of the results of the initial analysis of injuries by device category: I thought water slides would be the most dangerous. But I did not expect the differences between the states. I expected Florida to have a high number of injuries and, with all of the theme parks there, I expected more injuries from water-related rides, coasters and other big amusement park rides, and challenge courses. I was shocked that go-karts had the highest number of injuries in Florida. (I also thought Florida was just more crazy and would have more injuries; I joked about that with Ted at our one-on-one to go over the data.) I did not expect California to have so many more injuries than Florida, seeing as the theme parks in California are smaller than those in Florida. I did expect water rides and coasters to rank high in California. I was very surprised that Pennsylvania had the second most injuries and that there was such a high number of trampoline injuries in PA. I thought Texas could be close to the top of the injury rankings, but I was moderately surprised that water slides were the largest injury source. Lastly, I was surprised that Oklahoma made it into the top 5 states by number of injuries, and even more surprised that Oklahoma had more device categories than theme park travel destinations like California, Florida, and Texas.
Since water slides and coasters were the most dangerous device categories overall, I thought more of the states would have a high number of water slide and coaster injuries. Water slides topped the list of injuries only in Texas, while coming in second in California, Florida, and Pennsylvania. Coasters were in the top three injury sources in California, Texas, and Pennsylvania. Surprisingly, go-karts were the most dangerous device category in Florida and trampolines the most dangerous in Pennsylvania.