Midterm (Due 2/12/2021 at 11:55 pm)

Please submit your .Rmd and .html files in Sakai. If you are working together, both people should submit the files.

60 / 60 points total

The goal of the midterm project is to showcase skills that you have learned in class so far. The midterm is open note, but if you use someone else’s code, you must attribute them.

# This code came from 

Before you get Started

  1. Pick a dataset. Ideally, the dataset should be around 2000 rows, and should have both categorical and numeric covariates.

Potential Sources for data: Tidy Tuesday: https://github.com/rfordatascience/tidytuesday

  • Note that most of these are .csv files. There is code to load the files from csv for each of the datasets and a short description of the variables, or you can upload the .csv file into your data folder.

You may use another dataset or your own data, but please make sure it is de-identified.

  2. Please schedule a time with Eric or me to discuss your dataset and research question. We just want to look at the data and make sure that it is appropriate for your question.

Working Together

If you’d like to work together, that is encouraged, but you must divide the work equitably and you must note who worked on what. This is probably easiest as notes in the text. Please let Eric or me know that you’ll be working together.

No acknowledgements of contributions = -10 points overall.

Please Note

I will take off points (-5 points for each section) if you don’t add observations and notes in your RMarkdown document. I want you to think and reason through your analysis, even if they are preliminary thoughts.

Define Your Research Question (10 points)

Define your research question below. What about the data interests you? What is a specific question you want to find out about the data?

I am interested in which park ride is the most dangerous and how that varies by state. My specific research question is: Do the most dangerous park rides differ between states?

Given your question, what is your expectation about the data?

I expect the data to show that the most dangerous park rides differ by state because of differences in local safety regulations and climate.

Loading the Data (10 points)

Libraries

library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(skimr)
library(ggplot2)

Initial Data

Load the data below and use dplyr::glimpse() or skimr::skim() on the data. You should upload the data file into the data directory.

parks <- read_csv("data/saferparks.csv")
## 
## ── Column specification ────────────────────────────────────────────────────────
## cols(
##   .default = col_character(),
##   acc_id = col_double(),
##   num_injured = col_double(),
##   age_youngest = col_double(),
##   mechanical = col_double(),
##   op_error = col_double(),
##   employee = col_double()
## )
## ℹ Use `spec()` for the full column specifications.
glimpse(parks)
## Rows: 8,351
## Columns: 23
## $ acc_id               <dbl> 1005813, 1004032, 1007658, 1007098, 1000094, 100…
## $ acc_date             <chr> "6/12/2010", "6/12/2010", "7/10/2010", "7/10/201…
## $ acc_state            <chr> "OH", "OH", "CA", "CA", "CO", "WI", "WI", "CO", …
## $ acc_city             <chr> "Cleveland", "Cleveland", "Anaheim", "Carlsbad",…
## $ fix_port             <chr> "F", "P", "F", "F", "F", "F", "P", "F", "P", "F"…
## $ source               <chr> "Ohio Dept. of Agriculture", "United States Cons…
## $ bus_type             <chr> "Sports or recreation facility", "Sports or recr…
## $ industry_sector      <chr> "recreation", "recreation", "amusement ride", "w…
## $ device_category      <chr> "inflatable", "inflatable", "water ride", "float…
## $ device_type          <chr> "Inflatable slide", "Inflatable slide", "Boat ri…
## $ tradename_or_generic <chr> "inflatable slide", "inflatable slide", "boat ri…
## $ manufacturer         <chr> "Scherba Industries / Inflatable Images", "Scher…
## $ num_injured          <dbl> 9, 8, 1, 1, 1, 1, 1, 20, 1, 1, 2, 1, 1, 1, 1, 1,…
## $ age_youngest         <dbl> NA, 54, 37, 37, NA, 12, 16, NA, 14, NA, 16, 36, …
## $ gender               <chr> NA, "M", "F", "F", "M", "F", "F", NA, "M", NA, "…
## $ acc_desc             <chr> "Inflatable slide tipped over while 7-9 patrons …
## $ injury_desc          <chr> "The man who was crushed by the device died 9 da…
## $ report               <chr> "https://saferparksdata.org/sites/default/files/…
## $ category             <chr> "Device tipped over, blew away, or collapsed", "…
## $ mechanical           <dbl> NA, NA, NA, NA, 1, NA, 1, NA, NA, NA, 1, NA, NA,…
## $ op_error             <dbl> 1, 1, NA, NA, NA, 1, NA, 1, NA, NA, NA, NA, NA, …
## $ employee             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
## $ notes                <chr> "http://www.cleveland.com/metro/index.ssf/2012/1…
skim(parks)
Data summary
Name parks
Number of rows 8351
Number of columns 23
_______________________
Column type frequency:
character 17
numeric 6
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
acc_date 0 1.00 8 10 0 1845 0
acc_state 0 1.00 2 2 0 40 0
acc_city 118 0.99 4 20 0 674 0
fix_port 0 1.00 1 1 0 3 0
source 0 1.00 12 57 0 30 0
bus_type 0 1.00 4 29 0 17 0
industry_sector 0 1.00 7 14 0 4 0
device_category 0 1.00 7 23 0 21 0
device_type 0 1.00 4 26 0 91 0
tradename_or_generic 0 1.00 4 32 0 407 0
manufacturer 3310 0.60 2 40 0 253 0
gender 728 0.91 1 1 0 4 0
acc_desc 3 1.00 4 1258 0 8023 0
injury_desc 10 1.00 4 367 0 3985 0
report 8273 0.01 77 86 0 77 0
category 0 1.00 5 54 0 49 0
notes 8290 0.01 9 675 0 41 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
acc_id 0 1.00 1.005e+06 3126.04 920315 1002160 1005414 1007676 1009907 ▁▁▁▁▇
num_injured 2 1.00 1.050e+00 0.71 0 1 1 1 30 ▇▁▁▁▁
age_youngest 684 0.92 2.460e+01 18.28 0 10 18 38 92 ▇▃▃▁▁
mechanical 7977 0.04 1.000e+00 0.00 1 1 1 1 1 ▁▁▇▁▁
op_error 8192 0.02 1.000e+00 0.00 1 1 1 1 1 ▁▁▇▁▁
employee 8306 0.01 1.000e+00 0.00 1 1 1 1 1 ▁▁▇▁▁

Data Observations

If there are any quirks that you have to deal with, such as NA coded as something else or data spread across multiple tables, please make some notes here about what you need to do before you start transforming the data in the next section.

Make sure your data types are correct!

Examining the loaded data, it looks like NA is coded correctly, but some columns are not stored as the correct type: acc_date should be coded as a date, and fix_port, industry_sector, and gender should be coded as factors. Lastly, there are two NA values in num_injured; these may cause errors in later analysis, so those accidents may be removed.

Transforming the data (15 points)

Recode Columns

If the data needs to be transformed in any way (values recoded, pivoted, etc), do it here. Examples include transforming a continuous variable into a categorical using case_when(), etc.

parks$acc_date <- as.Date(parks$acc_date, format = "%m/%d/%Y")
parks$acc_state <- factor(parks$acc_state)
parks$fix_port <- factor(parks$fix_port)
parks$industry_sector <- factor(parks$industry_sector)
parks$gender <- factor(parks$gender)
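
As a small sanity check on the date conversion, the sketch below parses two date strings copied from the glimpse() output and counts any NA results. The vector names `sample_dates` and `parsed` are illustrative only; the idea is that a format string that does not match the data produces NA rather than an error, so counting NAs after conversion is a cheap safeguard.

```r
# Sample strings copied from the glimpse() output of acc_date
sample_dates <- c("6/12/2010", "7/10/2010")

# %m = month, %d = day, %Y = four-digit year
parsed <- as.Date(sample_dates, format = "%m/%d/%Y")
print(parsed)

# A failed parse yields NA, so a non-zero count flags a format mismatch
sum(is.na(parsed))
```

The same `sum(is.na(...))` check could be run on parks$acc_date after the conversion.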

I also decided to recode the columns bus_type, device_category, device_type and category as factors. I did this because I think it will make it easier to answer my research question.

Additional Columns Recode

parks$bus_type <- factor(parks$bus_type)
parks$device_category <- factor(parks$device_category)
parks$device_type <- factor(parks$device_type)
parks$category <- factor(parks$category)
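
The column-by-column factor() calls could also be written as a single step. The following is a minimal sketch on a toy tibble (the `toy` data and the `factor_cols` vector are made up for illustration), assuming a dplyr version that provides across(), which was introduced in dplyr 1.0.0.

```r
library(dplyr)

# Toy stand-in for a few of the character columns in parks
toy <- tibble(
  acc_state   = c("OH", "CA", "CA"),
  fix_port    = c("F", "P", "F"),
  gender      = c("M", "F", NA),
  num_injured = c(9, 1, 1)
)

# Convert every listed column to a factor in one mutate() call
factor_cols <- c("acc_state", "fix_port", "gender")
toy <- toy %>% mutate(across(all_of(factor_cols), factor))

glimpse(toy)
```

Listing the column names in one vector keeps the recoding decision in one place, which makes it easier to review later.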

Subset Data

Subset the data and think of a question to answer with the subset.

To keep my data straight and guard against mistakes, I have saved the uploaded data with the corrected data types as parks_full. The new parks dataframe is the one that will be analyzed further.

parks_full <- parks

Remove Unnecessary Columns

I decided to keep the following columns for my analysis: acc_state, bus_type, device_category, device_type, num_injured, and category. I no longer need the following columns: acc_id, acc_date, acc_city, fix_port, source, industry_sector, tradename_or_generic, manufacturer, age_youngest, gender, acc_desc, injury_desc, report, mechanical, op_error, employee, and notes. I would have liked to keep the mechanical, op_error, and employee columns, since they show the cause of the accident, but the earlier skim() output shows roughly 8,000 missing values in each of these columns.

parks <- parks_full[c("acc_state", "bus_type", "device_category", "device_type", "num_injured", "category")]

Remove NAs from num_injured

Since the goal of this analysis is to determine how many people are injured on park rides, I do not see a need for the two accidents that have NA recorded for num_injured. I have displayed these two accident records below and removed them from the parks dataframe.

subset(parks, is.na(parks$num_injured))
## # A tibble: 2 x 6
##   acc_state bus_type      device_category   device_type     num_injured category
##   <fct>     <fct>         <fct>             <fct>                 <dbl> <fct>   
## 1 NJ        Amusement pa… cars & track rid… Track ride               NA Derailm…
## 2 NH        Carnival or … coaster           Coaster - fami…          NA Derailm…
parks <- subset(parks, ! is.na(parks$num_injured))
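
For comparison, the same inspect-then-drop step can be sketched with dplyr::filter(), which is already loaded in this document. The `toy` tibble below is invented for illustration and is not the real data.

```r
library(dplyr)

# Toy records: two with a missing injury count, one complete
toy <- tibble(
  device_type = c("Track ride", "Go-kart", "Coaster - family"),
  num_injured = c(NA, 1, NA)
)

# Inspect the rows that would be dropped
toy %>% filter(is.na(num_injured))

# Keep only rows with a recorded injury count
toy_clean <- toy %>% filter(!is.na(num_injured))
nrow(toy_clean)
```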

Set Outcome as Top 5 Injury States

parks <- parks %>% mutate(acc_state_t5 = case_when(acc_state == "CA" ~ "CA", acc_state == "PA" ~ "PA", acc_state == "FL" ~ "FL", acc_state == "TX" ~ "TX", acc_state == "OK" ~ "OK"))
parks$acc_state_t5 <- factor(ifelse(is.na(parks$acc_state_t5), "other", parks$acc_state_t5), levels = c("CA", "PA", "FL", "TX", "OK", "other"))
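
The state list above is typed in by hand. As a hedged alternative, the sketch below derives the top states from the data itself and lumps the rest into "other". It uses a toy tibble, ranks by total injuries, and keeps only the top two states to stay short; the names `toy`, `top_states`, and `acc_state_top` are illustrative. Note that which states qualify can depend on whether the ranking uses the number of accident records or the sum of num_injured.

```r
library(dplyr)

# Toy accident records; each row is one accident
toy <- tibble(
  acc_state   = c("CA", "CA", "CA", "PA", "PA", "FL", "TX"),
  num_injured = c(1, 2, 1, 1, 1, 3, 1)
)

# Rank states by total injuries and keep the top two
top_states <- toy %>%
  group_by(acc_state) %>%
  summarise(total_injured = sum(num_injured), .groups = "drop") %>%
  slice_max(total_injured, n = 2, with_ties = FALSE) %>%
  pull(acc_state)

# Lump every other state into "other", keeping the ranked order as factor levels
toy <- toy %>%
  mutate(
    acc_state_top = if_else(acc_state %in% top_states, acc_state, "other"),
    acc_state_top = factor(acc_state_top, levels = c(top_states, "other"))
  )

count(toy, acc_state_top)
```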

Merging of Datasets

Bonus points (5 points) for datasets that require merging of tables, but only if you reason through whether you should use left_join, inner_join, or right_join on these tables. No credit will be provided if you don’t.
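
To illustrate the kind of reasoning this prompt asks for, here is a minimal sketch of how the three joins differ in which rows they keep. Both tables are invented: `state_info` is a hypothetical lookup table and "ZZ" is a made-up state code with no match.

```r
library(dplyr)

accidents <- tibble(
  acc_state   = c("CA", "PA", "ZZ"),   # "ZZ" has no match in the lookup
  num_injured = c(2, 1, 1)
)
state_info <- tibble(                  # hypothetical lookup table
  acc_state  = c("CA", "PA", "FL"),
  state_name = c("California", "Pennsylvania", "Florida")
)

# left_join keeps every accident; unmatched rows get NA for state_name
joined_left  <- left_join(accidents, state_info, by = "acc_state")

# inner_join keeps only accidents with a match in the lookup
joined_inner <- inner_join(accidents, state_info, by = "acc_state")

# right_join keeps every lookup row; FL appears with NA for num_injured
joined_right <- right_join(accidents, state_info, by = "acc_state")

nrow(joined_left); nrow(joined_inner); nrow(joined_right)
```

If every accident record must survive the merge, left_join on the accident table is usually the safer choice, since inner_join silently drops unmatched rows.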

Review Transformed Data

Show your transformed table here. Use tools such as glimpse(), skim() or head() to illustrate your point.

skim(parks)
Data summary
Name parks
Number of rows 8349
Number of columns 7
_______________________
Column type frequency:
factor 6
numeric 1
________________________
Group variables None

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
acc_state 0 1 FALSE 40 CA: 2997, PA: 1883, FL: 974, TX: 551
bus_type 0 1 FALSE 17 Amu: 3666, Wat: 1767, Car: 700, Tra: 698
device_category 0 1 FALSE 21 wat: 1647, coa: 1183, spi: 914, tra: 698
device_type 0 1 FALSE 91 Coa: 863, Tra: 677, Go-: 620, Tub: 511
category 0 1 FALSE 49 Imp: 1114, Loa: 939, Bod: 688, Fal: 526
acc_state_t5 0 1 FALSE 6 CA: 2997, PA: 1883, oth: 1540, FL: 974

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
num_injured 0 1 1.05 0.71 0 1 1 1 30 ▇▁▁▁▁
summary(parks)
##    acc_state                           bus_type              device_category
##  CA     :2997   Amusement park             :3666   water slide       :1647  
##  PA     :1883   Water park                 :1767   coaster           :1183  
##  FL     : 974   Carnival or rental         : 700   spinning          : 914  
##  TX     : 551   Trampoline park            : 698   trampoline        : 698  
##  NJ     : 519   Family entertainment center: 484   go-kart           : 620  
##  OK     : 404   Go kart track              : 228   cars & track rides: 585  
##  (Other):1021   (Other)                    : 806   (Other)           :2702  
##             device_type    num_injured    
##  Coaster - steel  : 863   Min.   : 0.000  
##  Trampoline court : 677   1st Qu.: 1.000  
##  Go-kart          : 620   Median : 1.000  
##  Tube slide       : 511   Mean   : 1.053  
##  Aquatic play area: 333   3rd Qu.: 1.000  
##  Track ride       : 302   Max.   :30.000  
##  (Other)          :5043                   
##                                               category    acc_state_t5
##  Impact: hit something in participatory attraction:1114   CA   :2997  
##  Load/Unload: scrape or stumble                   : 939   PA   :1883  
##  Body pain (normal motion)                        : 688   FL   : 974  
##  Fall: patron fell off inner tube, mat or board   : 526   TX   : 551  
##  Impact: hit something within ride vehicle        : 519   OK   : 404  
##  Illness or neurological symptoms                 : 427   other:1540  
##  (Other)                                          :4136

Are the values what you expected for the variables? Why or Why not?

The values are somewhat what I expected. For the accident state, I am surprised that there are park accidents in only 40 states. I expected California and Florida to have the highest number of accidents. While both are in the top 3, I am surprised that California has about 3 times as many accidents as Florida and that Pennsylvania has about twice as many as Florida and about two-thirds as many as California.

I thought the number of accidents would be highest for water parks and water slides. For the type of business, amusement parks have about twice as many accidents as water parks, which isn’t surprising but is not what I expected. While the most dangerous device category is the water slide, it is followed by coaster and spinning rides. Lastly, the more specific device type shows that steel coasters are the single most dangerous ride, which is somewhat surprising. More surprising is that trampoline courts and go-karts have only slightly fewer accidents than steel coasters. I was also surprised that no water slide type appears until the fourth most common device type, tube slides.

Lastly, the vast majority of accidents injured one person, with a maximum of 30 injuries in a single accident.

Visualizing and Summarizing the Data (15 points)

Summarizing the Data

Use group_by()/summarize() to make a summary of the data here. The summary should be relevant to your research question

Summary by Device Type

totInj <- sum(parks$num_injured)
parks_summary_DCat <- parks %>% group_by(device_category) %>% summarise(sInj = sum(num_injured), pInj = round(sInj/totInj, 3) * 100) %>% arrange(desc(pInj)) %>% ungroup
## `summarise()` ungrouping output (override with `.groups` argument)
parks_summary_DCat
## # A tibble: 21 x 3
##    device_category     sInj  pInj
##    <fct>              <dbl> <dbl>
##  1 water slide         1690  19.2
##  2 coaster             1218  13.9
##  3 spinning            1070  12.2
##  4 trampoline           698   7.9
##  5 cars & track rides   661   7.5
##  6 go-kart              648   7.4
##  7 water ride           540   6.1
##  8 inflatable           489   5.6
##  9 challenge activity   367   4.2
## 10 aquatic play         337   3.8
## # … with 11 more rows
The first summary shows which device categories each account for more than 10% of all park injuries in the data: water slide (1690; 19.2%), coaster (1218; 13.9%), and spinning (1070; 12.2%). This was somewhat expected based on the accident counts seen earlier.
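
The `summarise()` message in the output above refers to the `.groups` argument. As a minimal sketch on toy data (the `toy` tibble and `tot_inj` are illustrative names), setting `.groups = "drop"` returns an ungrouped result directly, so no message is printed and a separate ungroup() call is not needed.

```r
library(dplyr)

toy <- tibble(
  device_category = c("coaster", "coaster", "water slide", "spinning"),
  num_injured     = c(1, 2, 4, 1)
)

tot_inj <- sum(toy$num_injured)

toy_summary <- toy %>%
  group_by(device_category) %>%
  summarise(
    sInj = sum(num_injured),
    pInj = round(sInj / tot_inj, 3) * 100,  # share of all injuries, in percent
    .groups = "drop"                        # return an ungrouped tibble, no message
  ) %>%
  arrange(desc(sInj))

toy_summary
```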

Summaries by Device Type and State

for (state in unique(parks$acc_state_t5)) {
  if (state != "other") {
    parks_sub <- parks[parks$acc_state_t5 == state, ]
    totInj_state <- sum(parks_sub[ , "num_injured"])
    parks_plot_sum <- parks_sub %>% group_by(device_category) %>% summarise(sInj = sum(num_injured), pInj = round(sInj/totInj_state, 3) * 100) %>% arrange(desc(sInj)) %>% ungroup()
    print(parks_plot_sum)
  }
}
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 14 x 3
##    device_category     sInj  pInj
##    <fct>              <dbl> <dbl>
##  1 coaster              737 24   
##  2 water slide          550 17.9 
##  3 cars & track rides   472 15.4 
##  4 spinning             381 12.4 
##  5 water ride           379 12.4 
##  6 wave device          121  3.9 
##  7 float attraction      94  3.1 
##  8 aquatic play          87  2.8 
##  9 vertical drop         81  2.6 
## 10 other attraction      78  2.5 
## 11 pendulum              47  1.5 
## 12 challenge activity    34  1.10
## 13 trampoline             3  0.1 
## 14 inflatable             1  0
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 16 x 3
##    device_category     sInj  pInj
##    <fct>              <dbl> <dbl>
##  1 go-kart              385 38.3 
##  2 water slide          202 20.1 
##  3 challenge activity   191 19   
##  4 spinning              83  8.3 
##  5 wave device           24  2.4 
##  6 other attraction      23  2.3 
##  7 aquatic play          22  2.20
##  8 play equipment        18  1.8 
##  9 pendulum              17  1.7 
## 10 coaster               13  1.3 
## 11 cars & track rides    12  1.2 
## 12 water ride             5  0.5 
## 13 inflatable             4  0.4 
## 14 float attraction       3  0.3 
## 15 unknown                1  0.1 
## 16 vertical drop          1  0.1
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 17 x 3
##    device_category     sInj  pInj
##    <fct>              <dbl> <dbl>
##  1 inflatable           133 31.4 
##  2 spinning              56 13.2 
##  3 go-kart               42  9.9 
##  4 coaster               36  8.5 
##  5 water slide           25  5.90
##  6 trampoline            21  5   
##  7 pendulum              18  4.3 
##  8 play equipment        17  4   
##  9 aquatic play          16  3.8 
## 10 other attraction      16  3.8 
## 11 water ride            13  3.1 
## 12 cars & track rides    12  2.8 
## 13 float attraction       9  2.1 
## 14 challenge activity     6  1.4 
## 15 unknown                1  0.2 
## 16 vertical drop          1  0.2 
## 17 wave device            1  0.2
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 13 x 3
##    device_category     sInj  pInj
##    <fct>              <dbl> <dbl>
##  1 water slide          285  50.5
##  2 coaster               82  14.5
##  3 challenge activity    52   9.2
##  4 go-kart               45   8  
##  5 spinning              30   5.3
##  6 water ride            26   4.6
##  7 wave device           17   3  
##  8 aquatic play           9   1.6
##  9 cars & track rides     9   1.6
## 10 pendulum               4   0.7
## 11 other attraction       2   0.4
## 12 vertical drop          2   0.4
## 13 inflatable             1   0.2
## `summarise()` ungrouping output (override with `.groups` argument)
## # A tibble: 21 x 3
##    device_category     sInj  pInj
##    <fct>              <dbl> <dbl>
##  1 trampoline           558 29.3 
##  2 water slide          285 14.9 
##  3 coaster              208 10.9 
##  4 aquatic play         166  8.7 
##  5 spinning             153  8   
##  6 inflatable            88  4.6 
##  7 water ride            76  4   
##  8 cars & track rides    62  3.3 
##  9 go-kart               56  2.9 
## 10 play equipment        42  2.20
## # … with 11 more rows
rm(state)
rm(parks_sub)
rm(totInj_state)
rm(parks_plot_sum)

What are your findings about the summary? Are they what you expected?

For each state, I defined the top injury sources as device categories with over 200 injuries or over 10% of the injuries in that state. In California, the top injury sources were: coaster (737; 24.0%), water slide (550; 17.9%), cars & track rides (472; 15.4%), spinning (381; 12.4%), and water ride (379; 12.4%). In Florida, the most dangerous device categories are: go-kart (385; 38.3%), water slide (202; 20.1%), and challenge activity (191; 19.0%). In Oklahoma, none of the device categories have over 200 injuries, but the top ones are: inflatable (133; 31.4%) and spinning (56; 13.2%). In Texas, only one device category has over 200 injuries, the water slide (285; 50.5%), although coaster (82; 14.5%) is also above 10%. Lastly, in Pennsylvania the most dangerous device categories are: trampoline (558; 29.3%), water slide (285; 14.9%), and coaster (208; 10.9%).
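
The per-state percentages can also be computed without a for loop. The sketch below uses toy data (all values invented) and relies on `.groups = "drop_last"`, which keeps the state grouping after summarise() so that `sum(sInj)` inside mutate() is a per-state total.

```r
library(dplyr)

toy <- tibble(
  acc_state       = c("CA", "CA", "CA", "FL", "FL"),
  device_category = c("coaster", "coaster", "water slide", "go-kart", "water slide"),
  num_injured     = c(2, 1, 1, 3, 1)
)

by_state <- toy %>%
  group_by(acc_state, device_category) %>%
  # drop_last keeps the acc_state grouping for the next step
  summarise(sInj = sum(num_injured), .groups = "drop_last") %>%
  # within each state, express each category as a percent of that state's injuries
  mutate(pInj = round(sInj / sum(sInj), 3) * 100) %>%
  ungroup() %>%
  arrange(acc_state, desc(sInj))

by_state
```

A single table like this is also convenient input for a faceted plot or a heat map, since every state and category pair is one row.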

Visualizing the Data

Make at least two plots that help you answer your question on the transformed or summarized data.

Bar Plot of Most Dangerous Rides

parks$device_category <- factor(parks$device_category, levels = unique(parks_summary_DCat$device_category[order(parks_summary_DCat$sInj)]))
# weight = num_injured makes each bar sum the injuries instead of counting accident rows
barplot_all <- ggplot(parks) + aes(y = device_category, weight = num_injured, fill = device_category) + geom_bar() + labs(title = "Most Dangerous Rides in all States", y = NULL) + theme(legend.position = "none")
print(barplot_all)

The bar plot above compares the injury counts by device category. Across all states, the counts fall off steeply after the top few categories, so the sorted bars look somewhat like an exponential curve.
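
One detail worth keeping in mind with geom_bar() is that by default it tallies rows, whereas mapping the `weight` aesthetic makes it sum a column instead. The toy sketch below (invented data; `p_rows` and `p_injuries` are illustrative names) builds both versions and uses layer_data() to pull out the bar lengths that ggplot2 computes.

```r
library(ggplot2)

# Three accident records: two coaster accidents with 1 injury each,
# and one inflatable accident with 9 injuries
toy <- data.frame(
  device_category = c("coaster", "coaster", "inflatable"),
  num_injured     = c(1, 1, 9)
)

# Default geom_bar(): bar length = number of rows (accidents)
p_rows <- ggplot(toy, aes(y = device_category)) + geom_bar()

# With weight: bar length = sum of num_injured (injuries)
p_injuries <- ggplot(toy, aes(y = device_category, weight = num_injured)) + geom_bar()

# Computed bar lengths, in the order coaster, inflatable
layer_data(p_rows)$count
layer_data(p_injuries)$count
```

geom_col() on a pre-summarised table is another common way to plot totals rather than row counts.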

Bar Plots of Most Dangerous Rides by State (in top 5 States)

for (state in unique(parks$acc_state_t5)) {
  if (state != "other") {
    parks_sub <- parks[parks$acc_state_t5 == state, ]
    totInj_state <- sum(parks_sub[ , "num_injured"])
    parks_plot_sum <- parks_sub %>% group_by(device_category) %>% summarise(sInj = sum(num_injured), pInj = sInj/totInj_state) %>% arrange(desc(sInj)) %>% ungroup()
    parks_sub$device_category <- factor(parks_sub$device_category, levels = unique(parks_plot_sum$device_category[order(parks_plot_sum$sInj)]))
    # weight = num_injured makes each bar sum the injuries instead of counting accident rows
    print(ggplot(parks_sub) + aes(y = device_category, weight = num_injured, fill = device_category) + geom_bar() + xlim(0, 750) + labs(title = paste("Most Dangerous Rides in", state), y = NULL) + theme(legend.position = "none"))
  }
}
## `summarise()` ungrouping output (override with `.groups` argument)
## `summarise()` ungrouping output (override with `.groups` argument)

## `summarise()` ungrouping output (override with `.groups` argument)

## `summarise()` ungrouping output (override with `.groups` argument)

## `summarise()` ungrouping output (override with `.groups` argument)

rm(state)
rm(parks_sub)
rm(totInj_state)
rm(parks_plot_sum)

The California bar plot shows that fewer device categories cause injuries; a few categories have a high injury count and the rest are much lower.

The Florida bar plot shows that most device categories fall below 50% of the maximum injury count across the plots (California reaches the maximum, above 700, for coasters). Additionally, the injury count is surprisingly low in Florida.

The Oklahoma bar plot shows that more device categories cause injuries there than in California, Florida, or Texas.

Similar to California, the Texas bar plot shows that a limited number of device categories cause injuries. The plot also shows just how many more injuries occur on water slides in Texas.

Lastly, the Pennsylvania plot shows why Pennsylvania made it into the top 5 states, mostly due to the number of trampoline injuries.
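
The five per-state plots could also be drawn as one faceted figure, which puts every state on the same x-axis without setting xlim() by hand. This is a minimal sketch on invented data (`toy`, `toy_summary`, and `p_facets` are illustrative names), not a reproduction of the plots above.

```r
library(dplyr)
library(ggplot2)

toy <- tibble(
  acc_state       = c("CA", "CA", "FL", "FL", "TX"),
  device_category = c("coaster", "water slide", "go-kart", "water slide", "water slide"),
  num_injured     = c(3, 2, 4, 1, 5)
)

# One row per state and device category, with total injuries
toy_summary <- toy %>%
  group_by(acc_state, device_category) %>%
  summarise(sInj = sum(num_injured), .groups = "drop")

# geom_col() plots the pre-computed totals; facet_wrap() makes one panel per state
p_facets <- ggplot(toy_summary, aes(x = sInj, y = device_category, fill = device_category)) +
  geom_col() +
  facet_wrap(~ acc_state) +
  labs(title = "Injuries by device category and state (toy data)", x = "Injuries", y = NULL) +
  theme(legend.position = "none")

print(p_facets)
```

By default facet_wrap() shares the axes across panels, so bar lengths are directly comparable between states.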

Heat Map of the Top 5 States versus Device Type

parks_summary <- parks[parks$acc_state_t5 != "other", ] %>% group_by(acc_state_t5, device_category) %>% summarise(sInj = sum(num_injured)) %>% ungroup
## `summarise()` regrouping output by 'acc_state_t5' (override with `.groups` argument)
ggplot(parks_summary) + aes(y = device_category, x = acc_state_t5, fill = sInj) + geom_tile() + labs(title = "Heat Map of Injuries by Device Type versus State", y = NULL, x = "States with the Top 5 Injuries", fill = "Number of Injuries")

The heat map shows the same patterns as the other summaries and plots. Water slides have a lighter blue (higher) injury count in California and Texas; the coaster, spinning, cars & track rides, and water ride categories only have high injury counts in California; trampolines only have a high injury count in Pennsylvania; go-karts are most dangerous in Florida; and challenge activities show a moderate number of injuries in Florida. Comparing all of the injury counts to each other, Oklahoma is mostly dark blue (low counts), while California has several device categories with injury counts near or above 400.

Final Summary (10 points)

Summarize your research question and findings below.

My research question was to determine the most dangerous park rides and to see whether they are similar across states. My findings show that overall the device categories with the most injuries are water slides, coasters, and spinning rides. As expected, water slides were among the top injury sources in California, Texas, Pennsylvania, and Florida. Coasters were among the top injury sources in California, Texas, and Pennsylvania. My findings also show that the device categories with the most injuries differ considerably between states. Florida was the only state with go-karts among its top injury sources. Similarly, Pennsylvania was the only state with trampolines as a top injury source. Lastly, only California has a high number of injuries due to spinning rides, cars & track rides, and water rides.

Are your findings what you expected? Why or Why not?

Some of the initial analysis of injuries by device category matched my expectations: I thought water slides would be the most dangerous. However, I did not expect the differences between states. I expected Florida to have a high number of injuries; with all of the theme parks there, I expected more injuries from water-related rides, coasters and other big amusement park rides, and challenge courses. I was shocked that go-karts had the highest number of injuries in Florida. (I also thought Florida was just more crazy and would have more injuries; I joked about that with Ted at our one-on-one to go over the data.) I did not expect California to have so many more injuries than Florida, seeing as the theme parks in California are smaller than those in Florida. I did expect water rides and coasters to rank high in California. I was very surprised that Pennsylvania had the second most injuries, and that there was a high number of trampoline injuries in PA. I thought Texas could be close to the top of the injury rankings, but I was moderately surprised that water slides were the largest injury source. Lastly, I was surprised that Oklahoma made it into the top 5 states by number of injuries, and even more surprised that Oklahoma had so many more device categories than theme park travel destinations like California, Florida, and Texas.

Since water slides and coasters were the most dangerous device categories overall, I thought more of the states would have a high number of water slide and coaster injuries. Water slides topped the list of injuries only in Texas, while coming in second in California, Florida, and Pennsylvania. Coasters were in the top three injury sources in California, Texas, and Pennsylvania. Surprisingly, go-karts were the most dangerous device category in Florida and trampolines the most dangerous in Pennsylvania.