Please submit your .Rmd and .html files in Sakai. If you are working together, both people should submit the files.
The goal of the midterm project is to showcase skills that you have learned in class so far. The midterm is open note, but if you use someone else’s code, you must attribute it to them. Provide a link to the source of any borrowed code as a comment next to the code.
Potential sources for data: Tidy Tuesday: https://github.com/rfordatascience/tidytuesday
Upload the .csv file into your data folder. You may use another dataset or your own data, but please make sure it is de-identified.
If you’d like to work together, that is encouraged, but you must divide the work equitably and you must note who worked on what. This is probably easiest as notes in the text. Please let Eric or me know that you’ll be working together.
No acknowledgements of contributions = -10 points overall.
The code chunks and information below were obtained from the R for Data Science readings and in-class activities. If other acknowledgments are needed, they will be noted right below my work for that section or code.
I will take off points (-5 points for each section) if you don’t add observations and notes in your RMarkdown document. I want you to think and reason through your analysis, even if they are preliminary thoughts.
Define your research question below. What about the data interests you? What is a specific question you want to find out about the data?
What items have the highest and lowest buy values in the game Animal Crossing: New Horizons (ACNH)?
I am interested in this dataset because I play ACNH and I am curious which of the available items are the most and least expensive to purchase in the game.
Given your question, what is your expectation about the data?
I expect to find out which items are the most and least valuable items to buy in the game.
Load the data below and use dplyr::glimpse() or skimr::skim() on the data. You should upload the data file into the data directory.
items <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-05/items.csv')
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## num_id = col_double(),
## id = col_character(),
## name = col_character(),
## category = col_character(),
## orderable = col_logical(),
## sell_value = col_double(),
## sell_currency = col_character(),
## buy_value = col_double(),
## buy_currency = col_character(),
## sources = col_character(),
## customizable = col_logical(),
## recipe = col_double(),
## recipe_id = col_character(),
## games_id = col_character(),
## id_full = col_character(),
## image_url = col_character()
## )
## Warning: 2 parsing failures.
## row col expected actual file
## 4472 customizable 1/0/T/F/TRUE/FALSE Yes 'https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-05/items.csv'
## 4473 customizable 1/0/T/F/TRUE/FALSE Yes 'https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-05-05/items.csv'
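The two parsing failures above occur because rows 4472 and 4473 contain "Yes" in customizable, which readr cannot parse as a logical. One possible way to handle this is to read the column as character and recode it explicitly. The sketch below uses a small hypothetical stand-in for items.csv (the toy_csv rows are made up for illustration, and treating "Yes" as TRUE is an assumption about what the data means):

```r
library(readr)
library(dplyr)

# Hypothetical three-row stand-in for items.csv; I() marks the string as literal data
toy_csv <- I("name,customizable\nBandage,TRUE\nDen Desk,FALSE\nPyramid,Yes\n")

toy_items <- read_csv(
  toy_csv,
  # Read customizable as text so that "Yes" is kept instead of becoming NA
  col_types = cols(name = col_character(), customizable = col_character())
) %>%
  mutate(customizable = case_when(
    customizable %in% c("TRUE", "Yes") ~ TRUE,  # assumption: "Yes" means TRUE
    customizable == "FALSE" ~ FALSE,
    TRUE ~ NA                                   # anything else stays missing
  ))

toy_items
```

Since customizable is not used in the analysis that follows, leaving the two values as NA does not change the results; the recode only matters if that column is needed later.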
If there are any quirks that you have to deal with, such as NA coded as something else or data spread across multiple tables, please make some notes here about what you need to do before you start transforming the data in the next section.
Make sure your data types are correct!
Some rows have NA values in my buy_value variable as well as my sources variable. I want to remove those rows from my dataset entirely because they are not relevant to my question. I will also need to keep only the rows where buy_currency equals bells, since I am not interested in comparing items that require miles currency.
To handle the NA values and the bells-only requirement, I used the filter() function. First I created a new dataset and called it items_filter. Then I specified buy_currency == "bells" to keep only the items that are purchased with bells. Within the same filter() call, I specified !is.na(buy_value) and !is.na(sources) to remove the rows with NA in my buy_value and sources variables.
I was able to pull the data I needed.
To subset my data even more, I used the select() function to specify which variables I wanted to see in my new data frame. First I created another new dataset named items_select, and then I specified that I wanted to see only the name, category, sources, and buy_value variables from my items_filter dataset.
I had to remove the rows with a DIY source from this dataset because they were distorting my graphs. This did not affect the highest-valued item, since its source is Nook’s Cranny.
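As a minimal sketch of the cleaning steps described above, the toy example below counts missing values per column and then applies the same filter conditions. The toy_items rows are hypothetical and made up for illustration; only the column names come from the dataset.

```r
library(dplyr)

# Hypothetical toy rows; values are made up for illustration
toy_items <- tibble(
  name = c("Bandage", "Nook Miles Ticket", "Mystery Item", "Den Desk", "Wooden Stool"),
  buy_currency = c("bells", "miles", NA, "bells", "bells"),
  buy_value = c(140, 2000, NA, 10000, NA),
  sources = c("Able Sisters", "Nook Stop", NA, "Nook's Cranny", "DIY")
)

# Count missing values in every column before deciding what to drop
na_counts <- toy_items %>%
  summarise(across(everything(), ~ sum(is.na(.x))))

# filter() keeps only rows where every condition is TRUE, so a row whose
# buy_currency is NA is dropped by buy_currency == "bells" on its own
toy_filter <- toy_items %>%
  filter(buy_currency == "bells",
         !is.na(buy_value),
         !is.na(sources),
         sources != "DIY")

na_counts
toy_filter
```

Checking na_counts first makes it easier to see how many rows each condition is expected to remove.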
If the data needs to be transformed in any way (values recoded, pivoted, etc), do it here. Examples include transforming a continuous variable into a categorical using case_when(), etc.
Subset the data and think of a question to answer with the subset.
Bonus points (5 points) for datasets that require merging of tables, but only if you reason through whether you should use left_join, inner_join, or right_join on these tables. No credit will be provided if you don’t.
library(dplyr)  # provides %>%, arrange(), filter(), and select()

items_filter <- items %>%
  arrange(sources, category) %>%  # From Part 3 in class and,
  filter(buy_currency == "bells",
         !is.na(buy_value),       # From R for Data Science.
         !is.na(sources),
         sources != "DIY")        # Intro to R and RStudio by Dr. Jessica Minnier

items_select <- items_filter %>%
  select(name, category, sources, buy_value)
Show your transformed table here. Use tools such as glimpse(), skim() or head() to illustrate your point.
glimpse(items_select)
## Rows: 605
## Columns: 4
## $ name <chr> "Bandage", "Bubblegum", "Butterfly Shades", "Cat Nose", "Cu…
## $ category <chr> "Accessories", "Accessories", "Accessories", "Accessories",…
## $ sources <chr> "Able Sisters", "Able Sisters", "Able Sisters", "Able Siste…
## $ buy_value <dbl> 140, 140, 1040, 560, 700, 770, 1100, 1100, 910, 1100, 490, …
Are the values what you expected for the variables? Why or why not?
Yes, these variables are exactly what I expected. I was able to specify the variables I wanted to see in my dataset and remove the rows with NA values that are not relevant to my question.
Use group_by()/summarize() to make a summary of the data here. The summary should be relevant to your research question.
Because my question asks which items have the highest and lowest buy values, I used slice_min() and slice_max() for this summary, as suggested by Eric.
items_slice_max <- items_select %>%  # using slice_min & slice_max suggestion from Eric our awesome TA.
  slice_max(buy_value, n = 10) %>%
  arrange(desc(buy_value))

items_slice_max
## # A tibble: 10 x 4
##    name                category  sources                               buy_value
##    <chr>               <chr>     <chr>                                     <dbl>
##  1 Open-frame Kitchen  Furniture Nook's Cranny                            140000
##  2 Lighthouse          Furniture Nook Shopping                            100000
##  3 Cotton-candy Stall  Furniture Nook Shopping                             60000
##  4 Acnh Nintendo Swit… Furniture Nook Shopping                             35960
##  5 Acnh Nintendo Swit… Furniture Receive in mail if playing on ACNH S…     35960
##  6 Nintendo Switch     Furniture Receive in mail on your second day        29980
##  7 Cute Bed            Furniture Nook's Cranny                             12000
##  8 Den Desk            Furniture Nook's Cranny                             10000
##  9 High-end Stereo     Furniture Nook's Cranny                             10000
## 10 Pyramid             Furniture Gulliver                                   9200
items_slice_min <- items_select %>%
  slice_min(buy_value, n = 10, with_ties = FALSE)  # with_ties=FALSE from help section on R.

items_slice_min
## # A tibble: 10 x 4
##    name            category    sources        buy_value
##    <chr>           <chr>       <chr>              <dbl>
##  1 Bandage         Accessories Able Sisters         140
##  2 Bubblegum       Accessories Able Sisters         140
##  3 Red Cosmos      Flowers     Find on ground       160
##  4 Red Hyacinths   Flowers     Find on ground       160
##  5 Red Lilies      Flowers     Find on ground       160
##  6 Red Mums        Flowers     Find on ground       160
##  7 Red Pansies     Flowers     Find on ground       160
##  8 Red Roses       Flowers     Find on ground       160
##  9 Red Tulips      Flowers     Find on ground       160
## 10 Red Windflowers Flowers     Find on ground       160
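The effect of with_ties = FALSE in the chunk above can be illustrated on a small example. This is a sketch on a five-row toy tibble (toy_prices is a hypothetical name; the names and prices are taken from the output above) that compares the default tie handling with with_ties = FALSE:

```r
library(dplyr)

# Five rows borrowed from the printed output above
toy_prices <- tibble(
  name = c("Bandage", "Bubblegum", "Red Roses", "Red Tulips", "Den Desk"),
  buy_value = c(140, 140, 160, 160, 10000)
)

# Default (with_ties = TRUE): every row tied at the cutoff value is kept,
# so asking for n = 3 can return more than 3 rows
rows_with_ties <- toy_prices %>%
  slice_min(buy_value, n = 3)

# with_ties = FALSE: exactly n rows are returned and extra tied rows are dropped
rows_exact <- toy_prices %>%
  slice_min(buy_value, n = 3, with_ties = FALSE)

nrow(rows_with_ties)  # 4
nrow(rows_exact)      # 3
```

In the real output above, the last eight rows all cost 160 Bells, so other items at that price may have been cut off by with_ties = FALSE; rerunning with the default would show whether that is the case.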
What are your findings about the summary? Are they what you expected?
I would have expected higher prices for some of these max values, because I have seen things that cost far more in the game. This could be because the dataset is from some time in 2020, and the game updates released since then may have changed the items and prices. In other words, there may be new items that this list cannot account for. What also surprises me is that my highest-valued item is not shown in this list but appears in my graph below.
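The prompt for this section mentions group_by()/summarize(). As a complement to the slice-based summary, a per-category summary could look like the sketch below. It runs on a hypothetical toy tibble (toy_select) built from a few rows of the output above; the same pipeline should apply to items_select, though the real results are not shown here.

```r
library(dplyr)

# A few rows borrowed from the printed output above
toy_select <- tibble(
  name = c("Bandage", "Bubblegum", "Red Roses", "Den Desk", "Cute Bed"),
  category = c("Accessories", "Accessories", "Flowers", "Furniture", "Furniture"),
  buy_value = c(140, 140, 160, 10000, 12000)
)

# One row per category with the item count and price range
category_summary <- toy_select %>%
  group_by(category) %>%
  summarize(
    n_items = n(),
    min_buy = min(buy_value),
    max_buy = max(buy_value),
    .groups = "drop"
  ) %>%
  arrange(desc(max_buy))

category_summary
```

A table like this shows at a glance which categories contain the most and least expensive items, which is a useful cross-check on the top-10 and bottom-10 lists.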
Make at least two plots that help you answer your question on the transformed or summarized data.
Most of these code chunks came from a combination of Part 3 and the Introduction to R and RStudio by Dr. Jessica Minnier. Eric helped me find a website for scale_y_continuous() to change the labels on the y-axis, and the rotation for the labels on the x-axis came from Stack Overflow.
library(ggplot2)  # provides ggplot(), geom_col(), and the other plotting functions

items_slice_max %>%
  ggplot(aes(x = name, y = buy_value, fill = sources)) +
  geom_col() +
  scale_y_continuous(labels = scales::dollar_format()) +  # https://datavizpyr.com/dollar-format-for-axis-labels-with-ggplot2/
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +  # The code to rotate the labels in my x-axis came from stackoverflow.
  labs(title = "Items with Highest Buy Values",
       x = "Item Names",
       y = "Buy Value in Bells")
items_slice_min %>%
  ggplot(aes(x = name, y = buy_value, fill = sources)) +
  geom_col() +
  scale_y_continuous(labels = scales::dollar_format()) +  # https://datavizpyr.com/dollar-format-for-axis-labels-with-ggplot2/
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +  # The code to rotate the labels in my x-axis came from stackoverflow.
  labs(title = "Items with Lowest Buy Values",
       x = "Item Names",
       y = "Buy Value in Bells")
The first graph above shows the 10 highest-valued items in ACNH, and the second graph shows the 10 lowest-valued items. The bars are color-coded by the source where each item is obtained. In these graphs, the $ sign on the y-axis stands for Bells.
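Since the dollar sign on the y-axis stands in for Bells, one possible alternative is to label the axis in Bells directly and to order the bars by price. The sketch below uses a three-row toy tibble (toy_max, with values taken from the top-10 output above). The names bells_label and p are hypothetical, and this is one way to do it, not the approach used for the graphs above.

```r
library(dplyr)
library(ggplot2)

# Three rows borrowed from the top-10 output above
toy_max <- tibble(
  name = c("Lighthouse", "Open-frame Kitchen", "Cotton-candy Stall"),
  sources = c("Nook Shopping", "Nook's Cranny", "Nook Shopping"),
  buy_value = c(100000, 140000, 60000)
)

# Label function that formats numbers with commas and a " Bells" suffix
bells_label <- scales::label_comma(suffix = " Bells")

p <- toy_max %>%
  # reorder() sorts the bars from the highest to the lowest buy value
  ggplot(aes(x = reorder(name, -buy_value), y = buy_value, fill = sources)) +
  geom_col() +
  scale_y_continuous(labels = bells_label) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  labs(title = "Items with Highest Buy Values",
       x = "Item Names",
       y = "Buy Value in Bells")

p
```

Ordering the bars by value makes the ranking readable directly from the plot instead of relying on the alphabetical order of the item names.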
Summarize your research question and findings below.
The question I am trying to answer is: what are the items with the highest and lowest buy values in ACNH? I discovered that the highest-valued item is the Open-frame Kitchen, which costs 140,000 Bells, and the lowest-valued items are the Bandage and Bubblegum, which cost 140 Bells each. I was also able to show the top 10 and bottom 10 valued items in this game. This is exactly what I wanted to know.
Are your findings what you expected? Why or why not?
This is exactly what I expected to find. I had to do some manipulation of the data because one of the items is listed three times in the dataset. After manipulating the data, I was able to extract what I needed, and my findings are what I expected them to be.