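I am working with MLB Statcast data in RStudio and trying to calculate each pitcher's swinging strike rate (swinging strikes divided by the total number of pitches the pitcher threw). Here is an example data frame: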
pitcher_name <- c('AJ Griffin','AJ Griffin','AJ Griffin','AJ Griffin','AJ Griffin',
'AJ Griffin','Adam Conley','Adam Conley','Adam Conley','Adam Conley',
'Adam Conley','Adam Conley')
description <- c('foul','swinging_strike','swinging_strike','swinging_strike_blocked',
'ball','hit_into_play','swinging_strike','swinging_strike',
'swinging_strike','swinging_strike_blocked','swinging_strike_blocked','ball')
pitch_analysis.data <- data.frame(pitcher_name, description)
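The end goal is to count each pitcher's swinging strikes (both swinging_strike and swinging_strike_blocked) and then divide that number by the total number of pitches the pitcher threw. For this example, the final answer should be 50% for AJ Griffin (3 swinging strikes out of 6 pitches) and 83% for Adam Conley (5 swinging strikes out of 6 pitches). I tried the following command using the dplyr package: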
P <- pitch_analysis.data %>% group_by(pitcher_name, description) %>% count(description)
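This gives me a count for each description, but I don't know how to use dplyr for the last step: grouping the two types of swinging strike together and then dividing by each pitcher's total number of pitches. Any advice would be much appreciated, thanks!
Answer 0 (score: 1)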
Using the dplyr and stringr packages, you can do the following:
library(dplyr)
library(stringr)
P <- pitch_analysis.data %>%
  group_by(pitcher_name) %>%
  summarise(r = sum(str_detect(description, "swinging")) / n())
This returns:
pitcher_name r
<fctr> <dbl>
1 Adam Conley 0.8333333
2 AJ Griffin 0.5000000
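We use str_detect to detect the word "swinging" in description, and sum to count the rows where it was found. The total number of rows in each group is given by n().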
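To make the counting step concrete, here is a minimal base R sketch using only AJ Griffin's six pitches from the question. It swaps in grepl as a base R stand-in for str_detect, and the variable names (is_swinging, n_swinging, n_pitches, rate) are illustrative, not part of the answer.

```r
# The six pitches thrown by AJ Griffin in the example data
description <- c("foul", "swinging_strike", "swinging_strike",
                 "swinging_strike_blocked", "ball", "hit_into_play")

# TRUE wherever the description contains "swinging"
# (grepl is used here as a base R stand-in for str_detect)
is_swinging <- grepl("swinging", description, fixed = TRUE)

n_swinging <- sum(is_swinging)     # TRUE counts as 1, FALSE as 0
n_pitches  <- length(is_swinging)  # plays the role of n() inside summarise()

rate <- n_swinging / n_pitches
print(rate)               # 0.5
print(mean(is_swinging))  # same value: the mean of a logical vector is the share of TRUEs
```

Because the mean of a logical vector is the proportion of TRUE values, sum(...)/n() could also be written as mean(...) inside summarise.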
Answer 1 (score: 0)
How about this approach, which uses only dplyr?
pitch_analysis.data <- tibble(pitcher_name, description)  # data_frame() is deprecated in favor of tibble()
pitch_analysis.data %>%
  mutate(simplified_description = ifelse(description == "swinging_strike_blocked",
                                         "swinging_strike", description)) %>%
  group_by(pitcher_name, simplified_description) %>%
  count(simplified_description)
Source: local data frame [6 x 3]
Groups: pitcher_name [?]
pitcher_name simplified_description n
<chr> <chr> <int>
1 Adam Conley ball 1
2 Adam Conley swinging_strike 5
3 AJ Griffin ball 1
4 AJ Griffin foul 1
5 AJ Griffin hit_into_play 1
6 AJ Griffin swinging_strike 3
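The counts above still need to be divided by each pitcher's total number of pitches to get the rate the question asks for. One possible way to finish the calculation with dplyr alone is sketched below. The names pitch_data, is_swinging, swinging_strikes, total_pitches, and rate are made up for this illustration, and the data frame is rebuilt from the question's vectors so the block runs on its own.

```r
library(dplyr)

# Example data from the question, kept as character columns
pitcher_name <- c(rep("AJ Griffin", 6), rep("Adam Conley", 6))
description <- c("foul", "swinging_strike", "swinging_strike", "swinging_strike_blocked",
                 "ball", "hit_into_play", "swinging_strike", "swinging_strike",
                 "swinging_strike", "swinging_strike_blocked", "swinging_strike_blocked", "ball")
pitch_data <- data.frame(pitcher_name, description, stringsAsFactors = FALSE)

swing_rates <- pitch_data %>%
  # Flag both kinds of swinging strike explicitly
  mutate(is_swinging = description %in% c("swinging_strike", "swinging_strike_blocked")) %>%
  group_by(pitcher_name) %>%
  summarise(swinging_strikes = sum(is_swinging),  # swinging strikes per pitcher
            total_pitches = n(),                  # all pitches per pitcher
            rate = swinging_strikes / total_pitches)

print(swing_rates)
```

Matching the two description values with %in% avoids accidentally counting any other description that happens to contain the word "swinging", at the cost of having to list the values by hand.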
Answer 2 (score: 0)
Here is an option using data.table:
library(data.table)
setDT(pitch_analysis.data)[, .(r = sum(grepl('swinging', description))/.N), pitcher_name]
# pitcher_name r
#1: AJ Griffin 0.5000000
#2: Adam Conley 0.8333333
Or in base R, using rowsum:
# factor() keeps tabulate() working when pitcher_name is a character column
with(pitch_analysis.data, rowsum(+(grepl('swinging', description)),
  pitcher_name)/tabulate(factor(pitcher_name)))
# [,1]
#Adam Conley 0.8333333
#AJ Griffin 0.5000000
Or using table/prop.table:
prop.table(table(pitch_analysis.data[[1]], grepl('swinging',
pitch_analysis.data$description)), 1)[,2]
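Another base R possibility, not taken from the answers themselves, is to average the logical match vector within each pitcher using tapply. This is only a sketch: pitch_data and rates are illustrative names, and the data frame is rebuilt from the question's vectors so the block is self-contained.

```r
# Example data from the question
pitcher_name <- c(rep("AJ Griffin", 6), rep("Adam Conley", 6))
description <- c("foul", "swinging_strike", "swinging_strike", "swinging_strike_blocked",
                 "ball", "hit_into_play", "swinging_strike", "swinging_strike",
                 "swinging_strike", "swinging_strike_blocked", "swinging_strike_blocked", "ball")
pitch_data <- data.frame(pitcher_name, description, stringsAsFactors = FALSE)

# The mean of a logical vector within each group is that group's swinging strike rate
rates <- with(pitch_data, tapply(grepl("swinging", description), pitcher_name, mean))

# Show the rates as percentages, named by pitcher
print(round(100 * rates, 1))
```

The result is a named array, so an individual pitcher's rate can be looked up with rates[["AJ Griffin"]].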