I have a dataset of baseball play-by-play data. Here is a simplified example:
team <- c('A','A','A','A','A','A','A',
'B','B','B','B','B','B','B',
'C','C','C','C','C','C','C')
event <- c("OUT","WALK","OUT","OUT","HR","WALK","OUT",
"WALK","OUT","HR","WALK","OUT","OUT","WALK",
"HR","HR","WALK","WALK","HR","OUT","WALK")
df <- data.frame(team, event)
df
team event
1 A OUT
2 A WALK
3 A OUT
4 A OUT
5 A HR
6 A WALK
7 A OUT
8 B WALK
9 B OUT
10 B HR
11 B WALK
12 B OUT
13 B OUT
14 B WALK
15 C HR
16 C HR
17 C WALK
18 C WALK
19 C HR
20 C OUT
21 C WALK
I want to create a new data frame that shows how many times each event occurred for each team, with each event in its own column, like this:
team OUT WALK HR
1 A 4 2 1
2 B 3 3 1
3 C 1 3 3
I think there must be a way to do this with dplyr, but I can't figure it out.
Answer 0 (score: 1)
We can try dplyr/tidyr. Get the count grouped by 'team' and 'event', then spread the result from 'long' to 'wide' format:
library(tidyverse)
df %>%
count(team, event) %>%
spread(event, n)
# A tibble: 3 × 4
# team HR OUT WALK
#* <fctr> <int> <int> <int>
#1 A 1 4 2
#2 B 1 3 3
#3 C 3 1 3
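In tidyr 1.0.0 and later, spread() is superseded by pivot_wider(). A minimal sketch of the same reshaping with the newer function follows. It rebuilds the question's df so that it runs on its own, and the result name wide is only illustrative. The values_fill argument is not needed for this data (every team has every event); it is included on the assumption that real play-by-play data may have team/event combinations that never occur, which would otherwise show up as NA.

```r
library(dplyr)
library(tidyr)

# Rebuild the example data from the question
team <- rep(c("A", "B", "C"), each = 7)
event <- c("OUT", "WALK", "OUT", "OUT", "HR", "WALK", "OUT",
           "WALK", "OUT", "HR", "WALK", "OUT", "OUT", "WALK",
           "HR", "HR", "WALK", "WALK", "HR", "OUT", "WALK")
df <- data.frame(team, event)

# Count each team/event pair, then turn each event into its own column.
# values_fill replaces missing combinations with 0 instead of NA.
wide <- df %>%
  count(team, event) %>%
  pivot_wider(names_from = event, values_from = n,
              values_fill = list(n = 0L))

wide
```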
If the columns need to be in a particular order, first convert 'event' to a factor whose levels are the unique elements of 'event', in order of first appearance:
df %>%
mutate(event = factor(event, levels = unique(event))) %>%
count(team, event) %>%
spread(event, n)
# A tibble: 3 × 4
# team OUT WALK HR
#* <fctr> <int> <int> <int>
#1 A 4 2 1
#2 B 3 3 1
#3 C 1 3 3
Or use dcast from data.table:
library(data.table)
dcast(setDT(df), team~event, length)
Or use table from base R:
table(df)
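Note that table() returns a contingency table rather than a data frame. If a data frame with the layout from the question is needed, one possible sketch in base R is shown below. It rebuilds df so that it runs on its own; the names tab and counts are illustrative, and the column order is fixed by passing explicit factor levels.

```r
# Rebuild the example data from the question
team <- rep(c("A", "B", "C"), each = 7)
event <- c("OUT", "WALK", "OUT", "OUT", "HR", "WALK", "OUT",
           "WALK", "OUT", "HR", "WALK", "OUT", "OUT", "WALK",
           "HR", "HR", "WALK", "WALK", "HR", "OUT", "WALK")
df <- data.frame(team, event)

# Cross-tabulate team by event; the factor levels set the column order
tab <- table(df$team, factor(df$event, levels = c("OUT", "WALK", "HR")))

# Convert the 2-d table to a wide data frame (teams become row names)
counts <- as.data.frame.matrix(tab)

# Move the row names into a regular 'team' column
counts <- cbind(team = rownames(counts), counts)
rownames(counts) <- NULL

counts
```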