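I have a data frame that summarizes the different class sequences over three years (2006, 2007, 2008). The variable count gives the number of occurrences of each class sequence: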
df<-data.frame(count=c(150,33,35,26,15,65),
Y.2006=c("a","a","a","d","d","d"),
Y.2007=c("a","b","b","c","c","c"),
Y.2008=c("a","b","a","c","c","d") )
It looks like this:
count Y.2006 Y.2007 Y.2008
1 150 a a a
2 33 a b b
3 35 a b a
4 26 d c c
5 15 d c c
6 65 d c d
I compute the 2006-2007 sequence:
df$Y.2006_2007<-paste(df$Y.2006, df$Y.2007)
Result:
count Y.2006 Y.2007 Y.2008 Y.2006_2007
1 150 a a a a a
2 33 a b b a b
3 35 a b a a b
4 26 d c c d c
5 15 d c c d c
6 65 d c d d c
I want the number of occurrences of each Y.2006_2007 sequence for each 2008 class. So I do this:
table(df$Y.2006_2007, df$Y.2008)
which gives:
a b c d
a a 1 0 0 0
a b 1 1 0 0
d c 0 0 2 1
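Now I would like to update these results using the count variable of the initial data frame (df), so that each cell reflects the number of occurrences of the class sequence rather than the number of rows. So I would like: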
a b c d
a a 150 0 0 0
a b 35 33 0 0
d c 0 0 41 65
I cannot figure out how to achieve this result. Any ideas?
Answer 0 (score: 3)
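You can do this in base R with xtabs. The variable on the left-hand side of the formula (count) is summed within each combination of the variables on the right-hand side: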
xtabs(formula = count ~ Y.2006_2007 + Y.2008, data = df)
Y.2008
Y.2006_2007 a b c d
a a 150 0 0 0
a b 35 33 0 0
d c 0 0 41 65
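As a cross-check on what xtabs does here, the same weighted table can be sketched with tapply, which sums count within each combination of the two grouping variables. This is an illustrative alternative rather than part of the original answer; the object names tab_xtabs and tab_tapply are made up for this sketch.

```r
# Rebuild the example data so the block is self-contained
df <- data.frame(count  = c(150, 33, 35, 26, 15, 65),
                 Y.2006 = c("a", "a", "a", "d", "d", "d"),
                 Y.2007 = c("a", "b", "b", "c", "c", "c"),
                 Y.2008 = c("a", "b", "a", "c", "c", "d"))
df$Y.2006_2007 <- paste(df$Y.2006, df$Y.2007)

# Weighted contingency table: count is summed per cell
tab_xtabs <- xtabs(count ~ Y.2006_2007 + Y.2008, data = df)

# Same sums via tapply; combinations that never occur come back as NA
tab_tapply <- tapply(df$count, list(df$Y.2006_2007, df$Y.2008), sum)
tab_tapply[is.na(tab_tapply)] <- 0

# Both approaches should agree cell by cell
stopifnot(all(as.vector(tab_xtabs) == as.vector(tab_tapply)))
```

The grand total of the table should equal sum(df$count), which is a quick sanity check that no rows were dropped.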
If you need a data.frame, you can wrap the call in as.data.frame, which returns one row per combination in long format:
as.data.frame(xtabs(formula = count ~ Y.2006_2007 + Y.2008, data = df))
Y.2006_2007 Y.2008 Freq
1 a a a 150
2 a b a 35
3 d c a 0
4 a a b 0
5 a b b 33
6 d c b 0
7 a a c 0
8 a b c 0
9 d c c 41
10 a a d 0
11 a b d 0
12 d c d 65
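The long output above also lists combinations that never occur (Freq of 0). If only the observed combinations are wanted, a minimal sketch is to filter those rows out, or to use aggregate, which only returns combinations present in the data. The names long, observed and agg are hypothetical, chosen for this example.

```r
# Rebuild the example data so the block is self-contained
df <- data.frame(count  = c(150, 33, 35, 26, 15, 65),
                 Y.2006 = c("a", "a", "a", "d", "d", "d"),
                 Y.2007 = c("a", "b", "b", "c", "c", "c"),
                 Y.2008 = c("a", "b", "a", "c", "c", "d"))
df$Y.2006_2007 <- paste(df$Y.2006, df$Y.2007)

# Long format from xtabs, then drop the empty combinations
long <- as.data.frame(xtabs(count ~ Y.2006_2007 + Y.2008, data = df))
observed <- subset(long, Freq > 0)

# aggregate() sums count per group and skips combinations that do not occur
agg <- aggregate(count ~ Y.2006_2007 + Y.2008, data = df, FUN = sum)
```

Note that the summed column is called Freq in the xtabs route and count in the aggregate route.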
Answer 1 (score: 2)
Or use dcast from reshape2, with sum as the aggregation function:
library(reshape2)
dcast(df, Y.2006_2007 ~ Y.2008, sum, value.var = "count")
## Y.2006_2007 a b c d
## 1 a a 150 0 0 0
## 2 a b 35 33 0 0
## 3 d c 0 0 41 65
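If reshape2 is not available, a similar wide data frame can be sketched in base R by converting the xtabs result with as.data.frame.matrix. This is a suggested alternative, not something the answer proposes; unlike the dcast output, the sequences end up as row names instead of a Y.2006_2007 column, and the name wide is hypothetical.

```r
# Rebuild the example data so the block is self-contained
df <- data.frame(count  = c(150, 33, 35, 26, 15, 65),
                 Y.2006 = c("a", "a", "a", "d", "d", "d"),
                 Y.2007 = c("a", "b", "b", "c", "c", "c"),
                 Y.2008 = c("a", "b", "a", "c", "c", "d"))
df$Y.2006_2007 <- paste(df$Y.2006, df$Y.2007)

# Weighted table, converted cell-for-cell into a wide data frame
tab  <- xtabs(count ~ Y.2006_2007 + Y.2008, data = df)
wide <- as.data.frame.matrix(tab)

# Optional: move the row names into a regular column, as dcast does
wide_with_key <- cbind(Y.2006_2007 = rownames(wide), wide, row.names = NULL)
```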
Answer 2 (score: 0)
Using dplyr and tidyr:
library(dplyr)
library(tidyr)
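df %>%
  unite(Y.2006_2007, Y.2006, Y.2007) %>%   # create column Y.2006_2007 (values joined with "_")
  group_by(Y.2006_2007, Y.2008) %>%        # one group per sequence and 2008 class
  summarize(count = sum(count)) %>%        # total occurrences per group
  spread(Y.2008, count, fill = 0)          # one column per 2008 class, 0 for missing combinations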
# Source: local data frame [3 x 5]
# Y.2006_2007 a b c d
# 1 a_a 150 0 0 0
# 2 a_b 35 33 0 0
# 3 d_c 0 0 41 65
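In current tidyr releases spread() is superseded by pivot_wider(). A sketch of the same pipeline in that style follows; it assumes dplyr 1.0.0 or later (for the .groups argument) and tidyr 1.0.0 or later, and uses sep = " " so the labels match the paste() output from the question. The name wide is hypothetical.

```r
library(dplyr)
library(tidyr)

# Start from the original data, without the pasted Y.2006_2007 column
df <- data.frame(count  = c(150, 33, 35, 26, 15, 65),
                 Y.2006 = c("a", "a", "a", "d", "d", "d"),
                 Y.2007 = c("a", "b", "b", "c", "c", "c"),
                 Y.2008 = c("a", "b", "a", "c", "c", "d"))

wide <- df %>%
  # join the 2006 and 2007 classes into one sequence label, e.g. "a b"
  unite("Y.2006_2007", Y.2006, Y.2007, sep = " ") %>%
  group_by(Y.2006_2007, Y.2008) %>%
  # sum the occurrences per sequence and 2008 class, then drop the grouping
  summarise(count = sum(count), .groups = "drop") %>%
  # one column per 2008 class; combinations that never occur become 0
  pivot_wider(names_from = Y.2008, values_from = count,
              values_fill = list(count = 0))
```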