How do I update a frequency table based on a third variable?

Time: 2014-07-25 11:03:43

Tags: r frequency

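I have a data frame that summarizes sequences of classes over three years (2006, 2007, 2008). The variable count gives the number of occurrences of each class sequence: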

df<-data.frame(count=c(150,33,35,26,15,65), 
           Y.2006=c("a","a","a","d","d","d"), 
           Y.2007=c("a","b","b","c","c","c"),
           Y.2008=c("a","b","a","c","c","d") )

It looks like this:

  count Y.2006 Y.2007 Y.2008 
1   150      a      a      a     
2    33      a      b      b        
3    35      a      b      a        
4    26      d      c      c        
5    15      d      c      c         
6    65      d      c      d         

I compute the 2006-2007 sequence:

df$Y.2006_2007<-paste(df$Y.2006, df$Y.2007)

Result:

  count Y.2006 Y.2007 Y.2008 Y.2006_2007
1   150      a      a      a         a a
2    33      a      b      b         a b
3    35      a      b      a         a b
4    26      d      c      c         d c
5    15      d      c      c         d c
6    65      d      c      d         d c

I want the number of occurrences of each Y.2006_2007 sequence, broken down by the 2008 class. So I do this:

table(df$Y.2006_2007, df$Y.2008)

which gives:

      a b c d
  a a 1 0 0 0
  a b 1 1 0 0
  d c 0 0 2 1

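This only counts rows. Now I would like to update these results using the count variable of the initial data frame (df), so that the number of occurrences of each class sequence is taken into account. So I would like: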

      a   b   c   d
a a 150   0   0   0
a b  35  33   0   0
d c   0   0  41  65

I can't figure out how to get this result. Any ideas?

3 answers:

Answer 0: (score: 3)

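You can do this in base R with xtabs. Putting count on the left-hand side of the formula sums it within each cell instead of counting rows: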

xtabs(formula = count ~ Y.2006_2007 + Y.2008, data = df)

              Y.2008
  Y.2006_2007   a   b   c   d
          a a 150   0   0   0
          a b  35  33   0   0
          d c   0   0  41  65
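As a quick check of the xtabs call above, here is a minimal self-contained sketch that rebuilds the data frame from the question and confirms that the cells are sums of count. The name `weighted` is only illustrative.

```r
# Rebuild the example data from the question
df <- data.frame(count  = c(150, 33, 35, 26, 15, 65),
                 Y.2006 = c("a", "a", "a", "d", "d", "d"),
                 Y.2007 = c("a", "b", "b", "c", "c", "c"),
                 Y.2008 = c("a", "b", "a", "c", "c", "d"))
df$Y.2006_2007 <- paste(df$Y.2006, df$Y.2007)

# The left-hand side of the formula (count) is summed within each cell
weighted <- xtabs(count ~ Y.2006_2007 + Y.2008, data = df)

# The two rows with sequence "d c" and 2008 class "c" (26 and 15) are added together
weighted["d c", "c"]

# Sanity check: the table total equals the total of count
sum(weighted) == sum(df$count)
```

The last line should evaluate to TRUE, since every row of df contributes its count to exactly one cell.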

If you need a data.frame, you can wrap it in as.data.frame:

as.data.frame(xtabs(formula = count ~ Y.2006_2007 + Y.2008, data = df))
   Y.2006_2007 Y.2008 Freq
1          a a      a  150
2          a b      a   35
3          d c      a    0
4          a a      b    0
5          a b      b   33
6          d c      b    0
7          a a      c    0
8          a b      c    0
9          d c      c   41
10         a a      d    0
11         a b      d    0
12         d c      d   65
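The long form above also lists the empty combinations. If only the combinations that actually occur are of interest, one possible follow-up (a sketch, not part of the original answer; `long` and `observed` are illustrative names) is to drop the rows where Freq is zero:

```r
# Rebuild the example data from the question
df <- data.frame(count  = c(150, 33, 35, 26, 15, 65),
                 Y.2006 = c("a", "a", "a", "d", "d", "d"),
                 Y.2007 = c("a", "b", "b", "c", "c", "c"),
                 Y.2008 = c("a", "b", "a", "c", "c", "d"))
df$Y.2006_2007 <- paste(df$Y.2006, df$Y.2007)

# Long-format data frame with one row per cell of the cross-tabulation
long <- as.data.frame(xtabs(count ~ Y.2006_2007 + Y.2008, data = df))

# Keep only the combinations that occur in the data
observed <- subset(long, Freq > 0)
observed
```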

Answer 1: (score: 2)

With dcast from reshape2, using sum as the aggregation function:

library(reshape2)
dcast(df, Y.2006_2007 ~ Y.2008, sum, value.var = "count")

##   Y.2006_2007   a  b  c  d
## 1         a a 150  0  0  0
## 2         a b  35 33  0  0
## 3         d c   0  0 41 65

Answer 2: (score: 0)

Using dplyr and tidyr:

library(dplyr)
library(tidyr)
df %>%
  unite(Y.2006_2007, Y.2006, Y.2007) %>%  # creates column Y.2006_2007 (default separator "_")
  group_by(Y.2006_2007, Y.2008) %>%
  summarize(count = sum(count)) %>%
  spread(Y.2008, count, fill = 0)
# Source: local data frame [3 x 5]

#   Y.2006_2007   a  b  c  d
# 1         a_a 150  0  0  0
# 2         a_b  35 33  0  0
# 3         d_c   0  0 41 65
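
In more recent tidyr releases, spread has been superseded by pivot_wider. The following is a sketch of the same pipeline in that style, assuming dplyr 1.0.0 or later (for the `.groups` argument) and tidyr 1.1.0 or later (for a scalar `values_fill`). The name `wide` and the choice of `sep = " "` (to match the labels in the question) are illustrative.

```r
library(dplyr)
library(tidyr)

# Rebuild the original example data (without the pasted column)
df <- data.frame(count  = c(150, 33, 35, 26, 15, 65),
                 Y.2006 = c("a", "a", "a", "d", "d", "d"),
                 Y.2007 = c("a", "b", "b", "c", "c", "c"),
                 Y.2008 = c("a", "b", "a", "c", "c", "d"))

wide <- df %>%
  # Combine the 2006 and 2007 classes into one sequence label, e.g. "d c"
  unite(Y.2006_2007, Y.2006, Y.2007, sep = " ") %>%
  group_by(Y.2006_2007, Y.2008) %>%
  # Sum count within each (sequence, 2008 class) combination
  summarise(count = sum(count), .groups = "drop") %>%
  # One column per 2008 class; combinations that never occur become 0
  pivot_wider(names_from = Y.2008, values_from = count, values_fill = 0)

wide
```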