Aggregating in base R?

Asked: 2015-08-06 17:07:15

Tags: r aggregate transpose

I have the following data frame:

df <- data.frame(ID = sample(1:100, 64),
                 Facility=rep(c("A","B","C","D"), each=4, times=4), 
                 MMWR = rep(c("1503","1504","1505", "1506"), 16), 
                 Age = sample(1:4, 64, replace=TRUE))


    ID  Facility    MMWR    Age
1   86  A   1503    4
2   85  A   1504    3
3   37  A   1505    1
4   77  A   1506    1
5   73  B   1503    2
6   40  B   1504    2
7   2   B   1505    3
8   97  B   1506    3
9   83  C   1503    4
10  80  C   1504    4
11  69  C   1505    3
12  93  C   1506    3
13  56  D   1503    3
14  12  D   1504    4
15  1   D   1505    2
16  72  D   1506    1
17  90  A   1503    1
18  95  A   1504    2
19  78  A   1505    4
20  98  A   1506    2
21  68  B   1503    2
22  38  B   1504    4
23  21  B   1505    2
24  3   B   1506    2
25  16  C   1503    4
26  74  C   1504    2
27  27  C   1505    4
28  6   C   1506    2
29  64  D   1503    1
30  59  D   1504    3
31  65  D   1505    3
32  53  D   1506    4
33  9   A   1503    1
34  22  A   1504    1
35  62  A   1505    1
36  26  A   1506    2
37  31  B   1503    3
38  100 B   1504    2
39  47  B   1505    1
40  36  B   1506    3
41  60  C   1503    3
42  18  C   1504    2
43  10  C   1505    3
44  51  C   1506    3
45  44  D   1503    3
46  54  D   1504    4
47  76  D   1505    3
48  67  D   1506    3
49  28  A   1503    1
50  58  A   1504    4
51  23  A   1505    1
52  71  A   1506    1
53  20  B   1503    3
54  32  B   1504    4
55  84  B   1505    4
56  33  B   1506    4
57  50  C   1503    1
58  61  C   1504    2
59  25  C   1505    3
60  91  C   1506    1
61  17  D   1503    2
62  81  D   1504    4
63  48  D   1505    4
64  24  D   1506    4

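I would like to aggregate (and/or transpose?) to get the number of records in each age group for every Facility and MMWR combination. For the data set above, I want the following output: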

    Facility    MMWR    Age 1   Age 2   Age 3   Age 4
1   A   1503    3   0   0   1
2   A   1504    1   1   1   1
3   A   1505    3   0   0   1
4   A   1506    2   2   0   0
5   B   1503    0   2   2   0
6   B   1504    0   2   0   2
7   B   1505    1   1   1   1
8   B   1506    0   1   2   1
9   C   1503    1   0   1   2
10  C   1504    0   3   0   1
11  C   1505    0   0   3   1
12  C   1506    1   1   2   0
13  D   1503    1   1   2   0
14  D   1504    0   0   1   3
15  D   1505    0   1   2   1
16  D   1506    1   0   1   2

Please note that I can only use BASE R!

I will post my long-winded solution below, but I am hoping someone can give me something better...

3 answers:

Answer 0 (score: 4)

Using only base R, reshape and aggregate will do the job.

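# Count rows per Facility/MMWR/Age, then spread Age into columns.
# Combinations with no rows come out as NA rather than 0.
reshape(aggregate(ID ~ Facility + MMWR + Age, data = df, FUN = length), 
    idvar = c('Facility', 'MMWR'), 
    direction = 'wide', 
    v.names = 'ID', 
    timevar = 'Age')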
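
The reshape call names the count columns ID.1, ID.2, and so on, and leaves NA where a Facility/MMWR pair has no rows for an age group, whereas the requested output uses "Age 1" headers and zeros. A minimal sketch of the cleanup follows; the set.seed call and the names counts and wide are assumptions added for illustration, not part of the original answer.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(ID = sample(1:100, 64),
                 Facility = rep(c("A", "B", "C", "D"), each = 4, times = 4),
                 MMWR = rep(c("1503", "1504", "1505", "1506"), 16),
                 Age = sample(1:4, 64, replace = TRUE))

# Long format: one row per Facility/MMWR/Age combination that actually occurs
counts <- aggregate(ID ~ Facility + MMWR + Age, data = df, FUN = length)

# Wide format: one row per Facility/MMWR, one count column per Age
wide <- reshape(counts,
                idvar = c("Facility", "MMWR"),
                timevar = "Age",
                v.names = "ID",
                direction = "wide")

wide[is.na(wide)] <- 0                              # absent combinations mean a count of 0
names(wide) <- sub("^ID\\.", "Age ", names(wide))   # ID.1 -> "Age 1", etc.
wide <- wide[order(wide$Facility, wide$MMWR), ]     # same row order as the requested output
rownames(wide) <- NULL
wide
```

Because the data is sampled randomly, the individual counts will differ from the table in the question, but every row should still sum to 4.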

Answer 1 (score: 0)

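This is the code I used to generate the answer; I am just looking for something cleaner with less repetition, because in reality I will have many more groups... Thanks!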

blah1 = aggregate( ID ~ Facility + MMWR, data = (subset(df, Age == "1")), length)
colnames(blah1)[3] = "Age 1"

blah2 = aggregate( ID ~ Facility + MMWR, data = (subset(df, Age == "2")), length)
colnames(blah2)[3] = "Age 2"

blah3 = aggregate( ID ~ Facility + MMWR, data = (subset(df, Age == "3")), length)
colnames(blah3)[3] = "Age 3"

blah4 = aggregate( ID ~ Facility + MMWR, data = (subset(df, Age == "4")), length)
colnames(blah4)[3] = "Age 4"


blah5 = merge(blah1, blah2, by = c("Facility", "MMWR"), all=TRUE)
blah6 = merge(blah5, blah3, by = c("Facility", "MMWR"), all=TRUE)
blah7 = merge(blah6, blah4, by = c("Facility", "MMWR"), all=TRUE)
blah7[is.na(blah7)] = 0

blah = blah7
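
If this subset-and-merge approach is kept, the repetition can be folded into a loop so that adding age groups needs no extra code. The following is only a sketch of that idea, not the poster's code; set.seed and the names ages, pieces, and result are hypothetical.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(ID = sample(1:100, 64),
                 Facility = rep(c("A", "B", "C", "D"), each = 4, times = 4),
                 MMWR = rep(c("1503", "1504", "1505", "1506"), 16),
                 Age = sample(1:4, 64, replace = TRUE))

ages <- sort(unique(df$Age))

# One aggregated data frame per age group, with the count column renamed
pieces <- lapply(ages, function(a) {
  out <- aggregate(ID ~ Facility + MMWR, data = df[df$Age == a, ], FUN = length)
  names(out)[3] <- paste("Age", a)
  out
})

# Full outer merge of all pieces on the two key columns
result <- Reduce(function(x, y) merge(x, y, by = c("Facility", "MMWR"), all = TRUE),
                 pieces)
result[is.na(result)] <- 0  # a missing combination means a count of 0
result
```

Reduce applies merge pairwise from left to right, which mirrors the blah5, blah6, blah7 chain above.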

Answer 2 (score: 0)

You can do this with aggregate, without using subset and reshape, but I prefer xtabs:

ftable(xtabs(Count~Facility+MMWR+Age,data=transform(df,Count=1)))
              Age 1 2 3 4
Facility MMWR            
A        1503     3 0 0 1
         1504     1 1 1 1
         1505     3 0 0 1
         1506     2 2 0 0
B        1503     0 2 2 0
         1504     0 2 0 2
         1505     1 1 1 1
         1506     0 1 2 1
C        1503     1 0 1 2
         1504     0 3 0 1
         1505     0 0 3 1
         1506     1 1 2 0
D        1503     1 1 2 0
         1504     0 0 1 3
         1505     0 1 2 1
         1506     1 0 1 2

This already gives the result, but as an object of class ftable. To convert it to a data.frame, we do some manual coercion.

reshape.ftable<-function(ft) {    
  a<-attributes(ft)
  x<-data.frame(expand.grid(rev(a$row.vars)), unclass(ft))
  colnames(x)<-c(names(rev(a$row.vars)),
    unlist(lapply(a$col.vars,function(x) paste(names(a$col.vars),x,sep="."))))
  x
}

Final result:

reshape.ftable(ftable(xtabs(Count~Facility+MMWR+Age,data=transform(df,Count=1))))
   MMWR Facility Age.1 Age.2 Age.3 Age.4
1  1503        A     3     0     0     1
2  1504        A     1     1     1     1
3  1505        A     3     0     0     1
4  1506        A     2     2     0     0
5  1503        B     0     2     2     0
6  1504        B     0     2     0     2
7  1505        B     1     1     1     1
8  1506        B     0     1     2     1
9  1503        C     1     0     1     2
10 1504        C     0     3     0     1
11 1505        C     0     0     3     1
12 1506        C     1     1     2     0
13 1503        D     1     1     2     0
14 1504        D     0     0     1     3
15 1505        D     0     1     2     1
16 1506        D     1     0     1     2
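
As a possible alternative to the hand-written reshape.ftable helper, the xtabs table can be turned into a long data frame with as.data.frame (which yields a Freq column and includes zero counts) and then spread with reshape. This is a sketch under the assumption that Freq.1 style column names are acceptable; set.seed and the names long and wide are added for illustration. A one-sided formula makes xtabs count rows, so the Count = 1 helper column is not needed here.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible
df <- data.frame(ID = sample(1:100, 64),
                 Facility = rep(c("A", "B", "C", "D"), each = 4, times = 4),
                 MMWR = rep(c("1503", "1504", "1505", "1506"), 16),
                 Age = sample(1:4, 64, replace = TRUE))

# Long format: every Facility/MMWR/Age cell of the table, zeros included
long <- as.data.frame(xtabs(~ Facility + MMWR + Age, data = df))

# Wide format: one Freq.<age> column per age group
wide <- reshape(long,
                idvar = c("Facility", "MMWR"),
                timevar = "Age",
                v.names = "Freq",
                direction = "wide")
wide <- wide[order(wide$Facility, wide$MMWR), ]
rownames(wide) <- NULL
wide
```

Since the table already contains the empty cells, no NA replacement step is required.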