I have the following data frame:
df <- data.frame(ID = sample(1:100, 64),
Facility=rep(c("A","B","C","D"), each=4, times=4),
MMWR = rep(c("1503","1504","1505", "1506"), 16),
Age = sample(1:4, 64, replace=TRUE))
ID Facility MMWR Age
1 86 A 1503 4
2 85 A 1504 3
3 37 A 1505 1
4 77 A 1506 1
5 73 B 1503 2
6 40 B 1504 2
7 2 B 1505 3
8 97 B 1506 3
9 83 C 1503 4
10 80 C 1504 4
11 69 C 1505 3
12 93 C 1506 3
13 56 D 1503 3
14 12 D 1504 4
15 1 D 1505 2
16 72 D 1506 1
17 90 A 1503 1
18 95 A 1504 2
19 78 A 1505 4
20 98 A 1506 2
21 68 B 1503 2
22 38 B 1504 4
23 21 B 1505 2
24 3 B 1506 2
25 16 C 1503 4
26 74 C 1504 2
27 27 C 1505 4
28 6 C 1506 2
29 64 D 1503 1
30 59 D 1504 3
31 65 D 1505 3
32 53 D 1506 4
33 9 A 1503 1
34 22 A 1504 1
35 62 A 1505 1
36 26 A 1506 2
37 31 B 1503 3
38 100 B 1504 2
39 47 B 1505 1
40 36 B 1506 3
41 60 C 1503 3
42 18 C 1504 2
43 10 C 1505 3
44 51 C 1506 3
45 44 D 1503 3
46 54 D 1504 4
47 76 D 1505 3
48 67 D 1506 3
49 28 A 1503 1
50 58 A 1504 4
51 23 A 1505 1
52 71 A 1506 1
53 20 B 1503 3
54 32 B 1504 4
55 84 B 1505 4
56 33 B 1506 4
57 50 C 1503 1
58 61 C 1504 2
59 25 C 1505 3
60 91 C 1506 1
61 17 D 1503 2
62 81 D 1504 4
63 48 D 1505 4
64 24 D 1506 4
I want to aggregate (and/or transpose?) it to get the count for each age group. For the dataset above, I would like the following output:
Facility MMWR Age 1 Age 2 Age 3 Age 4
1 A 1503 3 0 0 1
2 A 1504 1 1 1 1
3 A 1505 3 0 0 1
4 A 1506 2 2 0 0
5 B 1503 0 2 2 0
6 B 1504 0 2 0 2
7 B 1505 1 1 1 1
8 B 1506 0 1 2 1
9 C 1503 1 0 1 2
10 C 1504 0 3 0 1
11 C 1505 0 0 3 1
12 C 1506 1 1 2 0
13 D 1503 1 1 2 0
14 D 1504 0 0 1 3
15 D 1505 0 1 2 1
16 D 1506 1 0 1 2
Please note that I can only use BASE R!
I'll put my long-winded solution in the comments, but I'm hoping someone can offer something better...
Answer 0 (score: 4)
Using only base R, reshape and aggregate will do the job:
reshape(aggregate(ID ~ Facility + MMWR + Age, data = df, length),
        idvar = c('Facility', 'MMWR'),
        direction = 'wide',
        v.names = 'ID',
        timevar = 'Age')
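One thing worth checking with this approach: aggregate only returns the Facility/MMWR/Age combinations that actually occur, so reshape leaves the missing cells as NA rather than 0, and the count columns are named ID.1, ID.2, and so on. A minimal sketch of the cleanup, assuming a small made-up data frame toy (hypothetical, not the df from the question) so that the result is deterministic:

```r
# Hypothetical toy data, used only so the outcome is reproducible
toy <- data.frame(ID = 1:6,
                  Facility = c("A", "A", "A", "B", "B", "B"),
                  MMWR = c("1503", "1503", "1504", "1503", "1504", "1504"),
                  Age = c(1, 1, 2, 2, 1, 1))

# Count rows per Facility/MMWR/Age, then spread Age across columns
counts <- aggregate(ID ~ Facility + MMWR + Age, data = toy, FUN = length)
wide <- reshape(counts,
                idvar = c("Facility", "MMWR"),
                timevar = "Age",
                v.names = "ID",
                direction = "wide")

# Combinations that never occur come back as NA; turn them into zero counts
wide[is.na(wide)] <- 0

# Rename ID.1, ID.2, ... to Age 1, Age 2, ... and sort the rows
names(wide) <- sub("^ID\\.", "Age ", names(wide))
wide <- wide[order(wide$Facility, wide$MMWR), ]
print(wide)
```

The same three cleanup lines should apply unchanged to the result computed from df.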
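Answer 1 (score: 0)
Here is the code I used to generate the answer; I was just looking for something cleaner, without so much repetition, because in reality I will have many more groups... Thanks!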
blah1 = aggregate( ID ~ Facility + MMWR, data = (subset(df, Age == "1")), length)
colnames(blah1)[3] = "Age 1"
blah2 = aggregate( ID ~ Facility + MMWR, data = (subset(df, Age == "2")), length)
colnames(blah2)[3] = "Age 2"
blah3 = aggregate( ID ~ Facility + MMWR, data = (subset(df, Age == "3")), length)
colnames(blah3)[3] = "Age 3"
blah4 = aggregate( ID ~ Facility + MMWR, data = (subset(df, Age == "4")), length)
colnames(blah4)[3] = "Age 4"
blah5 = merge(blah1, blah2, by = c("Facility", "MMWR"), all=TRUE)
blah6 = merge(blah5, blah3, by = c("Facility", "MMWR"), all=TRUE)
blah7 = merge(blah6, blah4, by = c("Facility", "MMWR"), all=TRUE)
blah7[is.na(blah7)] = 0
blah = blah7
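If the subset-and-merge idea is kept, the repetition can be folded into a loop. The following is only a sketch of that refactoring, again on a made-up data frame toy (hypothetical, not the question's df); it builds one count table per observed age with lapply and chains the merges with Reduce:

```r
# Hypothetical toy data, used only so the outcome is reproducible
toy <- data.frame(ID = 1:6,
                  Facility = c("A", "A", "A", "B", "B", "B"),
                  MMWR = c("1503", "1503", "1504", "1503", "1504", "1504"),
                  Age = c(1, 1, 2, 2, 1, 1))

# Only ages that occur in the data, so every subset has at least one row
ages <- sort(unique(toy$Age))

# One count table per age, with the count column named "Age <value>"
per_age <- lapply(ages, function(a) {
  out <- aggregate(ID ~ Facility + MMWR, data = toy[toy$Age == a, ], FUN = length)
  names(out)[3] <- paste("Age", a)
  out
})

# Full outer merge of all tables on the two key columns
merged <- Reduce(function(x, y) merge(x, y, by = c("Facility", "MMWR"), all = TRUE),
                 per_age)

# Combinations missing from a given age table are NA after the merge
merged[is.na(merged)] <- 0
print(merged)
```

Adding more age groups then needs no extra code, since ages is derived from the data.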
Answer 2 (score: 0)
You can do this with aggregate (without subsetting) and reshape, but I prefer xtabs.
ftable(xtabs(Count~Facility+MMWR+Age,data=transform(df,Count=1)))
              Age 1 2 3 4
Facility MMWR
A        1503     3 0 0 1
         1504     1 1 1 1
         1505     3 0 0 1
         1506     2 2 0 0
B        1503     0 2 2 0
         1504     0 2 0 2
         1505     1 1 1 1
         1506     0 1 2 1
C        1503     1 0 1 2
         1504     0 3 0 1
         1505     0 0 3 1
         1506     1 1 2 0
D        1503     1 1 2 0
         1504     0 0 1 3
         1505     0 1 2 1
         1506     1 0 1 2
That already gives the result, but as an object of class ftable. To convert it to a data.frame, we do some manual coercion.
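reshape.ftable <- function(ft) {
  a <- attributes(ft)
  # Row labels: the last row variable varies fastest in an ftable, hence rev()
  x <- data.frame(expand.grid(rev(a$row.vars)), unclass(ft))
  # Column names: row variable names, then "<column variable>.<level>"
  colnames(x) <- c(names(rev(a$row.vars)),
                   unlist(lapply(a$col.vars, function(v) paste(names(a$col.vars), v, sep = "."))))
  x
}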
The final result:
reshape.ftable(ftable(xtabs(Count~Facility+MMWR+Age,data=transform(df,Count=1))))
   MMWR Facility Age.1 Age.2 Age.3 Age.4
1  1503        A     3     0     0     1
2  1504        A     1     1     1     1
3  1505        A     3     0     0     1
4  1506        A     2     2     0     0
5  1503        B     0     2     2     0
6  1504        B     0     2     0     2
7  1505        B     1     1     1     1
8  1506        B     0     1     2     1
9  1503        C     1     0     1     2
10 1504        C     0     3     0     1
11 1505        C     0     0     3     1
12 1506        C     1     1     2     0
13 1503        D     1     1     2     0
14 1504        D     0     0     1     3
15 1505        D     0     1     2     1
16 1506        D     1     0     1     2
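As a possible alternative to the hand-written coercion, the table can be flattened with as.data.frame and widened with reshape. This is a sketch on a made-up data frame toy (hypothetical, not the question's df), not the answer's own method. It also uses a one-sided xtabs formula, which counts rows directly, so the helper Count column is not needed:

```r
# Hypothetical toy data, used only so the outcome is reproducible
toy <- data.frame(ID = 1:6,
                  Facility = c("A", "A", "A", "B", "B", "B"),
                  MMWR = c("1503", "1503", "1504", "1503", "1504", "1504"),
                  Age = c(1, 1, 2, 2, 1, 1))

# With no left-hand side, xtabs simply counts rows per combination
tab <- xtabs(~ Facility + MMWR + Age, data = toy)

# Long format: one row per combination, zero counts included, in column Freq
long <- as.data.frame(tab)

# Spread the Age levels across columns
wide <- reshape(long,
                idvar = c("Facility", "MMWR"),
                timevar = "Age",
                v.names = "Freq",
                direction = "wide")

# Rename Freq.1, Freq.2, ... to Age.1, Age.2, ... and tidy the row order
names(wide) <- sub("^Freq\\.", "Age.", names(wide))
wide <- wide[order(wide$Facility, wide$MMWR), ]
rownames(wide) <- NULL
print(wide)
```

Because the table already holds every combination of levels, the zero counts appear without any NA replacement step.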