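This is a contrived example (hence no output), but it should hopefully be simple enough to demonstrate my problem. I want to compute the mean() of Income for every subgroup, grouped by "Country" and "FavoriteColor".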
#For a single subgroup
mean(dat[J("Blue","Nigeria")]$Income) #dat is a data.table object
#For all the subgroups...in the output I obviously
#see the mean() for Blue/Nigeria subgroup. So far so good.
dat[,mean(Income),by=list(FavoriteColor,Country)]
But now I want the summary() statistics of Income for all the subgroups, not just the mean(). So I just...
#For a single subgroup
summary(dat[J("Blue","Nigeria")]$Income)
#For all the subgroups... but this doesn't do what I expect.
#It seems to be computing something else entirely; I think
#it's calling summary() on each row
dat[,summary(Income),by=list(FavoriteColor,Country)]
What am I doing wrong?
Answer 0 (score: 6)
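Without a concrete reproducible example I am only guessing, but I suspect you have not realized that summary returns a numeric vector of length 6, which data.table stacks into a single column (six rows per group), whereas you probably want it in wide form, with one column per statistic.
To achieve this, wrap summary(Income) in as.list so that it becomes a list of length 6, and each element becomes its own column.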
For example, compare
DT <- data.table(a = letters[1:3],b= letters[1:2],i = 1:36)
DT[,summary(i),by=list(a,b)]
a b V1
1: a a 1.0
2: a a 8.5
3: a a 16.0
4: a a 16.0
5: a a 23.5
6: a a 31.0
7: b b 2.0
8: b b 9.5
9: b b 17.0
10: b b 17.0
11: b b 24.5
12: b b 32.0
13: c a 3.0
14: c a 10.5
15: c a 18.0
16: c a 18.0
17: c a 25.5
18: c a 33.0
19: a b 4.0
20: a b 11.5
21: a b 19.0
22: a b 19.0
23: a b 26.5
24: a b 34.0
25: b a 5.0
26: b a 12.5
27: b a 20.0
28: b a 20.0
29: b a 27.5
30: b a 35.0
31: c b 6.0
32: c b 13.5
33: c b 21.0
34: c b 21.0
35: c b 28.5
36: c b 36.0
a b V1
and
DT[,as.list(summary(i)),by=list(a,b)]
   a b Min. 1st Qu. Median Mean 3rd Qu. Max.
1: a a    1     8.5     16   16    23.5   31
2: b b    2     9.5     17   17    24.5   32
3: c a    3    10.5     18   18    25.5   33
4: a b    4    11.5     19   19    26.5   34
5: b a    5    12.5     20   20    27.5   35
6: c b    6    13.5     21   21    28.5   36
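Applied to the data in the question, the same pattern might look like the sketch below. The question gives no data, so the contents of dat here are invented purely for illustration; only the column names FavoriteColor, Country and Income come from the question. It assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical stand-in for the asker's `dat`; the values are made up.
dat <- data.table(
  FavoriteColor = c("Blue", "Blue", "Blue", "Red", "Red", "Red"),
  Country       = c("Nigeria", "Nigeria", "Nigeria", "Kenya", "Kenya", "Kenya"),
  Income        = c(10, 20, 30, 40, 50, 60)
)

# as.list() turns the six summary statistics into six columns,
# so each subgroup ends up on exactly one row.
wide <- dat[, as.list(summary(Income)), by = list(FavoriteColor, Country)]
print(wide)
```

With this toy data there are two subgroups, so wide has two rows: the two grouping columns followed by one column per statistic.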
If you want to keep the names alongside the values in long format, something like
DT[,{s <- summary(i); list(s, names(s))},by=list(a,b)]
will work.
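A variation on the long format, as a minimal sketch: naming the two list elements gives readable column names instead of V1 and V2, and as.numeric() drops the summary class so the value column is a plain numeric vector. The column names stat and value are my own choice, not anything data.table requires, and DT is rebuilt here with explicit rep() calls so the snippet does not rely on recycling.

```r
library(data.table)

# Same 36-row example table as above, built without relying on recycling.
DT <- data.table(
  a = rep(letters[1:3], 12),
  b = rep(letters[1:2], 18),
  i = 1:36
)

# One row per (group, statistic): `stat` holds the statistic's name,
# `value` holds the corresponding number.
long <- DT[, {
  s <- summary(i)
  list(stat = names(s), value = as.numeric(s))
}, by = list(a, b)]

print(head(long, 6))
```

This yields six rows per group, as in the single-column output, but each row is labelled with the statistic it holds.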