Question

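This is a contrived example (hence no output), but it is simple and should (hopefully) illustrate my problem. I want to compute the mean() of Income for every subgroup defined by "Country" and "FavoriteColor".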

#For a single subgroup
mean(dat[J("Blue","Nigeria")]$Income)   #dat is a data.table object             
#For all the subgroups...in the output I obviously 
#see the mean() for Blue/Nigeria subgroup. So far so good.
dat[,mean(Income),by=list(FavoriteColor,Country)]

But now I want the summary() statistics of Income for all the subgroups, not just the mean(). So I simply did...

#For a single subgroup
summary(dat[J("Blue","Nigeria")]$Income)                
#For all the subgroups... but this doesn't do what I expect. 
#It seems to be computing something else entirely; I think
#it's calling summary() on each row
dat[,summary(Income),by=list(FavoriteColor,Country)]

What am I doing wrong?

Answer 1

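Without a concrete reproducible example I am only guessing, but you may not have realized that summary() returns a numeric vector, which data.table stacks into a single column (long form), whereas you probably want the result in wide form.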

To get the wide form, wrap summary(Income) in as.list so that it becomes a list of length 6, one element per statistic.
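A minimal base R sketch of that difference in shape, assuming a small made-up vector `x` (not taken from the question's data):

```r
# x is hypothetical data, chosen only for illustration
x <- c(1, 7, 13, 19, 25, 31)

s <- summary(x)   # one vector holding six named statistics
length(s)         # 6
names(s)          # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."

l <- as.list(s)   # same six values, now one list element each
is.list(l)        # TRUE
length(l)         # 6
```

Inside `j`, a vector result is stacked into rows of a single column, while each element of a list result becomes its own column, which is why the `as.list` wrapper changes the layout.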

For example, compare

library(data.table)
DT <- data.table(a = letters[1:3], b = letters[1:2], i = 1:36)
DT[,summary(i),by=list(a,b)]
    a b   V1
 1: a a  1.0
 2: a a  8.5
 3: a a 16.0
 4: a a 16.0
 5: a a 23.5
 6: a a 31.0
 7: b b  2.0
 8: b b  9.5
 9: b b 17.0
10: b b 17.0
11: b b 24.5
12: b b 32.0
13: c a  3.0
14: c a 10.5
15: c a 18.0
16: c a 18.0
17: c a 25.5
18: c a 33.0
19: a b  4.0
20: a b 11.5
21: a b 19.0
22: a b 19.0
23: a b 26.5
24: a b 34.0
25: b a  5.0
26: b a 12.5
27: b a 20.0
28: b a 20.0
29: b a 27.5
30: b a 35.0
31: c b  6.0
32: c b 13.5
33: c b 21.0
34: c b 21.0
35: c b 28.5
36: c b 36.0
    a b   V1

and

DT[,as.list(summary(i)),by=list(a,b)]
   a b Min. 1st Qu. Median Mean 3rd Qu. Max.
1: a a    1     8.5     16   16    23.5   31
2: b b    2     9.5     17   17    24.5   32
3: c a    3    10.5     18   18    25.5   33
4: a b    4    11.5     19   19    26.5   34
5: b a    5    12.5     20   20    27.5   35
6: c b    6    13.5     21   21    28.5   36

If you want to keep the long format but also carry the statistic names along, something like

DT[,{s <- summary(i); list(s, names(s))},by=list(a,b)]

will work.
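A possible variation on the long-format call, sketched here with explicit column names so the result does not fall back to V1 and V2. The names `stat` and `value` are my own choice for illustration, and the columns are built with `rep()` so the sketch does not rely on recycling inside `data.table()`:

```r
library(data.table)

# Same shape of data as in the example above, with the recycling made explicit
DT <- data.table(a = rep(letters[1:3], length.out = 36),
                 b = rep(letters[1:2], length.out = 36),
                 i = 1:36)

# One row per (group, statistic); `stat` and `value` are hypothetical names
long <- DT[, {
  s <- summary(i)
  list(stat = names(s), value = as.numeric(s))
}, by = list(a, b)]

# Pull a single statistic for one subgroup
long[a == "a" & b == "a" & stat == "Max."]
```

Keeping the statistic name in its own column makes it straightforward to filter or reshape the result later.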
