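I want to create several variables that each aggregate a different subset of a dataset. As an illustrative example, say you have the following data: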
library(data.table)
DT = data.table(Group1 = c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4),
                Group2 = c(1,1,1,2,2,1,1,2,2,2,1,1,1,1,2,1,1,2,2,2),
                Var1 = c(1,1,0,0,0,1,1,0,1,0,1,0,0,0,0,0,0,0,0,0))
I want to find several means of the variable Var1. I would like to know:
mean(Var1) by Group1
mean(Var1) by Group1, only for those rows where Group2 == 1
mean(Var1) by Group1, only for those rows where Group2 == 2
Or, in data.table parlance:
DT[, mean(Var1), by=Group1]
DT[Group2==1, mean(Var1), by=Group1]
DT[Group2==2, mean(Var1), by=Group1]
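Obviously, computing any one of these is straightforward. But I can't find a good way to compute all three at once, because they use different subsets in i. The solution I have been using so far is to generate them separately and then merge them into a single table.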
DT_all <- DT[, .(avgVar1_all = mean(Var1)), by = Group1]
DT_1 <- DT[Group2 == 1, .(avgVar1_1 = mean(Var1)), by = Group1]
DT_2 <- DT[Group2 == 2, .(avgVar1_2 = mean(Var1)), by = Group1]
group_info <- merge(DT_all, DT_1, by = "Group1")
group_info <- merge(group_info, DT_2, by = "Group1")
group_info
# Group1 avgVar1_all avgVar1_1 avgVar1_2
# 1: 1 0.4 0.6666667 0.0000000
# 2: 2 0.6 1.0000000 0.3333333
# 3: 3 0.2 0.2500000 0.0000000
# 4: 4 0.0 0.0000000 0.0000000
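One caveat with this merge approach, sketched here on made-up data: merge() performs an inner join by default, so a Group1 value with no rows in one of the Group2 subsets would drop out of the combined table. Assuming that is not wanted, passing all = TRUE keeps every group, and Reduce() avoids repeating the merge call. The table DT2 and its Group1 == 5 rows are hypothetical and exist only to show the effect.

```r
library(data.table)

# Hypothetical data: Group1 == 5 has no rows with Group2 == 2
DT2 <- data.table(Group1 = c(1, 1, 1, 5, 5),
                  Group2 = c(1, 2, 2, 1, 1),
                  Var1   = c(1, 0, 1, 1, 0))

parts <- list(
  DT2[, .(avgVar1_all = mean(Var1)), by = Group1],
  DT2[Group2 == 1, .(avgVar1_1 = mean(Var1)), by = Group1],
  DT2[Group2 == 2, .(avgVar1_2 = mean(Var1)), by = Group1]
)

# all = TRUE makes each merge a full outer join, so Group1 == 5 is kept
# and its missing subset mean appears as NA instead of the row vanishing
group_info2 <- Reduce(function(x, y) merge(x, y, by = "Group1", all = TRUE), parts)
group_info2
```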
Is there a more elegant approach I could use?
Answer 0 (score: 5)
Use .SD:
DT[, .(
all = mean(Var1),
grp1 = .SD[Group2==1, mean(Var1)],
grp2 = .SD[Group2==2, mean(Var1)]
),
by = Group1,
.SDcols=c("Group2","Var1")
]
# Group1 all grp1 grp2
#1: 1 0.4 0.6666667 0.0000000
#2: 2 0.6 1.0000000 0.3333333
#3: 3 0.2 0.2500000 0.0000000
#4: 4 0.0 0.0000000 0.0000000
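A possible variation on the same idea, offered as a sketch rather than a benchmarked recommendation: within each group the columns are available in j as plain vectors, so the conditional means can also be taken by indexing Var1 directly instead of subsetting .SD. On the example data this should give the same table as above. Note that a group with no matching rows would yield NaN for that column.

```r
library(data.table)

# Same example data as in the question
DT <- data.table(Group1 = rep(1:4, each = 5),
                 Group2 = c(1,1,1,2,2, 1,1,2,2,2, 1,1,1,1,2, 1,1,2,2,2),
                 Var1   = c(1,1,0,0,0, 1,1,0,1,0, 1,0,0,0,0, 0,0,0,0,0))

# Inside j, Group2 and Var1 are ordinary vectors for the current Group1,
# so logical indexing selects the rows of each Group2 subset
res <- DT[, .(
  all  = mean(Var1),
  grp1 = mean(Var1[Group2 == 1]),
  grp2 = mean(Var1[Group2 == 2])
), by = Group1]
res
```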
Answer 1 (score: 3)
You can use reshape2::dcast:
reshape2::dcast(DT, Group1 ~ Group2, fun.aggregate = mean, value.var = "Var1", margins = "Group2")
Group1 1 2 (all)
1 1 0.6666667 0.0000000 0.4
2 2 1.0000000 0.3333333 0.6
3 3 0.2500000 0.0000000 0.2
4 4 0.0000000 0.0000000 0.0
@thelatmail pointed out in the comments below that this approach does not scale well. Eventually, margins should be available in data.table's dcast, which would probably be more efficient.
An ugly workaround:
DT[, c(
dcast(.SD, Group1 ~ Group2, fun=mean),
all = .(dcast(.SD, Group1 ~ ., fun=mean)$.)
)]
Group1 1 2 all
1: 1 0.6666667 0.0000000 0.4
2: 2 1.0000000 0.3333333 0.6
3: 3 0.2500000 0.0000000 0.2
4: 4 0.0000000 0.0000000 0.0
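The same result can probably be reached a little more readably by casting once with data.table's dcast and then joining the overall mean on as a separate step. This is only a sketch of that idea; the intermediate names wide and overall are made up for illustration, and it assumes every Group1/Group2 combination has at least one row.

```r
library(data.table)

# Same example data as in the question
DT <- data.table(Group1 = rep(1:4, each = 5),
                 Group2 = c(1,1,1,2,2, 1,1,2,2,2, 1,1,1,1,2, 1,1,2,2,2),
                 Var1   = c(1,1,0,0,0, 1,1,0,1,0, 1,0,0,0,0, 0,0,0,0,0))

# One column per Group2 level, without listing the levels by hand
wide <- dcast(DT, Group1 ~ Group2, value.var = "Var1", fun.aggregate = mean)

# Overall mean per Group1, joined on to play the role of the margin column
overall <- DT[, .(all = mean(Var1)), by = Group1]
res <- wide[overall, on = "Group1"]
res
```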