Subsetting with data.table, compared with aggregating a data.table

Time: 2017-02-02 14:52:00

Tags: r data.table

This is a follow-up question to "Subset by group with data.table", using the same data.table:

library(data.table)

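# baseball is assumed to be the dataset shipped with the plyr package
data(baseball, package = "plyr")
bdt <- as.data.table(baseball)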

# Aggregating and losing information on other columns
dt1 <- bdt[ , .(max_g = max(g)), by = id]
# Aggregating and keeping information on other columns
dt2 <- bdt[bdt[, .I[g == max(g)], by = id]$V1]

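Why do dt1 and dt2 have different numbers of rows? Shouldn't dt2 give the same result, just without losing the corresponding information in the other columns?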

1 Answer:

Answer 0: (score: 3)

As @Frank pointed out:

bdt[ , .(max_g = max(g)), by = id] gives you the maximum value for each id, whereas

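bdt[bdt[ , .I[g == max(g)], by = id]$V1] identifies all rows that have this maximum value, so an id whose maximum occurs more than once contributes more than one row.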
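
The difference can be sketched on made-up data. The table `toy` and its values below are hypothetical and not taken from baseball; they only illustrate that a tie for the group maximum leaves the aggregation at one row per id while the `.I` subset returns every tied row.

```r
library(data.table)

# Hypothetical toy data: id "a" reaches its maximum g (10) in two rows
toy <- data.table(id   = c("a", "a", "a", "b", "b"),
                  year = c(2001, 2002, 2003, 2004, 2005),
                  g    = c(10, 10, 5, 7, 3))

# max: one row per id, other columns are dropped
agg <- toy[, .(max_g = max(g)), by = id]

# arg max: global row numbers of every row whose g equals its group maximum
idx <- toy[, .I[g == max(g)], by = id]$V1
sub <- toy[idx]

nrow(agg)  # 2
nrow(sub)  # 3, because id "a" has a tie
```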

See "What is the difference between arg max and max?" for a mathematical explanation, and try this slimmed-down version in R:

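library(data.table)
# baseball is assumed to be the dataset shipped with the plyr package
data(baseball, package = "plyr")
bdt <- as.data.table(baseball)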

dt <- bdt[id == "woodge01"][order(-g)]
dt[ , .(max = max(g)), by = id]
dt[ dt[ , .I[g == max(g)], by = id]$V1 ]
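
If exactly one row per id is wanted while keeping the other columns, one possible approach is `which.max()`, which returns only the first position of the maximum. This is a sketch under the assumption that keeping the first tied row is acceptable; the answer above does not prescribe it, and the table `toy` is hypothetical.

```r
library(data.table)

# Hypothetical toy data: id "a" reaches its maximum g (10) in two rows
toy <- data.table(id   = c("a", "a", "a", "b", "b"),
                  year = c(2001, 2002, 2003, 2004, 2005),
                  g    = c(10, 10, 5, 7, 3))

# which.max(g) is the within-group position of the first maximum;
# indexing .I with it converts that position to a global row number
first_idx <- toy[, .I[which.max(g)], by = id]$V1
first_max <- toy[first_idx]

nrow(first_max)  # 2, one row per id
first_max$year   # 2001 2004
```

Which tied row is kept depends on the row order within each id, so sorting the table first (for example by year) decides which one survives.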