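This is a follow-up question to Subset by group with data.table, using the same data.table:
library(data.table)
data(baseball, package = "plyr")  # the baseball dataset ships with the plyr package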
bdt <- as.data.table(baseball)
# Aggregating and losing information on other columns
dt1 <- bdt[ , .(max_g = max(g)), by = id]
# Aggregating and keeping information on other columns
dt2 <- bdt[bdt[, .I[g == max(g)], by = id]$V1]
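Why do dt1 and dt2 have a different number of rows?
Shouldn't dt2 give the same result, just without losing the corresponding information in the other columns?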
Answer (score: 3)
As @Frank pointed out:
bdt[ , .(max_g = max(g)), by = id]
gives you the maximum value of g for each id, whereas
bdt[bdt[ , .I[g == max(g)], by = id]$V1]
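identifies all rows that attain this maximum. If several rows within an id tie for the maximum g, all of them are returned, so dt2 can have more rows than dt1.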
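A minimal sketch with a small made-up table (the ids, years and g values are hypothetical, not taken from baseball) shows the effect: when two rows in one group share the maximum g, the aggregation returns one row for that group while the .I subset returns both. The last line counts, per id, how many rows attain the maximum, which is one way to find the groups responsible for the difference.

```r
library(data.table)

# Hypothetical toy data: id "a" has two rows tied at the maximum g = 5
dt <- data.table(id   = c("a", "a", "a", "b", "b"),
                 year = c(2001, 2002, 2003, 2001, 2002),
                 g    = c(5, 5, 3, 4, 1))

# max: exactly one row per group
agg <- dt[, .(max_g = max(g)), by = id]

# arg max: every row whose g equals its group's maximum
sub <- dt[dt[, .I[g == max(g)], by = id]$V1]

nrow(agg)  # 2
nrow(sub)  # 3

# Groups where the maximum is attained more than once explain the extra rows
ties <- dt[, .(n_max = sum(g == max(g))), by = id][n_max > 1]
ties
```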
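See What is the difference between arg max and max? for the mathematical explanation, and try this slimmed-down version in R:
library(data.table)
data(baseball, package = "plyr")  # the baseball dataset ships with the plyr package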
bdt <- as.data.table(baseball)
dt <- bdt[id == "woodge01"][order(-g)]
dt[ , .(max = max(g)), by = id]
dt[ dt[ , .I[g == max(g)], by = id]$V1 ]
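The same max vs arg max distinction exists in base R. The sketch below uses a hypothetical vector and a hypothetical toy table. Note that which.max() returns only the first index of the maximum, so substituting it for g == max(g) is one possible way to get exactly one row per id while keeping the other columns. Ties are then broken by row order, which is an arbitrary choice rather than something the answer above prescribes.

```r
library(data.table)

x <- c(3, 7, 7, 2)   # hypothetical values with a tie at the maximum

max(x)               # the maximum value: 7
which(x == max(x))   # arg max, all positions: 2 3
which.max(x)         # first position of the maximum only: 2

# Hypothetical toy table: id "a" has two rows tied at g = 5
dt <- data.table(id   = c("a", "a", "a", "b", "b"),
                 year = c(2001, 2002, 2003, 2001, 2002),
                 g    = c(5, 5, 3, 4, 1))

# One row per id with all columns kept; ties resolved by taking the first row
one_per_id <- dt[dt[, .I[which.max(g)], by = id]$V1]
one_per_id
```

With this variant the row count matches the aggregated result, at the cost of silently dropping the other tied rows.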