I have this toy data.frame:
df = data.frame(id = c("a","b","c","d"), value = c(2,3,6,5))
I would like to aggregate its rows according to this toy vector:
collapsed.ids = c("a,b","c","d")
Each row of the aggregated data.frame should keep the maximum of df$value over the rows it combines.
So for this toy example, the output would be:
> aggregated.df
   id value
1 a,b     3
2   c     6
3   d     5
I should note that my real data.frame has ~150,000 rows.
Answer 0 (score: 3)
I would use data.table. The following should work:
library(data.table)
DT <- data.table(df, key = "id") # Main data.table
Key <- data.table(ind = collapsed.ids) # your "Key" table
## We need your "Key" table in a long form
Key <- Key[, list(id = unlist(strsplit(ind, ",", fixed = TRUE))), by = ind]
setkey(Key, id) # Set the key to facilitate a merge
## Merge and aggregate in one step
DT[Key][, list(value = max(value)), by = ind]
#    ind value
# 1: a,b     3
# 2:   c     6
# 3:   d     5
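For comparison, here is a minimal sketch of the same merge-and-aggregate step written with an ad hoc join through the `on` argument instead of `setkey()`. It assumes a data.table version that supports `on` (1.9.6 or later); the name `result` and the `stringsAsFactors = FALSE` setting are my own additions for illustration, not part of the answer above.

```r
library(data.table)

# Toy data from the question (stringsAsFactors = FALSE is an assumption,
# added so that ids are plain character on older R versions)
df <- data.frame(id = c("a", "b", "c", "d"), value = c(2, 3, 6, 5),
                 stringsAsFactors = FALSE)
collapsed.ids <- c("a,b", "c", "d")

DT <- as.data.table(df)

# Long-form lookup: one row per individual id, tagged with its group label
Key <- data.table(ind = collapsed.ids)[
  , .(id = unlist(strsplit(ind, ",", fixed = TRUE))), by = ind]

# Join on "id" without setting keys, then take the max of value per group
result <- DT[Key, on = "id"][, .(value = max(value)), by = ind]
print(result)
```

Whether keys or `on` is preferable here is largely a matter of taste; `on` avoids reordering either table, which may matter if row order is meaningful elsewhere.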
Answer 1 (score: 1)
You don't need data.table; you can just use base R.
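split.ids <- strsplit(collapsed.ids, ",", fixed = TRUE)
## Long-form lookup: one row per individual id, tagged with its collapsed group
split.df <- data.frame(id = unlist(split.ids),
                       joinid = rep(collapsed.ids, lengths(split.ids)))
## Merge on "id", then take the max of value within each collapsed group
aggregated.df <- aggregate(value ~ joinid, data = merge(df, split.df), max)
Result:
#   joinid value
# 1    a,b     3
# 2      c     6
# 3      d     5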
df <- df[rep(1:4, 50000), ] # Make a big data.frame
system.time(...) # of the above code
#    user  system elapsed
#   1.700   0.154   1.947
Edit: Apparently Ananda's code runs in 0.039 seconds, so I'm eating crow. Still, either approach is acceptable at this size.
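As a variation on the base R idea, here is a rough sketch that skips `merge()` and maps each id to its group through a named lookup vector, then aggregates with `tapply()`. The names `lookup`, `group`, and `group.max` are hypothetical, and I have not timed this against either answer, so treat it as an illustration rather than a benchmarked alternative.

```r
# Toy data from the question
df <- data.frame(id = c("a", "b", "c", "d"), value = c(2, 3, 6, 5))
collapsed.ids <- c("a,b", "c", "d")

split.ids <- strsplit(collapsed.ids, ",", fixed = TRUE)

# Named lookup vector: names are individual ids, values are group labels
lookup <- setNames(rep(collapsed.ids, lengths(split.ids)), unlist(split.ids))

# Group label for every row of df (as.character guards against factor ids)
group <- lookup[as.character(df$id)]

# Max of value within each group; rows whose id is missing from the lookup
# get an NA group and are dropped by tapply()
group.max <- tapply(df$value, group, max)

aggregated.df <- data.frame(id = names(group.max),
                            value = as.vector(group.max))
print(aggregated.df)
```

Note that `tapply()` orders the groups by sorted label, so the row order follows the labels rather than the order of `collapsed.ids`.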