Let me define a data frame with a column id made up of integers:
df <- data.frame(id = c(1,2,2,3,3))
and a column objects that is instead a list of character vectors. Let's build it with the following function:
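# Return a random character vector of 1 to 4 fruit names.
# `argument` is ignored; it only lets the function be called through lapply().
randomObjects <- function(argument) {
  numberObjects <- sample(c(1,2,3,4), 1)
  vector <- character()
  for (i in 1:numberObjects) {
    vector <- c(vector, sample(c("apple","pear","banana"), 1))
  }
  return(vector)
}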
and then call it with lapply:
set.seed(28100)
df$objects <- lapply(df$id, randomObjects)
The resulting data frame is
df
#   id                 objects
# 1  1            apple, apple
# 2  2     apple, banana, pear
# 3  2                  banana
# 4  3    banana, pear, banana
# 5  3 pear, pear, apple, pear
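Since sample() can return different draws for the same seed across R versions (the default sampling algorithm changed in R 3.6.0), the data frame printed above can also be built explicitly. A minimal sketch, with the list contents copied from the printed output:

```r
# Explicit version of the data frame shown above (no random draws involved).
df <- data.frame(id = c(1, 2, 2, 3, 3))

# Assigning a list with one element per row creates a list column.
df$objects <- list(
  c("apple", "apple"),
  c("apple", "banana", "pear"),
  "banana",
  c("banana", "pear", "banana"),
  c("pear", "pear", "apple", "pear")
)

# Inspect the list column: five character vectors of varying length.
str(df$objects)
```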
Now I would like to count the number of objects for each id, in a data frame like this:
summary <- data.frame(id = c(1, 2, 3),
                      apples = c(2, 1, 1),
                      bananas = c(0, 2, 2),
                      pears = c(0, 1, 4))
summary
#   id apples bananas pears
# 1  1      2       0     0
# 2  2      1       2     1
# 3  3      1       2     4
How can I collapse the information in df into a more compact data frame such as summary, without using for loops?
Answer 0 (score: 4)
Here is a "data.table" approach:
library(data.table)
dcast.data.table(as.data.table(df)[
  , unlist(objects), by = id][
    , .N, by = .(id, V1)],
  id ~ V1, value.var = "N", fill = 0L)
#    id apple banana pear
# 1:  1     2      0    0
# 2:  2     1      2    1
# 3:  3     1      2    4
We unlist the values by id, count them with .N, and reshape the result with dcast.data.table.
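The three steps can also be written out one at a time, which makes the intermediate tables easier to inspect. This is only a sketch: it assumes a data.table version in which the generic dcast() dispatches on data.table objects, it rebuilds df by hand from the output printed in the question, and the names dt, long, counts, wide and the column name object are introduced here for illustration.

```r
library(data.table)

# Data frame from the question, built explicitly (assumption: copied from its printed output).
df <- data.frame(id = c(1, 2, 2, 3, 3))
df$objects <- list(c("apple", "apple"), c("apple", "banana", "pear"), "banana",
                   c("banana", "pear", "banana"), c("pear", "pear", "apple", "pear"))

dt <- as.data.table(df)

# Step 1: flatten the list column, giving one row per (id, object) occurrence.
long <- dt[, .(object = unlist(objects)), by = id]

# Step 2: count the occurrences of each object within each id.
counts <- long[, .N, by = .(id, object)]

# Step 3: reshape to wide format; combinations that never occur are filled with 0.
wide <- dcast(counts, id ~ object, value.var = "N", fill = 0L)
wide
```

Naming the unlisted column object avoids relying on the automatically generated name V1.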
Initially, I had thought of mtabulate from "qdapTools", but that does not do the aggregation step. Still, you could try something like:
library(data.table)
library(qdapTools)
data.table(cbind(df[1], mtabulate(df[[-1]])))[, lapply(.SD, sum), by = id]
#    id apple banana pear
# 1:  1     2      0    0
# 2:  2     1      2    1
# 3:  3     1      2    4
Answer 1 (score: 3)
library(plyr)
ddply(df, .(id), function(d, lev) {
  x <- factor(unlist(d$objects), levels = lev)
  t(as.matrix(table(x)))
}, lev = unique(unlist(df$objects)))
#   id apple banana pear
# 1  1     2      0    0
# 2  2     1      2    1
# 3  3     1      2    4
Answer 2 (score: 1)
First, aggregate by id and convert to factors:
id_objs <- lapply(tapply(df$obj,df$id,unlist),factor,levels=unique(unlist(df$obj)))
Then tabulate:
tab <- sapply(id_objs,table)
To get your desired output, transpose the result: t(tab)
  apple banana pear
1     2      0    0
2     1      2    1
3     1      2    4
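The same aggregate-then-tabulate idea can probably be sketched in a single cross-tabulation with base R, by repeating each id once per object it holds. This is an alternative sketch rather than the code above; it rebuilds df by hand from the output printed in the question, and long_id and long_object are names introduced here for illustration.

```r
# Data frame from the question, built explicitly (assumption: copied from its printed output).
df <- data.frame(id = c(1, 2, 2, 3, 3))
df$objects <- list(c("apple", "apple"), c("apple", "banana", "pear"), "banana",
                   c("banana", "pear", "banana"), c("pear", "pear", "apple", "pear"))

# Repeat each id once for every object in its row, so ids and objects line up.
long_id <- rep(df$id, lengths(df$objects))
long_object <- unlist(df$objects)

# Cross-tabulate: rows are ids, columns are objects, cells are counts.
tab <- table(id = long_id, object = long_object)
tab
```

If a data frame is needed instead of a table, as.data.frame.matrix(tab) keeps the wide layout, with the ids as row names.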
Answer 3 (score: 1)
Another way, using apply:
library(data.table)
vals = unique(do.call('c', df[,2]))
setDT(df)[,as.list(table(factor(do.call('c',objects), levels=vals))),by=id]
#    id apple banana pear
# 1:  1     2      0    0
# 2:  2     1      2    1
# 3:  3     1      2    4