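Using the example below, I would like to know whether there is a more efficient package or function for conditionally counting and tabulating matched string elements, for example with the data.table package, the dplyr package, or an lapply()-like function?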
produce = c("apple", "blueberry", "blueberry", "corn",
"horseradish", "rutabega", "rutabega", "tomato") # Long list
veggies = c("carrot", "corn", "horseradish", "rutabega") # Short list
basket = matrix(0, nrow = length(unique(veggies)), ncol = length(unique(produce)),
                dimnames = list(unique(veggies), unique(produce)))
basket
Output:
# apple blueberry corn horseradish rutabega tomato
# carrot 0 0 0 0 0 0
# corn 0 0 0 0 0 0
# horseradish 0 0 0 0 0 0
# rutabega 0 0 0 0 0 0
Find the counts of the shared instances:
for (i in seq_along(veggies)) {
  for (j in seq_along(produce)) {
    if (veggies[i] == produce[j]) {
      # Increment the cell in this veggie's row and the matching produce column
      basket[i, produce[j]] <- basket[i, produce[j]] + 1
    }
  }
}
basket
This is the result I am after, ideally obtained with a faster/more elegant method:
# apple blueberry corn horseradish rutabega tomato
# carrot 0 0 0 0 0 0
# corn 0 0 1 0 0 0
# horseradish 0 0 0 1 0 0
# rutabega 0 0 0 0 2 0
Answer 0 (score: 6)
library(data.table)
dcast(data.table(produce), produce~produce)[veggies]
#        produce apple blueberry corn horseradish rutabega tomato
#1:       carrot    NA        NA   NA          NA       NA     NA
#2:         corn     0         0    1           0        0      0
#3:  horseradish     0         0    0           1        0      0
#4:     rutabega     0         0    0           0        2      0
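Note that the carrot row comes back as NA rather than 0, because carrot never occurs in produce. If zeros are wanted, as in the question's target output, the NAs can be overwritten afterwards. The following is only a sketch of one way to do that: the names wide and res are made up for illustration, and the join uses an explicit on = "produce" instead of relying on the key of the dcast result.

```r
library(data.table)

produce <- c("apple", "blueberry", "blueberry", "corn",
             "horseradish", "rutabega", "rutabega", "tomato")
veggies <- c("carrot", "corn", "horseradish", "rutabega")

# Same cast as in the answer; with no aggregate function given,
# dcast falls back to length (and prints a message saying so).
wide <- dcast(data.table(produce), produce ~ produce)

# Look up each veggie; items missing from produce yield a row of NA.
res <- wide[.(veggies), on = "produce"]

# Overwrite the NA counts with 0, column by column, by reference.
for (col in setdiff(names(res), "produce")) {
  set(res, i = which(is.na(res[[col]])), j = col, value = 0L)
}
res
```

The loop with set() modifies res in place, which avoids copying the table once per column.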
Answer 1 (score: 2)
The ugliest solution I can think of in base R:
do.call(table, replicate(2,factor(produce, levels=unique(c(produce,veggies))),simplify=FALSE))[veggies,]
Or as just a single ugly line:
{{1}}
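For readers who find the do.call/replicate form hard to follow, the same cross-tabulation idea can be sketched with two explicitly named factors. This is a variant written for illustration, not the answerer's code; the names rows and cols are assumptions. It relies on two documented behaviours of base R: factor() turns values outside the given levels into NA, and table() drops NA by default while keeping unused factor levels.

```r
produce <- c("apple", "blueberry", "blueberry", "corn",
             "horseradish", "rutabega", "rutabega", "tomato")
veggies <- c("carrot", "corn", "horseradish", "rutabega")

# Row factor: produce entries that are not in veggies become NA,
# so table() leaves them out of the counts.
rows <- factor(produce, levels = unique(veggies))

# Column factor: every distinct produce item, in order of first appearance.
cols <- factor(produce, levels = unique(produce))

# Cross-tabulate; unused levels such as "carrot" are kept as all-zero rows.
basket <- table(rows, cols)
basket
```

Because carrot is a level of rows but never observed, its row is filled with zeros rather than NA, which matches the result requested in the question. Calling unclass(basket) gives a plain integer matrix if a table object is not wanted.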