Simplifying a conditional table loop in R without matrix notation

Posted: 2015-05-08 03:13:24

Tags: r data.table dplyr lapply

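Using the example below, I'd like to know whether there is a more efficient package or function for conditionally counting and tabulating matching string elements, for example something from the data.table package, the dplyr package, or lapply().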

produce = c("apple", "blueberry", "blueberry", "corn",
            "horseradish", "rutabega", "rutabega", "tomato") # Long list

veggies = c("carrot", "corn", "horseradish", "rutabega") # Short list

basket = matrix(rep(0, length(unique(veggies))*length(unique(produce)) ), nrow = length(unique(veggies)),  
                ncol = length(unique(produce)) )

rownames(basket) <- unique(veggies)
colnames(basket) <- unique(produce)

basket

Output:

#               apple blueberry corn horseradish rutabega tomato
# carrot          0         0    0           0        0      0
# corn            0         0    0           0        0      0
# horseradish     0         0    0           0        0      0
# rutabega        0         0    0           0        0      0
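As an aside, the empty matrix above does not need `rep()` or separate `rownames<-`/`colnames<-` calls. A minimal base-R sketch: `matrix(0, ...)` recycles the single 0 into every cell, and the `dimnames` argument labels rows and columns in the same call. The two vectors are redefined so the block runs on its own.

```r
produce <- c("apple", "blueberry", "blueberry", "corn",
             "horseradish", "rutabega", "rutabega", "tomato")
veggies <- c("carrot", "corn", "horseradish", "rutabega")

# One call: the scalar 0 is recycled to fill the whole matrix,
# and dimnames = list(row names, column names) labels it
basket <- matrix(0,
                 nrow = length(unique(veggies)),
                 ncol = length(unique(produce)),
                 dimnames = list(unique(veggies), unique(produce)))
basket
```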

Count the shared instances:

# Compare every veggie with every produce item and add 1 to the matching cell
for (i in seq_along(veggies)) {
  for (j in seq_along(produce)) {
    if (veggies[i] == produce[j]) {
      # column of basket that belongs to this produce item
      col <- which(colnames(basket) == produce[j])
      basket[i, col] <- basket[i, col] + 1
    }
  }
}

basket
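The double loop only ever increments a cell whose row name equals its column name, so the same matrix can be filled without looping. A minimal base-R sketch, assuming (as in the example) that `veggies` contains no duplicates: count each produce item once with `table()`, then write the counts into the (name, name) cells using a two-column character matrix as the index. The helper names `counts` and `shared` are illustrative only.

```r
produce <- c("apple", "blueberry", "blueberry", "corn",
             "horseradish", "rutabega", "rutabega", "tomato")
veggies <- c("carrot", "corn", "horseradish", "rutabega")

basket <- matrix(0,
                 nrow = length(unique(veggies)),
                 ncol = length(unique(produce)),
                 dimnames = list(unique(veggies), unique(produce)))

# How often each produce item occurs
counts <- table(produce)
# Only items present in both vectors can receive a non-zero count
shared <- intersect(unique(veggies), unique(produce))
# Each row of cbind(shared, shared) is a (row name, column name) pair
basket[cbind(shared, shared)] <- as.vector(counts[shared])
basket
```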

This is the result I'm after, but obtained with a faster or more elegant approach:

#               apple blueberry corn horseradish rutabega tomato
# carrot          0         0    0           0        0      0
# corn            0         0    1           0        0      0
# horseradish     0         0    0           1        0      0
# rutabega        0         0    0           0        2      0

2 Answers:

Answer 0 (score: 6)

Using data.table:

library(data.table)
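# Cast produce against itself; duplicates are aggregated with length(), which yields counts.
# The result is keyed by produce, so [veggies] joins on that key ("carrot" has no match, hence NA).
dcast(data.table(produce), produce~produce)[veggies]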

#       produce apple blueberry corn horseradish rutabega tomato
#1:      carrot    NA        NA   NA          NA       NA     NA
#2:        corn     0         0    1           0        0      0
#3: horseradish     0         0    0           1        0      0
#4:    rutabega     0         0    0           0        2      0

Answer 1 (score: 2)

The ugliest solution I can think of in base R:

do.call(table, replicate(2,factor(produce, levels=unique(c(produce,veggies))),simplify=FALSE))[veggies,]

Or, as just one ugly line:

{{1}}
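The base-R `do.call(table, ...)` one-liner in this answer can be unpacked step by step. A sketch of what it does: `replicate(2, f, simplify = FALSE)` is simply `list(f, f)`, so `do.call()` evaluates `table(f, f)`, a square cross-tab whose diagonal holds each item's count. Because the factor levels also include `veggies`, the one-liner keeps an extra "carrot" column; restricting the columns to `unique(produce)` (an addition, not part of the original answer) reproduces the requested shape.

```r
produce <- c("apple", "blueberry", "blueberry", "corn",
             "horseradish", "rutabega", "rutabega", "tomato")
veggies <- c("carrot", "corn", "horseradish", "rutabega")

# One factor whose levels cover every item in either vector
f <- factor(produce, levels = unique(c(produce, veggies)))
# Equivalent to table(f, f): counts sit on the diagonal
tab <- do.call(table, replicate(2, f, simplify = FALSE))
# Keep the veggie rows and only the original produce columns
res <- tab[veggies, unique(produce)]
res
```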