Find max color & count

Time: 2014-12-18 17:31:14

Tags: r dataframe

I have a matrix in the following format:

     [,1]     [,2]  [,3]    [,4]   [,5]   [,6]  [,7]    [,8]   [,9]  
[1,] "blue"   "red" "blue"  "blue" "blue" "red" "green" "blue" "blue"
[2,] "green"  "red" "blue"  "blue" "blue" "red" "green" "blue" "blue"
[3,] "yellow" "red" "blue"  "blue" "blue" "red" "green" "blue" "blue"
[4,] "red"    "red" "blue"  "blue" "blue" "red" "green" "blue" "blue"
[5,] "blue"   "red" "green" "blue" "blue" "red" "green" "blue" "blue"
[6,] "green"  "red" "green" "blue" "blue" "red" "green" "blue" "blue"
 ...

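How can I quickly compute the most frequent color in each row, along with its count?

For example, for row 1 the result would be "blue, 6". I currently do this by calling `table` inside `apply`.

However, my matrix has 1.9 million rows, so this takes far too long. How can I vectorize it?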
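For reference, the `apply`/`table` approach described above might look roughly like the sketch below. The original code was not posted, so the helper name `row_mode` and the two-row stand-in matrix are assumptions for illustration only.

```r
# Two-row stand-in for the matrix in the question (rows 1 and 2).
mat <- matrix(c("blue",  "red", "blue", "blue", "blue", "red", "green", "blue", "blue",
                "green", "red", "blue", "blue", "blue", "red", "green", "blue", "blue"),
              ncol = 9, byrow = TRUE)

# Hypothetical helper: tabulate one row and report its most frequent value.
row_mode <- function(r) {
  tab <- table(r)
  top <- which.max(tab)  # on ties, the first maximum (alphabetical order) wins
  paste(names(tab)[top], tab[[top]], sep = ", ")
}

# One call to table() per row: this is the part that gets slow at 1.9 million rows.
baseline <- apply(mat, 1, row_mode)
baseline
# [1] "blue, 6" "blue, 5"
```

The cost here comes from running an R-level function once per row, which is what a vectorized solution needs to avoid.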

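2 answers:

Answer 0: (score: 4)

How many distinct values can each cell of the matrix take? Is it as few as in your example? If so, the following will probably be faster: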

dat <- structure(c("blue", "green", "yellow", "red", "blue", "green", 
    "red", "red", "red", "red", "red", "red", "red", "red", "blue", 
    "blue", "blue", "blue", "green", "green", "red", "blue", "blue", 
    "blue", "blue", "blue", "blue", "red", "blue", "blue", "blue", 
    "blue", "blue", "blue", "blue", "red", "red", "red", "red", "red", 
    "red", "blue", "green", "green", "green", "green", "green", "green", 
    "blue", "blue", "blue", "blue", "blue", "blue", "blue", "blue", 
    "blue", "blue", "blue", "blue", "blue", "blue", "green"), .Dim = c(7L, 
    9L))

values <- c("blue", "red", "green", "yellow")
counts <- vapply(values, function(value) rowSums(dat == value), 
    numeric(nrow(dat))) # Thanks to @RichardScriven for the improvement :)
counts 
#      blue red green yellow
# [1,]    6   2     1      0
# [2,]    5   2     2      0
# [3,]    5   2     1      1
# [4,]    5   3     1      0
# [5,]    5   2     2      0
# [6,]    4   2     3      0
# [7,]    4   4     1      0

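# max.col() breaks ties at random by default; "first" makes the result reproducible
max.value.col <- max.col(counts, ties.method = "first")
max.value <- colnames(counts)[max.value.col]
max.counts <- counts[cbind(seq_len(nrow(counts)), max.value.col)]
paste(max.value, max.counts, sep = ", ")
# [1] "blue, 6" "blue, 5" "blue, 5" "blue, 5" "blue, 5" "blue, 4" "blue, 4"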

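If you want the names of all tied columns when there is a tie, the following works but may take a while (I am not sure how `apply` performs in this case):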

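# TRUE wherever a column's count equals that row's maximum, so ties are all kept
max.value.all.cols <- counts == counts[cbind(seq_len(nrow(counts)), max.value.col)]
paste(
    apply(max.value.all.cols, 1, function(r) paste(colnames(counts)[r], collapse = ", ")),
    max.counts, sep = ", ")
# [1] "blue, 6" "blue, 5" "blue, 5" "blue, 5" "blue, 5" "blue, 4" "blue, red, 4"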
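If the set of possible values is not known in advance, one option is to build the counts with a single contingency table instead of listing `values` by hand. This is only a sketch and not part of the answer above: it assumes the matrix fits in memory together with an integer matrix of the same size (from `row()`), and it resolves ties in favor of the alphabetically first value, because `table()` sorts its columns.

```r
# Three-row stand-in: rows 1, 3 and 7 of the example data (row 7 is a blue/red tie).
mat <- matrix(c("blue",   "red", "blue", "blue", "blue", "red",  "green", "blue", "blue",
                "yellow", "red", "blue", "blue", "blue", "red",  "green", "blue", "blue",
                "red",    "red", "red",  "red",  "blue", "blue", "blue",  "blue", "green"),
              ncol = 9, byrow = TRUE)

# Cross-tabulate row index against cell value: one row of counts per matrix row,
# one column per distinct value found in the data.
counts <- unclass(table(row(mat), mat))

# Column holding each row's maximum; "first" keeps tie-breaking deterministic.
top <- max.col(counts, ties.method = "first")

result <- paste(colnames(counts)[top],
                counts[cbind(seq_len(nrow(counts)), top)],
                sep = ", ")
result
# [1] "blue, 6" "blue, 5" "blue, 4"
```

The rest of the pipeline is the same `max.col` plus matrix-indexing step used above; only the way `counts` is built differs.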

Answer 1: (score: 0)

Here is what I think is a practical data.table solution. It uses data.table's fast `.N` to count the frequencies within each row:

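library(data.table)

# `mat` is the matrix defined at the end of this answer.
# After transposing, each column of `flip` is one row of the original matrix.
flip <- data.table(t(mat))

# For each column: count occurrences of each value, sort by count (descending),
# keep the top row, and give it common column names so the pieces can be stacked.
tally <- lapply(names(flip), 
                function(x) {
                  setnames(flip[, .N, by=eval(x)][order(-N)][1,],
                           c('clr', 'N')) } )
rbindlist(tally)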

#     clr N
# 1: blue 6
# 2: blue 5
# 3: blue 5
# 4: blue 5
# 5: blue 5
# 6: blue 4

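I take the matrix and transpose it, then count within each column (that is, each row of the original matrix). The `setnames` call is needed so that the results can be bound together conveniently; you can drop it if you are happy to get the results as a list.

I used the same data as the other answers: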

mat <-
matrix(c( "blue","red","blue","blue","blue","red","green","blue","blue",
          "green","red","blue","blue","blue","red","green","blue","blue",
          "yellow","red","blue","blue","blue","red","green","blue","blue",
          "red","red","blue","blue","blue","red","green","blue","blue",
          "blue","red","green","blue","blue","red","green","blue","blue",
          "green","red","green","blue","blue","red","green","blue","blue"), 
       ncol = 9, byrow = TRUE)