R - Fastest way to select matrix rows that satisfy multiple conditions

Asked: 2013-08-08 14:21:20

Tags: r performance matrix conditional-statements multiple-columns

This is an extension of an earlier R question. Say I have the matrix:

       one two three four
 [1,]   1   6    11   16
 [2,]   2   7    12   17
 [3,]   3   8    11   18
 [4,]   4   9    11   19
 [5,]   5  10    15   20
 [6,]   1   6    15   20
 [7,]   5   7    12   20

I want to return all rows where matrix$two == 7 AND matrix$three == 12, as fast as possible. This is the way I know how to do it:

 out <- mat[mat$two == 7,]
 final_out <- out[out$three == 12, ]

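Obviously there should be a way to get the contents of final_out in a single line, something like final_out <- which(mat$two == 7 && mat$three == 12), that is faster and more concise than the two lines of code above.

What is the fastest R code to return this multi-condition matrix query?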

5 answers:

Answer 0: (score: 12)

Just use [ subsetting with a logical comparison...

#  Reproducible data
set.seed(1)
m <- matrix( sample(12,28,repl=T) , 7 , 4 )
     [,1] [,2] [,3] [,4]
[1,]    4    8   10    3
[2,]    5    8    6    8
[3,]    7    1    9    2
[4,]   11    3   12    4
[5,]    3    3    5    5
[6,]   11    9   10    1
[7,]   12    5   12    5


#  Subset according to condition
m[ m[,2] == 3 & m[,3] == 12 , ]
[1] 11  3 12  4
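
Applied to the matrix from the question, the same idea can be sketched as below. The names `mat`, `keep` and `final_out` follow the question or are chosen here for illustration. Note that the elementwise `&` is what builds a per-row index; `&&` is meant for single TRUE/FALSE values and does not compare element by element (recent R versions raise an error for longer inputs).

```r
# The question's matrix, rebuilt column by column
mat <- cbind(
  one   = c(1, 2, 3, 4, 5, 1, 5),
  two   = c(6, 7, 8, 9, 10, 6, 7),
  three = c(11, 12, 11, 11, 15, 15, 12),
  four  = c(16, 17, 18, 19, 20, 20, 20)
)

# Elementwise `&` combines the two per-row comparisons into one logical index
keep <- mat[, "two"] == 7 & mat[, "three"] == 12

# drop = FALSE keeps the result a matrix even when only one row matches
final_out <- mat[keep, , drop = FALSE]
```

Without `drop = FALSE`, a single matching row comes back as a plain vector, as in the output shown above.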

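Answer 1: (score: 3)

UPDATE USING MICROBENCHMARK:

Benchmarking gives the opposite result. It appears that the answer given by @SimonO101 provides a slightly faster implementation.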

require(microbenchmark)
set.seed(1)
m <- matrix( sample(12,100,repl=T) , 25 , 4 )
colnames(m) <- c("one","two","three","four")

bench1 <- microbenchmark(m[which(m[,'two']==7 & m[,'three'] == 12, arr.ind = TRUE),])
summary(bench1$time)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   7700    8750    9449    9688    9800   22400

bench2 <- microbenchmark(m[ m[,2] == 3 & m[,3] == 12 , ])
summary(bench2$time)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   6300    7350    7351    7599    8050   15400

OLD ANSWER:

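Combining the answers given by @Jiber and @SimonO101 gives a slightly faster solution, at least on my computer.

I made the matrix much larger so that the difference in computation time becomes visible.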

set.seed(1)
m <- matrix( sample(12,1000000000,repl=T) , 1e8 , 10 )
colnames(m) <- c("one","two","three","four","five","six","seven","eight","nine","ten")

system.time(m[which(m[,'two']==7 & m[,'three'] == 12, arr.ind = TRUE),])
   user  system elapsed 
   6.49    1.58    8.06 
system.time(m[ m[,2] == 3 & m[,3] == 12 , ])
   user  system elapsed 
   8.23    1.29    9.52 

This obviously assumes the matrix columns are named.
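
The two timed expressions above test different values (`== 7` versus `== 3`), so they do not necessarily select the same rows. A like-for-like comparison could be sketched as follows; it uses base `system.time` instead of microbenchmark, and the matrix size, repetition count and helper names (`cond`, `via_logical`, `via_which`) are assumptions made only for this illustration.

```r
set.seed(1)
m <- matrix(sample(12, 4e5, replace = TRUE), 1e5, 4)
colnames(m) <- c("one", "two", "three", "four")

# One shared condition so both variants do the same logical work
cond <- function() m[, "two"] == 7 & m[, "three"] == 12

via_logical <- function() m[cond(), , drop = FALSE]
via_which   <- function() m[which(cond()), , drop = FALSE]

# Both forms must select the same rows before timing means anything
same <- identical(via_logical(), via_which())

# Repeat each call so the elapsed time is measurable; numbers vary by machine
t_logical <- system.time(for (i in 1:50) via_logical())
t_which   <- system.time(for (i in 1:50) via_which())
```

Timings depend on the machine, the R version and the matrix size, so they are best compared on your own data.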

Answer 2: (score: 1)

Use which with arr.ind=TRUE, like this:

> mat[which(mat[,"two"]==7 & mat[,"three"] == 12, arr.ind = TRUE),]
  one two three four
2   2   7    12   17
7   5   7    12   20
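
One practical difference between a `which()` index and a plain logical index shows up when the data contain NA. The sketch below uses a small made-up matrix `x`. Because the condition here is an ordinary vector with no dimensions, `arr.ind = TRUE` does not change what `which()` returns.

```r
# Made-up data with a missing value in column "two"
x <- cbind(two = c(7, NA, 7), three = c(12, 12, 11))

cond <- x[, "two"] == 7 & x[, "three"] == 12   # TRUE, NA, FALSE

# A logical index keeps the NA position and returns an all-NA row for it
by_logical <- x[cond, , drop = FALSE]

# which() keeps only the positions that are TRUE, so the NA row is dropped
by_which <- x[which(cond), , drop = FALSE]
```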

Answer 3: (score: 1)

If you have a lot of rows, it can be better to subset on one condition first and then apply the second condition to the smaller result. The benchmark below compares the two approaches:

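library(microbenchmark)
set.seed(1)
# Draw one value per cell instead of recycling 28 values
m <- matrix( sample(12, 12e6 * 4, replace = TRUE) , 12e6 , 4 )

#  Both conditions in a single logical index
microbenchmark(sample0 = m[ m[,2] == 3 & m[,3] == 12 , ], times = 10L)

#  Subset on the first condition, then on the second
microbenchmark(sample1 = { s1 <- m[ m[,2] == 3, ]; s1[ s1[,3] == 12, ] },
               times = 10L)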
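
When subsetting in two steps, the intermediate result can collapse to a vector if only one row passes the first condition. A minimal sketch of that pitfall and of the `drop = FALSE` guard, on a small made-up matrix (all names here are illustrative):

```r
# Small illustrative matrix: exactly one row has two == 7
m <- cbind(one = c(1, 2, 3), two = c(6, 7, 8), three = c(11, 12, 12))

# Without drop = FALSE a single matching row collapses to a plain vector,
# so a second step written as collapsed[, "three"] would fail
collapsed <- m[m[, "two"] == 7, ]
lost_shape <- is.null(dim(collapsed))

# Keeping the matrix shape makes the two-step form safe
step1 <- m[m[, "two"] == 7, , drop = FALSE]
step2 <- step1[step1[, "three"] == 12, , drop = FALSE]

# The one-step form selects the same rows
one_step <- m[m[, "two"] == 7 & m[, "three"] == 12, , drop = FALSE]
same_rows <- identical(step2, one_step)
```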

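Answer 4: (score: -2)

The absolute fastest way in R is ifelse, which, unlike if, allows vectorized conditions. You can also cache the condition vectors (e.g. isSeven <- mat[, 'two'] == 7) and use/reuse them later.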

I don't have a reproducible example here, but I would do something like
ifelse(mat[, 'two'] == 7 & mat[, 'three'] == 12, "both", "not both")

You can add other conditions in there, or have it return anything that results in a consistent vector.
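
A minimal sketch of the caching idea on a small made-up matrix. `isSeven` comes from the answer, while `isTwelve`, `labels` and `rows` are names assumed here. Note that `ifelse` returns one label per row rather than the rows themselves, so selecting rows still needs `[` with the cached conditions.

```r
# Made-up data for illustration
mat <- cbind(two = c(6, 7, 8, 7), three = c(11, 12, 11, 12))

# Cache each condition once so it can be reused
isSeven  <- mat[, "two"] == 7
isTwelve <- mat[, "three"] == 12

# ifelse() yields a label for every row
labels <- ifelse(isSeven & isTwelve, "both", "not both")

# The same cached conditions can index the matrix to get the rows
rows <- mat[isSeven & isTwelve, , drop = FALSE]
```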