Speeding up / vectorizing a for loop that calls the sample function in R

Asked: 2015-07-22 11:10:48

Tags: r performance for-loop vectorization

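I am looking for a fast way to create a matrix of integer values drawn with given probabilities. Given a vector L=c(3,4,2) and a probability vector Prob=c(0.4,0.35,0.25,0.1,0.25,0.4,0.25,0.6,0.4) with sum(L) elements, I want to draw, for example, one element of 1:L[1] = 1:3 with probabilities Prob[1:L[1]] = c(0.4,0.35,0.25). This should be done for every element of L, repeated a number of times given by the parameter rows, and the results stored in a matrix called POP.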
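To make the layout of Prob concrete, here is a minimal sketch that cuts it into one block per element of L and checks that each block is a valid probability vector. The names grp, blocks and block_sums are illustrative and not part of the original post.

```r
L <- c(3, 4, 2)
Prob <- c(0.4, 0.35, 0.25, 0.1, 0.25, 0.4, 0.25, 0.6, 0.4)

# Prob must hold one probability per possible value across all columns
stopifnot(length(Prob) == sum(L))

# grp labels each entry of Prob with the index of the L element it belongs to
grp <- rep(seq_along(L), L)

# blocks is a list with one probability vector per element of L
blocks <- split(Prob, grp)

# each block should sum to 1 (up to floating point error)
block_sums <- vapply(blocks, sum, numeric(1))
print(block_sums)
```

Column j of POP is then drawn from 1:L[j] using blocks[[j]] as the probabilities.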

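My solution is very slow because it uses two nested for loops, so I am looking for a better-performing approach through vectorization or some other technique.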

My current solution is as follows:

L = c(3,4,2)
L_cum = c(0,cumsum(L)) #vector to call vector sections from Prob
Prob = c(0.4,0.35,0.25,0.1,0.25,0.4,0.25,0.6,0.4)  #probability vector for sum(L) elements
rows = 5  #number of rows of matrix POP
POP = matrix(0,rows,length(L)) 

for(i in 1:rows){
 for(j in 1:length(L)){
   POP[i,j] = sample(1:L[j],1,prob=Prob[(L_cum[j]+1):L_cum[j+1]])
 }
}

1 Answer:

Answer 0: (score: 4)

I would just try:

set.seed(1234)
#set the number of extractions
n<-10
vapply(split(Prob,rep(seq_along(L),L)), 
          function(x) sample(length(x),n,replace=TRUE,prob=x),
          integer(n))
#      1 2 3
# [1,] 1 4 1
# [2,] 2 2 1
# [3,] 2 3 1
# [4,] 2 1 1
# [5,] 3 3 1
# [6,] 2 4 2
# [7,] 1 3 1
# [8,] 1 3 2
# [9,] 2 3 2
#[10,] 2 3 1
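As a rough sketch of how this approach maps back to the question, the same call can fill POP directly by passing rows as the number of draws. The seed, the large value of rows and the check variable freq1 are assumptions added for illustration only.

```r
set.seed(42)  # arbitrary seed, only for reproducibility of this sketch

L <- c(3, 4, 2)
Prob <- c(0.4, 0.35, 0.25, 0.1, 0.25, 0.4, 0.25, 0.6, 0.4)
rows <- 10000  # deliberately large so the empirical frequencies are stable

# One sample() call per column: split() cuts Prob into one block per
# element of L, and each block supplies the probabilities for its column.
POP <- vapply(split(Prob, rep(seq_along(L), L)),
              function(p) sample(length(p), rows, replace = TRUE, prob = p),
              integer(rows))

print(dim(POP))

# Empirical frequencies of column 1, to compare against Prob[1:3]
freq1 <- tabulate(POP[, 1], nbins = L[1]) / rows
print(round(freq1, 3))
```

This replaces the inner loop over rows with a single vectorized draw per column, so only length(L) calls to sample remain. One caveat: when rows is 1, vapply returns a named vector rather than a one-row matrix, so that case may need an explicit matrix() conversion.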