How can I make this loop faster in R?

Time: 2018-07-17 12:52:38

Tags: r performance loops vectorization

I have written this R code, and it takes about 15 minutes to run.

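  • mergNet is a matrix of size 23715660 * 5;
  • cosine_sim, A, ug_sim, avg_usim and avg_unov have already been computed, and all of them are matrices;
  • users is a scalar constant equal to 943;
  • weight is computed according to two conditions;
  • cosine_sim, A and ug_sim are of size 943 * 943;
  • avg_usim and avg_unov are of size 943 * 1682.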

How can I change this code so that it runs faster?

weight = matrix(0, nrow = nrow(mergNet), ncol = 1)
for (i in 1:nrow(mergNet)) {
  temp1 = mergNet[i, 1]
  temp2 = mergNet[i, 3]
  mid = mergNet[i, 2]
  # u_u case: both ids are below 944
  if (temp1 < 944 && temp2 < 944) {
    weight[i, 1] = cosine_sim[temp1, temp2] * A[temp1, temp2] * ug_sim[temp1, temp2]
  }
  # both ids are above 943: shift temp2 by users to get the column index
  if (temp1 > 943 && temp2 > 943) {
    weight[i, 1] = avg_usim[mid, temp2 - users] * avg_unov[mid, temp2 - users]
  }
}

The first ten rows of mergNet:

   src1 dst1 dst2   id1    id2
1   962    1 1186 53230  91038
2   962    1 1032 53230 156361
3   962    1 1116 53230  85937
4   962    1 1118 53230 107437
5   962    1 1150 53230 119957
6   962    1 1187 53230 101035
7   962    1 1188 53230 150941
8   962    1  962 53230 133230
9   962    1 1169 53230 116318
10  962    1 1101 53230 103387

The first ten rows and first ten columns of A:

           [,1]      [,2]      [,3] [,4]      [,5]      [,6]      [,7]  [,8]      [,9]     [,10]
 [1,] 0.0000000 0.7291667 0.9019608 0.80 0.5869565 0.6583851 0.7038217 0.575 0.8333333 0.6827586
 [2,] 0.9380952 0.0000000 0.8235294 0.75 0.9710145 0.8757764 0.9713376 0.950 0.7777778 0.9379310
 [3,] 0.9761905 0.8125000 0.0000000 0.50 0.9927536 0.9440994 0.9649682 0.850 0.8888889 0.9517241
 [4,] 0.9809524 0.8958333 0.8039216 0.00 0.9927536 0.9627329 0.9808917 0.900 0.9444444 0.9862069
 [5,] 0.7285714 0.9166667 0.9803922 0.95 0.0000000 0.8509317 0.8057325 0.775 0.8888889 0.8344828
 [6,] 0.7380952 0.5833333 0.8235294 0.70 0.8260870 0.0000000 0.7324841 0.725 0.5000000 0.6275862
 [7,] 0.5571429 0.8125000 0.7843137 0.70 0.5579710 0.4782609 0.0000000 0.475 0.5000000 0.4965517
 [8,] 0.9190476 0.9583333 0.8823529 0.80 0.9347826 0.9316770 0.9331210 0.000 1.0000000 0.9310345
 [9,] 0.9857143 0.9166667 0.9607843 0.95 0.9855072 0.9440994 0.9713376 1.000 0.0000000 0.9241379
[10,] 0.7809524 0.8125000 0.8627451 0.90 0.8260870 0.6645963 0.7675159 0.750 0.3888889 0.0000000

cosine_sim and ug_sim look similar to A.

The first ten rows and columns of avg_usim:

            [,1]       [,2]       [,3]       [,4] [,5] [,6]        [,7]       [,8]       [,9]      [,10]
 [1,] 0.00000000 0.09284909 0.08681234 0.00000000    0    0 0.063968603 0.06623507 0.05759746 0.07562590
 [2,] 0.00000000 0.00000000 0.00000000 0.00000000    0    0 0.000000000 0.00000000 0.00000000 0.00000000
 [3,] 0.00000000 0.00000000 0.00000000 0.00000000    0    0 0.000000000 0.00000000 0.00000000 0.00000000
 [4,] 0.00000000 0.00000000 0.00000000 0.00000000    0    0 0.000000000 0.00000000 0.00000000 0.00000000
 [5,] 0.03792794 0.00000000 0.00000000 0.00000000    0    0 0.000000000 0.00000000 0.00000000 0.00000000
 [6,] 0.00000000 0.00000000 0.00000000 0.00000000    0    0 0.034380897 0.04893480 0.03765768 0.00000000
 [7,] 0.00000000 0.00000000 0.00000000 0.06751437    0    0 0.046987708 0.06087732 0.04545343 0.04983857
 [8,] 0.00000000 0.00000000 0.00000000 0.00000000    0    0 0.000000000 0.00000000 0.00000000 0.00000000
 [9,] 0.00000000 0.00000000 0.00000000 0.00000000    0    0 0.005074253 0.00000000 0.00000000 0.00000000
[10,] 0.03665521 0.00000000 0.00000000 0.05438324    0    0 0.041738325 0.00000000 0.00000000 0.00000000

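avg_unov looks similar to avg_usim.

1 answer:

Answer 0 (score: 0)

The solution is to vectorize the loop: select the rows that satisfy each condition and assign the required values to that subset in one step. map2 from the purrr package is used to map the selected row and column indices onto the matrices.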

weight = matrix(0,nrow= nrow(mergNet), ncol=1)

temp1 = mergNet[ ,1]
temp2 = mergNet[ ,3]
mid = mergNet[ ,2]

library(purrr)
#find rows where cond1 is true
cond1<-which((temp1<944) & (temp2<944))
#map the temp1 and temp2 to matrix
weight[cond1, 1]<-unlist(map2(temp1[cond1], temp2[cond1], function(i, j){(cosine_sim[i,j])* A[i,j] * ug_sim[i,j]}))

#find rows where cond2 is true 
cond2<-which((temp1>943) & (temp2>943))
weight[cond2, 1]<-unlist(map2(mid[cond2], temp2[cond2], function(i, j){avg_usim[i , j-users] * (avg_unov[i, j-users])}))
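
As a possible alternative, base R can look up many matrix cells at once when a matrix is indexed by a two-column matrix of (row, column) pairs, which avoids calling a function once per row. The sketch below is not part of the original answer. The small random matrices, their sizes, and the names idx1 and idx2 are assumptions made only so the example runs on its own; the thresholds are written as users instead of the literal 943/944 from the question.

```r
# Toy stand-ins for the real inputs (sizes are assumptions for illustration)
set.seed(1)
users <- 5L
items <- 7L
cosine_sim <- matrix(runif(users * users), users, users)
A          <- matrix(runif(users * users), users, users)
ug_sim     <- matrix(runif(users * users), users, users)
avg_usim   <- matrix(runif(users * items), users, items)
avg_unov   <- matrix(runif(users * items), users, items)

# 25 rows with both ids <= users, then 25 rows with both ids > users
n_half <- 25L
mergNet <- rbind(
  cbind(sample(users, n_half, TRUE), sample(users, n_half, TRUE),
        sample(users, n_half, TRUE)),
  cbind(users + sample(items, n_half, TRUE), sample(users, n_half, TRUE),
        users + sample(items, n_half, TRUE))
)

temp1 <- mergNet[, 1]
mid   <- mergNet[, 2]
temp2 <- mergNet[, 3]
weight <- matrix(0, nrow = nrow(mergNet), ncol = 1)

# First condition: one (row, column) pair per selected row
cond1 <- which(temp1 <= users & temp2 <= users)
idx1  <- cbind(temp1[cond1], temp2[cond1])
weight[cond1, 1] <- cosine_sim[idx1] * A[idx1] * ug_sim[idx1]

# Second condition: the column index is shifted by users, as in the loop
cond2 <- which(temp1 > users & temp2 > users)
idx2  <- cbind(mid[cond2], temp2[cond2] - users)
weight[cond2, 1] <- avg_usim[idx2] * avg_unov[idx2]
```

Because each lookup is a single vectorized indexing operation, this form would likely scale better to the 23715660 rows in the question than a per-row function call, but that has not been measured here.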

Without any data, it is difficult to test this solution.
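
One way to work around the missing data is to build a small synthetic input and check that the subset-and-assign approach gives the same result as the original loop. The sketch below does that under stated assumptions: the matrix sizes and random values are made up, the names weight_loop and weight_vec are hypothetical, and mapply is used as a base-R stand-in for purrr's map2 so that the example has no package dependency.

```r
# Synthetic inputs; all sizes below are assumptions for illustration only
set.seed(42)
users <- 4L
items <- 6L
cosine_sim <- matrix(runif(users * users), users, users)
A          <- matrix(runif(users * users), users, users)
ug_sim     <- matrix(runif(users * users), users, users)
avg_usim   <- matrix(runif(users * items), users, items)
avg_unov   <- matrix(runif(users * items), users, items)

# 20 rows with both ids <= users, then 20 rows with both ids > users
# (columns play the roles of src1, dst1, dst2)
n_half <- 20L
mergNet <- rbind(
  cbind(sample(users, n_half, TRUE), sample(users, n_half, TRUE),
        sample(users, n_half, TRUE)),
  cbind(users + sample(items, n_half, TRUE), sample(users, n_half, TRUE),
        users + sample(items, n_half, TRUE))
)

# Reference result: the row-by-row loop from the question
weight_loop <- matrix(0, nrow = nrow(mergNet), ncol = 1)
for (i in seq_len(nrow(mergNet))) {
  temp1 <- mergNet[i, 1]
  mid   <- mergNet[i, 2]
  temp2 <- mergNet[i, 3]
  if (temp1 <= users && temp2 <= users) {
    weight_loop[i, 1] <- cosine_sim[temp1, temp2] * A[temp1, temp2] * ug_sim[temp1, temp2]
  }
  if (temp1 > users && temp2 > users) {
    weight_loop[i, 1] <- avg_usim[mid, temp2 - users] * avg_unov[mid, temp2 - users]
  }
}

# Subset-and-assign version; mapply stands in for purrr::map2
temp1 <- mergNet[, 1]
mid   <- mergNet[, 2]
temp2 <- mergNet[, 3]
weight_vec <- matrix(0, nrow = nrow(mergNet), ncol = 1)
cond1 <- which(temp1 <= users & temp2 <= users)
weight_vec[cond1, 1] <- mapply(
  function(i, j) cosine_sim[i, j] * A[i, j] * ug_sim[i, j],
  temp1[cond1], temp2[cond1])
cond2 <- which(temp1 > users & temp2 > users)
weight_vec[cond2, 1] <- mapply(
  function(i, j) avg_usim[i, j - users] * avg_unov[i, j - users],
  mid[cond2], temp2[cond2])

# TRUE when both versions agree on every row
same <- isTRUE(all.equal(weight_loop, weight_vec))
print(same)
```

Agreement on synthetic data does not prove the code is correct for the real mergNet, but it does catch indexing mistakes such as a missing shift by users.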