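I wrote the R code below, and it takes about 15 minutes to run. mergNet is a matrix with 23715660 rows and 5 columns. cosine_sim, A, ug_sim, avg_usim and avg_unov have already been computed, and all of them are matrices. users is a scalar constant equal to 943. weight is computed according to two conditions. cosine_sim, A and ug_sim are 943 * 943; avg_usim and avg_unov are 943 * 1682. How can I change this code so that it runs faster?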
weight = matrix(0, nrow = nrow(mergNet), ncol = 1)
for (i in 1:nrow(mergNet)) {
  temp1 = mergNet[i, 1]
  temp2 = mergNet[i, 3]
  mid = mergNet[i, 2]
  if (temp1 < 944 && temp2 < 944) {  # u_u
    weight[i, 1] = cosine_sim[temp1, temp2] * A[temp1, temp2] * ug_sim[temp1, temp2]
  }
  if (temp1 > 943 && temp2 > 943) {
    weight[i, 1] = avg_usim[mid, temp2 - users] * avg_unov[mid, temp2 - users]
  }
}
The first ten rows of mergNet:
src1 dst1 dst2 id1 id2
1 962 1 1186 53230 91038
2 962 1 1032 53230 156361
3 962 1 1116 53230 85937
4 962 1 1118 53230 107437
5 962 1 1150 53230 119957
6 962 1 1187 53230 101035
7 962 1 1188 53230 150941
8 962 1 962 53230 133230
9 962 1 1169 53230 116318
10 962 1 1101 53230 103387
The first ten rows and first ten columns of A:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0.0000000 0.7291667 0.9019608 0.80 0.5869565 0.6583851 0.7038217 0.575 0.8333333 0.6827586
[2,] 0.9380952 0.0000000 0.8235294 0.75 0.9710145 0.8757764 0.9713376 0.950 0.7777778 0.9379310
[3,] 0.9761905 0.8125000 0.0000000 0.50 0.9927536 0.9440994 0.9649682 0.850 0.8888889 0.9517241
[4,] 0.9809524 0.8958333 0.8039216 0.00 0.9927536 0.9627329 0.9808917 0.900 0.9444444 0.9862069
[5,] 0.7285714 0.9166667 0.9803922 0.95 0.0000000 0.8509317 0.8057325 0.775 0.8888889 0.8344828
[6,] 0.7380952 0.5833333 0.8235294 0.70 0.8260870 0.0000000 0.7324841 0.725 0.5000000 0.6275862
[7,] 0.5571429 0.8125000 0.7843137 0.70 0.5579710 0.4782609 0.0000000 0.475 0.5000000 0.4965517
[8,] 0.9190476 0.9583333 0.8823529 0.80 0.9347826 0.9316770 0.9331210 0.000 1.0000000 0.9310345
[9,] 0.9857143 0.9166667 0.9607843 0.95 0.9855072 0.9440994 0.9713376 1.000 0.0000000 0.9241379
[10,] 0.7809524 0.8125000 0.8627451 0.90 0.8260870 0.6645963 0.7675159 0.750 0.3888889 0.0000000
cosine_sim and ug_sim look similar to A.
The first ten rows and first ten columns of avg_usim:
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 0.00000000 0.09284909 0.08681234 0.00000000 0 0 0.063968603 0.06623507 0.05759746 0.07562590
[2,] 0.00000000 0.00000000 0.00000000 0.00000000 0 0 0.000000000 0.00000000 0.00000000 0.00000000
[3,] 0.00000000 0.00000000 0.00000000 0.00000000 0 0 0.000000000 0.00000000 0.00000000 0.00000000
[4,] 0.00000000 0.00000000 0.00000000 0.00000000 0 0 0.000000000 0.00000000 0.00000000 0.00000000
[5,] 0.03792794 0.00000000 0.00000000 0.00000000 0 0 0.000000000 0.00000000 0.00000000 0.00000000
[6,] 0.00000000 0.00000000 0.00000000 0.00000000 0 0 0.034380897 0.04893480 0.03765768 0.00000000
[7,] 0.00000000 0.00000000 0.00000000 0.06751437 0 0 0.046987708 0.06087732 0.04545343 0.04983857
[8,] 0.00000000 0.00000000 0.00000000 0.00000000 0 0 0.000000000 0.00000000 0.00000000 0.00000000
[9,] 0.00000000 0.00000000 0.00000000 0.00000000 0 0 0.005074253 0.00000000 0.00000000 0.00000000
[10,] 0.03665521 0.00000000 0.00000000 0.05438324 0 0 0.041738325 0.00000000 0.00000000 0.00000000
avg_unov looks similar to avg_usim.
Answer 0 (score: 0)
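The solution is to vectorize the loop: select the subset of rows that satisfies each condition and assign the required values to those rows in one step. map2 from the purrr library is used to map the selected rows to the corresponding matrix entries.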
library(purrr)

weight = matrix(0, nrow = nrow(mergNet), ncol = 1)
temp1 = mergNet[, 1]
temp2 = mergNet[, 3]
mid = mergNet[, 2]

# find rows where cond1 is true
cond1 <- which((temp1 < 944) & (temp2 < 944))
# map temp1 and temp2 to matrix entries
weight[cond1, 1] <- unlist(map2(temp1[cond1], temp2[cond1], function(i, j) {
  cosine_sim[i, j] * A[i, j] * ug_sim[i, j]
}))

# find rows where cond2 is true
cond2 <- which((temp1 > 943) & (temp2 > 943))
# map mid and temp2 (shifted by users) to matrix entries
weight[cond2, 1] <- unlist(map2(mid[cond2], temp2[cond2], function(i, j) {
  avg_usim[i, j - users] * avg_unov[i, j - users]
}))
Without any data, it is hard to test this solution.
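A possible variation, offered only as a sketch: base R lets you index a matrix with a two-column matrix of (row, column) pairs, which picks out one entry per row of mergNet without calling a function for every element. The example below builds small random stand-ins for the real inputs so the result can be compared with the original loop. The sizes, the items constant, and the names weight_vectorized and weight_loop are made up for illustration, and it assumes column 2 of mergNet always holds a valid row index into avg_usim.

```r
# Small synthetic stand-ins for the real inputs (all sizes are assumptions).
set.seed(1)
users <- 5L    # 943 in the question
items <- 7L    # hypothetical name; 1682 in the question
n     <- 200L  # 23715660 rows in the question

cosine_sim <- matrix(runif(users * users), users, users)
A          <- matrix(runif(users * users), users, users)
ug_sim     <- matrix(runif(users * users), users, users)
avg_usim   <- matrix(runif(users * items), users, items)
avg_unov   <- matrix(runif(users * items), users, items)

# Columns 1 and 3 hold ids in 1..(users + items); column 2 holds a user id.
mergNet <- cbind(src1 = sample(users + items, n, replace = TRUE),
                 dst1 = sample(users, n, replace = TRUE),
                 dst2 = sample(users + items, n, replace = TRUE))

# Hypothetical helper: same two conditions as the loop, using matrix indexing.
weight_vectorized <- function(mergNet, users) {
  temp1 <- mergNet[, 1]
  mid   <- mergNet[, 2]
  temp2 <- mergNet[, 3]
  weight <- matrix(0, nrow = nrow(mergNet), ncol = 1)

  # Condition 1: both ids are in the user range.
  uu <- which(temp1 <= users & temp2 <= users)
  idx_uu <- cbind(temp1[uu], temp2[uu])  # one (row, col) pair per selected row
  weight[uu, 1] <- cosine_sim[idx_uu] * A[idx_uu] * ug_sim[idx_uu]

  # Condition 2: both ids are above the user range.
  ii <- which(temp1 > users & temp2 > users)
  idx_ii <- cbind(mid[ii], temp2[ii] - users)
  weight[ii, 1] <- avg_usim[idx_ii] * avg_unov[idx_ii]

  weight
}

# Reference result from the original row-by-row loop.
weight_loop <- matrix(0, nrow = n, ncol = 1)
for (i in seq_len(n)) {
  t1 <- mergNet[i, 1]
  t2 <- mergNet[i, 3]
  m  <- mergNet[i, 2]
  if (t1 <= users && t2 <= users) {
    weight_loop[i, 1] <- cosine_sim[t1, t2] * A[t1, t2] * ug_sim[t1, t2]
  }
  if (t1 > users && t2 > users) {
    weight_loop[i, 1] <- avg_usim[m, t2 - users] * avg_unov[m, t2 - users]
  }
}

# Both approaches should agree on the synthetic data.
stopifnot(isTRUE(all.equal(weight_vectorized(mergNet, users), weight_loop)))
```

On the real data the call would be weight <- weight_vectorized(mergNet, 943). Rows where only one of the two ids is in the user range keep a weight of 0, as in the loop. Whether this is faster than the map2 version on 23715660 rows has not been measured here.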