How to speed up an apply function inside too many loops

Time: 2019-11-03 08:49:20

Tags: r loops apply

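First of all, I would like to apologize. I am learning R on my own, so I could not simplify the problem and decided to post a short version with my actual variables here. I am trying to implement a variant of the "maximum likelihood" classifier in R. For each class I have some variables stored in vectors and lists (each position refers to one class), and I want to apply a function to a matrix containing the data I want to classify. The catch is that I need the function's results separated by class. So far I am doing this: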

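# one slot per class: covariance determinant, mean vector, inverse covariance
cc <- numeric(2)
mm <- vector("list", 2)   # list(length=2) would create a single element named "length"
ii <- vector("list", 2)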

temp1<-matrix(nrow=16,ncol=6)
temp1<-as.data.frame(temp1)
temp1[]<-c(256,235,194,235,215,173,215,215,194,215,215,215,194,173,152,215,
           430,388,388,388,388,430,430,430,388,346,346,388,388,388,346,388,
           283,317,283,283,248,283,283,283,214,214,248,283,214,283,214,248,
           3701,3450,3576,3826,3534,3450,3868,4035,3450,3493,3450,3701,3534,3242,3032,3116,
           1646,1589,1589,1646,1646,1589,1646,1732,1560,1475,1589,1589,1675,1532,1503,1418,
           474,556,556,515,556,556,597,637,556,515,515,515,515,515,434,434)


temp2<- matrix(nrow=11,ncol=6)
temp2<-as.data.frame(temp2)
temp2[]<-c(422,463,462,483,546,525,483,566,546,483,546,
           770,812,770,812,854,854,812,939,939,854,981,
           1038,1175,1004,1141,1209,1209,1038,1311,1311,1175,1311,
           2359,2359,2275,2359,2359,2359,2359,2401,2359,2401,2401,
           2445,2531,2417,2588,2759,2617,2388,2674,2730,2645,2731,
           1413,1413,1373,1495,1618,1535,1413,1535,1659,1535,1618)


cc[1]<-det(cov(temp1))
cc[2]<-det(cov(temp2))

mm[[1]]<-as.numeric(sapply(temp1,"mean"))
mm[[2]]<-as.numeric(sapply(temp2,"mean"))



ii[[1]]<-solve(cov(temp1))
ii[[2]]<-solve(cov(temp2))




data<-matrix(nrow=10,ncol=6)
data<-as.data.frame(data)
data[]<-c(181,203,224,203,203,224,181,181,161,161,
          338,338,338,338,296,296,338,381,338,296,
          208,242,208,208,208,208,208,242,208,173,
          3164,2954,2660,2787,2744,2787,2534,3457,2870,2912,
          1476,1505,1391,1332,1304,1391,1132,1591,1448,1304,
          474,474,474,515,392,432,432,556,515,474)



for (k in 1:2){
  Pxi<-apply(data,1,function(x)1/(2*pi^(6/2)*cc[k]^(1/2))*exp(-1/2*t(as.numeric(x-mm[[k]]))%*%ii[[k]]%*%(as.numeric(x-mm[[k]]))))

  if (k==1) {rule<-Pxi} else {rule<-cbind(rule,Pxi)}  
}


So I get

rule
              rule           Pxi
 [1,] 4.316396e-13  0.000000e+00
 [2,] 6.835553e-15 7.970888e-284
 [3,] 8.674921e-21 2.687251e-145
 [4,] 5.923777e-19 8.020048e-189
 [5,] 5.627127e-16 8.064007e-184
 [6,] 2.495667e-17 5.738550e-209
 [7,] 6.311390e-22  8.913098e-97
 [8,] 1.413893e-12  0.000000e+00
 [9,] 5.521715e-15 1.619401e-221
[10,] 5.212091e-17 5.810407e-254

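Well, as you can imagine, the real data are much larger than my example, and when k gets large the last loop takes a very long time. Any suggestions on how to make it faster?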

2 answers:

Answer 0 (score: 1)

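Using cbind() inside a loop is very expensive. Instead, you should assign the intermediate loop results to a list and then call do.call(cbind, rule) once at the end.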
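A minimal sketch of that pattern follows. It is not the answerer's original code: the inputs are small random toy matrices, and the names classes, newdata and rule_list are hypothetical, introduced only so the example runs on its own. The per-row computation mirrors the apply() call from the question, including its 2*pi^(d/2) constant.

```r
# Toy inputs (hypothetical, only so the sketch is self-contained)
set.seed(1)
classes <- list(matrix(rnorm(40), ncol = 2),
                matrix(rnorm(40, mean = 3), ncol = 2))
newdata <- matrix(rnorm(10), ncol = 2)   # 5 rows to classify
d <- ncol(newdata)

# Preallocate one list slot per class instead of growing a matrix with cbind()
rule_list <- vector("list", length(classes))

for (k in seq_along(classes)) {
  mu   <- colMeans(classes[[k]])
  S    <- cov(classes[[k]])
  Sinv <- solve(S)
  # Same row-wise computation as in the question, stored in slot k
  rule_list[[k]] <- apply(newdata, 1, function(x) {
    z <- x - mu
    1 / (2 * pi^(d / 2) * sqrt(det(S))) * exp(-0.5 * drop(t(z) %*% Sinv %*% z))
  })
}

# Bind all columns once, after the loop
rule <- do.call(cbind, rule_list)
dim(rule)
```

With only two classes the gain is negligible; the point of the list is that the result matrix is not copied on every iteration when the number of classes grows.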

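As for why the apply() statement is slow: looping over every row of data requires a lot of separate operations. It is better to try to do the matrix operations (or function calls) all at once.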

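This uses the mahalanobis() function to simplify what sits inside the exp() call. It turns out that the function uses exactly the same method as @chinsoon12's answer.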

1 / (2*pi^(6/2)*det(cov(temp1))^(1/2))*exp(-1 / 2 * mahalanobis(data, colMeans(temp1), cov(temp1)))

mahalanobis
#function (x, center, cov, inverted = FALSE, ...) 
#{
#    x <- if (is.vector(x)) 
#        matrix(x, ncol = length(x))
#    else as.matrix(x)
#    if (!isFALSE(center)) 
#        x <- sweep(x, 2L, center)
#    if (!inverted) 
#        cov <- solve(cov, ...)
#    setNames(rowSums(x %*% cov * x), rownames(x))
#}
#<bytecode: 0x000000000c217d80>
#<environment: namespace:stats>

I would start by putting the temporary data frames into a list() and then loop over them.
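One way this could look is sketched below. The answer's own code for this step did not survive, so this is an assumption about its shape rather than a reproduction: temps, newdata and class_score are hypothetical names, the inputs are random toy data, and lapply() is simply one convenient way to iterate over the list. The formula keeps the question's 2*pi^(d/2) constant.

```r
# Toy class samples (hypothetical data, only so the sketch is self-contained)
set.seed(42)
temps <- list(
  as.data.frame(matrix(rnorm(60), ncol = 3)),
  as.data.frame(matrix(rnorm(60, mean = 2), ncol = 3))
)
newdata <- matrix(rnorm(15), ncol = 3)   # 5 rows to classify

# Score of every row of x under the class sampled in tmp.
# mahalanobis() handles all rows in one vectorised call.
class_score <- function(tmp, x) {
  S <- cov(tmp)
  d <- ncol(x)
  1 / (2 * pi^(d / 2) * sqrt(det(S))) *
    exp(-0.5 * mahalanobis(x, colMeans(tmp), S))
}

# One column per class, bound together once at the end
rule <- do.call(cbind, lapply(temps, class_score, x = newdata))
dim(rule)
```

Because each class only needs its own data frame, adding a class means appending one element to temps; no index bookkeeping for cc, mm and ii is required.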

Reference: http://sar.kangwon.ac.kr/etc/rs_note/rsnote/cp11/cp11-7.htm

Answer 1 (score: 1)

It should be faster if you work with matrices. Here is a suggested replacement for the for loop:

data <- as.matrix(data)
const <- 2*pi^(6/2)
do.call(cbind, lapply(1L:2L, function(k) {
    m <- sweep(data, 2L, mm[[k]])
    #1/(const*cc[k]^(1/2))* exp(-1/2 * diag(m %*% ii[[k]] %*% t(m)))
    1/(const*cc[k]^(1/2))* exp(-1/2 * rowSums((m %*% ii[[k]]) * m))
}))

Using rowSums() instead of the original diag(m %*% ii[[k]] %*% t(m)) comes from "compute only diagonals of matrix multiplication in R".

Output:
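The two expressions agree because entry (i, i) of m %*% A %*% t(m) is the sum over j of (m %*% A)[i, j] * m[i, j], which is exactly a row sum of the element-wise product. A small check on toy data (the matrices m and A below are random stand-ins, not values from the question):

```r
set.seed(7)
m <- matrix(rnorm(12), nrow = 4)      # 4 observations, 3 variables (toy data)
A <- crossprod(matrix(rnorm(9), 3))   # symmetric 3 x 3 stand-in for ii[[k]]

# Builds the full 4 x 4 product, then discards everything off the diagonal
full_diag <- diag(m %*% A %*% t(m))

# Computes only the 4 diagonal entries
row_wise <- rowSums((m %*% A) * m)

all.equal(full_diag, row_wise)
```

For n rows the diag() version allocates an n x n matrix, while the rowSums() version never goes beyond n x p, which is where the saving on large data comes from.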

              [,1]          [,2]
 [1,] 4.316396e-13  0.000000e+00
 [2,] 6.835553e-15 7.970888e-284
 [3,] 8.674921e-21 2.687251e-145
 [4,] 5.923777e-19 8.020048e-189
 [5,] 5.627127e-16 8.064007e-184
 [6,] 2.495667e-17 5.738550e-209
 [7,] 6.311390e-22  8.913098e-97
 [8,] 1.413893e-12  0.000000e+00
 [9,] 5.521715e-15 1.619401e-221
[10,] 5.212091e-17 5.810407e-254