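First of all, I apologize. I am teaching myself R, so I could not simplify the problem and decided to post a shortened version that uses my real variables. I am trying to implement a variant of a "maximum likelihood" classifier in R. For each class I have some variables stored in vectors and lists (each position refers to one class), and I want to apply a function to a matrix containing the data I want to classify. The problem is that I need the function's results separated by class. So far I am doing this: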
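cc<-numeric(2)          # determinant of each class covariance matrix
mm<-vector("list",2)    # mean vector of each class
ii<-vector("list",2)    # inverse covariance matrix of each class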
temp1<-matrix(nrow=16,ncol=6)
temp1<-as.data.frame(temp1)
temp1[]<-c(256,235,194,235,215,173,215,215,194,215,215,215,194,173,152,215,
430,388,388,388,388,430,430,430,388,346,346,388,388,388,346,388,
283,317,283,283,248,283,283,283,214,214,248,283,214,283,214,248,
3701,3450,3576,3826,3534,3450,3868,4035,3450,3493,3450,3701,3534,3242,3032,3116,
1646,1589,1589,1646,1646,1589,1646,1732,1560,1475,1589,1589,1675,1532,1503,1418,
474,556,556,515,556,556,597,637,556,515,515,515,515,515,434,434)
temp2<- matrix(nrow=11,ncol=6)
temp2<-as.data.frame(temp2)
temp2[]<-c(422,463,462,483,546,525,483,566,546,483,546,
770,812,770,812,854,854,812,939,939,854,981,
1038,1175,1004,1141,1209,1209,1038,1311,1311,1175,1311,
2359,2359,2275,2359,2359,2359,2359,2401,2359,2401,2401,
2445,2531,2417,2588,2759,2617,2388,2674,2730,2645,2731,
1413,1413,1373,1495,1618,1535,1413,1535,1659,1535,1618)
cc[1]<-det(cov(temp1))
cc[2]<-det(cov(temp2))
mm[[1]]<-as.numeric(sapply(temp1,"mean"))
mm[[2]]<-as.numeric(sapply(temp2,"mean"))
ii[[1]]<-solve(cov(temp1))
ii[[2]]<-solve(cov(temp2))
data<-matrix(nrow=10,ncol=6)
data<-as.data.frame(data)
data[]<-c(181,203,224,203,203,224,181,181,161,161,
338,338,338,338,296,296,338,381,338,296,
208,242,208,208,208,208,208,242,208,173,
3164,2954,2660,2787,2744,2787,2534,3457,2870,2912,
1476,1505,1391,1332,1304,1391,1132,1591,1448,1304,
474,474,474,515,392,432,432,556,515,474)
for (k in 1:2) {
  Pxi <- apply(data, 1, function(x)
    1 / (2 * pi^(6/2) * cc[k]^(1/2)) *
      exp(-1/2 * t(as.numeric(x - mm[[k]])) %*% ii[[k]] %*% as.numeric(x - mm[[k]])))
  if (k == 1) {rule <- Pxi} else {rule <- cbind(rule, Pxi)}
}
So I get:
rule
rule Pxi
[1,] 4.316396e-13 0.000000e+00
[2,] 6.835553e-15 7.970888e-284
[3,] 8.674921e-21 2.687251e-145
[4,] 5.923777e-19 8.020048e-189
[5,] 5.627127e-16 8.064007e-184
[6,] 2.495667e-17 5.738550e-209
[7,] 6.311390e-22 8.913098e-97
[8,] 1.413893e-12 0.000000e+00
[9,] 5.521715e-15 1.619401e-221
[10,] 5.212091e-17 5.810407e-254
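Well, as you can imagine, the real data is much larger than my example, and the last loop takes a very long time when k gets large. Any suggestions on how to make it faster?
Answer 0 (score: 1)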
Using cbind() in a loop is very expensive. Instead, assign the intermediate loop results to a list and call do.call(cbind, rule) once at the end.
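The answer's original code for this step is not preserved here. A minimal sketch of the idea, assuming small made-up data in place of the question's temp1, temp2 and data (the names train and p are illustrative, not from the original post), might look like this:

```r
# Hypothetical toy data standing in for the question's temp1, temp2 and data
set.seed(1)
train <- list(matrix(rnorm(60), ncol = 3),
              matrix(rnorm(45, mean = 2), ncol = 3))
data  <- matrix(rnorm(30), ncol = 3)
p     <- ncol(data)

# Per-class quantities, as in the question
cc <- sapply(train, function(d) det(cov(d)))    # covariance determinants
mm <- lapply(train, colMeans)                   # mean vectors
ii <- lapply(train, function(d) solve(cov(d)))  # inverse covariances

# Pre-allocate a list, fill one slot per class, and bind only once at the end
rule <- vector("list", length(train))
for (k in seq_along(train)) {
  rule[[k]] <- apply(data, 1, function(x) {
    d <- x - mm[[k]]
    # same constant as written in the question: 2 * pi^(p/2)
    1 / (2 * pi^(p / 2) * sqrt(cc[k])) * exp(-1/2 * sum(d * (ii[[k]] %*% d)))
  })
}
rule <- do.call(cbind, rule)
```

Growing a matrix with cbind() copies everything accumulated so far on each iteration, whereas filling a list slot does not, so the single do.call(cbind, rule) at the end does the copying only once.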
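As for why the apply() statement is slow: looping over every row of data requires a lot of separate operations. It is better to perform the matrix operations (or function call) on all rows at once.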
This uses the mahalanobis() function to simplify what is inside the exp() call. It turns out that this function uses exactly the same method as @chinsoon12.
1 / (2*pi^(6/2)*det(cov(temp1))^(1/2))*exp(-1 / 2 * mahalanobis(data, colMeans(temp1), cov(temp1)))
mahalanobis
#function (x, center, cov, inverted = FALSE, ...)
#{
# x <- if (is.vector(x))
# matrix(x, ncol = length(x))
# else as.matrix(x)
# if (!isFALSE(center))
# x <- sweep(x, 2L, center)
# if (!inverted)
# cov <- solve(cov, ...)
# setNames(rowSums(x %*% cov * x), rownames(x))
#}
#<bytecode: 0x000000000c217d80>
#<environment: namespace:stats>
I would start by making a list() of the temporary data frames and then loop over them.
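The code that originally followed is missing from this copy. As a rough sketch only, assuming made-up data frames in place of temp1, temp2 and data (the names temps and p are illustrative), looping over a list of training sets with mahalanobis() could be written as:

```r
# Hypothetical toy data frames standing in for temp1, temp2 and data
set.seed(1)
temps <- list(as.data.frame(matrix(rnorm(60), ncol = 3)),
              as.data.frame(matrix(rnorm(45, mean = 2), ncol = 3)))
data  <- as.data.frame(matrix(rnorm(30), ncol = 3))
p     <- ncol(data)

# One column of values per class; no per-row apply() is needed because
# mahalanobis() handles all rows of data in a single call
rule <- do.call(cbind, lapply(temps, function(tmp) {
  S <- cov(tmp)
  # same constant as written in the question: 2 * pi^(p/2)
  1 / (2 * pi^(p / 2) * sqrt(det(S))) *
    exp(-1/2 * mahalanobis(data, colMeans(tmp), S))
}))
```

With this layout, adding another class only means appending another data frame to temps.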
Reference: http://sar.kangwon.ac.kr/etc/rs_note/rsnote/cp11/cp11-7.htm
Answer 1 (score: 1)
It should be faster if you work with matrices. Here is a suggested replacement for the for loop:
data <- as.matrix(data)
const <- 2*pi^(6/2)
do.call(cbind, lapply(1L:2L, function(k) {
m <- sweep(data, 2L, mm[[k]])
#1/(const*cc[k]^(1/2))* exp(-1/2 * diag(m %*% ii[[k]] %*% t(m)))
1/(const*cc[k]^(1/2))* exp(-1/2 * rowSums((m %*% ii[[k]]) * m))
}))
The use of rowSums (instead of the original diag(m %*% ii[[k]] %*% t(m))) comes from "compute only diagonals of matrix multiplication in R".
Output:
[,1] [,2]
[1,] 4.316396e-13 0.000000e+00
[2,] 6.835553e-15 7.970888e-284
[3,] 8.674921e-21 2.687251e-145
[4,] 5.923777e-19 8.020048e-189
[5,] 5.627127e-16 8.064007e-184
[6,] 2.495667e-17 5.738550e-209
[7,] 6.311390e-22 8.913098e-97
[8,] 1.413893e-12 0.000000e+00
[9,] 5.521715e-15 1.619401e-221
[10,] 5.212091e-17 5.810407e-254
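The rowSums shortcut used in this answer can be checked on a small example. The sketch below uses made-up matrices (m and A are illustrative stand-ins for the centered data and an inverse covariance matrix) and compares the shortcut with the full diag() computation:

```r
set.seed(42)
m <- matrix(rnorm(20), nrow = 5)        # 5 observations, 4 variables (toy values)
A <- crossprod(matrix(rnorm(16), 4))    # a symmetric 4 x 4 matrix

full  <- diag(m %*% A %*% t(m))         # builds the whole 5 x 5 product, keeps the diagonal
short <- rowSums((m %*% A) * m)         # computes only the 5 diagonal entries

all.equal(full, short)                  # compares the two up to floating-point tolerance
```

For n rows, the diag() version allocates an n x n matrix only to discard everything off the diagonal, while the rowSums version never creates it, which matters when the data to classify has many rows.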