Matrix correlation of column permutations

Time: 2017-01-15 12:36:50

Tags: r

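Starting from a matrix (n x m), I would like to create a new m x m matrix that contains the correlations between the columns of the starting matrix, taken 2 at a time. So if my input is a 3x3 matrix, I want to compute the correlations of columns 12, 13, and 23 and assign the results to the destination matrix. I am currently using two nested for loops (~O(n^2)):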

 for (i in 1:n) {
   for (j in i+1:n) {
     if (j <= n) {
         tmp = cor(inMatrix[, i], inMatrix[, j])
         dstMatrix[i,j] = tmp;
     }
   }
 }

This seems to work, but I would like to know whether there is a better way to do it.
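
A note on the loop bounds above: in R the `:` operator binds more tightly than `+`, so `i+1:n` is read as `i + (1:n)` rather than `(i+1):n`. That is presumably why the `if (j <= n)` check is needed. A minimal sketch with illustrative values (`n`, `i`, `without_parens`, and `with_parens` are chosen here only for demonstration):

```r
n <- 3
i <- 2

# `:` is evaluated before `+`, so this is i + c(1, 2, 3)
without_parens <- i + 1:n
without_parens

# Parentheses give the intended range from i + 1 up to n
with_parens <- (i + 1):n
with_parens
```

With the parenthesized form the outer loop should stop at `n - 1`, because `(n + 1):n` counts downward instead of producing an empty range.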

2 answers:

Answer 0 (score: 3)

A plain cor(inMatrix) does it (the whole matrix is passed directly to cor()):

n <- 7
m <- 5
set.seed(123)
inMatrix <- replicate(m, sample(c(1, - 1), 1) * cumsum(runif(n)))
inMatrix
#           [,1]       [,2]       [,3]       [,4]       [,5]
# [1,] 0.7883051 -0.4566147 0.04205953 -0.7085305 -0.7954674
# [2,] 1.1972821 -1.4134481 0.36998025 -1.2525965 -0.8200811
# [3,] 2.0802995 -1.8667822 1.32448390 -1.8467385 -1.2978771
# [4,] 3.0207667 -2.5443529 2.21402322 -2.1358983 -2.0563366
# [5,] 3.0663232 -3.1169863 2.90682662 -2.2830119 -2.2727445
# [6,] 3.5944287 -3.2199110 3.54733344 -3.2460361 -2.5909256
# [7,] 4.4868478 -4.1197359 4.54160321 -4.1483352 -2.8225513

dstMatrix <- matrix(nrow = m, ncol = m)
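# Loop over every pair of columns i < j. With (i+1):m written in parentheses
# and i stopping at m - 1, no extra bounds check on j is needed.
for (i in 1:(m - 1)) {
  for (j in (i + 1):m) {
    dstMatrix[i, j] <- cor(inMatrix[, i], inMatrix[, j])
  }
}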
dstMatrix
#      [,1]       [,2]       [,3]       [,4]       [,5]
# [1,]   NA -0.9823516  0.9902370 -0.9688212 -0.9825973
# [2,]   NA         NA -0.9811424  0.9570599  0.9626469
# [3,]   NA         NA         NA -0.9742235 -0.9862355
# [4,]   NA         NA         NA         NA  0.9331879
# [5,]   NA         NA         NA         NA         NA

dstMatrix_2 <- cor(inMatrix)
dstMatrix_2
#            [,1]       [,2]       [,3]       [,4]       [,5]
# [1,]  1.0000000 -0.9823516  0.9902370 -0.9688212 -0.9825973
# [2,] -0.9823516  1.0000000 -0.9811424  0.9570599  0.9626469
# [3,]  0.9902370 -0.9811424  1.0000000 -0.9742235 -0.9862355
# [4,] -0.9688212  0.9570599 -0.9742235  1.0000000  0.9331879
# [5,] -0.9825973  0.9626469 -0.9862355  0.9331879  1.0000000
dstMatrix == dstMatrix_2
#      [,1] [,2] [,3]  [,4]  [,5]
# [1,]   NA TRUE TRUE FALSE  TRUE
# [2,]   NA   NA TRUE FALSE  TRUE
# [3,]   NA   NA   NA FALSE  TRUE
# [4,]   NA   NA   NA    NA FALSE
# [5,]   NA   NA   NA    NA    NA

# The differences are at machine-precision level (presumably floating-point rounding), which is why == reports FALSE for some entries:
dstMatrix - dstMatrix_2
#      [,1] [,2] [,3]          [,4]         [,5]
# [1,]   NA    0    0 -1.110223e-16 0.000000e+00
# [2,]   NA   NA    0  2.220446e-16 0.000000e+00
# [3,]   NA   NA   NA -1.110223e-16 0.000000e+00
# [4,]   NA   NA   NA            NA 1.110223e-16
# [5,]   NA   NA   NA            NA           NA
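
Since the two results can differ by rounding at the 1e-16 level, as the differences above show, a tolerance-based comparison is more appropriate than `==`. The following is a minimal sketch that rebuilds the example data from this answer, masks the full correlation matrix down to its upper triangle, and compares with `all.equal()`. The variable `same` is introduced here only for illustration.

```r
# Rebuild the example data from this answer
n <- 7
m <- 5
set.seed(123)
inMatrix <- replicate(m, sample(c(1, -1), 1) * cumsum(runif(n)))

# Pairwise loop result (upper triangle only, everything else NA)
dstMatrix <- matrix(NA_real_, nrow = m, ncol = m)
for (i in 1:(m - 1)) {
  for (j in (i + 1):m) {
    dstMatrix[i, j] <- cor(inMatrix[, i], inMatrix[, j])
  }
}

# Full correlation matrix, keeping only the strict upper triangle
dstMatrix_2 <- cor(inMatrix)
dstMatrix_2[!upper.tri(dstMatrix_2)] <- NA

# all.equal() compares with a numeric tolerance instead of exact equality
same <- isTRUE(all.equal(dstMatrix, dstMatrix_2))
same
```

If only the upper triangle is wanted, masking the output of `cor()` this way avoids the explicit loops altogether.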

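Answer 1 (score: 1)

Compute the correlation coefficient for each combination of columns. The combn function is used to obtain the pairs of column numbers.

As @Sotos points out, the function can be passed directly to combn, so apply() can be avoided: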

cor_vals <- combn(1:col_n, 2, function(x) cor(mat1[, x[1]], mat1[, x[2]]))
# cor_vals <- apply(combn(1:col_n, 2), 2, function(x) cor(mat1[, x[1]], mat1[, x[2]]))

Assign names to the correlation values:

cor_vals <- setNames(cor_vals, combn(1:col_n, 2, paste0, collapse = ''))
cor_vals
#        12         13         23 
# 0.1621491 -0.8211970  0.4299367 

Data:

set.seed(1L)
row_n <- 3
col_n <- 3
mat1 <- matrix(runif(row_n * col_n, min = 0, max = 20), nrow = row_n, ncol = col_n)
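
The question asks for the values in a destination matrix rather than a named vector. One way to get there from the combn result, sketched here under the assumption that the upper triangle is the desired layout, is to use the column pairs as a two-column index matrix. The names `idx` and `dst` are hypothetical and used only in this sketch.

```r
set.seed(1L)
row_n <- 3
col_n <- 3
mat1 <- matrix(runif(row_n * col_n, min = 0, max = 20), nrow = row_n, ncol = col_n)

# One row per column pair (i, j) with i < j
idx <- t(combn(col_n, 2))

# Correlation for each pair, in the same order as the rows of idx
cor_vals <- combn(col_n, 2, function(x) cor(mat1[, x[1]], mat1[, x[2]]))

# Indexing with a two-column matrix assigns dst[i, j] for each row (i, j)
dst <- matrix(NA_real_, nrow = col_n, ncol = col_n)
dst[idx] <- cor_vals
dst
```

Indexing through `idx` keeps the pairing explicit, which matters because combn lists pairs row by row while `upper.tri()` walks the matrix column by column; the two orders differ once there are four or more columns.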