Pseudoinverse in R, C and Python

Time: 2018-01-04 15:01:27

Tags: python c r numpy

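I am porting an algorithm from R to C and need to compute the pseudoinverse of a matrix, but the result I get in C differs somewhat from the one I get in R. These differences change the behavior of the algorithm.

The code I use to obtain the pseudoinverse in C is this

I did some reading, and there are different ways to obtain a pseudoinverse; the method used in C is Moore-Penrose. The function used in R comes from the corpcor library. Both use singular value decomposition.

This is the matrix whose pseudoinverse I want: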

1                  0.920980394593472  0.996160973582776   0.996772980609752   0.997372221594439 0.999972797627027
0.920980394593472  1                  0.885601439824631   0.88878682654952    0.892173764646865 0.923738536637407
0.996160973582776  0.885601439824631  1                   0.999973383442349   0.999885329646229 0.99549326808266
0.996772980609752  0.88878682654952   0.999973383442349   1                   0.999969202115456 0.996158288591094
0.997372221594439  0.892173764646865  0.999885329646229   0.999969202115456   1                 0.996814694067663
0.999972797627027  0.923738536637407  0.99549326808266    0.996158288591094   0.996814694067663 1

The result I get from the function pseudoinverse() in R is:

1398676681.0709   79599.9582612864  -9585774352.21759 28302547195.6681  -19807136596.5434 -305910496.668656
79591.4731051894  3401.1232804516   52529359.4133139  -126479191.665267 76425077.4778451  -2563699.8428373
-9585920775.52777 52529288.3510008  1003916837759.99  -2454016116733.34 1501977763514.61  -42460326831.3218
28302900052.1238  -126478989.043282 -2454015575342.32 6017016899314.95  -3692050079960.62 101159202486.608
-19807349974.7679 76424938.7106429  1501977155911.81  -3692049404688.94 2270196092100.53  -60571139669.4392
-305903527.744471 -2563701.10409161 -42460406960.0488 101159421351.019  -60571285357.0572 2184863920.31107

The result I get in C is:

1398795243.74255  79184.33844201    -9594022229.12525 28322858223.2099  -19819644215.1338 -305583186.690388
79166.91917247    3402.48426033     52556628.829717   -126546466.939768 76466567.769084   -2564764.38775363
-9594334089.78616 52556515.9039231  1004461808180.58  -2455360323666.24 1502806633291.96  -42481639977.8112
28323609294.95    -126546129.049526 -2455359143404.21 6020330778543.35  -3694093433789.59  101211765648.895
-19820098170.0141 76466329.4304944  1502805309171.23  -3694091962863.6   2271455511686.72  -60603547743.7687
-305568392.855205 -2564768.40243798 -42481807759.1065 101212225714.588   -60603854784.616  2185698311.36118

The difference between the two (R - C) is:

-118562.671649933 415.6198192764    8247876.90765953  -20311027.5418015 12507618.5904007 -327309.978267968
424.5539327194    -1.3609798784     -27269.4164030999 67275.2745009959  -41490.291238904  1064.5449163299
8413314.25839043  -27227.552922301  -544970420.589966 1344206932.90039  -828869777.349854 21313146.4894028
-20709242.8262024 67140.0062440038  1343568061.89014  -3313879228.39941 2043353828.96973  -52563162.2870026
12748195.2462006  -41390.7198514938 -828153259.419922 2042558174.66016  -1259419586.19043 32408074.3294983
-335134.889266014 1067.29834637     21400799.0577011  -52804363.5690002 32569427.5587997  -834391.050109863

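To check whether there is a problem with the algorithm I use in C, I computed the pseudoinverse in Python with numpy.linalg.pinv(), which also uses singular value decomposition. The result differs from both C and R.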

1398224882.37767  81521.32618159    -9548319116.82994 28210636794.0452  -19750702778.4149 -307443670.558374
81576.67749763    3392.80756354     52367028.3401356  -126080750.377468 76180379.3995419  -2557069.77374461
-9547349936.09641 52367486.8455529  1000758728845.37  -2446264734953.02 1497217439225.67  -42331313003.6236
28208301799.8629  -126082060.163116 -2446268326785.52 5998001838415.43  -3680372478514.1  100842703532.378
-19749291055.22   76181277.4796568  1497221470187.79  -3680376958173.79 2263027785174.03  -60376849475.2803
-307489737.200422 -2557061.32729561 -42330783514.2789 100841257137.344  -60375886615.3659 2179570267.21681
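  • If the method and the data are the same, what causes the results to differ?
  • Which result is the most accurate?

Edit: I made a mistake. The matrix I posted did not include all the digits needed to reproduce the results; I have updated the question with the correct matrix.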

1 answer:

Answer 0: (score: 2)

A generalized inverse $A^g$ of a matrix $A$ should satisfy the four Moore-Penrose conditions:

$A A^g A = A$

$A^g A A^g = A^g$

$(A A^g)^T = A A^g$

$(A^g A)^T = A^g A$

For the given matrix, the result of corpcor::pseudoinverse does not satisfy these properties, while the result of MASS::ginv does:

check_pinv <- function(mat, fun, ...) {
    pinv <- fun(mat, ...)
    isTRUE(all.equal(mat %*% pinv %*% mat, mat)) &&
        isTRUE(all.equal(pinv %*% mat %*% pinv, pinv)) &&
        isTRUE(all.equal(pinv %*% mat, t(mat %*% pinv))) &&
        isTRUE(all.equal(mat %*% pinv, t(pinv %*% mat)))
}

mat <- matrix(c(                                                       
   1,                  0.920980394593472,  0.996160973582776,   0.996772980609752,   0.997372221594439, 0.999972797627027,
   0.920980394593472,  1,                  0.885601439824631,   0.88878682654952,    0.892173764646865, 0.923738536637407,
   0.996160973582776,  0.885601439824631,  1,                   0.999973383442349,   0.999885329646229, 0.99549326808266,
   0.996772980609752,  0.88878682654952,   0.999973383442349,   1,                   0.999969202115456, 0.996158288591094,
   0.997372221594439,  0.892173764646865,  0.999885329646229,   0.999969202115456,   1,                 0.996814694067663,
   0.999972797627027,  0.923738536637407,  0.99549326808266,    0.996158288591094,   0.996814694067663, 1), nrow = 6, ncol = 6)

check_pinv(mat, corpcor::pseudoinverse)
#> [1] FALSE
check_pinv(mat, MASS::ginv)
#> [1] TRUE

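An important difference between the two functions is the default tolerance used to decide whether a singular value should be treated as zero. If sqrt(.Machine$double.eps), the value used by MASS::ginv, is also used for corpcor::pseudoinverse, the pseudoinverse properties are fulfilled: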

check_pinv(mat, corpcor::pseudoinverse, max(svd(mat)$d) * sqrt(.Machine$double.eps))
#> [1] TRUE

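Note that max(svd(mat)$d) * sqrt(.Machine$double.eps) must be used here, because corpcor::pseudoinverse interprets the tolerance in an absolute sense, whereas MASS::ginv treats it as relative to the largest singular value. With this tolerance, the two functions produce the same pseudoinverse: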

all.equal(corpcor::pseudoinverse(mat, max(svd(mat)$d) * sqrt(.Machine$double.eps)), 
          MASS::ginv(mat))
#> [1] TRUE
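
The two tolerance conventions can be made concrete with a small sketch. The helper pinv_svd and its relative flag are hypothetical names used only for illustration; this is not the implementation of either R function. It builds a pseudoinverse from the SVD and zeroes the reciprocal of every singular value at or below the cutoff, so the same tol value keeps or drops a singular value depending on how it is interpreted.

```python
import numpy as np

def pinv_svd(a, tol, relative):
    """Pseudoinverse via SVD (hypothetical helper, for illustration only)."""
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    # Relative: cutoff scales with the largest singular value.
    # Absolute: tol is used as the cutoff directly.
    cutoff = tol * s.max() if relative else tol
    keep = s > cutoff
    s_inv = np.zeros_like(s)
    s_inv[keep] = 1.0 / s[keep]
    # A+ = V * diag(1/s) * U^T (real matrices assumed)
    return vt.T @ (s_inv[:, None] * u.T)

a = np.diag([100.0, 0.5])                  # singular values 100 and 0.5
p_abs = pinv_svd(a, 0.01, relative=False)  # cutoff 0.01: both values kept
p_rel = pinv_svd(a, 0.01, relative=True)   # cutoff 1.0: 0.5 is dropped
print(np.diag(p_abs))
print(np.diag(p_rel))
```

With the absolute interpretation the small singular value is inverted to 2.0; with the relative interpretation it is treated as zero.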

In Python, neither numpy.linalg.pinv nor scipy.linalg.pinv satisfies these properties:

import numpy
mat = numpy.array([[1,            0.9209803946, 0.9961609736, 0.9967729806, 0.9973722216, 0.9999727976],
                   [0.9209803946, 1,            0.8856014398, 0.8887868265, 0.8921737646, 0.9237385366],
                   [0.9961609736, 0.8856014398, 1,            0.9999733834, 0.9998853296, 0.9954932681],
                   [0.9967729806, 0.8887868265, 0.9999733834, 1,            0.9999692021, 0.9961582886],
                   [0.9973722216, 0.8921737646, 0.9998853296, 0.9999692021, 1,            0.9968146941],
                   [0.9999727976, 0.9237385366, 0.9954932681, 0.9961582886, 0.9968146941, 1]])

pinv1 = numpy.linalg.pinv(mat)
# pinv * mat * pinv == pinv ?
print(numpy.allclose(pinv1.dot(mat).dot(pinv1), pinv1))
# False
# mat * pinv * mat == mat ?
print(numpy.allclose(mat.dot(pinv1).dot(mat), mat))
# True

from scipy import linalg
pinv2 = linalg.pinv(mat)
print(numpy.allclose(pinv2.dot(mat).dot(pinv2), pinv2))
# False
print(numpy.allclose(mat.dot(pinv2).dot(mat), mat))
# False

print(numpy.allclose(pinv1, pinv2))
# True

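Note: this matrix uses the original values from the question, which have fewer digits than those in the R example above. The results are not affected, because only the smallest singular values change significantly.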
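
One way to see this is to compare the singular values directly. The sketch below rounds the full-precision matrix to 10 decimals (an assumption about how the shorter values were obtained) and prints both sets of singular values. By Weyl's inequality, no singular value can move by more than the spectral norm of the perturbation, so only singular values of about that size or smaller can change noticeably in relative terms.

```python
import numpy as np

full = np.array([
    [1, 0.920980394593472, 0.996160973582776, 0.996772980609752, 0.997372221594439, 0.999972797627027],
    [0.920980394593472, 1, 0.885601439824631, 0.88878682654952, 0.892173764646865, 0.923738536637407],
    [0.996160973582776, 0.885601439824631, 1, 0.999973383442349, 0.999885329646229, 0.99549326808266],
    [0.996772980609752, 0.88878682654952, 0.999973383442349, 1, 0.999969202115456, 0.996158288591094],
    [0.997372221594439, 0.892173764646865, 0.999885329646229, 0.999969202115456, 1, 0.996814694067663],
    [0.999972797627027, 0.923738536637407, 0.99549326808266, 0.996158288591094, 0.996814694067663, 1]])
rounded = np.round(full, 10)  # assumption: mimics the shorter values

# Singular values only, returned in descending order
s_full = np.linalg.svd(full, compute_uv=False)
s_rounded = np.linalg.svd(rounded, compute_uv=False)
# Spectral norm of the perturbation bounds every absolute change
bound = np.linalg.norm(full - rounded, 2)

for a, b in zip(s_full, s_rounded):
    print("%.6e  %.6e  abs. change %.2e" % (a, b, abs(a - b)))
print("perturbation norm: %.2e" % bound)
```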

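Again, the pseudoinverse properties are satisfied if 1e-8 is used as the tolerance instead of the default 1e-15. The same holds for the C version, which can be used from R together with RcppGSL.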
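
The Python checks above test only two of the four conditions. A helper in the spirit of the R function check_pinv could look like the sketch below; the name check_pinv is reused here for illustration, real matrices are assumed, and the comparison relies on the default tolerances of numpy.allclose. The example matrix is a small rank-deficient one chosen for illustration, not the matrix from the question.

```python
import numpy as np

def check_pinv(a, p):
    """True if p satisfies all four Moore-Penrose conditions for a
    (within the default np.allclose tolerances; real matrices assumed)."""
    ap = a @ p
    pa = p @ a
    return bool(np.allclose(ap @ a, a)        # A A+ A  = A
                and np.allclose(pa @ p, p)    # A+ A A+ = A+
                and np.allclose(ap, ap.T)     # A A+ is symmetric
                and np.allclose(pa, pa.T))    # A+ A is symmetric

a = np.array([[1.0, 2.0], [2.0, 4.0]])  # rank 1, so not invertible
print(check_pinv(a, np.linalg.pinv(a, rcond=1e-8)))
# True
print(check_pinv(a, np.eye(2)))  # the identity is not a pseudoinverse of a
# False
```

For the matrix from the question, the corresponding call would be check_pinv(mat, numpy.linalg.pinv(mat, rcond=1e-8)).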