Difficulty applying PCA

Time: 2014-05-09 10:46:16

Tags: r pca

I am experimenting with PCA in R. I have the following data:

        V2  V3  V4  V5   V6  V7   V8  V9 V10 V11
2454   0 168 290  45 1715  61  551 245  30  91
222  188  94 105  60 3374 615    7 294   0 169
552    0   0 465   0 3040   0    0 771   0   0
2872   0   0   0   0 3380   0  289   0   0   0
2938   0  56  56   0 2039 538  311 113   0 254
2849   0   0 332   0 2548   0  332   0   0 221
3102   0   0   0   0 2690   0    0   0 807 807
3134   0   0   0   0 2897 289  144 144 144   0
558    0   0   0   0 3453   0    0   0   0   0
2893   0 262 175   0 2452 350 1138 262  87 175
552    0   0 351   0 3114   0    0 678   0   0
2874   0 109  54   0 2565 272 1037 109   0   0
1396   0   0 407   0 1730   0    0 305   0   0
2866   0  71 179   0 2403 358  753  35 107 143
449    0   0   0   0 2825   0    0   0   0   0
2888   0   0 523   0 2615 104  627 209   0   0
2537   0  57   0   0 1854   0    0 463   0   0
2873   0   0 342   0 3196   0  114   0   0 114
720    0   0 365   4 2704   0    4 643   4   0
218  125  31  94 219 2479 722    0 219   0  94

I apply the following code:

fit <- prcomp(data)
ev <- fit$rotation # pc loadings

As a test, I tried to look at the data matrix I get back when I keep every component that can be kept:

numberComponentsKept = 10
featureVector = ev[,1:numberComponentsKept]
newData <- as.matrix(data)%*%as.matrix(featureVector)

The newData matrix should be the same as the original matrix, but instead I get a very different result:

             PC1       PC2       PC3          PC4        PC5        PC6        PC7         PC8        PC9      PC10
2454 1424.447  867.5986  514.0592 -155.4783720  -574.7425   85.38724  -86.71887   90.872507   4.305168  92.08284
222  3139.681 1020.4150  376.3165  471.8718398  -796.9549  142.14301 -119.86945   32.919950 -31.269467  32.55846
552  2851.544  539.6075  883.3969  -93.3579153  -908.6689   68.34030  -40.97052  -13.856931  23.133566  89.00851
2872 3111.317 1210.0187  433.0382 -144.4065362  -381.2305  -20.08927  -49.03447    9.569258  44.201571  70.13113
2938 1788.334  945.8162  189.6526  308.7703509  -593.5577  124.88484 -109.67276 -115.127348  14.170615  99.19492
2849 2291.839  978.1819  374.7567 -243.6739292  -496.8707  287.01065 -126.22501  -18.747873  54.080763  62.80605
3102 2530.989  814.7548 -510.5978 -410.6295894 -1015.3228   46.85727  -21.20662   14.696831  23.687923  72.37691
3134 2679.430  970.1323  311.8627  124.2884480  -536.4490  -26.23858   83.86768  -17.808390 -28.802387  92.09583
558  3268.599  988.2515  353.6538  -82.9155988  -342.5729   12.96219  -60.94886   18.537087   7.291126  96.14917
2893 1921.761 1664.0084  631.0800  -55.6321469  -864.9628  -28.11045 -104.78931   37.797727 -12.078535 104.88374
552  2927.108  607.6489  799.9602  -79.5494412  -827.6994   14.14625  -50.12209  -14.020936  29.996639  86.72887
2874 2084.285 1636.7999  621.6383  -49.2934502  -577.4815  -67.27198  -11.06071   -7.167577  47.395309  51.02962
1396 1618.171  337.4320  488.2717 -100.1663625  -469.8857  212.37199   -1.19409   13.531485 -23.332701  64.58806
2866 2007.261 1387.6890  395.1586    0.8640971  -636.1243  133.41074   12.34794  -26.969634   5.506828  74.13767
449  2674.136  808.5174  289.3345  -67.8356695  -280.2689   10.60475  -49.86404   15.165731   5.965083  78.66244
2888 2254.171 1162.4988  749.7230 -206.0215007  -652.2364  302.36320   40.76341   -1.079259  17.635956  57.86999
2537 1747.098  371.8884  429.1309    9.3761544  -480.7130 -196.25019  -81.31580    2.819608  24.089379  56.91885
2873 2973.872  974.3854  433.7282 -197.0601947  -478.3647  301.96576  -81.81105   14.516646  -1.191972 100.79057
720  2537.535  504.4124  744.5909  -78.1162036  -771.1396   38.17725  -36.61446   -9.079443  25.488688  78.21597
218  2292.718  800.5257  260.6641  603.3295960  -641.9296  187.38913   11.71382   70.011487  78.047216  96.10967

What am I doing wrong?

1 Answer:

Answer 0: (score: 3)

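I think this is a PCA question rather than an R question. You multiply the original data by the rotation matrix and then wonder why newData != data. That equality would hold only if the rotation matrix were the identity matrix. The product gives the data expressed in principal-component coordinates, not the data itself.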
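To make the point above concrete, here is a minimal sketch using the built-in USArrests data set (an assumption for illustration; the asker's own `data` is not used). It checks that the loadings are orthonormal and that `fit$x` is the centered data times the rotation matrix, while the uncentered product differs from `fit$x` by a constant shift per column. The variable names `is_orthonormal`, `scores_match`, and `shift_only` are made up for this example.

```r
fit <- prcomp(USArrests)   # default: center = TRUE, scale. = FALSE
ev  <- fit$rotation        # PC loadings

# The loadings are orthonormal: t(ev) %*% ev is the identity matrix
is_orthonormal <- isTRUE(all.equal(crossprod(ev), diag(ncol(ev)),
                                   check.attributes = FALSE))

# Centered data times the rotation matrix gives the scores stored in fit$x
X_centered   <- scale(USArrests, center = TRUE, scale = FALSE)
scores       <- X_centered %*% ev
scores_match <- isTRUE(all.equal(scores, fit$x, check.attributes = FALSE))

# Uncentered data times the rotation matrix is the same scores,
# shifted by the rotated column means
raw        <- as.matrix(USArrests) %*% ev
shift      <- drop(fit$center %*% ev)
shift_only <- isTRUE(all.equal(sweep(raw, 2, shift, "-"), fit$x,
                               check.attributes = FALSE))
```

In other words, the newData in the question is a set of (uncentered) scores, which is why it looks nothing like the input.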

What you probably intended to do is the following:

# Run PCA (prcomp centers the columns by default):
fit <- prcomp(USArrests)
ev <- fit$rotation  # PC loadings

# Reversed PCA: rotate the scores back onto the original axes
head(fit$x %*% t(ev))

# Centered original data, for comparison:
head(scale(USArrests, center = TRUE, scale = FALSE))

In the last step you have to center the data, because prcomp centers the columns by default (center = TRUE).
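If the goal is to get back the original, uncentered values, one option is to add the column means stored in `fit$center` back after reversing the rotation. The sketch below again assumes the built-in USArrests data and the default `prcomp` settings (no scaling); `k = 2` is an arbitrary choice used only to show what happens when fewer components are kept, and the names `restored`, `approx_k`, and `max_abs_error` are hypothetical.

```r
fit <- prcomp(USArrests)   # default: center = TRUE, scale. = FALSE
ev  <- fit$rotation

# Full reconstruction: rotate back, then add the column means
restored <- sweep(fit$x %*% t(ev), 2, fit$center, "+")
full_ok  <- isTRUE(all.equal(restored, as.matrix(USArrests),
                             check.attributes = FALSE))

# Keeping only the first k components gives an approximation, not the original
k        <- 2
approx_k <- sweep(fit$x[, 1:k, drop = FALSE] %*% t(ev[, 1:k, drop = FALSE]),
                  2, fit$center, "+")
max_abs_error <- max(abs(approx_k - as.matrix(USArrests)))
```

With all components kept, `full_ok` should be TRUE up to floating-point tolerance. If `prcomp` were called with `scale. = TRUE`, the columns would also have to be multiplied by `fit$scale` before adding the means back.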