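I have a large dataset with 63,000 points and 120 dimensions. After normalizing it and removing the columns containing NAs, I ran PCA, which shows that the first 7 PCs account for nearly 98% of the variance in the data (97.6% cumulatively in the output below; the first 8 exceed 98%).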
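As in this post, I got the output below, but I don't know what to do with it. The post I linked doesn't really explain this; it proposes a different approach instead. How do I actually use the PCA results to reduce the dimensionality of my dataset? My goal is to take what remains after reducing the columns and use it to build a linear model with `lm()` in R that explains a dependent variable.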
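One common way to do this is principal component regression (PCR): instead of regressing the response on the original columns, regress it on the first k PC scores stored in `pca$x`. The sketch below illustrates this on simulated data. `X`, `y`, `k = 7`, and `scale. = TRUE` are assumptions standing in for the real dataset and whatever settings were actually used.

```r
# Minimal principal component regression sketch on simulated (hypothetical) data.
# X and y stand in for the real predictors and dependent variable.
set.seed(42)
n <- 500
X <- matrix(rnorm(n * 10), ncol = 10)
colnames(X) <- paste0("V", 1:10)
y <- X[, 1] - 2 * X[, 2] + rnorm(n)

# Run PCA on the predictors only; the response must not be included here.
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Keep the first k components (k = 7 mirrors the question; an assumption).
k <- 7
scores <- pca$x[, 1:k]               # n x k matrix with columns PC1..PC7

# Fit the linear model on the PC scores instead of the original columns.
pcr_data <- data.frame(y = y, scores)
fit <- lm(y ~ ., data = pcr_data)
summary(fit)

# New observations must be projected with the SAME centering, scaling, and rotation.
X_new <- matrix(rnorm(5 * 10), ncol = 10, dimnames = list(NULL, colnames(X)))
new_scores <- predict(pca, newdata = X_new)[, 1:k]
y_hat <- predict(fit, newdata = as.data.frame(new_scores))
```

Calling `predict()` on the `prcomp` object reuses the stored `center`, `scale`, and `rotation`, so new data land in the same PC space that the model was fit in. The trade-off is that the coefficients now describe components, not the original variables.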
Importance of components:
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
Standard deviation 0.1021 0.04005 0.03464 0.03114 0.02414 0.02047 0.01708 0.01425 0.01308 0.003287
Proportion of Variance 0.6567 0.10101 0.07555 0.06104 0.03668 0.02639 0.01838 0.01278 0.01078 0.000680
Cumulative Proportion 0.6567 0.75773 0.83328 0.89432 0.93100 0.95738 0.97576 0.98854 0.99932 1.000000
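To choose k from a variance threshold instead of by eye, you can take the cumulative share of variance and find the first component that reaches the threshold. The helper `n_components_for` is a hypothetical name used only for this sketch. It works from `pca$sdev` rather than the rounded summary table. Applied to the "Cumulative Proportion" row above, a 98% threshold selects 8 components, not 7.

```r
# Hypothetical helper: smallest number of PCs whose cumulative share of
# variance reaches `threshold`. Uses sdev, not the rounded summary table.
n_components_for <- function(pca, threshold = 0.98) {
  var_share <- pca$sdev^2 / sum(pca$sdev^2)
  which(cumsum(var_share) >= threshold)[1]
}

# Example on simulated (hypothetical) data.
set.seed(1)
X <- matrix(rnorm(300 * 6), ncol = 6)
pca <- prcomp(X, center = TRUE, scale. = TRUE)
n_components_for(pca, 0.98)

# The same rule applied to the "Cumulative Proportion" row printed in the question.
cum_var <- c(0.6567, 0.75773, 0.83328, 0.89432, 0.93100,
             0.95738, 0.97576, 0.98854, 0.99932, 1.000000)
k98 <- which(cum_var >= 0.98)[1]   # 8
k95 <- which(cum_var >= 0.95)[1]   # 6
```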
Rotation:
PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 PC9 PC10
[1,] -0.219033940 0.009323363 0.14371969 0.06987706 0.19302513 -0.02648874 0.16654618 -0.06567080 -0.925393447 0.005948459
[2,] -0.007661133 -0.027804546 -0.24045564 0.13997803 0.00461297 -0.13195868 0.13625008 0.05140013 -0.005668700 -0.939724900
[3,] -0.053184446 -0.212036806 -0.26744318 0.36220366 -0.53094911 0.24356319 -0.04692857 -0.62944042 -0.084900337 0.051564259
[4,] -0.188804651 0.062154139 -0.08807850 0.18886008 0.19969440 -0.59987987 -0.68882923 -0.20548388 -0.004509710 0.024501524
[5,] -0.299789863 0.080676352 -0.62720621 -0.23335343 0.37274825 0.50767975 -0.23796461 0.03549668 -0.025233090 0.023917725
[6,] -0.013478134 -0.052386807 -0.58015768 0.34394876 -0.01276741 -0.38994226 0.42009710 0.31887185 0.002157408 0.334375266
[7,] -0.380565266 0.227200067 0.23992808 0.40306010 0.46135693 0.09059073 0.35930614 -0.34019038 0.342613874 0.015991214
[8,] -0.432463682 0.037822199 0.20765408 0.45337044 -0.30497494 0.26299209 -0.26947304 0.57196490 0.008807625 -0.029461460
[9,] -0.654931547 0.158646794 -0.01629962 -0.51083458 -0.39357245 -0.27198634 0.20326283 -0.08572653 0.083798804 -0.010738521
[10,] -0.250287731 -0.928894500 0.10639604 -0.08339656 0.20266163 -0.03955488 0.02948133
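The "Rotation" matrix holds the loadings. Column j gives the weight of each original variable in PC j, and the scores in `pca$x` are the centered and scaled data multiplied by this matrix. The sketch below checks that identity on simulated data (the column names `a` to `d` are hypothetical). It also shows how to list which original variables dominate a component. This is useful if you want to drop original columns rather than regress on components.

```r
# Simulated (hypothetical) data with named columns.
set.seed(7)
X <- matrix(rnorm(200 * 4), ncol = 4, dimnames = list(NULL, c("a", "b", "c", "d")))
pca <- prcomp(X, center = TRUE, scale. = TRUE)

# Scores = (centered/scaled data) %*% rotation, up to floating-point error.
manual_scores <- scale(X, center = pca$center, scale = pca$scale) %*% pca$rotation
all.equal(unname(manual_scores), unname(pca$x))

# Original variables ordered by absolute loading on PC1.
sort(abs(pca$rotation[, "PC1"]), decreasing = TRUE)
```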