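I have principal components stored in a data frame named pca. I want to find the centroid using all components (the center of the multidimensional space) and then find the distance of each Sample from that center. How can I do this in R?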
pca<-structure(list(Sample = c("1", "2", "4", "5", "6"), PCA.1 = c(0.00338,
-0.020373, -0.019842, -0.019161, -0.019594), PCA.2 = c(0.00047,
-0.010116, -0.011532, -0.011582, -0.013245), PCA.3 = c(-0.008787,
0.001412, 0.003751, 0.00371, 0.004242), PCA.4 = c(0.011242, 0.000882,
-0.003662, -0.002206, -0.002449), PCA.5 = c(0.055873, -0.022664,
-0.014058, -0.024757, -0.020033), PCA.6 = c(-0.001511, 0.006226,
-0.005417, 0.000522, -0.003114), PCA.7 = c(-0.056734, -0.007418,
-0.01043, -0.006961, -0.006006), PCA.8 = c(0.005189, 0.008031,
-0.002979, 0.000743, 0.006276), PCA.9 = c(0.008169, -0.000265,
0.010893, 0.003233, 0.007316)), .Names = c("Sample", "PCA.1",
"PCA.2", "PCA.3", "PCA.4", "PCA.5", "PCA.6", "PCA.7", "PCA.8",
"PCA.9"), row.names = c(NA, 5L), class = "data.frame")
Answer 0 (score: 2)
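Assuming you are looking for the Euclidean distance, take the mean of each variable (each PCA column); together those means are the centroid. The distance between any point and the centroid is then the square root of the sum of squared differences over the n dimensions (I hope I got this right; see the formula in the link I provided above).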
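Written out, the description above could be expressed roughly as follows. The symbols are notation introduced here for illustration and are not from the original post: x_{ij} is the value of component j for sample i, m is the number of samples, n is the number of components, c_j is the j-th coordinate of the centroid, and d_i is the distance of sample i from it.

```latex
c_j = \frac{1}{m} \sum_{i=1}^{m} x_{ij},
\qquad
d_i = \sqrt{\sum_{j=1}^{n} \left( x_{ij} - c_j \right)^2}
```

For the pca data in the question, m = 5 and n = 9.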
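pt <- as.matrix(pca[, -1])    # drop the Sample column, keep PCA.1 to PCA.9
centroid <- colMeans(pt)      # mean of each component = centroid
# Subtract the centroid from every row. A plain `pt - centroid` recycles the
# vector down the rows instead of across the columns, so use sweep() on margin 2.
sqrt(rowSums(sweep(pt, 2, centroid)^2))
This returns a numeric vector with one distance per sample, named by row (1 to 5).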