我有一个如下所示的数据集:
India China Brasil Russia SAfrica Kenya States Indonesia States Argentina Chile Netherlands HongKong
0.0854026763 0.1389383234 0.1244184371 0.0525460881 0.2945586244 0.0404562539 0.0491597968 0 0 0.0618342901 0.0174891774 0.0634064181 0
0.0519483159 0.0573851759 0.0756806292 0.0207164181 0.0409872092 0.0706355932 0.0664503936 0.0775285039 0.008545575 0.0365674701 0.026595575 0.064280902 0.0338135148
0 0 0 0 0 0 0 0 0 0 0 0 0
0.0943708876 0 0 0.0967733329 0 0.0745076688 0 0 0 0.0427047276 0 0.0583873189 0
0.0149521013 0.0067569437 0.0108914448 0.0229991162 0.0151678343 0.0413174214 0 0.0240999375 0 0.0608951432 0.0076549109 0 0.0291972756
0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0.0096710124 0.0095669967 0 0.0678582869 0 0 0.0170707337 0.0096565543 0.0116698364 0.0122773071
0.1002690681 0.0934563916 0.0821680095 0.1349534369 0.1017157777 0.1113249348 0.1713480649 0.0538715423 0.4731833978 0.1956743964 0.6865919069 0.2869189344 0.5364034876
1.5458338337 0.2675380321 0.6229046372 0.5059107039 0.934209603 0.4933799388 0.4259769181 0.3534169521 14.4134845836 4.8817632117 13.4034293299 3.7849346739 12.138551171
0.4625375671 0.320258205 0.4216459567 0.4992764309 0.4115887595 0.4783677078 0.4982410179 0.2790259278 0.3804405781 0.2594924212 0.4542162376 0.3012339384 0.3450847892
0.357614592 0.3932670219 0.3803417257 0.4615355254 0.3807061655 0.4122433346 0.4422282977 0.3053712842 0.297943232 0.2658160167 0.3244018409 0.2523836582 0.3106600754
0.359953567 0.3958391813 0.3828293473 0.4631507073 0.3831961707 0.4138590365 0.4451206879 0.3073685624 0.2046559772 0.2403036541 0.2326305393 0.2269373716 0.2342962436
0.7887404662 0.6545878236 0.7443676393 0.7681244767 0.5938002158 0.5052305973 0.4354571648 0.40511005 0.8372481106 0.5971130339 0.8025313223 0.5708610817 0.8556609579
0.5574207497 1.2175251783 0.8797484259 0.952685465 0.4476585005 1.1919229479 1.03612509 0.5490564488 0.2407034171 0.5675492645 0.4994121344 0.5460544861 0.3779468604
0.5632651223 1.0181714714 1.1253803155 1.228293512 0.6949993291 1.0346288085 0.5955221073 0.5212567091 1.1674901423 1.2442735568 1.207624867 1.3854352274 0.7557131826
0.6914760031 0.7831502333 1.0282730148 0.750270567 0.7072739935 0.8041764647 0.8918512571 0.6998554585 2.3448306081 1.2905783367 2.4295927684 1.3029766224 1.9310763864
0.3459898177 0.7474525109 0.7253451876 0.7182493014 0.3081791886 0.7462088907 0.5950509439 0.4443221541 3.6106852374 2.7647504885 3.3698608994 2.6523062395 1.8016571476
0.4629523517 0.6549211677 0.6158018856 0.7637088814 0.4951554309 0.6277236471 0.6227669055 0.383909839 2.9502307101 1.803480973 2.3083113522 1.668759497 1.7130459012
0.301548861 0.5961888126 0.4027007075 0.5540290853 0.4078662541 0.5108773106 0.4610682726 0.3712800134 0.3813402422 0.7391417247 1.0935364978 0.691857974 0.4416304953
2.5038287529 3.2005148394 2.9181517373 3.557918333 1.8868234768 2.9369926312 0.4117894127 0.3074815035 3.9187777037 7.3161555954 6.9586996112 5.7096144353 2.7007439732
2.5079707359 3.2058093222 2.9229791182 3.563804054 1.8899447728 2.9418511798 0.4124706194 0.269491388 3.9252603798 7.3282584169 6.9702111077 5.7190596205 2.7052117051
2.6643724791 1.2405320493 2.0584120188 2.2354369334 1.7199730388 2.039829709 1.7428132997 0.9977029725 8.9650886611 4.6035139163 8.1430131464 5.2450639988 6.963309864
0.5270581435 0.8222128903 0.7713479951 0.8785815313 0.624993821 0.7410405193 0.5350834321 0.4797121891 1.3753525725 1.2219267886 1.397221881 1.2433155977 0.8647136903
0.2536079475 0.5195514789 0.0492623195 0.416102668 0.2572670724 0.4805482899 0.4866090738 0.4905212099 0.2002506403 0.5508609827 0.3808572148 0.6276294938 0.3191452919
0.3499009885 0.5837491529 0.4914807442 0.5851537888 0.3638549977 0.537655052 0.5757185943 0.4730102035 0.9098072064 0.6197285737 0.7781825654 0.6424684366 0.6424429128
0.6093076876 0.9456457011 0.8518013605 1.1360347777 0.511960743 0.9038104168 0.5048413575 0.2777622235 0.2915840525 0.6628516415 0.4600364351 0.7996524113 0.3765721177
0.9119207879 1.2363073271 1.3285269752 1.4027039939 0.9250782309 2.1599381031 1.312307839 0 0 0.8253250513 0 0 0.8903632354
它存储在data.txt
文件中。
我想要一个如下所示的PCA多重图:
我在做什么:
d <- read.table("data.txt", header=TRUE, as.is=TRUE)
model <- prcomp(d, scale=TRUE)
在此之后我迷路了。
如何根据PCA投影聚类数据集并获得与上述类似的图片?
答案 0 :(得分:12)
您实际上是在问两个不同的问题:
然而,在进入那些之前我想补充一点,如果您的样本在列中,那么您没有正确地进行PCA。您应该在转置数据集上执行此操作,如下所示:
model <- prcomp(t(d), scale=TRUE)
但要实现这一点,您必须删除数据中的所有常量行。
现在我假设您按照自己想要的方式完成了PCA步骤。
当你指定retX = TRUE时, prcomp返回旋转的矩阵(默认情况下它是真的)。因此,您需要使用model$x
。
您的下一步是根据主要组件对数据进行聚类。这可以通过各种方式完成。一个是层次聚类。如果你想要最后5组,这是一种方式:
fit <- hclust(dist(model$x[,1:3]), method="complete") # 1:3 -> based on 3 components
groups <- cutree(fit, k=5) # k=5 -> 5 groups
此步骤将为您提供稍后用于着色的组。
最后一步是密谋。在这里,我写了一个简单的函数,一次性完成所有操作:
library(rgl)
plotPCA <- function(x, nGroup) {
n <- ncol(x)
if(!(n %in% c(2,3))) { # check if 2d or 3d
stop("x must have either 2 or 3 columns")
}
fit <- hclust(dist(x), method="complete") # cluster
groups <- cutree(fit, k=nGroup)
if(n == 3) { # 3d plot
plot3d(x, col=groups, type="s", size=1, axes=F)
axes3d(edges=c("x--", "y--", "z"), lwd=3, axes.len=2, labels=FALSE)
grid3d("x")
grid3d("y")
grid3d("z")
} else { # 2d plot
maxes <- apply(abs(x), 2, max)
rangeX <- c(-maxes[1], maxes[1])
rangeY <- c(-maxes[2], maxes[2])
plot(x, col=groups, pch=19, xlab=colnames(x)[1], ylab=colnames(x)[2], xlim=rangeX, ylim=rangeY)
lines(c(0,0), rangeX*2)
lines(rangeY*2, c(0,0))
}
}
这个函数很简单:它有两个参数:1)得分矩阵,主要成分在列中,样本在行中。如果你想要(例如)第一,第二和第四个组件,你基本上可以使用模型$ x [,c(1,2,4)]。 2)用于聚类的组数。
然后根据传递的主成分和图(2D或3D,取决于传递的列数)对数据进行聚类
以下是一些例子:
plotPCA(model$x[,1:2], 5)
3D示例(基于3个第一主成分):
plotPCA(model$x[,1:3], 5)
最后一个图将是交互式的,因此您可以将其旋转或放大/缩小。
希望这有帮助。