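Discriminant analysis of principal components (DAPC) and how to graphically show the distance of data points to their multivariate centroids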

Asked: 2015-09-21 21:34:59

Tags: r graphics cluster-analysis pca lda

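I have been trying to produce a scatterplot (similar to Figure 1) that shows the distance of data points to their multivariate centroids. The data contain two categorical grouping factors (V4 or G8) in the column Family (the response variable) and 12 predictor variables. The data are called LDA.scores and can be found at the bottom of the page. After splitting the two categorical factors into two separate data frames (code below Figure 1), I used the package adegenet to try to produce two scatterplots similar to Figure 1, one per categorical factor, showing the actual number of clusters in the data. I understand that this package is meant for analysing genetic markers, but I think these scatterplots can be used for any kind of multivariate data. I have tried manipulating the data, to no avail. If anyone has a solution for producing two figures, one per categorical factor, that show the 12 clusters (the 12 parameters measured) relative to their multivariate centroids, I would be very grateful. I have followed the tutorial, and I do not understand the errors and warning messages. Changing column [,1] to numeric, as the manual specifies, makes no difference. All code and data are below Figure 1.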
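As a point of reference for the goal described above, the distance of each point to its own group centroid can be computed and drawn in base R without any genetics package. The sketch below is only an illustration under stated assumptions: `toy`, its columns `a`, `b`, `c`, and the choice of Euclidean distance in the space of the first two principal components are hypothetical stand-ins, not the asker's data or method.

```r
# Toy data (assumption): two groups, three numeric variables
set.seed(1)
toy <- data.frame(
  Family = factor(rep(c("G8", "v4"), each = 10)),
  a = rnorm(20), b = rnorm(20), c = rnorm(20)
)
vars <- c("a", "b", "c")

# Project onto the first two principal components for plotting
pcs <- prcomp(toy[vars], center = TRUE, scale. = TRUE)$x[, 1:2]

# Group centroids in PC space: one row per level of Family
centroids <- apply(pcs, 2, function(col) tapply(col, toy$Family, mean))

# Euclidean distance from each point to its own group's centroid
own <- centroids[as.character(toy$Family), , drop = FALSE]
dist_to_centroid <- sqrt(rowSums((pcs - own)^2))

# Scatterplot with a segment joining each point to its centroid
plot(pcs, col = as.integer(toy$Family), pch = 19)
segments(pcs[, 1], pcs[, 2], own[, 1], own[, 2],
         col = as.integer(toy$Family))
points(centroids, pch = 4, cex = 2, lwd = 2)
```

Replacing `toy[vars]` with the 12 predictor columns and `toy$Family` with the grouping factor (or with cluster labels) should give the same kind of star-shaped plot for real data.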

Figure 1

Code used to produce the scatterplot after DAPC
# An attempt to create a scatterplot for the categorical factor V4

# Split the data frame into two separate data frames
# (the factor level is spelled "v4" in the data at the bottom of the page)

Just.V4 <- LDA.scores[LDA.scores$Family == "v4", ]
Just.G8 <- LDA.scores[LDA.scores$Family == "G8", ]

library(adegenet)
x <- LDA.scores[2:13]
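Comparing a factor with `==` is case-sensitive, so a subset can silently come back empty when the label is typed with different capitalisation than the stored level. A small check along these lines may help; `fam` is a hypothetical toy factor that only mimics the two labels in the Family column.

```r
# Toy factor (assumption) with the same two labels as the Family column
fam <- factor(c("v4", "v4", "G8", "G8", "G8"))

levels(fam)   # the exact spellings available
table(fam)    # how many rows carry each label

n_upper <- sum(fam == "V4")   # wrong case: matches nothing
n_lower <- sum(fam == "v4")   # exact label: matches two rows

# split() builds one subset per level without retyping any label
parts <- split(seq_along(fam), fam)
```

For a data frame, `split(LDA.scores, LDA.scores$Family)` would return a named list with one data frame per level, which avoids spelling the labels by hand.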

Find the number of clusters

grp<-find.clusters(x, max.n.clust=12, na.action="omit")

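At this point the output is a BIC plot that asks how many principal components (PCs) to retain, judged from the shape of a positive hockey-stick curve of the eigenvalues.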

I chose to retain 2 PCs, because that is where the curve is still straight before the elbow (Figure 2).
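The curve used for this choice can be reproduced outside adegenet from the eigenvalues of a PCA. The following is a minimal sketch, assuming a hypothetical toy matrix `toy` in place of the 12 predictors; it uses base R's `prcomp()` rather than whatever adegenet runs internally.

```r
set.seed(42)
# Toy stand-in (assumption) for the predictor columns
toy <- as.data.frame(matrix(rnorm(80 * 6), ncol = 6))

pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components
eig <- pca$sdev^2
cum_var <- 100 * cumsum(eig) / sum(eig)

plot(cum_var, type = "h", lwd = 2, ylim = c(0, 100),
     xlab = "PCA axis", ylab = "Cumulated variance (%)")
```

Reading the retained share directly from `cum_var` (for example `cum_var[2]` for two PCs) makes the choice easier to justify than judging the elbow by eye.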

Figure 2

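The next step is to choose the actual number of clusters in the data set, based on where the negative hockey-stick curve reaches its elbow (see Figure 3); this appears to be 3 clusters.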
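The idea behind this elbow can be sketched with k-means in base R: run the clustering for a range of k and track a BIC-style criterion that rewards a lower within-cluster sum of squares and penalises extra clusters. The formula below is an assumption about the general form of such a criterion, not a verified copy of adegenet's computation, and `toy` is hypothetical data with three well-separated groups.

```r
set.seed(123)
# Toy data (assumption): three well-separated groups in two dimensions
toy <- rbind(
  matrix(rnorm(60, mean = 0), ncol = 2),
  matrix(rnorm(60, mean = 6), ncol = 2),
  cbind(rnorm(30, mean = 0), rnorm(30, mean = 6))
)
n <- nrow(toy)
max_k <- 8

# Total within-cluster sum of squares for k = 1..max_k
wss <- sapply(seq_len(max_k), function(k) {
  kmeans(toy, centers = k, nstart = 10)$tot.withinss
})

# BIC-style criterion: fit term plus a penalty that grows with k
bic <- n * log(wss / n) + log(n) * seq_len(max_k)

plot(seq_len(max_k), bic, type = "b",
     xlab = "Number of clusters", ylab = "BIC")
```

The k at which the curve stops dropping sharply plays the role of the elbow described above.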

Figure 3

The next step is to run the discriminant analysis of principal components

dapc1<-dapc(x, grp$grp)
scatter(dapc1)

I tried many different combinations; here are some of the error messages

Error in dapc.data.frame(x, grp1$grp1) : Inconsistent length for grp
Warning in find.clusters.data.frame(as.data.frame(x), ...) :
NAs introduced by coercion
Error in if (n.pca >= N) warning("number of retained PCs of PCA is greater than N") : 
missing value where TRUE/FALSE needed
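One plausible reading of the first message is that `grp1$grp1` names a component that does not exist, so the grouping passed on has length zero; the coercion warning typically appears when a text or factor column is forced to numeric. The sketch below only demonstrates those two base R behaviours on hypothetical stand-ins (`res`, `df`); it does not call adegenet.

```r
# Stand-in (assumption) for the list that find.clusters() returns
res <- list(grp = factor(c(1, 2, 2, 3)))

length(res$grp)    # the real component: one label per row
length(res$grp1)   # a misspelled component is NULL, so its length is 0

# Text forced to numeric becomes NA ("NAs introduced by coercion")
coerced <- suppressWarnings(as.numeric(c("1.5", "v4")))

# Keep only the numeric columns before handing data to the analysis
df <- data.frame(Family = c("v4", "G8"), Swimming = c(-0.48, 0.13))
is_num <- vapply(df, is.numeric, logical(1))
x_ok <- df[, is_num, drop = FALSE]
```

Running `str()` on the object passed to the clustering call is a quick way to confirm that no grouping column slipped in.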

Solution

set.seed(1234)
windows(width=10, height=7)
x<-LDA.scores[,2:13]
grp1<-find.clusters(x, max.n.clust=12)
dapc1<-dapc(x, grp1$grp)
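Both calls above stop and wait for keyboard input. For a reproducible script, the answers can be supplied up front through the `n.pca`, `n.clust`, and `n.da` arguments. The sketch below assumes adegenet is installed and skips itself otherwise; `toy` is a hypothetical random data frame, so the clusters it finds carry no meaning.

```r
if (requireNamespace("adegenet", quietly = TRUE)) {
  library(adegenet)
  set.seed(1234)

  # Toy predictors (assumption): 80 rows, 6 numeric columns
  toy <- as.data.frame(matrix(rnorm(80 * 6), ncol = 6))

  # n.pca and n.clust answer the two prompts of find.clusters() in advance
  grp1 <- find.clusters(toy, n.pca = 2, n.clust = 3)

  # n.pca and n.da do the same for dapc()
  dapc1 <- dapc(toy, grp1$grp, n.pca = 2, n.da = 1)

  table(grp1$grp)
}
```

The values 2, 3, and 1 mirror the choices described in this post; they would need to be revisited for any other data set.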

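Once the code ran, the next step was to choose how many PCs to retain from the variance explained by the PCA. I chose 2 PCs, which capture most of the variation in the data before the elbow of the curve.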

Figure 4

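Finally, the last prompt asks how many linear discriminants to retain. I chose 1, because most of the variation in the data is explained by the first discriminant.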
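The share of between-group variation carried by each discriminant can be checked numerically rather than read off the bar plot. The sketch below uses `MASS::lda()` on the built-in `iris` data as a stand-in (an assumption: three known groups, hence two discriminant functions); it illustrates the quantity, not the exact numbers adegenet would report for the data in this question.

```r
library(MASS)  # recommended package, normally installed with R

# iris (built-in) stands in for the clustered data: 3 groups, 2 discriminants
fit <- lda(Species ~ ., data = iris)

# Share of between-group variance carried by each discriminant function
prop <- fit$svd^2 / sum(fit$svd^2)
round(100 * prop, 1)
```

A first value close to 100 would support retaining a single discriminant, at the cost of a one-dimensional plot.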

Figure 5

myCol <- c("red","purple","darkgreen")
scatter(dapc1, 
posi.da="bottomleft", 
bg="white", 
pch=17:19, 
col=myCol,
inset.solid=0.5,
lwd=9,
lty=3,
cex.lab=2,
txt.leg=paste("Cluster", 1:3),
legend=TRUE)

myInset <- function(){
  temp <- dapc1$pca.eig
  temp <- 100 * cumsum(temp)/sum(temp)
  plot(temp, col=rep(c("black","lightgrey"),
       c(dapc1$n.pca,1000)), ylim=c(0,100),
       xlab="PCA axis", ylab="Cumulated variance (%)",
       cex=1, pch=20, type="h", lwd=2)
}

add.scatter(myInset(), posi="bottomright",
            inset=c(-0.03,-0.01), ratio=.28,
            bg=transp("white"))

Figure 6

Density plot

scatter(dapc1,1,1, col=myCol, bg="white",
        scree.da=FALSE, legend=TRUE, solid=.4)


Figure 7

The data, called LDA.scores

LDA.scores <- structure(list(Family = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("G8", "v4"), class =    "factor"), 
Swimming = c(-0.4805568, 0.12600625, 0.06823834, 0.67480139, 
0.64591744, 0.21265812, -0.01841352, 0.12600625, -0.2206012, 
0.27042603, 0.03935439, -0.45167284, -0.04729748, -0.10506539, 
0.0971223, -0.07618143, 0.29930998, 0.01047043, -0.24948516, 
-0.04729748, -0.01841352, -0.19171725, -0.4805568, 0.01047043, 
-0.42278889, -0.45167284, -0.30725307, 0.24154207, 1.45466817, 
-0.01841352, 0.38596185, 0.15489021, -0.04729748, 0.27042603, 
-0.07618143, -0.10506539, -0.01841352, 0.01047043, 0.06823834, 
-0.16283329, -0.01841352, -0.39390493, -0.04729748, 0.01047043, 
0.01047043, 0.06823834, -0.04729748, -0.2206012, -0.16283329, 
-0.07618143, -0.2206012, -0.19171725, -0.16283329, -0.2206012, 
-0.13394934, -0.27836911, -0.04729748, 0.01047043, 0.12600625, 
0.06823834, 0.06823834, 0.32819394, 0.32819394, -0.27836911, 
0.18377416, 0.55926557, -0.19171725, -0.19171725, 0.01047043, 
-0.19171725, -0.01841352, -0.07618143, -0.13394934, -0.39390493, 
-0.04729748, -0.27836911, 0.70368535, 0.29930998, -0.13394934, 
0.21265812), Not.Swimming = c(-0.0862927, -0.074481895, -0.056765686, 
-0.050860283, -0.050860283, -0.068576492, -0.068576492, 0.05543697, 
0.114491, -0.021333268, -0.04495488, 0.008193747, -0.056765686, 
0.008193747, 0.037720761, 0.01409915, 0.108585597, -0.074481895, 
0.002288344, 0.049531567, 0.043626164, 0.049531567, 0.020004552, 
0.008193747, 0.025909955, 0.031815358, 0.049531567, -0.039049477, 
-0.003617059, 0.002288344, 0.084963985, -0.080387298, 0.067247776, 
0.031815358, 0.037720761, 0.025909955, 0.126301805, 0.031815358, 
0.037720761, -0.050860283, -0.039049477, -0.003617059, 0.008193747, 
-0.039049477, -0.003617059, 0.008193747, 0.01409915, -0.015427865, 
0.020004552, 0.031815358, 0.020004552, -0.033144074, -0.039049477, 
-0.009522462, -0.003617059, -0.04495488, -0.050860283, -0.04495488, 
-0.068576492, -0.033144074, -0.027238671, -0.068576492, 0.01409915, 
0.002288344, 0.025909955, -0.009522462, -0.009522462, 0.025909955, 
0.15582882, 0.002288344, -0.04495488, -0.015427865, 0.008193747, 
0.037720761, 0.008193747, -0.015427865, -0.056765686, 0.079058582, 
-0.056765686, 0.025909955), Running = c(-0.157157188, 0.057316151, 
0.064711783, 0.153459372, 0.072107416, 0.057316151, -0.053618335, 
0.012942357, -0.03882707, 0.049920519, 0.012942357, -0.075805232, 
0.035129254, -0.046222702, 0.109085578, -0.03882707, 0.057316151, 
0.020337989, 0.035129254, 0.057316151, 0.005546724, -0.016640173, 
-0.142365923, 0.220020063, -0.149761556, -0.134970291, 0.042524886, 
0.072107416, 0.064711783, 0.020337989, 0.049920519, 0.020337989, 
0.138668107, 0.049920519, 0.020337989, -0.083200864, -0.024035805, 
-0.016640173, -0.03882707, -0.03882707, 0.005546724, -0.090596497, 
-0.00924454, -0.016640173, -0.075805232, -0.090596497, 0.012942357, 
-0.075805232, -0.061013967, -0.03882707, -0.112783394, -0.068409599, 
-0.090596497, -0.053618335, -0.075805232, -0.090596497, 0.064711783, 
0.012942357, 0.042524886, -0.061013967, -0.061013967, 0.064711783, 
0.175646269, -0.068409599, 0.027733621, 0.042524886, -0.03882707, 
-0.00924454, 0.027733621, -0.031431438, -0.046222702, -0.031431438, 
-0.068409599, -0.120179026, 0.035129254, -0.061013967, 0.39751524, 
0.138668107, 0.020337989, 0.035129254), Not.Running = c(-0.438809944, 
-0.539013927, -0.539013927, -0.539013927, -0.472211271, -0.071395338, 
-0.071395338, 0.296019267, 0.563229889, -0.03799401, 0.195815284, 
-0.171599321, -0.305204632, 0.062209973, -0.104796666, 0.095611301, 
    0.028808645, -0.071395338, 0.329420595, 0.296019267, -0.171599321, 
    -0.071395338, 0.596631217, 0.062209973, 0.028808645, -0.138197994, 
    0.095611301, -0.104796666, 0.296019267, 0.028808645, -0.03799401, 
    -0.33860596, 0.129012629, 0.195815284, -0.03799401, 0.396223251, 
    0.362821923, -0.138197994, 0.26261794, -0.405408616, -0.205000649, 
    0.129012629, 0.195815284, -0.205000649, -0.004592683, -0.205000649, 
    -0.071395338, -0.171599321, -0.104796666, -0.138197994, -0.104796666, 
    -0.071395338, -0.104796666, -0.03799401, -0.004592683, -0.238401977, 
    0.028808645, -0.305204632, -0.305204632, -0.271803305, -0.03799401, 
    -0.372007288, 0.095611301, 0.195815284, 0.162413956, 0.229216612, 
    0.229216612, 0.396223251, 0.630032545, 0.463025906, 0.496427234, 
    0.062209973, -0.071395338, 0.229216612, -0.071395338, -0.071395338, 
    -0.205000649, 0.229216612, -0.305204632, 0.396223251), Fighting = c(-0.67708172, 
    -0.58224128, -0.11436177, -0.34830152, -0.84568695, -0.32933343, 
    0.35984044, -0.3251183, 1.51478626, 0.11114773, 0.27975296, 
    -0.89626852, 0.12379312, 0.66965255, 1.56536783, 0.56427428, 
    -0.71291033, -0.75927677, -0.75295407, -1.00164679, -1.03958296, 
    0.82139726, -1.07541157, -1.0311527, -0.98900139, -1.06908888, 
    -1.20186549, 0.58324237, -0.9700333, 0.22917139, 0.41042201, 
    -1.11545531, -0.19023412, 0.25446217, -0.05324237, 0.09007207, 
    1.21129685, 0.62539368, 1.32932051, 0.40199175, 0.44625062, 
    0.60221046, 0.33665722, -0.63493041, -0.282967, -0.32722587, 
    -0.11646933, -0.10171637, 0.13643851, -0.57802615, 0.05002833, 
    -0.1607282, -0.29139726, 0.13222338, -0.41152848, 0.68229794, 
    -0.24292325, -0.11646933, -0.21341734, -0.24292325, -0.24292325, 
    0.09007207, -0.34197883, -0.30825778, -0.08696342, -0.8119659, 
    0.49683219, -0.13754498, -0.4831857, 0.39988418, 0.90148474, 
    0.28396809, 1.05322945, 1.24923303, 0.47154141, 1.27873894, 
    0.05002833, 1.54218461, 0.74763247, 0.11747042), Not.Fighting = c(-0.097624192, 
    -0.160103675, -0.092996082, -0.234153433, -0.136963126, -0.15778962, 
    -0.15778962, -0.023574435, 0.00188017, -0.224897213, -0.109194467, 
    -0.069855533, -0.123078796, -0.111508522, -0.143905291, -0.099938247, 
    -0.118450687, 1.519900201, 0.177748344, 0.108326696, 0.652129604, 
    0.638245274, -0.072169588, 0.087500202, -0.18093017, -0.146219346, 
    -0.049029039, -0.125392851, -0.134649071, -0.060599313, -0.086053918, 
    -0.197128554, -0.083739863, -0.092996082, 0.844196163, 0.055103433, 
    1.971140911, -0.111508522, -0.224897213, -0.187872334, -0.160103675, 
    -0.194814499, -0.053657149, -0.206384774, 0.108326696, -0.164731785, 
    0.187004564, 0.025020719, 0.057417488, 0.434608441, 0.057417488, 
    0.073615872, -0.035144709, -0.051343094, -0.134649071, -0.185558279, 
    0.013450444, -0.134649071, -0.215640993, -0.185558279, -0.005061995, 
    -0.238781543, -0.099938247, -0.16704584, -0.208698829, 0.048161268, 
    0.048161268, -0.037458764, 0.16154996, 0.031962884, -0.102252302, 
    -0.123078796, -0.139277181, -0.208698829, -0.118450687, -0.072169588, 
    -0.044400929, -0.030516599, -0.132335016, -0.037458764), 
    Resting = c(0.01081204879, -0.03398160805, 0.057108797, -0.04063432116, 
    -0.13084281035, -0.02997847693, 0.12732080268, -0.1028170581, 
    0.08155320398, -0.17932134171, -0.14338902206, -0.02058415581, 
    -0.11528274705, -0.11764091337, 0.04389156236, 0.01399844913, 
    -0.05755560242, 0.04711630687, 0.0158428036, 0.093485909, 
    0.09677967302, 0.02053612974, -0.03608286844, 0.07805238146, 
    -9.686695e-05, -0.02285413055, -0.00424187149, 0.01446241356, 
    0.03187450017, 0.11323315542, -0.01171898422, -0.06499053655, 
    -0.07758659568, -0.07399758157, -0.11503350996, 0.02167111711, 
    0.01904454162, 0.05768779393, 0.05555202379, -0.01031175326, 
    -0.00458313459, 0.17430774591, 0.00481502094, -0.00928412956, 
    0.09047589183, 0.08917985896, -0.05671203072, -0.05333390954, 
    0.08541446168, 0.10140397965, -0.02509342995, -0.0369877908, 
    0.04609635201, 0.06524159499, 0.0845977309, -0.03239032508, 
    -0.03208740616, 0.06264952925, 0.05241547086, -0.03437271856, 
    -0.03437271856, -0.06747523863, -0.01270059491, 0.10014629095, 
    -0.02872845706, -0.00950652573, 0.04867308008, 0.02486518629, 
    -0.05951115497, -0.02353665674, -0.01967923345, -0.10148651548, 
    -0.00480936518, -0.00098261723, -0.13970798195, -0.00286148145, 
    -0.05492902692, 0.10732815358, 0.11660744219, -0.02016620439
    ), Not.Resting = c(-0.77046287, 0.773856776, -2.593072768, 
    -2.837675606, -1.680828329, -0.947623773, -0.947623773, -2.607366431, 
    -0.637055341, -1.818396455, 2.170944974, -0.658126752, -0.808243774, 
    2.377766908, 2.111220276, -0.322326312, 2.218858946, 3.920878638, 
    -0.304945754, 1.038591535, 1.752268128, 0.907465624, 1.137774798, 
    -3.663486997, 2.350924346, 0.067293462, -1.898454393, -2.497647463, 
    -4.471716512, -1.465081244, -0.232806371, -3.043893581, -2.323908986, 
    1.437404886, 1.079056696, 1.110865131, 1.404724068, -1.706664294, 
    0.736746935, -0.005516985, 1.727170333, 1.685228831, 1.836016918, 
    0.46617392, 1.697173771, 1.057314221, 0.933704227, 0.482480775, 
    0.680713089, 0.090780703, 0.680713089, -0.982921741, -2.281900378, 
    0.97208909, 0.027767791, -0.1628815, -0.530221948, -0.385741863, 
    -0.972251823, 0.002267358, -1.134447998, 0.626424009, -0.722750217, 
    -0.382722075, -0.356550578, -1.851614124, -1.851614124, 1.731465143, 
    0.254319006, 2.043778341, -0.28991392, 1.386940871, 0.054207713, 
    0.594212936, 1.551821303, 3.100704184, 0.327263666, -1.055195336, 
    -1.134447998, 1.730726972), Hunting = c(-0.67708172, -0.58224128, 
    -0.11436177, -0.34830152, -0.84568695, -0.32933343, 0.35984044, 
    -0.3251183, 1.51478626, 0.11114773, 0.27975296, -0.89626852, 
    0.12379312, 0.66965255, 1.56536783, 0.56427428, -0.71291033, 
    -0.75927677, -0.75295407, -1.00164679, -1.03958296, 0.82139726, 
    -1.07541157, -1.0311527, -0.98900139, -1.06908888, -1.20186549, 
    0.58324237, -0.9700333, 0.22917139, 0.41042201, -1.11545531, 
    -0.19023412, 0.25446217, -0.05324237, 0.09007207, 1.21129685, 
    0.62539368, 1.32932051, 0.40199175, 0.44625062, 0.60221046, 
    0.33665722, -0.63493041, -0.282967, -0.32722587, -0.11646933, 
    -0.10171637, 0.13643851, -0.57802615, 0.05002833, -0.1607282, 
    -0.29139726, 0.13222338, -0.41152848, 0.68229794, -0.24292325, 
    -0.11646933, -0.21341734, -0.24292325, -0.24292325, 0.09007207, 
    -0.34197883, -0.30825778, -0.08696342, -0.8119659, 0.49683219, 
    -0.13754498, -0.4831857, 0.39988418, 0.90148474, 0.28396809, 
    1.05322945, 1.24923303, 0.47154141, 1.27873894, 0.05002833, 
    1.54218461, 0.74763247, 0.11747042), Not.Hunting = c(-0.097624192, 
    -0.160103675, -0.092996082, -0.234153433, -0.136963126, -0.15778962, 
    -0.15778962, -0.023574435, 0.00188017, -0.224897213, -0.109194467, 
    -0.069855533, -0.123078796, -0.111508522, -0.143905291, -0.099938247, 
    -0.118450687, 1.519900201, 0.177748344, 0.108326696, 0.652129604, 
    0.638245274, -0.072169588, 0.087500202, -0.18093017, -0.146219346, 
    -0.049029039, -0.125392851, -0.134649071, -0.060599313, -0.086053918, 
    -0.197128554, -0.083739863, -0.092996082, 0.844196163, 0.055103433, 
    1.971140911, -0.111508522, -0.224897213, -0.187872334, -0.160103675, 
    -0.194814499, -0.053657149, -0.206384774, 0.108326696, -0.164731785, 
    0.187004564, 0.025020719, 0.057417488, 0.434608441, 0.057417488, 
    0.073615872, -0.035144709, -0.051343094, -0.134649071, -0.185558279, 
    0.013450444, -0.134649071, -0.215640993, -0.185558279, -0.005061995, 
    -0.238781543, -0.099938247, -0.16704584, -0.208698829, 0.048161268, 
    0.048161268, -0.037458764, 0.16154996, 0.031962884, -0.102252302, 
    -0.123078796, -0.139277181, -0.208698829, -0.118450687, -0.072169588, 
    -0.044400929, -0.030516599, -0.132335016, -0.037458764), 
    Grooming = c(0.01081204879, -0.03398160805, 0.057108797, 
    -0.04063432116, -0.13084281035, -0.02997847693, 0.12732080268, 
    -0.1028170581, 0.08155320398, -0.17932134171, -0.14338902206, 
    -0.02058415581, -0.11528274705, -0.11764091337, 0.04389156236, 
    0.01399844913, -0.05755560242, 0.04711630687, 0.0158428036, 
    0.093485909, 0.09677967302, 0.02053612974, -0.03608286844, 
    0.07805238146, -9.686695e-05, -0.02285413055, -0.00424187149, 
    0.01446241356, 0.03187450017, 0.11323315542, -0.01171898422, 
    -0.06499053655, -0.07758659568, -0.07399758157, -0.11503350996, 
    0.02167111711, 0.01904454162, 0.05768779393, 0.05555202379, 
    -0.01031175326, -0.00458313459, 0.17430774591, 0.00481502094, 
    -0.00928412956, 0.09047589183, 0.08917985896, -0.05671203072, 
    -0.05333390954, 0.08541446168, 0.10140397965, -0.02509342995, 
    -0.0369877908, 0.04609635201, 0.06524159499, 0.0845977309, 
    -0.03239032508, -0.03208740616, 0.06264952925, 0.05241547086, 
    -0.03437271856, -0.03437271856, -0.06747523863, -0.01270059491, 
    0.10014629095, -0.02872845706, -0.00950652573, 0.04867308008, 
    0.02486518629, -0.05951115497, -0.02353665674, -0.01967923345, 
    -0.10148651548, -0.00480936518, -0.00098261723, -0.13970798195, 
    -0.00286148145, -0.05492902692, 0.10732815358, 0.11660744219, 
    -0.02016620439), Not.Grooming = c(-0.77046287, 0.773856776, 
    -2.593072768, -2.837675606, -1.680828329, -0.947623773, -0.947623773, 
    -2.607366431, -0.637055341, -1.818396455, 2.170944974, -0.658126752, 
    -0.808243774, 2.377766908, 2.111220276, -0.322326312, 2.218858946, 
    3.920878638, -0.304945754, 1.038591535, 1.752268128, 0.907465624, 
    1.137774798, -3.663486997, 2.350924346, 0.067293462, -1.898454393, 
    -2.497647463, -4.471716512, -1.465081244, -0.232806371, -3.043893581, 
    -2.323908986, 1.437404886, 1.079056696, 1.110865131, 1.404724068, 
    -1.706664294, 0.736746935, -0.005516985, 1.727170333, 1.685228831, 
    1.836016918, 0.46617392, 1.697173771, 1.057314221, 0.933704227, 
    0.482480775, 0.680713089, 0.090780703, 0.680713089, -0.982921741, 
    -2.281900378, 0.97208909, 0.027767791, -0.1628815, -0.530221948, 
    -0.385741863, -0.972251823, 0.002267358, -1.134447998, 0.626424009, 
    -0.722750217, -0.382722075, -0.356550578, -1.851614124, -1.851614124, 
    1.731465143, 0.254319006, 2.043778341, -0.28991392, 1.386940871, 
    0.054207713, 0.594212936, 1.551821303, 3.100704184, 0.327263666, 
    -1.055195336, -1.134447998, 1.730726972), Other = c(0.019502286, 
    -0.290451956, 0.359948884, 0.557840914, 0.117453376, 0.126645924, 
    0.126645924, 0.196486873, 0.152780228, 0.354469789, -0.261430968, 
    0.176448238, -0.007374708, -0.557848621, -0.213674557, -0.005819262, 
    -0.470070992, -0.786078864, 0.006063789, -0.27184265, -0.349418792, 
    -0.338096262, -0.165119403, 0.346566439, -0.344191931, 0.074321265, 
    0.179825379, 0.278407054, 0.593125727, 0.199177375, -0.058900625, 
    0.633875622, 0.428150308, -0.206023441, -0.436958199, -0.291839246, 
    -0.907641911, 0.448567295, -0.127186127, 0.024715134, -0.41634503, 
    -0.330697382, -0.469720666, -0.047494017, -0.301732446, -0.138901021, 
    0.098101379, -0.002063769, -0.02832419, 0.071630763, -0.02832419, 
    0.295110588, 0.347112947, -0.083577573, -0.036886152, 0.189045953, 
    0.467596992, 0.303378276, 0.218879697, 0.092005711, 0.27011134, 
    -0.012909856, 0.262292068, 0.107125772, 0.123422927, 0.299426602, 
    0.299426602, -0.326871824, -0.022088391, -0.428508341, -0.014675497, 
    -0.114462294, 0.087227267, -0.031519161, -0.159318008, -0.397875854, 
    0.101520559, 0.244481505, 0.529968994, -0.32661959)), .Names =   c("Family", 
"Swimming", "Not.Swimming", "Running", "Not.Running", "Fighting", 
"Not.Fighting", "Resting", "Not.Resting", "Hunting", "Not.Hunting", 
"Grooming", "Not.Grooming", "Other"), class = "data.frame", row.names = c(NA, 
-80L))

0 answers:

No answers yet