How do I find a data point in a PCA plot in R?

Time: 2018-11-10 23:25:36

Tags: r ggplot2

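I made this PCA plot in ggplot2. Is there a way to find the data point marked with the red arrow? I would like R to tell me which species that data point belongs to (every point has a name associated with it and represents the PC scores of one species).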

Code:

df_out <- as.data.frame(PPCA.scores)
theme <-
  theme(
    panel.background = element_blank(),
    panel.border = element_rect(fill = NA),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    strip.background = element_blank(),
    axis.text.x = element_text(colour = "black"),
    axis.text.y = element_text(colour = "black"),
    axis.ticks = element_line(colour = "black"),
    plot.margin = unit(c(1, 1, 1, 1), "line")
  )
percentage <- round(PPCA$Eval / sum(PPCA$Eval) * 100, 2)
percentage <- diag(as.matrix(percentage))
percentage <- paste0(names(percentage), " (", percentage, "%)")

p<-ggplot(df_out,aes(x=PC1,y=PC2)) 
p<-p+geom_point(size=3) + theme + xlab(percentage[1]) + ylab(percentage[2])
p  

PCA plot

3 answers:

Answer 0 (score: 2)

The factoextra package provides a way to plot a PCA with the individual data points labelled:

set.seed(123)

# pca object
res.pca <- prcomp(iris[, -5],  scale = TRUE)

# plot
factoextra::fviz_pca_biplot(res.pca, repel = TRUE)

Created on 2018-11-10 by the reprex package (v0.2.1)

For detailed documentation, see: http://www.sthda.com/english/articles/31-principal-component-methods-in-r-practical-guide/112-pca-principal-component-analysis-essentials/

Answer 1 (score: 1)

You can do this easily with ggplot alone, assuming all you want is to be able to identify quickly (by eye) which species a point belongs to:

library(ggplot2)

irispca <- prcomp(iris[, 1:4], scale. = TRUE)

# collect the first two PC scores and the species in a data frame
df <- data.frame(
  PC1 = irispca$x[, 1],
  PC2 = irispca$x[, 2],
  species = iris$Species
)

ggplot(df, aes(x=PC1, y=PC2, color=species)) + geom_point()


If you want to label the points with the species name, you can do the following:

ggplot(df, aes(x=PC1, y=PC2)) + geom_point() + geom_text(aes(label=species))


Or with the index of each point:

ggplot(df, aes(x=PC1, y=PC2, color=species)) + geom_point() +
    geom_text(aes(label=rownames(df)))

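Labelling every point can get crowded. As a rough sketch (not part of the original answer), you could label only the points that lie farthest from the origin of the PC1/PC2 plane. The names `scores`, `id`, `dist` and `far`, and the cut-off of 5 points, are arbitrary choices made for illustration.

```r
# Same iris PCA as above, stored under a new (hypothetical) name
irispca <- prcomp(iris[, 1:4], scale. = TRUE)
scores <- data.frame(
  PC1 = irispca$x[, 1],
  PC2 = irispca$x[, 2],
  species = iris$Species
)
scores$id <- seq_len(nrow(scores))  # row index, used as the label

# Distance of each point from the origin of the PC1/PC2 plane
scores$dist <- sqrt(scores$PC1^2 + scores$PC2^2)

# Keep the 5 most distant points (5 is an arbitrary choice)
far <- scores[order(scores$dist, decreasing = TRUE)[1:5], ]

# The plotting step is skipped if ggplot2 is not installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(scores, aes(x = PC1, y = PC2, color = species)) +
    geom_point() +
    # label only the rows in `far`
    geom_text(data = far, aes(label = id), vjust = -0.8, show.legend = FALSE)
  print(p)
}
```

Printing `far` then lists the species of each labelled point.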

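If you only want to label a single point, and assuming you know that point's row index, you can do the following (you can also set the position/label manually):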

# get the row corresponding to that specific point
point <- df[110,]

# if you want the point that is at the max of PC1, for example, you could instead use this:
point <- df[df$PC1 == max(df$PC1),]

ggplot(df, aes(x=PC1, y=PC2)) + geom_point() +
    annotate("text", label=as.character(point$species), x=point$PC1, y=point$PC2)

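If you do not know the row but can read the point's approximate position off the plot, one option is to look up the row whose scores are closest to those coordinates. This is a minimal sketch and not part of the original answer: `scores` and `target` are hypothetical names, and the target coordinates are made up. Note that the sign of a principal component is arbitrary, so read the coordinates from your own plot.

```r
# Scores for the iris example used in this answer
irispca <- prcomp(iris[, 1:4], scale. = TRUE)
scores <- data.frame(
  PC1 = irispca$x[, 1],
  PC2 = irispca$x[, 2],
  species = iris$Species
)

# Approximate coordinates read off the plot by eye (made-up values)
target <- c(PC1 = 3.3, PC2 = 0)

# Euclidean distance from every point to the target in the PC1/PC2 plane
d <- sqrt((scores$PC1 - target[["PC1"]])^2 +
          (scores$PC2 - target[["PC2"]])^2)

# The row closest to the target: its scores and its species
nearest <- scores[which.min(d), ]
nearest
```

The resulting row can be passed to annotate() in the same way as `point` above. With your own data, the row names of the scores data frame (the species names in the question) identify the point.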

Answer 2 (score: 0)

You can also use ggbiplot. Here is the example from its documentation with the wine dataset, for which the ground truth (wine.class) is available.

ggbiplot
