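I made this PCA plot in ggplot2. Is there a way to find the data point marked with the red arrow? I'd like R to tell me which species that point belongs to. Each point has an associated name, because each point holds the PC scores of one species.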
Code:
```r
library(ggplot2)

df_out <- as.data.frame(PPCA.scores)

# Custom theme, named my_theme so it does not mask ggplot2's theme() function
my_theme <-
  theme(
    panel.background = element_blank(),
    panel.border = element_rect(fill = NA),
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    strip.background = element_blank(),
    axis.text.x = element_text(colour = "black"),
    axis.text.y = element_text(colour = "black"),
    axis.ticks = element_line(colour = "black"),
    plot.margin = unit(c(1, 1, 1, 1), "line")
  )

# Percentage of variance explained by each PC (eigenvalues sit on the diagonal of PPCA$Eval)
percentage <- round(PPCA$Eval / sum(PPCA$Eval) * 100, 2)
percentage <- diag(as.matrix(percentage))
percentage <- paste0(names(percentage), " (", percentage, "%)")

p <- ggplot(df_out, aes(x = PC1, y = PC2))
p <- p + geom_point(size = 3) + my_theme + xlab(percentage[1]) + ylab(percentage[2])
p
```
Answer 0 (score: 2)
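The factoextra package can draw a PCA plot with every data point labeled:
```r
set.seed(123)
# pca object
res.pca <- prcomp(iris[, -5], scale. = TRUE)
# biplot; repel = TRUE keeps the point labels from overlapping
factoextra::fviz_pca_biplot(res.pca, repel = TRUE)
```
Created on 2018-11-10 by the reprex package (v0.2.1)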
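Answer 1 (score: 1)
You can do this with plain ggplot, assuming all you want is to identify (visually) which species a point belongs to:
```r
library(ggplot2)

irispca <- prcomp(iris[1:4], scale. = TRUE)
# Create the data frame of scores first; otherwise `df` refers to stats::df()
df <- data.frame(
  PC1 = irispca$x[, 1],
  PC2 = irispca$x[, 2],
  species = iris$Species
)
ggplot(df, aes(x = PC1, y = PC2, color = species)) + geom_point()
```
If you want to label the points with the species name, you can do this:
```r
ggplot(df, aes(x = PC1, y = PC2)) + geom_point() + geom_text(aes(label = species))
```
Or label each point with its row index:
```r
# The `+` must end a line; a line starting with `+` is parsed as a new expression
ggplot(df, aes(x = PC1, y = PC2, color = species)) +
  geom_point() +
  geom_text(aes(label = rownames(df)))
```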
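The same approach works for the question's own plot once the names are a column of `df_out`. Here is a minimal sketch. It assumes, though the question does not say so, that the species names are the row names of `PPCA.scores`. The small score matrix below is made up and only stands in for it:

```r
# Hypothetical stand-in for PPCA.scores: a score matrix whose row names are species
scores <- matrix(
  c(-1.2, 0.4, 2.5,    # PC1
    -0.3, 0.8, -1.1),  # PC2
  ncol = 2,
  dimnames = list(c("Species_A", "Species_B", "Species_C"), c("PC1", "PC2"))
)
df_out <- as.data.frame(scores)

# Copy the row names into an ordinary column so ggplot2 can map it to `label`
df_out$species <- rownames(df_out)

# Look up a species by position, e.g. the one with the largest PC1 score
df_out[which.max(df_out$PC1), "species"]  # "Species_C"
```

With that column in place, adding `geom_text(aes(label = species))` to a plot of `df_out` labels every point with its species, exactly as in the iris example.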
If you only want to label a single point and you know its row index, you can do the following (you can also set the position and label manually):
```r
# Get the row corresponding to that specific point
point <- df[110, ]
# Or, to pick e.g. the point with the largest PC1 value instead:
point <- df[which.max(df$PC1), ]

ggplot(df, aes(x = PC1, y = PC2)) +
  geom_point() +
  annotate("text", label = as.character(point$species), x = point$PC1, y = point$PC2)
```
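The original question asks which species sits under the arrow. One way to answer it directly is to read approximate coordinates for that point off the axes and ask R for the closest row. Here is a minimal sketch. `nearest_point()` is a hypothetical helper, not part of ggplot2, and the coordinates 2.5, 2.5 are placeholders for whatever you read off your own plot:

```r
# Hypothetical helper: return the row of `df` whose (PC1, PC2) scores are
# closest, by Euclidean distance, to the approximate coordinates (x, y)
nearest_point <- function(df, x, y) {
  d2 <- (df$PC1 - x)^2 + (df$PC2 - y)^2
  df[which.min(d2), , drop = FALSE]
}

# Rebuild the iris score data frame so this block runs on its own
irispca <- prcomp(iris[1:4], scale. = TRUE)
df <- data.frame(PC1 = irispca$x[, 1], PC2 = irispca$x[, 2], species = iris$Species)

# Placeholder coordinates standing in for the arrowed point's position on the plot
nearest_point(df, x = 2.5, y = 2.5)
```

The distances are measured in score units. By default ggplot2 does not force a 1:1 aspect ratio, so when several points are tightly packed, the point that looks closest on screen may not be the one returned. Compare the returned row's PC1/PC2 values against the plot before trusting the species name.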
Answer 2 (score: 0)