I am trying to produce a nice PCA plot together with the cumulative explained variance. The data frame I am working with is at https://www.kaggle.com/miroslavsabo/young-people-survey?select=responses.csv
df.responses <- read.csv("Data/responses.csv")
pref <- colnames(df.responses[1:63]) # columns for Music, Movies and Hobbies preferences
# replace the NAs in each preference column with that column's median
for (i in seq_along(pref)) {
  df.responses[is.na(df.responses[, i]), i] <- median(df.responses[, i], na.rm = TRUE)
}
df.movies <- data.frame(df.responses[20:31])
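Above, I just loaded the data frame, replaced the NAs in the columns I am interested in with the column median, and then selected the subset I want to run the PCA on.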
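To make the imputation step easier to check, here is a minimal sketch on a toy data frame. The data frame `toy` and its values are made up for illustration and are not part of the survey data; the idea is the same as in the loop above, just written with `lapply`.

```r
# Toy data frame (hypothetical values) standing in for the survey columns
toy <- data.frame(a = c(1, NA, 3, 5), b = c(2, 4, NA, NA))

# Replace each column's NAs with that column's median
toy[] <- lapply(toy, function(col) {
  col[is.na(col)] <- median(col, na.rm = TRUE)
  col
})

toy
```

Assigning into `toy[]` keeps the result a data frame instead of turning it into a plain list.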
library(ggplot2)
library(factoextra)
pca.movies <- prcomp(df.movies, scale. = TRUE)
pca.movies$rotation <- -pca.movies$rotation
pca.movies$x <- -pca.movies$x
fviz_pca_var(pca.movies,
col.var = "contrib",
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel = TRUE
)
pv.movies <- pca.movies$sdev^2
pvp.movies <- pv.movies/sum(pv.movies)
pvp.movies
fviz_eig(pca.movies,
addlabels = TRUE,
barcolor = "#E7B800",
barfill = "#E7B800",
linecolor = "#00AFBB",
choice = "variance",
ylim=c(0,25))
plot(cumsum(pvp.movies), xlab = "Principal component", ylab = "Cumulative proportion of variance explained", ylim = c(0, 1), type = "b")
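With the code above I managed to get two nice PCA plots, and I would like to add the cumulative-sum line (shown in the third, uglier plot) to the second one. Is there a way to add such a line to the fviz_eig plot? I know this PCA is not really valid; I am just challenging myself with some dataviz.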
Answer 0 (score: 1)
fviz_eig returns a ggplot object, so you can combine the two plots as follows:
p <- fviz_eig(pca.movies,
              addlabels = TRUE,
              barcolor = "#E7B800",
              barfill = "#E7B800",
              linecolor = "#00AFBB",
              choice = "variance",
              ylim = c(0, 25))
# The left axis runs from 0 to 25 and the cumulative percentage from 0 to 100,
# so divide by 4 here and multiply by 4 again on the secondary axis.
df <- data.frame(x = seq_along(pvp.movies),
                 y = cumsum(pvp.movies) * 100 / 4)
p <- p +
  geom_point(data = df, aes(x, y), size = 2, color = "#00AFBB") +
  geom_line(data = df, aes(x, y), color = "#00AFBB") +
  scale_y_continuous(sec.axis = sec_axis(~ . * 4,
    name = "Cumulative percentage of variance explained"))
print(p)
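The factor 4 above is tied to the 0-25 range of the left axis. As a rough sketch of the same secondary-axis idea with the factor computed from the data, the example below uses only ggplot2 and the built-in `USArrests` data set (chosen purely for illustration, not the survey data); the names `scree` and `k` are my own.

```r
library(ggplot2)

# Built-in data set used only for illustration
pca <- prcomp(USArrests, scale. = TRUE)

pct <- 100 * pca$sdev^2 / sum(pca$sdev^2)   # % variance per component
scree <- data.frame(pc = seq_along(pct), pct = pct, cum = cumsum(pct))

# Factor that maps the 0-100 cumulative scale onto the bar scale
k <- 100 / max(scree$pct)

p <- ggplot(scree, aes(x = pc)) +
  geom_col(aes(y = pct), fill = "#E7B800") +
  geom_line(aes(y = cum / k), color = "#00AFBB") +
  geom_point(aes(y = cum / k), color = "#00AFBB", size = 2) +
  scale_y_continuous(
    name = "Variance explained (%)",
    sec.axis = sec_axis(~ . * k, name = "Cumulative variance explained (%)")
  ) +
  labs(x = "Principal component")

print(p)
```

With this choice of `k`, the cumulative line ends at the height of the tallest bar, so both series share the plotting area without a hand-picked constant.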