I have a dataset with 10 dimensions as features and 1 dimension as the cluster number (11 dimensions in total). How can I use R to plot the PCA of my data (PC1) against the cluster number?
qplot(x = not_null_df$TSC_8125, y = pca, data = subset(not_null_df, select = c (not_null_df$AVG_ERTEBAT,not_null_df$AVG_ROSHD,not_null_df$AVG_HOGHOGH,not_null_df$AVG_MM,not_null_df$AVG_MK,not_null_df$AVG_TM,not_null_df$AVG_VEJHE,not_null_df$AVG_ANGIZEH,not_null_df$AVG_TAHOD)), main = "Loadings for PC1", xlab = "cluster number")
This is the code I actually wrote, and it gives me this error:
Don't know how to automatically pick scale for object of type princomp. Defaulting to continuous.
Error: Aesthetics must be either length 1 or the same as the data (564): x, y
summary(not_null_df)
  ï..QN           NAMECODE        GENDER          VAZEYATTAAHOL   TAHSILAT        SEN             SABEGHE
Min.   :  1.00  Min.   : 1.0    Min.   :1.000   Min.   :1.00    Min.   :1.000   Min.   :1.000   Min.   :1.000
1st Qu.: 28.00  1st Qu.:11.0    1st Qu.:1.000   1st Qu.:1.75    1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.000
Median : 60.00  Median :13.0    Median :1.000   Median :2.00    Median :3.000   Median :1.000   Median :1.000
Mean   : 68.63  Mean   :11.7    Mean   :1.152   Mean   :1.75    Mean   :2.578   Mean   :1.394   Mean   :1.121
3rd Qu.:103.25  3rd Qu.:14.0    3rd Qu.:1.000   3rd Qu.:2.00    3rd Qu.:3.000   3rd Qu.:2.000   3rd Qu.:1.000
Max.   :190.00  Max.   :16.0    Max.   :2.000   Max.   :2.00    Max.   :3.000   Max.   :3.000   Max.   :3.000
  AVG_ERTEBAT     AVG_ROSHD       AVG_HOGHOGH     AVG_MM          AVG_MK          AVG_TM          AVG_VEJHE
Min.   : 0.000  Min.   : 0.000  Min.   : 0.000  Min.   : 0.000  Min.   : 0.000  Min.   : 0.000  Min.   : 0.000
1st Qu.: 5.333  1st Qu.: 4.125  1st Qu.: 1.750  1st Qu.: 5.000  1st Qu.: 3.125  1st Qu.: 5.981  1st Qu.: 4.556
Median : 7.000  Median : 5.875  Median : 3.500  Median : 7.727  Median : 5.000  Median : 8.000  Median : 6.333
Mean   : 6.730  Mean   : 5.787  Mean   : 4.001  Mean   : 6.903  Mean   : 4.890  Mean   : 7.390  Mean   : 6.095
3rd Qu.: 8.425  3rd Qu.: 7.656  3rd Qu.: 6.000  3rd Qu.: 9.182  3rd Qu.: 6.688  3rd Qu.: 9.204  3rd Qu.: 7.778
Max.   :10.000  Max.   :10.000  Max.   :10.000  Max.   :10.000  Max.   :10.000  Max.   :10.000  Max.   :10.000
  AVG_ANGIZEH     AVG_TAHOD       AVG_SOALAT      TSC_8125        avg
Min.   : 0.000  Min.   : 0.000  Min.   : 0.000  Min.   :1.000   Min.   :0.000
1st Qu.: 5.000  1st Qu.: 5.833  1st Qu.: 4.000  1st Qu.:1.000   1st Qu.:4.788
Median : 7.000  Median : 7.667  Median : 7.000  Median :2.000   Median :6.301
Mean   : 6.549  Mean   : 7.171  Mean   : 6.025  Mean   :2.046   Mean   :6.154
3rd Qu.: 8.750  3rd Qu.: 9.000  3rd Qu.: 8.000  3rd Qu.:3.000   3rd Qu.:7.599
Max.   :10.000  Max.   :10.000  Max.   :10.000  Max.   :3.000   Max.   :9.978
I can obtain the PCA with the following code:
pca <- princomp(not_null_df, cor=TRUE, scores=TRUE)
summary(pca)
Importance of components:
                         Comp.1     Comp.2     Comp.3     Comp.4     Comp.5     Comp.6     Comp.7     Comp.8     Comp.9
Standard deviation     2.887437 1.28937443 1.12619079 1.08816449 0.98432226 0.91257779 0.90980017 0.82303807 0.74435256
Proportion of Variance 0.438805 0.08749929 0.06675293 0.06232116 0.05099423 0.04383149 0.04356507 0.03565219 0.02916109
Cumulative Proportion  0.438805 0.52630426 0.59305720 0.65537835 0.70637258 0.75020406 0.79376914 0.82942133 0.85858242
                          Comp.10    Comp.11    Comp.12    Comp.13    Comp.14    Comp.15   Comp.16    Comp.17     Comp.18
Standard deviation     0.70304085 0.67709130 0.62905993 0.59284646 0.50799135 0.48013732 0.4476952 0.39317004 0.378722707
Proportion of Variance 0.02601402 0.02412909 0.02082718 0.01849826 0.01358185 0.01213325 0.0105490 0.00813593 0.007548994
Cumulative Proportion  0.88459644 0.90872553 0.92955271 0.94805097 0.96163282 0.97376607 0.9843151 0.99245101 1.000000000
                            Comp.19
Standard deviation     1.838143e-08
Proportion of Variance 1.778301e-17
Cumulative Proportion  1.000000e+00
My goal is to plot the PCA (only Comp.1) against TSC_8125 (i.e., the cluster number).
Answer 0 (score: 1)
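The function princomp() returns a list of 7 elements: sdev, loadings, center, scale, n.obs, scores and call. Each of them is described on the function's help page (type ?princomp to open it). For the purpose of your plot, the element of interest here is probably scores.
scores: the scores of the supplied data on the principal components.
loadings: the matrix of variable loadings (i.e., a matrix whose columns contain the eigenvectors).
The easiest way to access a list element is the $ operator, so pca$scores and pca$loadings return these two elements. Both are matrix-like, with each column corresponding to one principal component (the first column is the first principal component, and so on).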
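As a quick illustration of this structure, the sketch below fits princomp() to R's built-in USArrests data and inspects the returned elements. USArrests and the name pca_demo are assumptions made only so the example runs on its own; they stand in for not_null_df and pca.

```r
# USArrests ships with R; it stands in for not_null_df in this sketch
pca_demo <- princomp(USArrests, cor = TRUE, scores = TRUE)

# Names of the list elements returned by princomp()
print(names(pca_demo))

# scores has one row per observation and one column per component
print(dim(pca_demo$scores))

# loadings has one row per input variable and one column per component
print(dim(pca_demo$loadings))

# Column names identify the components (Comp.1, Comp.2, ...)
print(colnames(pca_demo$scores))
```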
So, to access the scores for the first principal component, you can use
comp.1 <- pca$scores[,1]
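This also explains the error in the question. Passing the whole princomp object as y hands ggplot2 a list of 7 elements rather than a numeric vector with one value per row, and ggplot2 requires each aesthetic to have length 1 or the same length as the data. The score vector does have one value per row. A minimal check, again using the built-in USArrests data as an assumed stand-in for not_null_df:

```r
# USArrests is a stand-in for not_null_df (an assumption so the sketch is runnable)
pca_demo <- princomp(USArrests, cor = TRUE, scores = TRUE)

# The fitted object itself is a list of 7 elements, whatever the row count is
print(length(pca_demo))

# The first column of the scores matrix has one entry per observation
comp1_demo <- pca_demo$scores[, 1]
print(length(comp1_demo) == nrow(USArrests))
print(is.numeric(comp1_demo))
```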
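To plot these scores against the cluster number, you can use
plot(comp.1 ~ not_null_df$TSC_8125)
or, if you prefer, draw the same plot with qplot from the ggplot2 package
library(ggplot2)
qplot(x = not_null_df$TSC_8125, y = comp.1, main = "Scores for PC1", xlab = "cluster number")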
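Two optional refinements, offered as suggestions rather than as part of the original answer. First, princomp(not_null_df, ...) in the question runs the PCA on every column, including the cluster label TSC_8125 itself. The near-zero standard deviation of Comp.19 also indicates that at least one column is an exact linear combination of the others. Second, since the cluster number is categorical, one box per cluster may be easier to read than a scatter plot. The sketch below uses small synthetic data to show a PCA on the feature columns only, followed by a base R box plot of PC1 by cluster. The data frame demo_df and its columns AVG_A, AVG_B and AVG_C are hypothetical.

```r
# Synthetic stand-in for not_null_df (hypothetical data, for illustration only)
set.seed(1)
n <- 60
demo_df <- data.frame(
  AVG_A = runif(n, 0, 10),
  AVG_B = runif(n, 0, 10),
  AVG_C = runif(n, 0, 10),
  TSC_8125 = sample(1:3, n, replace = TRUE)
)

# Run the PCA on the feature columns only, leaving the cluster label out
feature_cols <- setdiff(names(demo_df), "TSC_8125")
pca_demo <- princomp(demo_df[, feature_cols], cor = TRUE, scores = TRUE)

# PC1 scores: one value per row of demo_df
pc1 <- pca_demo$scores[, 1]

# Treat the cluster number as a factor and draw one box per cluster
boxplot(pc1 ~ factor(demo_df$TSC_8125),
        main = "Scores for PC1", xlab = "cluster number", ylab = "PC1 score")
```

With the real data, feature_cols would presumably be the ten AVG_ feature columns listed in the summary output.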