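I am running PCA on a spam email dataset, and everything works fine until I try to plot the principal components against each other: PC1 vs PC2, PC1 vs PC3, and PC2 vs PC3. The scatter plots render correctly, but I would like the spam data points to be drawn on top of the non-spam data points.
I have searched high and low for a way to do this, but I can't find anything that works!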
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.decomposition import PCA

# Separating features (the first 54 columns of df)
X = df.iloc[:, :54]
# Separating target, changing 0's to 'Non-spam' and 1's to 'Spam'
Y = np.where(df['Spam_Indicator'] == 1, 'Spam', 'Non-spam')
# Number of principal components
N = 3
col_numbering = [str(x) for x in range(1,N + 1)]
#Applies PCA reducing from 54 to N dimensions
pca = PCA(n_components = N)
X_red = pca.fit_transform(X)
X_red = pd.DataFrame(data = X_red, columns = col_numbering)
#Prints the components, explained variance and explained variance ratio
#print('Components:',pca.components_)
print('Explained Variance:' ,pca.explained_variance_)
print('Explained Variance Ratio:' ,pca.explained_variance_ratio_)
# hue_norm is dropped here: it only applies to numeric hue values, and Y is categorical
plt.figure(figsize=(20, 10))
plt.subplot(1, 3, 1)
sns.scatterplot(x='1', y='2', data=X_red, hue=Y, alpha=.75)
plt.subplot(1, 3, 2)
sns.scatterplot(x='1', y='3', data=X_red, hue=Y, alpha=.75)
plt.subplot(1, 3, 3)
sns.scatterplot(x='2', y='3', data=X_red, hue=Y, alpha=.75)
plt.show()
Below is the image I currently get, to give you a better idea of what I'm after. (Image: Seaborn Scatter Plot)
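One possible approach, sketched here under assumptions rather than tested on the real dataset: as far as I know, seaborn's scatterplot draws all points in a single call, in row order, so rows that come later in the DataFrame end up on top. Reordering the data so that the spam rows come last should therefore put them in front. In the sketch below, the random data is only a stand-in for `X_red` and `Y` from the question, and `plot_df` and the `label` column are hypothetical names introduced for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for the PCA output (X_red) and labels (Y) from the question.
rng = np.random.default_rng(0)
X_red = pd.DataFrame(rng.normal(size=(200, 3)), columns=["1", "2", "3"])
Y = np.where(rng.random(200) < 0.4, "Spam", "Non-spam")

# Keep the label next to the coordinates so both are reordered together.
plot_df = X_red.assign(label=Y)

# Put every non-spam row first and every spam row last; rows drawn later sit on top.
plot_df = pd.concat(
    [plot_df[plot_df["label"] != "Spam"], plot_df[plot_df["label"] == "Spam"]]
)

pairs = [("1", "2"), ("1", "3"), ("2", "3")]
fig, axes = plt.subplots(1, 3, figsize=(20, 10))
for ax, (x_col, y_col) in zip(axes, pairs):
    sns.scatterplot(
        data=plot_df, x=x_col, y=y_col,
        hue="label", hue_order=["Non-spam", "Spam"],  # fixed order keeps colors stable
        alpha=0.75, ax=ax,
    )
```

With the real data, replace the synthetic `X_red` and `Y` with your own and call `plt.show()` at the end. If the ordering trick does not behave this way in your seaborn version, an alternative is to plot the two classes in separate calls on the same axes, non-spam first and spam second.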