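I have successfully created the label and feature vectors and can run PCA on them. The problem is that the resulting column has the vector data type, so each row holds a single vector. How can I draw a scatter plot of the PCA components?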
```python
from pyspark.ml.feature import PCA
from pyspark.ml.linalg import Vectors
import matplotlib.pyplot as plt
import numpy as np

# Project the scaled features onto the first two principal components.
pca = PCA(k=2, inputCol="scaledFeatures", outputCol="pcaFeatures")
model = pca.fit(df2)
result = model.transform(df2).select("pcaFeatures")
result.show(truncate=False)
result.printSchema()

# Fraction of the total variance captured by the two components.
print(sum(model.explainedVariance))
```
The output I get is:
```
+-----------------------------------------+
|pcaFeatures                              |
+-----------------------------------------+
|[0.9636424850516068,0.3313811478935345] |
|[0.8373410183626885,0.3880024159205323] |
|[-0.10845002652578276,0.6564023408615134]|
|[-0.479560942670008,1.1082617061107987] |
|[0.9576794865061756,0.2714643678687506] |
|[0.7879027918969023,0.5145147352059565] |
|[0.5124304692668866,-0.1917648708243116] |
|[-0.7369547765884317,1.0356901001261056] |
|[-0.10282606527163515,0.671822806010155] |
|[1.0661514594145962,0.3285042864447201] |
|[-0.32474294634018674,0.8134787300694735]|
|[-0.2109752165189983,0.7625432021333773] |
|[0.9643915702012056,0.3276715407315949] |
|[0.8970032005901719,0.3514814197107741] |
|[0.47244006359864477,0.6034483574148226] |
|[0.7840860892766188,0.421458958932977] |
|[-0.7640855001185652,1.117508731487764] |
|[0.5078194714105165,0.5364599694359978] |
|[1.020982108328857,0.36510796039610344] |
|[-0.6823665987365033,-0.5902523648089859]|
+-----------------------------------------+
only showing top 20 rows

root
 |-- pcaFeatures: vector (nullable = true)

0.4127855508907272
```
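One way to get from a vector column to a scatter plot is to collect the vectors to the driver as plain arrays, split them into a PC1 array and a PC2 array, and hand those to matplotlib. The sketch below assumes the result is small enough to collect; on the Spark side that step could look like `result.rdd.map(lambda row: row.pcaFeatures.toArray()).collect()` (on Spark 3.0+, `pyspark.ml.functions.vector_to_array` is an alternative that turns the vector into an array column before `toPandas()`). The helper names `to_xy` and `plot_pca`, the output file name, and `SAMPLE_PCA_ROWS` (the first five rows of the output above) are hypothetical, chosen for illustration only.

```python
import numpy as np

# First five rows of the pcaFeatures output above, as plain lists
# (hypothetical stand-in for the collected Spark vectors).
SAMPLE_PCA_ROWS = [
    [0.9636424850516068, 0.3313811478935345],
    [0.8373410183626885, 0.3880024159205323],
    [-0.10845002652578276, 0.6564023408615134],
    [-0.479560942670008, 1.1082617061107987],
    [0.9576794865061756, 0.2714643678687506],
]


def to_xy(vectors):
    """Split a sequence of PCA vectors into two 1-D arrays (PC1, PC2)."""
    arr = np.asarray([np.asarray(v, dtype=float) for v in vectors])
    if arr.ndim != 2 or arr.shape[1] < 2:
        raise ValueError("expected a sequence of vectors with at least 2 components")
    return arr[:, 0], arr[:, 1]


def plot_pca(vectors, path="pca_scatter.png"):
    """Draw PC1 vs. PC2 as a scatter plot and save it to `path` (hypothetical helper)."""
    # Imported lazily so to_xy can be used without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; drop this line in a notebook
    import matplotlib.pyplot as plt

    xs, ys = to_xy(vectors)
    fig, ax = plt.subplots()
    ax.scatter(xs, ys, s=15)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.set_title("PCA projection (k=2)")
    fig.savefig(path)
    plt.close(fig)
    return path


if __name__ == "__main__":
    # With Spark, the input would instead come from something like:
    #   vectors = result.rdd.map(lambda row: row.pcaFeatures.toArray()).collect()
    xs, ys = to_xy(SAMPLE_PCA_ROWS)
    print(xs)
    print(ys)
```

Calling `plot_pca(vectors)` on the collected list writes the scatter plot to `pca_scatter.png`. If the label column was kept in the `select`, it could be collected alongside and passed to `ax.scatter(..., c=labels)` to colour points by class.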