Add a legend to a scatter plot (PCA)

Time: 2018-06-02 07:37:54

Tags: python matplotlib legend pca biplot

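I'm new to Python and found this excellent suggestion for a PCA biplot (Plot PCA loadings and loading in biplot in sklearn (like R's autoplot)). Now I'm trying to add a legend for the different targets, but the command plt.legend() doesn't work.

Is there an easy way to do this? As an example, here is the iris data with the biplot code from the link above.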

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA
import pandas as pd
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data
y = iris.target
# In general, it is a good idea to scale the data
scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)

pca = PCA()
x_new = pca.fit_transform(X)

def myplot(score,coeff,labels=None):
    xs = score[:,0]
    ys = score[:,1]
    n = coeff.shape[0]
    scalex = 1.0/(xs.max() - xs.min())
    scaley = 1.0/(ys.max() - ys.min())
    plt.scatter(xs * scalex,ys * scaley, c = y)
    for i in range(n):
        plt.arrow(0, 0, coeff[i,0], coeff[i,1],color = 'r',alpha = 0.5)
        if labels is None:
            plt.text(coeff[i,0]* 1.15, coeff[i,1] * 1.15, "Var"+str(i+1), color = 'g', ha = 'center', va = 'center')
        else:
            plt.text(coeff[i,0]* 1.15, coeff[i,1] * 1.15, labels[i], color = 'g', ha = 'center', va = 'center')
    plt.xlim(-1,1)
    plt.ylim(-1,1)
    plt.xlabel("PC{}".format(1))
    plt.ylabel("PC{}".format(2))
    plt.grid()

# Call the function. Use only the first 2 PCs.
myplot(x_new[:,0:2],np.transpose(pca.components_[0:2, :]))
plt.show()

Any suggestions about PCA biplots are welcome! Other code is fine too, if it makes adding the legend easier another way!

2 answers:

Answer 0: (score: 5)

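I recently proposed an easy way to add a legend to a scatter plot; see the GitHub PR. It is still under discussion.

In the meantime, you need to create the legend manually from the unique labels in y. For each label, create a Line2D object with the same marker used in the scatter plot, and pass these objects to plt.legend as arguments.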

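# Inside myplot(), replace the plt.scatter(...) call with:
scatter = plt.scatter(xs * scalex, ys * scaley, c=y)
labels = np.unique(y)
# One proxy Line2D per class, coloured like the corresponding scatter points
handles = [plt.Line2D([], [], marker="o", ls="",
                      color=scatter.cmap(scatter.norm(yi))) for yi in labels]
plt.legend(handles, labels)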
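
For readers on Matplotlib 3.1 or later, the collection returned by scatter has a legend_elements() method that builds such handles automatically. The sketch below is a minimal, self-contained illustration on the iris data and assumes that version is installed; using iris.target_names as the legend text and the "Species" title are illustrative choices, not part of the original code.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)  # scale before PCA
y = iris.target
scores = PCA(n_components=2).fit_transform(X)

fig, ax = plt.subplots()
scatter = ax.scatter(scores[:, 0], scores[:, 1], c=y)

# legend_elements() returns one handle per unique value in c
# (assumes Matplotlib >= 3.1); the auto-generated labels are discarded
# in favour of the species names.
handles, _ = scatter.legend_elements()
legend = ax.legend(handles, list(iris.target_names), title="Species")

ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
```

Call plt.show() afterwards to display the figure. On older Matplotlib versions, the manual Line2D approach above is the fallback.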

Answer 1: (score: 0)

Try the "pca" library. It plots the explained variance and creates a biplot.

pip install pca

from pca import pca

# Initialize to reduce the data to the number of components that explains 95% of the variance.
model = pca(n_components=0.95)

# Or reduce the data towards 2 PCs
model = pca(n_components=2)

# Load example dataset
import pandas as pd
import sklearn
from sklearn.datasets import load_iris
X = pd.DataFrame(data=load_iris().data, columns=load_iris().feature_names, index=load_iris().target)

# Fit transform
results = model.fit_transform(X)

# Plot explained variance
fig, ax = model.plot()

[Figure: PCA biplot]

results is a dictionary containing statistics on the PCs, loadings, and so on.
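
The keys of that dictionary are not listed here, so the sketch below does not use the pca package at all. As a swapped-in alternative, it shows how comparable statistics (scores, loadings, explained variance) can be read from plain scikit-learn; the variable names pca_model, loadings, and explained are hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = StandardScaler().fit_transform(iris.data)

pca_model = PCA(n_components=2)
scores = pca_model.fit_transform(X)  # sample coordinates on PC1 and PC2

# Rows are features, columns are PCs: the arrow coordinates drawn in a biplot.
loadings = pd.DataFrame(
    pca_model.components_.T,
    index=iris.feature_names,
    columns=["PC1", "PC2"],
)

# Fraction of the total variance captured by each retained PC.
explained = pca_model.explained_variance_ratio_

print(loadings)
print("Explained variance ratio:", explained)
```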