How to get the correlation between variables and principal components in Python

Time: 2019-07-14 09:10:19

Tags: python pca

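I am using PCA to reduce the number of variables, and I want to get the correlation of each variable with the final PCs. Specifically, I want the adjusted R-squared of each variable for each PC. That would show which variables are most closely associated with a particular PC and largely uncorrelated with the others.

sklearn gives the eigenvalues, the eigenvectors, and the explained variance ratio. Is there an attribute that gives the correlation of each variable with the PCs?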

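import random as rd

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

genes = ['gene' + str(i) for i in range(1, 101)]
wt = ['wt' + str(i) for i in range(1, 6)]
ko = ['ko' + str(i) for i in range(1, 6)]

data = pd.DataFrame(columns=[*wt, *ko], index=genes)

# Simulate Poisson counts with a separate random rate per gene and group
for gene in data.index:
    data.loc[gene, 'wt1':'wt5'] = np.random.poisson(lam=rd.randrange(10, 1000), size=5)
    data.loc[gene, 'ko1':'ko5'] = np.random.poisson(lam=rd.randrange(10, 1000), size=5)

# Standardize each column, then keep enough PCs to explain 95% of the variance
x = StandardScaler().fit_transform(data)
pca = PCA(0.95)
principalComponents = pca.fit_transform(x)

corr = pca.components_  # eigenvectors, one row per PC
features = pca.explained_variance_ratio_
matrix = pca.components_.T * np.sqrt(pca.explained_variance_)  # loadings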

1 Answer:

Answer 0: (score: 0)

First, you can compute it directly:

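# One row per variable (column of data), one column per retained PC
pd.DataFrame(data=[[np.corrcoef(data[c].astype(float), principalComponents[:, n])[1, 0]
                    for n in range(pca.n_components_)] for c in data],
             columns=range(pca.n_components_),
             index=data.columns)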
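
The direct calculation can be cross-checked against the loadings from the question (`pca.components_.T * np.sqrt(pca.explained_variance_)`). The sketch below is only an illustration under stated assumptions: the data is a seeded synthetic stand-in for the question's table, and the names `scores`, `loadings_as_corr`, `r_squared` and `adj_r_squared` are made up here. Because `StandardScaler` divides by the population standard deviation while sklearn's `explained_variance_` uses n - 1, the loadings should match the correlations only after multiplying by sqrt((n - 1) / n). The adjusted R-squared uses the usual one-predictor formula, which is one possible reading of what the question asks for.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the question's data (assumption): 100 genes x 10 samples
rng = np.random.default_rng(0)
samples = [f"wt{i}" for i in range(1, 6)] + [f"ko{i}" for i in range(1, 6)]
lam = rng.integers(10, 1000, size=(100, 2))  # one Poisson rate per gene and group
counts = np.hstack([rng.poisson(lam[:, [0]], size=(100, 5)),
                    rng.poisson(lam[:, [1]], size=(100, 5))])
data = pd.DataFrame(counts, columns=samples, dtype=float)

x = StandardScaler().fit_transform(data)
pca = PCA(0.95)
scores = pca.fit_transform(x)
pc_names = [f"PC{k + 1}" for k in range(pca.n_components_)]

# Direct Pearson correlation between each variable and each PC score
corr = pd.DataFrame(
    [[np.corrcoef(x[:, j], scores[:, k])[0, 1] for k in range(pca.n_components_)]
     for j in range(x.shape[1])],
    index=samples, columns=pc_names)

# Loadings, rescaled to account for StandardScaler's ddof=0 convention
n = x.shape[0]
loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
loadings_as_corr = loadings * np.sqrt((n - 1) / n)

# Share of each variable's variance explained by each single PC
r_squared = corr ** 2
# Adjusted R-squared for a regression on one predictor (p = 1)
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - 2)

print(corr.round(3))
print(np.allclose(corr.values, loadings_as_corr))
```

With many observations the factor sqrt((n - 1) / n) is close to 1, so the unscaled loadings are already a close approximation of the correlations.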