Correlation and squared cosine matrices in Python

Date: 2019-10-16 22:54:07

Tags: python statistics pca

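I am using Python to run a PCA on a dataset, and I would like to know how to obtain, in Python, the equivalent of the following two matrices, which were produced in R.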

[Image: Correlation and cosine square matrix]

These two matrices show the correlation of my variables with each principal component (Dim). I need to know how to get something similar.
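A minimal sketch of one way to build both matrices in Python, assuming the R output is the per-variable table that packages such as FactoMineR report: the correlation of each variable with each Dim, and cos2, which for a standardized PCA is that correlation squared. It standardizes the data, fits scikit-learn's `PCA`, and correlates each standardized column directly with the component scores. The function name `pca_var_cor_cos2`, the `Dim.k` column labels and the random demo data are illustrative assumptions, not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_var_cor_cos2(X, n_components=None):
    """Correlation and cos2 of each variable with each principal component.

    X must be a DataFrame. Returns two DataFrames with one row per column
    of X and one column per retained component (labelled Dim.1, Dim.2, ...).
    """
    X_std = StandardScaler().fit_transform(X)
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(X_std)            # shape (n_samples, n_kept)
    p = X_std.shape[1]
    # Correlation matrix of [variables | scores]; keep the variables x scores block.
    full = np.corrcoef(np.hstack([X_std, scores]), rowvar=False)
    cor = pd.DataFrame(
        full[:p, p:],
        index=X.columns,
        columns=[f"Dim.{k + 1}" for k in range(scores.shape[1])],
    )
    # For a standardized PCA, cos2 is simply the squared correlation.
    return cor, cor ** 2


# Hypothetical demo data standing in for datos.csv.
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(200, 4)), columns=["a", "b", "c", "d"])
demo["e"] = demo["a"] + 0.5 * rng.normal(size=200)   # correlated with "a"

cor, cos2 = pca_var_cor_cos2(demo)
print(cor.round(3))
print(cos2.round(3))
```

With the question's `X`, the call would be `cor, cos2 = pca_var_cor_cos2(X, n_components=.9)`. Principal components are only defined up to sign, so a column of `cor` may come out with the opposite sign from R; `cos2` is unaffected. When all components are kept, each row of `cos2` should sum to 1.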

Here is the file: https://storage.googleapis.com/min_ambiente/servi_acc/datos.csv

The code is as follows:

# Some of these libraries are used by other methods I implemented elsewhere.


import pandas as pd
import numpy as np
import random
import matplotlib.pyplot as plt
from google.cloud import bigquery
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity
from factor_analyzer.factor_analyzer import calculate_kmo
from factor_analyzer import FactorAnalyzer
%matplotlib inline

#load csv
from google.colab import files
uploaded = files.upload()

data = pd.read_csv("datos.csv") 

data.fillna(0, inplace=True)
a,b = data.shape
X= data.iloc[:,0:b-1]
X.head()

#####################################################
### Standardize and compute the covariance matrix ###
#####################################################
# Standardize features by removing the mean and scaling to unit variance.
# fit_transform learns the scaling parameters (mean, std) from the data
# and returns the transformed data set.
X_std = StandardScaler().fit_transform(X)
mean_vec = np.mean(X_std, axis=0)
cov_mat = (X_std - mean_vec).T.dot((X_std - mean_vec)) / (X_std.shape[0]-1)

### Eigenvalues and eigenvectors of the covariance matrix
cov_mat = np.cov(X_std.T)  # same result as the manual computation above
eig_vals, eig_vecs = np.linalg.eigh(cov_mat)  # eigh is meant for symmetric matrices

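# Pair each eigenvalue with its eigenvector (column i of eig_vecs)
eig_pairs = [(np.abs(eig_vals[i]), eig_vecs[:, i]) for i in range(len(eig_vals))]
### Sort from the highest to the lowest eigenvalue
eig_pairs.sort(key=lambda pair: pair[0], reverse=True)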


print('eigenvalues in descending order :')
for i in eig_pairs:
    print(i[0])
#PCA
# A float in (0, 1) keeps as many components as needed to explain at least that
# fraction of the variance; another option would be e.g. pca=PCA(.85)
pca=PCA(n_components=.9)
pca.fit(X_std) # compute the principal components
X_pca=pca.transform(X_std) # project our data onto the new PCA dimensions

print("shape of X_pca", X_pca.shape)
expl = pca.explained_variance_ratio_
print(expl)
print('sum of the first 5 ratios:', sum(expl[0:5]))
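The eigendecomposition in the code above can also give the correlations directly: for standardized data, the correlation between variable j and component k is the eigenvector entry times the square root of the matching eigenvalue of the correlation matrix. One catch is that `StandardScaler` divides by the population standard deviation (ddof=0) while `np.cov` divides by n - 1, so `cov_mat` equals the correlation matrix scaled by n/(n - 1) and its eigenvalues need the factor (n - 1)/n. The sketch below applies that correction and cross-checks it against correlations computed from scikit-learn's scores; the function name `cor_cos2_from_eig` and the random demo data are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def cor_cos2_from_eig(X):
    """Variable/component correlations and cos2 from an eigendecomposition.

    Returns (cor, cos2, eig_vals) with components sorted by decreasing
    eigenvalue; rows of cor/cos2 are variables, columns are components.
    """
    X_std = StandardScaler().fit_transform(X)    # population std (ddof=0)
    n = X_std.shape[0]
    cov_mat = np.cov(X_std, rowvar=False)         # sample covariance (ddof=1)
    # eigh is meant for symmetric matrices and returns ascending eigenvalues.
    eig_vals, eig_vecs = np.linalg.eigh(cov_mat)
    order = np.argsort(eig_vals)[::-1]
    eig_vals, eig_vecs = eig_vals[order], eig_vecs[:, order]
    # cov_mat == corr_mat * n / (n - 1), so rescale to correlation-matrix eigenvalues.
    corr_eig_vals = eig_vals * (n - 1) / n
    # Loadings: each eigenvector column times sqrt(its eigenvalue).
    cor = eig_vecs * np.sqrt(corr_eig_vals)
    return cor, cor ** 2, eig_vals


# Hypothetical demo data standing in for datos.csv.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(150, 4))
X_demo[:, 3] += X_demo[:, 0]                      # make two columns correlated
cor, cos2, eig_vals = cor_cos2_from_eig(X_demo)

# Cross-check against correlations computed directly from sklearn's scores.
Z = StandardScaler().fit_transform(X_demo)
scores = PCA().fit_transform(Z)
direct = np.corrcoef(np.hstack([Z, scores]), rowvar=False)[:4, 4:]
print(np.allclose(np.abs(cor), np.abs(direct)))   # signs may differ per column
```

Comparing absolute values is deliberate: each eigenvector (and so each column of `cor`) is only determined up to sign, which is also why the Python signs may not match R's.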

0 answers:

No answers