How to compute the covariance matrix of a DataFrame

Time: 2019-04-28 22:31:10

Tags: python pca covariance eigenvalue

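I have read a DataFrame of sensor data using the pandas read_fwf function. I need to find the covariance matrix of the resulting 928991 x 8 matrix. Ultimately, I want to run principal component analysis on this covariance matrix to find its eigenvectors and eigenvalues.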

3 Answers:

Answer 0 (score: 1)

First, you need to convert the pandas DataFrame to a numpy array using df.values. For example:

A = df.values

Once the data is in a numpy array, computing the covariance matrix or PCA is straightforward. In more detail:

# import what is needed to compute the covariance matrix and its eigendecomposition
import pandas as pd
from numpy import mean
from numpy import cov
from numpy.linalg import eig

# load the data with pd.read_fwf into *df*; filepath, col_widths and col_names
# are placeholders for your own file path, column widths and column names
df = pd.read_fwf(filepath, widths=col_widths, names=col_names)
#put dataframe values to a numpy array
A = df.values
#check matrix A's shape, it should be (928991, 8)
print(A.shape)
# calculate the mean of each column
M = mean(A.T, axis=1)
print(M)
# center columns by subtracting column means
C = A - M
print(C)
# calculate covariance matrix of centered matrix
V = cov(C.T)
print(V)
# eigendecomposition of covariance matrix
values, vectors = eig(V)
print(vectors)
print(values)
# project data
P = vectors.T.dot(C.T)
print(P.T)

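Running the example prints the shape of the matrix, the column means, the centered matrix, the covariance matrix of the centered data, its eigenvectors and eigenvalues, and finally the projection of the data onto the eigenvectors. Here is a link you may find useful for your PCA task.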
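The same steps can be sketched more compactly on synthetic data. This is only an illustration under stated assumptions: the random 1000 x 8 matrix stands in for the real sensor data, and `numpy.linalg.eigh` is swapped in for `eig` because a covariance matrix is symmetric, so `eigh` returns real eigenvalues in ascending order.

```python
import numpy as np

# Synthetic stand-in for the sensor matrix (assumption: 1000 rows, 8 columns)
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 8))

# Center each column; np.cov also centers internally, but the centered
# data is needed again for the projection step
C = A - A.mean(axis=0)

# rowvar=False treats columns as variables, giving an 8 x 8 matrix
V = np.cov(C, rowvar=False)

# eigh is meant for symmetric matrices: real eigenvalues, ascending order
values, vectors = np.linalg.eigh(V)

# Reorder so the component with the largest variance comes first
order = np.argsort(values)[::-1]
values = values[order]
vectors = vectors[:, order]

# Project the centered data onto the principal axes
P = C @ vectors
print(P.shape)
```

Passing `rowvar=False` avoids the transposes used above, and the projected columns of `P` are uncorrelated, with variances equal to the sorted eigenvalues.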

Answer 1 (score: 1)

Why not just use the pd.DataFrame.cov function?
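As a minimal sketch of that suggestion, the snippet below compares `DataFrame.cov()` with `numpy.cov` on a small random DataFrame. The column names `R1` to `R3` and the data are made up for illustration only.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the sensor DataFrame
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["R1", "R2", "R3"])

# Pairwise covariance of the columns, returned as a labelled 3 x 3 DataFrame
cov_pd = df.cov()

# The same matrix through numpy; rowvar=False treats columns as variables
cov_np = np.cov(df.to_numpy(), rowvar=False)

print(cov_pd)
print(np.allclose(cov_pd.to_numpy(), cov_np))
```

Both calls normalize by N - 1 by default, so the results agree. The pandas version keeps the column labels and needs no conversion to a numpy array first.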

Answer 2 (score: 0)

The answer to this question is as follows:

import pandas as pd
import numpy as np
from numpy.linalg import eig

# read the whitespace-delimited sensor file
df_sensor_data = pd.read_csv('HT_Sensor_dataset.dat', sep=r'\s+')
# keep only the sensor columns
df_sensor_data = df_sensor_data.drop(columns=['id', 'time', 'Temp.', 'Humidity'])
# DataFrame.cov() excludes NA/null values, so no separate NaN handling is needed
covariance_matrix = df_sensor_data.cov()
print(covariance_matrix)

values, vectors = eig(covariance_matrix)
print(values)
print(vectors)
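For PCA, the eigenvalues are commonly sorted in descending order and normalized to show how much of the total variance each component explains. The sketch below illustrates this on a made-up DataFrame. The column names, the scaling factors, and the use of `numpy.linalg.eigh` in place of `eig` are assumptions for the example, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: three columns with deliberately different spreads
rng = np.random.default_rng(2)
data = rng.normal(size=(500, 3)) * np.array([5.0, 2.0, 0.5])
df_demo = pd.DataFrame(data, columns=["R1", "R2", "R3"])

cov_demo = df_demo.cov()

# eigh returns the eigenvalues of a symmetric matrix in ascending order
eigvals, eigvecs = np.linalg.eigh(cov_demo.to_numpy())

# Reverse both so the largest eigenvalue and its eigenvector come first
eigvals = eigvals[::-1]
eigvecs = eigvecs[:, ::-1]

# Fraction of the total variance carried by each principal component
explained = eigvals / eigvals.sum()
print(explained)
```

The eigenvectors are the columns of `eigvecs`, so column `i` pairs with `eigvals[i]` after the reordering.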