I am trying to decompose my data with PCA.
Here is my code:
import xlrd
import xlwt
import numpy as np
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
data = xlrd.open_workbook('x.xlsx')
sh=data.sheet_by_index(1)
num_rows = sh.nrows -1
num_cells = sh.ncols -1
inputData = np.empty([sh.nrows - 1, sh.ncols])
curr_row = -1
while curr_row < num_rows: # for each row
    curr_row += 1
    row = sh.row(curr_row)
    if curr_row > 0: # don't want the first row because those are labels
        for col_ind, el in enumerate(row):
            inputData[curr_row - 1, col_ind] = el.value
print(inputData.shape)
pca = PCA(n_components=3)
newData = pca.fit_transform(inputData)
print(inputData - np.dot(newData, pca.components_))
I expected the difference between inputData and np.dot(newData, pca.components_) to be very small, but the result is actually far from the original data.
Can you help me?
Answer 0 (score: 2)
PCA centers the data before projecting it, so you need to add the mean back. To reconstruct:
rec = np.dot(newData, pca.components_) + pca.mean_
print(inputData - rec)
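To see why the mean matters, here is a minimal sketch that reproduces the centering and projection steps with plain NumPy. The synthetic array `X` is an assumption standing in for `inputData`, and `components` and `scores` are illustrative names playing the roles of `pca.components_` and `newData`; this is not scikit-learn's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for inputData (assumption): 50 samples, 5 features,
# built from only 3 latent directions plus a large constant offset.
latent = rng.normal(size=(50, 3))
mixing = rng.normal(size=(3, 5))
X = latent @ mixing + 10.0

# PCA via SVD of the centered data.
mean = X.mean(axis=0)
U, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:3]                 # plays the role of pca.components_
scores = (X - mean) @ components.T  # plays the role of newData

# Reconstruct with and without adding the mean back.
without_mean = scores @ components
with_mean = scores @ components + mean

err_without = np.abs(X - without_mean).max()
err_with = np.abs(X - with_mean).max()
print("max error without mean:", err_without)
print("max error with mean:   ", err_with)
```

Without the mean, the reconstruction is off by exactly the per-column mean of the data. In scikit-learn, `pca.inverse_transform(newData)` performs the same reconstruction, mean included, when `whiten=False` (the default). Note that when `n_components` is smaller than the rank of the centered data, the reconstruction is only approximate even with the mean added.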