I am trying to use the PLS algorithm in Python. The sklearn webpage for PLS says:
T: x_scores_
P: x_loadings_
X = T P.T
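The relation can be written out using the deflation step that the scikit-learn user guide describes for PLS. This is a sketch that assumes `scale=False`, so X is only centered. K stands for `n_components`, w_k for the k-th column of `x_weights_`, T = [t_1 … t_K] for `x_scores_`, and P = [p_1 … p_K] for `x_loadings_`:

$$
\begin{aligned}
X_0 &= X - \mathbf{1}\bar{x}^{\mathsf T} \quad \text{(column-centered } X\text{)}\\
t_k &= X_{k-1} w_k, \qquad p_k = \frac{X_{k-1}^{\mathsf T} t_k}{t_k^{\mathsf T} t_k}, \qquad k = 1,\dots,K\\
X_k &= X_{k-1} - t_k p_k^{\mathsf T}\\
X_0 &= \sum_{k=1}^{K} t_k p_k^{\mathsf T} + X_K = T P^{\mathsf T} + X_K
\end{aligned}
$$

Here X_K is the part of the centered X that the first K components leave unexplained.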
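So one would expect to recover the original (centered) input data from the PLS loadings and scores. However, when I run a small example on the Boston housing dataset: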
```python
from __future__ import division  # only relevant on Python 2
import numpy as np
from sklearn.datasets import load_boston  # removed in scikit-learn 1.2
from sklearn.cross_decomposition import PLSRegression

boston = load_boston()
x = boston.data                    # 506 x 13 matrix
x_center = x - x.mean(axis=0)      # column-centered X
y = boston.target                  # 506-element target vector

pls = PLSRegression(n_components=3, scale=False)
pls.fit(x, y)
p = pls.x_loadings_                # 13 x 3 matrix (P)
t = pls.x_scores_                  # 506 x 3 matrix (T)
x2 = np.dot(t, p.T)                # attempted reconstruction T P^T
```
I expected `x2` to equal `x_center`, but I get the following:
```
>>> x_center[0:3, 0:3]
array([[ -3.58744071,   6.63636364,  -8.82677866],
       [ -3.56645071, -11.36363636,  -4.06677866],
       [ -3.56647071, -11.36363636,  -4.06677866]])
>>> x2[0:3, 0:3]
array([[ -3.54146571,   6.16576566,  -3.49838208],
       [ -4.25799917, -12.09468599,  -2.2124418 ],
       [ -4.51537461,  -3.28200078,  -3.42819311]])
```
Am I missing something?