Question

I am trying to use the PLS algorithm in Python. The sklearn webpage for PLS says:

T: x_scores_
P: x_loadings_
X = T P.T

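So one would expect to be able to recover the original (centered) input data from the PLS loadings and scores. However, if I run a small example on the Boston housing dataset: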

from __future__ import division  # only needed on Python 2
import numpy as np
from sklearn.datasets import load_boston  # note: removed in scikit-learn 1.2
from sklearn.cross_decomposition import PLSRegression

boston = load_boston()
x = boston.data  # 506 x 13 matrix
x_center = x - x.mean(axis=0)  # column-centered copy of x
y = boston.target  # vector of 506 target values
pls = PLSRegression(n_components=3, scale=False)
pls.fit(x, y)

p = pls.x_loadings_  # 13 x 3 matrix
t = pls.x_scores_  # 506 x 3 matrix

x2 = np.dot(t, p.T)  # 506 x 13 matrix

I expected x2 to equal x_center, but I get the following:

x_center[0:3, 0:3]
array([[ -3.58744071,   6.63636364,  -8.82677866],
       [ -3.56645071, -11.36363636,  -4.06677866],
       [ -3.56647071, -11.36363636,  -4.06677866]])
x2[0:3, 0:3]
array([[ -3.54146571,   6.16576566,  -3.49838208],
       [ -4.25799917, -12.09468599,  -2.2124418 ],
       [ -4.51537461,  -3.28200078,  -3.42819311]])

Am I missing something?

PLS scores multiplied by loadings do not match the centered data

0 answers: