Scikit-Learn transform method: manual calculation?

Time: 2016-12-07 01:37:13

Tags: python scikit-learn transform pca

I have a question about Scikit-Learn's PCA transform method. The code can be found here: scroll down to find the transform() method.

They show the procedure in this simple example, which first fits and then transforms:

pca.fit(X) #step 1: fit()
X_transformed = fast_dot(X, self.components_.T) #step 2: transform()

I tried to do this manually as follows:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.utils.extmath import fast_dot

iris = load_iris()
X = iris.data
y = iris.target

pca = PCA(n_components=3)

pca.fit(X)

Xm = X.mean(axis=1)
print pca.transform(X)[:5,:] #Method 1 - expected
X = X - Xm[None].T # or can use X = X - Xm[:, np.newaxis]
print fast_dot(X,pca.components_.T)[:5,:] #Method 2 - manual

Expected:

[[-2.68420713 -0.32660731  0.02151184]
 [-2.71539062  0.16955685  0.20352143]
 [-2.88981954  0.13734561 -0.02470924]
 [-2.7464372   0.31112432 -0.03767198]
 [-2.72859298 -0.33392456 -0.0962297 ]]

Manual:

[[-0.98444292 -2.74509617  2.28864171]
 [-0.75404746 -2.44769323  2.35917528]
 [-0.89110797 -2.50829893  2.11501947]
 [-0.74772562 -2.33452022  2.10205674]
 [-1.02882877 -2.75241342  2.17090017]]

As you can see, the two results are different. Is there a step missing somewhere in the transform() method?

1 answer:

Answer 0 (score: 0)

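I am not a great expert on PCA, but by looking at the sklearn source code I found your problem: you are taking the mean along the wrong axis. PCA centers each feature (column), so the mean has to be computed over the samples with axis=0, not across each row with axis=1.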
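To make the axis difference concrete, here is a small sketch (written for Python 3; the variable names `row_means` and `col_means` are mine, not from the question). It compares the shapes of the two means and checks them against the fitted estimator's `mean_` attribute, which holds the per-feature mean that PCA subtracts.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                 # shape (150, 4): 150 samples, 4 features
pca = PCA(n_components=3).fit(X)

row_means = X.mean(axis=1)           # one mean per sample  -> shape (150,)
col_means = X.mean(axis=0)           # one mean per feature -> shape (4,)

print(row_means.shape, col_means.shape)
# The mean stored by the fitted PCA is the per-feature (axis=0) mean
print(np.allclose(pca.mean_, col_means))
```

Subtracting `row_means` shifts every sample by its own average, which is a different operation from the per-feature centering that PCA is fitted on.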

Here is the solution:

Xm = X.mean(axis=0)  # Axis 0 instead of 1
print pca.transform(X)[:5,:] #Method 1 - expected
X = X - Xm  # No need for transpose now
print fast_dot(X,pca.components_.T)[:5,:] #Method 2 - manual

Result:

[[-2.68420713  0.32660731 -0.02151184]
 [-2.71539062 -0.16955685 -0.20352143]
 [-2.88981954 -0.13734561  0.02470924]
 [-2.7464372  -0.31112432  0.03767198]
 [-2.72859298  0.33392456  0.0962297 ]]
[[-2.68420713  0.32660731 -0.02151184]
 [-2.71539062 -0.16955685 -0.20352143]
 [-2.88981954 -0.13734561  0.02470924]
 [-2.7464372  -0.31112432  0.03767198]
 [-2.72859298  0.33392456  0.0962297 ]]
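For anyone who wants to re-check this on a current setup, here is a self-contained sketch for Python 3. It assumes a recent scikit-learn release, where, as far as I know, `fast_dot` is no longer available, so plain `np.dot` is used instead. It also assumes the default `whiten=False`. The names `expected`, `manual` and `wrong` are mine.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=3).fit(X)

# Method 1: let the library do the projection
expected = pca.transform(X)

# Method 2: center each feature (axis=0), then project onto the components
manual = np.dot(X - X.mean(axis=0), pca.components_.T)

# The question's version: centering each sample (axis=1) instead
wrong = np.dot(X - X.mean(axis=1)[:, np.newaxis], pca.components_.T)

print(np.allclose(expected, manual))  # per-feature centering matches transform()
print(np.allclose(expected, wrong))   # per-sample centering does not
```

The first check should print True and the second False. The signs of individual components can differ between scikit-learn versions, so compare the two methods within one run rather than against the numbers printed above.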