I have a question about Scikit-Learn's PCA transform method. The code can be found here; scroll down to the transform() method.
They show the procedure in this simple example, which first fits and then transforms:
pca.fit(X) #step 1: fit()
X_transformed = fast_dot(X, self.components_.T) #step 2: transform()
I tried to do this manually as follows:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.utils.extmath import fast_dot
iris = load_iris()
X = iris.data
y = iris.target
pca = PCA(n_components=3)
pca.fit(X)
Xm = X.mean(axis=1)
print pca.transform(X)[:5,:] #Method 1 - expected
X = X - Xm[None].T # or can use X = X - Xm[:, np.newaxis]
print fast_dot(X,pca.components_.T)[:5,:] #Method 2 - manual
Expected:
[[-2.68420713 -0.32660731 0.02151184]
[-2.71539062 0.16955685 0.20352143]
[-2.88981954 0.13734561 -0.02470924]
[-2.7464372 0.31112432 -0.03767198]
[-2.72859298 -0.33392456 -0.0962297 ]]
Manual:
[[-0.98444292 -2.74509617 2.28864171]
[-0.75404746 -2.44769323 2.35917528]
[-0.89110797 -2.50829893 2.11501947]
[-0.74772562 -2.33452022 2.10205674]
[-1.02882877 -2.75241342 2.17090017]]
As you can see, the two results are different. Is there a step missing somewhere in the transform() method?
Answer 0 (score: 0)
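I'm no great expert on PCA, but from looking at the sklearn source code I found your problem: you are taking the mean along the wrong axis. With samples in rows and features in columns, the mean has to be computed per feature (axis=0), not per sample (axis=1).
Here is the solution: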
Xm = X.mean(axis=0) # Axis 0 instead of 1
print pca.transform(X)[:5,:] #Method 1 - expected
X = X - Xm # No need for transpose now
print fast_dot(X,pca.components_.T)[:5,:] #Method 2 - manual
Result:
[[-2.68420713 0.32660731 -0.02151184]
[-2.71539062 -0.16955685 -0.20352143]
[-2.88981954 -0.13734561 0.02470924]
[-2.7464372 -0.31112432 0.03767198]
[-2.72859298 0.33392456 0.0962297 ]]
[[-2.68420713 0.32660731 -0.02151184]
[-2.71539062 -0.16955685 -0.20352143]
[-2.88981954 -0.13734561 0.02470924]
[-2.7464372 -0.31112432 0.03767198]
[-2.72859298 0.33392456 0.0962297 ]]
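To see why the axis matters, here is a minimal sketch that uses only NumPy. It is not scikit-learn's implementation: the data is synthetic (a stand-in with the same 150 x 4 shape as iris), the PCA is computed through an SVD, and all variable names are assumptions made for illustration. In scikit-learn, the per-feature mean plays the role of the fitted `pca.mean_` attribute.

```python
import numpy as np

# Synthetic stand-in for iris: 150 samples (rows), 4 features (columns).
rng = np.random.default_rng(0)
scale = np.array([3.0, 2.0, 1.0, 0.5])
offset = np.array([5.0, 3.0, 4.0, 1.0])
X = rng.normal(size=(150, 4)) * scale + offset

# PCA via SVD of the data centered per feature (assumed illustration,
# not the library's code path).
feature_mean = X.mean(axis=0)              # shape (4,): one mean per feature
U, S, Vt = np.linalg.svd(X - feature_mean, full_matrices=False)
components = Vt[:3]                        # shape (3, 4): rows are principal axes
reference = U[:, :3] * S[:3]               # scores, the role of pca.transform(X)

# Correct manual transform: subtract the per-feature mean, then project.
manual_ok = np.dot(X - feature_mean, components.T)

# The mistake from the question: subtract a per-sample mean instead.
sample_mean = X.mean(axis=1)               # shape (150,): one mean per sample
manual_wrong = np.dot(X - sample_mean[:, np.newaxis], components.T)

print(feature_mean.shape, sample_mean.shape)
print(np.allclose(manual_ok, reference))     # per-feature centering matches
print(np.allclose(manual_wrong, reference))  # per-sample centering does not
```

The shapes are a quick sanity check: a mean with one entry per feature broadcasts over rows without any transpose, whereas needing `Xm[None].T` or `Xm[:, np.newaxis]` is a hint that the mean was taken per sample.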