Question

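I'm following Principal component analysis in Python to use PCA in Python, but I'm struggling to work out which features to select (i.e. which of my columns/features have the most variance).

When I use scipy.linalg.svd, it automatically sorts my singular values, so I can't tell which column each one belongs to.

Example code: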

import numpy as np
from scipy.linalg import svd
M = [
     [1, 1, 1, 1, 1, 1],
     [3, 3, 3, 3, 3, 3],
     [2, 2, 2, 2, 2, 2],
     [9, 9, 9, 9, 9, 9]
]
M = np.transpose(np.array(M))
U, s, Vt = svd(M, full_matrices=False)
# s holds the singular values, sorted in descending order
print(s)

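Is there a different way to go about this so that the singular values are not returned sorted?

Update: It looks like this may not be possible, at least according to this post on the Matlab forum: http://www.mathworks.com/matlabcentral/newsreader/view_thread/241607. If anyone knows otherwise, please let me know :).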
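
A minimal sketch of why the sorted order is not the real obstacle, using the same M as above: each singular value is paired with a direction (a row of Vt), not with a column of M. The input columns appear only as the entries of those rows. The data is left uncentered here, as in the example code; a PCA would normally center each column first.

```python
import numpy as np
from scipy.linalg import svd

# Same data as in the question: 6 samples (rows) x 4 features (columns).
M = np.transpose(np.array([
    [1, 1, 1, 1, 1, 1],
    [3, 3, 3, 3, 3, 3],
    [2, 2, 2, 2, 2, 2],
    [9, 9, 9, 9, 9, 9],
], dtype=float))

U, s, Vt = svd(M, full_matrices=False)

# s[k] pairs with the direction Vt[k], not with column k of M.
# Vt[k, j] is the weight of input column j in direction k.
for k in range(len(s)):
    print("singular value %.4f -> direction %s" % (s[k], np.round(Vt[k], 4)))
```

Because every sample in M is identical, only the first singular value is nonzero, and its direction is proportional to [1, 3, 2, 9] (up to sign).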

Answer 1

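I was under the mistaken impression that PCA performs feature selection; it actually performs feature extraction.

That is, PCA creates a series of new features, each of which is a combination of the input features.

If you really want to do feature selection based on PCA, you can look at the weights of the input features on the features that PCA creates. For example, the matplotlib.mlab.PCA class exposes these weights as an attribute: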
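
To illustrate what "a combination of the input features" means, here is a small sketch using plain NumPy. The data is made up and the variable names are illustrative only; it assumes PCA computed via an SVD of the centered data.

```python
import numpy as np

rng = np.random.default_rng(0)  # made-up data, for illustration only
X = rng.normal(size=(100, 3))
X[:, 2] = X[:, 0] + 0.1 * rng.normal(size=100)  # column 2 nearly duplicates column 0

Xc = X - X.mean(axis=0)  # PCA works on centered data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

# Each new feature (principal component score) is a weighted sum of
# all centered input columns; the weights are the rows of Vt.
scores = Xc @ Vt.T

# Equivalent view: the scores are U with each column scaled by its singular value.
assert np.allclose(scores, U * s)

# First new feature, written out explicitly as a combination of the columns.
pc1 = Vt[0, 0] * Xc[:, 0] + Vt[0, 1] * Xc[:, 1] + Vt[0, 2] * Xc[:, 2]
assert np.allclose(pc1, scores[:, 0])
```

None of the new features is one of the original columns, which is why PCA on its own does not select features.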

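# Requires matplotlib < 3.1; matplotlib.mlab.PCA was removed in later releases.
from matplotlib.mlab import PCA
res = PCA(data)  # data: 2-D array, rows are samples, columns are features
print("weights of input vectors: %s" % res.Wt)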
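
Where matplotlib.mlab.PCA is not available, a comparable weight matrix can be sketched with NumPy alone. `pca_weights` below is a hypothetical helper, not part of any library, and the data is made up. The sketch only centers the data, so its numbers need not match res.Wt.

```python
import numpy as np

def pca_weights(data):
    """Return (weights, explained_ratio) for a samples x features array.

    Hypothetical helper, not part of any library. Row k of `weights`
    holds the weight of every input feature in principal component k.
    """
    data = np.asarray(data, dtype=float)
    centered = data - data.mean(axis=0)
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    # Share of the total variance captured by each component.
    explained_ratio = s ** 2 / np.sum(s ** 2)
    return Vt, explained_ratio

rng = np.random.default_rng(1)  # made-up data, for illustration only
# Four independent features with different spreads; column 1 varies the most.
data = rng.normal(size=(200, 4)) * np.array([1.0, 5.0, 0.5, 2.0])

weights, ratio = pca_weights(data)
# Rank input features by the magnitude of their weight in the first component.
order = np.argsort(-np.abs(weights[0]))
print("explained variance ratio:", np.round(ratio, 3))
print("features ranked by |weight| on PC1:", order)
```

Ranking by weight on unscaled data favors columns with large numeric spread, so standardizing the columns first may be more appropriate when the features have different units.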

It sounds like the feature extraction route is the way to go when using PCA.

How to use PCA / SVD in Python for feature selection and identification?
