Question

I'm trying to run LDA (Linear Discriminant Analysis) to reduce the dimensionality of my dataset (features, a 1360x532 matrix) from its 532 features.

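import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

lda = LinearDiscriminantAnalysis(n_components=80)  # ask for 80 output dimensions
features = lda.fit(features, target).transform(features)
print("[STATUS] LDA performed")
print("[STATUS] feature vector size {}".format(np.array(features).shape))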

I wrote this code expecting to end up with 80 features, but instead I got this unexpected output:

[STATUS] target labels shape: (1360,)
/home/robb/.local/lib/python2.7/site-packages/sklearn/discriminant_analysis.py:388: UserWarning: Variables are collinear.
  warnings.warn("Variables are collinear.")
[STATUS] LDA performed
[STATUS] feature vector size (1360, 16)

Why 16? Is it related in some way to the warning I received?

Answer 1

LDA doesn't behave the way you expect.

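The number of components is always, always less than the number of unique classes: you get at most n_classes - 1 of them.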
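Why the cap exists: in the classical formulation, LDA takes its directions from the generalized eigenproblem S_B w = λ S_W w, where the between-class scatter S_B is a size-weighted sum of outer products of the C class-mean deviations from the overall mean. Those weighted deviations sum to zero, so S_B has rank at most C - 1, and at most C - 1 eigenvalues are non-zero. The NumPy sketch below checks this on random data; the 1360 x 532 shape mirrors the question, while the 17 classes and the random values are assumptions, not the asker's data.

```python
import numpy as np

# Hypothetical data shaped like the question's: 1360 samples, 532 features.
# The 17 classes are an assumption (17 * 80 = 1360, so each class gets 80 rows).
rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 1360, 532, 17
X = rng.normal(size=(n_samples, n_features))
y = np.arange(n_samples) % n_classes

overall_mean = X.mean(axis=0)

# Between-class scatter: S_B = sum_c n_c (mu_c - mu)(mu_c - mu)^T
S_B = np.zeros((n_features, n_features))
for c in np.unique(y):
    X_c = X[y == c]
    diff = (X_c.mean(axis=0) - overall_mean)[:, None]  # column vector
    S_B += X_c.shape[0] * diff @ diff.T

# The weighted deviations sum to zero, so the rank is at most n_classes - 1.
rank = np.linalg.matrix_rank(S_B)
print(rank)
```

With 17 classes the rank comes out as 16 even though there are 532 features, so there are only 16 discriminant directions to keep.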

From the docs:

n_components: Number of components (<= min(n_classes - 1, n_features)) for dimensionality reduction.

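My guess is that your target variable has 17 unique class labels, so even though you asked for 80 components (more than 17), LDA can return at most 17 - 1 = 16, which is exactly what you got.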
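In practice you can count the classes yourself and clamp the request before building the model. A minimal sketch, assuming target is a 1-D array of class labels; max_lda_components is a hypothetical helper, and the 17-class labels are a stand-in for the real target:

```python
import numpy as np


def max_lda_components(target, n_features):
    """Hypothetical helper: upper bound on LDA output dimensions,
    min(n_classes - 1, n_features)."""
    n_classes = np.unique(target).size
    return min(n_classes - 1, n_features)


# Stand-in labels: 1360 samples spread evenly over 17 classes (a guess).
target = np.repeat(np.arange(17), 80)

requested = 80
n_components = min(requested, max_lda_components(target, n_features=532))
print(np.unique(target).size, n_components)  # → 17 16
```

You could then pass n_components to LinearDiscriminantAnalysis, or leave n_components=None, which scikit-learn documents as falling back to min(n_classes - 1, n_features). Newer scikit-learn releases raise a ValueError rather than silently truncating when n_components exceeds that bound.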
