Question

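In this program, I am scanning a large number of samples taken as a time series of 40 x 64 x 64 images, one every 2.5 seconds. The number of "voxels" (3D pixels) in each image is therefore about 164,000 (40 * 64 * 64 = 163,840), and each voxel is a "feature" of an image sample.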
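To make the feature count concrete, here is a minimal sketch of flattening the image volumes into a sample-by-feature matrix. The array name `volumes` and the count of 5 volumes are placeholders for illustration, not values from my data.

```python
import numpy as np

# Hypothetical stack of 5 volumes, each 40 x 64 x 64 (placeholder data).
n_volumes = 5
volumes = np.zeros((n_volumes, 40, 64, 64), dtype=np.float32)

# Flatten every volume into one row: one column per voxel.
X = volumes.reshape(n_volumes, -1)
print(X.shape)  # (5, 163840)
```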

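I thought of using Recursive Feature Elimination (RFE), then following up with Principal Component Analysis (PCA) to reduce dimensionality, since the number of features n is so high.

There are 9 classes to predict, so this is a multi-class classification problem. Starting with RFE: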

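from sklearn.svm import SVC
from sklearn.feature_selection import RFE

# Linear SVM ranks the features; a fractional step removes that share of the features per iteration
estimator = SVC(kernel='linear')
rfe = RFE(estimator, n_features_to_select=20000, step=0.05)
rfe = rfe.fit(X_train, y_train)
X_best = rfe.transform(X_train)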
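As a sanity check on what RFE actually kept, the fitted selector can be inspected. This is only a sketch on synthetic data from `make_classification`; the sizes (120 samples, 40 features, 10 selected) are made up for illustration and are far smaller than my real data.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC

# Synthetic 3-class stand-in data (assumed sizes, not the real dataset).
X_demo, y_demo = make_classification(
    n_samples=120, n_features=40, n_informative=8,
    n_classes=3, random_state=0)

rfe_demo = RFE(SVC(kernel='linear'), n_features_to_select=10, step=0.05)
rfe_demo.fit(X_demo, y_demo)

# n_features_ and support_ report how many features survived;
# ranking_ is 1 for every kept feature and larger for earlier eliminations.
print('kept features  =', rfe_demo.n_features_)
print('support count  =', rfe_demo.support_.sum())
print('worst ranking  =', rfe_demo.ranking_.max())

X_sel = rfe_demo.transform(X_demo)
print('reduced shape  =', X_sel.shape)
```

If `support_.sum()` is the same for two different `n_features_to_select` settings, the setting is not being applied as intended.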

Now perform PCA:

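import numpy as np
from numpy.linalg import svd
from sklearn.preprocessing import scale
from sklearn.decomposition import PCA

# Standardize the RFE-selected features
X_best = scale(X_best)

def get_optimal_number_of_components():
    # Sample-by-sample Gram matrix; its singular values match the nonzero
    # eigenvalues of the feature covariance up to a constant factor
    cov = np.dot(X_best, X_best.transpose()) / float(X_best.shape[0])
    U, s, v = svd(cov)
    print('Shape of S = ', s.shape)

    S_nn = sum(s)

    # Smallest number of components that retains at least 99% of the variance
    for num_components in range(0, s.shape[0]):
        temp_s = s[0:num_components]
        S_ii = sum(temp_s)
        if (1 - S_ii / float(S_nn)) <= 0.01:
            return num_components

    return s.shape[0]

n_comp = get_optimal_number_of_components()
print('optimal number of components = ', n_comp)

pca = PCA(n_components=n_comp)
pca = pca.fit(X_best)
X_pca_reduced = pca.transform(X_best)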
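For comparison, scikit-learn's `PCA` can pick the number of components for a target variance fraction itself when `n_components` is a float in (0, 1) and `svd_solver='full'`. The sketch below uses random placeholder data (60 samples, 200 features), not my real features, and the 0.99 threshold mirrors the 1% cutoff in the function above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Random placeholder data: 60 samples, 200 features (assumed sizes).
rng = np.random.RandomState(0)
X_rand = rng.randn(60, 200)
X_scaled = StandardScaler().fit_transform(X_rand)

# Keep just enough components to explain at least 99% of the variance.
pca_demo = PCA(n_components=0.99, svd_solver='full')
X_reduced = pca_demo.fit_transform(X_scaled)

print('components kept    =', pca_demo.n_components_)
print('variance explained =', pca_demo.explained_variance_ratio_.sum())
```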

Train an SVM on the PCA-reduced dataset:

# Note: gamma has no effect with a linear kernel
svm = SVC(kernel='linear', C=1, gamma=0.0001)
svm = svm.fit(X_pca_reduced, y_train)

Now apply the RFE and PCA transforms to the test set and make predictions:

# Standardize the test set, then apply the fitted RFE and PCA transforms
X_test = scale(X_test)
X_rfe = rfe.transform(X_test)
X_pca = pca.transform(X_rfe)

predictions = svm.predict(X_pca)

print('predictions = ', predictions)
print('actual = ', y_test)
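One detail in the steps above is that `scale(X_test)` standardizes the test set with its own mean and standard deviation rather than the training statistics. I am not sure this affects the result, but a `Pipeline` keeps every step fitted on training data only. The following is a sketch on synthetic data with assumed sizes; scaling before RFE (rather than after, as in my code) is a choice made for the sketch.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.feature_selection import RFE
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 3-class stand-in data (assumed sizes, not the real dataset).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=50, n_informative=10,
    n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.25, stratify=y_demo, random_state=0)

pipe = Pipeline([
    ('scale', StandardScaler()),                         # fitted on training data only
    ('rfe', RFE(SVC(kernel='linear'),
                n_features_to_select=20, step=0.1)),     # feature elimination
    ('pca', PCA(n_components=0.99, svd_solver='full')),  # keep 99% of the variance
    ('svm', SVC(kernel='linear', C=1)),                  # final classifier
])
pipe.fit(X_tr, y_tr)

# score() pushes the test data through the already-fitted steps.
accuracy = pipe.score(X_te, y_te)
print('test accuracy = %.4f' % accuracy)
```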

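I trained it on a subset of my data and got 76.92% accuracy. I'm not too worried about this number being low, since the model was trained on only 1/12 of my dataset.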
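A side note on the 76.92% figure: it is exactly what 10 correct predictions out of 13 would give. If the test split for this subset is small (an assumption; the split size is not recorded here), accuracy can only move in coarse steps. A quick sketch that lists the small-sample fractions matching that figure:

```python
# Find k correct out of n samples (n <= 40) that round to 76.92% accuracy.
target = 76.92
matches = [(k, n)
           for n in range(1, 41)
           for k in range(n + 1)
           if abs(100.0 * k / n - target) < 0.005]
print(matches)  # [(10, 13), (20, 26), (30, 39)]
```

With 13 test samples, for example, one extra correct prediction changes the accuracy by about 7.7 percentage points.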

I tried doubling the training size and got 92% accuracy, which is pretty good. But then I trained on the entire dataset and saw 92.5% accuracy.

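So a 6x increase in dataset size improved accuracy by only 0.5%. Also, the data samples are not noisy, so the samples themselves are not the problem.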

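Also, with 1/12 of the dataset as the training size, when I choose n_features_to_select = 1000 I get the same 76.92% (the same as when RFE was run with 20000!). Something is definitely wrong here. Why do I get the same performance when selecting so few features?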
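A single train/test split may hide differences between settings, so one way to compare several `n_features_to_select` values is stratified cross-validation. This is a sketch on synthetic data; the candidate values (5, 10, 20) and the data sizes are assumptions chosen to run quickly, not the 1000 and 20000 from my experiment.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic 3-class stand-in data (assumed sizes, not the real dataset).
X_demo, y_demo = make_classification(
    n_samples=150, n_features=40, n_informative=8,
    n_classes=3, random_state=0)

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for k in (5, 10, 20):
    # RFE sits inside the pipeline, so it is refitted on each training fold.
    pipe = make_pipeline(
        StandardScaler(),
        RFE(SVC(kernel='linear'), n_features_to_select=k, step=0.1),
        SVC(kernel='linear', C=1))
    scores = cross_val_score(pipe, X_demo, y_demo, cv=cv)
    results[k] = (scores.mean(), scores.std())
    print('k = %2d  accuracy = %.3f +/- %.3f' % (k, scores.mean(), scores.std()))
```

The fold-to-fold standard deviation gives a rough sense of whether two settings are really indistinguishable or just tied on one split.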

RFE gives the same accuracy for different numbers of selected features

0 answers: