I implemented PCA followed by a Naive Bayes classifier using sklearn, and I used GridSearchCV to optimize the number of PCA components.
I am trying to find out the feature names of the best estimator, but I was not able to. Here is the code I have tried.
from sklearn import decomposition
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

features_train, features_test, labels_train, labels_test = \
    train_test_split(features, labels, test_size=0.3, random_state=42)

### A Naive Bayes classifier combined with PCA is used and its accuracy is tested
pca = decomposition.PCA()
clf = Pipeline(steps=[('pca', pca), ('gaussian_NB', GaussianNB())])
n_components = [3, 5, 7, 9]
clf = GridSearchCV(clf, dict(pca__n_components=n_components))
clf = clf.fit(features_train, labels_train)
features_pred = clf.predict(features_test)
print("The number of components of the best estimator is",
      clf.best_estimator_.named_steps['pca'].n_components)
print("The best parameters:", clf.best_params_)
estimator = clf.best_estimator_
# The next line fails: the pipeline has no step named 'features'
print("The features are:", estimator['features'].get_feature_names())
Answer 0 (score: 2)
You seem to be confusing dimensionality reduction with feature selection. PCA is a dimensionality-reduction technique: it does not select features, it looks for a lower-dimensional linear projection. The features you end up with are not your original ones; they are linear combinations of them. So if your original features were "width", "height" and "age", then after PCA down to dim 2 you end up with features like "0.4*width + 0.1*height - 0.05*age" and "0.3*height - 0.2*width".
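To make this concrete, here is a minimal sketch (the feature names "width", "height" and "age" and the random data are hypothetical, just echoing the example above) showing that each fitted PCA component is a weight vector over the original features, not a subset of them:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(42)
# Hypothetical data: 100 samples of width, height, age
X = rng.rand(100, 3)
feature_names = ['width', 'height', 'age']

pca = PCA(n_components=2).fit(X)

# Each row of components_ holds the weights of one linear combination
# of the three original features
for i, component in enumerate(pca.components_):
    terms = " + ".join("%.2f*%s" % (w, name)
                       for w, name in zip(component, feature_names))
    print("PC%d = %s" % (i + 1, terms))
```

There is no `get_feature_names()` to recover here: the derived features simply have no names of their own, only these weights.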
Answer 1 (score: 1)
It seems this answer may be what you are after. It also contains a very nice and thorough example!
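If the goal is to see how the selected components relate to the original inputs, the component weights can be read off the fitted pipeline inside GridSearchCV. A self-contained sketch, using the iris dataset as a stand-in since the original features/labels are not shown:

```python
from sklearn import decomposition
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

# Stand-in data (4 original features, 3 classes)
features, labels = load_iris(return_X_y=True)
features_train, _, labels_train, _ = train_test_split(
    features, labels, test_size=0.3, random_state=42)

pipe = Pipeline([('pca', decomposition.PCA()),
                 ('gaussian_NB', GaussianNB())])
clf = GridSearchCV(pipe, {'pca__n_components': [2, 3]})
clf.fit(features_train, labels_train)

# Pull the fitted PCA step out of the best pipeline
best_pca = clf.best_estimator_.named_steps['pca']

# components_ has shape (n_components, n_original_features): each row
# gives the weight of every original feature in one derived component
print("chosen n_components:", clf.best_params_['pca__n_components'])
print("component weights shape:", best_pca.components_.shape)
```

This is the same `named_steps['pca']` access already used in the question for `n_components`; `components_` (and `explained_variance_ratio_`) are the attributes that describe what those components are made of.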