I implemented PCA followed by a Naive Bayes classifier using sklearn, and I used GridSearchCV to optimize the number of PCA components.
I am trying to find out the feature names of the best estimator, but I was not able to. Here is the code I have tried.
from sklearn import decomposition
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

features_train, features_test, labels_train, labels_test = \
    train_test_split(features, labels, test_size=0.3, random_state=42)

### A Naive Bayes classifier combined with PCA is used and its accuracy is tested
pca = decomposition.PCA()
clf = Pipeline(steps=[('pca', pca), ('gaussian_NB', GaussianNB())])
n_components = [3, 5, 7, 9]
clf = GridSearchCV(clf, dict(pca__n_components=n_components))
clf = clf.fit(features_train, labels_train)
features_pred = clf.predict(features_test)
print("The number of components of the best estimator is",
      clf.best_estimator_.named_steps['pca'].n_components)
print("The best parameters:", clf.best_params_)
estimator = clf.best_estimator_
# The next line fails: the pipeline has no step named 'features'
print("The features are:", estimator['features'].get_feature_names())
Answer 0 (score: 2)
You seem to be confusing dimensionality reduction with feature selection. PCA is a dimensionality-reduction technique: it does not select features, it looks for a lower-dimensional linear projection. The features you end up with are not your original ones; they are linear combinations of them. So if your original features were "width", "height" and "age", then after PCA down to dim 2 you end up with features like "0.4*width + 0.1*height - 0.05*age" and "0.3*height - 0.2*width".
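To make this concrete, here is a minimal sketch (the feature names "width", "height" and "age" and the random data are hypothetical, just echoing the example above) showing that each fitted PCA component is a weight vector over the original features, not a subset of them:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(42)
# Hypothetical data: 100 samples of width, height, age
X = rng.rand(100, 3)
feature_names = ['width', 'height', 'age']

pca = PCA(n_components=2).fit(X)

# Each row of components_ holds the weights of one linear combination
# of the three original features
for i, component in enumerate(pca.components_):
    terms = " + ".join("%.2f*%s" % (w, name)
                       for w, name in zip(component, feature_names))
    print("PC%d = %s" % (i + 1, terms))
```

There is no `get_feature_names()` to recover here: the derived features simply have no names of their own, only these weights.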
Answer 1 (score: 1)
It seems this answer may be what you are after. It also contains a very nice and thorough example!
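If the goal is to see how the selected components relate to the original inputs, the component weights can be read off the fitted pipeline inside GridSearchCV. A self-contained sketch, using the iris dataset as a stand-in since the original features/labels are not shown:

```python
from sklearn import decomposition
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

# Stand-in data (4 original features, 3 classes)
features, labels = load_iris(return_X_y=True)
features_train, _, labels_train, _ = train_test_split(
    features, labels, test_size=0.3, random_state=42)

pipe = Pipeline([('pca', decomposition.PCA()),
                 ('gaussian_NB', GaussianNB())])
clf = GridSearchCV(pipe, {'pca__n_components': [2, 3]})
clf.fit(features_train, labels_train)

# Pull the fitted PCA step out of the best pipeline
best_pca = clf.best_estimator_.named_steps['pca']

# components_ has shape (n_components, n_original_features): each row
# gives the weight of every original feature in one derived component
print("chosen n_components:", clf.best_params_['pca__n_components'])
print("component weights shape:", best_pca.components_.shape)
```

This is the same `named_steps['pca']` access already used in the question for `n_components`; `components_` (and `explained_variance_ratio_`) are the attributes that describe what those components are made of.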