How to get the mean, standard deviation, and p-value of accuracy scores in 10-fold cross-validation

Time: 2018-07-19 12:29:22

Tags: random-forest cross-validation feature-extraction

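Hello, I am running random forest recursive feature elimination (RFECV) with 10-fold cross-validation, so I need to report the mean, standard deviation, and p-value of the accuracy across all 10 folds (for each dataset I use). For the life of me, I can't figure out how to do this.

Here is the code I am running: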

# random forest#######################################
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.feature_selection import RFECV
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification

# Build a classification task using 8 informative features
# If you want to reproduce the problem
X, y = make_classification(n_samples=1000, n_features=75, n_informative=8,
                           n_redundant=2, n_repeated=0, n_classes=8,
                           n_clusters_per_class=1, random_state=0)

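# Split data into train and test sets.
# train_test_split lives in sklearn.model_selection; the old
# sklearn.cross_validation module has been removed from scikit-learn.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3)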

# Create the RFE object and compute a cross-validated score.
rfc = RandomForestClassifier(n_estimators=128)
# The "accuracy" scoring is proportional to the number of correct
# classifications
rfecv = RFECV(estimator=rfc, step=1, cv=StratifiedKFold(10),
              scoring='accuracy')
rfecv.fit(X_train, y_train)

print("Optimal number of features : %d" % rfecv.n_features_)
print(rfecv.ranking_)

# Plot number of features VS. cross-validation scores
plt.figure()
plt.xlabel("Number of features selected")
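plt.ylabel("Cross validation score (mean accuracy)")
# grid_scores_ was removed in scikit-learn 1.2; cv_results_ (available
# since 1.0) holds the mean test score for each number of features.
mean_scores = rfecv.cv_results_["mean_test_score"]
plt.plot(range(1, len(mean_scores) + 1), mean_scores)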
plt.show()

# Evaluate the selected feature subset on the held-out test split
ranking = rfecv.ranking_
y_hats = rfecv.predict(X_test)
# predict() already returns class labels, so no rounding is needed
accuracy = accuracy_score(y_test, y_hats)
print("Test Accuracy: %.2f%%" % (accuracy*100.0))

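I added make_classification so that you can reproduce the problem. I am actually working with different datasets; I expect the code to work the same way on them, though I am not sure, and I included the synthetic data to follow the SO guidelines for posting questions. Apologies in advance if this is not right. Thanks!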
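
For reference, the per-fold statistics asked about above could be obtained roughly as follows. This is only a sketch under assumptions, not a confirmed answer: it scores a plain random forest (not the RFECV object) with `cross_val_score`, and it takes the p-value to mean a label-permutation test via `permutation_test_score`, which may not be the test the asker needs. The smaller dataset, `n_estimators=30`, `n_permutations=30`, and all variable names are illustrative choices made to keep the run short.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     permutation_test_score)

# Small synthetic problem (assumed sizes, chosen only for speed)
X, y = make_classification(n_samples=400, n_features=20, n_informative=8,
                           n_redundant=2, n_repeated=0, n_classes=8,
                           n_clusters_per_class=1, random_state=0)

clf = RandomForestClassifier(n_estimators=30, random_state=0)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# One accuracy value per fold (10 values in total)
fold_scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
mean_acc = fold_scores.mean()
std_acc = fold_scores.std(ddof=1)  # sample standard deviation over folds

# Permutation test: refit on shuffled labels to estimate how often a
# score at least this good would arise by chance
score, perm_scores, p_value = permutation_test_score(
    clf, X, y, cv=cv, scoring="accuracy",
    n_permutations=30, random_state=0)

print("Accuracy per fold:", np.round(fold_scores, 3))
print("Mean: %.3f  Std: %.3f  p-value: %.4f" % (mean_acc, std_acc, p_value))
```

Because `RFECV` is itself an estimator, it could in principle be passed in place of `clf` so that feature selection is repeated inside each outer fold, at a much higher computational cost. With `n_permutations=30` the smallest p-value this test can report is 1/31, so a larger number of permutations would be needed for finer resolution.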

0 Answers:

No answers