Pipeline with StandardScaler, recursive feature selection, and a classifier

Time: 2018-07-18 22:41:53

Tags: scikit-learn

I have a given dataset X and Y. I want to implement the following steps using a pipeline:

- StandardScaler
- Recursive feature selection
- RandomForestClassifier
- Cross-validated prediction (cross_val_predict)

I implemented it as follows:

import numpy as np 
from sklearn.feature_selection import RFE, RFECV
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, KFold
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.datasets import load_iris

data = load_iris()

X = data.data
Y = data.target

print(X.shape)
print(Y.shape)

clf = RandomForestClassifier(n_estimators=50,max_features=None,n_jobs=-1,random_state=0)
kf = KFold(n_splits=2, shuffle=True, random_state=0)
pipeline = Pipeline([('standardscaler', StandardScaler()),
                     ('rfecv', RFECV(estimator=clf, step=1, cv=kf, scoring='accuracy', n_jobs=7)),
                     ('clf', clf)])

pipeline.fit(X,Y)

ypredict = cross_val_predict(pipeline, X, Y, cv=kf)
accuracy = accuracy_score(Y, ypredict)

print(accuracy)

Please look closely at my implementation and let me know what is wrong with my code. Thanks.

1 Answer:

Answer 0: (score: -1)

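This works. The final estimator in a pipeline only needs to implement fit, which RFECV does. RFECV also implements predict, using the estimator refit on the selected features, so it can serve as the last step for cross_val_predict. Here is the code: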

from sklearn.feature_selection import RFECV
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_predict, KFold
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.datasets import load_iris

data = load_iris()

X = data.data
Y = data.target

clf = RandomForestClassifier()

# define the pipeline steps: scaling, then recursive feature elimination with CV
estimators = [('standardize', StandardScaler()),
              ('rfecv', RFECV(estimator=clf, scoring='accuracy'))]

# build the pipeline
pipeline = Pipeline(estimators)

# run the pipeline
kf = KFold(n_splits=2, shuffle=True, random_state=0)
ypredict = cross_val_predict(pipeline, X, Y, cv=kf)
accuracy = accuracy_score(Y, ypredict)

print(accuracy)

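Output (the exact value can vary between runs, since RandomForestClassifier() is created without a fixed random_state):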
0.96
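
To see what the RFECV step actually kept, the fitted pipeline can be inspected through named_steps. The following is a minimal sketch rather than part of the original answer: the fixed random_state values are assumptions added for repeatability, n_estimators=50 is borrowed from the question, and the step names match the answer's pipeline.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.model_selection import KFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, Y = load_iris(return_X_y=True)

# Assumption: seeds are fixed here only so repeated runs select the same features.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
kf = KFold(n_splits=2, shuffle=True, random_state=0)

pipeline = Pipeline([
    ('standardize', StandardScaler()),
    ('rfecv', RFECV(estimator=clf, step=1, cv=kf, scoring='accuracy')),
])

# Fit on the full data, then look at the fitted RFECV step.
pipeline.fit(X, Y)
selector = pipeline.named_steps['rfecv']

print(selector.n_features_)  # number of features RFECV decided to keep
print(selector.support_)     # boolean mask over the original columns
print(selector.ranking_)     # rank 1 marks a selected feature
```

Note that when the same pipeline is passed to cross_val_predict, it is cloned and refit on each training fold, so the features selected inside each fold may differ from the ones printed here.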