Custom transformers in sklearn

Date: 2018-09-13 13:45:48

Tags: python scikit-learn

I tried to build a pipeline and add my custom transformers, as follows:

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class DataFrameSelector(BaseEstimator, TransformerMixin):
    # Keeps only the named columns of a DataFrame.
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[list(self.attribute_names)]

class DummyTransform(BaseEstimator, TransformerMixin):
    # One-hot encodes categorical columns and returns a NumPy array.

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return pd.get_dummies(X).values

But when I do this:

RF = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=3)

pipe= Pipeline(steps=[
    ('Selector', DataFrameSelector(attribute_names=('lat','long','type'))),   # selects the second and 4th column      
    ('Encoder', DummyTransform() ) 
    ('clf',RF)
    ])
rforest=pipe.fit(X_train,Y_train)

I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-168-108f5c7552a0> in <module>()
      4     ('Selector', DataFrameSelector(attribute_names=('lat','long','type'))),   # selects the second and 4th column
      5     ('Encoder', DummyTransform() )
----> 6     ('clf',RF)
      7     ])
      8 rforest=pipe.fit(X_train,Y_train)

TypeError: 'tuple' object is not callable

Why???

PS: Strangely, this works:

RF=RandomForestClassifier(n_estimators=100,oob_score=True,random_state=3)

pipe= Pipeline(steps=[
    ('Selector', DataFrameSelector(attribute_names=('lat','long','type'))),   # selects the second and 4th column      
    ('Encoder', DummyTransform() ) 
    #('clf',DecisionTreeClassifier())
    ])
X=pipe.fit_transform(X_train,Y_train)
RF.fit(X,Y_train)

Edit: RF refers to this line of code:
    RF = RandomForestClassifier(n_estimators = 100,oob_score = True,random_state = 3)

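1 Answer:

Answer 0: (score: 2)

You are missing a comma at the end of the line above the one flagged in the error. Without it, Python reads the ('Encoder', DummyTransform()) tuple followed by ('clf', RF) as an attempt to call the first tuple with two arguments, which raises "'tuple' object is not callable". Your second version works once the clf line is commented out, because the Encoder tuple is then the last item in the list, and the last item does not need a trailing comma.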
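
The mechanism can be reproduced without scikit-learn. The following is a minimal sketch, assuming placeholder strings in place of the real transformer and classifier objects (the strings and the names encoder_step, error_message and fixed_steps are illustrative and not from the question). It shows that placing two tuples next to each other with no comma amounts to calling the first one, and that adding the comma yields an ordinary list of steps.

```python
# Placeholder strings stand in for the transformer and classifier objects;
# they are illustrative only and not part of the original question.
encoder_step = ('Encoder', 'DummyTransform()')

# Without a separating comma, Python reads
#     ('Encoder', DummyTransform() )
#     ('clf', RF)
# as a call: the first tuple is "called" with the arguments 'clf' and RF.
try:
    encoder_step('clf', 'RF')
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)  # 'tuple' object is not callable

# With the comma restored, the list simply holds one tuple per step.
fixed_steps = [
    ('Selector', 'DataFrameSelector(...)'),
    ('Encoder', 'DummyTransform()'),  # <- the comma that was missing
    ('clf', 'RF'),
]
print(len(fixed_steps))  # 3
```

In the original pipeline the same fix is a single character: end the Encoder line with a comma so that ('clf', RF) becomes a separate list item. Leaving a trailing comma after every step, including the last one, is a common way to avoid this when steps are later added or commented out.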