sklearn: TypeError: All estimators should implement fit and transform

Asked: 2018-03-23 09:58:59

Tags: python scikit-learn

I am trying to use FeatureUnion() to combine several features: dentisty_undictionary, file_length, and tdm, where tdm is a vector produced by TfidfVectorizer(). Here is the code:

process_features = Pipeline(
    [
        ('features',FeatureUnion(transformer_list=[('dentisty_undictionary',train_set.dentisty_undictionary),
                                ('file_length',train_set.file_length),
                                ('tdm',train_set.tdm)])),
        ('svc', SVC(kernel='linear')),
    ])

Then I get this error:

Traceback (most recent call last):
  File "NBayes_Predict_FeatureUnion.py", line 29, in <module>
    ('tdm',train_set.tdm)])),
  File "C:\Python27\lib\site-packages\sklearn\pipeline.py", line 622, in __init__
    self._validate_transformers()
  File "C:\Python27\lib\site-packages\sklearn\pipeline.py", line 666, in _validate_transformers
    (t, type(t)))
TypeError: All estimators should implement fit and transform.'[0.8125, 0.7597402597402597, 0.7703513281919452,.......,0.7914338919925512]' (type <type 'list'>) doesn't

I am new to sklearn. Any suggestions for solving this error would be welcome. Thanks!

1 Answer:

Answer 0: (score: 2)

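FeatureUnion works with sklearn-style class objects that implement the fit() and transform() methods. You are passing data (columns) to FeatureUnion instead of transformer objects, which is what causes the error.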
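
To make the distinction concrete, here is a minimal sketch of the contract involved. The `ColumnSelector` class and the `looks_like_transformer` helper are hypothetical names invented for this illustration, and the check only approximates the kind of validation sklearn performs; it is not the library's actual code.

```python
import numpy as np


class ColumnSelector:
    """Hypothetical transformer: picks one column out of a 2-D array."""

    def __init__(self, index):
        self.index = index

    def fit(self, X, y=None):
        # Nothing to learn; return self so calls can be chained.
        return self

    def transform(self, X):
        # Return a 2-D (n_samples, 1) array so results can be stacked side by side.
        return np.asarray(X)[:, [self.index]]


def looks_like_transformer(obj):
    # Approximation of the contract: the object must expose fit and transform.
    return hasattr(obj, "fit") and hasattr(obj, "transform")


raw_column = [0.8125, 0.7597, 0.7704]      # plain data, like the list in the traceback
selector = ColumnSelector(index=0)         # an object that can be fitted

print(looks_like_transformer(raw_column))  # False: a list has no fit/transform
print(looks_like_transformer(selector))    # True
```

A plain list fails the check because it is data, not an object that knows how to produce features from data.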

Remove the FeatureUnion and the Pipeline, and pass the required columns directly to SVC:

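from sklearn.svm import SVC

# train_set is the DataFrame from the question; y holds the class labels (not shown there)
train_data = train_set[['dentisty_undictionary', 'file_length', 'tdm']]
model = SVC(kernel='linear')
model.fit(train_data, y)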

See the examples here for the correct usage of FeatureUnion.
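
If you do want to keep FeatureUnion, one possible shape is sketched below. It assumes that train_set is a pandas DataFrame and that the raw text is available in a column, here called `text`. That column name, the toy data, the labels, and the helper functions `numeric_column` and `text_column` are all assumptions made for illustration and do not come from the question. Each column is wrapped in a FunctionTransformer so that FeatureUnion receives transformer objects rather than data.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC

# Toy stand-in for train_set; the `text` column and all values are made up.
train_set = pd.DataFrame({
    "text": ["free money now", "meeting at noon", "win free prize", "lunch at noon"],
    "dentisty_undictionary": [0.81, 0.10, 0.77, 0.12],
    "file_length": [120, 45, 130, 50],
})
y = [1, 0, 1, 0]  # hypothetical class labels


def numeric_column(name):
    # Wrap column selection in a transformer; output is 2-D (n_samples, 1).
    return FunctionTransformer(lambda df: df[[name]].to_numpy(), validate=False)


def text_column(name):
    # TfidfVectorizer expects a 1-D sequence of strings, so return the Series itself.
    return FunctionTransformer(lambda df: df[name], validate=False)


process_features = Pipeline([
    ("features", FeatureUnion(transformer_list=[
        ("dentisty_undictionary", numeric_column("dentisty_undictionary")),
        ("file_length", numeric_column("file_length")),
        # The TF-IDF step is fitted inside the pipeline instead of being precomputed.
        ("tdm", Pipeline([
            ("select", text_column("text")),
            ("tfidf", TfidfVectorizer()),
        ])),
    ])),
    ("svc", SVC(kernel="linear")),
])

process_features.fit(train_set, y)
predictions = process_features.predict(train_set)
print(predictions.shape)  # (4,)
```

In this sketch the TF-IDF vectorizer lives inside the pipeline, so it is refitted together with the classifier and sees only the data passed to fit(), which may differ from how tdm was built in the question.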