I'm trying to use FeatureUnion() to combine several features: dentisty_undictionary, file_length and tdm, where tdm is a vector produced by TfidfVectorizer(). Here is the code:
process_features = Pipeline(
[
('features',FeatureUnion(transformer_list=[('dentisty_undictionary',train_set.dentisty_undictionary),
('file_length',train_set.file_length),
('tdm',train_set.tdm)])),
('svc', SVC(kernel='linear')),
])
Then I get this error:
Traceback (most recent call last):
  File "NBayes_Predict_FeatureUnion.py", line 29, in <module>
    ('tdm',train_set.tdm)])),
  File "C:\Python27\lib\site-packages\sklearn\pipeline.py", line 622, in __init__
    self._validate_transformers()
  File "C:\Python27\lib\site-packages\sklearn\pipeline.py", line 666, in _validate_transformers
    (t, type(t)))
TypeError: All estimators should implement fit and transform. '[0.8125, 0.7597402597402597, 0.7703513281919452,.......,0.7914338919925512]' (type <type 'list'>) doesn't
I'm happy with sklearn so far. Any suggestions for fixing this error would be welcome. Thanks!
Answer 0 (score: 2)
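FeatureUnion works on scikit-learn transformer objects, i.e. classes that implement the fit() and transform() methods. You are passing it data (columns) instead, which is what causes the error.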
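For contrast, here is a minimal sketch of a FeatureUnion that does receive transformers. It assumes the raw documents are available as a list of strings; the toy docs and labels, and the doc_length helper standing in for file_length, are hypothetical. dentisty_undictionary is left out because the question doesn't say how it is computed.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import SVC


def doc_length(docs):
    # Hypothetical stand-in for the question's file_length feature:
    # returns one column holding the character count of each document.
    return np.array([[len(d)] for d in docs], dtype=float)


# Every entry in transformer_list is an object with fit() and transform(),
# not a precomputed list or matrix of feature values.
features = FeatureUnion(transformer_list=[
    ('tdm', TfidfVectorizer()),
    ('file_length', FunctionTransformer(doc_length, validate=False)),
])

process_features = Pipeline([
    ('features', features),
    ('svc', SVC(kernel='linear')),
])

# Hypothetical toy data: raw texts go in, FeatureUnion builds the features.
docs = ["spam spam eggs", "ham and eggs", "spam offer now", "lunch with ham"]
labels = [1, 0, 1, 0]

process_features.fit(docs, labels)
prediction = process_features.predict(["spam eggs offer"])
print(prediction)
```

Here the union outputs the TF-IDF columns followed by one length column, and the whole pipeline is fitted on raw text, so the same feature extraction is reapplied automatically at predict time.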
Drop FeatureUnion and Pipeline and pass the columns you need directly to SVC:
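from sklearn.svm import SVC
# Assumes train_set is a pandas DataFrame and y holds the training labels
train_data = train_set[['dentisty_undictionary', 'file_length', 'tdm']]
model = SVC(kernel='linear')
model.fit(train_data, y)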
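The snippet above assumes train_set is a pandas DataFrame. If instead tdm is a sparse TF-IDF matrix and the other attributes are plain per-document lists (the traceback suggests dentisty_undictionary is a list of floats), one option is to stack everything column-wise with scipy.sparse.hstack. A minimal sketch; the documents, labels and feature values below are made-up placeholders, not the asker's data:

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Hypothetical toy documents and labels.
docs = ["spam spam eggs", "ham and eggs", "spam offer now", "lunch with ham"]
y = [1, 0, 1, 0]

# Hypothetical stand-ins for the question's attributes.
tdm = TfidfVectorizer().fit_transform(docs)        # sparse, shape (n_docs, n_terms)
dentisty_undictionary = [0.81, 0.76, 0.77, 0.79]   # placeholder: one float per document
file_length = [len(d) for d in docs]               # one int per document

# Turn the per-document lists into an (n_docs, 2) block,
# then append it to the right of the TF-IDF matrix.
extra = np.column_stack([dentisty_undictionary, file_length]).astype(float)
X = hstack([tdm, csr_matrix(extra)]).tocsr()

model = SVC(kernel='linear')
model.fit(X, y)
```

Because TF-IDF values lie in [0, 1] while raw lengths can be much larger, you may want to scale the extra columns (for example with StandardScaler(with_mean=False), which accepts sparse input) before fitting.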
See the examples here for the correct usage of FeatureUnion.