ColumnTransformer raises TypeError when calling fit_transform on a pipeline in sklearn

Time: 2019-11-13 02:29:51

Tags: python scikit-learn

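I'm hoping someone here can help me debug part of my code. I'm trying to set up a predictive model for the Ames, Iowa housing competition on Kaggle, and I'm having trouble implementing a pipeline because I keep getting an error. Here is the code I'm trying to run: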

from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer


num_attributes = list(train_set.select_dtypes(exclude=['object'])) #to select all num columns, we exclude any column with object types
cat_attributes = list(train_set.select_dtypes(include=['object'])) #here we select all columns with object types

cat_pipeline = ([
    ('imputer', SimpleImputer(fill_value='none', strategy='constant')),
    ('one_hot', OneHotEncoder())
])

full_pipeline = ColumnTransformer([
    ('num', StandardScaler(), num_attributes),
    ('cat', cat_pipeline, cat_attributes)
])

train_set_prepared = full_pipeline.fit_transform(train_set)

Here is the error message I get:

--------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-abf9d30bdc2b> in <module>
     20 ])
     21 
---> 22 train_set_prepared = full_pipeline.fit_transform(train_set)

~\Anaconda3\envs\ml_book\lib\site-packages\sklearn\compose\_column_transformer.py in fit_transform(self, X, y)
    470         """
    471         X = _check_X(X)
--> 472         self._validate_transformers()
    473         self._validate_column_callables(X)
    474         self._validate_remainder(X)

~\Anaconda3\envs\ml_book\lib\site-packages\sklearn\compose\_column_transformer.py in _validate_transformers(self)
    277                                 "transform, or can be 'drop' or 'passthrough' "
    278                                 "specifiers. '%s' (type %s) doesn't." %
--> 279                                 (t, type(t)))
    280 
    281     def _validate_column_callables(self, X):

TypeError: All estimators should implement fit and transform, or can be 'drop' or 'passthrough' specifiers. '[('imputer', SimpleImputer(add_indicator=False, copy=True, fill_value='none',
              missing_values=nan, strategy='constant', verbose=0)), ('one_hot', OneHotEncoder(categorical_features=None, categories=None, drop=None,
              dtype=<class 'numpy.float64'>, handle_unknown='error',
              n_values=None, sparse=True))]' (type <class 'list'>) doesn't.

I know the problem is specifically with cat_pipeline. Does anyone know what the issue might be? Thanks for your help.

1 Answer:

Answer 0 (score: 0)

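I figured it out. Turns out I was being an idiot: I forgot to actually instantiate a Pipeline for cat_pipeline, so ColumnTransformer was handed a plain list, which has no fit or transform method (exactly what the TypeError says). This is what it should have been: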

cat_pipeline = Pipeline([
    ('imputer', SimpleImputer(fill_value='none', strategy='constant')),
    ('one_hot', OneHotEncoder())
])
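
For anyone who wants to check the fix end to end, here is a minimal sketch. The small DataFrame is made up purely for illustration (it is not the real Ames data, and the column values are assumptions); the transformers and their arguments are the same ones used in the question.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical stand-in for train_set: two numeric and two object columns.
train_set = pd.DataFrame({
    "LotArea": [8450.0, 9600.0, 11250.0, 9550.0],
    "YearBuilt": [2003.0, 1976.0, 2001.0, 1915.0],
    "Alley": pd.Series(["Grvl", np.nan, "Pave", np.nan], dtype=object),
    "Street": pd.Series(["Pave", "Pave", "Grvl", "Pave"], dtype=object),
})

num_attributes = list(train_set.select_dtypes(exclude=["object"]))
cat_attributes = list(train_set.select_dtypes(include=["object"]))

# Wrapping the steps in Pipeline gives an object that implements
# fit and transform; a bare list of (name, estimator) tuples does not.
cat_pipeline = Pipeline([
    ("imputer", SimpleImputer(fill_value="none", strategy="constant")),
    ("one_hot", OneHotEncoder()),
])

full_pipeline = ColumnTransformer([
    ("num", StandardScaler(), num_attributes),
    ("cat", cat_pipeline, cat_attributes),
])

train_set_prepared = full_pipeline.fit_transform(train_set)

# 2 scaled numeric columns + 3 one-hot columns for Alley
# ('Grvl', 'Pave', 'none') + 2 for Street ('Grvl', 'Pave').
print(train_set_prepared.shape)
```

With this toy data the call to fit_transform goes through without the TypeError. If the `Pipeline(...)` wrapper is removed again, `_validate_transformers` rejects the list just as in the traceback above.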