Python FeatureUnion: fit_transform works but fit gives an error

Time: 2017-06-28 01:44:50

Tags: python-3.x machine-learning scikit-learn

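I am trying to transform my DataFrame by replacing all categorical attributes with sparse matrices. I combined three pipelines with FeatureUnion. When I use fit_transform it works fine, but when I try to call just fit it gives me an error. I want to train this pipeline so that I can use it later on the test dataset, which is why I need the fit part. I am using Python 3.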

import pandas as pd
import numpy as np
data = [[3,4,'WN','DEN','SNA',2],[6,1,'WN','FLL','DAL',1],[6,1,'WN','FLL','DAL',1],[6,1,'WN','FLL','DAL',1],[6,1,'WN','FLL','DAL',1],[6,1,'WN','FLL','DAL',1]]

df = pd.DataFrame(data, columns = ['MONTH','DAY_OF_WEEK','AIRLINE','ORIGIN_AIRPORT','DESTINATION_AIRPORT','SCHEDULED_DEPARTURE'])


from sklearn.pipeline import Pipeline
from sklearn.pipeline import FeatureUnion
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.preprocessing import LabelBinarizer
from sklearn.preprocessing import MultiLabelBinarizer

class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[self.attribute_names].values

MONTH_pipeline = Pipeline([
    ('selector', DataFrameSelector(['MONTH'])),
    ('label_binarizer', LabelBinarizer()),
])

DAY_OF_WEEK_pipeline = Pipeline([
    ('selector', DataFrameSelector(['DAY_OF_WEEK'])),
    ('label_binarizer', LabelBinarizer()),
])

AIRLINE_pipeline = Pipeline([
    ('selector', DataFrameSelector(['AIRLINE'])),
    ('label_binarizer', LabelBinarizer()),
])

full_pipeline = FeatureUnion(transformer_list = [
    ('MONTH_pipeline',MONTH_pipeline),
    ('DAY_OF_WEEK_pipeline',DAY_OF_WEEK_pipeline),
    ('AIRLINE_pipeline',AIRLINE_pipeline),
])


train_set_prepared = full_pipeline.fit_transform(df)  # works as expected
full_pipeline.fit(df)  # raises an error

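The first command, using fit_transform, works well and gives the desired result, but the second command, using just fit, raises an error. I would be grateful if someone could help me understand why.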
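A likely explanation (not confirmed in the thread): `LabelBinarizer` is meant for target labels, so its `fit` method takes only `y`, whereas `Pipeline.fit` calls its final step as `fit(Xt, y)`, which passes one positional argument too many. In the scikit-learn releases of that period, `LabelBinarizer` apparently did not override `fit_transform`, so the inherited `TransformerMixin.fit_transform` called `self.fit(X)` with a single argument, which would explain why `fit_transform` succeeded. The sketch below reproduces that calling pattern with hypothetical stand-ins (`OneArgFit`, `pipeline_fit`, `pipeline_fit_transform`), not with scikit-learn itself:

```python
class OneArgFit:
    """Hypothetical stand-in for LabelBinarizer: fit() accepts only y."""

    def fit(self, y):
        self.classes_ = sorted(set(y))
        return self

    def transform(self, y):
        # Simplified one-hot encoding (not LabelBinarizer's exact output format)
        return [[int(v == c) for c in self.classes_] for v in y]

    def fit_transform(self, X, y=None):
        # Mimics TransformerMixin.fit_transform when y is None: fit(X).transform(X)
        return self.fit(X).transform(X)


def pipeline_fit(final_step, Xt, y=None):
    # Simplified view of how Pipeline.fit calls its last step: fit(Xt, y)
    return final_step.fit(Xt, y)


def pipeline_fit_transform(final_step, Xt, y=None):
    # Simplified view of how Pipeline.fit_transform calls its last step
    return final_step.fit_transform(Xt, y)


labels = ['WN', 'WN', 'AA']
print(pipeline_fit_transform(OneArgFit(), labels))  # succeeds

try:
    pipeline_fit(OneArgFit(), labels)  # fit(self, y) receives (Xt, None)
except TypeError as exc:
    print('TypeError:', exc)
```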

0 answers
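One possible workaround, assuming the cause is `LabelBinarizer`'s one-argument `fit`: wrap it in a small adapter whose `fit` accepts `(X, y=None)`, the signature `Pipeline` expects. `PipelineLabelBinarizer` and `make_column_pipeline` are hypothetical names introduced for this sketch; the adapter flattens the single selected column with `np.ravel` before binarizing. Because the fitted binarizers are stored inside the pipeline, the same `full_pipeline.transform` could then be applied to a test set.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import LabelBinarizer


class DataFrameSelector(BaseEstimator, TransformerMixin):
    """Selects the given columns and returns them as a NumPy array."""

    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.attribute_names].values


class PipelineLabelBinarizer(BaseEstimator, TransformerMixin):
    """Hypothetical adapter: gives LabelBinarizer a fit(X, y=None) signature."""

    def fit(self, X, y=None):
        # y is ignored; the (n_samples, 1) column is flattened to 1-D first
        self.binarizer_ = LabelBinarizer().fit(np.ravel(X))
        return self

    def transform(self, X):
        return self.binarizer_.transform(np.ravel(X))


def make_column_pipeline(column):
    # Hypothetical helper: select one column, then binarize it
    return Pipeline([
        ('selector', DataFrameSelector([column])),
        ('label_binarizer', PipelineLabelBinarizer()),
    ])


full_pipeline = FeatureUnion(transformer_list=[
    (f'{col}_pipeline', make_column_pipeline(col))
    for col in ['MONTH', 'DAY_OF_WEEK', 'AIRLINE']
])

data = [[3, 4, 'WN', 'DEN', 'SNA', 2]] + [[6, 1, 'WN', 'FLL', 'DAL', 1]] * 5
df = pd.DataFrame(data, columns=['MONTH', 'DAY_OF_WEEK', 'AIRLINE',
                                 'ORIGIN_AIRPORT', 'DESTINATION_AIRPORT',
                                 'SCHEDULED_DEPARTURE'])

full_pipeline.fit(df)  # fit alone now works
train_prepared = full_pipeline.transform(df)  # later: full_pipeline.transform(test_df)
```

In scikit-learn 0.20 and later, `OneHotEncoder` accepts string columns directly and is the more usual choice for encoding input features, which would avoid the adapter altogether.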