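I am working with the Titanic dataset and trying to build a custom transformer that adds two columns together and returns the result as a numpy array when I call fit_transform on a pipeline.
I built a class "dfSelector" that selects the two columns of interest, "SibSp" and "Parch", and wrote the class "family_size" below to add the two together. The pipeline also contains an imputer to take care of any null entries in the dataframe.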
from sklearn.base import BaseEstimator, TransformerMixin

class family_size(BaseEstimator, TransformerMixin):
    def __init__(self, sibsp, parch):
        self.sibsp = sibsp
        self.parch = parch

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        family = (self.sibsp + self.parch)
        return family
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline

# Run through a pipeline with an imputer to take care of 'nan' rows
family_pipeline = Pipeline([
    ('column_selector', dfSelector(['SibSp', 'Parch'])),
    ('num_imputer', SimpleImputer(strategy='median')),
    ('family_encoder', family_size(['SibSp', 'Parch'])),
])
family_pipeline.fit_transform(traindf)
This raises the following error:

TypeError: __init__() missing 1 required positional argument: 'parch'
How can I get an array containing the sum of the two columns?
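
One possible direction, as a minimal sketch rather than a definitive fix: the error appears because `family_size(['SibSp', 'Parch'])` passes a single list, which fills `sibsp` and leaves `parch` unset. Beyond that, `transform` never reads `X`, and `SimpleImputer` returns a plain NumPy array by default, so the column names are no longer available at that step. The sketch below sums the columns by position instead. `DFSelector` is a hypothetical stand-in for the `dfSelector` class (its real code is not shown here), `FamilySize` is an illustrative rename, and `toy_df` is made-up data standing in for `traindf`.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline


class DFSelector(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for dfSelector: keeps only the named columns."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.columns]


class FamilySize(BaseEstimator, TransformerMixin):
    """Adds all incoming columns together row by row; needs no arguments."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # The imputer hands over a NumPy array, so work by position, not by name.
        X = np.asarray(X)
        # Keep a 2-D column shape so the result can feed later pipeline steps.
        return X.sum(axis=1).reshape(-1, 1)


# Made-up rows standing in for traindf (assumption, not real Titanic data).
toy_df = pd.DataFrame({
    "SibSp": [1, 0, np.nan, 3],
    "Parch": [0, 2, 1, np.nan],
    "Fare": [7.25, 71.28, 7.92, 53.10],
})

family_pipeline = Pipeline([
    ("column_selector", DFSelector(["SibSp", "Parch"])),
    ("num_imputer", SimpleImputer(strategy="median")),
    ("family_encoder", FamilySize()),
])

family_sizes = family_pipeline.fit_transform(toy_df)
print(family_sizes)
```

On this toy frame both medians are 1, so the missing entries are filled with 1 and the result is a (4, 1) array holding 1, 2, 2 and 4. If a flat array is preferred, `family_sizes.ravel()` gives the same values in one dimension.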