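I wrote this custom transformer in Python. The goal is to use it in a Pipeline to chain the data preprocessing steps in order. My dataset has 9 numeric columns, and the 10th column is categorical.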
from sklearn.base import BaseEstimator, TransformerMixin

class DataFrameSelector(BaseEstimator, TransformerMixin):
    def _init_(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.attribute_names].values
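After defining this class, I get the error shown below when I try to run the following code.
FYI: datasets_num is a DataFrame that contains only the numeric columns/attributes.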
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import Imputer, LabelBinarizer, StandardScaler  # Imputer exists only in scikit-learn < 0.22

num_attributes = list(datasets_num)  # column names of the numeric-only DataFrame
cat_attributes = ["ocean_proximity"]

num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attributes)),
    ('imputer', Imputer(strategy="median")),
    ('std_scaler', StandardScaler())
])

cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attributes)),
    ('label_binarizer', LabelBinarizer())
])
Error:

Traceback (most recent call last):
  File "<ipython-input-34-f509d02ccc6e>", line 7, in <module>
    ('selector', DataFrameSelector(num_attributes)),
TypeError: object() takes no parameters
Answer (score: 1):
In this part:
class DataFrameSelector(BaseEstimator, TransformerMixin):
    def _init_(self, attribute_names):
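With single underscores, `_init_` is just an ordinary method name, so Python never calls it when the object is created. The class then has no initializer of its own, and the constructor arguments reach the default construction inherited from `object`, which accepts no extra arguments; that is where the `TypeError` comes from (the exact message text can differ between Python versions). A minimal stdlib-only sketch of the difference; the class names `Broken` and `Fixed` are hypothetical, for illustration only:

```python
class Broken:
    # Single underscores: an ordinary method, never run by Broken(...).
    def _init_(self, attribute_names):
        self.attribute_names = attribute_names


class Fixed:
    # Double underscores: the real initializer, run by Fixed(...).
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names


# Broken defines no __init__ of its own, so the constructor arguments
# fall through to object's default construction, which rejects them.
print("__init__" in vars(Broken))  # False
print("__init__" in vars(Fixed))   # True

try:
    Broken(["a", "b"])
except TypeError as exc:
    print("Broken raised TypeError:", exc)

print(Fixed(["a", "b"]).attribute_names)  # ['a', 'b']
```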
You want double underscores:
    def __init__(self, attribute_names):
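Putting it together, here is a sketch of the corrected transformer driving the numeric pipeline end to end. Assumptions: a small hypothetical DataFrame stands in for `datasets_num` (column names borrowed from the housing data implied by `ocean_proximity`), and `SimpleImputer` from `sklearn.impute` replaces `Imputer`, which was removed in scikit-learn 0.22. Only the numeric pipeline is exercised here.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer  # replacement for the removed Imputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class DataFrameSelector(BaseEstimator, TransformerMixin):
    """Select the given DataFrame columns and return them as a NumPy array."""

    def __init__(self, attribute_names):  # double underscores on both sides
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self  # nothing to learn

    def transform(self, X):
        return X[self.attribute_names].values


# Hypothetical toy data; the selector drops the categorical column.
df = pd.DataFrame({
    "median_income": [1.5, np.nan, 3.0, 4.5],
    "total_rooms": [100.0, 200.0, np.nan, 400.0],
    "ocean_proximity": ["NEAR BAY", "INLAND", "INLAND", "NEAR OCEAN"],
})
num_attributes = ["median_income", "total_rooms"]

num_pipeline = Pipeline([
    ("selector", DataFrameSelector(num_attributes)),
    ("imputer", SimpleImputer(strategy="median")),
    ("std_scaler", StandardScaler()),
])

X_num = num_pipeline.fit_transform(df)
print(X_num.shape)  # (4, 2)
```

Because the constructor now stores `attribute_names` under the same name as the parameter, `BaseEstimator.get_params()` also works, which scikit-learn relies on when cloning estimators (for example in grid search).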