I am currently reading "Hands-On Machine Learning with Scikit-Learn & TensorFlow". I get an error when I try to recreate the Transformation Pipelines code. How can I fix this?
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
num_pipeline = Pipeline([
    ('imputer', Imputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])
housing_num_tr = num_pipeline.fit_transform(housing_num)
from sklearn.pipeline import FeatureUnion
num_attribs = list(housing_num)
cat_attribs = ["ocean_proximity"]
num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attribs)),
    ('imputer', Imputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scaler', StandardScaler()),
])
cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attribs)),
    ('label_binarizer', LabelBinarizer()),
])
full_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline),
])
# And we can now run the whole pipeline simply:
housing_prepared = full_pipeline.fit_transform(housing)
housing_prepared
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
<ipython-input-350-3a4a39e5bc1c> in <module>()
43
44 num_pipeline = Pipeline([
---> 45 ('selector', DataFrameSelector(num_attribs)),
46 ('imputer', Imputer(strategy = "median")),
47 ('attribs_adder', CombinedAttributesAdder()),
NameError: name 'DataFrameSelector' is not defined
Answer 0 (score: 13)
DataFrameSelector needs to be imported. It is not part of sklearn, but a class with the same name is available in sklearn-features:
from sklearn_features.transformers import DataFrameSelector
Answer 1 (score: 4)
from sklearn.base import BaseEstimator, TransformerMixin

class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Return the selected columns as a NumPy array
        return X[self.attribute_names].values
This should work.
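As a quick check, the selector can be exercised end to end on a small DataFrame. The sketch below is not from the book: the toy `housing` frame and its values are invented for illustration, CombinedAttributesAdder is left out, and SimpleImputer and OneHotEncoder stand in for the question's Imputer and LabelBinarizer, assuming scikit-learn 0.20 or later is installed.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


class DataFrameSelector(BaseEstimator, TransformerMixin):
    """Select the named DataFrame columns and return them as a NumPy array."""

    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.attribute_names].values


# Hypothetical toy data standing in for the book's housing DataFrame.
housing = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, np.nan, 1274.0],
    "median_income": [8.3, 8.3, 7.3, 5.6],
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY", "INLAND"],
})
num_attribs = ["total_rooms", "median_income"]
cat_attribs = ["ocean_proximity"]

num_pipeline = Pipeline([
    ("selector", DataFrameSelector(num_attribs)),
    ("imputer", SimpleImputer(strategy="median")),  # assumed stand-in for Imputer
    ("std_scaler", StandardScaler()),
])
cat_pipeline = Pipeline([
    ("selector", DataFrameSelector(cat_attribs)),
    ("one_hot", OneHotEncoder()),  # assumed stand-in for LabelBinarizer
])
full_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline),
])

housing_prepared = full_pipeline.fit_transform(housing)
# 4 rows; 2 scaled numeric columns plus 2 one-hot columns
print(housing_prepared.shape)
```

With the class defined before the pipelines are built, the NameError from the question no longer occurs.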
Answer 2 (score: 1)
If you are following Hands-On Machine Learning with Scikit-Learn and TensorFlow, a custom DataFrame selector is defined on the next page of the book:
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion

class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.attribute_names].values
Answer 3 (score: 0)
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion

class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.attribute_names].values
This may work.
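Answer 4 (score: 0)
You seem to be working on the California Housing Price Predictions project from the book Hands-On Machine Learning with Scikit-learn and TensorFlow.
The error
NameError: name 'DataFrameSelector' is not defined
occurs because sklearn has no DataFrameSelector transformer. To overcome this error, you need to write your own custom transformer.
In the book, you can find the DataFrameSelector transformer code on the next page, but I will also copy it below.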
from sklearn.base import BaseEstimator, TransformerMixin

class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Return the selected columns as a NumPy array
        return X[self.attribute_names].values
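The class defines fit() and transform() itself. Inheriting from TransformerMixin adds fit_transform(), which calls fit() and then transform(), and inheriting from BaseEstimator adds get_params() and set_params().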
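To see which methods the base classes contribute, the following minimal sketch calls fit_transform() and get_params() on the selector even though neither is written in the class body. The two-column frame `df` is hypothetical and exists only to exercise the class.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class DataFrameSelector(BaseEstimator, TransformerMixin):
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.attribute_names].values


# Hypothetical two-column frame, used only to exercise the class.
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
selector = DataFrameSelector(["a"])

# fit_transform() is not defined in the class body; TransformerMixin
# supplies it by calling fit() and then transform().
selected = selector.fit_transform(df)
print(selected.shape)

# get_params() comes from BaseEstimator and reads the __init__ arguments.
params = selector.get_params()
print(params)
```

Because get_params() relies on the constructor signature, each __init__ argument should be stored on an attribute of the same name, as the class above does.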
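There is also another class, DataFrameMapper, in sklearn-pandas that serves a similar purpose.
You can find details about this class in the sklearn-pandas documentation for DataFrameMapper.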