I created a pass-through wrapper class around an existing class in sklearn, and it does not behave as expected:
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

tiny_df = pd.DataFrame({'x': ['a', 'b']})

class Foo(OrdinalEncoder):
    def __init__(self, *args, **kwargs):
        super().__init__(self, *args, **kwargs)

    def fit(self, X, y=None):
        super().fit(X, y)
        return self

oe = OrdinalEncoder()
oe.fit(tiny_df)  # works fine
foo = Foo()
foo.fit(tiny_df)  # fails
The relevant part of the error message I get is:
~\.conda\envs\pytorch\lib\site-packages\sklearn\preprocessing\_encoders.py in _fit(self, X, handle_unknown)
69 raise ValueError("Unsorted categories are not "
70 "supported for numerical categories")
---> 71 if len(self._categories) != n_features:
72 raise ValueError("Shape mismatch: if n_values is an array,"
73 " it has to be of shape (n_features,).")
TypeError: object of type 'Foo' has no len()
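It looks like the parent's private attribute _categories is not being set, even though I call the parent constructor in my class's __init__() method. I must be missing something simple here, and would appreciate any help!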
Answer 0 (score: 2)
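You don't have to pass self again to super().__init__(). super() already binds the instance, so the extra self ends up as the first positional argument (categories), which is why len(self._categories) later fails with "object of type 'Foo' has no len()". Also, scikit-learn estimators should always specify their parameters in the signature of their __init__, and varargs (*args) are not allowed; otherwise you will get a RuntimeError, so *args has to be removed. I have modified your code as follows: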
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

tiny_df = pd.DataFrame({'x': ['a', 'b']})

class Foo(OrdinalEncoder):
    def __init__(self, **kwargs):
        super().__init__(**kwargs)

    def fit(self, X, y=None):
        super().fit(X, y)
        return self

oe = OrdinalEncoder()
oe.fit(tiny_df)  # works fine
foo = Foo()
foo.fit(tiny_df)  # works fine too
Sample output
foo.transform(tiny_df)
array([[0.],
       [1.]])
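To make the first point concrete, here is a minimal sketch in plain Python with no scikit-learn involved. Base, BadWrapper and GoodWrapper are hypothetical stand-ins (Base plays the role of OrdinalEncoder with a single categories parameter), so treat it as an illustration of the binding problem rather than of the real encoder:

```python
class Base:
    """Hypothetical stand-in for a parent class such as OrdinalEncoder."""

    def __init__(self, categories="auto"):
        self.categories = categories


class BadWrapper(Base):
    def __init__(self, *args, **kwargs):
        # Bug: super() already binds self, so this extra self is passed
        # as the first positional argument, i.e. `categories`.
        super().__init__(self, *args, **kwargs)


class GoodWrapper(Base):
    def __init__(self, **kwargs):
        # Only forward keyword arguments; self is bound automatically.
        super().__init__(**kwargs)


bad = BadWrapper()
good = GoodWrapper()

print(bad.categories is bad)  # True: the instance became its own `categories`
print(good.categories)        # auto: the parent's default is preserved

try:
    len(bad.categories)
except TypeError as exc:
    print(exc)  # object of type 'BadWrapper' has no len()
```

The last line mirrors the TypeError from the question: the parent ends up calling len() on the wrapper instance itself.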
A bit extra
class Foo(OrdinalEncoder):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

    def fit(self, X, y=None):
        super().fit(X, y)
        return self
When creating Foo:
foo = Foo()
RuntimeError: scikit-learn estimators should always specify their parameters in the signature of their __init__ (no varargs). <class '__main__.Foo'> with constructor (self, *args, **kwargs) doesn't follow this convention.
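As far as I understand, this error comes from scikit-learn inspecting the signature of __init__ to discover the estimator's parameters. The sketch below is a simplified imitation of that kind of check using only the standard library. init_param_names and the three classes are hypothetical names made up for the illustration; this is not scikit-learn's actual code:

```python
import inspect


def init_param_names(cls):
    """Simplified imitation of a constructor-signature check (hypothetical)."""
    params = [
        p
        for p in inspect.signature(cls.__init__).parameters.values()
        if p.name != "self" and p.kind != p.VAR_KEYWORD  # **kwargs is skipped
    ]
    for p in params:
        if p.kind == p.VAR_POSITIONAL:  # *args is rejected
            raise RuntimeError(
                f"{cls.__name__}: constructor must not use varargs (*{p.name})"
            )
    return sorted(p.name for p in params)


class WithVarargs:
    def __init__(self, *args, **kwargs):
        pass


class WithKwargsOnly:
    def __init__(self, **kwargs):
        pass


class Explicit:
    def __init__(self, categories="auto", dtype=float):
        self.categories = categories
        self.dtype = dtype


print(init_param_names(Explicit))        # ['categories', 'dtype']
print(init_param_names(WithKwargsOnly))  # []
try:
    init_param_names(WithVarargs)
except RuntimeError as exc:
    print(exc)
```

Note that in this sketch the **kwargs-only constructor passes the check but exposes no named parameters. If the wrapper is meant to be used with things like get_params() or clone(), spelling out each parameter explicitly in __init__ (for example categories) is probably the safer pattern.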
Hope that helps!