I am trying to roll my own MLlib Pipeline algorithm in PySpark, but I can't get past the following error:
'function' object has no attribute '_input_kwargs'
I have reduced my code to this:
from pyspark import keyword_only
from pyspark.ml.param import Params

class SimpleAlgorithm(Params):
    @keyword_only
    def __init__(self, threshold=2.0):
        super(SimpleAlgorithm, self).__init__()
        kwargs = self.__init__._input_kwargs
        self.setParams(**kwargs)

    @keyword_only
    def setParams(self, threshold=2.0):
        kwargs = self.setParams._input_kwargs
        return self._set(**kwargs)
But when I do this:
test_alg = SimpleAlgorithm()
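I get the error shown above. Does anyone know what I am doing wrong? (I saw an apparently related question, but as a Python newbie I am not sure whether I understand the answer or whether it applies to me.)
Edit: in case it helps, here is the full stack trace of the error after running it on a Spark 2.1 Databricks cluster: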
----> 1 test_alg = SimpleAlgorithm()

/databricks/spark/python/pyspark/__init__.py in wrapper(self, *args, **kwargs)
     99             raise TypeError("Method %s forces keyword arguments." % func.__name__)
    100         self._input_kwargs = kwargs
--> 101         return func(self, **kwargs)
    102     return wrapper
    103

<ipython-input-12-dcb888c97487> in __init__(self, threshold)
      8
      9         #super(SimpleAlgorithm, self).__init__()
---> 10         kwargs = self.__init__._input_kwargs
     11         self.setParams(**kwargs)
     12

AttributeError: 'function' object has no attribute '_input_kwargs'
Answer 0 (score: 2)
The problem is with this line:
kwargs = self.__init__._input_kwargs
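You are trying to access the _input_kwargs attribute on the function object self.__init__, and that attribute does not exist there.
If you are just trying to set the threshold attribute, I would normally do it like this: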
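To make the cause concrete, here is a minimal pure-Python sketch that needs no Spark installation. The two decorators are simplified stand-ins, not PySpark's actual source: `keyword_only_new` mirrors the wrapper shown in the traceback (it stores the kwargs on the instance), while `keyword_only_old` is an assumed earlier style that stores them on the wrapper function. All class names are hypothetical.

```python
import functools


def keyword_only_old(func):
    """Assumed older style: captured kwargs live on the wrapper function."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if args:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        wrapper._input_kwargs = kwargs
        return func(self, **kwargs)
    return wrapper


def keyword_only_new(func):
    """Style seen in the traceback: captured kwargs live on the instance."""
    @functools.wraps(func)
    def wrapper(self, *args, **kwargs):
        if args:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        self._input_kwargs = kwargs
        return func(self, **kwargs)
    return wrapper


class OldStyle(object):
    @keyword_only_old
    def __init__(self, threshold=2.0):
        # Works: the attribute was set on the function behind self.__init__.
        self.seen = self.__init__._input_kwargs


class NewStyle(object):
    @keyword_only_new
    def __init__(self, threshold=2.0):
        # Works: the attribute was set on the instance itself.
        self.seen = self._input_kwargs


class Mismatched(object):
    @keyword_only_new
    def __init__(self, threshold=2.0):
        # Fails: nothing ever set _input_kwargs on the function object.
        self.seen = self.__init__._input_kwargs


print(OldStyle(threshold=3.0).seen)
print(NewStyle(threshold=3.0).seen)
try:
    Mismatched()
except AttributeError as exc:
    print("AttributeError:", exc)
```

`Mismatched` reproduces the situation in the question: the decorator writes to the instance, but the method body reads from the function.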
class SimpleAlgorithm(Params):
    @keyword_only
    def __init__(self, threshold=2.0):
        super(SimpleAlgorithm, self).__init__()
        self.threshold = threshold
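I looked at the source code you linked and noticed that the implementation of the keyword_only wrapper changed between Spark 2.1 and Spark 2.2. Your code does work on Spark 2.1, but if you are using Spark 2.2 you should replace it with: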
from pyspark import keyword_only
from pyspark.ml.param import Params

class SimpleAlgorithm(Params):
    @keyword_only
    def __init__(self, threshold=2.0):
        super(SimpleAlgorithm, self).__init__()
        kwargs = self._input_kwargs
        self.setParams(**kwargs)

    @keyword_only
    def setParams(self, threshold=2.0):
        kwargs = self._input_kwargs
        return self._set(**kwargs)
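Answer 1 (score: 1)
I would like to add to @timchap's answer. If you are developing in Zeppelin 0.7.3 with Spark 2.1.0 but your code runs on Spark 2.2.0, you can check the version and stay compatible with both like this: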
from pyspark import keyword_only
from pyspark.ml.param import Params

# `spark` is assumed to be the SparkSession that the notebook predefines.
class SimpleAlgorithm(Params):
    @keyword_only
    def __init__(self, threshold=2.0):
        super(SimpleAlgorithm, self).__init__()
        if spark.version.startswith('2.1'):
            kwargs = self.__init__._input_kwargs
        else:
            kwargs = self._input_kwargs
        self.setParams(**kwargs)

    @keyword_only
    def setParams(self, threshold=2.0):
        if spark.version.startswith('2.1'):
            # Read from setParams, not __init__, to get this call's kwargs.
            kwargs = self.setParams._input_kwargs
        else:
            kwargs = self._input_kwargs
        return self._set(**kwargs)
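As a possible alternative to comparing version strings, one could probe for the attribute directly. The sketch below is only an illustration: `get_input_kwargs` is a hypothetical helper, not part of PySpark, and the two small classes merely imitate the two storage styles so that the example runs without Spark.

```python
def get_input_kwargs(instance, method_name):
    """Return the kwargs captured by a keyword_only-style decorator.

    Hypothetical helper: checks the instance first, then falls back to
    the function behind the named method.
    """
    try:
        return instance._input_kwargs
    except AttributeError:
        return getattr(instance, method_name)._input_kwargs


class InstanceStyle(object):
    """Imitates a decorator that stores the kwargs on the instance."""
    def __init__(self):
        self._input_kwargs = {"threshold": 3.0}

    def setParams(self):
        pass


class FunctionStyle(object):
    """Imitates a decorator that stores the kwargs on the function."""
    def setParams(self):
        pass


# Attach the attribute to the plain function, as a wrapper-based decorator would.
FunctionStyle.setParams._input_kwargs = {"threshold": 5.0}

print(get_input_kwargs(InstanceStyle(), "setParams"))
print(get_input_kwargs(FunctionStyle(), "setParams"))
```

Inside a real `setParams`, the call would look like `kwargs = get_input_kwargs(self, "setParams")`, which avoids depending on the exact version string.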