Custom algorithm in PySpark MLlib: 'function' object has no attribute '_input_kwargs'

Posted: 2017-07-19 11:27:54

Tags: python pyspark apache-spark-mllib

I'm trying to write my own MLlib Pipeline algorithm in PySpark, but I can't get past the following error:

'function' object has no attribute '_input_kwargs'

I've reduced the code to this:

from pyspark import keyword_only
from pyspark.ml.param import Params

class SimpleAlgorithm(Params):

    @keyword_only
    def __init__(self, threshold=2.0):

        super(SimpleAlgorithm, self).__init__()
        kwargs = self.__init__._input_kwargs
        self.setParams(**kwargs)

    @keyword_only
    def setParams(self, threshold=2.0):

        kwargs = self.setParams._input_kwargs
        return self._set(**kwargs)

But when I do this:

test_alg = SimpleAlgorithm()

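I get the error shown above. Does anyone know what I'm doing wrong? (I found an apparently related question here, but as a Python novice I'm not sure I understand the answer or whether it applies to my case.)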

EDIT: In case it helps, here is the full stack trace of the error after running this on a Spark 2.1 Databricks cluster:

----> 1 test_alg = SimpleAlgorithm()

/databricks/spark/python/pyspark/__init__.py in wrapper(self, *args, **kwargs)
     99             raise TypeError("Method %s forces keyword arguments." % func.__name__)
    100         self._input_kwargs = kwargs
--> 101         return func(self, **kwargs)
    102     return wrapper
    103 

<ipython-input-12-dcb888c97487> in __init__(self, threshold)
      8 
      9         #super(SimpleAlgorithm, self).__init__()
---> 10         kwargs = self.__init__._input_kwargs
     11         self.setParams(**kwargs)
     12 

AttributeError: 'function' object has no attribute '_input_kwargs'

2 answers:

Answer 0 (score: 2)

The problem is this line:

kwargs = self.__init__._input_kwargs

You are trying to read the attribute _input_kwargs from self.__init__, that is, from a function object, and that attribute does not exist on it.
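A minimal, pyspark-free sketch of the failure. The keyword_only below is a hypothetical, simplified stand-in modeled on the wrapper visible in the question's traceback (lines 99 to 101), which stores the captured kwargs on self rather than on the function; Broken and Fixed are illustrative class names only:

```python
from functools import wraps


def keyword_only(func):
    """Simplified stand-in for the wrapper in the traceback (not pyspark's source)."""
    @wraps(func)
    def wrapper(self, *args, **kwargs):
        if len(args) > 0:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        # The captured kwargs are stored on the instance, not on the function.
        self._input_kwargs = kwargs
        return func(self, **kwargs)
    return wrapper


class Broken(object):
    @keyword_only
    def __init__(self, threshold=2.0):
        # self.__init__ is a bound method; the attribute lookup falls through to
        # the wrapper function, which has no _input_kwargs -> AttributeError.
        self.kwargs = self.__init__._input_kwargs


class Fixed(object):
    @keyword_only
    def __init__(self, threshold=2.0):
        # Read the kwargs from the instance, where the wrapper put them.
        self.kwargs = self._input_kwargs


try:
    Broken(threshold=3.0)
except AttributeError as exc:
    print("Broken:", exc)

print("Fixed:", Fixed(threshold=3.0).kwargs)
```

Only keywords the caller actually passes are captured: Fixed() yields an empty dict, and defaults such as threshold=2.0 are not included.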

If you are only trying to set the threshold attribute, I would usually do this:

class SimpleAlgorithm(Params):

    @keyword_only
    def __init__(self, threshold=2.0):
        super(SimpleAlgorithm, self).__init__()
        self.threshold = threshold

EDIT:

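I looked at the source code you linked and noticed that the implementation of the keyword_only wrapper changed between Spark 2.1 and Spark 2.2. Your code does work on Spark 2.1, but if you are using Spark 2.2 you should replace it with: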

from pyspark import keyword_only
from pyspark.ml.param import Params

class SimpleAlgorithm(Params):

    @keyword_only
    def __init__(self, threshold=2.0):

        super(SimpleAlgorithm, self).__init__()
        kwargs = self._input_kwargs
        self.setParams(**kwargs)

    @keyword_only
    def setParams(self, threshold=2.0):

        kwargs = self._input_kwargs
        return self._set(**kwargs)

Link to v2.2 keyword_only implementation

Link to v2.2 example usage
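For comparison, a pyspark-free sketch of the older behavior. keyword_only_old is a hypothetical, simplified stand-in modeled on the Spark 2.1-era wrapper (not pyspark's actual source), which attaches the captured kwargs to the wrapper function instead of the instance. Under that wrapper the question's original self.__init__._input_kwargs lookup succeeds:

```python
from functools import wraps


def keyword_only_old(func):
    """Simplified stand-in for the older (Spark 2.1-era) wrapper; not pyspark's source."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        if len(args) > 1:
            raise TypeError("Method %s forces keyword arguments." % func.__name__)
        # The captured kwargs are attached to the wrapper *function* itself.
        wrapper._input_kwargs = kwargs
        return func(*args, **kwargs)
    return wrapper


class OldStyle(object):
    @keyword_only_old
    def __init__(self, threshold=2.0):
        # Works with this wrapper: the bound method forwards the attribute
        # lookup to the wrapper function, which now carries _input_kwargs.
        self.kwargs = self.__init__._input_kwargs


a = OldStyle(threshold=1.0)
b = OldStyle(threshold=5.0)
print(a.kwargs, b.kwargs)
# The function-level attribute is shared by all instances and holds only the latest call.
print(OldStyle.__init__._input_kwargs)
```

Because the kwargs live on a single function object, every instance shares that attribute and it only reflects the most recent call; each instance above keeps its own values only because it copied the reference immediately.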

Answer 1 (score: 1)

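I'd like to add to @timchap's answer. If you develop with Zeppelin 0.7.3 on Spark 2.1.0 but your code also has to run on Spark 2.2.0, you can check the version and pick the compatible lookup like this (this relies on spark being the SparkSession that the notebook provides):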

from pyspark import keyword_only
from pyspark.ml.param import Params

class SimpleAlgorithm(Params):

    @keyword_only
    def __init__(self, threshold=2.0):

        super(SimpleAlgorithm, self).__init__()
        if spark.version.startswith('2.1'):
            kwargs = self.__init__._input_kwargs
        else:
            kwargs = self._input_kwargs
        self.setParams(**kwargs)

    @keyword_only
    def setParams(self, threshold=2.0):

        if spark.version.startswith('2.1'):
            # Spark 2.1: read setParams' own kwargs, not those of __init__
            kwargs = self.setParams._input_kwargs
        else:
            kwargs = self._input_kwargs
        return self._set(**kwargs)
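The question's traceback shows a Databricks runtime labeled Spark 2.1 already running the newer wrapper (self._input_kwargs = kwargs), so a check on spark.version can pick the wrong branch. An alternative sketch is to detect where the kwargs were stored instead of checking the version. It assumes the two simplified stand-in wrappers below mirror the old and new pyspark behaviors; get_input_kwargs and make_algorithm are hypothetical helpers, not part of pyspark:

```python
from functools import wraps


def keyword_only_old(func):
    """Stand-in for the older wrapper: kwargs stored on the wrapper function."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        wrapper._input_kwargs = kwargs
        return func(*args, **kwargs)
    return wrapper


def keyword_only_new(func):
    """Stand-in for the newer wrapper: kwargs stored on the instance."""
    @wraps(func)
    def wrapper(self, **kwargs):
        self._input_kwargs = kwargs
        return func(self, **kwargs)
    return wrapper


def get_input_kwargs(obj, method):
    """Hypothetical helper: use the instance attribute if the newer wrapper set
    one, otherwise fall back to the attribute the older wrapper put on method."""
    kwargs = getattr(obj, "_input_kwargs", None)
    if kwargs is None:
        kwargs = method._input_kwargs
    return kwargs


def make_algorithm(keyword_only):
    """Builds a toy SimpleAlgorithm decorated with the given wrapper."""
    class SimpleAlgorithm(object):
        @keyword_only
        def __init__(self, threshold=2.0):
            self.params = {}
            self.setParams(**get_input_kwargs(self, self.__init__))

        @keyword_only
        def setParams(self, threshold=2.0):
            # Pass setParams itself, not __init__, so the old-style fallback
            # reads this method's own kwargs.
            self.params.update(get_input_kwargs(self, self.setParams))
            return self

    return SimpleAlgorithm


for deco in (keyword_only_old, keyword_only_new):
    Alg = make_algorithm(deco)
    print(deco.__name__, Alg(threshold=3.0).params)
```

With real pyspark you would decorate with pyspark's own keyword_only instead of the stand-ins and finish setParams with self._set(**kwargs) as in the answers above; the detection keeps one code path and does not depend on a spark global.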