py4j.Py4JException: passing a class method to pyspark

Asked: 2019-12-16 10:03:56

Tags: pyspark

I have an RDD inside a class and I defined my map function as a class method. When I pass that method to pyspark's map, it raises the error py4j.Py4JException: Method __getstate__([]) does not exist. My code:

from pyspark import SparkConf
from pyspark.sql import SparkSession


class A(object):
    def __init__(self):
        conf = SparkConf().setMaster("local[*]").setAppName("A")
        self.spark = SparkSession.builder.config(conf=conf).getOrCreate()

    def f(self):
        mapper = self.mapper
        rdds = self.spark.sparkContext.parallelize([1, 2, 3])
        print(rdds.map(mapper).collect())

    # @staticmethod
    def mapper(self, row):
        # Build a list of repeated `row` values, stopping early for rows 1 and 2.
        s = []
        for i in range(5):
            if row == 1:
                if len(s) >= 2:
                    break
            if row == 2:
                if len(s) >= 3:
                    break
            s.append(row)
        return s

Someone said that self cannot be passed to the workers, so I used mapper = self.mapper, but it still does not work. Other than adding the staticmethod decorator to mapper, how can I handle this?
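For context on why mapper = self.mapper does not help: a bound method keeps a reference to its instance in __self__, so when Spark pickles the function it still tries to serialize the whole A object, including the SparkSession and its py4j gateway objects, which is roughly where the Method __getstate__([]) does not exist error comes from. Below is a minimal, Spark-free sketch of that behaviour; the Unpicklable class is a hypothetical stand-in for the non-serializable py4j objects held by A.

import pickle


class Unpicklable(object):
    # Hypothetical stand-in for the SparkSession / py4j JavaObject held by A.
    def __reduce__(self):
        raise TypeError("cannot pickle this object (like a py4j JavaObject)")


class A(object):
    def __init__(self):
        self.spark = Unpicklable()

    def mapper(self, row):
        return row


a = A()
mapper = a.mapper
print(mapper.__self__ is a)   # True: the bound method still drags the instance along

try:
    # Roughly what Spark does when it ships the mapping function to the executors.
    pickle.dumps(mapper)
except TypeError as e:
    print("pickling failed:", e)

So any workaround ultimately means shipping a function that does not reference self at all, for example a module-level function, a staticmethod, or copying the attributes the mapper needs into local variables before calling map.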

0 Answers

There are no answers yet.