I have an RDD inside a class, and I defined the map function as a method of that class. When I pass this method to PySpark's map, it raises the error: py4j.Py4JException: Method __getstate__([]) does not exist
My code:
from pyspark import SparkConf
from pyspark.sql import SparkSession

class A(object):
    def __init__(self):
        conf = SparkConf().setMaster("local[*]").setAppName("A")
        self.spark = SparkSession.builder.config(conf=conf).getOrCreate()

    def f(self):
        mapper = self.mapper
        rdds = self.spark.sparkContext.parallelize([1, 2, 3])
        # This call raises the Py4JException shown above.
        print(rdds.map(mapper).collect())

    # @staticmethod
    def mapper(self, row):
        s = []
        for i in range(5):
            if row == 1:
                if len(s) >= 2:
                    break
            if row == 2:
                if len(s) >= 3:
                    break
            s.append(row)
        return s
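Some people say that self cannot be passed to the workers, so I used mapper = self.mapper, but it still does not work. Apart from adding the staticmethod decorator to mapper, how can I handle this?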
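To make the suspicion concrete, here is a minimal sketch without Spark of why `mapper = self.mapper` may not help. The names `Holder`, `plain_mapper` and `can_pickle` are hypothetical, and a `threading.Lock` stands in for the SparkSession as an object that cannot be pickled. PySpark uses its own pickler, so this only approximates what happens there: the bound method still carries a reference to the instance, so serializing it tries to serialize the instance and everything it holds.

```python
import pickle
import threading


class Holder(object):
    """Hypothetical stand-in for class A; the lock plays the role of the SparkSession."""

    def __init__(self):
        # Assumption for illustration: a lock is unpicklable, like a SparkSession.
        self.resource = threading.Lock()

    def mapper(self, row):
        return [row]


def plain_mapper(row):
    # Module-level function: it holds no reference to any instance.
    return [row]


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


holder = Holder()
mapper = holder.mapper  # same pattern as `mapper = self.mapper`

# The local variable is still a bound method pointing back at the instance.
still_bound = mapper.__self__ is holder
# Pickling the bound method drags the whole instance (and its lock) along.
bound_ok = can_pickle(mapper)
# A plain function has no __self__, so nothing extra would be captured.
plain_has_self = hasattr(plain_mapper, "__self__")

print(still_bound, bound_ok, plain_has_self)
```

If this reading is right, one possible alternative to staticmethod would be a function that never touches self, such as a module-level function like `plain_mapper` or a function defined locally inside f.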