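How do I serialize a collection of objects into an RDD in PySpark? I ran into a problem: in Scala, a class only needs to extend Serializable, but what is the equivalent in Python?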

For example, consider the following code:

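from pyspark import SparkContext

class test:
    data = 1  # class attribute shared by all instances
    def __init__(self):
        self.property = 0  # instance attribute read in the map below

    def test2(self):
        print('hello')

if __name__ == '__main__':
    p1 = test()
    p2 = test()
    a = [p1, p2]
    sc = SparkContext('local[2]', 'test')
    # Distribute the list of objects as an RDD
    rdd = sc.parallelize(a)
    # Read one attribute from each object on the workers
    rdd.map(lambda x: x.property).collect()
    sc.stop()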

I have been searching online for a long time without success. Please help, or suggest some ways to achieve this.
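
For context, here is a sketch of the mechanism rather than a confirmed fix for the code above. PySpark by default moves Python objects between the driver and the workers with pickle, so there is no Serializable-style base class to extend; an ordinary instance can be pickled as long as its class can be imported by name on the receiving side. The class `Point` and the helper `round_trip` below are hypothetical stand-ins, and the example uses plain pickle without Spark:

```python
import pickle


class Point:
    """Hypothetical stand-in for the `test` class in the question."""

    def __init__(self, prop=0):
        self.property = prop


def round_trip(obj):
    # Serialize to bytes and back, roughly what happens to each
    # element when it is shipped to a worker and read there.
    return pickle.loads(pickle.dumps(obj))


objs = [Point(1), Point(2)]
restored = [round_trip(o) for o in objs]
values = [o.property for o in restored]
print(values)
```

Pickle stores only a reference to the class (its module and name) together with the instance's attributes. A class defined in the driver script's `__main__` may therefore not be resolvable on the workers; a commonly suggested workaround is to put the class in its own module and ship it with `sc.addPyFile` or the `pyFiles` argument of `SparkContext`.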

0 answers: