How do I write the following Scala code in PySpark?
rdd1.join(rdd2.map {case ((t, w), u) => (t, (w, u))}).map {case (t, (v, (w, u))) => ((t, w), (u, v))}.collect()
Answer (score: 2)
You can use lambda functions. Note that Python 3 removed tuple parameter unpacking in function signatures (PEP 3113), so index into the tuples instead of destructuring them in the lambda:
rdd1 = sc.parallelize(range(1, 10)).map(lambda x: (x, x + 1))
rdd2 = sc.parallelize(range(1, 10)).map(lambda x: ((x, x * 2), x * 3))

# Re-key rdd2 from ((t, w), u) to (t, (w, u)) so both RDDs share the key t,
# join, then reshape each (t, (v, (w, u))) record into ((t, w), (u, v)).
(rdd1
 .join(rdd2.map(lambda kv: (kv[0][0], (kv[0][1], kv[1]))))
 .map(lambda kv: ((kv[0], kv[1][1][0]), (kv[1][1][1], kv[1][0])))
 .collect())
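If the indexing feels hard to read, a more self-documenting option is to pull the tuple unpacking into small named functions, which Python 3 still allows inside the function body. A minimal sketch (the helper names rekey and reshape are just for illustration):

def rekey(record):
    # ((t, w), u) -> (t, (w, u))
    (t, w), u = record
    return (t, (w, u))

def reshape(record):
    # (t, (v, (w, u))) -> ((t, w), (u, v))
    t, (v, (w, u)) = record
    return ((t, w), (u, v))

rdd1.join(rdd2.map(rekey)).map(reshape).collect()

With the sample data above, each input x produces ((x, 2*x), (3*x, x + 1)), so the collected result contains ((1, 2), (3, 2)), ((2, 4), (6, 3)), and so on (ordering may vary across partitions).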