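I have two RDDs. The first contains
(pID, Name, Price, Column1)
and the second contains
(pID, Seller, Column3)
I want to get Column3 for the rows whose pID matches, while keeping the first RDD's format. I can't figure out the logic to produce this output, and I'm also struggling with functional-programming-style logic in general. Any help is appreciated.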
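To pin down what the requested result looks like, here is a minimal sketch with plain Scala collections instead of Spark. It assumes the field types (`Int`, `String`, `Double`, `String`) and that "keep the first RDD's format" means keeping its four fields and appending Column3. The names `KeepFirstFormat` and `withColumn3` are hypothetical.

```scala
object KeepFirstFormat {
  // Assumed field types; the question does not state them.
  type Row1 = (Int, String, Double, String) // (pID, Name, Price, Column1)
  type Row2 = (Int, String, String)         // (pID, Seller, Column3)

  // Inner-join semantics: for each row of `first`, emit its four fields plus every
  // Column3 value from `second` with the same pID; rows with no match are dropped.
  def withColumn3(first: Seq[Row1], second: Seq[Row2]): Seq[(Int, String, Double, String, String)] = {
    // Index the second collection by pID so each lookup is a single Map access.
    val col3ByPid: Map[Int, Seq[String]] =
      second.groupBy(_._1).map { case (pid, rows) => pid -> rows.map(_._3) }
    for {
      (pid, name, price, col1) <- first
      col3 <- col3ByPid.getOrElse(pid, Seq.empty[String])
    } yield (pid, name, price, col1, col3)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample rows for illustration only.
    val first = Seq((101, "item A", 1.24, "a"), (102, "item B", 2.45, "b"))
    val second = Seq((101, "Seller A", "x"), (101, "Seller B", "y"))
    withColumn3(first, second).foreach(println)
  }
}
```

The `for` comprehension desugars to `flatMap`/`map`. That is the same "one output row per matching pair" behavior a key-based join has.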
Answer:
// In spark-shell, `sc` is the predefined SparkContext.
// Pair RDD: key = pID, value = (name, price)
val items = List((101, ("item A", 1.24)),
                 (102, ("item B", 2.45)),
                 (103, ("item C", 3.54)))
val rdd1 = sc.parallelize(items)

// Pair RDD: key = pID, value = seller
val sellers = List((101, "Seller A"),
                   (101, "Seller B"),
                   (102, "Seller C"),
                   (102, "Seller D"),
                   (103, "Seller E"))
val rdd2 = sc.parallelize(sellers)

// Inner join on the key: one output row per matching (rdd1, rdd2) pair
val innerJoinedRdd = rdd1.join(rdd2)
innerJoinedRdd.collect().foreach(println)
// Output (row order may vary with partitioning):
// (101,((item A,1.24),Seller A))
// (101,((item A,1.24),Seller B))
// (102,((item B,2.45),Seller C))
// (102,((item B,2.45),Seller D))
// (103,((item C,3.54),Seller E))
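The join produces nested tuples of the form `(pID, (leftValue, rightValue))`. Below is a minimal sketch of getting back to the first RDD's flat format for the question's shapes. It assumes the inputs are `RDD[(Int, String, Double, String)]` and `RDD[(Int, String, String)]` (the types are a guess) and that Column3 is appended as a fifth field. `JoinKeepFormat`, `addColumn3`, and the sample values are hypothetical. It needs Spark on the classpath and runs with a local master.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object JoinKeepFormat {
  // Hypothetical helper: key both inputs by pID, inner-join them, then flatten
  // (pID, ((name, price, col1), col3)) back into one flat tuple.
  def addColumn3(
      products: RDD[(Int, String, Double, String)],
      sellers: RDD[(Int, String, String)]
  ): RDD[(Int, String, Double, String, String)] = {
    val byPid1 = products.map { case (pid, name, price, col1) => (pid, (name, price, col1)) }
    val byPid2 = sellers.map { case (pid, _, col3) => (pid, col3) } // Seller is not needed
    byPid1.join(byPid2).map { case (pid, ((name, price, col1), col3)) =>
      (pid, name, price, col1, col3)
    }
  }

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("join-keep-format").setMaster("local[*]"))
    try {
      // Made-up sample rows for illustration only.
      val products = sc.parallelize(Seq(
        (101, "item A", 1.24, "col1-a"),
        (102, "item B", 2.45, "col1-b"),
        (103, "item C", 3.54, "col1-c")))
      val sellers = sc.parallelize(Seq(
        (101, "Seller A", "col3-a"),
        (102, "Seller C", "col3-c")))
      // Sort after collect because join does not guarantee row order.
      addColumn3(products, sellers).collect().sortBy(_._1).foreach(println)
    } finally sc.stop()
  }
}
```

Because this is an inner join, pIDs with no match in the second RDD (103 in the sample) are dropped. `leftOuterJoin` keeps them and makes Column3 an `Option[String]`. If Column3 should take Column1's slot so the result stays a 4-tuple, change the final `map` to emit `(pid, name, price, col3)`.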