My RDD is Array[Array[String]] = Array(Array(12345, 1232A, 66QQ2, ASC42, 0003A, 2294AA, AGDT33, 23881), Array(536366, 22633, 22632)....)
I want the output to be
Array[(String, String)] = Array((12345,1232A), (12345,66QQ2)....
Answer 0: (score: 1)
Try flatMap.
Transform each inner array by emitting its first element paired with each of the remaining elements:
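import scala.collection.mutable.ListBuffer

rdd.flatMap { s =>
  // Collect (first element, other element) pairs for this inner array
  val output = new ListBuffer[(String, String)]()
  for (i <- 1 until s.length) {
    output += ((s(0), s(i)))
  }
  output
}.foreach(println)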
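The same pairing can also be written without a mutable buffer. Below is a minimal sketch on plain Scala collections, so it runs in the Scala REPL or spark-shell without a cluster. The helper name `pairFirstWithRest` is hypothetical (it is not from the answer above), and returning nothing for an empty inner array is an assumption about the desired behavior.

```scala
// Hypothetical helper (not part of the original answer): pairs the first
// element of a row with each of the remaining elements.
def pairFirstWithRest(row: Array[String]): Seq[(String, String)] =
  row.headOption match {
    case Some(first) => row.drop(1).toSeq.map(rest => (first, rest))
    case None        => Seq.empty // assumption: an empty row emits no pairs
  }

// Sample data taken from the question.
val rows: Array[Array[String]] = Array(
  Array("12345", "1232A", "66QQ2", "ASC42", "0003A", "2294AA", "AGDT33", "23881"),
  Array("536366", "22633", "22632")
)

// flatMap on a local Array stands in for flatMap on the RDD here.
val pairs: Array[(String, String)] = rows.flatMap(pairFirstWithRest)
pairs.foreach(println)
```

On an actual RDD[Array[String]], the same helper should be usable as `rdd.flatMap(pairFirstWithRest)`, since it returns a Seq for each input row.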
Answer 1: (score: 0)
Try using the RDD's map together with Stream to zip the head of each inner array with every element of its tail.
val test: Array[Array[String]] = Array(Array("12345", "1232A", "66QQ2", "ASC42", "0003A", "2294AA", "AGDT33", "23881"), Array("536366", "22633", "22632"))
val TestRdd = sc.parallelize(test)
// Zip an endless stream of the head with the tail; zip stops when the tail is exhausted.
// collect() brings the per-row lists back to the driver, where flatten merges them into one Array.
val finalOutput: Array[(String, String)] = TestRdd.map(xs => (Stream.continually(xs.head) zip xs.tail).toList).collect().flatten
// finalOutput is
// Array[(String, String)] = Array((12345,1232A), (12345,66QQ2), (12345,ASC42), (12345,0003A), (12345,2294AA), (12345,AGDT33), (12345,23881), (536366,22633), (536366,22632))