flatMap in Scala: RDD[(String, String, List[String])]

Date: 2017-06-06 14:17:03

Tags: scala apache-spark extract rdd flatmap

I have this problem: I have an RDD[(String, String, List[String])], and I want to flatMap it to get an RDD[(String, String, String)].

For example:

val x: RDD[(String, String, List[String])] =
RDD[(a, b, List("ra", "re", "ri"))]

I want to get:

val result: RDD[(String, String, String)] =
RDD[(a, b, ra), (a, b, re), (a, b, ri)]

2 answers:

Answer 0 (score: 7)

Use flatMap:

val rdd = sc.parallelize(Seq(("a", "b", List("ra", "re", "ri"))))
// rdd: org.apache.spark.rdd.RDD[(String, String, List[String])] = ParallelCollectionRDD[7] at parallelize at <console>:28

rdd.flatMap{ case (x, y, z) => z.map((x, y, _)) }.collect
// res23: Array[(String, String, String)] = Array((a,b,ra), (a,b,re), (a,b,ri))
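
To see why flatMap rather than map is the right call here, a minimal sketch on plain Scala collections may help. The List below only stands in for the RDD (an assumption for illustration, so no SparkContext is needed), and the names rows, nested and flat are hypothetical:

```scala
// A plain List stands in for the RDD (illustration only).
val rows: List[(String, String, List[String])] =
  List(("a", "b", List("ra", "re", "ri")))

// map keeps one output element per input row, so the result stays nested.
val nested: List[List[(String, String, String)]] =
  rows.map { case (x, y, zs) => zs.map(z => (x, y, z)) }

// flatMap concatenates the per-row lists into one flat list of triples.
val flat: List[(String, String, String)] =
  rows.flatMap { case (x, y, zs) => zs.map(z => (x, y, z)) }
```

The function passed in is the same in both calls; only flatMap removes the extra level of nesting, which is what turns the inner List[String] into separate rows.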

Answer 1 (score: 0)

Here is another way, again using flatMap:

val rdd = sparkContext.parallelize(Seq(("a", "b", List("ra", "re", "ri"))))
// t is one (String, String, List[String]) tuple; item is one element of its list
rdd.flatMap(t => t._3.map(item => (t._1, t._2, item))).foreach(println)
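
One consequence of either approach is worth keeping in mind: a row whose list is empty produces no output rows at all. The sketch below shows this on plain Scala collections (again a stand-in for the RDD, not Spark code). The names and the empty-string placeholder are hypothetical choices for illustration, not something from the question:

```scala
// Plain List standing in for the RDD; the second row has an empty list.
val rowsWithEmpty: List[(String, String, List[String])] = List(
  ("a", "b", List("ra", "re")),
  ("c", "d", Nil)
)

// The ("c", "d", Nil) row contributes nothing, so it disappears.
val flattened: List[(String, String, String)] =
  rowsWithEmpty.flatMap(t => t._3.map(item => (t._1, t._2, item)))

// Hypothetical workaround: substitute a placeholder so the row survives.
val keptWithPlaceholder: List[(String, String, String)] =
  rowsWithEmpty.flatMap { t =>
    val items = if (t._3.isEmpty) List("") else t._3
    items.map(item => (t._1, t._2, item))
  }
```

Whether dropping such rows is acceptable depends on the data; if they must be kept, an Option[String] in the third position may be a cleaner choice than an empty string.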