I have data like this in an RDD:
RDD[((Int, Int, Int), ((Int, Int), Int))]
for example:
(((9,679,16),((2,274),1)), ((250,976,13),((2,218),1)))
I would like the output to be:
((9,679,16,2,274,1),(250,976,13,2,218,1))
After joining the two RDDs:
val joinSale = salesTwo.join(saleFinal)
I get the result set. I tried the following code:
joinSale.flatMap(x => x).take(100).foreach(println)
I have tried map / flatMap but could not make it work. Any idea how to achieve such a scenario? Thanks in advance.
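For reference, a minimal sketch of two input RDDs whose join would produce the type above (the sample values are assumptions reconstructed from the joined records shown, not the asker's actual data):

val salesTwo  = sc.parallelize(Seq(((9,679,16),(2,274)), ((250,976,13),(2,218))))  // RDD[((Int,Int,Int), (Int,Int))]
val saleFinal = sc.parallelize(Seq(((9,679,16),1), ((250,976,13),1)))              // RDD[((Int,Int,Int), Int)]
val joinSale  = salesTwo.join(saleFinal)                                           // RDD[((Int,Int,Int), ((Int,Int), Int))]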
Answer 0 (score: 5)
You can use pattern matching in Scala to do this. Simply wrap your tuple-reshaping logic in a map similar to the one below:
val mappedJoinSale = joinSale.map { case ((a, b, c), ((d, e), f)) => (a, b, c, d, e, f) }
Using your example, we have:
scala> val example = sc.parallelize(Array(((9,679,16),((2,274),1)), ((250,976,13),((2,218),1))))
example: org.apache.spark.rdd.RDD[((Int, Int, Int), ((Int, Int), Int))] = ParallelCollectionRDD[0] at parallelize at <console>:12
scala> val mapped = example.map { case ((a, b, c), ((d, e), f)) => (a, b, c, d, e, f) }
mapped: org.apache.spark.rdd.RDD[(Int, Int, Int, Int, Int, Int)] = MappedRDD[1] at map at <console>:14
scala> mapped.take(2).foreach(println)
...
(9,679,16,2,274,1)
(250,976,13,2,218,1)
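The same reshaping can also be written with positional accessors instead of a pattern match; a small equivalent sketch (not part of the original answer), using the example RDD defined above:

val mappedAlt = example.map(x => (x._1._1, x._1._2, x._1._3, x._2._1._1, x._2._1._2, x._2._2))

The pattern-match version is usually preferable because it names the fields and documents the expected shape of each record.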
Answer 1 (score: 3)
You can also use the wonderful shapeless library to create a generic tuple flattener, as shown below:
import shapeless._
import shapeless.ops.tuple

// Low-priority fallback: wrap any non-tuple value in a Tuple1 so it can be concatenated.
trait LowLevelFlatten extends Poly1 {
  implicit def anyFlat[T] = at[T](x => Tuple1(x))
}

// Concatenates two tuples using shapeless's tuple Prepend operation.
object concat extends Poly2 {
  implicit def atTuples[T1, T2](implicit prepend: tuple.Prepend[T1, T2]): Case.Aux[T1, T2, prepend.Out] =
    at[T1, T2]((t1, t2) => prepend(t1, t2))
}

// Recursively flattens nested tuples: map flatten over every element, then
// left-reduce the results with concat.
object flatten extends LowLevelFlatten {
  implicit def tupleFlat[T, M](implicit
    mapper: tuple.Mapper.Aux[T, flatten.type, M],
    reducer: tuple.LeftReducer[M, concat.type]
  ): Case.Aux[T, reducer.Out] =
    at[T](t => reducer(mapper(t)))
}
Now, in any code where import shapeless._ is in scope, you can use it as:
joinSale.map(flatten)
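As a quick sanity check (an untested sketch, assuming a shapeless 2.x dependency is on the classpath), the poly can also be applied directly to one of the sample records:

val flat = flatten(((9,679,16), ((2,274), 1)))  // expected: (9,679,16,2,274,1)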