I have a pair RDD that, when collected, has type
Array[((String, String), ((String, String, String, String, String), (Double, Double)))]
For example:
scala> joinWD.collect
res75: Array[((String, String), ((String, String, String, String, String), (Double, Double)))] = Array(((82010200-01,2008),((Acorn Lake,Washington,Lower St. Croix River,-92.97171054,45.01655642),(1.0413333177566528,0.04000000283122063))),
((82010200-01,2008),((Acorn Lake,Washington,Lower St. Croix River,-92.97171054,45.01655642),(1.0413333177566528,0.04000000283122063))))
I want to flatten it into Array[((String, String), String, String, String, String, String, Double, Double)].
The first tuple is the key and all the other elements are the values.
How can I flatten this using Spark/Scala?
Answer:
As far as I know, there is no built-in method for flattening tuples (unless you use shapeless), so the map
is going to look a bit clunky:
val myArr: Array[((String, String), ((String, String, String, String, String), (Double, Double)))] = Array(
(("82010200-01", "2008"), (("Acorn Lake", "Washington", "Lower St. Croix River", "-92.97171054", "45.01655642"), (1.0413333177566528, 0.04000000283122063))),
(("82010200-01", "2008"), (("Acorn Lake", "Washington", "Lower St. Croix River", "-92.97171054", "45.01655642"), (1.0413333177566528, 0.04000000283122063)))
)
myArr.map{ case (k, (u, v)) => (k, u._1, u._2, u._3, u._4, u._5, v._1, v._2) }
res1: Array[((String, String), String, String, String, String, String, Double, Double)] = Array(
((82010200-01, 2008), Acorn Lake, Washington, Lower St. Croix River, -92.97171054, 45.01655642, 1.0413333177566528, 0.04000000283122063),
((82010200-01, 2008), Acorn Lake, Washington, Lower St. Croix River, -92.97171054, 45.01655642, 1.0413333177566528, 0.04000000283122063)
)
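A slightly more readable variant destructures the whole nested tuple in one pattern instead of reaching into u._1 through u._5 positionally. This is a minimal sketch assuming plain Scala 2.13 with no Spark on the classpath; the object name FlattenTuples, the helper flattenRow, the type aliases and the field names s1 to s5, d1, d2 are hypothetical illustrations, not part of the original answer.

```scala
object FlattenTuples {
  // Hypothetical aliases for the shapes in the question:
  // a (String, String) key, five String fields and two Doubles.
  type Key   = (String, String)
  type Info  = (String, String, String, String, String)
  type Stats = (Double, Double)
  type Flat  = (Key, String, String, String, String, String, Double, Double)

  // Hypothetical helper: one pattern binds every nested field by name,
  // then rebuilds them as a single flat 8-tuple with the key first.
  def flattenRow(row: (Key, (Info, Stats))): Flat = row match {
    case (k, ((s1, s2, s3, s4, s5), (d1, d2))) =>
      (k, s1, s2, s3, s4, s5, d1, d2)
  }

  def main(args: Array[String]): Unit = {
    // One row copied from the question's sample data.
    val rows: Array[(Key, (Info, Stats))] = Array(
      (("82010200-01", "2008"),
        (("Acorn Lake", "Washington", "Lower St. Croix River", "-92.97171054", "45.01655642"),
          (1.0413333177566528, 0.04000000283122063)))
    )
    // Same idea as myArr.map above, just with the named helper.
    val flat: Array[Flat] = rows.map(flattenRow)
    flat.foreach(println)
  }
}
```

Since RDD.map also takes an ordinary function, the same helper should apply to the pair RDD itself before collecting, e.g. joinWD.map(FlattenTuples.flattenRow); this has not been tested against the asker's actual data.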