How to extract values from an array of string arrays in an RDD

Asked: 2018-03-16 05:57:43

Tags: scala apache-spark apache-spark-sql rdd

val rdd: Array[Array[String]] = Array(
  Array("2345", "345", "fghj", "dfhg"),
  Array("2345", "3450", "fghj", "dfhg"),
  Array("23145", "1345", "fghj", "dffghg"),
  Array("23045", "345", "feghj", "adfhg"))

This is my input. I need to extract the first two elements of each array as a key-value pair.

The output I want is:

(2345,345)
(2345,3450)
(23145,1345)
(23045,345)

1 answer:

Answer 0 (score: 2)

You can simply map each inner array to a tuple of its first two elements:

rdd.map(array => (array(0), array(1)))
//res0: Array[(String, String)] = Array((2345,345), (2345,3450), (23145,1345), (23045,345))
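Indexing with array(0) and array(1) throws an ArrayIndexOutOfBoundsException if any inner array has fewer than two elements. A minimal sketch of a more defensive variant, assuming such short rows should simply be skipped (the names FirstTwo and firstTwo are hypothetical, not from the question):

```scala
object FirstTwo {
  // Keep only rows with at least two elements and pair up the first two.
  // Rows that do not match the pattern are silently dropped by collect.
  def firstTwo(rows: Array[Array[String]]): Array[(String, String)] =
    rows.collect { case Array(key, value, _*) => (key, value) }

  def main(args: Array[String]): Unit = {
    val rows = Array(
      Array("2345", "345", "fghj", "dfhg"),
      Array("2345", "3450", "fghj", "dfhg"),
      Array("only-one-element")) // skipped: no second element
    firstTwo(rows).foreach(println)
  }
}
```

Whether skipping malformed rows is the right behavior depends on the data; if a short row indicates a bug upstream, failing loudly with the plain index version may be preferable.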

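If you want the output as a Map, you can add a .toMap call. Note that a Map holds one value per key, so when a key is repeated the later value wins: in the output below, (2345,345) is replaced by (2345,3450).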

rdd.map(array => (array(0), array(1))).toMap
//res0: scala.collection.immutable.Map[String,String] = Map(2345 -> 3450, 23145 -> 1345, 23045 -> 345)
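As the output above shows, .toMap keeps only one value per key, so the pair (2345,345) is lost. If every value should be kept, one option is to group the pairs by key first. The following is a sketch under that assumption, not part of the original answer; the names GroupPairs and groupValues are hypothetical:

```scala
object GroupPairs {
  // Build key -> list of all values seen for that key, in input order.
  def groupValues(rows: Array[Array[String]]): Map[String, List[String]] =
    rows
      .collect { case Array(key, value, _*) => (key, value) } // first two elements
      .groupBy(_._1)                                          // group pairs by key
      .map { case (key, pairs) => key -> pairs.map(_._2).toList }

  def main(args: Array[String]): Unit = {
    val rows = Array(
      Array("2345", "345", "fghj", "dfhg"),
      Array("2345", "3450", "fghj", "dfhg"),
      Array("23145", "1345", "fghj", "dffghg"),
      Array("23045", "345", "feghj", "adfhg"))
    // The key "2345" now maps to both of its values.
    println(groupValues(rows)("2345")) // prints List(345, 3450)
  }
}
```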