我使用Scala创建了RDD
以下格式:
Array[(String, (Array[String], Array[String]))]
如何从Array[1]
获取RDD
的列表?
第一条数据线的数据是:
// Array[(String, (Array[String], Array[String]))]
Array(
(
966515171418,
(
Array(4579848447, 4579848453, 2015-07-29 03:27:28, 44, 1, 1, 966515171418, 966515183263, 420500052424347, 0, 52643, 9, 5067, 5084, 2, 1, 0, 0),
Array(4579866236, 4579866226, 2015-07-29 04:16:22, 37, 1, 1, 966515171418, 966515183264, 420500052424347, 0, 3083, 9, 5072, 5084, 2, 1, 0, 0)
)
)
)
答案 0 :(得分:0)
假设你有这样的东西(只需粘贴到spark-shell
):
val a = Array(
("966515171418",
(Array("4579848447", "4579848453", "2015-07-29 03:27:28", "44", "1", "1", "966515171418", "966515183263", "420500052424347", "0", "52643", "9", "5067", "5084", "2", "1", "0", "0"),
Array("4579866236", "4579866226", "2015-07-29 04:16:22", "37", "1", "1", "966515171418", "966515183264", "420500052424347", "0", "3083", "9", "5072", "5084", "2", "1", "0", "0")))
)
val rdd = sc.makeRDD(a)
然后你使用
获得第一个数组scala> rdd.first._2._1
res9: Array[String] = Array(4579848447, 4579848453, 2015-07-29 03:27:28, 44, 1, 1, 966515171418, 966515183263, 420500052424347, 0, 52643, 9, 5067, 5084, 2, 1, 0, 0)
表示第一行(即Tuple2),然后是元组的第二个元素(也是Tuple2),然后是第一个元素。
使用模式匹配
scala> rdd.first match { case (_, (array1, _)) => array1 }
res30: Array[String] = Array(4579848447, 4579848453, 2015-07-29 03:27:28, 44, 1, 1, 966515171418, 966515183263, 420500052424347, 0, 52643, 9, 5067, 5084, 2, 1, 0, 0)
如果您想获得所有行,只需使用map()
:
scala> rdd.map(_._2._1).collect()
将所有行的结果放入数组中。
另一种选择是在map()
中使用模式匹配:
scala> rdd.map { case (_, (array1, _)) => array1 }.collect()