My Apache Spark GraphX code gives me this result:
Array[(org.apache.spark.graphx.VertexId, Array[org.apache.spark.graphx.VertexId])] = Array((4,Array(17, 18, 20)), (16,Array(20)), (14,Array()), (6,Array(7)), (8,Array(9, 10)), (12,Array(1)), (20,Array(16, 19)), (18,Array()), (10,Array()), (2,Array(4, 15, 16)), (19,Array(4)), (13,Array()), (15,Array()), (11,Array(1)), (1,Array(5, 8)), (17,Array(4)), (3,Array(1, 8, 13, 14)), (7,Array(5)), (9,Array(5, 8)), (5,Array(1, 6, 7, 8)))
After saveAsTextFile, the output file contains:
(16,[J@4ee106a0)
(20,[J@6d1dcef6)
(13,[J@4c3850da)
(3,[J@7e97b33a)
(8,[J@7c0ad5d1)
(2,[J@321e8c0d)
(1,[J@7964eb06)
(5,[J@172243cb)
(14,[J@519adbc6)
(18,[J@1154e795)
(15,[J@16175a92)
(7,[J@5fc8c46b)
(4,[J@6996f848)
(12,[J@34e6faa9)
(19,[J@6aec10c5)
(17,[J@69a45e4d)
(6,[J@6a94d262)
(10,[J@3c4a02cd)
(11,[J@7081d0e4)
(9,[J@78269e87)
How can I convert this array so that it is saved in a readable form, such as:
(4: (17, 18, 20))
or something similar?
Answer 0 (score: 0)
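saveAsTextFile writes each element's toString, and a JVM array's toString does not print its contents. Use the mkString() function to convert the collection to a string first: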
scala> val records = Array((4,Array(17, 18, 20)), (16,Array(20)), (14,Array()))
records: Array[(Int, Array[_ <: Int])] = Array((4,Array(17, 18, 20)), (16,Array(20)), (14,Array()))
scala> val recordsRDD = sc.parallelize(records)
recordsRDD: org.apache.spark.rdd.RDD[(Int, Array[_ <: Int])] = ParallelCollectionRDD[0] at parallelize at <console>:14
scala> recordsRDD.map(rec => "(" + rec._1 + ": (" + rec._2.mkString(",") + "))").collect().foreach(println)
(4: (17,18,20))
(16: (20))
(14: ())
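The `[J@4ee106a0` form from the question can be reproduced without Spark. The following is a minimal sketch, assuming a plain Scala program; the object name `ArrayToStringDemo` and its two helpers are made up for illustration. `[J` is the JVM's tag for a long array, which is what `Array[VertexId]` is, since GraphX defines `VertexId` as an alias for `Long`.

```scala
object ArrayToStringDemo {
  // Default JVM rendering: type tag plus identity hash, e.g. "[J@4ee106a0".
  def raw(ids: Array[Long]): String = ids.toString

  // Readable rendering of the same elements via mkString.
  def readable(ids: Array[Long]): String = ids.mkString("(", ", ", ")")

  def main(args: Array[String]): Unit = {
    val neighbors = Array(17L, 18L, 20L)
    println(raw(neighbors))      // the hash part varies from run to run
    println(readable(neighbors)) // (17, 18, 20)
  }
}
```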
The mkString method is overloaded, so you can also supply a prefix and a suffix:
scala> val a = Array("apple", "banana", "cherry")
scala> a.mkString("[", ", ", "]")
res4: String = [apple, banana, cherry]
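For comparison, here is a small sketch of the three mkString forms side by side. The object and value names are illustrative only; the empty-array case shows that the prefix and suffix are still emitted when there are no elements, which is why an empty neighbor list prints as `()` in the outputs on this page.

```scala
object MkStringOverloads {
  val fruit = Array("apple", "banana", "cherry")

  // No arguments: elements are concatenated with nothing between them.
  val plain: String = fruit.mkString

  // One argument: the separator placed between elements.
  val separated: String = fruit.mkString(", ")

  // Three arguments: prefix, separator, suffix.
  val wrapped: String = fruit.mkString("[", ", ", "]")

  // An empty collection still yields the prefix and suffix.
  val empty: String = Array.empty[String].mkString("[", ", ", "]")
}
```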
Both variants can be written with saveAsTextFile and then read back from HDFS:
scala> recordsRDD.map(rec => "(" + rec._1 + ": (" + rec._2.mkString(",") + "))").saveAsTextFile("/user/cloudera/col_toString1")
scala> recordsRDD.map(rec => "(" + rec._1 + rec._2.mkString(": (", ", ", ")") + ")").saveAsTextFile("/user/cloudera/col_toString2")
-----
[cloudera@quickstart ~]$ hadoop fs -cat /user/cloudera/col_toString1/p*
(4: (17,18,20))
(16: (20))
(14: ())
[cloudera@quickstart ~]$ hadoop fs -cat /user/cloudera/col_toString2/p*
(4: (17, 18, 20))
(16: (20))
(14: ())
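The same formatting can be pulled into a small function that takes the question's element type directly. This is a sketch under assumptions: `NeighborFormat` and `formatRecord` are hypothetical names, and the ids are typed as `Long` because `VertexId` is an alias for `Long`. The sample values are copied from the result shown in the question.

```scala
object NeighborFormat {
  // Render one (vertex, neighbors) pair as "(id: (n1, n2, ...))".
  def formatRecord(id: Long, neighbors: Array[Long]): String =
    "(" + id + neighbors.mkString(": (", ", ", ")") + ")"

  def main(args: Array[String]): Unit = {
    // A few pairs taken from the output in the question.
    val sample: Array[(Long, Array[Long])] = Array(
      (4L, Array(17L, 18L, 20L)),
      (16L, Array(20L)),
      (14L, Array.empty[Long])
    )
    sample.foreach { case (id, nbrs) => println(formatRecord(id, nbrs)) }
  }
}
```

In the Spark shell this would presumably be applied as `rdd.map { case (id, nbrs) => NeighborFormat.formatRecord(id, nbrs) }.saveAsTextFile(path)`, where `rdd` and `path` stand in for the question's RDD and an output directory of your choice.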