I have a JavaPairRDD whose data is, say, of type <Integer, List<Integer>>.
When I call data.saveAsTextFile("output"), the output contains data in the following format:
(1,[1,2,3,4])
etc.
I want the output file to contain something like this:
1 1,2,3,4
i.e. 1\t1,2,3,4
Any help would be appreciated.
Answer 0 (score: 3)
You need to understand what's happening here. You have an RDD[(T, U)], where T and U are some object types; read it as an RDD of tuples of T and U. When you call saveAsTextFile() on this RDD, it converts each element of the RDD to a string and writes one string per line, which is how the text file output is generated.
Now, how is an object of some type T converted to a string? By calling toString() on it. This is why you see [] around the List and () around the tuple as a whole.
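To see where the brackets and parentheses come from, here is a small plain-Java sketch (no Spark needed). It relies on java.util.List.toString(), which wraps elements in [] and separates them with ", ", and it mimics scala.Tuple2.toString(), which formats a pair as "(_1,_2)". The class and method names are hypothetical, chosen only for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class ToStringDemo {
    // Hypothetical helper that mimics scala.Tuple2.toString(): "(" + _1 + "," + _2 + ")".
    static String pairToString(Object key, Object value) {
        return "(" + key + "," + value + ")";
    }

    public static void main(String[] args) {
        List<Integer> values = Arrays.asList(1, 2, 3, 4);
        // List.toString() supplies the square brackets and ", " separators.
        System.out.println(values);
        // The tuple's own toString() adds the outer parentheses and the comma.
        System.out.println(pairToString(1, values));
    }
}
```

Note that a java.util.List prints a space after each comma, so the real saveAsTextFile output for a Java list would look like (1,[1, 2, 3, 4]).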
Solution: map each element in your RDD to a string in your desired format. I'm not that familiar with the Java syntax, but in Scala I'd do something like this:
rdd.map(e=>s"${e._1}\t${e._2.mkString(",")}")
where mkString concatenates the elements of a collection using the given delimiter.
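Since the question uses the Java API, here is a rough Java equivalent of the Scala line above, written as a minimal sketch. The formatting logic is pulled into a standalone static method (the name formatLine is hypothetical) so it can be checked without a Spark cluster; Collectors.joining(",") plays the role of Scala's mkString(",").

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class TabFormat {
    // Hypothetical helper: formats one (key, values) pair as "key<TAB>v1,v2,...".
    static String formatLine(Integer key, List<Integer> values) {
        // Equivalent of Scala's values.mkString(",").
        String joined = values.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(","));
        return key + "\t" + joined;
    }

    public static void main(String[] args) {
        // Prints 1, a tab, then 1,2,3,4
        System.out.println(formatLine(1, Arrays.asList(1, 2, 3, 4)));
    }
}
```

Assuming Java 8 lambdas and Spark's Java API, you would apply it before saving, for example data.map(t -> TabFormat.formatLine(t._1(), t._2())).saveAsTextFile("output"); here JavaPairRDD.map receives each Tuple2 and returns a JavaRDD<String>, so saveAsTextFile writes your formatted strings instead of the tuple's toString(). This wiring is untested here.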
Let me know if this helped. Cheers.