I have a Dataset[Array[String]] containing the following strings:
12345, 2341, a465c2a, p, 2015-06-10, 2015-02-23, 2015-02-23, 2, "", 1, 98941, 1, ., 17, 21, 1, "", 67890, 4313, a465c2a, p, 2015-06-10, 2015-02-23, 2015-02-23, 2, 7391, 1, 98941, 1, ., 17, 21, 1, 01
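Counting from zero, the first record in this string ends at index 16, and index 17 is the start of a new record. How can I save this as a text file in Spark so that each new record starts on a new line? I know a Dataset can be saved as a text file with write.text.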
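The split described above (indices 0 to 16 form the first record, index 17 starts the next) is plain index arithmetic. Here is a minimal sketch in plain Scala, with no Spark involved, assuming every record has exactly 17 fields. `recordLines` and `RecordSize` are hypothetical names introduced only for illustration:

```scala
// Sample row from the question: two records of 17 fields each (34 fields in total).
val fields: Array[String] = Array(
  "12345", "2341", "a465c2a", "p", "2015-06-10", "2015-02-23", "2015-02-23", "2", "", "1",
  "98941", "1", ".", "17", "21", "1", "",
  "67890", "4313", "a465c2a", "p", "2015-06-10", "2015-02-23", "2015-02-23", "2", "7391",
  "1", "98941", "1", ".", "17", "21", "1", "01"
)

// Assumption: every record has exactly 17 fields (indices 0-16, 17-33, ...).
val RecordSize = 17

// Records start at indices 0, 17, 34, ...; slice out each record and join its fields with commas.
def recordLines(row: Array[String], size: Int): Seq[String] =
  (0 until row.length by size).map(start => row.slice(start, start + size).mkString(","))

// Joining the records with "\n" puts each record on its own line.
val text: String = recordLines(fields, RecordSize).mkString("\n")
```

Using a slice per start index keeps the "record ends at 16, next starts at 17" rule explicit. The last slice is simply shorter if the row length is not a multiple of 17.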
Answer (score: 0)
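One way to do this is to apply the sliding function to each Array[String], cutting it into 17-field records, and to join those records with "\n". This works because you already know the index at which each record ends.
//imports needed by the snippet below
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.Dataset
import sqlContext.implicits._ //provides the Encoder that createDataset needs
//your original Dataset (fields copied from the question, including both empty fields)
val data: Dataset[Array[String]] = sqlContext.createDataset(Seq(Array("12345", "2341", "a465c2a",
"p", "2015-06-10", "2015-02-23", "2015-02-23", "2", "", "1",
"98941", "1", ".", "17", "21", "1", "", "67890", "4313", "a465c2a",
"p", "2015-06-10", "2015-02-23", "2015-02-23", "2", "7391",
"1", "98941", "1", ".", "17", "21", "1", "01")))
//cut each array into chunks of 17 fields and put each chunk on its own line;
//no trailing "\n", because saveAsTextFile already ends every element with a newline
val result: RDD[String] = data.rdd.map(_.sliding(17, 17).map(_.mkString(",")).mkString("\n"))
//to display the output on the driver
result.collect().foreach(println)
//output
//12345,2341,a465c2a,p,2015-06-10,2015-02-23,2015-02-23,2,,1,98941,1,.,17,21,1,
//67890,4313,a465c2a,p,2015-06-10,2015-02-23,2015-02-23,2,7391,1,98941,1,.,17,21,1,01
//to save the result to file
result.saveAsTextFile("PATH_TO_SAVE_FILE")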
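Because the step equals the window size, `sliding(17, 17)` cuts the array the same way as the more idiomatic `grouped(17)`. The sketch below checks this on a synthetic 34-field row (a stand-in, not the question's data). It also shows that a row one field short still produces output, but with a silently truncated last record. The commented Spark line is an untested assumption of how the same idea could feed `Dataset.write.text`, which expects a single string column:

```scala
// Synthetic stand-in for one Dataset row: 34 fields = two 17-field records.
val row: Array[String] = (1 to 34).map(_.toString).toArray

// With step == window size, both calls cut this row into consecutive 17-field chunks.
val viaSliding: List[String] = row.sliding(17, 17).map(_.mkString(",")).toList
val viaGrouped: List[String] = row.grouped(17).map(_.mkString(",")).toList

// A row that is one field short still "works", but its last record has only 16 fields.
val shortChunkSizes: List[Int] = row.dropRight(1).grouped(17).map(_.length).toList

// Untested Spark variant (not run here; needs spark-sql and an implicits import):
// emitting one Dataset element per record lets write.text add the line breaks itself.
//   data.flatMap(_.grouped(17).map(_.mkString(","))).write.text("PATH_TO_SAVE_FILE")
```

Emitting one element per record, rather than one multi-line string per row, avoids embedding "\n" in the data at all. It also keeps the output usable by any tool that reads the file line by line.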