I have a case class like this:
case class ResultDays(name: String, number: Double, values: Double*)
and I want to save it to a .csv file:
resultRDD.toDF()
.coalesce(1)
.write.format("com.databricks.spark.csv")
.option("header", "true")
.save("res/output/result.csv")
Unfortunately, I get this error:
java.lang.UnsupportedOperationException: CSV data source does not support array<double> data type.
So how can I include a variable number of values and save the result to a .csv file?
Answer 0 (score: 1)
If you can assume that all records in resultRDD have the same number of elements in values, you can read the first() record, use it to determine the size of the array, and then convert the array elements into separate columns:
// determine the number of "extra" columns from the first record:
val extraCols = resultRDD.first().values.size
// build the sequence of desired columns
// (the $-syntax requires `import spark.implicits._` in scope):
val columns = Seq($"name", $"number") ++ (1 to extraCols).map(i => $"values"(i - 1) as s"col$i")
// select these columns before saving:
resultRDD.toDF()
.select(columns: _*)
.coalesce(1)
.write.format("com.databricks.spark.csv")
.option("header", "true")
.save("res/output/result.csv")
The resulting CSV looks something like:
name,number,col1,col2
a,0.1,0.01,0.001
b,0.2,0.02,0.002
c,0.3,0.03,0.003
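If the records may have *different* numbers of values, the first()-based trick above breaks down. One option is to compute the maximum length and pad shorter rows with empty cells before writing. As a minimal plain-Scala sketch of the padding idea (no Spark involved; the `toCsv` helper and the blank-cell-as-missing convention are my own assumptions, not part of the original answer):

```scala
// Hypothetical helper: flatten variable-length `values` into fixed CSV
// columns by padding shorter rows with empty cells.
case class ResultDays(name: String, number: Double, values: Double*)

def toCsv(rows: Seq[ResultDays]): String = {
  // widest row determines the number of colN columns
  val maxCols = rows.map(_.values.size).max
  val header = (Seq("name", "number") ++ (1 to maxCols).map(i => s"col$i")).mkString(",")
  val lines = rows.map { r =>
    // padTo fills missing trailing values with empty strings
    val padded = r.values.map(_.toString).padTo(maxCols, "")
    (Seq(r.name, r.number.toString) ++ padded).mkString(",")
  }
  (header +: lines).mkString("\n")
}
```

In Spark itself you could achieve the same effect by aggregating the maximum array size first and then generating that many columns, with missing indices becoming nulls.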