I have data like the following in a CSV file:
ColumnA,1,2,3,2,1
"YYY",242,34234,232,322,432
"ZZZ",16,435,363,3453,3434
I want to read it with https://github.com/databricks/spark-csv
I would like to read this into a DataFrame and condense all the columns except the first one into a Seq.
So I would like to obtain something like this from it:
MyCaseClass("YYY", Seq(242,34234,232,322,432))
MyCaseClass("ZZZ", Seq(16,435,363,3453,3434))
I'm not sure how to obtain that.
I tried reading like this, where url is the location of the file:
val rawData = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(url)
Then, I am mapping it into the values that I want.
The problem is that I get the error:
The header contains a duplicate entry: '1'
So how can I condense all the fields except the first into a Seq using spark-csv?
EDIT
I cannot change the format of the input.
Answer 0 (score: -1)
You can do this by mapping over each row. Also, as Pawel's comment points out, duplicate column names are not allowed. So you can do something like:
import org.apache.spark.sql.Row

val dataFrame = yourCSV_DataFrame
// Keep the first column as-is and collect every remaining column into a Seq
dataFrame.rdd.map { row =>
  Row(row(0), row.toSeq.tail)
}
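To make the idea more concrete, here is a minimal sketch of the per-row conversion. It assumes MyCaseClass has the shape shown in the question (a String followed by a Seq[Int]), that every value after the first parses as an integer, and that there are no nulls. The helper name toMyCaseClass is hypothetical. The commented-out Spark part is an untested suggestion: reading with header set to "false" should avoid the duplicate-header error, since the first line is then treated as data and can be filtered out afterwards.

```scala
// Hypothetical case class, shaped like the desired output in the question.
case class MyCaseClass(name: String, values: Seq[Int])

// Hypothetical helper: the first field becomes the label, the rest become Ints.
// Assumes no nulls and that every field after the first is an integer.
def toMyCaseClass(fields: Seq[Any]): MyCaseClass =
  MyCaseClass(fields.head.toString, fields.tail.map(_.toString.trim.toInt))

// Plain-Scala check with the sample values from the question.
val sample = Seq[Any]("YYY", 242, 34234, 232, 322, 432)
println(toMyCaseClass(sample))

// With Spark (a sketch, not run here): skip header parsing so the duplicate
// names in the first line are never used as column names, then drop that line.
// val rawData = sqlContext.read
//   .format("com.databricks.spark.csv")
//   .option("header", "false")
//   .load(url)
// val firstLine = rawData.first()   // assumes a single input file
// val result = rawData.rdd
//   .filter(_ != firstLine)
//   .map(row => toMyCaseClass(row.toSeq))
```

Because the helper works on row.toSeq, it does not depend on how many columns the file has, so you do not have to list row(1), row(2), and so on by hand.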