I am reading a CSV file and creating a DataFrame from it. The output I want is a List<MyClass>. My DataFrame looks like this:
+---------------------+-----------------------+---------------------+
|last_value |processed_on |notes |
+---------------------+-----------------------+---------------------+
|2017-01-10 00:10:00.0|2017-10-09 08:32:33.689|2017-01-04,2017-05-09|
|2016-01-20 00:05:00.0|2017-10-09 08:33:18.567|2017-01-10,2017-01-20|
+---------------------+-----------------------+---------------------+
I have a bean class that will hold one row of the output:
import java.io.Serializable;

public class MyClass implements Serializable {
    private String last_value;
    private String proccessed_on;
    private String notes;

    public MyClass() {}

    public void setLastValue(String last_value) {
        this.last_value = last_value;
    }

    public String getLastValue() {
        return last_value;
    }

    public void setProccessedOn(String proccessed_on) {
        this.proccessed_on = proccessed_on;
    }

    public String getProccessed() {
        return proccessed_on;
    }

    public void setNotes(String notes) {
        this.notes = notes;
    }

    public String getNotes() {
        return notes;
    }
}
I am trying to convert the DataFrame into a List:
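(`spark` below is my SparkSession instance; a minimal sketch of how it is created, with the app name and master assumed for this sketch:)

import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Build the session; appName and master are placeholders.
SparkSession spark = SparkSession.builder()
        .appName("csv-to-list")
        .master("local[*]")
        .getOrCreate();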
Dataset<Row> df = spark.read()
        .format("csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load("/tmp/somepath");
df.map((MapFunction<Row, MyClass>) row -> setBean(
        (String) row.get(schemaVal.get("last_value")),
        (String) row.get(schemaVal.get("processed_on")),
        (String) row.get(schemaVal.get("notes"))
), Encoders.bean(MyClass.class)).collectAsList();
(setBean is my own helper that builds a MyClass from the three strings, and schemaVal is my map from column name to column index.)
But I get this error:
java.lang.NullPointerException
at org.spark_project.guava.reflect.TypeToken.method(TypeToken.java:465)
at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:126)
at org.apache.spark.sql.catalyst.JavaTypeInference$$anonfun$2.apply(JavaTypeInference.scala:125)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
at org.apache.spark.sql.catalyst.JavaTypeInference$.org$apache$spark$sql$catalyst$JavaTypeInference$$inferDataType(JavaTypeInference.scala:125)
at org.apache.spark.sql.catalyst.JavaTypeInference$.inferDataType(JavaTypeInference.scala:55)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.javaBean(ExpressionEncoder.scala:89)
at org.apache.spark.sql.Encoders$.bean(Encoders.scala:142)
at org.apache.spark.sql.Encoders.bean(Encoders.scala)
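For what it's worth, the same mapping can be written without my setBean helper and schemaVal index map, looking the columns up by name instead (a sketch only, assuming the header names match the columns shown above):

import java.util.List;

List<MyClass> result = df.map((MapFunction<Row, MyClass>) row -> {
    MyClass bean = new MyClass();
    bean.setLastValue(row.getAs("last_value"));      // getAs(String) resolves the column by name
    bean.setProccessedOn(row.getAs("processed_on"));
    bean.setNotes(row.getAs("notes"));
    return bean;
}, Encoders.bean(MyClass.class)).collectAsList();

This fails with the same NullPointerException, since the stack trace shows it is thrown while evaluating Encoders.bean(MyClass.class) itself, before any row is mapped.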
What is wrong here? Should I be taking a different approach?