I'm trying to create a Spark Dataset from an RDD using the RDD#toDS method.
However, instead of using a Scala case class to specify the schema, I'd like to use an existing domain object defined in a third-party library. But when I do this, I get the following error:
scala> import org.hl7.fhir.dstu3.model.Patient
import org.hl7.fhir.dstu3.model.Patient
scala> val patients = sc.loadFromMongoDB(ReadConfig(Map("uri" -> "mongodb://mongodb/fhir.patients")))
patients: com.mongodb.spark.rdd.MongoRDD[org.bson.Document] = MongoRDD[0] at RDD at MongoRDD.scala:47
scala> val patientsDataSet = patients.toDS[Patient]()
<console>:44: error: not enough arguments for method toDS: (beanClass: Class[org.hl7.fhir.dstu3.model.Patient])org.apache.spark.sql.Dataset[org.hl7.fhir.dstu3.model.Patient].
Unspecified value parameter beanClass.
val patientsDataSet = patients.toDS[Patient]()
^
And here's what I get when I remove the parentheses:
scala> val patientsDataSet = patients.toDS[Patient]
<console>:46: error: missing arguments for method toDS in class MongoRDD;
follow this method with `_' if you want to treat it as a partially applied function
val patientsDataSet = patients.toDS[Patient]
Is there any way I can use a Java object here instead of a case class?
Thanks!
Answer 0 (score: 0)
One option is to create a case class that extends the Java class.
Java:
public class Patient {
    private final String name;
    private final String status;

    public Patient(String name, String status) {
        this.name = name;
        this.status = status;
    }

    public String getName() {
        return name;
    }

    public String getStatus() {
        return status;
    }
}
Scala:
case class Patient0(name: String, status: String) extends Patient(name, status)
val patients = sc.loadFromMongoDB(ReadConfig(Map("uri" -> "mongodb://mongodb/fhir.patients")))
val patientsDataSet = patients.toDS[Patient0]()
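Alternatively, note that the compiler error in the question already reveals the fix: `MongoRDD` has a `toDS` overload that takes the Java bean class as an explicit argument (`toDS(beanClass: Class[T])`), rather than as a type parameter. A minimal sketch of using it directly, assuming the third-party `Patient` class satisfies Spark's JavaBean encoder requirements (a no-argument constructor and getter/setter pairs), which is not confirmed by the question:

```scala
import com.mongodb.spark._
import com.mongodb.spark.config.ReadConfig
import org.hl7.fhir.dstu3.model.Patient

// Load the RDD as before.
val patients = sc.loadFromMongoDB(ReadConfig(Map("uri" -> "mongodb://mongodb/fhir.patients")))

// Pass the bean class explicitly, per the signature shown in the error:
//   toDS: (beanClass: Class[Patient]) => Dataset[Patient]
val patientsDataSet = patients.toDS(classOf[Patient])
```

This avoids wrapping the Java class in a case class, but it only works if Spark's bean encoder can map the class's properties; deeply nested or non-bean-style domain objects (as FHIR model classes often are) may still fail at encoding time.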