Spark: create a Dataset from a Seq of a given type

Asked: 2017-03-06 15:24:52

Tags: scala apache-spark apache-spark-sql

Similar to the Spark documentation here:

http://spark.apache.org/docs/latest/sql-programming-guide.html

case class Person(name: String, age: Long)
val caseClassDS = Seq(Person("Andy", 32)).toDS()
caseClassDS.show()

but for a Seq[org.opengis.feature.simple.SimpleFeature] it fails with the following error:

/geomesaSparkFirstSteps/src/main/scala/myOrg/GeoInMemory.scala:162: value toDS is not a member of Seq[org.opengis.feature.simple.SimpleFeature]
[error]   geoResult.toDS() 

For details, see https://github.com/geoHeil/geomesaSparkFirstSteps/blob/master/src/main/scala/myOrg/GeoInMemory.scala#L162

How can I fix this? Is there an Encoder available for a Seq[someObject]?

1 answer:

Answer 0 (score: 1):

Conversions between a Seq and a Dataset of the stored class require an implicit Encoder:

implicit def localSeqToDatasetHolder[T](s: Seq[T])(
  implicit arg0: Encoder[T]): DatasetHolder[T] 
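
To make the implicit parameter concrete, here is a minimal sketch (a hypothetical standalone example with a local SparkSession) that passes the required Encoder to createDataset explicitly instead of relying on import spark.implicits._:

import org.apache.spark.sql.{Encoders, SparkSession}

// Mirrors the Person case class from the documentation example above.
case class Person(name: String, age: Long)

object ExplicitEncoderExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session purely for illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("explicit-encoder")
      .getOrCreate()

    // Encoders.product[Person] is the same encoder that
    // import spark.implicits._ would resolve implicitly for a case class;
    // passing it explicitly shows exactly which argument the compiler
    // complains about when it is missing.
    val ds = spark.createDataset(Seq(Person("Andy", 32)))(Encoders.product[Person])
    ds.show()

    spark.stop()
  }
}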

Implicit Encoders, provided by SparkSession.implicits, cover only common Scala types, including Product types such as case classes. For arbitrary classes you have to use the generic Java or Kryo encoders. See How to store custom objects in Dataset? for details.
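
As a minimal sketch of the Kryo route for the SimpleFeature case from the question (the object name and the empty placeholder Seq are illustrative; whether plain Kryo can serialize your concrete SimpleFeature implementation depends on that implementation, and GeoMesa also provides its own feature serializers that may be a better fit):

import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}
import org.opengis.feature.simple.SimpleFeature

object SimpleFeatureDatasetSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session purely for illustration.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("simplefeature-kryo")
      .getOrCreate()
    import spark.implicits._  // brings localSeqToDatasetHolder (toDS) into scope

    // SimpleFeature is not a Product, so spark.implicits._ offers no encoder
    // for it; declare a Kryo-backed encoder explicitly.
    implicit val simpleFeatureEncoder: Encoder[SimpleFeature] =
      Encoders.kryo[SimpleFeature]

    // With the encoder in implicit scope, toDS() compiles. The resulting
    // Dataset stores each feature as an opaque binary blob in a single
    // "value" column, so typed operations work but column pruning and
    // predicate pushdown on feature attributes do not.
    val geoResult: Seq[SimpleFeature] = Seq.empty  // placeholder for the real features
    val ds: Dataset[SimpleFeature] = geoResult.toDS()
    ds.show()

    spark.stop()
  }
}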