Spark RDD with JsValues to DataFrame throws an exception

Asked: 2016-07-01 16:55:06

Tags: apache-spark apache-spark-sql spark-dataframe

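I want to convert an RDD[(String, JsValue)] into a DataFrame so that I can query it with SQL.

If I define case class Holder(s: String, j: JsValue), I can do

val mydf = myrdd.map(x => Holder(x._1, x._2)).toDF()

However, if I then run mydf.schema.mkString(","), I get the error below. Any idea what it means and how to fix it?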

java.lang.UnsupportedOperationException: Schema for type play.api.libs.json.JsValue is not supported
        at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:718)
        at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
        at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:693)
        at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$schemaFor$1.apply(ScalaReflection.scala:691)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
        at scala.collection.immutable.List.foreach(List.scala:318)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
        at scala.collection.AbstractTraversable.map(Traversable.scala:105)
        at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:691)
        at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
        at org.apache.spark.sql.catalyst.ScalaReflection$class.schemaFor(ScalaReflection.scala:630)
        at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:30)
        at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:414)
        at org.apache.spark.sql.SQLImplicits.rddToDataFrameHolder(SQLImplicits.scala:94)
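
A possible workaround, offered as a sketch rather than a confirmed fix from this thread: the trace shows the exception coming from rddToDataFrameHolder, i.e. from the schema inference that toDF() runs over the case class fields, and play.api.libs.json.JsValue is not a type that inference handles. Serializing the JsValue to a String before building the DataFrame sidesteps this. The names StringHolder and JsValueToDataFrame, the sample data, and the local master setting are all hypothetical; the code assumes a Spark 1.x SQLContext, as the trace suggests, with play-json on the classpath.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import play.api.libs.json.{JsValue, Json}

// Hypothetical replacement for Holder: the JSON payload is kept as a String,
// which Spark's reflection-based schema inference supports.
case class StringHolder(s: String, j: String)

object JsValueToDataFrame {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for illustration only.
    val conf = new SparkConf().setAppName("jsvalue-to-df").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // Stand-in for the question's myrdd: RDD[(String, JsValue)].
    // The JSON is parsed inside map so the sample data itself is plain strings.
    val myrdd = sc
      .parallelize(Seq(("a", """{"x": 1}"""), ("b", """{"x": 2}""")))
      .map { case (key, raw) => (key, Json.parse(raw): JsValue) }

    // Convert each JsValue to its compact JSON text before calling toDF().
    val mydf = myrdd
      .map { case (s, j) => StringHolder(s, Json.stringify(j)) }
      .toDF()

    // Both columns are now plain strings, so the schema can be built.
    println(mydf.schema.mkString(","))

    sc.stop()
  }
}
```

With the JSON held as a string column, it can still be queried from SQL, for example by passing an RDD[String] of the JSON text to sqlContext.read.json to infer a structured schema, if the Spark version in use provides that reader.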

0 answers