I have a dummy DataFrame like the following:
val df = Seq((Seq("abc", "cde"), 19, "red, abc"), (Seq("eefg", "efa", "efb"), 192, "efg, efz efz")).toDF("names", "age", "color")
I created a sqlContext as follows:
val sqlContext = new org.apache.spark.sql.SQLContext(sc);
I also imported the implicits:
import sqlContext.implicits._
However, when I try the following, it fails:
scala> val d2 = df.map(x => x).toDF
<console>:30: error: Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
val d2 = df.map(x => x).toDF
^
What am I missing here?
Answer 0 (score: 1)
`df.map(x => x)` returns a `Dataset[Row]`, and Spark does not provide an implicit `Encoder[Row]` out of the box — only encoders for primitives and `Product` types. You can supply one yourself by building a `RowEncoder` from the DataFrame's schema:
scala> import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.catalyst.encoders.RowEncoder
scala> val df = Seq((Seq("abc", "cde"), 19, "red, abc"), (Seq("eefg", "efa", "efb"), 192, "efg, efz efz")).toDF("names", "age", "color")
df: org.apache.spark.sql.DataFrame = [names: array<string>, age: int ... 1 more field]
scala> implicit val encoder = RowEncoder(df.schema)
encoder: org.apache.spark.sql.catalyst.encoders.ExpressionEncoder[org.apache.spark.sql.Row] = class[names[0]: array<string>, age[0]: int, color[0]: string]
scala> val df2 = df.map(x => x).toDF
df2: org.apache.spark.sql.DataFrame = [names: array<string>, age: int ... 1 more field]
scala> df2.collect
res0: Array[org.apache.spark.sql.Row] = Array([WrappedArray(abc, cde),19,red, abc], [WrappedArray(eefg, efa, efb),192,efg, efz efz])
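As an alternative sketch (the case class name `Record` is my own, not from the question): if you map into a typed `Dataset` backed by a case class instead of working with raw `Row`s, the implicits you already imported derive the encoder automatically and no explicit `RowEncoder` is needed.

```scala
// Case class matching the DataFrame's schema (names: array<string>, age: int, color: string)
case class Record(names: Seq[String], age: Int, color: String)

// as[Record] picks up Encoder[Record] from the imported implicits,
// so map works without declaring an implicit encoder by hand
val ds = df.as[Record]
val d2 = ds.map(r => r).toDF
```

This only works when every column has a matching field in the case class; for schemas that are not known at compile time, the `RowEncoder` approach above is the way to go.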