Why does mapping this DataFrame to a DataFrame fail?

Time: 2018-11-05 11:15:24

Tags: scala

I have a dummy DataFrame like this:

val df = Seq((Seq("abc", "cde"), 19, "red, abc"), (Seq("eefg", "efa", "efb"), 192, "efg, efz efz")).toDF("names", "age", "color")

I created a sqlContext as follows:

val sqlContext = new org.apache.spark.sql.SQLContext(sc);

I also imported the implicits:

import sqlContext.implicits._
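For reference: since Spark 2.0 the `SQLContext(sc)` constructor is deprecated in favor of `SparkSession`, whose `spark.implicits._` is what the error message refers to. The sketch below shows the equivalent setup, assuming a local Spark 2.x/3.x installation; the master URL and application name are illustrative assumptions. Switching entry points does not by itself fix the error, because `spark.implicits._` provides the same encoders as `sqlContext.implicits._` and neither includes one for `Row`.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical local setup; in spark-shell a session named `spark` already exists
// and getOrCreate() simply returns it.
val spark = SparkSession.builder()
  .master("local[*]")          // assumption: local mode using all cores
  .appName("encoder-example")  // hypothetical application name
  .getOrCreate()

// Brings in encoders for primitives, tuples and case classes (but not for Row),
// plus the toDF conversion used below.
import spark.implicits._

val df = Seq(
  (Seq("abc", "cde"), 19, "red, abc"),
  (Seq("eefg", "efa", "efb"), 192, "efg, efz efz")
).toDF("names", "age", "color")
```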

However, the following fails when I try to run it:

scala> val d2 = df.map(x => x).toDF


<console>:30: error: Unable to find encoder for type stored in a Dataset.  Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._  Support for serializing other types will be added in future releases.
       val d2 = df.map(x => x).toDF
                      ^

What am I missing here?

1 answer:

Answer 0 (score: 1)

scala> import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.catalyst.encoders.RowEncoder

scala> val df = Seq((Seq("abc", "cde"), 19, "red, abc"), (Seq("eefg", "efa", "efb"), 192, "efg, efz efz")).toDF("names", "age", "color")
df: org.apache.spark.sql.DataFrame = [names: array<string>, age: int ... 1 more field]

scala> implicit val encoder = RowEncoder(df.schema)
encoder: org.apache.spark.sql.catalyst.encoders.ExpressionEncoder[org.apache.spark.sql.Row] = class[names[0]: array<string>, age[0]: int, color[0]: string]

scala> val df2 = df.map(x => x).toDF
df2: org.apache.spark.sql.DataFrame = [names: array<string>, age: int ... 1 more field]

scala> df2.collect
res0: Array[org.apache.spark.sql.Row] = Array([WrappedArray(abc, cde),19,red, abc], [WrappedArray(eefg, efa, efb),192,efg, efz efz])
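This works because `df` is a `DataFrame`, i.e. a `Dataset[Row]`, and `Dataset.map` is declared as `map[U : Encoder](func: T => U)`, so `df.map(x => x)` needs an implicit `Encoder[Row]`. The imported implicits only supply encoders for primitives and `Product` types such as tuples and case classes, so nothing is found until an encoder built from the schema with `RowEncoder(df.schema)` is in scope. Below is a minimal sketch of two alternatives, assuming a spark-shell session named `spark` on Spark 2.x as in the session above (`RowEncoder` lives in the internal `catalyst` package, so its API may differ in other Spark versions):

```scala
import org.apache.spark.sql.{DataFrame, Row}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import spark.implicits._ // assumes the `spark` session provided by spark-shell

val df = Seq(
  (Seq("abc", "cde"), 19, "red, abc"),
  (Seq("eefg", "efa", "efb"), 192, "efg, efz efz")
).toDF("names", "age", "color")

// Option 1: pass the Row encoder explicitly to this one call
// instead of declaring an implicit val for the whole scope.
val df2: DataFrame = df.map((row: Row) => row)(RowEncoder(df.schema))

// Option 2: map to a tuple, for which spark.implicits._ already
// provides an encoder, then restore the column names.
val df3: DataFrame = df
  .map(row => (row.getSeq[String](0), row.getInt(1), row.getString(2)))
  .toDF("names", "age", "color")
```

Passing the encoder explicitly keeps it tied to one schema; an `implicit val` of type `Encoder[Row]` would also be picked up by a later `map` over a DataFrame with a different schema in the same scope. The tuple variant avoids the internal API but requires knowing each column's type.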