Question

I am creating a Spark DataFrame from a text file, say an Employee file containing String, Int, and Char fields.

I created a case class:

case class Emp (
  Name: String, 
  eid: Int, 
  Age: Int, 
  Sex: Char, 
  Sal: Int, 
  City: String)

I create RDD1 using split, then create RDD2 from it:

val textFileRDD2 = textFileRDD1.map(attributes => Emp(
  attributes(0), 
  attributes(1).toInt, 
  attributes(2).toInt, 
  attributes(3).charAt(0), 
  attributes(4).toInt, 
  attributes(5)))

Finally, I convert the RDD to a DataFrame:

val finalRDD = textFileRDD2.toDF

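When I create the final DataFrame, it throws an error:

java.lang.UnsupportedOperationException: No Encoder found for scala.Char

Can anyone help me understand why this happens and how to work around it?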

Answer 1

Spark SQL doesn't provide Encoders for Char, and generic Encoders are not very useful.

You can use StringType instead, keeping the first character as a one-character String:

attributes(3).slice(0, 1)
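
For context, here is a minimal sketch of how the StringType option could fit into the pipeline from the question. It assumes a comma-delimited file, uses two made-up sample rows in place of the real Employee file, and runs with a local master; the object name EmpStringSex and the parse helper are hypothetical names introduced only for illustration.

```scala
import org.apache.spark.sql.SparkSession

// Same fields as the question's Emp, except Sex is a String rather than a Char.
case class Emp(Name: String, eid: Int, Age: Int, Sex: String, Sal: Int, City: String)

object EmpStringSex {
  // Build an Emp from one already-split line.
  // slice(0, 1) keeps at most the first character, as a String.
  def parse(attributes: Array[String]): Emp = Emp(
    attributes(0),
    attributes(1).toInt,
    attributes(2).toInt,
    attributes(3).slice(0, 1),
    attributes(4).toInt,
    attributes(5))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("emp-string-sex")
      .master("local[*]") // assumption: local run for illustration
      .getOrCreate()
    import spark.implicits._ // brings toDF and the product encoders into scope

    // Hypothetical sample rows standing in for the Employee text file.
    val lines = Seq("Alice,1,30,F,5000,Pune", "Bob,2,41,M,6200,Delhi")

    val df = spark.sparkContext
      .parallelize(lines)
      .map(_.split(",")) // assumption: comma-delimited input
      .map(parse)
      .toDF()

    df.printSchema()
    spark.stop()
  }
}
```

With Sex declared as a String, the schema should report that column as a string type, and toDF no longer needs an encoder for scala.Char.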

Alternatively, use ShortType (or BooleanType or ByteType, if you accept only binary responses):

attributes(3)(0) match {
   case 'F' => 1: Short
   ...
   case _ => 0: Short
}
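
As a rough sketch of the numeric option, the match above could be wrapped in a small helper and called from the map that builds Emp, with the Sex field declared as Short. The object name SexCode and the encoding ('F' as 1, everything else as 0) are assumptions for illustration; the cases elided with "..." in the answer are deliberately not filled in here.

```scala
object SexCode {
  // Assumed encoding: 'F' -> 1, any other value (including an empty field) -> 0.
  // headOption avoids an exception when the field is empty.
  def encode(raw: String): Short = raw.headOption match {
    case Some('F') => 1
    case _         => 0
  }

  // Boolean variant for a strictly binary field (maps to BooleanType).
  def isFemale(raw: String): Boolean = raw.headOption.contains('F')
}
```

In the question's mapping, attributes(3).charAt(0) would then become SexCode.encode(attributes(3)), and the case class field would be Sex: Short (or Sex: Boolean with isFemale).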

Spark DataFrame does not support the Char data type.
