How to lit an Option when converting from a DataFrame to a Dataset

Date: 2016-07-21 10:05:28

Tags: scala apache-spark apache-spark-sql apache-spark-dataset

So this is what I've been trying:

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.functions._

val conf =
  new SparkConf().setMaster("local[*]").setAppName("test")
  .set("spark.ui.enabled", "false").set("spark.app.id", "testApp")

val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)

import sqlContext.implicits._

case class B(s: String)
case class A(i: Int, b: Option[B])

val df = Seq(1, 2, 3).map(Tuple1.apply).toDF

// lit with a struct works fine
df.select(col("_1").as("i"), struct(lit("myString").as("s")).as("b")).as[A].show

/*
+---+-----------------+
|  i|                b|
+---+-----------------+
|  1|Some(B(myString))|
|  2|Some(B(myString))|
|  3|Some(B(myString))|
+---+-----------------+
*/

// lit with a null throws an exception
df.select(col("_1").as("i"), lit(null).as("b")).as[A].show

/*
org.apache.spark.sql.AnalysisException: Can't extract value from b#16;
    at org.apache.spark.sql.catalyst.expressions.ExtractValue$.apply(complexTypeExtractors.scala:73)
    at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$4.applyOrElse(Analyzer.scala:475)
*/

1 answer:

Answer 0 (score: 2)

Use the correct type:

import org.apache.spark.sql.types._

val s = StructType(Seq(StructField("s", StringType)))

df.select(col("_1").as("i"), lit(null).cast(s).alias("b")).as[A].show

On its own, lit(null) is represented as NullType, so it does not match the struct type that the encoder for A expects for column b.
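
To see the mismatch directly, one can compare the column's type before and after the cast. The following is a minimal sketch in the same REPL style as the question, assuming Spark 1.6 or later. Deriving the struct type with Encoders.product[B].schema rather than writing the StructType by hand is not part of the original answer; it is only one possible way to keep the schema in sync with the case class. The names untyped, typed, bSchema and rows are illustrative.

```scala
import org.apache.spark.sql.{Encoders, SQLContext}
import org.apache.spark.sql.functions.{col, lit}
import org.apache.spark.sql.types.NullType
import org.apache.spark.{SparkConf, SparkContext}

case class B(s: String)
case class A(i: Int, b: Option[B])

// getOrCreate reuses an already running context instead of failing on a second one.
val conf = new SparkConf().setMaster("local[*]").setAppName("null-struct-sketch")
val sc = SparkContext.getOrCreate(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

val df = Seq(1, 2, 3).map(Tuple1.apply).toDF

// A bare null literal is typed as NullType, which has no fields to extract.
val untyped = df.select(col("_1").as("i"), lit(null).as("b"))
val untypedIsNull = untyped.schema("b").dataType == NullType

// Assumption: derive the struct type from the case class instead of spelling it out.
val bSchema = Encoders.product[B].schema

// Casting gives the null column the struct type that the encoder for A looks for.
val typed = df.select(col("_1").as("i"), lit(null).cast(bSchema).as("b"))
typed.printSchema()

// Collect the typed rows; b is decoded as an Option[B].
val rows = typed.as[A].collect()
```

If the case class B later gains fields, the derived schema follows along, whereas a hand-written StructType would need to be updated separately.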