So this is what I've been trying:
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.functions._
val conf =
new SparkConf().setMaster("local[*]").setAppName("test")
.set("spark.ui.enabled", "false").set("spark.app.id", "testApp")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
case class B(s: String)
case class A(i: Int, b: Option[B])
val df = Seq(1,2,3).map(Tuple1.apply).toDF
// lit with a struct works fine
df.select(col("_1").as("i"), struct(lit("myString").as("s")).as("b")).as[A].show
/*
+---+-----------------+
| i| b|
+---+-----------------+
| 1|Some(B(myString))|
| 2|Some(B(myString))|
| 3|Some(B(myString))|
+---+-----------------+
*/
// lit with a null throws an exception
df.select(col("_1").as("i"), lit(null).as("b")).as[A].show
/*
org.apache.spark.sql.AnalysisException: Can't extract value from b#16;
at org.apache.spark.sql.catalyst.expressions.ExtractValue$.apply(complexTypeExtractors.scala:73)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveReferences$$anonfun$apply$10$$anonfun$applyOrElse$4.applyOrElse(Analyzer.scala:475)
*/
Answer 0 (score: 2)
Use the correct type:
import org.apache.spark.sql.types._
val s = StructType(Seq(StructField("s", StringType)))
df.select(col("_1").as("i"), lit(null).cast(s).alias("b")).as[A].show
lit(null) on its own is typed as NullType, so it does not match the expected struct type.
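As a sketch of a variation (assuming the same case classes and SQLContext as above), the struct type can also be derived from the case class encoder instead of being written out by hand, which keeps the schema in sync with B if its fields change:

```scala
import org.apache.spark.sql.Encoders
import org.apache.spark.sql.functions._

// Derive the schema for B from its encoder rather than hand-writing a StructType
val bSchema = Encoders.product[B].schema
df.select(col("_1").as("i"), lit(null).cast(bSchema).as("b")).as[A].show
```

The typed null then decodes through the `Option[B]` field of A as `None`.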