You can try this in spark-shell:
import org.apache.spark.sql.functions._
import spark.implicits._
case class Employee(id: Int, name: String, department: String, salary: Option[Double])
val data = List(Employee(1, "XYZ", "dep1", Some(1234.0)), Employee(0, null, "unknown", None)).toDS()
data.select($"id", to_json(struct($"id",$"name", $"department", $"salary")).as("json_data")).show(false)
Returns =>
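+---+---------------------------------------------------------+
|id |json_data                                                |
+---+---------------------------------------------------------+
|1  |{"id":1,"name":"XYZ","department":"dep1","salary":1234.0}|
|0  |{"id":0,"department":"unknown"}                          |
+---+---------------------------------------------------------+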
Expected =>
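+---+------------------------------------------------------------+
|id |json_data                                                   |
+---+------------------------------------------------------------+
|1  |{"id":1,"name":"XYZ","department":"dep1","salary":1234.0}   |
|0  |{"id":0,"name": null, "department":"unknown","salary":null} |
+---+------------------------------------------------------------+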
The null fields (name and salary) should also appear in the resulting JSON. I don't want to fill the null values with lit("null").
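Answer 0 (score: 1)
A feature for keeping null values when generating JSON was added recently and should be available in the upcoming Spark 3.0 release. See SPARK-29444 for details. In 3.0, you can control it like this: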
data.select($"id", to_json(struct($"id",$"name", $"department", $"salary"), Map("ignoreNullFields" -> "false")).as("json_data")).show(false)
AFAIK, there are currently no plans to add this to the 2.x branch.
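For anyone who wants to check the behaviour end to end, here is a minimal sketch that can be pasted into spark-shell on Spark 3.0 or later. It reuses the Employee class and data from the question. The local[1] master, the app name, and the jsonRows value are only illustrative assumptions, not part of the original post.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{struct, to_json}

// Same shape as the Employee class in the question.
case class Employee(id: Int, name: String, department: String, salary: Option[Double])

// In spark-shell a session already exists and getOrCreate() reuses it.
// The local[1] master is an assumption for running this outside a cluster.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("keep-null-fields")
  .getOrCreate()
import spark.implicits._

val data = List(
  Employee(1, "XYZ", "dep1", Some(1234.0)),
  Employee(0, null, "unknown", None)
).toDS()

// ignoreNullFields=false asks the JSON generator to write null fields
// instead of dropping them (needs Spark 3.0 or later).
val jsonRows: Seq[String] = data
  .select(
    to_json(
      struct($"id", $"name", $"department", $"salary"),
      Map("ignoreNullFields" -> "false")
    ).as("json_data")
  )
  .as[String]
  .collect()
  .toSeq

jsonRows.foreach(println)
```

If you would rather not pass the option on every call, Spark 3.0 also appears to have a session-level setting, spark.sql.jsonGenerator.ignoreNullFields. Please check the configuration docs for your exact version before relying on it.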