Why does Spark's to_json() not populate null values?

Time: 2020-01-31 17:29:29

Tags: scala apache-spark apache-spark-sql

You can try this in spark-shell:

import org.apache.spark.sql.functions._
import spark.implicits._

case class Employee(id: Int, name: String, department: String, salary: Option[Double])
val data = List(Employee(1, "XYZ", "dep1", Some(1234.0)), Employee(0, null, "unknown", None)).toDS()
data.select($"id", to_json(struct($"id",$"name", $"department", $"salary")).as("json_data")).show(false)

Returns =>

|id |json_data                                                |
+---+---------------------------------------------------------+
|1  |{"id":1,"name":"XYZ","department":"dep1","salary":1234.0}|
|0  |{"id":0,"department":"unknown"}                          |

Expected =>

|id |json_data                                                   |
+---+------------------------------------------------------------+
|1  |{"id":1,"name":"XYZ","department":"dep1","salary":1234.0}   |
|0  |{"id":0,"name": null, "department":"unknown","salary":null} |

The null fields (name and salary) should also appear in the resulting JSON. I don't want to fill the null values using lit("null").

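1 answer:

Answer 0: (score: 1)

A feature that preserves null values when generating JSON was added recently and should be available in the upcoming Spark 3.0 release. See SPARK-29444 for details. In 3.0, you can control this behavior through the ignoreNullFields option: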

data.select($"id", to_json(struct($"id",$"name", $"department", $"salary"), Map("ignoreNullFields" -> "false")).as("json_data")).show(false)

AFAIK, there are currently no plans to add this to the 2.x branch.
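
As a complement to the per-call option, the same change appears to have introduced a session-level default, so every to_json() call in the session keeps null fields without passing an options map. The sketch below assumes Spark 3.0 or later and assumes the configuration key is spark.sql.jsonGenerator.ignoreNullFields; check the configuration reference for your version before relying on it.

```scala
// Run in spark-shell on Spark 3.0 or later; `spark` is the session the shell provides.
import org.apache.spark.sql.functions._
import spark.implicits._

case class Employee(id: Int, name: String, department: String, salary: Option[Double])

// Assumed config key (introduced alongside SPARK-29444): a session-wide default
// telling the JSON generator to keep null fields instead of dropping them.
spark.conf.set("spark.sql.jsonGenerator.ignoreNullFields", "false")

val data = List(
  Employee(1, "XYZ", "dep1", Some(1234.0)),
  Employee(0, null, "unknown", None)
).toDS()

// No options map is passed here; to_json falls back to the session default set above.
data.select($"id", to_json(struct($"id", $"name", $"department", $"salary")).as("json_data")).show(false)
```

An explicit ignoreNullFields entry in the options map should still take precedence over the session default, so the per-call form remains the safer choice when only some columns need null fields preserved.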