Consider the following DataFrame:
+---+----+--------+----+
| c1|  c2|      c3|  c4|
+---+----+--------+----+
|  x|  n1|    [m1]|  []|
|  y|  n3|[m2, m3]|[z3]|
|  x|  n2|      []|  []|
+---+----+--------+----+
I want to replace the empty arrays with null:
+---+----+--------+----+
| c1|  c2|      c3|  c4|
+---+----+--------+----+
|  x|  n1|    [m1]|null|
|  y|  n3|[m2, m3]|[z3]|
|  x|  n2|    null|null|
+---+----+--------+----+
What is an efficient way to achieve this?
Answer 0: (score: 1)
You can check the size of each array and return null for the empty ones using the when...otherwise functions:
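// spark-shell already has these in scope; a standalone application needs them
// (with `spark` being the active SparkSession).
import org.apache.spark.sql.functions.{lit, size, when}
import spark.implicits._

// Sample data: c3 and c4 are array<string> columns, some of them empty.
val df = Seq(
  ("x", "n1", Seq("m1"), Seq.empty[String]),
  ("y", "n3", Seq("m2", "m3"), Seq("z3")),
  ("x", "n2", Seq.empty[String], Seq.empty[String])
).toDF("c1", "c2", "c3", "c4")
df.show

// Keep the array when it has at least one element; otherwise emit null.
df.select($"c1", $"c2",
  when(size($"c3") > 0, $"c3").otherwise(lit(null)) as "c3",
  when(size($"c4") > 0, $"c4").otherwise(lit(null)) as "c4"
).show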
This returns:
df: org.apache.spark.sql.DataFrame = [c1: string, c2: string ... 2 more fields]
+---+---+--------+----+
| c1| c2|      c3|  c4|
+---+---+--------+----+
|  x| n1|    [m1]|  []|
|  y| n3|[m2, m3]|[z3]|
|  x| n2|      []|  []|
+---+---+--------+----+

+---+---+--------+----+
| c1| c2|      c3|  c4|
+---+---+--------+----+
|  x| n1|    [m1]|null|
|  y| n3|[m2, m3]|[z3]|
|  x| n2|    null|null|
+---+---+--------+----+
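
If the DataFrame has many array columns, listing each one by hand gets tedious. The following is a minimal sketch, not part of the original answer, that applies the same size check to every array-typed column found in the schema. The object name EmptyArraysToNull and the helper emptyArraysToNull are hypothetical names chosen for illustration, and the sketch assumes column names without dots or other characters that need escaping.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, size, when}
import org.apache.spark.sql.types.ArrayType

object EmptyArraysToNull {
  // Hypothetical helper: replaces empty arrays with null in every
  // array-typed column and leaves all other columns unchanged.
  // Assumes plain column names (no dots that col() would parse as nesting).
  def emptyArraysToNull(df: DataFrame): DataFrame = {
    val cols = df.schema.fields.map { f =>
      f.dataType match {
        case _: ArrayType =>
          // when(...) without otherwise(...) yields null for unmatched rows.
          when(size(col(f.name)) > 0, col(f.name)).as(f.name)
        case _ =>
          col(f.name)
      }
    }
    df.select(cols: _*)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("empty-arrays-to-null")
      .getOrCreate()
    import spark.implicits._

    // Same sample data as in the answer above.
    val df = Seq(
      ("x", "n1", Seq("m1"), Seq.empty[String]),
      ("y", "n3", Seq("m2", "m3"), Seq("z3")),
      ("x", "n2", Seq.empty[String], Seq.empty[String])
    ).toDF("c1", "c2", "c3", "c4")

    emptyArraysToNull(df).show()
    spark.stop()
  }
}
```

In spark-shell, the helper can be pasted on its own and called as emptyArraysToNull(df).show. Because the columns are selected in schema order and aliased back to their original names, the column layout of the input should be preserved.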