Question

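I have data similar to that in the question posted here: Spark sql how to explode without losing null values

I used the solution proposed for Spark <= 2.1, and after the split the null values do show up as literals in my data: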

df.withColumn("likes", explode(
  when(col("likes").isNotNull, col("likes"))
    // If null explode an array<string> with a single null
    .otherwise(array(lit(null).cast("string")))))

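The problem is that afterwards I need to check whether that column contains null values and take an action in that case. When I run my code, the nulls that were inserted as literals are recognized as strings rather than as null values.

So the code below always returns 0, even for rows where that column is null: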

df.withColumn("likes", f.when(col('likes').isNotNull(), 0).otherwise(2)).show()

+--------+------+
|likes   |origin|
+--------+------+
|    CARS|     0|
|    CARS|     0|
|    null|     0|
|    null|     0|

I am using Cloudera PySpark.
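
One way to picture the symptom is a plain-Python model (this is not Spark code; `explode_rows`, `flag`, and the sample rows are hypothetical names made up for illustration). It assumes the placeholder ended up in the array as the string "null" instead of a real null, which would print like a null but still pass an is-not-null check:

```python
def explode_rows(rows, column):
    """Plain-Python stand-in for explode: one output row per array element."""
    out = []
    for row in rows:
        # A None (or empty) array yields no output rows, so the row is dropped.
        for value in row[column] or []:
            out.append({**row, column: value})
    return out


def flag(row, column):
    """Mirror of when(col.isNotNull(), 0).otherwise(2)."""
    return 0 if row[column] is not None else 2


rows = [{"name": "a", "likes": ["CARS"]}, {"name": "b", "likes": None}]

# Placeholder written as the string "null": a value that merely looks like null.
with_string = [
    {**r, "likes": r["likes"] if r["likes"] is not None else ["null"]} for r in rows
]
# Placeholder written as a real None.
with_none = [
    {**r, "likes": r["likes"] if r["likes"] is not None else [None]} for r in rows
]

string_flags = [flag(r, "likes") for r in explode_rows(with_string, "likes")]
none_flags = [flag(r, "likes") for r in explode_rows(with_none, "likes")]
print(string_flags)  # [0, 0]
print(none_flags)    # [0, 2]
```

Under that assumption, only a real null placeholder lets the later check tell the two kinds of rows apart.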

Answer 1

You can hack around this by using a udf:

val empty = udf(() => null: String)

df.withColumn("likes", explode(
  when(col("likes").isNotNull, col("likes"))
    // If null explode an array<string> with a single null
    .otherwise(array(empty()))))

Answer 2

I actually found a way. The otherwise clause has to be written like this:

.otherwise(array(lit(None).cast("string")))))

Spark SQL does not recognize null values after split
