Question

I have a DataFrame with columns A-Z, and I want to set the value of Z depending on whether any of the other columns is null. I can do this as follows:

val df2 = df1.withColumn("Z",
   // isNull is used because comparing with === lit(null) yields null, so the branch never matches
   when(col("A").isNull, lit("Y"))
  .when(col("B").isNull, lit("Y"))
  .when(col("C").isNull, lit("Y"))
  ...
  ...
  .when(col("Y").isNull, lit("Y"))
  .otherwise(lit("N")))

Is there a more concise way to iterate over all the other columns inside the withColumn call?

Answer 1

Yes, you can iterate over the columns inside withColumn and use foldLeft to build the logical expression:

val df2 = df1.withColumn("Z",
  when(
    df1.columns
      .filter(name => name.matches("[A-Z]")) // keep only the single-letter column names A-Z
      .map(name => col(name)) // turn each column name (String) into a Column
      // OR the isNull checks together: true as soon as any column is null
      .foldLeft(lit(false))((acc, current) => when(acc or current.isNull, lit(true)).otherwise(lit(false))),
    lit("Y"))
    .otherwise(lit("N"))
)

Test:

Input:

+---+----+----+
|  A|   B|   C|
+---+----+----+
|  1|   2|   3|
|  1|null|   3|
|  1|null|null|
+---+----+----+

Output:

+---+----+----+---+
|  A|   B|   C|  Z|
+---+----+----+---+
|  1|   2|   3|  N|
|  1|null|   3|  Y|
|  1|null|null|  Y|
+---+----+----+---+
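
Since isNull always returns true or false, the same check can probably be written more compactly by mapping each column to its isNull test and combining the results with reduce. The following is a minimal sketch, not the answer's own code: the object name NullFlagSketch, the helper withNullFlag, and the local SparkSession setup are assumptions made for illustration, and it assumes the DataFrame has at least one column besides the flag column (reduce fails on an empty collection).

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}

object NullFlagSketch {
  // Hypothetical helper: sets `flag` to "Y" if any other column is null, else "N".
  def withNullFlag(df: DataFrame, flag: String = "Z"): DataFrame = {
    val anyNull: Column = df.columns
      .filter(_ != flag)              // skip the flag column itself
      .map(name => col(name).isNull)  // one Boolean Column per input column
      .reduce(_ || _)                 // OR them together (needs at least one column)
    df.withColumn(flag, when(anyNull, lit("Y")).otherwise(lit("N")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("null-flag-sketch")
      .getOrCreate()
    import spark.implicits._

    // Same three rows as the test input above; None becomes a SQL NULL.
    val df1 = Seq[(Int, Option[Int], Option[Int])](
      (1, Some(2), Some(3)),
      (1, None, Some(3)),
      (1, None, None)
    ).toDF("A", "B", "C")

    withNullFlag(df1).show()
    spark.stop()
  }
}
```

Filtering on the flag name rather than on a regular expression keeps Z out of the check even when df1 already contains a Z column.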

Answer 2

I achieved this by exploring the org.apache.spark.sql.functions package:

val df2 = df1
  .withColumn("Z",
    // lower-cases each value and looks for the string "null" (not SQL NULL) among them
    when(array_contains(array(df1.columns.map(c => lower(col(c))): _*), "null"), lit("Y"))
      .otherwise(lit("N")))
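
Note that this array_contains call searches for the string "null"; as far as I can tell, a row holding a real SQL NULL is not matched by it. If the data contains real nulls, one array-based alternative is sketched below. It is an assumption-laden illustration rather than the answer's method: it relies on the exists higher-order function from org.apache.spark.sql.functions (Spark 3.0 or later), casts every column to string so the array elements share one type, and uses the hypothetical names ArrayNullFlagSketch and withNullFlag.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, exists, lit, when}

object ArrayNullFlagSketch {
  // Hypothetical helper: packs every column except `flag` into one array
  // (cast to string so all elements share a type), then tests whether any
  // element is SQL NULL.
  def withNullFlag(df: DataFrame, flag: String = "Z"): DataFrame = {
    val packed = array(df.columns.filter(_ != flag).map(c => col(c).cast("string")): _*)
    df.withColumn(flag, when(exists(packed, _.isNull), lit("Y")).otherwise(lit("N")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("array-null-flag-sketch")
      .getOrCreate()
    import spark.implicits._

    // The second row holds a real NULL; the third holds the string "null".
    val df1 = Seq[(String, String)](
      ("1", "2"),
      ("1", null),
      ("null", "3")
    ).toDF("A", "B")

    // Only the second row should be flagged "Y" by this variant.
    withNullFlag(df1).show()
    spark.stop()
  }
}
```

Whether the string "null" should also count as missing depends on how the data was loaded, so the two checks are not interchangeable.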
