Subtracting two columns with null values in a Spark DataFrame

Time: 2017-09-21 02:55:15

Tags: scala apache-spark apache-spark-sql

I am new to Spark, and I have a DataFrame df:

+----------+------------+-----------+
| Column1  | Column2    | Sub       |                          
+----------+------------+-----------+
| 1        | 2          | 1         |                                         
+----------+------------+-----------+
| 4        | null       | null      |                          
+----------+------------+-----------+
| 5        | null       | null      |                          
+----------+------------+-----------+
| 6        | 8          | 2         |                          
+----------+------------+-----------+

When I subtract the two columns, any row where one of them is null gets null in the result column as well.

df.withColumn("Sub", col("Column1") - col("Column2"))

The expected output is:

+----------+------------+-----------+
|  Column1 | Column2    | Sub       |                          
+----------+------------+-----------+
| 1        | 2          | 1         |                                           
+----------+------------+-----------+
| 4        | null       | 4         |                          
+----------+------------+-----------+
| 5        | null       | 5         |                          
+----------+------------+-----------+
| 6        | 8          | 2         |                          
+----------+------------+-----------+

I don't want to replace the nulls in Column2 with 0; that column should stay null. Can someone help me?

2 answers:

Answer 0 (score: 5)

You can use the when function like this:

import org.apache.spark.sql.functions._
// Substitute 0 for null inside the expression only; Column1 and Column2 keep their values.
df.withColumn("Sub", when(col("Column1").isNull, lit(0)).otherwise(col("Column1")) -
  when(col("Column2").isNull, lit(0)).otherwise(col("Column2")))

You should get this final result:

+-------+-------+----+
|Column1|Column2| Sub|
+-------+-------+----+
|      1|      2|-1.0|
|      4|   null| 4.0|
|      5|   null| 5.0|
|      6|      8|-2.0|
+-------+-------+----+
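
The output above keeps the sign of the difference (-1.0 and -2.0), while the expected table in the question shows 1 and 2. A minimal sketch, assuming the absolute difference is what is wanted and that both columns are integers, wraps the same when expression in abs. The object name, the nullToZero helper, the sample data, and the local SparkSession setup are illustrative assumptions and not part of the answer.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{abs, col, lit, when}

object AbsWhenSketch {
  // Hypothetical helper: substitute 0 for null inside an expression only.
  def nullToZero(c: Column): Column = when(c.isNull, lit(0)).otherwise(c)

  // Adds Sub as the absolute difference; Column1 and Column2 are left untouched.
  def withAbsSub(df: DataFrame): DataFrame =
    df.withColumn("Sub", abs(nullToZero(col("Column1")) - nullToZero(col("Column2"))))

  // Sample data mirroring the question's table (assumed integer columns).
  def sample(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(Option[Int], Option[Int])](
      (Some(1), Some(2)), (Some(4), None), (Some(5), None), (Some(6), Some(8))
    ).toDF("Column1", "Column2")
  }

  def main(args: Array[String]): Unit = {
    // Assumed local session; in spark-shell a session named `spark` already exists.
    val spark = SparkSession.builder().master("local[1]").appName("abs-when-sketch").getOrCreate()
    withAbsSub(sample(spark)).show()
    spark.stop()
  }
}
```

Because the substitution happens only inside the Sub expression, the nulls in Column2 should still be shown as null in the result.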

Answer 1 (score: 2)

You can coalesce both columns to zero and then do the subtraction:

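import org.apache.spark.sql.functions._
import spark.implicits._  // assumes a SparkSession named spark, as in spark-shell

val df = Seq((Some(1), Some(2)),
             (Some(4), None),
             (Some(5), None),
             (Some(6), Some(8))
            ).toDF("A", "B")

// coalesce substitutes 0 only inside the expression; abs gives the absolute difference
df.withColumn("Sub", abs(coalesce($"A", lit(0)) - coalesce($"B", lit(0)))).show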
+---+----+---+
|  A|   B|Sub|
+---+----+---+
|  1|   2|  1|
|  4|null|  4|
|  5|null|  5|
|  6|   8|  2|
+---+----+---+
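
The same coalesce idea can also be written as a SQL expression string, which may be easier to read for people coming from SQL. This is only a sketch under the assumption that the columns are named A and B as in the answer above; the object name, the withSub method, and the local SparkSession setup are illustrative.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object CoalesceSqlSketch {
  // SQL coalesce returns its first non-null argument, so a null operand is
  // treated as 0 inside the expression while columns A and B keep their values.
  def withSub(df: DataFrame): DataFrame =
    df.selectExpr("A", "B", "abs(coalesce(A, 0) - coalesce(B, 0)) AS Sub")

  def main(args: Array[String]): Unit = {
    // Assumed local session; in spark-shell a session named `spark` already exists.
    val spark = SparkSession.builder().master("local[1]").appName("coalesce-sql-sketch").getOrCreate()
    import spark.implicits._

    val df = Seq[(Option[Int], Option[Int])](
      (Some(1), Some(2)), (Some(4), None), (Some(5), None), (Some(6), Some(8))
    ).toDF("A", "B")

    withSub(df).show()
    spark.stop()
  }
}
```

Unlike df.na.fill, which would overwrite the nulls in the column itself, this leaves B as null and only changes how Sub is computed.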