Question

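I have a DataFrame that contains both old data and updated data.

I want to collapse this data so that whenever a non-null value is available in the model_update column, it replaces the model value in the same row. How can I do this?

DataFrame: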

+----------------------------------------+-------+--------+-----------+------------+
|id                                      |make   |model   |make_update|model_update|
+----------------------------------------+-------+--------+-----------+------------+
|1234                                    |Apple  |iphone  |null       |iphone x    |
|4567                                    |Apple  |iphone  |null       |iphone 8    |
|7890                                    |Apple  |iphone  |null       |null        |
+----------------------------------------+-------+--------+-----------+------------+

Desired result:

+----------------------------------------+-------+---------+
|id                                      |make   |model    |
+----------------------------------------+-------+---------+
|1234                                    |Apple  |iphone x |
|4567                                    |Apple  |iphone 8 |
|7890                                    |Apple  |iphone   |
+----------------------------------------+-------+---------+

Answer 1

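Use coalesce. It returns the first non-null value among its arguments, so listing model_update first gives it priority over model. Both coalesce and col come from Spark's SQL functions module (pyspark.sql.functions in PySpark, org.apache.spark.sql.functions in Scala).

df = df.withColumn("model", coalesce(col("model_update"), col("model")))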
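
To make the row-wise behaviour concrete, here is a minimal plain-Python sketch of what coalesce does on the sample rows from the question. It does not use Spark at all: the `coalesce` helper and the `rows` list below are illustrative stand-ins (assumptions for this example), with Python's `None` playing the role of a SQL null.

```python
from typing import Optional


def coalesce(*values: Optional[str]) -> Optional[str]:
    """Return the first argument that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


# Sample rows from the question: (id, make, model, make_update, model_update)
rows = [
    ("1234", "Apple", "iphone", None, "iphone x"),
    ("4567", "Apple", "iphone", None, "iphone 8"),
    ("7890", "Apple", "iphone", None, None),
]

# Prefer model_update; fall back to model when the update is missing.
result = [
    (row_id, make, coalesce(model_update, model))
    for row_id, make, model, _make_update, model_update in rows
]

for row in result:
    print(row)
```

The third row keeps "iphone" because its model_update is null, which matches the desired result in the question.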

Answer 2

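Here is a quick solution:

import org.apache.spark.sql.functions.when  // $"..." also needs: import spark.implicits._
val df2 = df1.withColumn("New_Model", when($"model_update".isNull, $"model")
                                  .otherwise($"model_update"))

Here df1 is your original DataFrame.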
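
The when/otherwise expression in this answer should pick the same value as Answer 1's coalesce for every combination of null and non-null inputs. The plain-Python sketch below checks that claim on all four combinations. Both helper functions are hypothetical stand-ins written for this illustration, not Spark APIs, and `None` again stands in for null.

```python
from typing import Optional


def when_otherwise(model: Optional[str], model_update: Optional[str]) -> Optional[str]:
    # Mirrors: when(model_update is null, model).otherwise(model_update)
    return model if model_update is None else model_update


def coalesce_pair(model: Optional[str], model_update: Optional[str]) -> Optional[str]:
    # Mirrors: coalesce(model_update, model)
    return model_update if model_update is not None else model


# Every combination of (model, model_update) being present or missing.
cases = [
    ("iphone", "iphone x"),
    ("iphone", None),
    (None, "iphone 8"),
    (None, None),
]

agree = all(when_otherwise(m, u) == coalesce_pair(m, u) for m, u in cases)
print(agree)  # prints True
```

One practical difference remains: this answer writes the value to a new New_Model column, so model, make_update and model_update are all still present in df2. To match the desired result, you would still have to drop or rename columns afterwards.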

Spark: conditionally replace col1 value with col2
