How can I eliminate the repeated values in Column_3 and Column_4?
+--------+--------+--------+--------+
|Column_1|Column_2|Column_3|Column_4|
+--------+--------+--------+--------+
|       1|       x|     abc|     www|
|       1|       x|     abc|     sdf|
|       1|       x|     abc|     xyz|
|       1|       x|     def|     www|
|       1|       x|     def|     sdf|
|       1|       x|     def|     xyz|
+--------+--------+--------+--------+
Expected output
+--------+--------+--------+--------+
|Column_1|Column_2|Column_3|Column_4|
+--------+--------+--------+--------+
|       1|       x|     abc|     www|
|       1|       x|     def|     sdf|
|       1|       x|    null|     xyz|
+--------+--------+--------+--------+
Answer 0 (score: 0)
Use df.dropDuplicates("Column_3", "Column_4"), passing the column names as strings.
Also, this is a possible duplicate of "Removing duplicates from rows based on specific columns in an RDD/Spark DataFrame".
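To see what deduplicating on the pair (Column_3, Column_4) does to the sample data, here is a rough plain-Python sketch. It is not Spark code. The helpers `drop_duplicates`, `distinct_in_order` and `zip_distinct` are hypothetical names made up for illustration. It keeps the first row per key, whereas Spark does not guarantee which row survives. It also assumes Column_1 and Column_2 are constant, as they are in the sample. The second helper shows one possible reading of the expected output: the distinct values of each column placed side by side, with the shorter column padded with null.

```python
from itertools import zip_longest

# Sample rows from the question: (Column_1, Column_2, Column_3, Column_4)
rows = [
    (1, "x", "abc", "www"),
    (1, "x", "abc", "sdf"),
    (1, "x", "abc", "xyz"),
    (1, "x", "def", "www"),
    (1, "x", "def", "sdf"),
    (1, "x", "def", "xyz"),
]


def drop_duplicates(rows, key_indexes):
    """Keep one row per distinct combination of the key columns.

    Assumption: the first row seen is kept; Spark makes no such promise.
    """
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[i] for i in key_indexes)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def distinct_in_order(values):
    """Distinct values in first-seen order."""
    return list(dict.fromkeys(values))


def zip_distinct(rows):
    """Pair up the distinct values of Column_3 and Column_4, padding with None.

    Assumption: Column_1 and Column_2 are the same in every row.
    """
    col3 = distinct_in_order(r[2] for r in rows)
    col4 = distinct_in_order(r[3] for r in rows)
    c1, c2 = rows[0][0], rows[0][1]
    return [(c1, c2, a, b) for a, b in zip_longest(col3, col4)]


by_pair = drop_duplicates(rows, [2, 3])   # every (Column_3, Column_4) pair is unique
side_by_side = zip_distinct(rows)         # distinct values of each column, zipped

for row in side_by_side:
    print(row)
```

On this sample every (Column_3, Column_4) pair is already unique, so deduplicating on the pair leaves all six rows in place, while the side-by-side pairing gives the three rows shown in the expected output (with None standing in for null).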