I have a DataFrame like the following:
+--------+----------------+----+--------+
|role_num|   email_address|role|counters|
+--------+----------------+----+--------+
|     110| EMAIL2@TEST.COM|null|       2|
|     110| EMAIL2@TEST.COM|   P|       2|
|     114|EMAIL10@TEST.COM|   A|       2|
|     114|EMAIL10@TEST.COM|null|       2|
+--------+----------------+----+--------+
From this DataFrame, my output should look like this:
+--------+----------------+----+--------+
|role_num|   email_address|role|counters|
+--------+----------------+----+--------+
|     110| EMAIL2@TEST.COM|   P|       2|
|     114|EMAIL10@TEST.COM|   A|       2|
+--------+----------------+----+--------+
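The condition is that whenever the duplicate count is 2, I should select the row with role "P"; if that role does not exist, I need to select the row with role "A".
I tried the following, but it doesn't seem to work.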
import sc.implicits._

val targetDF = Seq(
  ("110", "EMAIL2@TEST.COM", "", "2"),
  ("110", "EMAIL2@TEST.COM", "PAH", "2"),
  ("114", "EMAIL10@TEST.COM", "AAH", "2"),
  ("114", "EMAIL10@TEST.COM", "", "2")
)
  .toDF(
    "role_num",
    "email_address",
    "role",
    "counters")

targetDF.where(
  (col("counters") > 1)
    || ?)
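Can you help?
Answer 0 (score: 1)
This solution works with the roles you currently have. It ranks the rows within each group by role in descending order with nulls last, so "P" sorts ahead of "A" and a null role comes last, then keeps only the top-ranked row: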
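import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, desc_nulls_last, rank}

targetDF
  // Partition by role_num (the sample data has no acct_num column); descending order puts "P" before "A", nulls last
  .withColumn("priority", rank().over(Window.partitionBy("role_num").orderBy(desc_nulls_last("role"))))
  .where(col("priority") === 1)
  .drop("priority")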
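The ranking above depends on "P" sorting after "A" alphabetically, and rank() keeps every row that ties for first place. If either of those is a concern, one possible variation is to spell out the preference explicitly and use row_number() so that exactly one row per group survives. The following is only a sketch under some assumptions: the object name PickPreferredRole, the helper pickPreferredRole and the rolePriority column are hypothetical names made up for illustration, the roles are assumed to be exactly "P" and "A", and duplicates are assumed to share both role_num and email_address.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number, when}

object PickPreferredRole {

  // Hypothetical helper: keeps one row per (role_num, email_address),
  // preferring role "P", then "A", then anything else (including null).
  def pickPreferredRole(df: DataFrame): DataFrame = {
    // Explicit preference order; a null role fails both comparisons and falls through to 3.
    val rolePriority = when(col("role") === "P", 1)
      .when(col("role") === "A", 2)
      .otherwise(3)

    val byPreference = Window
      .partitionBy("role_num", "email_address")
      .orderBy(rolePriority)

    df.withColumn("rn", row_number().over(byPreference))
      .where(col("rn") === 1) // row_number never ties, so one row per group
      .drop("rn")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("pick-preferred-role")
      .getOrCreate()
    import spark.implicits._

    // Sample rows mirroring the table in the question; None stands for a null role.
    val targetDF = Seq[(String, String, Option[String], Int)](
      ("110", "EMAIL2@TEST.COM", None, 2),
      ("110", "EMAIL2@TEST.COM", Some("P"), 2),
      ("114", "EMAIL10@TEST.COM", Some("A"), 2),
      ("114", "EMAIL10@TEST.COM", None, 2)
    ).toDF("role_num", "email_address", "role", "counters")

    pickPreferredRole(targetDF).show()
    spark.stop()
  }
}
```

Rows that are not duplicated form a group of one and pass through unchanged, so no separate filter on counters should be needed. If the real role values differ (for example "PAH" and "AAH" as in the attempted code), the literals in rolePriority would have to be adjusted accordingly.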