There are two tables, Customer1 and Customer2:
Customer1: lists the customers' details
Customer2: lists the customers' updated details
https://docs.google.com/spreadsheets/d/1GuQaHhZ70D0NHGXuW51B5nNZXrSkthmEduHOhwoZmRg/edit#gid=0
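CustomerName has to be taken from both tables. If a customer's name has been updated, it must come from Customer2; otherwise it must come from Customer1. Every customer must appear in the result.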
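To make the selection rule concrete, here is a minimal sketch in plain Scala (no Spark). It assumes each table can be reduced to a map from customer id to customer name and that every customer in Customer2 also exists in Customer1. `NameRule` and `resolveNames` are hypothetical names used only for illustration.

```scala
// Hypothetical helper illustrating the rule on in-memory maps (customer id -> customer name).
object NameRule {
  // Keep every customer from customer1; use the updated name from customer2 when one exists.
  def resolveNames(customer1: Map[Int, String], customer2: Map[Int, String]): Seq[(Int, String)] =
    customer1.toSeq.sortBy(_._1).map { case (id, name) =>
      id -> customer2.getOrElse(id, name)
    }

  def main(args: Array[String]): Unit = {
    val customer1 = Map(1 -> "shiva", 2 -> "Mani", 3 -> "Sneha")
    val customer2 = Map(1 -> "shivamoorthy", 2 -> "Manikandan")
    // Prints one "id<TAB>name" line per customer, sorted by id.
    resolveNames(customer1, customer2).foreach { case (id, name) => println(s"$id\t$name") }
  }
}
```

Customer 3 has no entry in customer2, so its original name is kept.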
Expected result set:
How can this be done in Spark Scala?
Answer:
You can left-join customer1 with customer2 on customerid, then apply coalesce to the two customername columns (customer2's first) to get the first non-null value.
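What the left join and `coalesce` each contribute can be modeled on plain Scala collections. This is a sketch, not Spark's implementation: `Customer`, `LeftJoinCoalesce`, `leftJoin`, and `latestNames` are hypothetical names, SQL `NULL` is modeled as `None`, and customer ids are assumed to be unique in customer2 (with duplicate ids, a real join would emit one row per match).

```scala
// Hypothetical row type mirroring the columns used in the answer.
final case class Customer(customerid: Int, customername: String, contact: String)

object LeftJoinCoalesce {
  // Like SQL COALESCE: the first defined (non-null) value, or None if all are missing.
  def coalesce[A](values: Option[A]*): Option[A] =
    values.collectFirst { case Some(v) => v }

  // Left join on customerid: every left row is kept; the right side is None when no id matches.
  // Assumes ids are unique in `right` (toMap keeps only the last row per id).
  def leftJoin(left: Seq[Customer], right: Seq[Customer]): Seq[(Customer, Option[Customer])] = {
    val rightById = right.map(c => c.customerid -> c).toMap
    left.map(c => c -> rightById.get(c.customerid))
  }

  // Equivalent of coalesce(c2.customername, c1.customername) after the left join.
  def latestNames(c1: Seq[Customer], c2: Seq[Customer]): Seq[(Int, Option[String])] =
    leftJoin(c1, c2).map { case (l, r) =>
      val updated  = r.flatMap(c => Option(c.customername)) // None if unmatched or name is null
      val original = Option(l.customername)
      l.customerid -> coalesce(updated, original)
    }

  def main(args: Array[String]): Unit = {
    val c1 = Seq(Customer(1, "shiva", "9994323565"), Customer(2, "Mani", "9994323567"), Customer(3, "Sneha", "9994323568"))
    val c2 = Seq(Customer(1, "shivamoorthy", "9994323565"), Customer(2, "Manikandan", "9994323567"))
    latestNames(c1, c2).foreach { case (id, name) => println(s"$id\t${name.getOrElse("null")}") }
  }
}
```

As with the Spark left join, a customer that appears only in customer2 would not be listed; if that can happen, a full outer join with coalesce applied to both the id and the name columns would be needed.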
Example:
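// In spark-shell, spark.implicits._ (needed for toDF and $"...") is already imported; use :paste so the multi-line chain is parsed as one expression.
import spark.implicits._
val customer1 = Seq((1, "shiva", "9994323565"), (2, "Mani", "9994323567"), (3, "Sneha", "9994323568")).toDF("customerid", "customername", "contact")
val customer2 = Seq((1, "shivamoorthy", "9994323565"), (2, "Manikandan", "9994323567")).toDF("customerid", "customername", "contact")
// Left join keeps every customer1 row; coalesce prefers the updated name from customer2 when it is not null.
customer1.as("c1")
  .join(customer2.as("c2"), $"c1.customerid" === $"c2.customerid", "left")
  .selectExpr("c1.customerid",
    "coalesce(c2.customername, c1.customername) as customername")
  .show()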
Result:
+----------+------------+
|customerid|customername|
+----------+------------+
|         1|shivamoorthy|
|         2|  Manikandan|
|         3|       Sneha|
+----------+------------+