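I have a DataFrame in Scala like the one shown below. I got this result from a full outer join of two DataFrames of different sizes.
These are the key-value pairs returned by the following query: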
select * from TEMP1 a FULL OUTER JOIN TEMP2 b ON a.T_ROWKEY = b.N_ROWKEY
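The df below shows those key-value pairs. I need to add up the values for matching keys and create a new DataFrame; where a key has no match, its value should stay unchanged.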
[2552195C312,100,2552195C312,5]
[null,null,175831A638,1]
[48061B887,1,null,null]
[null,null,171539C177,1]
[null,null,5584D2379,4]
[118732EE7792,3,null,null]
[null,null,8157FF1915,1]
[14310AA872,1000,14310AA872,7]
[148BB41539,5,148BB41539,1]
[40513SS68,1,null,null]
[null,null,199915UY72,11]
[11429401AW5,3,null,null]
[187755CD00,4,null,null]
[834413CV18,1,null,null]
[185475XS2,14,null,null]
[11716817SD8,2,null,null]
[2552998AS99,12,null,null]
[null,null,19792WS37,2]
[153054WE02,1,null,null]
[null,null,8131128ER1,7]
I am expecting a result like this:
[2552195C312,105]
[175831A638,1]
[48061B887,1]
[171539C177,1]
[5584D2379,4]
[118732EE7792,3]
[8157FF1915,1]
[14310AA872,1007]
[148BB41539,6]
[40513SS68,1]
[199915UY72,11]
[11429401AW5,3]
[187755CD00,4]
[834413CV18,1]
[185475XS2,14]
[11716817SD8,2]
[2552998AS99,12]
[19792WS37,2]
[153054WE02,1]
[8131128ER1,7]
Could someone please help? Thanks for your help.
Answer 0 (score: 1)
Since you haven't mentioned the names of the value columns, I am assuming that the schema of the dataframe after the outer join is
root
|-- T_ROWKEY: string (nullable = true)
|-- T_ROWVALUE: integer (nullable = true)
|-- N_ROWKEY: string (nullable = true)
|-- N_ROWVALUE: integer (nullable = true)
So, after your outer join, register the result as a temporary view:
sqlContext.sql("select * from TEMP1 a FULL OUTER JOIN TEMP2 b ON a.T_ROWKEY = b.N_ROWKEY").createOrReplaceTempView("JOINED")
Then a simple case when ... then ... else ... end should give you the final result you expect:
sqlContext.sql("select case when T_ROWKEY is null then `N_ROWKEY` else `T_ROWKEY` end as ROWKEY, case when T_ROWVALUE is null then 0 else `T_ROWVALUE` end + case when N_ROWVALUE is null then 0 else `N_ROWVALUE` end as VALUE from JOINED").show(false)
which should give you
+------------+-----+
|ROWKEY |VALUE|
+------------+-----+
|14310AA872 |1007 |
|19792WS37 |2 |
|5584D2379 |4 |
|40513SS68 |1 |
|11716817SD8 |2 |
|11429401AW5 |3 |
|118732EE7792|3 |
|171539C177 |1 |
|187755CD00 |4 |
|8131128ER1 |7 |
|2552998AS99 |12 |
|834413CV18 |1 |
|8157FF1915 |1 |
|2552195C312 |105 |
|48061B887 |1 |
|148BB41539 |6 |
|153054WE02 |1 |
|175831A638 |1 |
|199915UY72 |11 |
|185475XS2 |14 |
+------------+-----+
Using the built-in when/otherwise functions is simpler and more concise:
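import org.apache.spark.sql.functions._
import sqlContext.implicits._ // enables the 'COLUMN symbol syntax; assumes sqlContext is a val
val joined = sqlContext.table("JOINED") // the temp view registered above
joined.select(
  when('T_ROWKEY.isNull, 'N_ROWKEY).otherwise('T_ROWKEY).as("ROWKEY"),
  (when('T_ROWVALUE.isNull, 0).otherwise('T_ROWVALUE) + when('N_ROWVALUE.isNull, 0).otherwise('N_ROWVALUE)).as("VALUE"))
  .show(false)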
which should give you the same result as above.
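As a side note, the same null handling can probably be written a bit shorter with the built-in `coalesce` function, which returns its first non-null argument. The following is only a minimal sketch under some assumptions: the column names are the ones I assumed in the schema above, the object name `CoalesceJoinSketch` and the helper `sumByKey` are made up for illustration, and the two small input frames are stand-ins built from a few rows of the question, not your real TEMP1 and TEMP2.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{coalesce, col, lit}

object CoalesceJoinSketch {
  // Hypothetical helper: full outer join on the key columns, then take
  // whichever key is non-null and add the values, treating null as 0.
  def sumByKey(temp1: DataFrame, temp2: DataFrame): DataFrame =
    temp1.join(temp2, col("T_ROWKEY") === col("N_ROWKEY"), "full_outer")
      .select(
        coalesce(col("T_ROWKEY"), col("N_ROWKEY")).as("ROWKEY"),
        (coalesce(col("T_ROWVALUE"), lit(0)) + coalesce(col("N_ROWVALUE"), lit(0))).as("VALUE"))

  def main(args: Array[String]): Unit = {
    // Local session only for trying the sketch out.
    val spark = SparkSession.builder().master("local[*]").appName("coalesce-sketch").getOrCreate()
    import spark.implicits._

    // Stand-in data: a few rows taken from the question.
    val temp1 = Seq(("2552195C312", 100), ("48061B887", 1)).toDF("T_ROWKEY", "T_ROWVALUE")
    val temp2 = Seq(("2552195C312", 5), ("175831A638", 1)).toDF("N_ROWKEY", "N_ROWVALUE")

    sumByKey(temp1, temp2).show(false)
    spark.stop()
  }
}
```

With these sample rows, 2552195C312 should come out as 105 and the two unmatched keys should keep their value of 1; the row order of `show` is not guaranteed.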
I hope the answer is helpful.