How to use join with a list / filter a Spark RDD / DataFrame
I have a list and a Spark RDD/DataFrame:
val list = List(12345,222222,333333,444444,555555,666666)
val friendPF=Seq(("bob", "2015-01-13", 12345), ("alicsdsdse", "2015-04-23",112120),("alice", "2015-04-23",1021212),("alsddsdsice", "2015-04-23",112120),("four", "2015-04-23",44444),("three", "2015-04-23",333333),("two", "2015-04-23",222222),("five", "2015-04-23",555555),("otowowo", "2015-04-23",1121210),("six", "2015-04-23",666666)).toDF("name","date","id")
friendPF.show
+-----------+----------+-------+
|       name|      date|     id|
+-----------+----------+-------+
|        bob|2015-01-13|  12345|
| alicsdsdse|2015-04-23| 112120|
|      alice|2015-04-23|1021212|
|alsddsdsice|2015-04-23| 112120|
|       four|2015-04-23|  44444|
|      three|2015-04-23| 333333|
|        two|2015-04-23| 222222|
|       five|2015-04-23| 555555|
|    otowowo|2015-04-23|1121210|
|        six|2015-04-23| 666666|
+-----------+----------+-------+
How can I use a join to get the rows of the given RDD whose IDs match the list?
Answer 0 (score: 1)
Convert your list to a DataFrame as shown below:
val listDF = List(12345,222222,333333,444444,555555,666666).toDF("id")
Now join the two DataFrames:
friendPF.as("rel").
join(listDF.as("ids"), $"ids.id" === $"rel.id").
select( $"rel.name", $"rel.date",$"rel.id").show()
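Since only columns from friendPF are selected, a left semi join may be a slightly tidier variant of the join above: it keeps the left side's rows that have a match and drops the right side's columns, so no select is needed. The following is a minimal sketch, assuming Spark 2.x or later with a SparkSession; the local master, the app name, and the shortened sample data are assumptions for illustration only.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("semi-join-sketch").getOrCreate()
import spark.implicits._

// A shortened copy of the question's data (illustrative subset).
val friendPF = Seq(
  ("bob", "2015-01-13", 12345),
  ("four", "2015-04-23", 44444),
  ("three", "2015-04-23", 333333)
).toDF("name", "date", "id")

val listDF = List(12345, 222222, 333333, 444444, 555555, 666666).toDF("id")

// "left_semi" returns only friendPF's columns, for rows whose id appears in listDF.
val matched = friendPF.join(listDF, friendPF("id") === listDF("id"), "left_semi")
matched.show()
```

With this subset, "four" (id 44444) is dropped because the list contains 444444, not 44444.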
Answer 1 (score: 1)
You don't need a join; use isin:
friendPF
.where($"id".isin(list:_*))
.show()
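If the data really is a plain RDD rather than a DataFrame, the same idea can be expressed as a filter against a set of IDs. This is only a sketch under assumptions: friendRDD is a hypothetical RDD of (name, date, id) tuples built from a few of the question's rows, and broadcasting the set is an optional choice that mainly matters when the list is larger than in this example.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("rdd-filter-sketch").getOrCreate()

val list = List(12345, 222222, 333333, 444444, 555555, 666666)

// Hypothetical RDD holding a few of the question's rows as tuples.
val friendRDD = spark.sparkContext.parallelize(Seq(
  ("bob", "2015-01-13", 12345),
  ("four", "2015-04-23", 44444),
  ("six", "2015-04-23", 666666)
))

// Ship the IDs to the executors once, as a Set for constant-time lookups.
val idSet = spark.sparkContext.broadcast(list.toSet)

// Keep only the tuples whose id is in the set.
val matchedRDD = friendRDD.filter { case (_, _, id) => idSet.value.contains(id) }
matchedRDD.collect().foreach(println)
```

For DataFrames, isin as shown above is usually the simpler option; the RDD form is mostly useful when no schema is available.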