Join a Spark DataFrame with a list

Date: 2018-07-10 15:57:14

Tags: scala apache-spark apache-spark-sql

How can I join or filter a Spark RDD / DataFrame against a list?

I have a list and a Spark DataFrame:

val list = List(12345,222222,333333,444444,555555,666666)

val friendPF = Seq(
    ("bob", "2015-01-13", 12345),
    ("alicsdsdse", "2015-04-23", 112120),
    ("alice", "2015-04-23", 1021212),
    ("alsddsdsice", "2015-04-23", 112120),
    ("four", "2015-04-23", 44444),
    ("three", "2015-04-23", 333333),
    ("two", "2015-04-23", 222222),
    ("five", "2015-04-23", 555555),
    ("otowowo", "2015-04-23", 1121210),
    ("six", "2015-04-23", 666666)
).toDF("name", "date", "id")

friendPF.show

+-----------+----------+-------+
|       name|      date|     id|
+-----------+----------+-------+
|        bob|2015-01-13|  12345|
| alicsdsdse|2015-04-23| 112120|
|      alice|2015-04-23|1021212|
|alsddsdsice|2015-04-23| 112120|
|       four|2015-04-23|  44444|
|      three|2015-04-23| 333333|
|        two|2015-04-23| 222222|
|       five|2015-04-23| 555555|
|    otowowo|2015-04-23|1121210|
|        six|2015-04-23| 666666|
+-----------+----------+-------+

How can I use a join to get the rows of this DataFrame whose IDs match the list?

2 answers:

Answer 0 (score: 1)

Convert your list to a DataFrame as follows:

val listDF = List(12345,222222,333333,444444,555555,666666).toDF("id")

Now join the two DataFrames:

friendPF.as("rel").
    join(listDF.as("ids"),  $"ids.id" === $"rel.id").
    select( $"rel.name", $"rel.date",$"rel.id").show()
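One caveat with the inner join above: if the list contains a repeated ID, every matching row comes back once per repetition. A left semi join keeps each row of the left side at most once and returns only the left side's columns. The following is a minimal sketch, not part of the original answer. It assumes Spark 2.x or later on the classpath; the object name `SemiJoinSketch`, the helper `keepListed`, the local master setting, and the shortened sample data are all illustrative. The `broadcast` hint assumes the list is small enough to ship to every executor.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast

object SemiJoinSketch {
  // Hypothetical helper: keep the rows of `df` whose `id` occurs in `ids`.
  // "left_semi" returns only the columns of `df`, each row at most once.
  def keepListed(df: DataFrame, ids: DataFrame): DataFrame =
    df.join(broadcast(ids), df("id") === ids("id"), "left_semi")

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder()
      .appName("semi-join-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Shortened version of the question's data.
    val friendPF = Seq(
      ("bob", "2015-01-13", 12345),
      ("four", "2015-04-23", 44444),
      ("three", "2015-04-23", 333333),
      ("alice", "2015-04-23", 1021212)
    ).toDF("name", "date", "id")

    // 333333 is repeated on purpose: an inner join would return "three" twice.
    val listDF = List(12345, 222222, 333333, 333333).toDF("id")

    keepListed(friendPF, listDF).show()

    spark.stop()
  }
}
```

With this sample data the semi join should return the rows for bob and three, one row each.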

Answer 1 (score: 1)

You don't need a join; use isin:

friendPF.
    where($"id".isin(list: _*)).
    show()
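For reference, here is the same filter as a standalone program rather than a spark-shell snippet. This is a sketch under assumptions: Spark 2.x or later on the classpath, a local master, and a shortened copy of the question's data; the object name `IsinSketch` and the helper `filterByIds` are illustrative only. It uses `col("id")` instead of `$"id"` so the helper does not depend on `spark.implicits._`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object IsinSketch {
  // Hypothetical helper: keep rows whose `id` is one of `ids`.
  // `isin` takes varargs, so the sequence is expanded with `: _*`.
  def filterByIds(df: DataFrame, ids: Seq[Int]): DataFrame =
    df.where(col("id").isin(ids: _*))

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder()
      .appName("isin-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val list = List(12345, 222222, 333333, 444444, 555555, 666666)

    // Shortened version of the question's data.
    // Note that "four" has id 44444, which is not the 444444 in the list.
    val friendPF = Seq(
      ("bob", "2015-01-13", 12345),
      ("four", "2015-04-23", 44444),
      ("three", "2015-04-23", 333333),
      ("alice", "2015-04-23", 1021212)
    ).toDF("name", "date", "id")

    filterByIds(friendPF, list).show()

    spark.stop()
  }
}
```

As a rough rule, `isin` suits a short list because the values are embedded in the filter expression itself; for a very large list, turning it into a DataFrame and joining is likely the better fit.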