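I want to filter the price DataFrame so that it keeps only the IDs that also exist in the events DataFrame. My code is below, but it doesn't work in PySpark. How can I fix it?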
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

events = spark.createDataFrame([(657, 'Conferences'),
                                (765, 'Seminars'),
                                (776, 'Meetings'),
                                (879, 'Conferences'),
                                (765, 'Meetings'),
                                (879, 'Seminars'),
                                (985, 'Meetings'),
                                (879, 'Meetings'),
                                (657, 'Seminars'),
                                (657, 'Conferences')],
                               ['Id', 'event_name'])
events.show()
price = spark.createDataFrame([(657, 10),
                               (879, 45),
                               (776, 54),
                               (879, 45),
                               (765, 65)],
                              ['Id', 'Price'])
price[price.Id.isin(events.Id)].show()
Answer (score: 0)
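A simple (inner) join keeps only the prices whose Id also appears in the events table. Because an Id can occur several times in events (657 and 879 each appear three times), the join repeats the matching price rows, so select the columns you need and call distinct() to drop the duplicates: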
events.join(price, "Id").select("Id", "Price").distinct().show()