foreach with Pyspark dataframe

Date: 2017-12-18 07:50:02

Tags: python apache-spark-sql pyspark-sql

I have two dataframes, "account" (df2) and df1.

For each row of df2 I need to look up a value in df1. I have been trying something like this (the function below shows sample operations):

    def lookup(df2):
        print df2.name

    df1.foreach(lookup)

This runs but does not show any results.

What could be the cause of this?

1 answer:

Answer 0 (score: 1)

I assume you need all records from the left DF and the matching records from the right DF.

You can use a join condition like the one below:

df1.join(df2,[<column name>],'left_outer')

Please post back if you need more help.

What does a left_outer join return?

A LEFT OUTER join contains all of the rows from both tables that meet the WHERE clause criteria, the same as an INNER join result set. In addition, any rows from the left table that have no matching row in the right table are also included in the result set.
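The semantics above can be sketched in plain Python, independent of Spark (the tables and key names here are invented for illustration):

```python
# Left table: id -> name; right table: id -> balance.
left = {1: "alice", 2: "bob"}
right = {1: 100}

# LEFT OUTER join: every left row appears exactly once; where the right
# table has no matching key, its column is filled with None.
result = [(key, name, right.get(key)) for key, name in left.items()]
# id 2 survives with balance None; a right-only id would not appear.
```

This mirrors what Spark's 'left_outer' join does at the row level.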
