PySpark: a where condition containing a conditional

Time: 2019-01-14 13:03:55

Tags: python apache-spark pyspark

I am getting a syntax error in the following query:

df_result = df_checkout.join(df_checkin, 
                                    (
                                    (df_checkout.product == df_checkin.product)
                                    (df_checkout.host == df_checkin.host)
                                    ),
                                    how = 'full_outer').where(df_checkout.rank = 
                                        F.when(((df_checkout.rank = df_checkin.rank) and (F.unix_timestamp(df_checkout.checkout_date, 'MM/dd/YYYY HH:MI:SS') <= F.unix_timestamp(df_checkin.checkin_date, 'MM/dd/YYYY HH:MI:SS'))), (df_checkin.rank - 1)).when(((df_checkout.rank = df_checkin.rank) and (F.unix_timestamp(df_checkout.checkout_date, 'MM/dd/YYYY HH:MI:SS') >= F.unix_timestamp(df_checkin.checkin_date, 'MM/dd/YYYY HH:MI:SS'))), df_checkin.rank).otherwise(None)
                                    )

What am I doing wrong?

1 answer:

Answer 0 (score: 0)

You have a single = (assignment) where the comparison operator == is needed:

(df_checkout.rank = df_checkin.rank)

should be

(df_checkout.rank == df_checkin.rank)
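
The point can be checked without Spark, because the error is raised by the Python parser before any DataFrame code runs: inside a call, `name = value` is read as a keyword argument, and `df_checkout.rank` is not a valid keyword name. The sketch below is only an illustration under that assumption. The helper `parses` and the two shortened fragments are made up for the demo, and only their syntax is checked. Note that the same single `=` also appears directly inside `.where(...)` in the question, not just inside the `F.when(...)` calls.

```python
# Fragments shortened from the question. Only their syntax is checked,
# so no Spark installation or real DataFrame is needed.
BROKEN = "df_checkout.where(df_checkout.rank = df_checkin.rank)"
FIXED = "df_checkout.where(df_checkout.rank == df_checkin.rank)"


def parses(source: str) -> bool:
    """Return True if `source` is a syntactically valid Python expression."""
    try:
        compile(source, "<query>", "eval")
    except SyntaxError:
        return False
    return True


print(parses(BROKEN))  # False: `=` is not allowed inside an expression
print(parses(FIXED))   # True: `==` builds a comparison
```

Separately from the syntax error, PySpark column conditions are combined with `&` and `|` (each side in parentheses) rather than Python's `and` and `or`, so the two join conditions and the `F.when` predicates in the question would likely need that change as well once the query parses.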