I want to implement the following SQL condition in PySpark:
SELECT *
FROM table
WHERE NOT (ID = 1 AND Event = 1)
  AND NOT (ID = 2 AND Event = 2)
  AND NOT (ID = 1 AND Event = 0)
  AND NOT (ID = 2 AND Event = 0)
What is a clean way to do this?
Answer 0 (score: 2)
For the DataFrame API version, use the filter function or its alias, where.
The equivalent code is:
df.filter(~((df.ID == 1) & (df.Event == 1)) &
          ~((df.ID == 2) & (df.Event == 2)) &
          ~((df.ID == 1) & (df.Event == 0)) &
          ~((df.ID == 2) & (df.Event == 0)))
Answer 1 (score: 1)
If you're lazy, you can copy and paste the SQL filter expression straight into the PySpark filter:
df.filter("""
    NOT (ID = 1 AND Event = 1)
    AND NOT (ID = 2 AND Event = 2)
    AND NOT (ID = 1 AND Event = 0)
    AND NOT (ID = 2 AND Event = 0)
""")
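If the list of excluded (ID, Event) pairs grows, typing each NOT clause by hand gets error-prone. One possible variation on the string approach above is to generate the expression from a list of pairs. This is only a sketch: the helper name build_exclusion_filter and its parameters are made up for illustration, and it assumes both columns hold integers.

```python
def build_exclusion_filter(pairs, id_col="ID", event_col="Event"):
    """Build a SQL predicate that rejects every (ID, Event) pair in `pairs`.

    Hypothetical helper, not part of PySpark. Values are coerced with int()
    so that only integer literals end up in the generated SQL text.
    """
    if not pairs:
        raise ValueError("pairs must not be empty")
    clauses = [
        f"NOT ({id_col} = {int(i)} AND {event_col} = {int(e)})"
        for i, e in pairs
    ]
    # All exclusions must hold at once, so the clauses are joined with AND.
    return " AND ".join(clauses)


# The four combinations excluded in the question.
excluded = [(1, 1), (2, 2), (1, 0), (2, 0)]
condition = build_exclusion_filter(excluded)
```

The resulting string can then be passed to the filter the same way as the hand-written one, i.e. df.filter(condition), where df is the DataFrame from the question.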