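Spark join on multiple conditions

Time: 2016-05-25 21:40:40

Tags: pyspark-sql

I am using Spark SQL to join three tables, but I get an error when the join condition involves multiple columns.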

test_table = (T1.join(T2,T1.dtm == T2.kids_dtm, "inner")
          .join(T3, T3.kids_dtm == T1.dtm
                and T2.room_id == T3.room_id
                and T2.book_id == T3.book_id, "inner"))

ERROR:

  Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/opt/cloudera/parcels/CDH-5.7.0-1.cdh5.7.0.p0.45/lib/spark/python/pyspark/sql/column.py", line 447, in __nonzero__
    raise ValueError("Cannot convert column into bool: please use '&' for 'and', '|' for 'or', "
ValueError: Cannot convert column into bool: please use '&' for 'and', '|' for 'or', '~' for 'not' when building DataFrame boolean expressions.

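Instead of "and", I also tried "&" and "&&", but neither of those worked. Any help would be appreciated.

1 answer:

Answer 0 (score: 1)

Never mind, the following works, using "&" together with parentheses around each comparison: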

test_table = (T1.join(T2,T1.dtm == T2.kids_dtm, "inner")
      .join(T3, (T3.kids_dtm == T1.dtm)
            & (T2.room_id == T3.room_id)
            & (T2.book_id == T3.book_id), "inner"))
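
For anyone wondering why the parentheses matter: in Python, `&` binds more tightly than `==`, so an unparenthesized condition is parsed as a chained comparison, and chained comparisons (like `and`) need a plain bool. The sketch below illustrates this with a toy `Col` class. `Col` and `fails` are hypothetical stand-ins written for this example, not PySpark's actual implementation; they only mimic the behaviour reported in the traceback above.

```python
class Col:
    """Toy stand-in for a Spark Column (hypothetical, for illustration only)."""

    def __init__(self, expr):
        self.expr = expr

    def __eq__(self, other):
        # `==` builds a new expression instead of returning True/False.
        return Col("(%s = %s)" % (self.expr, other.expr))

    def __and__(self, other):
        # `&` combines two expressions into one.
        return Col("(%s AND %s)" % (self.expr, other.expr))

    def __bool__(self):
        # `and`, `or`, `not`, and chained comparisons all end up here.
        raise ValueError("Cannot convert column into bool")


def fails(build):
    """Return True if building the condition raises ValueError."""
    try:
        build()
    except ValueError:
        return True
    return False


t1_dtm, t3_dtm = Col("T1.dtm"), Col("T3.kids_dtm")
t2_room, t3_room = Col("T2.room_id"), Col("T3.room_id")

# `and` forces the left operand to a bool, which the column refuses.
with_and = fails(lambda: t3_dtm == t1_dtm and t2_room == t3_room)

# Without parentheses, `&` binds tighter than `==`, so this parses as the
# chained comparison  t3_dtm == (t1_dtm & t2_room) == t3_room,
# which also needs a bool internally.
unparenthesized = fails(lambda: t3_dtm == t1_dtm & t2_room == t3_room)

# Parenthesizing each comparison yields the intended combined expression.
ok = (t3_dtm == t1_dtm) & (t2_room == t3_room)
print(with_and, unparenthesized, ok.expr)
```

This would explain why switching from "and" to "&" alone was not enough in the question: the operator has to change and each comparison has to be wrapped in its own parentheses.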