PySpark SQL CASE fails

Date: 2018-11-14 14:44:20

Tags: pyspark pyspark-sql

I'm running into strange behavior when using the PySpark sqlContext. The code below illustrates the problem.

I'm checking the value of COLUMN in a simple CASE expression. However, the WHEN branch never fires, even though the condition evaluates to TRUE; the query always falls through to ELSE. Is my syntax wrong?

dataTest = spark.sql("""SELECT 
COLUMN > 1,
CASE COLUMN 
    WHEN COLUMN > 1 THEN 1
    ELSE COLUMN 
    END AS COLUMN_2,
COLUMN
FROM TABLE
""")


from pyspark.sql.functions import col

dataTest.sort(col("COLUMN").desc()).show(5, False)

+---------------+-------------+---------+
|COLUMN > 1     |COLUMN_2     |COLUMN   |
+---------------+-------------+---------+
|true           |14           |14       |
|true           |5            |5        |
|true           |4            |4        |
|true           |3            |3        |
|true           |2            |2        |
+---------------+-------------+---------+

1 Answer:

Answer 0 (score: 0)

Your syntax is slightly off; try:

SELECT
COLUMN > 1,
CASE WHEN COLUMN > 1 THEN 1
     ELSE COLUMN
     END AS COLUMN_2,
COLUMN
FROM TABLE

Note that there is no COLUMN between the CASE and WHEN keywords. The simple form, CASE COLUMN WHEN x THEN ..., compares COLUMN for *equality* against each x; here x is the boolean COLUMN > 1, so the branch never tests what you intended. The searched form, CASE WHEN &lt;condition&gt; THEN ..., evaluates the condition directly.
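
For completeness, the same searched CASE can be written with the DataFrame API using when()/otherwise(). This is a minimal, self-contained sketch; the table contents are invented for illustration and only mimic the data shown above:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("case-when-demo").getOrCreate()

# Hypothetical stand-in for TABLE; COLUMN holds small integers.
df = spark.createDataFrame([(14,), (5,), (4,), (3,), (2,)], ["COLUMN"])

# Searched CASE: evaluate the boolean condition directly.
# SQL equivalent: CASE WHEN COLUMN > 1 THEN 1 ELSE COLUMN END
result = df.select(
    (F.col("COLUMN") > 1).alias("COLUMN > 1"),
    F.when(F.col("COLUMN") > 1, 1)
     .otherwise(F.col("COLUMN"))
     .alias("COLUMN_2"),
    F.col("COLUMN"),
)

result.sort(F.col("COLUMN").desc()).show(5, False)

With this version COLUMN_2 is 1 for every row where COLUMN > 1, which is the behavior the question expected.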