Why does this boolean SELECT not count the NULLs?

Asked: 2014-11-26 10:33:55

Tags: hive apache-spark

My table my_table contains a boolean column, my_value. When I query the table in Shark, I get some surprising results:

shark> SELECT my_value, COUNT(*) FROM my_table GROUP BY my_value;
OK
true    182285
false   81968
NULL    7594
Time taken: 14.028 seconds
shark> SELECT COUNT(*) FROM my_table WHERE my_value=true;
OK
182285
Time taken: 13.787 seconds
shark> SELECT COUNT(*) FROM my_table WHERE my_value IS NULL;
OK
7594
Time taken: 13.387 seconds
shark> SELECT COUNT(*) FROM my_table WHERE my_value=true or my_value IS NULL;
OK
182285
Time taken: 13.406 seconds

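I expected the last query to return 189879 (i.e. 182285 + 7594), since a row should pass the filter if either side of the OR is true. Why doesn't it?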
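As a point of comparison, here is a minimal sketch of the same three filters run against a different engine, SQLite, through Python's standard sqlite3 module. It is not Shark or Hive and says nothing about their internals; it only illustrates the behaviour that standard SQL three-valued logic would give. The helper name count_rows and the tiny row counts (3 true, 2 false, 1 NULL) are made up for illustration, and because SQLite has no true boolean type, the values are stored as 1 and 0.

```python
import sqlite3


def count_rows(where_clause, n_true=3, n_false=2, n_null=1):
    """Count rows of a throwaway in-memory table that match where_clause.

    Hypothetical helper: the table mimics my_table with a single
    my_value column holding 1 (true), 0 (false), or NULL.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (my_value BOOLEAN)")
    rows = [(1,)] * n_true + [(0,)] * n_false + [(None,)] * n_null
    conn.executemany("INSERT INTO my_table VALUES (?)", rows)
    (count,) = conn.execute(
        "SELECT COUNT(*) FROM my_table WHERE " + where_clause
    ).fetchone()
    conn.close()
    return count


print(count_rows("my_value = 1"))                      # rows that are true
print(count_rows("my_value IS NULL"))                  # rows that are NULL
print(count_rows("my_value = 1 OR my_value IS NULL"))  # union of the two
```

In this reference engine the third count equals the sum of the first two, which matches the 182285 + 7594 expectation above.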

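For curious readers, this variant, which uses the isnull() function and puts the NULL test first, does seem to produce the correct result: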

shark> SELECT COUNT(*) FROM my_table WHERE isnull(my_value) or my_value=true;
OK
189879

Also, this is not an operator precedence problem:

shark> SELECT COUNT(*) FROM my_table WHERE (my_value=true) or (my_value IS NULL);
OK
182285

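Update: it looks like the IS operator does not behave the way I expect. When it is used in the SELECT list, it returns false even for rows that the same test in the WHERE clause selected: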

shark> SELECT my_value IS NULL FROM my_table WHERE my_value IS NULL LIMIT 10;
14/11/26 11:34:52 WARN parse.ASTRewriteUtil: Query contains a LIMIT. Skipping applicable COUNT DISTINCT rewrites.A LIMIT shouldn't be paired with an aggregation that only returns one line ...
OK
false
false
false
false
false
false
false
false
false
false

That makes me even more surprised that SELECT COUNT(*) FROM my_table WHERE my_value IS NULL; returned the correct result.
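For contrast, a minimal sketch of the same projection in SQLite via Python's sqlite3 module. Again, this is a stand-in engine rather than Shark, and the helper name project_is_null and the sample values are hypothetical. Under standard SQL semantics, every row that passes WHERE my_value IS NULL should also evaluate my_value IS NULL to true in the SELECT list (SQLite reports true as the integer 1).

```python
import sqlite3


def project_is_null(values):
    """Return the projected `my_value IS NULL` for rows filtered by the same test.

    Hypothetical helper: values is a list of 1, 0, or None loaded into
    a throwaway in-memory my_table.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE my_table (my_value BOOLEAN)")
    conn.executemany(
        "INSERT INTO my_table VALUES (?)", [(v,) for v in values]
    )
    rows = conn.execute(
        "SELECT my_value IS NULL FROM my_table WHERE my_value IS NULL"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]


# Two NULL rows go in, so two rows come back, each with a truthy flag.
print(project_is_null([1, 0, None, None]))
```

If Shark followed the same semantics, the ten rows in the output above would all read true rather than false.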

0 answers:

No answers yet.