My table my_table contains a boolean column, my_value. When I query the table in Shark, I get some surprising results:
shark> SELECT my_value, COUNT(*) FROM my_table GROUP BY my_value;
OK
true 182285
false 81968
NULL 7594
Time taken: 14.028 seconds
shark> SELECT COUNT(*) FROM my_table WHERE my_value=true;
OK
182285
Time taken: 13.787 seconds
shark> SELECT COUNT(*) FROM my_table WHERE my_value IS NULL;
OK
7594
Time taken: 13.387 seconds
shark> SELECT COUNT(*) FROM my_table WHERE my_value=true or my_value IS NULL;
OK
182285
Time taken: 13.406 seconds
I expected the last query to return 189879 (i.e. 182285 + 7594). Why doesn't it?
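To make that expectation concrete, here is a minimal sketch of the semantics I assumed, run against SQLite rather than Shark. The table contents (3 true, 2 false, 1 NULL) are made up for illustration, and 1/0 stand in for true/false because SQLite has no boolean type. It only shows what standard SQL three-valued logic gives; it says nothing about what Shark does internally.

```python
import sqlite3

# Hypothetical miniature of my_table: 3 true, 2 false, 1 NULL.
# SQLite has no boolean type, so 1/0/NULL stand in for true/false/NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (my_value INTEGER)")
conn.executemany(
    "INSERT INTO my_table (my_value) VALUES (?)",
    [(1,), (1,), (1,), (0,), (0,), (None,)],
)


def count(where):
    """Return COUNT(*) for the rows of my_table matching the WHERE clause."""
    query = f"SELECT COUNT(*) FROM my_table WHERE {where}"
    return conn.execute(query).fetchone()[0]


true_count = count("my_value = 1")
null_count = count("my_value IS NULL")
# For a NULL row, `my_value = 1` is NULL (unknown) and `my_value IS NULL`
# is true; NULL OR TRUE evaluates to TRUE, so the row is kept.
either_count = count("my_value = 1 OR my_value IS NULL")

print(true_count, null_count, either_count)
```

In this sketch the combined count equals the sum of the two separate counts, which is the behaviour I expected from Shark as well.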
For the curious reader, this variant does seem to produce the correct result:
shark> SELECT COUNT(*) FROM my_table WHERE isnull(my_value) or my_value=true;
OK
189879
Also, this is not an operator precedence issue:
shark> SELECT COUNT(*) FROM my_table WHERE (my_value=true) or (my_value IS NULL);
OK
182285
Update: It looks like the IS operator in the WHERE clause is not doing what I expected:
shark> SELECT my_value IS NULL FROM my_table WHERE my_value IS NULL LIMIT 10;
14/11/26 11:34:52 WARN parse.ASTRewriteUtil: Query contains a LIMIT. Skipping applicable COUNT DISTINCT rewrites.A LIMIT shouldn't be paired with an aggregation that only returns one line ...
OK
false
false
false
false
false
false
false
false
false
false
This makes it even more surprising to me that SELECT COUNT(*) FROM my_table WHERE my_value IS NULL; returned the correct result.