Sample data in table1:
year month day utmsource
2017 03 26 NULL
2017 03 27 NULL
2017 03 27 facebook
2017 03 27 newsletter
2017 03 27 banner
2017 03 27 facebook
Expected selection:
year month day utmsource
2017 03 27 NULL
2017 03 27 newsletter
2017 03 27 banner
My Hive queries:
-- result = 0, it did not include the NULL utmsource record
SELECT SUM(CASE WHEN utmsource IS NULL THEN 1 ELSE 0 END) as amountnull
FROM table1
WHERE year=2017 AND month=03 AND day=27 AND NOT utmsource="facebook"
-- result = 1 the NULL utmsource record is included
SELECT SUM(CASE WHEN utmsource IS NULL THEN 1 ELSE 0 END) as amountnull
FROM table1
WHERE year=2017 AND month=03 AND day=27 AND (utmsource IS NULL OR NOT utmsource="facebook")
-- also returns 0, the NULL utmsource record is not included
SELECT SUM(CASE WHEN utmsource IS NULL THEN 1 ELSE 0 END) as amountnull
FROM table1
WHERE year=2017 AND month=03 AND day=27 AND NOT utmsource <=> 'facebook';
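Question:
Answer 0: (score: 2)
What you want is a NULL-safe equality (or inequality) operator. ANSI SQL has an operator for this called IS DISTINCT FROM. Hive appears to use the MySQL version, <=>. So you can do: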
SELECT SUM(CASE WHEN utmsource IS NULL THEN 1 ELSE 0 END) as amountnull
FROM tablename
WHERE year=2017 AND month=03 AND day=27 AND NOT utmsource <=> 'facebook';
This operator is described in the Hive documentation.
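As a minimal sketch of why the plain comparison drops the NULL row while the NULL-safe one should keep it, the query below evaluates each predicate against a NULL string directly. It assumes a Hive version that accepts SELECT without a FROM clause, and the column aliases are invented for illustration. The commented values follow from SQL three-valued logic and were not taken from the question's table.

```sql
-- Each expression stands in for the WHERE predicate with utmsource = NULL.
SELECT
  CAST(NULL AS STRING) = 'facebook'         AS plain_eq,          -- NULL
  NOT (CAST(NULL AS STRING) = 'facebook')   AS not_plain_eq,      -- NULL, so WHERE rejects the row
  CAST(NULL AS STRING) <=> 'facebook'       AS null_safe_eq,      -- false
  NOT (CAST(NULL AS STRING) <=> 'facebook') AS not_null_safe_eq;  -- true, so WHERE keeps the row
```

The parentheses after NOT only make the grouping explicit. The form used in the second query of the question, utmsource IS NULL OR NOT utmsource="facebook", expresses the same filter without <=>.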
I should also point out that you might find this a simpler way to write the SELECT:
SELECT (COUNT(*) - COUNT(utmsource)) as amountnull
FROM tablename
WHERE year=2017 AND month=03 AND day=27 AND NOT utmsource <=> 'facebook';
Overall, though, this seems to be the simplest:
SELECT COUNT(*) as amountnull
FROM tablename
WHERE year=2017 AND month=03 AND day=27 AND utmsource IS NULL;
The comparison with 'facebook' is unnecessary: a row whose utmsource is NULL can never be equal to 'facebook'.