Question

Sample data in table1:

year month day utmsource
2017 03    26  NULL
2017 03    27  NULL
2017 03    27  facebook
2017 03    27  newsletter
2017 03    27  banner
2017 03    27  facebook

Expected selection:

year month day utmsource
2017 03    27  NULL
2017 03    27  newsletter
2017 03    27  banner

My Hive queries:

-- result = 0, it did not include the NULL utmsource record
SELECT SUM(CASE WHEN utmsource IS NULL THEN 1 ELSE 0 END) as amountnull
FROM table1
WHERE year=2017 AND month=03 AND day=27 AND NOT utmsource="facebook"

-- result = 1, the NULL utmsource record is included
SELECT SUM(CASE WHEN utmsource IS NULL THEN 1 ELSE 0 END) as amountnull
FROM table1
WHERE year=2017 AND month=03 AND day=27 AND (utmsource IS NULL OR NOT utmsource="facebook")

-- also returns 0, the NULL utmsource record is not included
SELECT SUM(CASE WHEN utmsource IS NULL THEN 1 ELSE 0 END) as amountnull
FROM table1
WHERE year=2017 AND month=03 AND day=27 AND NOT utmsource <=> 'facebook';

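Questions:

Can someone explain this behavior?
Can I change a setting so that I get the result of query 2 without adding the extra OR condition to my query, i.e. so that "not equals" includes NULL values in the result?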

Answer 1

What you want is a NULL-safe equality (or inequality) operator. In ANSI SQL, there is an operator called is distinct from. Hive appears to use the MySQL version, <=>. So, you can do:

SELECT SUM(CASE WHEN utmsource IS NULL THEN 1 ELSE 0 END) as amountnull
FROM tablename
WHERE year=2017 AND month=03 AND day=27 AND NOT utmsource <=> 'facebook';

This operator is described in the documentation.
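
As for why the plain comparison drops the row: in SQL, NULL = 'facebook' evaluates to NULL rather than true or false, NOT NULL is still NULL, and WHERE keeps only rows whose predicate is true. The sketch below reproduces this on the sample data. It is only an illustration under stated assumptions: SQLite (through Python's sqlite3 module) stands in for Hive, the count_nulls helper is made up for this example, and SQLite spells the null-safe comparison IS NOT instead of NOT ... <=>, so it does not exercise Hive's own parser or operator.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the Hive table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (year INT, month INT, day INT, utmsource TEXT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        (2017, 3, 26, None),
        (2017, 3, 27, None),
        (2017, 3, 27, "facebook"),
        (2017, 3, 27, "newsletter"),
        (2017, 3, 27, "banner"),
        (2017, 3, 27, "facebook"),
    ],
)


def count_nulls(predicate):
    """Count NULL utmsource rows for 2017-03-27 that survive the given predicate."""
    sql = (
        "SELECT COALESCE(SUM(CASE WHEN utmsource IS NULL THEN 1 ELSE 0 END), 0) "
        "FROM table1 WHERE year = 2017 AND month = 3 AND day = 27 AND " + predicate
    )
    return conn.execute(sql).fetchone()[0]


# NULL = 'facebook' is NULL, NOT NULL is NULL, so the NULL row is filtered out.
plain = count_nulls("NOT utmsource = 'facebook'")

# The explicit IS NULL branch makes the predicate true for the NULL row.
explicit = count_nulls("(utmsource IS NULL OR NOT utmsource = 'facebook')")

# SQLite's null-safe inequality; it plays the role of IS DISTINCT FROM here.
null_safe = count_nulls("utmsource IS NOT 'facebook'")

print(plain, explicit, null_safe)
```

The first value corresponds to query 1 in the question and the second to query 2; the third shows what a null-safe inequality is expected to return.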

I should also point out that you might find this a simpler formulation of the SELECT:

SELECT (COUNT(*) - COUNT(utmsource)) as amountnull
FROM tablename
WHERE year=2017 AND month=03 AND day=27 AND NOT utmsource <=> 'facebook';

Overall, though, this seems to be the simplest:

SELECT COUNT(*) as amountnull
FROM tablename
WHERE year=2017 AND month=03 AND day=27 AND utmsource IS NULL;

The comparison to 'facebook' is unnecessary.
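
If you want to convince yourself that the three formulations agree, here is a rough check on the sample data. Same caveats as any quick sketch: SQLite (via Python's sqlite3) is assumed as a stand-in for Hive, the scalar helper is hypothetical, and IS NOT is SQLite's null-safe inequality used in place of NOT ... <=>.

```python
import sqlite3

# Assumption: an in-memory SQLite table mirrors the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (year INT, month INT, day INT, utmsource TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?, ?)",
    [
        (2017, 3, 26, None),
        (2017, 3, 27, None),
        (2017, 3, 27, "facebook"),
        (2017, 3, 27, "newsletter"),
        (2017, 3, 27, "banner"),
        (2017, 3, 27, "facebook"),
    ],
)

day_filter = "year = 2017 AND month = 3 AND day = 27"
not_facebook = "utmsource IS NOT 'facebook'"  # null-safe inequality in SQLite


def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]


# Formulation 1: SUM over a CASE expression.
sum_case = scalar(
    "SELECT SUM(CASE WHEN utmsource IS NULL THEN 1 ELSE 0 END) "
    f"FROM tablename WHERE {day_filter} AND {not_facebook}"
)

# Formulation 2: COUNT(*) counts all rows, COUNT(col) skips NULLs.
count_diff = scalar(
    "SELECT COUNT(*) - COUNT(utmsource) "
    f"FROM tablename WHERE {day_filter} AND {not_facebook}"
)

# Formulation 3: filter on IS NULL directly, no comparison to 'facebook'.
count_null = scalar(
    f"SELECT COUNT(*) FROM tablename WHERE {day_filter} AND utmsource IS NULL"
)

print(sum_case, count_diff, count_null)
```

All three return the same count for 2017-03-27 on this data, which is why the last form, without the 'facebook' comparison, is enough.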

Hive query with a "not column = value" WHERE clause drops NULL values
