BigQuery query/results

Asked: 2016-03-26 09:42:31

Tags: google-bigquery

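I would like to discuss and understand the behavior of NULL values in BigQuery.

I noticed that filtering out an actual value in a NULLABLE column filters out both the requested value and the NULL values.

For example, take this query: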
select * from
(select NULL as some_nullable_col, "name1" as name),
(select 4 as some_nullable_col, "name2" as name),
(select 1 as some_nullable_col, "name3" as name),
(select 7 as some_nullable_col, "name4" as name),
(select 3 as some_nullable_col, "name5" as name)
--WHERE some_nullable_col != 3

All rows are returned, as expected.

However, this query:

select * from
(select NULL as some_nullable_col, "name1" as name),
(select 4 as some_nullable_col, "name2" as name),
(select 1 as some_nullable_col, "name3" as name),
(select 7 as some_nullable_col, "name4" as name),
(select 3 as some_nullable_col, "name5" as name)
WHERE some_nullable_col != 3

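omits 2 rows: the one with the value 3 and the one with NULL.

I assume this happens because BigQuery does not index NULL values, or does not scan NULL values in the WHERE clause for efficiency, but it also causes trouble:

Every time I filter on a nullable column, the filter has to be written as WHERE some_nullable_col != 3 OR some_nullable_col IS NULL

which is obviously not very comfortable.

I would just like an explanation, and to know whether BigQuery's roadmap addresses this.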

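4 answers:

Answer 0: (score: 6)

This is the standard behavior of NULL in SQL, and all SQL databases (Oracle, Microsoft SQL Server, PostgreSQL, MySQL, etc.) behave exactly the same way. If the IS NULL check is too cumbersome, an alternative is to use the IFNULL or COALESCE function to convert NULL to a non-NULL value, i.e.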

select * from
(select NULL as some_nullable_col, "name1" as name),
(select 4 as some_nullable_col, "name2" as name),
(select 1 as some_nullable_col, "name3" as name),
(select 7 as some_nullable_col, "name4" as name),
(select 3 as some_nullable_col, "name5" as name)
WHERE ifnull(some_nullable_col,0) != 3
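
To make the comparison concrete, here is a minimal sketch that replays the question's five rows against the three filters discussed so far. It assumes an in-memory SQLite database as a stand-in for BigQuery (it is not BigQuery itself); the table name `t` and the helper `names` are hypothetical and exist only for this illustration.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery here. The NULL comparison rules
# being demonstrated are standard SQL, but this is not a BigQuery client.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (some_nullable_col INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(None, "name1"), (4, "name2"), (1, "name3"), (7, "name4"), (3, "name5")],
)


def names(where_clause):
    """Return the names of the rows that pass the given WHERE clause."""
    sql = f"SELECT name FROM t WHERE {where_clause} ORDER BY name"
    return [row[0] for row in conn.execute(sql)]


# Plain inequality: the NULL row (name1) is dropped along with the 3 row.
plain = names("some_nullable_col != 3")

# Explicit NULL check: the NULL row is kept.
explicit = names("some_nullable_col != 3 OR some_nullable_col IS NULL")

# IFNULL with a replacement value of 0: the NULL row is kept as well.
with_ifnull = names("IFNULL(some_nullable_col, 0) != 3")

print(plain)
print(explicit)
print(with_ifnull)
```

In this sketch the plain filter returns name2, name3 and name4, while the other two also return name1.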

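Answer 1: (score: 1)

Yes, you are right: NULL does not match a comparator such as some_nullable_col != 3. Let me explain why.

Google uses a key-value store as the underlying data store for BigQuery. Unlike in a traditional relational database, data is split by row and field and stored in many different locations. If a value is NULL, BigQuery treats the data as nonexistent, so nothing is written to the data store. As a result, the field never matches any comparator other than "IS NULL". This is by design, and Google currently has no plans to change how it works.

A workaround is to set special values for these fields. For example, if the field's type is string, you can use the empty string "" instead of NULL. If the field's type is a non-negative integer, you can use "-1" as the special value. I know this is not optimal, and in many cases it is better to add an "IS NULL" condition to your query. This is just to give you another option.

By the way, I tried something similar on my MySQL instance and it behaves the same way as BigQuery. That is, a query using the "!=" comparator does not return records whose value is NULL.

For example,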

mysql> select * from test1;
+------+------------+
| id   | num        |
+------+------------+
|    0 | aaa        |
|    1 | bbb        |
|    8 | sdfsdfgsdf |
|    9 | NULL       |
| NULL | sdfsdfsfsf |
+------+------------+
5 rows in set (0.19 sec)

mysql> select * from test1 where id != 8;
+------+------+
| id   | num  |
+------+------+
|    0 | aaa  |
|    1 | bbb  |
|    9 | NULL |
+------+------+
3 rows in set (0.18 sec)

So I think this is standard behavior in the SQL world.

Answer 2: (score: 1)

Just adding to the pile :o)

In some cases, the option below might be useful

SELECT * FROM
(SELECT NULL AS some_nullable_col, "name1" AS name),
(SELECT 4 AS some_nullable_col, "name2" AS name),
(SELECT 1 AS some_nullable_col, "name3" AS name),
(SELECT 7 AS some_nullable_col, "name4" AS name),
(SELECT 3 AS some_nullable_col, "name5" AS name)
WHERE IFNULL(some_nullable_col != 3, true)

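Consider, for example, the case where your nullable field is of string type.
In that case you only need to make one change here -

WHERE IFNULL(some_nullable_col != '3', true)

whereas if you use IFNULL directly on the nullable field, as below

WHERE IFNULL(some_nullable_col, 0) != 3

you need to adjust not only the '3' but also the '0', so that is an extra thing to take care of

Of course, at the end of the day it is all the same and just a matter of preference, but sometimes the choice really does depend on the specific use and implementation pattern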
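
A rough sketch of the difference between the two styles, again assuming in-memory SQLite as a stand-in for BigQuery and a string-typed column. The table `t`, the sample rows, and the helper `names` are hypothetical. The literal `1` is used in place of `true` because SQLite stores booleans as integers. Besides the "two places to change" point, the sketch also shows a case where the replacement value collides with the value being excluded.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery; the column is a string column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (some_nullable_col TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(None, "name1"), ("4", "name2"), ("0", "name3"), ("3", "name5")],
)


def names(where_clause):
    """Return the names of the rows that pass the given WHERE clause."""
    sql = f"SELECT name FROM t WHERE {where_clause} ORDER BY name"
    return [row[0] for row in conn.execute(sql)]


# Wrapping the whole predicate: only the literal '3' depends on the type.
wrapped = names("IFNULL(some_nullable_col != '3', 1)")

# Wrapping the column: both '0' and '3' must match the column type.
sentinel = names("IFNULL(some_nullable_col, '0') != '3'")

# If the value to exclude equals the replacement value, the column-wrapping
# form silently drops the NULL row, while the predicate-wrapping form keeps it.
sentinel_collides = names("IFNULL(some_nullable_col, '0') != '0'")
wrapped_zero = names("IFNULL(some_nullable_col != '0', 1)")

print(wrapped, sentinel, sentinel_collides, wrapped_zero)
```

With these sample rows, the first two filters agree (name1, name2, name3), but excluding '0' returns only name2 and name5 in the column-wrapping form, versus name1, name2 and name5 in the predicate-wrapping form.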

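As for standard behavior - so far BigQuery has been far from following the standard - and isn't that part of why we all love it?!

Answer 3: (score: 0)

NULL is a special value. Many expressions involving NULL themselves return NULL, including the inequality predicate. This is a property of NULL and is by design. If you want NULLs included in your result, you should allow them explicitly, e.g. with an IS NULL check.

Your query becomes: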

select * from
…
WHERE (some_nullable_col != 3 OR some_nullable_col IS NULL)
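
The reason the extra IS NULL branch is needed can be seen by evaluating the predicates on their own. The following is a small sketch that assumes in-memory SQLite as a stand-in for BigQuery; the helper `evaluate` is hypothetical. SQLite reports true as 1, false as 0, and SQL NULL as Python's None.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery to illustrate three-valued logic.
conn = sqlite3.connect(":memory:")


def evaluate(expr):
    """Evaluate a scalar SQL expression and return its single result value."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]


expressions = [
    "4 != 3",                            # true
    "3 != 3",                            # false
    "NULL != 3",                         # unknown (NULL), so WHERE rejects the row
    "NOT (NULL = 3)",                    # still unknown: NOT does not rescue it
    "NULL IS NULL",                      # true
    "(NULL != 3) OR (NULL IS NULL)",     # unknown OR true gives true
]
results = {expr: evaluate(expr) for expr in expressions}

for expr, value in results.items():
    print(f"{expr:35} -> {value!r}")
```

A WHERE clause keeps a row only when the predicate is true, so a row whose predicate evaluates to NULL is filtered out just like one whose predicate is false.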

There are plenty of resources online on this topic, for example Wikipedia