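I'd like to discuss (and understand) how NULL values behave in BigQuery.
I noticed that filtering out an actual value in a NULLABLE column filters out both the requested value and the NULL values.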
For example, this query:
select * from
(select NULL as some_nullable_col, "name1" as name),
(select 4 as some_nullable_col, "name2" as name),
(select 1 as some_nullable_col, "name3" as name),
(select 7 as some_nullable_col, "name4" as name),
(select 3 as some_nullable_col, "name5" as name)
--WHERE some_nullable_col != 3
returns all rows, as expected.
Then this query:
select * from
(select NULL as some_nullable_col, "name1" as name),
(select 4 as some_nullable_col, "name2" as name),
(select 1 as some_nullable_col, "name3" as name),
(select 7 as some_nullable_col, "name4" as name),
(select 3 as some_nullable_col, "name5" as name)
WHERE some_nullable_col != 3
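omits 2 rows: the one with the value 3 and the one with NULL.
I suppose this is because BigQuery does not index NULL values (or does not scan NULLs in WHERE clauses, for efficiency), but it is also a nuisance:
every time I filter on a nullable column, the filter has to be written as
WHERE some_nullable_col != 3 OR some_nullable_col IS NULL
which is obviously not very convenient.
I'd just like an explanation. Is this addressed on BigQuery's roadmap?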
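To reproduce the behavior outside BigQuery, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory SQLite database. SQLite is an assumption standing in for BigQuery; it follows the same SQL rule that a comparison with NULL is neither true nor false. The queries above use BigQuery legacy SQL, where a comma between subqueries in FROM acts as UNION ALL, so the sketch simply loads the same five rows into a table. The table name `t` and the helper `names_where` are hypothetical.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3 module) stands in for BigQuery.
# Same five rows as the question's legacy-SQL subqueries.
ROWS = [(None, "name1"), (4, "name2"), (1, "name3"), (7, "name4"), (3, "name5")]


def names_where(condition):
    """Return the `name` values of rows matching a WHERE condition, sorted."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (some_nullable_col INTEGER, name TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?)", ROWS)
    # `condition` is a trusted literal in this sketch; never interpolate user input.
    query = f"SELECT name FROM t WHERE {condition} ORDER BY name"
    names = [row[0] for row in conn.execute(query)]
    conn.close()
    return names


# Plain inequality: the NULL row (name1) is dropped along with the row holding 3 (name5).
print(names_where("some_nullable_col != 3"))
# Explicit IS NULL check: the NULL row is kept.
print(names_where("some_nullable_col != 3 OR some_nullable_col IS NULL"))
```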
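Answer 0 (score: 6)
This is the standard behavior of NULL in SQL, and all SQL databases (Oracle, Microsoft SQL Server, PostgreSQL, MySQL, etc.) behave exactly the same way.
If the IS NULL check is too cumbersome, an alternative is to use the IFNULL or COALESCE function to convert NULL to a non-NULL value, i.e.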
select * from
(select NULL as some_nullable_col, "name1" as name),
(select 4 as some_nullable_col, "name2" as name),
(select 1 as some_nullable_col, "name3" as name),
(select 7 as some_nullable_col, "name4" as name),
(select 3 as some_nullable_col, "name5" as name)
WHERE ifnull(some_nullable_col,0) != 3
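A minimal sketch of the IFNULL/COALESCE rewrite, run against SQLite through Python's sqlite3 module as a stand-in for BigQuery (an assumption: the engines differ in many details, but both evaluate these expressions the same way). It also illustrates one caveat of this approach: the substitute value must differ from the value being filtered out, otherwise the NULL rows disappear again. The table name `t` and helper `names` are hypothetical.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery; IFNULL and COALESCE both exist in SQLite.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (some_nullable_col INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(None, "name1"), (4, "name2"), (1, "name3"), (7, "name4"), (3, "name5")],
)


def names(query):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(query)]


# NULL is replaced by 0 before the comparison, so 0 != 3 is TRUE and name1 survives.
via_ifnull = names("SELECT name FROM t WHERE IFNULL(some_nullable_col, 0) != 3 ORDER BY name")
via_coalesce = names("SELECT name FROM t WHERE COALESCE(some_nullable_col, 0) != 3 ORDER BY name")

# Caveat: filtering "!= 0" while using 0 as the substitute drops the NULL row again.
collision = names("SELECT name FROM t WHERE IFNULL(some_nullable_col, 0) != 0 ORDER BY name")

print(via_ifnull, via_coalesce, collision)
```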
Answer 1 (score: 1)
Yes, you are right: NULL does not match a comparison such as some_nullable_col != 3. Let me explain why.
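Google uses a key-value store as BigQuery's underlying data store. Unlike in a traditional relational database, the data is segmented by row and field and stored in many different locations. If a value is NULL, BigQuery treats the data as nonexistent, so nothing is written to the data store. As a result, the field never matches any comparator except "IS NULL". This is by design, and Google currently has no plans to change how it works.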
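The workaround is to use special values for these fields. For example, if the field is a string, you can use the empty string "" instead of NULL. If the field is a non-negative integer, you can use -1 as the special value. I know this is not optimal, and in many cases it is better to add an "IS NULL" condition to your query; this is just to give you another option.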
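A minimal sketch of the sentinel-value workaround, again using SQLite through Python's sqlite3 module as a stand-in (an assumption). The hypothetical column `some_col` is a non-negative integer column declared NOT NULL, with -1 written wherever there is no value. A plain inequality then keeps the "missing" row, but the sketch also shows the price: aggregates treat -1 as a real value unless every query excludes it explicitly.

```python
import sqlite3

# Hypothetical schema: -1 is stored instead of NULL to mean "no value".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (some_col INTEGER NOT NULL, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(-1, "name1"), (4, "name2"), (1, "name3"), (7, "name4"), (3, "name5")],
)

# With no NULLs stored, a plain inequality keeps the row holding the sentinel.
kept = [r[0] for r in conn.execute("SELECT name FROM t WHERE some_col != 3 ORDER BY name")]

# The cost: aggregates include the sentinel unless it is filtered out explicitly.
avg_with_sentinel = conn.execute("SELECT AVG(some_col) FROM t").fetchone()[0]
avg_real = conn.execute("SELECT AVG(some_col) FROM t WHERE some_col != -1").fetchone()[0]

print(kept, avg_with_sentinel, avg_real)
```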
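By the way, I tried something similar on my MySQL instance, and it behaves the same way as BigQuery. That is, the query does not return records with NULL when using the "!=" comparator.
For example,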
mysql> select * from test1;
+------+------------+
| id | num |
+------+------------+
| 0 | aaa |
| 1 | bbb |
| 8 | sdfsdfgsdf |
| 9 | NULL |
| NULL | sdfsdfsfsf |
+------+------------+
5 rows in set (0.19 sec)
and
mysql> select * from test1 where id != 8;
+------+------+
| id | num |
+------+------+
| 0 | aaa |
| 1 | bbb |
| 9 | NULL |
+------+------+
3 rows in set (0.18 sec)
So I think this is standard behavior in the SQL world.
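The same experiment can be repeated on yet another engine. This sketch loads the `test1` rows from the MySQL session above into an in-memory SQLite database via Python's sqlite3 module (SQLite instead of MySQL is an assumption for illustration). The row whose `id` is NULL is dropped by `id != 8`, while the row whose `num` is NULL is kept, because the filter only tests `id`.

```python
import sqlite3

# Assumption: SQLite instead of MySQL; same table contents as the MySQL session above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test1 (id INTEGER, num TEXT)")
conn.executemany(
    "INSERT INTO test1 VALUES (?, ?)",
    [(0, "aaa"), (1, "bbb"), (8, "sdfsdfgsdf"), (9, None), (None, "sdfsdfsfsf")],
)

# NULL != 8 is unknown, so the row with a NULL id is filtered out, just as in MySQL.
rows = conn.execute("SELECT id, num FROM test1 WHERE id != 8 ORDER BY id").fetchall()
print(rows)
```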
Answer 2 (score: 1)
Just adding to the pile :o)
In some cases this option can be useful:
SELECT * FROM
(SELECT NULL AS some_nullable_col, "name1" AS name),
(SELECT 4 AS some_nullable_col, "name2" AS name),
(SELECT 1 AS some_nullable_col, "name3" AS name),
(SELECT 7 AS some_nullable_col, "name4" AS name),
(SELECT 3 AS some_nullable_col, "name5" AS name)
WHERE IFNULL(some_nullable_col != 3, true)
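A minimal sketch of why wrapping the whole comparison works, using SQLite through Python's sqlite3 module as a stand-in for BigQuery (an assumption). `NULL != 3` evaluates to NULL, and IFNULL maps that unknown result to true, while a genuinely FALSE comparison (the row holding 3) stays false. SQLite has no separate BOOLEAN type, so `1` plays the role of `true` here.

```python
import sqlite3

# Assumption: SQLite stands in for BigQuery; 1 is used in place of `true`.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (some_nullable_col INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(None, "name1"), (4, "name2"), (1, "name3"), (7, "name4"), (3, "name5")],
)

# NULL != 3 -> NULL -> IFNULL(..., 1) -> 1, so name1 passes.
# 3 != 3   -> 0    -> IFNULL(0, 1)   -> 0, so name5 is still excluded.
query = """
    SELECT name FROM t
    WHERE IFNULL(some_nullable_col != 3, 1)
    ORDER BY name
"""
result = [row[0] for row in conn.execute(query)]
print(result)
```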
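Consider, for example, the case where your nullable field is of type STRING.
In that case you only need to make one change here:
WHERE IFNULL(some_nullable_col != '3', true)
whereas if you use IFNULL directly on the nullable field, as in
WHERE IFNULL(some_nullable_col, 0) != 3
you need to change not only 3 to '3' but also 0 to '0', which is one extra thing to handle.
Of course, at the end of the day it all works the same and is just a matter of preference, but sometimes it really does depend on the specific usage and implementation pattern.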
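A minimal sketch of the STRING case, using SQLite through Python's sqlite3 module as a stand-in for BigQuery (an assumption; `1` again stands in for `true`). The column here is a hypothetical TEXT column holding digit strings. Both forms return the same rows; the difference is only how many literals had to be edited when the type changed.

```python
import sqlite3

# Hypothetical variant of the example: the nullable column holds strings.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (some_nullable_col TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(None, "name1"), ("4", "name2"), ("1", "name3"), ("7", "name4"), ("3", "name5")],
)


def names(condition):
    """Return sorted names matching `condition` (a trusted literal, never user input)."""
    return [r[0] for r in conn.execute(f"SELECT name FROM t WHERE {condition} ORDER BY name")]


# Wrapping the comparison: only the literal changed (3 -> '3').
wrap_comparison = names("IFNULL(some_nullable_col != '3', 1)")
# Wrapping the column: the literal and the fallback both changed (0 -> '0').
wrap_column = names("IFNULL(some_nullable_col, '0') != '3'")

print(wrap_comparison, wrap_column)
```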
As for "standard behavior": so far BigQuery is far from following the standard. Isn't that why we all fell in love with it?!
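Answer 3 (score: 0)
NULL is a special value. Many expressions involving NULL themselves return NULL, including the inequality predicate. This is a property of NULL and is by design. If you want NULLs included in your results, you must allow them explicitly, for example with an IS NULL check.
Your query becomes: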
select * from
…
WHERE (some_nullable_col != 3 OR some_nullable_col IS NULL)
There are many resources online about this, for example Wikipedia.
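The "NULL in, NULL out" rule from the last answer can be checked directly. This minimal sketch evaluates a few predicates in SQLite through Python's sqlite3 module (an assumption standing in for BigQuery); NULL results come back as Python `None`. It shows that comparisons with NULL are unknown rather than true or false, that NOT does not turn unknown into true, and that WHERE keeps a row only when its condition is TRUE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparisons involving NULL yield NULL (None in Python), not TRUE or FALSE.
ne_result, eq_result, is_null_result = conn.execute(
    "SELECT NULL != 3, NULL = NULL, NULL IS NULL"
).fetchone()

# Negating an unknown result is still unknown, so NOT does not bring NULL rows back.
not_result = conn.execute("SELECT NOT (NULL = 3)").fetchone()[0]

# WHERE keeps a row only when its condition is TRUE; an unknown condition drops it.
kept = conn.execute("SELECT x FROM (SELECT 1 AS x) WHERE NULL != 3").fetchall()

print(ne_result, eq_result, is_null_result, not_result, kept)
```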