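I'm working with BigQuery and nested tables, and SQL is not my strong suit. I have a real problem to solve on actual production data, and at the same time I'm trying to get some SQL / BigQuery concepts into my head.
My queries resemble some of those on the Working with Arrays in Standard SQL page, but they don't quite get me where I need to be.
Let me walk you through some sample data, structured very much like the real data, and then describe the result I need.
Basically, I have two tables, and I want to use one to filter the other.
Table 1 has two levels of nesting and can be built like this: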
WITH data AS (
SELECT "Test 1" AS name, [STRUCT(1 AS id, [20, 21] AS results), STRUCT(2 AS id, [22, 23] AS results)] AS resultset
UNION ALL
SELECT "Test 2" AS name, [STRUCT(1 AS id, [23, 24] AS results), STRUCT(2 AS id, [25, 26] AS results)] AS resultset
UNION ALL
SELECT "Test 3" AS name, [STRUCT(1 AS id, [26, 27] AS results), STRUCT(2 AS id, [28, 29] AS results)] AS resultset
)
SELECT * FROM data
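What the numbers mean doesn't matter. The important part is that table 2 contains the ranges I want to use to filter table 1. Table 2 can be built like this: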
WITH ranges AS (
SELECT "Range 1" AS title, 24.0 AS min, 25.0 AS max
UNION ALL
SELECT "Range 2" AS title, 26.0 AS min, 27.0 AS max
)
SELECT * from ranges
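What I want to end up with is the rows from the first table where any result falls within one or more of the ranges in the second table, and none of the rows that have no match.
I know I can UNNEST() and JOIN the two tables to get a filtered result, but because of the nesting, that result contains duplicates: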
WITH data AS (
SELECT "Test 1" as name, [STRUCT(1 as id, [20, 21] as results), STRUCT(2 as id, [22, 23] as results)] as resultset
UNION ALL
SELECT "Test 2" as name, [STRUCT(1 as id, [23, 24] as results), STRUCT(2 as id, [25, 26] as results)] as resultset
UNION ALL
SELECT "Test 3" as name, [STRUCT(1 as id, [26, 27] as results), STRUCT(2 as id, [28, 29] as results)] as resultset
),
ranges AS (
SELECT "Range 1" AS title, 24.0 as min, 25.0 as max
UNION ALL
SELECT "Range 2" AS title, 26.0 as min, 27.0 as max
)
SELECT data.*
FROM data, UNNEST(resultset), UNNEST(results) r
JOIN ranges
ON r BETWEEN min AND max
This is what I get:
Row  name    resultset.id  resultset.results
1    Test 2  1             23
                           24
             2             25
                           26
2    Test 2  1             23
                           24
             2             25
                           26
3    Test 2  1             23
                           24
             2             25
                           26
4    Test 3  1             26
                           27
             2             28
                           29
5    Test 3  1             26
                           27
             2             28
                           29
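To see where the duplicates come from, a quick count per name can help. This is only a diagnostic sketch on the sample data above; the aliases `rs` and `matching_values` are introduced here for illustration. Each matching value in `results` produces one joined row, so a test with three matching values appears three times.

```sql
#standardSQL
WITH data AS (
  SELECT "Test 1" AS name, [STRUCT(1 AS id, [20, 21] AS results), STRUCT(2 AS id, [22, 23] AS results)] AS resultset
  UNION ALL
  SELECT "Test 2" AS name, [STRUCT(1 AS id, [23, 24] AS results), STRUCT(2 AS id, [25, 26] AS results)] AS resultset
  UNION ALL
  SELECT "Test 3" AS name, [STRUCT(1 AS id, [26, 27] AS results), STRUCT(2 AS id, [28, 29] AS results)] AS resultset
),
ranges AS (
  SELECT "Range 1" AS title, 24.0 AS min, 25.0 AS max
  UNION ALL
  SELECT "Range 2" AS title, 26.0 AS min, 27.0 AS max
)
-- One output row per name, counting how many (value, range) pairs matched.
SELECT data.name, COUNT(*) AS matching_values
FROM data, UNNEST(resultset) AS rs, UNNEST(rs.results) AS r
JOIN ranges ON r BETWEEN ranges.min AND ranges.max
GROUP BY data.name
-- On the sample data: Test 2 has 3 matches (24, 25, 26), Test 3 has 2 (26, 27),
-- and Test 1 does not appear because none of its values fall in a range.
```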
What I would like is to write SELECT DISTINCT data.* to reduce this to the two unique rows and be done with it.
In other words, this is what I want:
Row  name    resultset.id  resultset.results
1    Test 2  1             23
                           24
             2             25
                           26
2    Test 3  1             26
                           27
             2             28
                           29
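But I can't do that with nested data.
So, I have two questions:
About the data: I can't change the first table. I can rework the second table if that leads to a simpler solution.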
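Answer 0 (score: 1)
The following is for BigQuery Standard SQL.
The simplest solution, without changing the core of the query you already have, is to add a GROUP BY as shown below: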
#standardSQL
SELECT ANY_VALUE(data).*
FROM data, UNNEST(resultset), UNNEST(results) r
JOIN ranges ON r BETWEEN min AND max
GROUP BY TO_JSON_STRING(data)
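This works! But I don't understand why. Could you elaborate?
Sure.
SELECT DISTINCT ... FROM ... is conceptually equivalent to SELECT ... GROUP BY.
So the task is to find a suitable GROUP BY key and a matching aggregate function (which GROUP BY requires).
ANY_VALUE and TO_JSON_STRING(data) are what we need here: TO_JSON_STRING(data) serializes the whole row, nested arrays included, into a single string that can be grouped on, and ANY_VALUE(data) returns one of the identical rows from each group.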
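As a small illustration of that equivalence, here is a sketch on a made-up toy table (`t`, `vals`, and `copies` are hypothetical names, not part of the question's data). SELECT DISTINCT * would be rejected here because `vals` is an ARRAY, so the row is grouped on its JSON serialization instead.

```sql
#standardSQL
-- Toy table: two identical rows plus one different row.
WITH t AS (
  SELECT "a" AS name, [1, 2] AS vals
  UNION ALL
  SELECT "a" AS name, [1, 2] AS vals
  UNION ALL
  SELECT "b" AS name, [3] AS vals
)
-- TO_JSON_STRING(t) turns each row into a string such as {"name":"a","vals":[1,2]},
-- which is a valid grouping key; ANY_VALUE(t) picks one row per group.
SELECT ANY_VALUE(t).*, COUNT(*) AS copies
FROM t
GROUP BY TO_JSON_STRING(t)
-- Two rows come back: ("a", [1, 2]) with copies = 2 and ("b", [3]) with copies = 1.
```

The `copies` column is only there to show how many duplicates were collapsed into each group; drop it to get the plain de-duplicated rows.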
Answer 1 (score: 0)
Try selecting only the columns you need from the dataset. This query returns unique, but un-nested, results:
SELECT data.name, rs.id, r
FROM data
LEFT JOIN UNNEST(resultset) AS rs
LEFT JOIN UNNEST(rs.results) AS r
JOIN ranges ON r BETWEEN min AND max
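If the nested shape has to be preserved, another option is to avoid multiplying the rows in the first place by filtering with EXISTS. The following is only a sketch, not something taken from the answers above: it assumes it is acceptable to collapse the ranges table into a single array, and the names `all_ranges`, `lo`, `hi`, `rs`, and `rg` are made up for illustration.

```sql
#standardSQL
WITH data AS (
  SELECT "Test 1" AS name, [STRUCT(1 AS id, [20, 21] AS results), STRUCT(2 AS id, [22, 23] AS results)] AS resultset
  UNION ALL
  SELECT "Test 2" AS name, [STRUCT(1 AS id, [23, 24] AS results), STRUCT(2 AS id, [25, 26] AS results)] AS resultset
  UNION ALL
  SELECT "Test 3" AS name, [STRUCT(1 AS id, [26, 27] AS results), STRUCT(2 AS id, [28, 29] AS results)] AS resultset
),
ranges AS (
  SELECT "Range 1" AS title, 24.0 AS min, 25.0 AS max
  UNION ALL
  SELECT "Range 2" AS title, 26.0 AS min, 27.0 AS max
),
-- Collapse all ranges into one array so each data row can be checked on its own.
range_list AS (
  SELECT ARRAY_AGG(STRUCT(ranges.min AS lo, ranges.max AS hi)) AS all_ranges
  FROM ranges
)
SELECT data.*
FROM data
CROSS JOIN range_list
-- Keep a row if at least one nested value falls inside at least one range.
WHERE EXISTS (
  SELECT 1
  FROM UNNEST(resultset) AS rs, UNNEST(rs.results) AS r, UNNEST(all_ranges) AS rg
  WHERE r BETWEEN rg.lo AND rg.hi
)
```

Because each row of data is tested once rather than joined once per match, every qualifying row is returned a single time with its nesting intact, so no GROUP BY is needed afterwards.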