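I am running a BigQuery query directly in the web UI, and the results come back duplicated (every row has one copy), giving 120 results. I also tested select count(*) with the same statement and still got 120. Even when I download the results as a CSV file to local disk, the data is still duplicated. I have looked around but could not find any useful pointers. Any suggestions?
id is required, the other columns are nullable; budget_start and budget_end are of type date, total_cost is a float, and the remaining columns are strings.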
Answer 0 (score: 1)
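From your query it is clear that you are using BigQuery Legacy SQL. A peculiarity of Legacy SQL is that its output gets flattened: if a row contains nested (repeated) values, the result is expanded into one output row per nested value.
See the example below.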
#legacySQL
SELECT id, NEST(x) AS xs
FROM
(SELECT 1 AS id, 2 AS x),
(SELECT 1 AS id, 3 AS x),
(SELECT 1 AS id, 4 AS x),
(SELECT 2 AS id, 5 AS x),
(SELECT 2 AS id, 6 AS x)
GROUP BY id
It produces two rows, as shown below:
Row id xs
1 1 [2,3,4]
2 2 [5,6]
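You can verify this by running the query with a destination table and then previewing that table.
Now, if you run the same query in the Web UI (in Legacy SQL), you will get 5 rows instead of the "expected" 2 rows: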
Row id xs
1 1 2
2 1 3
3 1 4
4 2 5
5 2 6
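Note: flattening happens only at the final, outermost level; subqueries are not flattened. For example, the query below gives you count = 2, as you would expect: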
#legacySQL
SELECT COUNT(1) AS cnt FROM (
SELECT id, NEST(x) AS xs
FROM
(SELECT 1 AS id, 2 AS x),
(SELECT 1 AS id, 3 AS x),
(SELECT 1 AS id, 4 AS x),
(SELECT 2 AS id, 5 AS x),
(SELECT 2 AS id, 6 AS x)
GROUP BY id
)
Row cnt
1 2
So, to address this, I recommend that you migrate to BigQuery Standard SQL.
See the equivalent example in BigQuery Standard SQL:
#standardSQL
WITH `yourTable` AS (
SELECT 1 AS id, [2,3,4] AS xs UNION ALL
SELECT 2, [5,6]
)
SELECT * FROM `yourTable`
The output has just two rows, as one would expect:
Row id xs
1 1 2
3
4
2 2 5
6
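To make the contrast concrete, here is a minimal sketch in Standard SQL. It is not part of the original answer. It assumes ARRAY_AGG as the Standard SQL counterpart of the Legacy SQL NEST used above, and it performs the flattening explicitly with UNNEST. The column names nested_rows and flattened_rows are made up for illustration.

```sql
#standardSQL
-- Build the same nested table as in the Legacy SQL example,
-- using ARRAY_AGG in place of NEST (an assumed equivalent for this sketch).
WITH `yourTable` AS (
  SELECT id, ARRAY_AGG(x ORDER BY x) AS xs
  FROM (
    SELECT 1 AS id, 2 AS x UNION ALL
    SELECT 1, 3 UNION ALL
    SELECT 1, 4 UNION ALL
    SELECT 2, 5 UNION ALL
    SELECT 2, 6
  )
  GROUP BY id
)
SELECT
  -- Rows as stored: one per id, the array stays nested.
  (SELECT COUNT(*) FROM `yourTable`) AS nested_rows,
  -- Rows after explicit flattening: one per array element.
  (SELECT COUNT(*) FROM `yourTable`, UNNEST(xs) AS x) AS flattened_rows
```

This should return nested_rows = 2 and flattened_rows = 5. In Standard SQL the row count only grows when you ask for it with UNNEST, whereas Legacy SQL flattens the final result implicitly.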
Answer 1 (score: 0)
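Many thanks to Mikhail for the insightful suggestion! I actually found the problem: I had imported the same table from Google Storage twice (I found some errors after the first import, corrected them, and loaded again). That left me with a table containing duplicated content; I thought the first load had been replaced, but the two loads were actually merged, and I did not realize it.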
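For anyone hitting the same situation, a quick way to confirm that a table was loaded twice is to count how many times each id appears. The sketch below is only an illustration: the inline sample rows stand in for the real table, and only the id and total_cost columns from the question are used.

```sql
#standardSQL
-- Hypothetical sample data standing in for a table that was loaded twice.
WITH `yourTable` AS (
  SELECT 1 AS id, 10.5 AS total_cost UNION ALL
  SELECT 1, 10.5 UNION ALL
  SELECT 2, 20.0 UNION ALL
  SELECT 2, 20.0
)
-- List every id that occurs more than once, with its number of copies.
SELECT id, COUNT(*) AS copies
FROM `yourTable`
GROUP BY id
HAVING COUNT(*) > 1
ORDER BY id
```

If every id comes back with copies = 2, the table most likely contains two full loads of the same data, which would also explain why count(*) returned double the expected number.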