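Is there a way to get, in a single query, both the total row count per {id, date} and the count per {id, date, columnX} where that count is greater than 1?
For example, given a table like this: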
id date columnX
1 2017-04-20 a
1 2017-04-20 a
1 2017-04-18 b
1 2017-04-17 c
2 2017-04-20 a
2 2017-04-20 a
2 2017-04-20 c
2 2017-04-19 b
2 2017-04-19 b
2 2017-04-19 b
2 2017-04-19 b
2 2017-04-19 c
As a result, I would like to get the following table:
id date columnX count>1 count_total
1 2017-04-20 a 2 2
2 2017-04-20 a 2 3
2 2017-04-19 b 4 5
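I tried to do this with partitioning, but I got strange results. I have heard that the ROLLUP function might help, but it seems to be available only in legacy SQL, which is not an option for me.
Answer 0 (score: 2)
If I understand correctly, you can use window functions: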
select id, date, columnx, cnt,
       (case when cnt > 1 then cnt else 0 end) as cnt_gt_1,
       total_cnt
from (select id, date, columnx,
             -- rows per (id, date, columnx)
             count(*) as cnt,
             -- sum of those counts across each (id, date), i.e. rows per (id, date)
             sum(count(*)) over (partition by id, date) as total_cnt
      from t
      group by id, date, columnx
     ) x
where cnt > 1;
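To sanity-check what this query should return, here is a minimal Python sketch of the same two-step logic: count per (id, date, columnX), then total per (id, date), keeping only counts above 1. It is not BigQuery code. The `rows` list (the sample data from the question) and the helper name `counts_with_totals` are assumptions made for illustration.

```python
from collections import Counter

# Sample rows from the question: (id, date, columnX)
rows = [
    (1, "2017-04-20", "a"), (1, "2017-04-20", "a"),
    (1, "2017-04-18", "b"), (1, "2017-04-17", "c"),
    (2, "2017-04-20", "a"), (2, "2017-04-20", "a"), (2, "2017-04-20", "c"),
    (2, "2017-04-19", "b"), (2, "2017-04-19", "b"), (2, "2017-04-19", "b"),
    (2, "2017-04-19", "b"), (2, "2017-04-19", "c"),
]


def counts_with_totals(rows):
    """Hypothetical helper: mimics GROUP BY id, date, columnx plus
    SUM(COUNT(*)) OVER (PARTITION BY id, date), filtered to cnt > 1."""
    cnt = Counter(rows)                          # rows per (id, date, columnX)
    total = Counter((i, d) for i, d, _ in rows)  # rows per (id, date)
    return [
        (i, d, x, c, total[(i, d)])
        for (i, d, x), c in cnt.items()
        if c > 1
    ]


result = counts_with_totals(rows)
for row in result:
    print(row)
```

For the sample data this yields the three rows of the expected table in the question: counts 2, 2 and 4 with totals 2, 3 and 5.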
Answer 1 (score: 1)
Another possibility:
SELECT
  id,
  date,
  data.columnX columnX,
  data.count_ count_bigger_1,
  count_total
FROM (
  SELECT
    id,
    date,
    ARRAY_AGG(columnX) data,
    COUNT(1) count_total
  FROM
    `your_table_name`
  GROUP BY
    id, date
),
UNNEST(ARRAY(
  SELECT AS STRUCT columnX, COUNT(1) count_
  FROM UNNEST(data) columnX
  GROUP BY columnX
  HAVING COUNT(1) > 1
)) data
You can test it with mock data:
WITH data AS(
SELECT 1 AS id, '2017-04-20' AS date, 'a' AS columnX UNION ALL
SELECT 1 AS id, '2017-04-20' AS date, 'a' AS columnX UNION ALL
SELECT 1 AS id, '2017-04-18' AS date, 'b' AS columnX UNION ALL
SELECT 1 AS id, '2017-04-17' AS date, 'c' AS columnX UNION ALL
SELECT 2 AS id, '2017-04-20' AS date, 'a' AS columnX UNION ALL
SELECT 2 AS id, '2017-04-20' AS date, 'a' AS columnX UNION ALL
SELECT 2 AS id, '2017-04-20' AS date, 'c' AS columnX UNION ALL
SELECT 2 AS id, '2017-04-19' AS date, 'b' AS columnX UNION ALL
SELECT 2 AS id, '2017-04-19' AS date, 'b' AS columnX UNION ALL
SELECT 2 AS id, '2017-04-19' AS date, 'b' AS columnX UNION ALL
SELECT 2 AS id, '2017-04-19' AS date, 'b' AS columnX UNION ALL
SELECT 2 AS id, '2017-04-19' AS date, 'c' AS columnX
)
SELECT
  id,
  date,
  data.columnX columnX,
  data.count_ count_bigger_1,
  count_total
FROM (
  SELECT
    id,
    date,
    ARRAY_AGG(columnX) data,
    COUNT(1) count_total
  FROM
    data
  GROUP BY
    id, date
),
UNNEST(ARRAY(
  SELECT AS STRUCT columnX, COUNT(1) count_
  FROM UNNEST(data) columnX
  GROUP BY columnX
  HAVING COUNT(1) > 1
)) data
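This solution avoids analytic functions (which can be quite expensive, depending on the input) and can scale well to large volumes of data.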
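The idea behind this answer (collect the columnX values of each {id, date} into an array, then count duplicates inside each array) can be sketched in plain Python as follows. This is only an illustration of the logic, not BigQuery code, and it says nothing about performance. The `rows` list and the helper name `counts_via_arrays` are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Sample rows from the question: (id, date, columnX)
rows = [
    (1, "2017-04-20", "a"), (1, "2017-04-20", "a"),
    (1, "2017-04-18", "b"), (1, "2017-04-17", "c"),
    (2, "2017-04-20", "a"), (2, "2017-04-20", "a"), (2, "2017-04-20", "c"),
    (2, "2017-04-19", "b"), (2, "2017-04-19", "b"), (2, "2017-04-19", "b"),
    (2, "2017-04-19", "b"), (2, "2017-04-19", "c"),
]


def counts_via_arrays(rows):
    """Hypothetical helper: array-per-group, then count within each array."""
    groups = defaultdict(list)
    for i, d, x in rows:
        groups[(i, d)].append(x)          # like ARRAY_AGG(columnX) per (id, date)

    out = []
    for (i, d), values in groups.items():
        count_total = len(values)         # like COUNT(1) per (id, date)
        for x, c in Counter(values).items():
            if c > 1:                     # like HAVING COUNT(1) > 1
                out.append((i, d, x, c, count_total))
    return out


array_result = counts_via_arrays(rows)
for row in array_result:
    print(row)
```

Groups whose array contains no value repeated more than once contribute no output rows, which matches the comma join against an empty UNNEST in the query.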
Answer 2 (score: 1)
I suggest you add two more rows to your example:
1 2017-04-20 x
1 2017-04-20 x
and check what the solutions in the first two answers give you.
The output will look like this:
id date columnX count>1 count_total
1 2017-04-20 a 2 4
1 2017-04-20 x 2 4
2 2017-04-20 a 2 3
2 2017-04-19 b 4 5
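Note the two rows for id = 1 and date = 2017-04-20: both have count_total = 4.
I'm not sure this is what you want, and you may not even have considered this case in your question.
In any case, I feel that to support a more general case like this, your expected output should look as follows: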
Row  id  date        x.columnX  x.countX  count_total
1    1   2017-04-20  x          2         4
                     a          2
2    2   2017-04-20  a          2         3
3    2   2017-04-19  b          4         5
Here x is a repeated field, and each of its elements holds a columnX value together with its count.
The query below does exactly that:
#standardSQL
SELECT id, date,
  ARRAY(SELECT x FROM UNNEST(x) AS x WHERE countX > 1) AS x,
  count_total
FROM (
  SELECT id, date, SUM(countX) AS count_total,
    ARRAY_AGG(STRUCT<columnX STRING, countX INT64>(columnX, countX) ORDER BY countX DESC) AS X
  FROM (
    SELECT id, date,
      columnX, COUNT(1) countX
    FROM `yourTable`
    GROUP BY id, date, columnX
  )
  GROUP BY id, date
  HAVING count_total > 1
)
You can test or play with it using the dummy data from your question (with the two extra rows added):
#standardSQL
WITH `yourTable` AS(
SELECT 1 AS id, '2017-04-20' AS date, 'a' AS columnX UNION ALL
SELECT 1, '2017-04-20', 'a' UNION ALL
SELECT 1, '2017-04-20', 'x' UNION ALL
SELECT 1, '2017-04-20', 'x' UNION ALL
SELECT 1, '2017-04-18', 'b' UNION ALL
SELECT 1, '2017-04-17', 'c' UNION ALL
SELECT 2, '2017-04-20', 'a' UNION ALL
SELECT 2, '2017-04-20', 'a' UNION ALL
SELECT 2, '2017-04-20', 'c' UNION ALL
SELECT 2, '2017-04-19', 'b' UNION ALL
SELECT 2, '2017-04-19', 'b' UNION ALL
SELECT 2, '2017-04-19', 'b' UNION ALL
SELECT 2, '2017-04-19', 'b' UNION ALL
SELECT 2, '2017-04-19', 'c'
)
SELECT id, date,
  ARRAY(SELECT x FROM UNNEST(x) AS x WHERE countX > 1) AS x,
  count_total
FROM (
  SELECT id, date, SUM(countX) AS count_total,
    ARRAY_AGG(STRUCT<columnX STRING, countX INT64>(columnX, countX) ORDER BY countX DESC) AS X
  FROM (
    SELECT id, date,
      columnX, COUNT(1) countX
    FROM `yourTable`
    GROUP BY id, date, columnX
  )
  GROUP BY id, date
  HAVING count_total > 1
)
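As a rough cross-check of the nested output this query is meant to produce, the same steps can be sketched in plain Python: count per (id, date, columnX), sum those counts per (id, date), drop groups whose total is 1, and keep only the values with a count above 1 in the repeated field. This is an illustration of the logic only, not BigQuery code. The `rows` list (the dummy data with the two extra 'x' rows) and the helper name `nested_counts` are assumptions.

```python
from collections import Counter, defaultdict

# Dummy data from the question plus the two extra 'x' rows: (id, date, columnX)
rows = [
    (1, "2017-04-20", "a"), (1, "2017-04-20", "a"),
    (1, "2017-04-20", "x"), (1, "2017-04-20", "x"),
    (1, "2017-04-18", "b"), (1, "2017-04-17", "c"),
    (2, "2017-04-20", "a"), (2, "2017-04-20", "a"), (2, "2017-04-20", "c"),
    (2, "2017-04-19", "b"), (2, "2017-04-19", "b"), (2, "2017-04-19", "b"),
    (2, "2017-04-19", "b"), (2, "2017-04-19", "c"),
]


def nested_counts(rows):
    """Hypothetical helper: one record per (id, date) with a nested list x."""
    per_value = Counter(rows)                    # countX per (id, date, columnX)
    groups = defaultdict(list)
    for (i, d, x), c in per_value.items():
        groups[(i, d)].append((x, c))

    out = []
    for (i, d), pairs in groups.items():
        count_total = sum(c for _, c in pairs)   # like SUM(countX)
        if count_total > 1:                      # like HAVING count_total > 1
            # Keep only repeated values, highest count first.
            repeated = sorted((p for p in pairs if p[1] > 1), key=lambda p: -p[1])
            out.append({"id": i, "date": d, "x": repeated, "count_total": count_total})
    return out


nested = nested_counts(rows)
for record in nested:
    print(record)
```

With this data the sketch returns three records. The one for id = 1 and date = 2017-04-20 carries both 'a' and 'x' (count 2 each) under a single count_total of 4; the relative order of tied counts is not significant.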