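Calculating a total and a count in the same query

Time: 2017-08-21 13:12:30

Tags: sql google-bigquery

Is there a way to get, in a single query, the total number of rows per {id, date} together with the count per {id, date, columnX} where that count is greater than 1?

For example, given a table like this: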

 id         date         columnX
1        2017-04-20         a
1        2017-04-20         a
1        2017-04-18         b
1        2017-04-17         c
2        2017-04-20         a
2        2017-04-20         a
2        2017-04-20         c
2        2017-04-19         b
2        2017-04-19         b
2        2017-04-19         b
2        2017-04-19         b
2        2017-04-19         c

As a result, I would like to get the following table:

id         date       columnX         count>1    count_total  
1        2017-04-20       a              2            2
2        2017-04-20       a              2            3
2        2017-04-19       b              4            5

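I tried doing this with partitions, but got strange results. I have heard that the ROLLUP function might work, but it seems to be available only in legacy SQL, which is not an option for me.

3 answers:

Answer 0 (score: 2)

If I understand correctly, you can use window functions: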

select id, date, columnx, cnt,
       (case when cnt > 1 then cnt else 0 end) as cnt_gt_1,
       total_cnt
from (select id, date, columnx, count(*) as cnt,
             sum(count(*)) over (partition by id, date) as total_cnt
      from t
      group by id, date, columnx
     ) x
where cnt > 1;
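As a sanity check, the same logic can be mirrored in plain Python on the sample rows from the question. This is only a sketch for verifying the expected numbers, not part of the answer; the names `rows` and `counts_with_totals` are made up for illustration.

```python
from collections import Counter

# Sample rows from the question: (id, date, columnX).
rows = [
    (1, "2017-04-20", "a"), (1, "2017-04-20", "a"),
    (1, "2017-04-18", "b"), (1, "2017-04-17", "c"),
    (2, "2017-04-20", "a"), (2, "2017-04-20", "a"), (2, "2017-04-20", "c"),
    (2, "2017-04-19", "b"), (2, "2017-04-19", "b"),
    (2, "2017-04-19", "b"), (2, "2017-04-19", "b"), (2, "2017-04-19", "c"),
]


def counts_with_totals(rows):
    # Equivalent of: GROUP BY id, date, columnx -> cnt
    cnt = Counter(rows)
    # Equivalent of: SUM(COUNT(*)) OVER (PARTITION BY id, date) -> total_cnt
    total = Counter()
    for (id_, date, _), n in cnt.items():
        total[(id_, date)] += n
    # Equivalent of: WHERE cnt > 1, applied after the totals are known
    return [
        (id_, date, x, n, total[(id_, date)])
        for (id_, date, x), n in cnt.items()
        if n > 1
    ]


result = counts_with_totals(rows)
for row in result:
    print(row)
```

The filter runs only after the per-{id, date} totals are computed, which is why the totals still include the columnX values that occur just once.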

Answer 1 (score: 1)

Another possibility:

SELECT
  id,
  date,
  data.columnX columnX,
  data.count_ count_bigger_1,
  count_total
FROM(
  SELECT
    id,
    date,
    ARRAY_AGG(columnX) data,
    COUNT(1) count_total
  FROM
    `your_table_name`
  GROUP BY
    id, date
  ),
UNNEST(ARRAY(SELECT AS STRUCT columnX, count(1) count_ FROM UNNEST(data) columnX GROUP BY columnX HAVING count(1) > 1)) data

You can test it with mock data:

WITH data AS(
  SELECT 1 AS id, '2017-04-20' AS date, 'a' AS columnX UNION ALL
  SELECT 1 AS id, '2017-04-20' AS date, 'a' AS columnX UNION ALL
  SELECT 1 AS id, '2017-04-18' AS date, 'b' AS columnX UNION ALL
  SELECT 1 AS id, '2017-04-17' AS date, 'c' AS columnX UNION ALL
  SELECT 2 AS id, '2017-04-20' AS date, 'a' AS columnX UNION ALL
  SELECT 2 AS id, '2017-04-20' AS date, 'a' AS columnX UNION ALL
  SELECT 2 AS id, '2017-04-20' AS date, 'c' AS columnX UNION ALL
  SELECT 2 AS id, '2017-04-19' AS date, 'b' AS columnX UNION ALL
  SELECT 2 AS id, '2017-04-19' AS date, 'b' AS columnX UNION ALL
  SELECT 2 AS id, '2017-04-19' AS date, 'b' AS columnX UNION ALL
  SELECT 2 AS id, '2017-04-19' AS date, 'b' AS columnX UNION ALL
  SELECT 2 AS id, '2017-04-19' AS date, 'c' AS columnX  
)

SELECT
  id,
  date,
  data.columnX columnX,
  data.count_ count_bigger_1,
  count_total
FROM(
  SELECT
    id,
    date,
    ARRAY_AGG(columnX) data,
    COUNT(1) count_total
  FROM
    data
  GROUP BY
    id, date
  ),
UNNEST(ARRAY(SELECT AS STRUCT columnX, count(1) count_ FROM UNNEST(data) columnX GROUP BY columnX HAVING count(1) > 1)) data

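This solution avoids analytic functions (which can be quite expensive depending on the input) and should scale well to large volumes of data.

Answer 2 (score: 1)

I suggest adding two more rows to your example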

1        2017-04-20         x
1        2017-04-20         x
  

and checking what the solutions in the first two answers give you.
The result will look like this:

id         date       columnX         count>1    count_total  
1        2017-04-20       a              2            4
1        2017-04-20       x              2            4
2        2017-04-20       a              2            3
2        2017-04-19       b              4            5    

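Note the two rows for id = 1 and date = 2017-04-20: both have count_total = 4.
I am not sure this is what you want, although you may not have considered this case in your question at all.

In any case, I think that to support a more general case like this, your expected output should look like the following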

Row id  date        x.columnX   x.countX    count_total  
1   1   2017-04-20  x           2           4    
                    a           2        
2   2   2017-04-20  a           2           3    
3   2   2017-04-19  b           4           5    

where x is a repeated field and each value holds a columnX together with its count

The following query does exactly that

#standardSQL
SELECT id, date,
  ARRAY(SELECT x FROM UNNEST(x) AS x WHERE countX > 1) AS x,
  count_total
FROM (
  SELECT id, date, SUM(countX) AS count_total,
    ARRAY_AGG(STRUCT<columnX STRING, countX INT64>(columnX, countX) ORDER BY countX DESC) AS X    
  FROM (
    SELECT id, date, 
      columnX, COUNT(1) countX
    FROM  `yourTable`
    GROUP BY id, date, columnX
  )
  GROUP BY id, date
  HAVING count_total > 1
)
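To make the nested output shape concrete, here is a rough plain-Python equivalent, run on the question's rows plus the two extra `x` rows. It is a sketch for checking the numbers only; `rows`, `nested_counts` and `nested` are hypothetical names, and the order of entries with equal counts is arbitrary, as it is in the query.

```python
from collections import Counter, defaultdict

# Rows from the question plus the two extra 'x' rows: (id, date, columnX).
rows = [
    (1, "2017-04-20", "a"), (1, "2017-04-20", "a"),
    (1, "2017-04-20", "x"), (1, "2017-04-20", "x"),
    (1, "2017-04-18", "b"), (1, "2017-04-17", "c"),
    (2, "2017-04-20", "a"), (2, "2017-04-20", "a"), (2, "2017-04-20", "c"),
    (2, "2017-04-19", "b"), (2, "2017-04-19", "b"),
    (2, "2017-04-19", "b"), (2, "2017-04-19", "b"), (2, "2017-04-19", "c"),
]


def nested_counts(rows):
    # Innermost query: GROUP BY id, date, columnX -> countX
    per_value = Counter(rows)
    # Middle query: collect (columnX, countX) pairs per (id, date)
    groups = defaultdict(list)
    for (id_, date, x), n in per_value.items():
        groups[(id_, date)].append((x, n))
    out = []
    for (id_, date), pairs in groups.items():
        count_total = sum(n for _, n in pairs)
        if count_total <= 1:  # HAVING count_total > 1
            continue
        # Outer query: keep only pairs with countX > 1, highest count first
        repeated = sorted((p for p in pairs if p[1] > 1), key=lambda p: -p[1])
        out.append((id_, date, repeated, count_total))
    return out


nested = nested_counts(rows)
for row in nested:
    print(row)
```

Each output row carries a list of (columnX, countX) pairs, so count_total appears once per {id, date} instead of being repeated on every columnX row. A group whose total exceeds 1 but in which no single columnX repeats would still be returned, with an empty list.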

You can test / play with it using the dummy data from the question

#standardSQL
WITH `yourTable` AS(
  SELECT 1 AS id, '2017-04-20' AS date, 'a' AS columnX UNION ALL
  SELECT 1, '2017-04-20', 'a' UNION ALL
  SELECT 1, '2017-04-20', 'x' UNION ALL
  SELECT 1, '2017-04-20', 'x' UNION ALL
  SELECT 1, '2017-04-18', 'b' UNION ALL
  SELECT 1, '2017-04-17', 'c' UNION ALL
  SELECT 2, '2017-04-20', 'a' UNION ALL
  SELECT 2, '2017-04-20', 'a' UNION ALL
  SELECT 2, '2017-04-20', 'c' UNION ALL
  SELECT 2, '2017-04-19', 'b' UNION ALL
  SELECT 2, '2017-04-19', 'b' UNION ALL
  SELECT 2, '2017-04-19', 'b' UNION ALL
  SELECT 2, '2017-04-19', 'b' UNION ALL
  SELECT 2, '2017-04-19', 'c'  
)
SELECT id, date,
  ARRAY(SELECT x FROM UNNEST(x) AS x WHERE countX > 1) AS x,
  count_total
FROM (
  SELECT id, date, SUM(countX) AS count_total,
    ARRAY_AGG(STRUCT<columnX STRING, countX INT64>(columnX, countX) ORDER BY countX DESC) AS X    
  FROM (
    SELECT id, date, 
      columnX, COUNT(1) countX
    FROM  `yourTable`
    GROUP BY id, date, columnX
  )
  GROUP BY id, date
  HAVING count_total > 1
)