How do I get a count of distinct IDs per month, excluding IDs that were present in the previous month?

Posted: 2019-10-24 20:04:16

Tags: sql google-bigquery

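I am trying to count the number of unique IDs in a table for each month. The catch is that each month's count should include only IDs that were not present in the previous month.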

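I am trying to write a SQL query to run in Google BigQuery, but so far I have only figured out how to get the number of distinct IDs per month. I can't work out how to add the condition that excludes IDs present in the previous month.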

For example, I have a table tbl1 like the one below:

time_stamp | ID | col3 | col4
-------------------------------
2019-06-10 | 1  |  10  |  20
2019-06-10 | 2  |  11  |  21
2019-06-10 | 3  |  12  |  22
2019-07-10 | 2  |  11  |  21
2019-07-10 | 4  |  13  |  23
2019-08-10 | 4  |  13  |  23
2019-08-10 | 5  |  14  |  24
2019-09-10 | 5  |  14  |  24
2019-09-10 | 6  |  15  |  25

Expected output

time_stamp | count
--------------------
2019-06-10 |   3
2019-07-10 |   1
2019-08-10 |   1
2019-09-10 |   1
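To pin down the requirement before writing any SQL, here is a minimal plain-Python sketch of the rule as stated: for each calendar month, take the distinct IDs and remove those present in the immediately preceding calendar month. The names `ROWS`, `month_index` and `new_ids_per_month` are hypothetical, the sketch assumes a preceding month with no rows excludes nothing, and it labels each month by its first day rather than by the original time_stamp.

```python
from collections import defaultdict
from datetime import date

# The question's tbl1 rows as (time_stamp, ID); col3/col4 are not needed here.
ROWS = [
    (date(2019, 6, 10), 1), (date(2019, 6, 10), 2), (date(2019, 6, 10), 3),
    (date(2019, 7, 10), 2), (date(2019, 7, 10), 4),
    (date(2019, 8, 10), 4), (date(2019, 8, 10), 5),
    (date(2019, 9, 10), 5), (date(2019, 9, 10), 6),
]


def month_index(d):
    """Map a date to a running month number so consecutive months differ by 1."""
    return d.year * 12 + (d.month - 1)


def new_ids_per_month(rows):
    """Count distinct IDs per month that were absent in the preceding calendar month."""
    ids_by_month = defaultdict(set)
    for ts, id_ in rows:
        ids_by_month[month_index(ts)].add(id_)
    result = {}
    for m in sorted(ids_by_month):
        previous = ids_by_month.get(m - 1, set())  # .get() does not insert into the defaultdict
        year, month0 = divmod(m, 12)
        result[date(year, month0 + 1, 1)] = len(ids_by_month[m] - previous)
    return result


if __name__ == "__main__":
    for month, count in new_ids_per_month(ROWS).items():
        print(month, count)
```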

2 answers:

Answer 0 (score: 1)

  

Update

I realized that you asked for each month's count to include only IDs that were not present in the previous month, rather than in any of the previous months.

The query below does that:

#standardSQL
SELECT month, COUNT(1) users
FROM (
  SELECT *, IFNULL(DATE_DIFF(month, LAG(month) OVER(PARTITION BY ID ORDER BY month), MONTH), 0) != 1 qualified
  FROM (
    SELECT DISTINCT DATE_TRUNC(time_stamp, MONTH) month, ID FROM `project.dataset.table` 
  )
)
WHERE qualified
GROUP BY month
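To try the LAG logic without a BigQuery project, the sketch below ports the query to SQLite through Python's built-in sqlite3 module. It is an adaptation, not the answer's exact query: SQLite has no DATE_TRUNC or DATE_DIFF, so it assumes a month number of year * 12 + month (consecutive months differ by exactly 1), and it relies on LAG(), which needs SQLite 3.25 or later. The table name `tbl` and the helper `users_not_in_previous_month` are hypothetical; the rows are the answer's own test data.

```python
import sqlite3

# The answer's test data as (time_stamp, ID): ID 3 appears twice in June, ID 1 skips July.
ROWS = [
    ("2019-06-10", 1), ("2019-06-10", 2), ("2019-06-10", 3), ("2019-06-11", 3),
    ("2019-07-10", 2), ("2019-07-10", 4),
    ("2019-08-10", 1), ("2019-08-10", 4), ("2019-08-10", 5),
    ("2019-09-10", 5), ("2019-09-10", 6),
]

# month_no stands in for DATE_DIFF(..., MONTH): consecutive months differ by exactly 1.
QUERY = """
SELECT month, COUNT(1) AS users
FROM (
  SELECT month, ID,
         -- same test as IFNULL(DATE_DIFF(...), 0) != 1 in the BigQuery version
         IFNULL(month_no - LAG(month_no) OVER (PARTITION BY ID ORDER BY month_no), 0) != 1 AS qualified
  FROM (
    SELECT DISTINCT strftime('%Y-%m-01', time_stamp) AS month,
           CAST(strftime('%Y', time_stamp) AS INTEGER) * 12
             + CAST(strftime('%m', time_stamp) AS INTEGER) AS month_no,
           ID
    FROM tbl
  )
)
WHERE qualified
GROUP BY month
ORDER BY month
"""


def users_not_in_previous_month(rows):
    """Run the ported query on an in-memory table and return (month, users) rows."""
    conn = sqlite3.connect(":memory:")  # LAG() requires SQLite >= 3.25
    conn.execute("CREATE TABLE tbl (time_stamp TEXT, ID INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for month, users in users_not_in_previous_month(ROWS):
        print(month, users)
```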

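You can test it with the sample data below. Compared with the question's data, it adds a second June row for ID 3 and has ID 1 reappear in August after being absent in July, which is why August counts 2: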

#standardSQL
WITH `project.dataset.table` AS (
  SELECT DATE '2019-06-10' time_stamp, 1 ID, 10 col3, 20 col4 UNION ALL
  SELECT '2019-06-10', 2, 11, 21 UNION ALL
  SELECT '2019-06-10', 3, 12, 22 UNION ALL
  SELECT '2019-06-11', 3, 12, 22 UNION ALL
  SELECT '2019-07-10', 2, 11, 21 UNION ALL
  SELECT '2019-07-10', 4, 13, 23 UNION ALL
  SELECT '2019-08-10', 1, 13, 23 UNION ALL
  SELECT '2019-08-10', 4, 13, 23 UNION ALL
  SELECT '2019-08-10', 5, 14, 24 UNION ALL
  SELECT '2019-09-10', 5, 14, 24 UNION ALL
  SELECT '2019-09-10', 6, 15, 25 
)
SELECT month, COUNT(1) users
FROM (
  SELECT *, IFNULL(DATE_DIFF(month, LAG(month) OVER(PARTITION BY ID ORDER BY month), MONTH), 0) != 1 qualified
  FROM (
    SELECT DISTINCT DATE_TRUNC(time_stamp, MONTH) month, ID FROM `project.dataset.table` 
  )
)
WHERE qualified
GROUP BY month
-- ORDER BY month

with the result

Row month   users    
1   2019-06-01  3    
2   2019-07-01  1    
3   2019-08-01  2    
4   2019-09-01  1    

Hopefully this time it is what you asked for!

  

Initial answer

Below is for BigQuery Standard SQL. It returns the number of users that did not appear in any previous period:

#standardSQL
SELECT time_stamp, COUNT(1) `count`
FROM (
  SELECT *, COUNT(1) OVER(PARTITION BY ID ORDER BY time_stamp) = 1 first_entry
  FROM `project.dataset.table`
)
WHERE first_entry
GROUP BY time_stamp

If applied to the sample data from your question, the output is

Row time_stamp  count    
1   2019-06-10  3    
2   2019-07-10  1    
3   2019-08-10  1    
4   2019-09-10  1    

You can test it with the example below

#standardSQL
WITH `project.dataset.table` AS (
  SELECT DATE '2019-06-10' time_stamp, 1 ID, 10 col3, 20 col4 UNION ALL
  SELECT '2019-06-10', 2, 11, 21 UNION ALL
  SELECT '2019-06-10', 3, 12, 22 UNION ALL
  SELECT '2019-07-10', 2, 11, 21 UNION ALL
  SELECT '2019-07-10', 4, 13, 23 UNION ALL
  SELECT '2019-08-10', 4, 13, 23 UNION ALL
  SELECT '2019-08-10', 5, 14, 24 UNION ALL
  SELECT '2019-09-10', 5, 14, 24 UNION ALL
  SELECT '2019-09-10', 6, 15, 25 
)
SELECT time_stamp, COUNT(1) `count`
FROM (
  SELECT *, COUNT(1) OVER(PARTITION BY ID ORDER BY time_stamp) = 1 first_entry
  FROM `project.dataset.table`
)
WHERE first_entry
GROUP BY time_stamp
-- ORDER BY time_stamp
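One edge case in the first_entry test: with ORDER BY and no explicit frame, the window defaults to RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW in SQLite (and, as far as I know, in BigQuery too), so rows of the same ID with the same first time_stamp are peers and each gets a count of 2. Such an ID is then dropped entirely. The sketch below reproduces this in SQLite via Python's sqlite3, using hypothetical data with a duplicated 2019-06-10 row for ID 3, and shows that ROW_NUMBER() = 1, a swapped-in alternative rather than the answer's code, counts that ID once. The table `tbl` and the helper `first_entries` are hypothetical.

```python
import sqlite3

# The question's rows plus a hypothetical duplicate 2019-06-10 row for ID 3.
ROWS = [
    ("2019-06-10", 1), ("2019-06-10", 2), ("2019-06-10", 3), ("2019-06-10", 3),
    ("2019-07-10", 2), ("2019-07-10", 4),
    ("2019-08-10", 4), ("2019-08-10", 5),
    ("2019-09-10", 5), ("2019-09-10", 6),
]

# {marker} is the window function used to detect each ID's first row.
FIRST_ENTRY_QUERY = """
SELECT time_stamp, COUNT(1) AS cnt
FROM (
  SELECT *, {marker} OVER (PARTITION BY ID ORDER BY time_stamp) = 1 AS first_entry
  FROM tbl
)
WHERE first_entry
GROUP BY time_stamp
ORDER BY time_stamp
"""


def first_entries(rows, marker):
    """Count first appearances per date, using `marker` as the window function."""
    conn = sqlite3.connect(":memory:")  # window functions need SQLite >= 3.25
    conn.execute("CREATE TABLE tbl (time_stamp TEXT, ID INTEGER)")
    conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)
    result = conn.execute(FIRST_ENTRY_QUERY.format(marker=marker)).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(first_entries(ROWS, "COUNT(1)"))      # tied first rows of ID 3 both get 2, so ID 3 is lost
    print(first_entries(ROWS, "ROW_NUMBER()"))  # ID 3 is counted once
```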

In case you need to group by month rather than by date (this is not clear from the question):

#standardSQL
WITH `project.dataset.table` AS (
  SELECT DATE '2019-06-10' time_stamp, 1 ID, 10 col3, 20 col4 UNION ALL
  SELECT '2019-06-11', 2, 11, 21 UNION ALL
  SELECT '2019-06-12', 3, 12, 22 UNION ALL
  SELECT '2019-07-10', 2, 11, 21 UNION ALL
  SELECT '2019-07-11', 4, 13, 23 UNION ALL
  SELECT '2019-08-10', 4, 13, 23 UNION ALL
  SELECT '2019-08-12', 5, 14, 24 UNION ALL
  SELECT '2019-09-10', 5, 14, 24 UNION ALL
  SELECT '2019-09-13', 6, 15, 25 
)
SELECT DATE_TRUNC(time_stamp, MONTH) month, COUNT(1) `count`
FROM (
  SELECT *, COUNT(1) OVER(PARTITION BY ID ORDER BY time_stamp) = 1 first_entry
  FROM `project.dataset.table`
)
WHERE first_entry
GROUP BY month
-- ORDER BY month

The above returns the users for each month, excluding users who appeared in any previous month:

Row month   count    
1   2019-06-01  3    
2   2019-07-01  1    
3   2019-08-01  1    
4   2019-09-01  1    
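On the question's sample data, "not seen in any earlier month" and "not seen in the previous month" give the same counts, which is why this initial answer matches the expected output. They diverge only when an ID returns after a gap, as ID 1 does in the updated answer's test data. The plain-Python sketch below computes both counts side by side; `PAIRS`, `previous_month` and `compare` are hypothetical names.

```python
from collections import defaultdict
from datetime import date

# Distinct (month, ID) pairs from the updated answer's test data; ID 1 skips July.
PAIRS = [
    (date(2019, 6, 1), 1), (date(2019, 6, 1), 2), (date(2019, 6, 1), 3),
    (date(2019, 7, 1), 2), (date(2019, 7, 1), 4),
    (date(2019, 8, 1), 1), (date(2019, 8, 1), 4), (date(2019, 8, 1), 5),
    (date(2019, 9, 1), 5), (date(2019, 9, 1), 6),
]


def previous_month(d):
    """First day of the calendar month before d's month."""
    return date(d.year - 1, 12, 1) if d.month == 1 else date(d.year, d.month - 1, 1)


def compare(pairs):
    """Return {month: (first_ever_count, not_in_previous_month_count)}."""
    ids_by_month = defaultdict(set)
    for month, id_ in pairs:
        ids_by_month[month].add(id_)
    seen_before = set()  # IDs present in any earlier month
    result = {}
    for month in sorted(ids_by_month):
        ids = ids_by_month[month]
        previous = ids_by_month.get(previous_month(month), set())
        result[month] = (len(ids - seen_before), len(ids - previous))
        seen_before |= ids
    return result


if __name__ == "__main__":
    for month, (first_ever, not_prev) in compare(PAIRS).items():
        print(month, first_ever, not_prev)
```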

Answer 1 (score: 0)

You can use two levels of aggregation:

select yyyymm, count(*)
from (select id, date_trunc(min(time_stamp), month) as yyyymm
      from tbl1
      group by id
     ) t
group by yyyymm
order by yyyymm;
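This query assigns each ID to the month of its first-ever appearance, so, like the initial version of Answer 0, it counts IDs not seen in any earlier month; an ID that returns after a gap is not counted again. A minimal SQLite port, run through Python's sqlite3, is sketched below. `strftime('%Y-%m-01', ...)` stands in for BigQuery's date_trunc(..., month) and assumes ISO-8601 text dates so that MIN() sorts chronologically; the helper `first_month_counts` is hypothetical.

```python
import sqlite3

# The question's tbl1 rows as (time_stamp, ID); col3/col4 are omitted.
ROWS = [
    ("2019-06-10", 1), ("2019-06-10", 2), ("2019-06-10", 3),
    ("2019-07-10", 2), ("2019-07-10", 4),
    ("2019-08-10", 4), ("2019-08-10", 5),
    ("2019-09-10", 5), ("2019-09-10", 6),
]

# Inner level: one row per ID with the month of its earliest time_stamp.
# Outer level: count IDs per first month.
QUERY = """
SELECT yyyymm, COUNT(*)
FROM (SELECT ID, strftime('%Y-%m-01', MIN(time_stamp)) AS yyyymm
      FROM tbl1
      GROUP BY ID
     ) t
GROUP BY yyyymm
ORDER BY yyyymm
"""


def first_month_counts(rows):
    """Run the two-level aggregation on an in-memory tbl1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl1 (time_stamp TEXT, ID INTEGER)")
    conn.executemany("INSERT INTO tbl1 VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for yyyymm, cnt in first_month_counts(ROWS):
        print(yyyymm, cnt)
```

Adding an August row for ID 1 leaves August at 1 here, whereas the LAG-based update in Answer 0 would count it.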