How to count the IDs in table 1 for a given month that also appear in table 2 in the same month, month + 1, or month + 2

Time: 2019-10-29 18:40:24

Tags: sql google-bigquery

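I am trying to count the IDs that exist in both table 1 and table 2, with the condition that an ID present in table 1 for a particular month must match an ID present in table 2 in the same month, that month + 1, or that month + 2.

I tried writing the query in BigQuery, but I could not get the count over the 3-month window from the other table without hard-coding the months.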

The current tables look like this:

Table 1:
time_stamp | ID
2019-06 | 1
2019-06 | 2
2019-06 | 3
2019-07 | 4
2019-07 | 5
2019-08 | 6
2019-08 | 7
2019-09 | 8
2019-09 | 9
2019-10 | 10

Table 2:
time_stamp | ID
2019-06 | 1
2019-06 | 13
2019-06 | 8
2019-07 | 2
2019-07 | 9
2019-08 | 12
2019-08 | 4
2019-09 | 5
2019-09 | 13
2019-10 | 11
2019-10 | 6
2019-10 | 3

Expected output:
time_stamp | count
2019-06 | 2
2019-07 | 2
2019-08 | 1
2019-09 | 0
2019-10 | 0

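The output for 2019-06 is 2 because IDs 1 and 2 from table 1 in 2019-06 are present in table 2 in 2019-06, 2019-07, or 2019-08. Note that ID 3 from table 1 is not counted, because it only appears in table 2 in 2019-10, which is later than 2019-06 + 2 months.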
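The rule can be checked by hand against the sample data. The following is a minimal Python sketch of the matching logic, not BigQuery SQL. It assumes the timestamps are 'YYYY-MM' strings and that each ID appears at most once per month in table 1; the helper names `month_index` and `count_matches` are made up for illustration.

```python
from collections import defaultdict

# Sample rows from the question: (time_stamp, id)
table1 = [("2019-06", 1), ("2019-06", 2), ("2019-06", 3), ("2019-07", 4),
          ("2019-07", 5), ("2019-08", 6), ("2019-08", 7), ("2019-09", 8),
          ("2019-09", 9), ("2019-10", 10)]
table2 = [("2019-06", 1), ("2019-06", 13), ("2019-06", 8), ("2019-07", 2),
          ("2019-07", 9), ("2019-08", 12), ("2019-08", 4), ("2019-09", 5),
          ("2019-09", 13), ("2019-10", 11), ("2019-10", 6), ("2019-10", 3)]

def month_index(ym):
    """Turn 'YYYY-MM' into a running month number, so +1 means the next month."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month) - 1

def count_matches(t1_rows, t2_rows, window=2):
    # Months in which each ID appears in table 2
    seen = defaultdict(set)
    for ym, id_ in t2_rows:
        seen[id_].add(month_index(ym))
    # Start every table 1 month at zero so months without matches are kept
    counts = {ym: 0 for ym, _ in t1_rows}
    for ym, id_ in t1_rows:
        m = month_index(ym)
        # A match is any table 2 appearance in month m, m + 1, ..., m + window
        if any(m <= other <= m + window for other in seen[id_]):
            counts[ym] += 1
    return counts

print(count_matches(table1, table2))
```

On the sample data this yields 2, 2, 1, 0, 0 for 2019-06 through 2019-10, which agrees with the expected output listed in the question.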

3 Answers:

Answer 0: (score: 1)

I believe this achieves your goal, although without a fiddle I cannot test it.

 SELECT timestamp,
       Max(id)
FROM   table1 t1
WHERE  EXISTS (SELECT 1
               FROM   table2 t2
               WHERE  Month(t2.timestamp) IN ( Month(t1.timestamp), Month(
                                               Dateadd(mm, 1, t1.timestamp)
                                                       ), Month(
                                                       Dateadd(mm, 2,
                                                       t1.timestamp)) )
                      AND Year(t1.timestamp) = Year(t2.timestamp)
                      AND t1.ID = t2.ID)
GROUP  BY timestamp  

Edit:

I misread the question at first; you are looking for a count. Same logic, different format.

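 -- Note: Dateadd/Month/Year as written are SQL Server-style functions, not BigQuery standard SQL.
 -- The Year(...) = Year(...) check means a window that crosses a year boundary
 -- (e.g. November + 2 months) is not handled correctly.
 SELECT timestamp,
       Sum(CASE
             WHEN EXISTS (SELECT 1
                          FROM   table2 t2
                          WHERE  Month(t2.timestamp) IN (
                                 Month(t1.timestamp), Month(
                                 Dateadd(mm, 1, t1.timestamp)
                                         ), Month(
                                         Dateadd(mm, 2,
                                         t1.timestamp)) )
                                 AND Year(t1.timestamp) = Year(t2.timestamp)
                                 AND t1.ID = t2.ID)
           THEN 1
             ELSE 0
           END) AS "count"
FROM   table1 t1
GROUP  BY timestamp  -- group by month only, so each month yields a single count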

Answer 1: (score: 0)

I think I would just duplicate the data from the second table and use count(distinct):

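-- Assumes time_stamp is a TIMESTAMP column. DATE functions are used because
-- TIMESTAMP_ADD does not accept a MONTH part. Each table2 row is shifted back by
-- 0, 1 and 2 months so a table1 row in month M lines up with table2 rows in M, M+1, M+2.
-- The inner join drops months that have no matching IDs.
select date_trunc(date(t1.time_stamp), month) as month_start,
       count(distinct t1.id) as cnt
from table1 t1 join
     ((select date_trunc(date(t2.time_stamp), month) as month_start, t2.id
       from table2 t2
      ) union all
      (select date_sub(date_trunc(date(t2.time_stamp), month), interval 1 month), t2.id
       from table2 t2
      ) union all
      (select date_sub(date_trunc(date(t2.time_stamp), month), interval 2 month), t2.id
       from table2 t2
      )
     ) t2
     on t1.id = t2.id and
        date_trunc(date(t1.time_stamp), month) = t2.month_start
group by 1
order by 1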
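The "duplicate the second table" idea can be sketched outside SQL as well. The Python below is an illustration under assumptions, not the answerer's code: timestamps are 'YYYY-MM' strings, the rows are a small subset of the question's sample data, and the names `shift` and `count_by_shifted_join` are hypothetical. Each table 2 row is copied with its month moved back by 0, 1 and 2 months, so a plain equality join on (month, id) finds table 1 rows whose ID shows up in table 2 in the same month or one of the next two.

```python
from collections import defaultdict

def shift(ym, k):
    """Shift a 'YYYY-MM' string by k months (k may be negative)."""
    year, month = map(int, ym.split("-"))
    y, m = divmod(year * 12 + (month - 1) + k, 12)
    return f"{y:04d}-{m + 1:02d}"

def count_by_shifted_join(t1_rows, t2_rows):
    # Three copies of table 2: unshifted, one month back, two months back
    shifted = {(shift(ym, -k), id_) for ym, id_ in t2_rows for k in range(3)}
    # Equality join on (month, id); a set per month plays the role of count(distinct)
    ids_per_month = defaultdict(set)
    for ym, id_ in t1_rows:
        if (ym, id_) in shifted:
            ids_per_month[ym].add(id_)
    return {ym: len(ids) for ym, ids in sorted(ids_per_month.items())}

# Subset of the question's sample data
table1 = [("2019-06", 1), ("2019-06", 2), ("2019-06", 3),
          ("2019-07", 4), ("2019-09", 8)]
table2 = [("2019-06", 1), ("2019-07", 2), ("2019-10", 3),
          ("2019-08", 4), ("2019-06", 8)]

print(count_by_shifted_join(table1, table2))
```

As with an inner join, a month without any matching ID (2019-09 in this subset) does not appear in the result at all, so an outer join or a zero fill is needed if those months should be reported as 0.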

Answer 2: (score: 0)

Below is for BigQuery Standard SQL:

#standardSQL
SELECT time_stamp, COALESCE(`count`, 0) `count` 
FROM (
  SELECT DISTINCT time_stamp FROM `project.dataset.table_1`
) LEFT JOIN (
  SELECT time_stamp, COUNT(id) AS `count`
  FROM (
    SELECT t1.id, t1.time_stamp, ARRAY_AGG(t2.time_stamp) matches
    FROM `project.dataset.table_1` t1
    JOIN `project.dataset.table_2` t2
    ON t1.id = t2.id
    GROUP BY id, time_stamp
  )
  WHERE EXISTS (
    SELECT 1
    FROM UNNEST(matches) match
    WHERE PARSE_DATE('%Y-%m', match) 
      BETWEEN PARSE_DATE('%Y-%m', time_stamp) 
      AND DATE_ADD(PARSE_DATE('%Y-%m', time_stamp), INTERVAL 2 MONTH)
  )
  GROUP BY time_stamp
) USING (time_stamp)  

You can test and play with the above using the sample data from your question:

#standardSQL
WITH `project.dataset.table_1` AS (
  SELECT '2019-06' time_stamp, 1 id UNION ALL
  SELECT '2019-06', 2 UNION ALL
  SELECT '2019-06', 3 UNION ALL
  SELECT '2019-07', 4 UNION ALL
  SELECT '2019-07', 5 UNION ALL
  SELECT '2019-08', 6 UNION ALL
  SELECT '2019-08', 7 UNION ALL
  SELECT '2019-09', 8 UNION ALL
  SELECT '2019-09', 9 UNION ALL
  SELECT '2019-10', 10 
), `project.dataset.table_2` AS (
  SELECT '2019-06' time_stamp, 1 id UNION ALL
  SELECT '2019-06', 13 UNION ALL
  SELECT '2019-06', 8 UNION ALL
  SELECT '2019-07', 2 UNION ALL
  SELECT '2019-07', 9 UNION ALL
  SELECT '2019-08', 12 UNION ALL
  SELECT '2019-08', 4 UNION ALL
  SELECT '2019-09', 5 UNION ALL
  SELECT '2019-09', 13 UNION ALL
  SELECT '2019-10', 11 UNION ALL
  SELECT '2019-10', 6 UNION ALL
  SELECT '2019-10', 3 
)
SELECT time_stamp, COALESCE(`count`, 0) `count` 
FROM (
  SELECT DISTINCT time_stamp FROM `project.dataset.table_1`
) LEFT JOIN (
  SELECT time_stamp, COUNT(id) AS `count`
  FROM (
    SELECT t1.id, t1.time_stamp, ARRAY_AGG(t2.time_stamp) matches
    FROM `project.dataset.table_1` t1
    JOIN `project.dataset.table_2` t2
    ON t1.id = t2.id
    GROUP BY id, time_stamp
  )
  WHERE EXISTS (
    SELECT 1
    FROM UNNEST(matches) match
    WHERE PARSE_DATE('%Y-%m', match) 
      BETWEEN PARSE_DATE('%Y-%m', time_stamp) 
      AND DATE_ADD(PARSE_DATE('%Y-%m', time_stamp), INTERVAL 2 MONTH)
  )
  GROUP BY time_stamp
) USING (time_stamp)

with the result:

Row time_stamp  count    
1   2019-06     2    
2   2019-07     2    
3   2019-08     1    
4   2019-09     0    
5   2019-10     0