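I am trying to count the IDs that are present in both table 1 and table 2, with the condition that an ID present in table 1 for a given month must match an ID present in table 2 in the same month, in month + 1, or in month + 2.
I have tried writing the query in BigQuery, but I cannot get the count over the 3 months of the other table without hard-coding the months.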
The current tables are as follows:
Table 1:
time_stamp | ID
2019-06 | 1
2019-06 | 2
2019-06 | 3
2019-07 | 4
2019-07 | 5
2019-08 | 6
2019-08 | 7
2019-09 | 8
2019-09 | 9
2019-10 | 10
Table 2:
time_stamp | ID
2019-06 | 1
2019-06 | 13
2019-06 | 8
2019-07 | 2
2019-07 | 9
2019-08 | 12
2019-08 | 4
2019-09 | 5
2019-09 | 13
2019-10 | 11
2019-10 | 6
2019-10 | 3
Expected output:
time_stamp | count
2019-06 | 2
2019-07 | 2
2019-08 | 1
2019-09 | 0
2019-10 | 0
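The output for 2019-06 is 2 because IDs 1 and 2 from table 1 in 2019-06 appear in table 2 in 2019-06, 2019-07, or 2019-08. Note that ID 3 from table 1 is not counted, because it appears in table 2 only in 2019-10, which is later than 2019-06 + 2 months.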
Answer 0: (score: 1)
I believe this achieves what you are after, although I could not test it without a fiddle.
-- Written with SQL Server-style functions (MONTH, YEAR, DATEADD), not BigQuery ones.
SELECT time_stamp,
       MAX(id)
FROM   table1 t1
WHERE  EXISTS (SELECT 1
               FROM   table2 t2
               WHERE  MONTH(t2.time_stamp) IN (MONTH(t1.time_stamp),
                                               MONTH(DATEADD(mm, 1, t1.time_stamp)),
                                               MONTH(DATEADD(mm, 2, t1.time_stamp)))
                      AND YEAR(t1.time_stamp) = YEAR(t2.time_stamp)
                      AND t1.ID = t2.ID)
GROUP  BY time_stamp
Edit:
I misread the question at first; you are looking for a count. Same logic, different shape.
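SELECT time_stamp,
       SUM(CASE
             WHEN EXISTS (SELECT 1
                          FROM   table2 t2
                          WHERE  MONTH(t2.time_stamp) IN (MONTH(t1.time_stamp),
                                                          MONTH(DATEADD(mm, 1, t1.time_stamp)),
                                                          MONTH(DATEADD(mm, 2, t1.time_stamp)))
                                 -- Caveat: requiring the same year misses windows that
                                 -- cross a year boundary (e.g. 2019-12 to 2020-01).
                                 AND YEAR(t1.time_stamp) = YEAR(t2.time_stamp)
                                 AND t1.ID = t2.ID)
             THEN 1
             ELSE 0
           END) AS "count"
FROM   table1 t1
-- Group by month only, so that each month yields a single row with its count.
GROUP  BY time_stamp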
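For BigQuery in particular, the same month-window test might be sketched with DATE_DIFF and a LEFT JOIN instead of MONTH/YEAR/DATEADD. This is an untested sketch rather than the answer's own method. It assumes time_stamp is a STRING in 'YYYY-MM' format, as in the question's sample data, and the CTE names table_1 and table_2 are placeholders built inline so that the query can run on its own.

```sql
#standardSQL
-- Assumption: time_stamp is a 'YYYY-MM' STRING; table_1/table_2 are inline placeholders.
WITH table_1 AS (
  SELECT * FROM UNNEST([
    STRUCT('2019-06' AS time_stamp, 1 AS id),
    ('2019-06', 2), ('2019-06', 3),
    ('2019-07', 4), ('2019-07', 5),
    ('2019-08', 6), ('2019-08', 7),
    ('2019-09', 8), ('2019-09', 9),
    ('2019-10', 10)
  ])
), table_2 AS (
  SELECT * FROM UNNEST([
    STRUCT('2019-06' AS time_stamp, 1 AS id),
    ('2019-06', 13), ('2019-06', 8),
    ('2019-07', 2), ('2019-07', 9),
    ('2019-08', 12), ('2019-08', 4),
    ('2019-09', 5), ('2019-09', 13),
    ('2019-10', 11), ('2019-10', 6), ('2019-10', 3)
  ])
)
SELECT
  t1.time_stamp,
  -- Unmatched table_1 rows carry a NULL t2.id, which COUNT ignores,
  -- so months without any match still appear with a count of 0.
  COUNT(DISTINCT t2.id) AS `count`
FROM table_1 t1
LEFT JOIN table_2 t2
  ON t1.id = t2.id
  -- Months elapsed from the table_1 month to the table_2 month must be 0, 1 or 2.
  AND DATE_DIFF(PARSE_DATE('%Y-%m', t2.time_stamp),
                PARSE_DATE('%Y-%m', t1.time_stamp), MONTH) BETWEEN 0 AND 2
GROUP BY t1.time_stamp
ORDER BY t1.time_stamp
```

Because the comparison uses the month difference rather than separate month and year parts, a window such as 2019-12 to 2020-02 is treated the same way as one inside a single year. Traced by hand against the sample data, the counts come out as 2, 2, 1, 0, 0 for 2019-06 through 2019-10, in line with the expected output in the question.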
Answer 1: (score: 0)
I think I would just duplicate the data from the second table and use count(distinct):
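-- Assumes time_stamp is a DATE column; BigQuery's TIMESTAMP_ADD/TIMESTAMP_SUB do not accept MONTH.
-- Each table2 row is shifted back by 0, 1 and 2 months so that it lines up with
-- every table1 month it is allowed to match.
-- Note: the inner join omits months that have no matching IDs (no zero rows).
select date_trunc(t1.time_stamp, month),
       count(distinct t1.id)
from table1 t1 join
     ((select t2.time_stamp, t2.id
       from table2 t2
      ) union all
      (select date_sub(t2.time_stamp, interval 1 month), t2.id
       from table2 t2
      ) union all
      (select date_sub(t2.time_stamp, interval 2 month), t2.id
       from table2 t2
      )
     ) t2
     on t1.id = t2.id and
        date_trunc(t1.time_stamp, month) = date_trunc(t2.time_stamp, month)
group by date_trunc(t1.time_stamp, month)
order by date_trunc(t1.time_stamp, month)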
Answer 2: (score: 0)
Below is for BigQuery Standard SQL:
#standardSQL
SELECT time_stamp, COALESCE(`count`, 0) `count`
FROM (
  SELECT DISTINCT time_stamp FROM `project.dataset.table_1`
) LEFT JOIN (
  SELECT time_stamp, COUNT(id) AS `count`
  FROM (
    SELECT t1.id, t1.time_stamp, ARRAY_AGG(t2.time_stamp) matches
    FROM `project.dataset.table_1` t1
    JOIN `project.dataset.table_2` t2
    ON t1.id = t2.id
    GROUP BY id, time_stamp
  )
  WHERE EXISTS (
    SELECT 1
    FROM UNNEST(matches) match
    WHERE PARSE_DATE('%Y-%m', match)
      BETWEEN PARSE_DATE('%Y-%m', time_stamp)
      AND DATE_ADD(PARSE_DATE('%Y-%m', time_stamp), INTERVAL 2 MONTH)
  )
  GROUP BY time_stamp
) USING (time_stamp)
You can test and experiment with the above using the sample data from your question:
#standardSQL
WITH `project.dataset.table_1` AS (
  SELECT '2019-06' time_stamp, 1 id UNION ALL
  SELECT '2019-06', 2 UNION ALL
  SELECT '2019-06', 3 UNION ALL
  SELECT '2019-07', 4 UNION ALL
  SELECT '2019-07', 5 UNION ALL
  SELECT '2019-08', 6 UNION ALL
  SELECT '2019-08', 7 UNION ALL
  SELECT '2019-09', 8 UNION ALL
  SELECT '2019-09', 9 UNION ALL
  SELECT '2019-10', 10
), `project.dataset.table_2` AS (
  SELECT '2019-06' time_stamp, 1 id UNION ALL
  SELECT '2019-06', 13 UNION ALL
  SELECT '2019-06', 8 UNION ALL
  SELECT '2019-07', 2 UNION ALL
  SELECT '2019-07', 9 UNION ALL
  SELECT '2019-08', 12 UNION ALL
  SELECT '2019-08', 4 UNION ALL
  SELECT '2019-09', 5 UNION ALL
  SELECT '2019-09', 13 UNION ALL
  SELECT '2019-10', 11 UNION ALL
  SELECT '2019-10', 6 UNION ALL
  SELECT '2019-10', 3
)
SELECT time_stamp, COALESCE(`count`, 0) `count`
FROM (
  SELECT DISTINCT time_stamp FROM `project.dataset.table_1`
) LEFT JOIN (
  SELECT time_stamp, COUNT(id) AS `count`
  FROM (
    SELECT t1.id, t1.time_stamp, ARRAY_AGG(t2.time_stamp) matches
    FROM `project.dataset.table_1` t1
    JOIN `project.dataset.table_2` t2
    ON t1.id = t2.id
    GROUP BY id, time_stamp
  )
  WHERE EXISTS (
    SELECT 1
    FROM UNNEST(matches) match
    WHERE PARSE_DATE('%Y-%m', match)
      BETWEEN PARSE_DATE('%Y-%m', time_stamp)
      AND DATE_ADD(PARSE_DATE('%Y-%m', time_stamp), INTERVAL 2 MONTH)
  )
  GROUP BY time_stamp
) USING (time_stamp)
with result
Row time_stamp count
1 2019-06 2
2 2019-07 2
3 2019-08 1
4 2019-09 0
5 2019-10 0