Here is a sample of the dataset I have (about 10 TB):
+----+------------+----------+----------------+--------------+
| id | date | campaign | campaign_start | campaign_end |
+----+------------+----------+----------------+--------------+
| 1 | 2018-01-01 | 1 | 2018-01-01 | 2018-02-03 |
+----+------------+----------+----------------+--------------+
| 1 | 2018-02-01 | 2 | 2018-02-01 | 2018-02-03 |
+----+------------+----------+----------------+--------------+
| 1 | 2018-02-02 | 2 | 2018-02-01 | 2018-02-03 |
+----+------------+----------+----------------+--------------+
| 1 | 2018-02-03 | 2 | 2018-02-01 | 2018-02-03 |
+----+------------+----------+----------------+--------------+
| 2 | 2018-01-23 | 1 | 2018-01-01 | 2018-02-03 |
+----+------------+----------+----------------+--------------+
| 2 | 2018-02-03 | 2 | 2018-02-01 | 2018-02-03 |
+----+------------+----------+----------------+--------------+
I want one row for each unique id + campaign combination.
The output I want is:
+----+----------+--------------------+--------------------------+----------------+--------------+------------+------------+
| id | campaign | campaign_frequency | total_lookback_frequency | campaign_start | campaign_end | first_date | last_date |
+----+----------+--------------------+--------------------------+----------------+--------------+------------+------------+
| 1 | 1 | 1 | 1 | 2018-01-01 | 2018-02-03 | 2018-01-01 | 2018-01-01 |
+----+----------+--------------------+--------------------------+----------------+--------------+------------+------------+
| 1 | 2 | 3 | 4 | 2018-02-01 | 2018-02-03 | 2018-01-01 | 2018-02-03 |
+----+----------+--------------------+--------------------------+----------------+--------------+------------+------------+
| 2 | 1 | 1 | 1 | 2018-01-01 | 2018-02-03 | 2018-01-23 | 2018-01-23 |
+----+----------+--------------------+--------------------------+----------------+--------------+------------+------------+
| 2 | 2 | 1 | 2 | 2018-02-01 | 2018-02-03 | 2018-01-23 | 2018-02-03 |
+----+----------+--------------------+--------------------------+----------------+--------------+------------+------------+
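The problem I keep running into is that I can't get total_lookback_frequency to work: it always returns the same result as campaign_frequency (which is just count(id) grouped by id, campaign).
Here is what I have (it doesn't work):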
SELECT
  id,
  campaign,
  min(date) as first_date,
  max(date) as end_date,
  count(id) as total_lookback_frequency,
WHERE
  date >= sub(date, INTERVAL 730 hour)
GROUP BY
  id,
  campaign,
  date
Could you help with this?
Thanks!
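A small local sketch can make the failure visible. It uses Python's built-in sqlite3 module as a stand-in for BigQuery (an assumption; the table name `events` is hypothetical, and 30 days stands in for INTERVAL 730 HOUR). In the attempt, the WHERE clause compares each row's date with that same row's date minus the lookback, so it is always true and removes nothing; and once rows are grouped by id, campaign, COUNT only ever sees the current campaign's rows. The sketch groups by id, campaign only; the attempt also groups by date, which would split each campaign into one group per day.

```python
import sqlite3

# Sample rows from the question; the `date` column is called `dt` here.
ROWS = [
    (1, "2018-01-01", 1, "2018-01-01", "2018-02-03"),
    (1, "2018-02-01", 2, "2018-02-01", "2018-02-03"),
    (1, "2018-02-02", 2, "2018-02-01", "2018-02-03"),
    (1, "2018-02-03", 2, "2018-02-01", "2018-02-03"),
    (2, "2018-01-23", 1, "2018-01-01", "2018-02-03"),
    (2, "2018-02-03", 2, "2018-02-01", "2018-02-03"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER, dt TEXT, campaign INTEGER,"
    " campaign_start TEXT, campaign_end TEXT)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", ROWS)

# Same shape as the attempt: the filter compares each row with itself
# (30 days stands in for INTERVAL 730 HOUR), so it never removes a row.
attempt = conn.execute(
    """
    SELECT id, campaign, COUNT(id) AS total_lookback_frequency
    FROM events
    WHERE dt >= date(dt, '-30 days')
    GROUP BY id, campaign
    ORDER BY id, campaign
    """
).fetchall()

# Plain per-campaign count, i.e. campaign_frequency.
campaign_frequency = conn.execute(
    "SELECT id, campaign, COUNT(*) FROM events"
    " GROUP BY id, campaign ORDER BY id, campaign"
).fetchall()

print(attempt)                        # [(1, 1, 1), (1, 2, 3), (2, 1, 1), (2, 2, 1)]
print(attempt == campaign_frequency)  # True
```

To count rows from other campaigns, the lookback has to be evaluated against a second pass over the table (a correlated subquery or a self-join), not against the row being grouped.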
Answer (score: 1)
Below is for BigQuery Standard SQL (it assumes the `date` column is named `dt`):
#standardSQL
SELECT
  id,
  campaign,
  COUNT(1) campaign_frequency,
  -- rows for the same id dated from 3 months before campaign_start
  -- through the day before it; rows on or after campaign_start are not counted
  (
    SELECT COUNT(1)
    FROM `project.dataset.table`
    WHERE id = t.id
    AND dt BETWEEN DATE_SUB(t.campaign_start, INTERVAL 3 MONTH) AND DATE_SUB(t.campaign_start, INTERVAL 1 DAY)
  ) total_lookback_frequency,
  campaign_start,
  campaign_end,
  MIN(dt) AS first_date,
  MAX(dt) AS end_date
FROM `project.dataset.table` t
GROUP BY id, campaign, campaign_start, campaign_end
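The scalar subquery is correlated: for each (id, campaign) group it counts that id's rows in a window anchored on the group's own campaign_start. A sketch of the same logic, run on the question's sample rows with Python's sqlite3 module as a local stand-in for BigQuery (an assumption; the table name `events` is hypothetical, and SQLite's date(x, '-3 months') and date(x, '-1 day') replace DATE_SUB), shows what it returns. Because the window ends the day before campaign_start, only earlier rows are counted, so on this data the result equals the expected total_lookback_frequency minus campaign_frequency.

```python
import sqlite3

# Sample rows from the question (`date` renamed `dt`, as in the answer).
ROWS = [
    (1, "2018-01-01", 1, "2018-01-01", "2018-02-03"),
    (1, "2018-02-01", 2, "2018-02-01", "2018-02-03"),
    (1, "2018-02-02", 2, "2018-02-01", "2018-02-03"),
    (1, "2018-02-03", 2, "2018-02-01", "2018-02-03"),
    (2, "2018-01-23", 1, "2018-01-01", "2018-02-03"),
    (2, "2018-02-03", 2, "2018-02-01", "2018-02-03"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER, dt TEXT, campaign INTEGER,"
    " campaign_start TEXT, campaign_end TEXT)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", ROWS)

# SQLite translation of the answer's query; the outer table is aliased t and
# the subquery's copy p, so t.id and t.campaign_start refer to the outer group.
ANSWER_SQL = """
SELECT
  t.id,
  t.campaign,
  COUNT(1) AS campaign_frequency,
  (
    SELECT COUNT(1)
    FROM events AS p
    WHERE p.id = t.id
      AND p.dt BETWEEN date(t.campaign_start, '-3 months')
                   AND date(t.campaign_start, '-1 day')
  ) AS total_lookback_frequency,
  t.campaign_start,
  t.campaign_end,
  MIN(t.dt) AS first_date,
  MAX(t.dt) AS end_date
FROM events AS t
GROUP BY t.id, t.campaign, t.campaign_start, t.campaign_end
ORDER BY t.id, t.campaign
"""

rows = conn.execute(ANSWER_SQL).fetchall()
for row in rows:
    print(row)

print([r[3] for r in rows])         # [0, 1, 0, 1]  (earlier rows only)
print([r[2] + r[3] for r in rows])  # [1, 4, 1, 2]  (question's expected totals)
```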
You can test and play with the query using the dummy data from your question, as below:
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 1 id, DATE '2018-01-01' dt, 1 campaign, DATE '2018-01-01' campaign_start, DATE '2018-02-03' campaign_end UNION ALL
  SELECT 1, '2018-02-01', 2, '2018-02-01', '2018-02-03' UNION ALL
  SELECT 1, '2018-02-02', 2, '2018-02-01', '2018-02-03' UNION ALL
  SELECT 1, '2018-02-03', 2, '2018-02-01', '2018-02-03' UNION ALL
  SELECT 2, '2018-01-23', 1, '2018-01-01', '2018-02-03' UNION ALL
  SELECT 2, '2018-02-03', 2, '2018-02-01', '2018-02-03'
)
SELECT
  id,
  campaign,
  COUNT(1) campaign_frequency,
  (
    SELECT COUNT(1)
    FROM `project.dataset.table`
    WHERE id = t.id
    AND dt BETWEEN DATE_SUB(t.campaign_start, INTERVAL 3 MONTH) AND DATE_SUB(t.campaign_start, INTERVAL 1 DAY)
  ) total_lookback_frequency,
  campaign_start,
  campaign_end,
  MIN(dt) AS first_date,
  MAX(dt) AS end_date
FROM `project.dataset.table` t
GROUP BY id, campaign, campaign_start, campaign_end
-- ORDER BY id, campaign
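If the goal is the exact table from the question, note that the answer's window stops the day before campaign_start, so the current campaign's rows are not in total_lookback_frequency and first_date is taken within the campaign only. One reading that reproduces the question's expected output on the sample data is a window from three months before campaign_start (the answer's lookback) up to the last date seen in the current campaign, with first_date taken over that whole window. The sketch below is an assumption, not the answer's method: it swaps the correlated subquery for a self-join between per-campaign summaries and the raw rows, again using Python's sqlite3 as a local stand-in for BigQuery. The names `events`, `campaigns` and `last_date` are hypothetical; in BigQuery the lower bound would be written DATE_SUB(c.campaign_start, INTERVAL 3 MONTH).

```python
import sqlite3

# Sample rows from the question (`date` renamed `dt`).
ROWS = [
    (1, "2018-01-01", 1, "2018-01-01", "2018-02-03"),
    (1, "2018-02-01", 2, "2018-02-01", "2018-02-03"),
    (1, "2018-02-02", 2, "2018-02-01", "2018-02-03"),
    (1, "2018-02-03", 2, "2018-02-01", "2018-02-03"),
    (2, "2018-01-23", 1, "2018-01-01", "2018-02-03"),
    (2, "2018-02-03", 2, "2018-02-01", "2018-02-03"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER, dt TEXT, campaign INTEGER,"
    " campaign_start TEXT, campaign_end TEXT)"
)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", ROWS)

LOOKBACK_SQL = """
WITH campaigns AS (
  -- one summary row per (id, campaign)
  SELECT id, campaign, campaign_start, campaign_end,
         COUNT(*) AS campaign_frequency,
         MAX(dt) AS last_date
  FROM events
  GROUP BY id, campaign, campaign_start, campaign_end
)
SELECT
  c.id,
  c.campaign,
  c.campaign_frequency,
  COUNT(*) AS total_lookback_frequency,  -- all of this id's rows in the window
  c.campaign_start,
  c.campaign_end,
  MIN(e.dt) AS first_date,               -- earliest row in the window
  c.last_date
FROM campaigns AS c
JOIN events AS e
  ON e.id = c.id
 AND e.dt BETWEEN date(c.campaign_start, '-3 months') AND c.last_date
GROUP BY c.id, c.campaign, c.campaign_start, c.campaign_end,
         c.campaign_frequency, c.last_date
ORDER BY c.id, c.campaign
"""

rows = conn.execute(LOOKBACK_SQL).fetchall()
for row in rows:
    print(row)
```

On the sample rows this prints the four rows of the question's expected table, in the same column order. Since the window here includes the current campaign, its rows always contribute at least campaign_frequency to the total.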