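I have an events table that contains 3 event types for each campaign and each person: "Received Email", "Opened Email" and "Clicked Email". For each person/campaign, I want to get the timestamp of each of these events as new columns in the table. What is the best way to do this?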
Sample table data:
campaign_id  person_id  event_type      timestamp
1            1          Received Email  2018-01-01
1            1          Opened Email    2018-02-01
1            1          Clicked Email   2018-03-01
1            2          Received Email  2018-01-01
1            2          Opened Email    2018-02-01
1            2          Opened Email    2018-02-02
Example output:
campaign_id  person_id  event_type      timestamp   receive_ts  open_ts     click_ts
1            1          Received Email  2018-01-01  2018-01-01  2018-02-01  2018-03-01
1            1          Opened Email    2018-02-01  2018-01-01  2018-02-01  2018-03-01
1            1          Clicked Email   2018-03-01  2018-01-01  2018-02-01  2018-03-01
1            2          Received Email  2018-01-01  2018-01-01  2018-02-01
1            2          Opened Email    2018-02-01  2018-01-01  2018-02-01
1            2          Opened Email    2018-02-02  2018-01-01  2018-02-01
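The only solution I have come up with is to join the table to itself three times on campaign_id and person_id, once per event type. The table has more than 400 million rows, though, so that is clearly not efficient.
Any suggestions are appreciated!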
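Answer 0 (score: 3)
You could try a pivot query here. For example, if you want the number of minutes between receiving an email and opening it, per person/campaign, you could try the following: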
SELECT
campaign_id,
person_id,
TIMESTAMP_DIFF(
MAX(CASE WHEN event_type = 'Opened Email' THEN timestamp END),
MAX(CASE WHEN event_type = 'Received Email' THEN timestamp END),
MINUTE) AS diff_in_minutes
FROM yourTable
GROUP BY
campaign_id,
person_id;
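Here is a minimal sketch for trying the conditional-aggregation (pivot) idea locally with Python's built-in sqlite3 module instead of BigQuery. It assumes the question's sample rows and a table named events whose timestamp column is called ts. SQLite has no TIMESTAMP_DIFF(..., MINUTE), so the sketch substitutes a julianday() difference multiplied by 24 * 60. The function name minutes_to_open is hypothetical.

```python
import sqlite3

# Sample rows from the question: (campaign_id, person_id, event_type, ts).
ROWS = [
    (1, 1, "Received Email", "2018-01-01"),
    (1, 1, "Opened Email", "2018-02-01"),
    (1, 1, "Clicked Email", "2018-03-01"),
    (1, 2, "Received Email", "2018-01-01"),
    (1, 2, "Opened Email", "2018-02-01"),
    (1, 2, "Opened Email", "2018-02-02"),
]

# Conditional aggregation: one output row per (campaign_id, person_id). Each
# event's timestamp is picked out by a CASE inside MAX(). julianday() returns
# days, so the difference times 24 * 60 is minutes (stand-in for TIMESTAMP_DIFF).
QUERY = """
SELECT campaign_id,
       person_id,
       CAST(ROUND(
           (julianday(MAX(CASE WHEN event_type = 'Opened Email' THEN ts END))
          - julianday(MAX(CASE WHEN event_type = 'Received Email' THEN ts END)))
           * 24 * 60) AS INTEGER) AS diff_in_minutes
FROM events
GROUP BY campaign_id, person_id
"""


def minutes_to_open(rows):
    """Return {(campaign_id, person_id): minutes from receive to open, or None}."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (campaign_id INTEGER, person_id INTEGER, "
            "event_type TEXT, ts TEXT)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
        return {(c, p): diff for c, p, diff in conn.execute(QUERY)}
    finally:
        conn.close()


if __name__ == "__main__":
    print(minutes_to_open(ROWS))
```

Like the answer, this takes MAX for the open time, so for person 2 it measures to the later of the two opens (2018-02-02). Use MIN if you want the first open instead.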
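Note: this answer was written for the original version of the question, which was later changed substantially.
Answer 1 (score: 1)
Below is for BigQuery Standard SQL. And no, you don't need three JOINs. You don't need any JOIN at all here.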
#standardSQL
SELECT campaign_id, person_id, event_type, ts,
FIRST_VALUE(IF(event_type='Received Email', ts, NULL) IGNORE NULLS) OVER(win) receive_ts,
FIRST_VALUE(IF(event_type='Opened Email', ts, NULL) IGNORE NULLS) OVER(win) open_ts,
FIRST_VALUE(IF(event_type='Clicked Email', ts, NULL) IGNORE NULLS) OVER(win) click_ts
FROM `project.dataset.table`
WINDOW win AS (PARTITION BY campaign_id, person_id ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
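You can also try the same window-function idea locally with Python's sqlite3 module. SQLite's FIRST_VALUE does not accept IGNORE NULLS, so this sketch substitutes MIN(CASE ... END) OVER (PARTITION BY campaign_id, person_id). Over a whole partition, that returns the same earliest non-null timestamp. The sketch assumes an SQLite build with window-function support (3.25 or later) and the question's sample rows. The name add_event_timestamps is hypothetical.

```python
import sqlite3

# Sample rows from the question: (campaign_id, person_id, event_type, ts).
ROWS = [
    (1, 1, "Received Email", "2018-01-01"),
    (1, 1, "Opened Email", "2018-02-01"),
    (1, 1, "Clicked Email", "2018-03-01"),
    (1, 2, "Received Email", "2018-01-01"),
    (1, 2, "Opened Email", "2018-02-01"),
    (1, 2, "Opened Email", "2018-02-02"),
]

# MIN over the whole partition gives the same value as
# FIRST_VALUE(... IGNORE NULLS) ordered by ts with an unbounded frame:
# the earliest timestamp of that event type for the person/campaign.
QUERY = """
SELECT campaign_id, person_id, event_type, ts,
       MIN(CASE WHEN event_type = 'Received Email' THEN ts END)
           OVER (PARTITION BY campaign_id, person_id) AS receive_ts,
       MIN(CASE WHEN event_type = 'Opened Email' THEN ts END)
           OVER (PARTITION BY campaign_id, person_id) AS open_ts,
       MIN(CASE WHEN event_type = 'Clicked Email' THEN ts END)
           OVER (PARTITION BY campaign_id, person_id) AS click_ts
FROM events
ORDER BY campaign_id, person_id, ts
"""


def add_event_timestamps(rows):
    """Return every input row extended with receive_ts, open_ts and click_ts."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE events (campaign_id INTEGER, person_id INTEGER, "
            "event_type TEXT, ts TEXT)"
        )
        conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in add_event_timestamps(ROWS):
        print(row)
```

No JOIN is involved. Each row keeps its own columns, and the three extra values come from aggregating over its partition.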
You can test/play with the above using the dummy data from the question, as follows:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 campaign_id, 1 person_id, 'Received Email' event_type, '2018-01-01' ts UNION ALL
SELECT 1, 1, 'Opened Email', '2018-02-01' UNION ALL
SELECT 1, 1, 'Clicked Email', '2018-03-01' UNION ALL
SELECT 1, 2, 'Received Email', '2018-01-01' UNION ALL
SELECT 1, 2, 'Opened Email', '2018-02-01' UNION ALL
SELECT 1, 2, 'Opened Email', '2018-02-02'
)
SELECT campaign_id, person_id, event_type, ts,
FIRST_VALUE(IF(event_type='Received Email', ts, NULL) IGNORE NULLS) OVER(win) receive_ts,
FIRST_VALUE(IF(event_type='Opened Email', ts, NULL) IGNORE NULLS) OVER(win) open_ts,
FIRST_VALUE(IF(event_type='Clicked Email', ts, NULL) IGNORE NULLS) OVER(win) click_ts
FROM `project.dataset.table`
WINDOW win AS (PARTITION BY campaign_id, person_id ORDER BY ts ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
-- ORDER BY campaign_id, person_id, ts
The result should be:
Row  campaign_id  person_id  event_type      ts          receive_ts  open_ts     click_ts
1    1            1          Received Email  2018-01-01  2018-01-01  2018-02-01  2018-03-01
2    1            1          Opened Email    2018-02-01  2018-01-01  2018-02-01  2018-03-01
3    1            1          Clicked Email   2018-03-01  2018-01-01  2018-02-01  2018-03-01
4    1            2          Received Email  2018-01-01  2018-01-01  2018-02-01  null
5    1            2          Opened Email    2018-02-01  2018-01-01  2018-02-01  null
6    1            2          Opened Email    2018-02-02  2018-01-01  2018-02-01  null
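This plain-Python sketch spells out what each window computes per partition, reproducing the result above. One pass records the earliest timestamp of each event type per (campaign_id, person_id). A second pass attaches those three values to every original row, leaving None where an event never happened. COLUMN_FOR_EVENT and add_event_columns are illustrative names, not part of the answer. On the 400-million-row table you would run the SQL version, because BigQuery does the partitioning for you.

```python
from collections import defaultdict

# Sample rows from the question: (campaign_id, person_id, event_type, ts).
ROWS = [
    (1, 1, "Received Email", "2018-01-01"),
    (1, 1, "Opened Email", "2018-02-01"),
    (1, 1, "Clicked Email", "2018-03-01"),
    (1, 2, "Received Email", "2018-01-01"),
    (1, 2, "Opened Email", "2018-02-01"),
    (1, 2, "Opened Email", "2018-02-02"),
]

# Hypothetical mapping from each event type to the output column it fills.
COLUMN_FOR_EVENT = {
    "Received Email": "receive_ts",
    "Opened Email": "open_ts",
    "Clicked Email": "click_ts",
}


def add_event_columns(rows):
    """Extend each row with receive_ts, open_ts, click_ts for its partition."""
    # Pass 1: earliest timestamp per (campaign_id, person_id) and column.
    # ISO 'YYYY-MM-DD' strings compare correctly as plain strings.
    first_ts = defaultdict(dict)
    for campaign_id, person_id, event_type, ts in rows:
        col = COLUMN_FOR_EVENT[event_type]  # KeyError for unknown event types
        seen = first_ts[(campaign_id, person_id)]
        if col not in seen or ts < seen[col]:
            seen[col] = ts

    # Pass 2: attach the three values to every original row (None if missing).
    out = []
    for campaign_id, person_id, event_type, ts in rows:
        seen = first_ts[(campaign_id, person_id)]
        out.append((campaign_id, person_id, event_type, ts,
                    seen.get("receive_ts"), seen.get("open_ts"),
                    seen.get("click_ts")))
    return out


if __name__ == "__main__":
    for row in add_event_columns(ROWS):
        print(row)
```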