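I have a table containing user actions (e.g. viewing a page, clicking a button). Each row contains a user_id, a date (created_on), and the name of the action. I want to write a query that, for each date, builds a nested field containing the distinct actions taken on or before that date. For example, given a table named user_actions: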
-------------------------------------
| user_id | date | action |
-------------------------------------
| 1 | 2018-04-01 | click |
| 2 | 2018-04-01 | view |
| 1 | 2018-04-02 | view |
| 2 | 2018-04-02 | view |
| 2 | 2018-04-03 | buy |
-------------------------------------
would result in
-------------------------------------
| user_id | date | actions |
-------------------------------------
| 1 | 2018-04-01 | click |
| 2 | 2018-04-01 | view |
| 1 | 2018-04-02 | click |
| | | view |
| 2 | 2018-04-02 | view |
| 2 | 2018-04-03 | view |
| | | buy |
-------------------------------------
In the second table, actions is a nested repeated field. I know that for a single point in time I could use something like the following:
SELECT
user_id,
date,
ARRAY_AGG(DISTINCT action) AS actions
FROM
user_actions
GROUP BY
1,2
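However, I am not sure how to extend this so that the same calculation is done for every date in the original table, looking only at rows on or before that row's date field.
Any help would be greatly appreciated. Thanks!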
Answer 0 (score: 2)
"create a nested field containing the distinct actions taken on or before that date"
The following is for BigQuery Standard SQL:
#standardSQL
SELECT user_id, date,
ARRAY(
SELECT DISTINCT action FROM UNNEST(actions) action
) actions
FROM (
SELECT user_id, date, ARRAY_AGG(action) OVER(win) actions
FROM `project.dataset.table`
WINDOW win AS (
PARTITION BY user_id ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
)
You can test this with the sample data from the question, as shown below:
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 user_id, '2018-04-01' date, 'click' action UNION ALL
SELECT 2, '2018-04-01', 'view' UNION ALL
SELECT 1, '2018-04-02', 'view' UNION ALL
SELECT 2, '2018-04-02', 'view' UNION ALL
SELECT 2, '2018-04-03', 'buy'
)
SELECT user_id, date,
ARRAY(
SELECT DISTINCT action FROM UNNEST(actions) action
) actions
FROM (
SELECT user_id, date, ARRAY_AGG(action) OVER(win) actions
FROM `project.dataset.table`
WINDOW win AS (
PARTITION BY user_id ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
)
-- ORDER BY date, user_id
This returns the result requested in the question.
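To check what the window computes without running BigQuery, the same logic can be sketched in plain Python. This is only an illustration, not part of the answer: the function name `running_distinct_actions` is hypothetical, dates are assumed to be ISO strings so that they sort chronologically, and the element order inside each list is a choice made here (BigQuery does not guarantee array element order for this query).

```python
from collections import defaultdict

# Sample rows from the question: (user_id, date, action).
rows = [
    (1, "2018-04-01", "click"),
    (2, "2018-04-01", "view"),
    (1, "2018-04-02", "view"),
    (2, "2018-04-02", "view"),
    (2, "2018-04-03", "buy"),
]


def running_distinct_actions(rows):
    """Emulate the row-by-row window: for each row, collect the distinct
    actions of the same user from the first row up to the current row."""
    seen = defaultdict(list)  # user_id -> distinct actions so far, first-seen order
    out = []
    # Sort by user, then date, mirroring PARTITION BY user_id ORDER BY date.
    for user_id, date, action in sorted(rows, key=lambda r: (r[0], r[1])):
        if action not in seen[user_id]:
            seen[user_id].append(action)
        # One output row per input row, as with ROWS ... CURRENT ROW.
        out.append((user_id, date, list(seen[user_id])))
    return out


result = running_distinct_actions(rows)
for row in result:
    print(row)
```

Because the frame is row-based, the sketch emits one output row per input row. If a user has two rows on the same date, that date appears twice with different lists.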
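Update
The version below handles the more general case where the same user has multiple actions on the same day (I realized the initial answer does not cover this case):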
#standardSQL
SELECT user_id, date,
ARRAY(
SELECT DISTINCT action FROM UNNEST(SPLIT(actions)) action
) actions
FROM (
SELECT user_id, date , STRING_AGG(actions) OVER(win) actions
FROM (
SELECT user_id, date, STRING_AGG(DISTINCT action) actions
FROM `project.dataset.table`
GROUP BY user_id, date
)
WINDOW win AS (
PARTITION BY user_id ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
)
You can test it with the sample data below (note the extra row with action = 'play'):
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 user_id, DATE '2018-04-01' date, 'click' action UNION ALL
SELECT 2, '2018-04-01', 'view' UNION ALL
SELECT 1, '2018-04-02', 'view' UNION ALL
SELECT 1, '2018-04-02', 'play' UNION ALL
SELECT 2, '2018-04-02', 'view' UNION ALL
SELECT 2, '2018-04-03', 'buy'
)
SELECT user_id, date,
ARRAY(
SELECT DISTINCT action FROM UNNEST(SPLIT(actions)) action
) actions
FROM (
SELECT user_id, date , STRING_AGG(actions) OVER(win) actions
FROM (
SELECT user_id, date, STRING_AGG(DISTINCT action) actions
FROM `project.dataset.table`
GROUP BY user_id, date
)
WINDOW win AS (
PARTITION BY user_id ORDER BY date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
)
)
-- ORDER BY date, user_id
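This returns one row per user and date; for user 1 on 2018-04-02 the actions are click, view, and play.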
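The two-step idea in the updated query (first collapse each user and day into one group, then accumulate across days) can be sketched in plain Python as a way to verify the expected output. The function name `cumulative_actions_by_day` is hypothetical, dates are assumed to be ISO strings, and each list is sorted alphabetically only to make the output deterministic.

```python
from collections import defaultdict

# Sample rows including the extra 'play' row: (user_id, date, action).
rows = [
    (1, "2018-04-01", "click"),
    (2, "2018-04-01", "view"),
    (1, "2018-04-02", "view"),
    (1, "2018-04-02", "play"),
    (2, "2018-04-02", "view"),
    (2, "2018-04-03", "buy"),
]


def cumulative_actions_by_day(rows):
    # Step 1: one set of actions per (user_id, date), like the inner GROUP BY.
    per_day = defaultdict(set)
    for user_id, date, action in rows:
        per_day[(user_id, date)].add(action)

    # Step 2: walk each user's dates in order and accumulate the sets,
    # like the window over the grouped rows.
    seen = defaultdict(set)
    out = {}
    for user_id, date in sorted(per_day):
        seen[user_id] |= per_day[(user_id, date)]
        out[(user_id, date)] = sorted(seen[user_id])
    return out


result = cumulative_actions_by_day(rows)
for key, actions in result.items():
    print(key, actions)
```

The sketch uses sets rather than delimited strings. The SQL version relies on the default comma delimiter of STRING_AGG and SPLIT, so an action name that itself contains a comma would be split into separate values there.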