Consider this table schema on BigQuery:
Table User
{
  user_id: STRING (REQUIRED)
  user_name: STRING (REQUIRED)
  actions: RECORD (REPEATED)
  {
    action_id: STRING (REQUIRED)
    action_type: INTEGER (REQUIRED)
    action_date: TIMESTAMP (REQUIRED)
  }
}
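I want to find all users (user_id and user_name) who created actions of a given type more than once, where the minimum time between those actions is less than X days.
The number of actions stored per user is not fixed (it can be 1, 2, or n). The actions are not sorted by any criterion (but I think an ORDER BY could take care of that).
For example, with the users: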
{
  user_id: "u1",
  user_name: "User 1",
  actions:
    {action_id: "a1", action_type: 1, action_date: "2016-02-22"},
    {action_id: "a2", action_type: 1, action_date: "2016-01-22"},
    {action_id: "a3", action_type: 1, action_date: "2015-12-22"}
},
{
  user_id: "u2",
  user_name: "User 2",
  actions:
    {action_id: "a4", action_type: 1, action_date: "2016-02-22"},
    {action_id: "a5", action_type: 2, action_date: "2016-01-22"},
    {action_id: "a6", action_type: 1, action_date: "2015-12-22"}
},
{
  user_id: "u3",
  user_name: "User 3",
  actions:
    {action_id: "a7", action_type: 1, action_date: "2016-02-22"}
},
{
  user_id: "u4",
  user_name: "User 4",
  actions:
    {action_id: "a8", action_type: 1, action_date: "2016-02-22"},
    {action_id: "a9", action_type: 1, action_date: "2015-02-22"},
    {action_id: "a10", action_type: 1, action_date: "2015-01-22"}
},
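The query "select the users who performed actions of type 1 more than once, with a minimum time between executions of less than 45 days" should return User 1 and User 4.
Any ideas on how to do this in BigQuery?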
Answer 0 (score: 2)
Try the following.
I wrote this on the go, so it is untested, but I think it should work and do what you need:
SELECT
  user_id,
  user_name,
  action_type,
  MIN(DATEDIFF(action_date_next, action_date)) AS min_distance
FROM (
  SELECT
    user_id,
    user_name,
    action_type,
    action_date,
    LAG(action_date)
      OVER(PARTITION BY user_id, action_type
           ORDER BY action_date DESC) AS action_date_next
  FROM (
    SELECT
      user_id,
      user_name,
      actions.action_type AS action_type,
      actions.action_date AS action_date
    FROM table_users
  )
)
WHERE action_date_next IS NOT NULL
GROUP BY user_id, user_name, action_type
HAVING action_type = 1 AND min_distance < 45
The version below is more compact; give it a try:
SELECT
  user_id,
  user_name,
  action_type,
  MIN(DATEDIFF(action_date_next, action_date)) AS min_distance
FROM (
  SELECT
    user_id,
    user_name,
    actions.action_type AS action_type,
    actions.action_date AS action_date,
    LAG(actions.action_date)
      OVER(PARTITION BY user_id, actions.action_type
           ORDER BY actions.action_date DESC) AS action_date_next
  FROM table_users
)
WHERE action_date_next IS NOT NULL
GROUP BY user_id, user_name, action_type
HAVING action_type = 1 AND min_distance < 45
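Since the queries above are untested, it may help to cross-check their logic outside BigQuery. The following is a minimal Python sketch of the same idea (per user, sort the dates of one action type, take the gaps between neighbouring dates, keep the smallest), run against the sample data from the question. The function name `users_with_close_actions` and the in-memory tuple layout are assumptions made for illustration; this is not BigQuery code.

```python
from datetime import date

# Sample data from the question as (user_id, user_name, actions) tuples.
# Each action is (action_id, action_type, action_date).
USERS = [
    ("u1", "User 1", [("a1", 1, date(2016, 2, 22)),
                      ("a2", 1, date(2016, 1, 22)),
                      ("a3", 1, date(2015, 12, 22))]),
    ("u2", "User 2", [("a4", 1, date(2016, 2, 22)),
                      ("a5", 2, date(2016, 1, 22)),
                      ("a6", 1, date(2015, 12, 22))]),
    ("u3", "User 3", [("a7", 1, date(2016, 2, 22))]),
    ("u4", "User 4", [("a8", 1, date(2016, 2, 22)),
                      ("a9", 1, date(2015, 2, 22)),
                      ("a10", 1, date(2015, 1, 22))]),
]


def users_with_close_actions(users, action_type, max_days):
    """Return (user_id, user_name, min_gap_days) for every user whose two
    closest actions of `action_type` are less than `max_days` apart."""
    result = []
    for user_id, user_name, actions in users:
        # Keep only the dates of the requested type, oldest first.
        dates = sorted(d for _, t, d in actions if t == action_type)
        # Gaps between neighbouring dates; empty with fewer than two actions.
        gaps = [(later - earlier).days
                for earlier, later in zip(dates, dates[1:])]
        if gaps and min(gaps) < max_days:
            result.append((user_id, user_name, min(gaps)))
    return result


print(users_with_close_actions(USERS, action_type=1, max_days=45))
# prints [('u1', 'User 1', 31), ('u4', 'User 4', 31)]
```

Comparing only neighbouring dates is enough because, once the dates are sorted, the smallest gap between any two of them always lies between two adjacent entries. That is the same reason a single LAG over an ordered partition suffices in the queries.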