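I am trying to compute Day-N retention for a dataset in Google BigQuery. The table contains one month of data for a mobile app, and I want to find the number of users who return each day. I am using standard SQL. The code I have so far is: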
SELECT date(d1.eventDate) as dt,
COUNT(distinct d1.userID) as total_users,
COUNT(distinct d2.userID) as retained_users
FROM `dataset` as d1
LEFT JOIN `dataset` as d2 ON
d1.userID = d2.userID
AND date(d1.eventDate) = date(datetime(d2.eventDate, '-1 day'))
GROUP BY 1
ORDER BY 1
When I try to run it, I get the error message:
Error: Invalid time zone: -1 day [invalidQuery]
My table structure is:
eventDate | UserID |
2016-05-06 00:00:00 UTC | 100000 |
2016-05-06 00:00:00 UTC | 200000 |
2016-05-06 00:00:00 UTC | 300000 |
What should I use instead of '-1 day'?
Answer 0: (score: 1)
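TIMESTAMP_SUB fixes the query as written, although for performance reasons it may not be sufficient as a solution. At least it lets you shift a timestamp by 1 day. Note that the query below passes INTERVAL -24 HOUR, so d2 is moved forward by a day and each date is matched against activity on the previous day: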
SELECT date(d1.created_at) as dt,
COUNT(distinct d1.actor.id) as total_users,
COUNT(distinct d2.actor.id) as retained_users
FROM `githubarchive.month.201810` as d1
LEFT JOIN `githubarchive.month.201810` as d2 ON
d1.actor.id = d2.actor.id
AND date(d1.created_at) = date(TIMESTAMP_SUB(d2.created_at, INTERVAL -24 HOUR))
GROUP BY 1
ORDER BY 1
To improve performance, deduplicate the rows before the JOIN:
SELECT day as dt,
COUNT(distinct d1.id) as total_users,
COUNT(distinct d2.id) as retained_users
FROM (SELECT DISTINCT actor.id, DATE(created_at) day FROM `githubarchive.month.201810`) as d1
LEFT JOIN (SELECT DISTINCT actor.id, DATE(TIMESTAMP_SUB(created_at, INTERVAL -24 HOUR)) day FROM `githubarchive.month.201810`) as d2
USING (id, day)
GROUP BY 1
ORDER BY 1
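As a side note on the original error: the second argument of DATETIME(timestamp, ...) is a time zone name, which is why '-1 day' was rejected as an invalid time zone. The following is only a minimal sketch on literal values, not part of the queries above. The timestamp is taken from the sample rows in the question, and the 'America/Los_Angeles' time zone is an arbitrary choice for illustration.

```sql
#standardSQL
SELECT
  -- The second argument is a time zone name, not an offset in days.
  DATETIME(TIMESTAMP '2016-05-06 00:00:00 UTC', 'America/Los_Angeles') AS local_datetime,
  -- Subtract one day from a TIMESTAMP: 2016-05-05 00:00:00 UTC.
  TIMESTAMP_SUB(TIMESTAMP '2016-05-06 00:00:00 UTC', INTERVAL 1 DAY) AS one_day_earlier,
  -- A negative interval moves forward instead: 2016-05-07 00:00:00 UTC.
  TIMESTAMP_SUB(TIMESTAMP '2016-05-06 00:00:00 UTC', INTERVAL -24 HOUR) AS one_day_later,
  -- The same subtraction on a DATE value: 2016-05-05.
  DATE_SUB(DATE '2016-05-06', INTERVAL 1 DAY) AS previous_date
```

Whether the join should look one day back or one day forward depends on how retention is defined for a given day, so it is worth checking the sign of the interval against that definition.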
Answer 1: (score: 0)
The following is for BigQuery Standard SQL and is further optimized to use analytic functions instead of any JOIN:
#standardSQL
SELECT
day,
COUNT(1) total_users,
COUNTIF(delta = 1) retained_users
FROM (
SELECT
day, id,
DATE_DIFF(day, LAG(day) OVER(PARTITION BY id ORDER BY day), DAY) delta
FROM (
SELECT DISTINCT
DATE(created_at) day,
actor.id
FROM `githubarchive.month.201810`
)
)
GROUP BY day
ORDER BY day
Or, using the notation from the original question:
#standardSQL
SELECT
day,
COUNT(1) total_users,
COUNTIF(delta = 1) retained_users
FROM (
SELECT
day, userID,
DATE_DIFF(day, LAG(day) OVER(PARTITION BY userID ORDER BY day), DAY) delta
FROM (
SELECT DISTINCT
DATE(eventDate) day,
userID
FROM `project.dataset.table`
)
)
GROUP BY day
ORDER BY day
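To see what the LAG-based query counts, here is a small self-contained sketch that runs the same logic on inline rows. The `events` CTE and its four rows are hypothetical sample data invented for illustration (modeled on the table layout in the question), not real results.

```sql
#standardSQL
-- Hypothetical sample rows standing in for `project.dataset.table`.
WITH events AS (
  SELECT TIMESTAMP '2016-05-06 00:00:00 UTC' AS eventDate, 100000 AS userID UNION ALL
  SELECT TIMESTAMP '2016-05-06 00:00:00 UTC', 200000 UNION ALL
  SELECT TIMESTAMP '2016-05-07 00:00:00 UTC', 100000 UNION ALL
  SELECT TIMESTAMP '2016-05-08 00:00:00 UTC', 200000
)
SELECT
  day,
  COUNT(1) AS total_users,
  -- delta = 1 means the user's previous active day was exactly yesterday.
  COUNTIF(delta = 1) AS retained_users
FROM (
  SELECT
    day, userID,
    -- LAG returns NULL for a user's first active day, so delta is NULL there.
    DATE_DIFF(day, LAG(day) OVER (PARTITION BY userID ORDER BY day), DAY) AS delta
  FROM (
    -- One row per user per active day.
    SELECT DISTINCT DATE(eventDate) AS day, userID
    FROM events
  )
)
GROUP BY day
ORDER BY day
-- Result for the sample rows:
-- 2016-05-06 | 2 | 0
-- 2016-05-07 | 1 | 1
-- 2016-05-08 | 1 | 0
```

User 200000 returns on 2016-05-08 after a two-day gap, so that row has delta = 2 and is not counted as retained. In this formulation, retained_users for a day means users who were also active on the day before.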