我有按日期登录的用户。我的要求是跟踪过去90天之内已登录的用户数量。
我对SQL尤其是Teradata都不熟悉,我无法根据需要使用窗口功能。
我需要以下结果,其中ACTIVE是对DATE的前90天窗口中显示的唯一USER_ID的计数。
DATES ACTIVE_IN_WINDOW
12/06/2018 20
13/06/2018 45
14/06/2018 65
15/06/2018 73
17/06/2018 24
18/06/2018 87
19/06/2018 34
20/06/2018 51
当前我的脚本如下。
这条线我无法正确显示
COUNT ( USER_ID) OVER (PARTITION BY USER_ID ORDER BY EVT_DT ROWS BETWEEN 90 PRECEDING AND 0 FOLLOWING)
我怀疑我需要一套不同的功能来完成这项工作。
SELECT b.DATES , a.ACTIVE_IN_WINDOW
FROM
(
SELECT
CAST(CALENDAR_DATE AS DATE) AS DATES FROM SYS_CALENDAR.CALENDAR
WHERE DATES BETWEEN ADD_MONTHS(CURRENT_DATE, - 10) AND CURRENT_DATE
) b
LEFT JOIN
(
SELECT USER_ID , EVT_DT
, COUNT ( USER_ID) OVER (PARTITION BY USER_ID ORDER BY EVT_DT ROWS BETWEEN 90 PRECEDING AND 0 FOLLOWING) AS ACTIVE_IN_WINDOW
FROM ENV0.R_ONBOARDING
) a
ON a.EVT_DT = b.DATES
ORDER BY b.DATES
谢谢您的帮助。
答案 0 :(得分:1)
逻辑与Gordon'相似,但在Teradata上,非等额联接而不是相关标量子查询通常更有效:
SELECT b.DATES , Count(DISTINCT USER_ID)
FROM
(
SELECT CALENDAR_DATE AS DATES
FROM SYS_CALENDAR.CALENDAR
WHERE DATES BETWEEN Add_Months(Current_Date, - 10) AND Current_Date
) b
LEFT JOIN
( -- apply DISTINCT before aggregation to reduce intermediate spool
SELECT DISTINCT USER_ID, EVT_DT
FROM ENV0.R_ONBOARDING
) AS a
ON a.EVT_DT BETWEEN Add_Months(b.DATES,-3) AND b.DATES
GROUP BY 1
ORDER BY 1
当然,这需要一个大的线轴和许多CPU。
编辑:
切换为星期会减少开销,我使用日期而不是星期数(对于其他范围更容易修改):
SELECT b.Week , Count(DISTINCT USER_ID)
FROM
( -- Return only Mondays instead of DISTINCT over all days
SELECT calendar_date AS Week
FROM SYS_CALENDAR.CALENDAR
WHERE CALENDAR_DATE BETWEEN Add_Months(Current_Date, -9) AND Current_Date
AND day_of_week = 2 -- 2 = Monday
) b
LEFT JOIN
(
SELECT DISTINCT USER_ID,
-- td_monday returns the previous Monday, but we need the following monday
-- covers the previous Tuesday up to the current Monday
Td_Monday(EVT_DT+6) AS PERIOD_WEEK
FROM ENV0.R_ONBOARDING
-- You should add another condition to limit the actually covered date range, e.g.
-- where EVT_DT BETWEEN Add_Months(b.DATES,-13) AND b.DATES
) AS a
ON a.PERIOD_WEEK BETWEEN b.Week-(12*7) AND b.Week
GROUP BY 1
ORDER BY 1
解释应复制日历以作为产品加入的准备,否则,您可能需要在挥发表中具体化日期。最好不要使用sys_calendar
,因为没有统计信息,例如优化程序不知道每周/每月/每年多少天,等等。检查您的系统,应该有一个满足您公司需求的日历表(所有列都有统计信息)
答案 1 :(得分:0)
如果数据不太大,则子查询可能是最简单的方法:
SELECT c.dte,
(SELECT COUNT(DISTINCT o.USER_ID)
FROM ENV0.R_ONBOARDING o
WHERE o.EVT_DT > ADD_MONTHS(dte, -3) AND
o.EVT_DT <= dte
) as three_month_count
FROM (SELECT CAST(CALENDAR_DATE AS DATE) AS dte
FROM SYS_CALENDAR.CALENDAR
WHERE CALENDAR_DATE BETWEEN ADD_MONTHS(CURRENT_DATE, - 10) AND CURRENT_DATE
) c;
您可能希望从3个月或更短的时间开始,以查看查询的效果。