我试图在postgres中使用LAG()和LEAD()函数从表中的其他行/记录中检索值,我遇到了一些困难。只要LAG或LEAD函数只查看同一个月内的日期(即6月2日可以回溯到6月1日,但是当我尝试回顾5月31日,我检索到一个NULL值)时,该功能可以正常工作。
这是表格的样子
_date count_daily_active_users count_new_users day1_users users_arriving_today_who_returned_tomrrow day_retained_users
5/27/2013 1742 335 266 207 0.617910448
5/28/2013 1768 241 207 146 0.605809129
5/29/2013 1860 272 146 161 0.591911765
5/30/2013 2596 841 161 499 0.59334126
5/31/2013 2837 703 499 NULL NULL
6/1/2013 12881 10372 0 5446 0.525067489
6/2/2013 14340 6584 5446 2781 0.422387606
6/3/2013 12222 3690 2781 1494 0.404878049
6/4/2013 25861 17254 1494 8912 0.516517909
从那张桌子上你可以看到5月31日我试图向前看#39;到6月1日,检索5月31日第一次到达的用户数,然后在6月1日再次返回,我得到一个NULL值。这发生在每个月的边界,无论我试图向前看的天数如何都会发生这种情况。因此,如果我向前看两天,那么我将在5月30日和5月31日获得NULL。
这是我写的SQL
SELECT
timestamp_session::date AS _date
, COUNT(DISTINCT dim_player_key) AS count_daily_active_users
, COUNT(DISTINCT CASE WHEN days_since_birth = 0 THEN dim_player_key ELSE NULL END) AS count_new_users
, COUNT(DISTINCT CASE WHEN days_since_birth != 0 THEN dim_player_key ELSE NULL END) AS count_returning_users
, COUNT(DISTINCT CASE WHEN days_since_birth = 1 THEN dim_player_key ELSE NULL END) AS day1_users -- note: the function is a LAG function instead of a LEAD function because of the sort order
, (NULLIF(LAG(COUNT(DISTINCT CASE WHEN days_since_birth = 0 THEN dim_player_key ELSE NULL END), 1) OVER (order by _date)::float, 0)) as AA
, (NULLIF(LAG(COUNT(DISTINCT CASE WHEN days_since_birth = 1 THEN dim_player_key ELSE NULL END), 1) OVER (order by _date)::float, 0)) as AB
, (NULLIF(LAG(COUNT(DISTINCT CASE WHEN days_since_birth = 0 THEN dim_player_key ELSE NULL END), 0) OVER (order by _date)::float, 0)) as BB
, (NULLIF(LAG(COUNT(DISTINCT CASE WHEN days_since_birth = 1 THEN dim_player_key ELSE NULL END), 0) OVER (order by _date)::float, 0)) as BA
FROM ( SELECT sessions_table.account_id AS dim_player_key,
sessions_table.session_id AS dim_session_key,
sessions_table.title_id AS dim_title_id,
sessions_table.appid AS dim_app_id,
sessions_table.loginip AS login_ip,
essions_table.logindate AS timestamp_session,
birthdate_table.birthdate AS timestamp_birthdate,
EXTRACT(EPOCH FROM (sessions_table.logindate - birthdate_table.birthdate)) AS count_age_in_seconds,
(date_part('day', sessions_table.logindate)- date_part('day', birthdate_table.birthdate)) AS days_since_birth
FROM
dataset.tablename1 AS sessions_table
JOIN (
SELECT
account_id,
MIN(logindate) AS birthdate
FROM
dataset.tablename1
GROUP BY
account_id )
-- call this sub-table the birthdate_table
birthdate_table ON
sessions_table.account_id = birthdate_table.account_id
-- call this table the outer_sessions_table
) AS outer_sessions_table
GROUP BY
_date
ORDER BY
_date ASC
我认为我可能需要做的是在内部选择中添加一个额外的字段,将日期报告为整数值 - 就像午夜的那个日期的EPOCH时间一样。但是当我尝试过(添加每天的纪元时间)时,它会将输出表中的所有值更改为1.而且我不明白为什么。
任何人都可以帮助我吗?
谢谢, 布拉德
答案 0 :(得分:0)
问题在于days_since_birth计算。我正在使用
(date_part('day',
sessions_table.logindate)- date_part('day',
birthdate_table.birthdate)) AS days_since_birth
好像它正在减去绝对日期以给我这些日期之间的差异,但它只是将日期转换为月中的某一天并减去它,所以在月份滚动时,它返回-27, -29,-30(取决于月份)。我可以通过ABS功能包装来解决这个问题。