Using the LAG() function to retrieve dates

Time: 2016-10-03 17:52:43

Tags: postgresql

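I am trying to use the LAG() and LEAD() functions in Postgres to retrieve values from other rows of a table, and I am running into some difficulty. The functions work fine as long as LAG or LEAD only looks at dates within the same month (for example, June 2 can look back to June 1), but when I try to look back from June 1 to May 31, I get a NULL value.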

This is what the table looks like:

_date      count_daily_active_users  count_new_users  day1_users  users_arriving_today_who_returned_tomrrow  day_retained_users
5/27/2013  1742                      335              266         207                                        0.617910448
5/28/2013  1768                      241              207         146                                        0.605809129
5/29/2013  1860                      272              146         161                                        0.591911765
5/30/2013  2596                      841              161         499                                        0.59334126
5/31/2013  2837                      703              499         NULL                                       NULL
6/1/2013   12881                     10372            0           5446                                       0.525067489
6/2/2013   14340                     6584             5446        2781                                       0.422387606
6/3/2013   12222                     3690             2781        1494                                       0.404878049
6/4/2013   25861                     17254            1494        8912                                       0.516517909

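From that table you can see that on May 31, where I try to 'look forward' to June 1 to retrieve the number of users who first arrived on May 31 and then returned on June 1, I get a NULL value. This happens at every month boundary, regardless of how many days ahead I try to look. So if I look ahead two days, I get NULL for both May 30 and May 31.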

Here is the SQL I wrote:

SELECT
  timestamp_session::date AS _date
  , COUNT(DISTINCT dim_player_key) AS count_daily_active_users
  , COUNT(DISTINCT CASE WHEN days_since_birth = 0 THEN dim_player_key ELSE NULL END) AS count_new_users
  , COUNT(DISTINCT CASE WHEN days_since_birth != 0 THEN dim_player_key ELSE NULL END) AS count_returning_users
  , COUNT(DISTINCT CASE WHEN days_since_birth = 1 THEN dim_player_key ELSE NULL END) AS day1_users
  -- note: the function is a LAG function instead of a LEAD function because of the sort order
  , (NULLIF(LAG(COUNT(DISTINCT CASE WHEN days_since_birth = 0 THEN dim_player_key ELSE NULL END), 1) OVER (order by _date)::float, 0)) as AA
  , (NULLIF(LAG(COUNT(DISTINCT CASE WHEN days_since_birth = 1 THEN dim_player_key ELSE NULL END), 1) OVER (order by _date)::float, 0)) as AB
  , (NULLIF(LAG(COUNT(DISTINCT CASE WHEN days_since_birth = 0 THEN dim_player_key ELSE NULL END), 0) OVER (order by _date)::float, 0)) as BB
  , (NULLIF(LAG(COUNT(DISTINCT CASE WHEN days_since_birth = 1 THEN dim_player_key ELSE NULL END), 0) OVER (order by _date)::float, 0)) as BA    

FROM ( SELECT    sessions_table.account_id AS dim_player_key,
    sessions_table.session_id AS dim_session_key,
    sessions_table.title_id AS dim_title_id,
    sessions_table.appid AS dim_app_id,
    sessions_table.loginip AS login_ip,
    sessions_table.logindate AS timestamp_session,
    birthdate_table.birthdate AS timestamp_birthdate,    
    EXTRACT(EPOCH FROM (sessions_table.logindate - birthdate_table.birthdate)) AS count_age_in_seconds,
    (date_part('day', sessions_table.logindate)- date_part('day', birthdate_table.birthdate)) AS days_since_birth    

  FROM
    dataset.tablename1 AS sessions_table  
    JOIN ( 
      SELECT      
      account_id,
      MIN(logindate) AS birthdate
    FROM
      dataset.tablename1    
      GROUP BY
      account_id )
    -- call this sub-table the birthdate_table
    birthdate_table  ON
    sessions_table.account_id = birthdate_table.account_id
    -- call this table the outer_sessions_table
    ) AS outer_sessions_table
GROUP BY
  _date
ORDER BY
  _date ASC

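I think what I may need to do is add an extra field to the inner SELECT that reports the date as an integer value, such as the epoch time of that date at midnight. But when I tried that (adding the epoch time for each day), it changed all the values in the output table to 1, and I don't understand why.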

Can anyone help me?

Thanks, Brad

1 Answer:

Answer 0: (score: 0)

The problem was in the days_since_birth calculation. I was using

    (date_part('day', sessions_table.logindate) - date_part('day', birthdate_table.birthdate)) AS days_since_birth

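as if it were subtracting absolute dates to give me the difference between them, but it only converts each date to its day of the month and subtracts those, so when the month rolls over it returns -27, -29, or -30 (depending on the month). I can fix this by wrapping it in the ABS function.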
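The month-boundary behavior can be reproduced in isolation. The following is a minimal sketch, assuming PostgreSQL; the inline `d` row set and its timestamps are made up for illustration and are not taken from the original tables. It compares the day-of-month subtraction with a plain date subtraction. Note that ABS alone would turn -30 into 30 rather than 1, so casting both timestamps to `date` and subtracting them may be the more reliable way to get whole elapsed days.

```sql
-- Hypothetical sample rows: one login that crosses a month boundary,
-- and one that stays within the same month.
SELECT
    d.logindate,
    d.birthdate,
    -- Original approach: subtracts day-of-month numbers only.
    date_part('day', d.logindate) - date_part('day', d.birthdate)      AS day_of_month_diff,
    -- Wrapping the same expression in ABS only flips the sign.
    abs(date_part('day', d.logindate) - date_part('day', d.birthdate)) AS abs_day_of_month_diff,
    -- Subtracting two date values yields an integer number of days.
    d.logindate::date - d.birthdate::date                              AS days_since_birth
FROM (VALUES
    (TIMESTAMP '2013-06-01 08:00', TIMESTAMP '2013-05-31 21:00'),
    (TIMESTAMP '2013-06-02 08:00', TIMESTAMP '2013-06-01 21:00')
) AS d(logindate, birthdate);

-- First row  (May 31 -> June 1): day_of_month_diff = -30, abs_day_of_month_diff = 30, days_since_birth = 1
-- Second row (June 1 -> June 2): day_of_month_diff = 1,   abs_day_of_month_diff = 1,  days_since_birth = 1
```

With `days_since_birth` computed this way, the `days_since_birth = 1` filter in the outer query would also match users who return on the first day of the following month.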