Get a row number for consecutive states and reset it on change

Time: 2018-10-18 17:17:59

Tags: sql amazon-redshift window-functions

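I want to track each user's number of consecutive logins over the past few weeks. I tried row_number() over (partition by state order by week), but the row numbers do not reset when the state changes. Here is a sample table.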

Each row holds one user, one week, and that week's state:

user_id |     week     | state  
--------+--------------+-------
1       | 2018-01-01   | Active  
1       | 2018-01-08   | Inactive  
1       | 2018-01-15   | Inactive  
1       | 2018-01-22   | Active  
1       | 2018-01-29   | Active  
2       | 2018-01-01   | Inactive  
2       | 2018-01-08   | Active  
2       | 2018-01-15   | Inactive  
2       | 2018-01-22   | Active  
2       | 2018-01-29   | Active 

I want the output to look like this:

user_id |     week     |  state   | streak
--------+--------------+----------+---------
1000    | 2018-01-01   | Active   |  1
1000    | 2018-01-08   | Inactive |  1
1000    | 2018-01-15   | Inactive |  2
1000    | 2018-01-22   | Active   |  1
1000    | 2018-01-29   | Active   |  2
2000    | 2018-01-01   | Inactive |  1
2000    | 2018-01-08   | Active   |  1
2000    | 2018-01-15   | Inactive |  1
2000    | 2018-01-22   | Active   |  1
2000    | 2018-01-29   | Active   |  2

This is my current query:

SELECT
    week,
    user_id,
    state,
    row_number()
    OVER(PARTITION BY user_id, state
      order by user_id, week) AS streak
  FROM
    t.data_table
  GROUP BY 1,2,3
  order by week;

Any suggestions here would be helpful.

1 answer:

Answer 0: (score: 0)

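This is a gaps-and-islands problem. The strategy is to define groups of adjacent rows that share the same state, then enumerate the rows in each group with row_number().

One method uses the difference of row numbers: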

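select t.*,
       -- rows sharing (user_id, state, seqnum - seqnum_s) form one streak
       row_number() over (partition by user_id, state, seqnum - seqnum_s order by week) as streak
from (select t.*,
             -- position among all of the user's weeks
             row_number() over (partition by user_id order by week) as seqnum,
             -- position among the user's weeks that share this state
             row_number() over (partition by user_id, state order by week) as seqnum_s
      from t
     ) t;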

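Explaining how it works is a bit tricky. If you look at the results of the subquery, you will see how the difference between the two row numbers identifies each group of adjacent rows with the same state.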
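
To make the subquery results easier to inspect, here is a minimal sketch that runs the same row-number-difference query on the question's sample rows. It is only an illustration under several assumptions: it uses an in-memory SQLite database through Python's sqlite3 module rather than Redshift, it needs SQLite 3.25 or newer for window functions, the table is called t, the column is named state as in the question's table, and the extra grp column and the compute_streaks helper are names invented for this example.

```python
import sqlite3

# Sample rows taken from the question's table: (user_id, week, state).
ROWS = [
    (1, "2018-01-01", "Active"),
    (1, "2018-01-08", "Inactive"),
    (1, "2018-01-15", "Inactive"),
    (1, "2018-01-22", "Active"),
    (1, "2018-01-29", "Active"),
    (2, "2018-01-01", "Inactive"),
    (2, "2018-01-08", "Active"),
    (2, "2018-01-15", "Inactive"),
    (2, "2018-01-22", "Active"),
    (2, "2018-01-29", "Active"),
]

# Same technique as the answer, with the intermediate columns exposed.
# grp (hypothetical name) is the difference that identifies each island.
QUERY = """
select user_id, week, state, seqnum, seqnum_s,
       seqnum - seqnum_s as grp,
       row_number() over (partition by user_id, state, seqnum - seqnum_s
                          order by week) as streak
from (select t.*,
             row_number() over (partition by user_id order by week) as seqnum,
             row_number() over (partition by user_id, state order by week) as seqnum_s
      from t
     ) t
order by user_id, week
"""


def compute_streaks(rows):
    """Load rows into an in-memory table t and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (user_id integer, week text, state text)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    # Columns: user_id, week, state, seqnum, seqnum_s, grp, streak
    for row in compute_streaks(ROWS):
        print(row)
```

On this data grp comes out as 0, 1, 1, 2, 2 for each user, and the streak column matches the desired output in the question. Note that user 2 has an Active week and an Inactive week with the same grp value of 1, which is why state has to be part of the partition along with the difference.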