So, I would like to track streaks for users over the past few weeks, that is, how many consecutive weeks a user has been in the same state. I have tried row_number() over (partition by state order by week), but the row numbers do not reset when the state changes. Here is a sample table:
user_id | week | state
--------+--------------+-------
1 | 2018-01-01 | Active
1 | 2018-01-08 | Inactive
1 | 2018-01-15 | Inactive
1 | 2018-01-22 | Active
1 | 2018-01-29 | Active
2 | 2018-01-01 | Inactive
2 | 2018-01-08 | Active
2 | 2018-01-15 | Inactive
2 | 2018-01-22 | Active
2 | 2018-01-29 | Active
I would like the output to look like this:
user_id | week | state | streak
--------+--------------+----------+---------
1000 | 2018-01-01 | Active | 1
1000 | 2018-01-08 | Inactive | 1
1000 | 2018-01-15 | Inactive | 2
1000 | 2018-01-22 | Active | 1
1000 | 2018-01-29 | Active | 2
2000 | 2018-01-01 | Inactive | 1
2000 | 2018-01-08 | Active | 1
2000 | 2018-01-15 | Inactive | 1
2000 | 2018-01-22 | Active | 1
2000 | 2018-01-29 | Active | 2
This is my current query:
SELECT
week,
user_id,
state,
row_number()
OVER(PARTITION BY user_id, state
order by user_id, week) AS streak
FROM
t.data_table
GROUP BY 1,2,3
order by week;
Any suggestions here would be helpful.
Answer 0 (score: 0)
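This is a gaps-and-islands problem. The strategy is to define groups of adjacent rows that share the same state, and then enumerate the rows within each group using row_number().
One approach uses a difference of row numbers: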
select t.*,
       -- restart the count for every run of identical states
       row_number() over (partition by user_id, state, seqnum - seqnum_s order by week) as streak
from (select t.*,
             -- seqnum counts all weeks per user; seqnum_s counts weeks per user and state
             row_number() over (partition by user_id order by week) as seqnum,
             row_number() over (partition by user_id, state order by week) as seqnum_s
      from t
     ) t;
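Explaining how it works is a bit tricky. If you look at the results of the subquery, you will see how the difference between the two row numbers stays constant within each run of consecutive rows that share a state, so that difference identifies each group.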
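To make the subquery results easier to inspect, here is a minimal sketch that runs the same query against the sample rows from the question. It assumes an in-memory SQLite database (window functions need SQLite 3.25 or newer), a table named t with the column named state as in the question, and a helper called compute_streaks that is purely illustrative. The extra grp column is just seqnum - seqnum_s, shown so the grouping is visible.

```python
import sqlite3

# Sample rows copied from the question's table.
ROWS = [
    (1, "2018-01-01", "Active"),
    (1, "2018-01-08", "Inactive"),
    (1, "2018-01-15", "Inactive"),
    (1, "2018-01-22", "Active"),
    (1, "2018-01-29", "Active"),
    (2, "2018-01-01", "Inactive"),
    (2, "2018-01-08", "Active"),
    (2, "2018-01-15", "Inactive"),
    (2, "2018-01-22", "Active"),
    (2, "2018-01-29", "Active"),
]

# Same difference-of-row-numbers query, with the intermediate columns exposed.
QUERY = """
select user_id, week, state, seqnum, seqnum_s,
       seqnum - seqnum_s as grp,
       row_number() over (partition by user_id, state, seqnum - seqnum_s
                          order by week) as streak
from (select t.*,
             row_number() over (partition by user_id order by week) as seqnum,
             row_number() over (partition by user_id, state order by week) as seqnum_s
      from t
     ) s
order by user_id, week
"""


def compute_streaks(rows):
    """Load rows into an in-memory table t and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table t (user_id integer, week text, state text)")
    conn.executemany("insert into t values (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in compute_streaks(ROWS):
        print(row)
```

For user 1 the grp values come out as 0, 1, 1, 2, 2: the two Inactive weeks share grp 1 and the last two Active weeks share grp 2, while the first Active week sits alone in grp 0. Partitioning by user_id, state, and grp is what makes the final row_number() restart at each change of state.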