Computing a per-user sequence by date with Pandas in Python

Asked: 2017-04-04 09:34:26

Tags: python pandas

I have a DataFrame:

Data_c       User  Rank  sequence_in_progress

 15-03-2017   2     0         0
 15-03-2017   1     1         0
 16-03-2017   2     0         0
 17-03-2017   2     1         0
 18-03-2017   1     0         0

Now I want to replace the values in the "sequence_in_progress" column with a running sequence number for each user, taking the dates into account.

The result should be:

  Data_c     User  Rank  sequence_in_progress

 15-03-2017   2     0         1
 15-03-2017   1     1         1
 16-03-2017   2     0         2
 17-03-2017   2     1         3
 18-03-2017   1     0         2

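In other words, "sequence_in_progress" gives the order in which user "x" chose something, as of the given date: 1 for the user's first row, 2 for the second, and so on.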
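The rule above amounts to a running counter per user. As a minimal plain-Python sketch (assuming the rows are already in chronological order; `running_sequence` is a hypothetical helper, not part of pandas), it reproduces the expected column:

```python
# (Data_c, User) pairs in the order they appear in the frame;
# assumed to already be sorted by date.
rows = [("15-03-2017", 2), ("15-03-2017", 1), ("16-03-2017", 2),
        ("17-03-2017", 2), ("18-03-2017", 1)]

def running_sequence(rows):
    """Hypothetical helper: for each row, return how many rows the same
    user has had so far, counting from 1."""
    seen = {}  # user -> number of rows already counted for that user
    out = []
    for _date, user in rows:
        seen[user] = seen.get(user, 0) + 1
        out.append(seen[user])
    return out

print(running_sequence(rows))  # [1, 1, 2, 3, 2]
```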

Thanks in advance for your help.

1 answer:

Answer 0 (score: 1)

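I would use pandas groupby. Note that this solution works for any number of users. It numbers each user's rows in the order they appear, so it assumes the frame is already sorted by date.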

import pandas as pd

cc = ['Data_c', 'User', 'Rank']
vals = [['15-03-2017',   2,     0],
        ['15-03-2017',   1,     1],
        ['16-03-2017',   2,     0],
        ['17-03-2017',   2,     1],
        ['18-03-2017',   1,     0]]

frame = pd.DataFrame(vals, columns=cc)

# Create the sequence (1, ..., N) for each user; each group keeps its rows in
# their original order, so the numbering follows the row order of the frame
users_sequence = [group.assign(sequence=range(1, len(group) + 1))
                  for key, group in frame.groupby('User')]

# Put everything together, using reindex to restore the row order of the original frame
result = pd.concat(users_sequence, axis=0).reindex(frame.index)
print(result)

       Data_c  User  Rank  sequence
0  15-03-2017     2     0         1
1  15-03-2017     1     1         1
2  16-03-2017     2     0         2
3  17-03-2017     2     1         3
4  18-03-2017     1     0         2
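A shorter variant, offered as a sketch rather than as the answerer's method, uses `GroupBy.cumcount`, which numbers the rows within each group 0, 1, 2, ... in order. Unlike the answer above, it first sorts by the parsed date, assuming `Data_c` is always day-first `dd-mm-yyyy`, and it writes straight into `sequence_in_progress` as the question asks:

```python
import pandas as pd

frame = pd.DataFrame(
    [["15-03-2017", 2, 0], ["15-03-2017", 1, 1], ["16-03-2017", 2, 0],
     ["17-03-2017", 2, 1], ["18-03-2017", 1, 0]],
    columns=["Data_c", "User", "Rank"],
)

# Parse the day-first date strings (assumed format) so rows sort chronologically.
dates = pd.to_datetime(frame["Data_c"], format="%d-%m-%Y")

# A stable sort (mergesort) keeps the original row order among equal dates.
order = dates.sort_values(kind="mergesort").index

# cumcount numbers each user's rows 0, 1, 2, ... in date order; +1 makes it 1-based.
seq = frame.loc[order].groupby("User").cumcount() + 1

# Assignment aligns on the index, so each value lands back on its original row.
frame["sequence_in_progress"] = seq
print(frame)
```

Because the assignment aligns on the index, no `concat`/`reindex` step is needed, and the result does not depend on the input rows being pre-sorted.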