Counting a series of events within 30 days

Time: 2019-03-02 06:41:42

Tags: python pandas

I'm using Python and pandas. I have a DataFrame that looks like this:

codename    date         
AAA         13-03-2015   
AAB         20-02-2015   
AAB         15-04-2015  
AAB         20-04-2015  
AAB         21-04-2015  
AAB         21-05-2015  

I'm looking for help with counting a series of events within a 30-day window. The table below illustrates what I'd like to achieve:

codename    date         daysBetween  series
AAA         13-03-2015   NaN          1
AAB         20-02-2015   NaN          1    
AAB         15-04-2015   54           1
AAB         20-04-2015   5            0
AAB         21-04-2015   6            0 
AAB         21-05-2015   36           1

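If more than 30 days have passed between cell 1 (20-02-2015) and the next cell (15-04-2015), calculate the number of days between them (54 days), put the result in daysBetween, and put 1 in series.

If the gap between the two cells is 30 days or less, calculate the number of days and put 0 in series.

Each date should be compared with the last date whose series value is 1.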


I've managed to sort by codename and date:

import pandas as pd

file = pd.read_excel('sample.xlsx')

sortedData = file.sort_values(by=['codename', 'date'])

1 Answer:

Answer 0 (score: 0)

I think you need to compare the values with Series.gt and then convert the resulting True/False to 1/0 integers with astype:

# convert the date column from dd-mm-YYYY strings to datetimes
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
# sort by codename, then chronologically within each codename
df = df.sort_values(by=['codename', 'date'])
# days elapsed since the first date of each codename group
df['daysBetween'] = df['date'].sub(df.groupby('codename')['date'].transform('first')).dt.days
# compare with gt (>) and cast the boolean result to int
df['series'] = df['daysBetween'].gt(30).astype(int)
print (df)
  codename       date  daysBetween  series
0      AAA 2015-03-13            0       0
1      AAB 2015-02-20            0       0
2      AAB 2015-04-15           54       1
3      AAB 2015-04-20           59       1
4      AAB 2015-04-21           60       1
5      AAB 2015-05-21           90       1

If you need the difference between consecutive values within each group instead:

df['daysBetween'] = df.groupby('codename')['date'].diff().dt.days
df['series'] = df['daysBetween'].gt(30).astype(int)
print (df)
  codename       date  daysBetween  series
0      AAA 2015-03-13          NaN       0
1      AAB 2015-02-20          NaN       0
2      AAB 2015-04-15         54.0       1
3      AAB 2015-04-20          5.0       0
4      AAB 2015-04-21          1.0       0
5      AAB 2015-05-21         30.0       0
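
Neither output above matches the table in the question, because the question compares each date with the last date whose series is 1, not with the group's first date or the previous row. That rule depends on the running result, so the following is only a minimal sketch using a plain loop per group. The helper name mark_series and its window parameter are hypothetical, the sample frame is rebuilt inline from the question's data, and the first row of each codename is treated as starting a series (NaN, 1), as in the question's table.

```python
import pandas as pd

# Sample data copied from the question (built inline instead of read_excel).
df = pd.DataFrame({
    'codename': ['AAA', 'AAB', 'AAB', 'AAB', 'AAB', 'AAB'],
    'date': ['13-03-2015', '20-02-2015', '15-04-2015',
             '20-04-2015', '21-04-2015', '21-05-2015'],
})
df['date'] = pd.to_datetime(df['date'], format='%d-%m-%Y')
df = df.sort_values(by=['codename', 'date']).reset_index(drop=True)


def mark_series(dates, window=30):
    """Hypothetical helper: compare each date with the last date flagged 1."""
    days, flags = [], []
    anchor = None  # last date whose series value is 1
    for d in dates:
        if anchor is None:
            # First row of the group: nothing to compare with, starts a series.
            days.append(float('nan'))
            flags.append(1)
            anchor = d
            continue
        gap = (d - anchor).days
        days.append(gap)
        if gap > window:
            # More than `window` days since the anchor: new series, move anchor.
            flags.append(1)
            anchor = d
        else:
            flags.append(0)
    return pd.DataFrame({'daysBetween': days, 'series': flags},
                        index=dates.index)


# Apply the helper to each codename's dates and align back on the row index.
parts = [mark_series(g) for _, g in df.groupby('codename')['date']]
df = df.join(pd.concat(parts))
print(df)
```

For the sample data this should give daysBetween of NaN, NaN, 54, 5, 6, 36 and series of 1, 1, 1, 0, 0, 1, which is the table the question asks for. The loop is not vectorized, so it may be slow on very large groups.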