Counting records within different time ranges in pandas

Asked: 2017-09-09 14:59:52

Tags: python pandas group-by timestamp

I have a pandas DataFrame with 3 columns:

   no                                 id                  timestamp 
0   4  ab729f70-f3f3-4c57-94e5-e8408b2b0a80 2017-09-09 12:51:56.642810  
1   3  ab729f70-f3f3-4c57-94e5-e8408b2b0a80 2017-09-09 12:35:57.412720 
2   2  ab729f70-f3f3-4c57-94e5-e8408b2b0a80 2017-09-09 12:35:56.559890 
3   1  ab729f70-f3f3-4c57-94e5-e8408b2b0a80 2017-09-09 12:35:54.616122 

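The idea is to count the records (rows in the dataset) whose timestamp falls within the last 1 minute, 5 minutes, 15 minutes, 180 minutes, 1 day, 10 days, and 25 days. This should be very simple, but I have not managed to solve it. For example, I tried the TimeGrouper option, but instead of a single count for the specified window (say, the last 1 minute), it returns a count for every 1-minute bin across all records: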

df.groupby(pd.TimeGrouper(key='timestamp',freq='1Min')).count() 

Output:

                      no    id 
timestamp                            
2017-09-09 12:35:00   3      3 
2017-09-09 12:36:00   0      0 
2017-09-09 12:37:00   0      0 
2017-09-09 12:38:00   0      0
2017-09-09 12:39:00   0      0 
2017-09-09 12:40:00   0      0
2017-09-09 12:41:00   0      0
2017-09-09 12:42:00   0      0 
2017-09-09 12:43:00   0      0
2017-09-09 12:44:00   0      0 
2017-09-09 12:45:00   0      0 
2017-09-09 12:46:00   0      0 
2017-09-09 12:47:00   0      0 
2017-09-09 12:48:00   0      0 
2017-09-09 12:49:00   0      0 
2017-09-09 12:50:00   0      0 
2017-09-09 12:51:00   1      1 

1 Answer:

Answer 0: (score: 1)

Use DateOffset to get the datetime at the start of the window, then use between to build a boolean mask, and finally sum the True values to get the count:

now = pd.Timestamp.now()  # pd.datetime was removed in pandas 2.0
print (now)
2017-09-09 17:10:29.265217

print (now - pd.offsets.DateOffset(minutes=180))
2017-09-09 14:10:29.265217

a = df['timestamp'].between(now - pd.offsets.DateOffset(minutes=180), now).sum()
print (a)
0
b = df['timestamp'].between(now - pd.offsets.DateOffset(days=1), now).sum()
print (b)
4

If you need a custom datetime:

date = pd.to_datetime('2017-09-09 12:45:00')
print (date)
2017-09-09 12:45:00

c = df['timestamp'].between(date - pd.offsets.DateOffset(minutes=15), date).sum()
print (c)
3
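
To get all of the windows from the question in one pass, the same between-and-sum idea can be wrapped in a loop. The following is only a minimal sketch: count_in_windows is a hypothetical helper (not part of pandas), the window labels are arbitrary, pd.Timedelta is used in place of DateOffset for fixed-length windows, and the reference time is a fixed value chosen so the example is reproducible.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "no": [4, 3, 2, 1],
    "id": ["ab729f70-f3f3-4c57-94e5-e8408b2b0a80"] * 4,
    "timestamp": pd.to_datetime([
        "2017-09-09 12:51:56.642810",
        "2017-09-09 12:35:57.412720",
        "2017-09-09 12:35:56.559890",
        "2017-09-09 12:35:54.616122",
    ]),
})


def count_in_windows(frame, end, windows):
    """Hypothetical helper: count rows whose timestamp lies in [end - delta, end]."""
    counts = {}
    for label, delta in windows.items():
        # between() includes both endpoints by default.
        mask = frame["timestamp"].between(end - delta, end)
        counts[label] = int(mask.sum())
    return pd.Series(counts)


# Look-back windows listed in the question; the labels are arbitrary.
windows = {
    "1min": pd.Timedelta(minutes=1),
    "5min": pd.Timedelta(minutes=5),
    "15min": pd.Timedelta(minutes=15),
    "180min": pd.Timedelta(minutes=180),
    "1d": pd.Timedelta(days=1),
    "10d": pd.Timedelta(days=10),
    "25d": pd.Timedelta(days=25),
}

# Assumed reference time; replace with pd.Timestamp.now() for live data.
end = pd.Timestamp("2017-09-09 12:52:00")
result = count_in_windows(df, end, windows)
print(result)
```

With the sample rows and this reference time, the 1, 5, and 15 minute windows each contain one row, and the longer windows contain all four.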