I have a pandas DataFrame with 3 columns:
   no                                    id                  timestamp
0   4  ab729f70-f3f3-4c57-94e5-e8408b2b0a80 2017-09-09 12:51:56.642810
1   3  ab729f70-f3f3-4c57-94e5-e8408b2b0a80 2017-09-09 12:35:57.412720
2   2  ab729f70-f3f3-4c57-94e5-e8408b2b0a80 2017-09-09 12:35:56.559890
3   1  ab729f70-f3f3-4c57-94e5-e8408b2b0a80 2017-09-09 12:35:54.616122
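The idea is to count the records (rows in the dataset) whose timestamp falls within the last 1 minute, 5 minutes, 15 minutes, 180 minutes, 1 day, 10 days and 25 days. This should be fairly simple, but I have not managed to solve it. For example, I tried TimeGrouper, but that gives me a count for every interval of the given size (say 1 minute) across all records, rather than a single count for the most recent interval: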
df.groupby(pd.TimeGrouper(key='timestamp',freq='1Min')).count()
Output:
                     no  id
timestamp
2017-09-09 12:35:00   3   3
2017-09-09 12:36:00   0   0
2017-09-09 12:37:00   0   0
2017-09-09 12:38:00   0   0
2017-09-09 12:39:00   0   0
2017-09-09 12:40:00   0   0
2017-09-09 12:41:00   0   0
2017-09-09 12:42:00   0   0
2017-09-09 12:43:00   0   0
2017-09-09 12:44:00   0   0
2017-09-09 12:45:00   0   0
2017-09-09 12:46:00   0   0
2017-09-09 12:47:00   0   0
2017-09-09 12:48:00   0   0
2017-09-09 12:49:00   0   0
2017-09-09 12:50:00   0   0
2017-09-09 12:51:00   1   1
Answer 0 (score: 1)
Use DateOffset to get the earlier datetime, then build a boolean mask with between, and finally count the True values with sum:
now = pd.Timestamp.now()
print (now)
2017-09-09 17:10:29.265217
print (now - pd.offsets.DateOffset(minutes=180))
2017-09-09 14:10:29.265217
a = df['timestamp'].between(now - pd.offsets.DateOffset(minutes=180), now).sum()
print (a)
0
b = df['timestamp'].between(now - pd.offsets.DateOffset(days=1), now).sum()
print (b)
4
If you need a custom datetime:
date = pd.to_datetime('2017-09-09 12:45:00')
print (date)
2017-09-09 12:45:00
c = df['timestamp'].between(date - pd.offsets.DateOffset(minutes=15), date).sum()
print (c)
3
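The question asks for several windows at once, so the between approach above can be wrapped in a small loop. The following is only a sketch: the helper name count_recent, the window labels, and the fixed reference time are assumptions made for illustration, and pd.Timedelta is used in place of DateOffset because every window here has a fixed length.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "no": [4, 3, 2, 1],
    "id": ["ab729f70-f3f3-4c57-94e5-e8408b2b0a80"] * 4,
    "timestamp": pd.to_datetime([
        "2017-09-09 12:51:56.642810",
        "2017-09-09 12:35:57.412720",
        "2017-09-09 12:35:56.559890",
        "2017-09-09 12:35:54.616122",
    ]),
})

# Hypothetical labels mapped to the window lengths listed in the question.
windows = {
    "1min": pd.Timedelta(minutes=1),
    "5min": pd.Timedelta(minutes=5),
    "15min": pd.Timedelta(minutes=15),
    "180min": pd.Timedelta(minutes=180),
    "1d": pd.Timedelta(days=1),
    "10d": pd.Timedelta(days=10),
    "25d": pd.Timedelta(days=25),
}


def count_recent(frame, end, windows):
    """Count rows whose timestamp lies in [end - window, end], per window."""
    ts = frame["timestamp"]
    return pd.Series(
        {label: int(ts.between(end - delta, end).sum())
         for label, delta in windows.items()}
    )


# A fixed reference time keeps the result reproducible;
# pd.Timestamp.now() could be used instead for live data.
end = pd.Timestamp("2017-09-09 12:52:00")
counts = count_recent(df, end, windows)
print(counts)
```

With this reference time, only the 12:51:56 row falls inside the 1, 5 and 15 minute windows, while the 180 minute and longer windows include all four rows.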