Possibly related: pandas dataframe group year index by decade
For example, suppose I have the following data:
                     status  bytes_sent  upstream_cache_status
timestamp
2014-05-26 23:56:30     200         356                   MISS
2014-05-26 23:56:30     200       10517                      -
2014-05-26 23:57:05     200        6923                   MISS
2014-05-26 23:57:14     200         323                      -
2014-05-26 23:57:30     200         356                   MISS
2014-05-26 23:57:38     200        8107                    HIT
2014-05-26 23:57:43     200         369                   MISS
2014-05-26 23:57:56     304         401                    HIT
2014-05-26 23:57:56     304         401                    HIT
2014-05-26 23:57:56     304         387                   MISS
2014-05-26 23:57:57     304         401                    HIT
2014-05-26 23:57:58     304         401                    HIT
2014-05-26 23:58:08     200         507                EXPIRED
2014-05-26 23:58:29     304         338                    HIT
2014-05-26 23:58:31     400         409                      -
2014-05-26 23:58:45     200         425                   MISS
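If I want to group the rows so that each group contains the log entries that fall within the same 30-second window (the window length is user-specified), how can I do that? I have seen this: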
df.groupby(lambda x: x.hour)
but I strongly doubt it applies to my case.
Answer 0 (score: 1)
df.groupby(pd.Grouper(freq='30S', level=0))
should do it; for example:
>>> aggr = lambda df: df.apply(tuple)
>>> df.groupby(pd.Grouper(freq='30S', level=0)).aggregate(aggr)
                                                       status                                 bytes_sent  \
timestamp
2014-05-26 23:56:30                                (200, 200)                               (356, 10517)
2014-05-26 23:57:00                                (200, 200)                                (6923, 323)
2014-05-26 23:57:30  (200, 200, 200, 304, 304, 304, 304, 304)  (356, 8107, 369, 401, 401, 387, 401, 401)
2014-05-26 23:58:00                                (200, 304)                                 (507, 338)
2014-05-26 23:58:30                                (400, 200)                                 (409, 425)

                                           upstream_cache_status
timestamp
2014-05-26 23:56:30                                    (MISS, -)
2014-05-26 23:57:00                                    (MISS, -)
2014-05-26 23:57:30  (MISS, HIT, MISS, HIT, HIT, MISS, HIT, HIT)
2014-05-26 23:58:00                               (EXPIRED, HIT)
2014-05-26 23:58:30                                    (-, MISS)
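
Since the window length is meant to be user-specified, here is a minimal self-contained sketch of the same idea with the width passed in as a number of seconds. A few things in it are my own assumptions rather than part of the answer above: the helper name `group_by_seconds` is hypothetical, the sample frame only reuses five rows from the question, and the per-bin aggregation is a count and a sum of `bytes_sent` instead of the tuple collection. It also spells the frequency as lowercase `"30s"`, because recent pandas releases deprecate the uppercase `"S"` alias.

```python
import pandas as pd

# Small sample taken from the question's log, indexed by timestamp.
df = pd.DataFrame(
    {
        "status": [200, 200, 200, 200, 304],
        "bytes_sent": [356, 10517, 6923, 323, 401],
        "upstream_cache_status": ["MISS", "-", "MISS", "-", "HIT"],
    },
    index=pd.to_datetime(
        [
            "2014-05-26 23:56:30",
            "2014-05-26 23:56:30",
            "2014-05-26 23:57:05",
            "2014-05-26 23:57:14",
            "2014-05-26 23:57:56",
        ]
    ),
)
df.index.name = "timestamp"


def group_by_seconds(frame, seconds):
    """Group rows into fixed-width time bins (hypothetical helper).

    `seconds` is the user-chosen bin width; the index must be datetime-like.
    """
    # Build the frequency string from the width, e.g. 30 -> "30s".
    return frame.groupby(pd.Grouper(freq=f"{seconds}s", level=0))


grouped = group_by_seconds(df, 30)

# Per-bin request count and total bytes sent.
summary = grouped["bytes_sent"].agg(["count", "sum"])
print(summary)
```

Note that the bins are aligned to clock boundaries (`:00` and `:30` for a 30-second width), not to the first timestamp in the data, which is why the row at 23:57:05 lands in the bin labelled 23:57:00.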