I have a pandas DataFrame in the following format:
datetime name mtd code
0 2017-09-07 00:00:08 profile/log GET 300
1 2017-09-07 00:00:17 profile/log PUT 300
3 2017-09-07 00:00:19 unknown PUT 200
4 2017-09-07 00:00:21 extras/dashboard GET 300
5 2017-09-07 00:00:23 extras/stats GET 300
6 2017-09-07 00:00:26 extras/dashboard GET 300
7 2017-09-07 00:00:29 extras/authz-profile/check GET 200
8 2017-09-07 00:00:34 about PUT 300
9 2017-09-07 00:00:36 extras/fav GET 304
2 2017-09-07 00:00:44 extras/store GET 200
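What I want to do is count, for every 5-second interval from 2017-09-07 00:00:10 to 2017-09-07 00:00:40, how many times each name - mtd pair received a response code starting with 3. The ideal output is: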
datetime_start pair 3??_count
2017-09-07 00:00:10 profile/log - GET 2
2017-09-07 00:00:15 - 0
2017-09-07 00:00:20 extras/dashboard - GET 1
2017-09-07 00:00:20 extras/stats - GET 1
2017-09-07 00:00:25 extras/dashboard - GET 1
2017-09-07 00:00:30 about - PUT 1
2017-09-07 00:00:35 extras/fav - GET 1
2017-09-07 00:00:40 - 0
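How can I do this with pandas?
I wrote a piece of code that creates the time periods shown in the desired output table, but I don't know how to count the 3?? codes for each name - mtd pair within each 5-second interval. I would greatly appreciate any help!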
data['datetime_start'] = pd.date_range(start="2017-09-07 00:00:10", end="2017-09-07 00:00:40", freq="5S")
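Answer 0 (score: 1)
Create a start_date column by flooring each timestamp to the start of its 5-second interval:
import datetime
df['start_date'] = df['datetime'].apply(lambda dt: datetime.datetime(dt.year, dt.month, dt.day, dt.hour, dt.minute, 5 * (dt.second // 5)))
Then you can aggregate: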
df.groupby(['start_date','name','mtd']).size()
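Note that the aggregation above counts every row, not only the 3xx responses the question asks about. A minimal sketch of the same idea with that filter added follows. It assumes the `code` column holds integers and that `datetime` is already a datetime64 column, and it uses the vectorized `dt.floor` in place of the `apply` call. The column name `count_3xx` is chosen here for illustration only.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2017-09-07 00:00:08", "2017-09-07 00:00:17", "2017-09-07 00:00:19",
        "2017-09-07 00:00:21", "2017-09-07 00:00:23", "2017-09-07 00:00:26",
        "2017-09-07 00:00:29", "2017-09-07 00:00:34", "2017-09-07 00:00:36",
        "2017-09-07 00:00:44",
    ]),
    "name": ["profile/log", "profile/log", "unknown", "extras/dashboard",
             "extras/stats", "extras/dashboard", "extras/authz-profile/check",
             "about", "extras/fav", "extras/store"],
    "mtd": ["GET", "PUT", "PUT", "GET", "GET", "GET", "GET", "PUT", "GET", "GET"],
    "code": [300, 300, 200, 300, 300, 300, 200, 300, 304, 200],
})

# Keep only 3xx responses (assumes integer codes, so 300..399).
is_3xx = df["code"].between(300, 399)

# Floor each timestamp to the start of its 5-second interval.
df["start_date"] = df["datetime"].dt.floor("5s")

# Count the remaining rows per interval and name/mtd combination.
counts = (
    df[is_3xx]
    .groupby(["start_date", "name", "mtd"])
    .size()
    .rename("count_3xx")
    .reset_index()
)
print(counts)
```

Intervals with no 3xx response do not appear in this result, because `groupby` only emits groups that exist in the data.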
Answer 1 (score: 0)
Here is one way to approach this.
Create a column that combines name and mtd, like so:
df['pair'] = df['name']+' - '+df['mtd']
Then use a PeriodIndex to specify the time periods by which the datetime column is grouped, like so:
res = df.groupby([pd.PeriodIndex(df.datetime.dt.round('5s'),freq='5S'),
'pair'])['pair'].count()
Output:
datetime pair
2017-09-07 00:00:10 profile/log - GET 1
2017-09-07 00:00:15 profile/log - PUT 1
2017-09-07 00:00:20 extras/dashboard - GET 1
unknown - PUT 1
2017-09-07 00:00:25 extras/dashboard - GET 1
extras/stats - GET 1
2017-09-07 00:00:30 extras/authz-profile/check - GET 1
2017-09-07 00:00:35 about - PUT 1
extras/fav - GET 1
2017-09-07 00:00:45 extras/store - GET 1
Name: pair, dtype: int64
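Two things in this output differ from what the question asks for: `dt.round('5s')` snaps each timestamp to the nearest 5-second mark instead of the start of its interval, and non-3xx rows are counted too. The sketch below is one possible variation, not the answerer's method. It floors the timestamps, keeps only 3xx codes (assuming integer codes), and left-merges onto a `date_range` of the requested window so that empty intervals show up with a zero count. The names `window`, `result`, and `count_3xx` are illustrative.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2017-09-07 00:00:08", "2017-09-07 00:00:17", "2017-09-07 00:00:19",
        "2017-09-07 00:00:21", "2017-09-07 00:00:23", "2017-09-07 00:00:26",
        "2017-09-07 00:00:29", "2017-09-07 00:00:34", "2017-09-07 00:00:36",
        "2017-09-07 00:00:44",
    ]),
    "name": ["profile/log", "profile/log", "unknown", "extras/dashboard",
             "extras/stats", "extras/dashboard", "extras/authz-profile/check",
             "about", "extras/fav", "extras/store"],
    "mtd": ["GET", "PUT", "PUT", "GET", "GET", "GET", "GET", "PUT", "GET", "GET"],
    "code": [300, 300, 200, 300, 300, 300, 200, 300, 304, 200],
})
df["pair"] = df["name"] + " - " + df["mtd"]

# round() snaps to the nearest 5-second mark; floor() snaps to the interval start.
rounded = df["datetime"].dt.round("5s")   # 00:00:08 becomes 00:00:10
floored = df["datetime"].dt.floor("5s")   # 00:00:08 becomes 00:00:05

# Count 3xx responses per (interval start, pair), using the floored timestamps.
hits = df[df["code"].between(300, 399)].assign(datetime_start=floored)
counts = (
    hits.groupby(["datetime_start", "pair"])
    .size()
    .rename("count_3xx")
    .reset_index()
)

# Every interval start in the requested window, including empty ones.
window = pd.DataFrame({
    "datetime_start": pd.date_range("2017-09-07 00:00:10",
                                    "2017-09-07 00:00:40", freq="5s")
})

# Left-merge so intervals without any 3xx hit survive with a zero count.
result = window.merge(counts, on="datetime_start", how="left")
result["pair"] = result["pair"].fillna("")
result["count_3xx"] = result["count_3xx"].fillna(0).astype(int)
print(result)
```

With floored bins, the 00:00:08 row belongs to the 00:00:05 interval and therefore falls outside the requested window, so the 00:00:10 interval comes out empty for this sample data.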