I want to resample my data per minute to find out how many Pass and Fail results occur in each minute.
The current data looks like this:
timeStamp Results
1589443200000 Pass
1589443201000 Fail
1589443202000 Pass
1589443203000 Pass
1589443204000 Pass
1589443321000 Pass
1589443325000 Fail
The desired output looks like this:
time Result Count
8:01:00 Pass 4
8:01:00 Fail 1
8:02:00 Pass 1
8:02:00 Fail 1
Answer 0 (score: 2)
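First convert the Unix timestamps (in milliseconds) to datetimes with to_datetime, then group by minute with Grouper and count the rows in each group with GroupBy.size: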
df['timeStamp'] = pd.to_datetime(df['timeStamp'], unit='ms')
df1 = (df.groupby([pd.Grouper(key='timeStamp', freq='Min'), 'Results'])
         .size()
         .reset_index(name='Count'))
print(df1)
timeStamp Results Count
0 2020-05-14 08:00:00 Fail 1
1 2020-05-14 08:00:00 Pass 4
2 2020-05-14 08:02:00 Fail 1
3 2020-05-14 08:02:00 Pass 1
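For anyone who wants to reproduce this, here is a minimal self-contained sketch of the same approach. The DataFrame is built by hand from the sample rows in the question, and the frequency is written as 'min', which I am assuming is the spelling that current pandas versions prefer for a one-minute bin.

```python
import pandas as pd

# Sample rows copied from the question; timeStamp is in epoch milliseconds.
df = pd.DataFrame({
    "timeStamp": [1589443200000, 1589443201000, 1589443202000,
                  1589443203000, 1589443204000, 1589443321000,
                  1589443325000],
    "Results": ["Pass", "Fail", "Pass", "Pass", "Pass", "Pass", "Fail"],
})

# Convert epoch milliseconds to datetimes.
df["timeStamp"] = pd.to_datetime(df["timeStamp"], unit="ms")

# Bin rows into one-minute buckets and count each Results value per bucket.
df1 = (df.groupby([pd.Grouper(key="timeStamp", freq="min"), "Results"])
         .size()
         .reset_index(name="Count"))
print(df1)
```

Only (minute, result) pairs that actually occur appear in the result, so a minute with no rows, such as 08:01 here, is simply absent.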
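Or, if you want only the time of day, round the datetimes down to the minute with Series.dt.floor and then convert them to times with Series.dt.time: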
df['timeStamp'] = pd.to_datetime(df['timeStamp'], unit='ms')
df1 = (df.groupby([df['timeStamp'].dt.floor('Min').dt.time, 'Results'])
         .size()
         .reset_index(name='Count'))
print(df1)
timeStamp Results Count
0 08:00:00 Fail 1
1 08:00:00 Pass 4
2 08:02:00 Fail 1
3 08:02:00 Pass 1
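If one row per minute with separate Pass and Fail columns is easier to read, the same counts can probably be reshaped with unstack. This is an optional variant and not part of the answer above; the names minute and wide are only illustrative, and the sample data is again typed in from the question.

```python
import pandas as pd

# Sample rows copied from the question; timeStamp is in epoch milliseconds.
df = pd.DataFrame({
    "timeStamp": [1589443200000, 1589443201000, 1589443202000,
                  1589443203000, 1589443204000, 1589443321000,
                  1589443325000],
    "Results": ["Pass", "Fail", "Pass", "Pass", "Pass", "Pass", "Fail"],
})
df["timeStamp"] = pd.to_datetime(df["timeStamp"], unit="ms")

# Round each timestamp down to the start of its minute.
minute = df["timeStamp"].dt.floor("min")

# Count rows per (minute, result), then move the result labels into columns.
# fill_value=0 puts a 0 where a minute has no rows for a given result.
wide = (df.groupby([minute, "Results"])
          .size()
          .unstack(fill_value=0))
print(wide)
```

Each row of wide is then one minute, with the Fail and Pass counts side by side.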