Pandas - combining resample with value counts

Time: 2020-05-14 10:32:44

Tags: python pandas

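I would like to resample my data per minute to find how many times the result was Pass and how many times it was Fail in each minute.

The current data looks like this: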

timeStamp Results    
1589443200000 Pass       
1589443201000 Fail       
1589443202000 Pass       
1589443203000 Pass       
1589443204000 Pass       
1589443321000 Pass       
1589443325000 Fail       

The desired output looks like this:

time       Result     Count      
8:01:00    Pass        4
8:01:00    Fail        1
8:02:00    Pass        1
8:02:00    Fail        1

1 answer:

Answer 0 (score: 2)

First convert the Unix timestamps to datetimes with to_datetime, then group by minute with Grouper and count the rows in each group with GroupBy.size:

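import pandas as pd

# timeStamp holds milliseconds since the Unix epoch
df['timeStamp'] = pd.to_datetime(df['timeStamp'], unit='ms')

# group by one-minute bins and by Results, then count the rows per group
df1 = (df.groupby([pd.Grouper(key='timeStamp', freq='Min'), 'Results'])
        .size()
        .reset_index(name='Count'))
print(df1)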
            timeStamp Results  Count
0 2020-05-14 08:00:00    Fail      1
1 2020-05-14 08:00:00    Pass      4
2 2020-05-14 08:02:00    Fail      1
3 2020-05-14 08:02:00    Pass      1
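
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the sample rows from the question by hand (the answer assumes `df` already exists) and uses the lowercase `'min'` alias, which pandas also accepts for minute frequency. The variable name `counts` is only illustrative.

```python
import pandas as pd

# Sample rows copied from the question (epoch timestamps in milliseconds).
df = pd.DataFrame({
    "timeStamp": [1589443200000, 1589443201000, 1589443202000,
                  1589443203000, 1589443204000, 1589443321000,
                  1589443325000],
    "Results": ["Pass", "Fail", "Pass", "Pass", "Pass", "Pass", "Fail"],
})

# Interpret the integers as milliseconds since the Unix epoch.
df["timeStamp"] = pd.to_datetime(df["timeStamp"], unit="ms")

# Bucket the rows into one-minute bins and count each Results value per bin.
counts = (df.groupby([pd.Grouper(key="timeStamp", freq="min"), "Results"])
            .size()
            .reset_index(name="Count"))
print(counts)
```

Note that the first five rows fall into the 08:00 bin, not 08:01 as in the desired output of the question, because each bin is labelled with the start of its minute.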

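Alternatively, if you only want the times, round the timestamps down to the minute with Series.dt.floor and then extract the time of day with Series.dt.time: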

df['timeStamp'] = pd.to_datetime(df['timeStamp'], unit='ms')

# floor to the minute, keep only the time of day, then count per time and Results
df1 = (df.groupby([df['timeStamp'].dt.floor('Min').dt.time, 'Results'])
        .size()
        .reset_index(name='Count'))
print(df1)
  timeStamp Results  Count
0  08:00:00    Fail      1
1  08:00:00    Pass      4
2  08:02:00    Fail      1
3  08:02:00    Pass      1
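
The time-only variant can be sketched the same way, again with the question's sample rows rebuilt by hand and with illustrative variable names (`minute_of_day`, `counts_by_time`). One thing to keep in mind with this variant: `dt.time` drops the date, so if the data spans several days, rows from the same minute on different days end up in the same group.

```python
import pandas as pd

# Sample rows copied from the question (epoch timestamps in milliseconds).
df = pd.DataFrame({
    "timeStamp": [1589443200000, 1589443201000, 1589443202000,
                  1589443203000, 1589443204000, 1589443321000,
                  1589443325000],
    "Results": ["Pass", "Fail", "Pass", "Pass", "Pass", "Pass", "Fail"],
})
df["timeStamp"] = pd.to_datetime(df["timeStamp"], unit="ms")

# Round each timestamp down to the start of its minute, then keep only
# the time of day (datetime.time objects, without the date).
minute_of_day = df["timeStamp"].dt.floor("min").dt.time

# Count the rows for every (time of day, Results) combination.
counts_by_time = (df.groupby([minute_of_day, "Results"])
                    .size()
                    .reset_index(name="Count"))
print(counts_by_time)
```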