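Pandas: calculate whether the time difference is within x seconds

Time: 2019-08-06 14:03:39

Tags: python-3.x pandas

I want to group values together if they fall within the same x seconds. For example, this is what I currently do: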

m_failed = df[(df["Signal"] == "Alarm") & (df["State"] == "Active")]
dd_failed = m_failed.groupby(['Country', 'Lane', 'Unit', 'Datetime']).size().to_frame('count').reset_index()

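Update: Sorry, my question was vague and I even forgot to include important data, so I have updated the question and added part of the log. I changed City to Lane because that matches the real data better. (Sorry about that.)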

Sign Descr  State   Country Lane    Unit    Datetime
Alarm   Active  USA Lane1   00003   2019-08-03 13:32:43
Alarm   Active  USA Lane1   00005   2019-08-03 13:32:43
Alarm   Active  USA Lane1   00006   2019-08-03 13:32:43
Alarm   Active  USA Lane1   00004   2019-08-03 13:32:43
Alarm   Active  USA Lane1   00002   2019-08-03 13:32:43
Alarm   Active  USA Lane1   00007   2019-08-03 13:32:43
Alarm   Active  Spain   Lane1   00003   2019-08-03 07:47:54
Alarm   Active  Spain   Lane1   00002   2019-08-03 07:47:54
Alarm   Active  Spain   Lane1   00005   2019-08-03 07:47:54
Alarm   Active  Spain   Lane1   00007   2019-08-03 07:47:54
Alarm   Active  Spain   Lane1   00004   2019-08-03 07:47:53
Alarm   Active  Spain   Lane1   00006   2019-08-03 07:47:53
Alarm   Active  Spain   Lane1   00004   2019-08-03 07:26:16
Alarm   Active  Spain   Lane1   00003   2019-08-03 07:26:16
Alarm   Active  Italy   Lane2   00002   2019-08-03 12:09:34
Alarm   Active  Italy   Lane2   00004   2019-08-03 09:50:32
Alarm   Active  Italy   Lane2   00006   2019-08-03 09:50:32
Alarm   Active  Italy   Lane2   00002   2019-08-03 09:50:32
Alarm   Active  Italy   Lane1   00007   2019-08-03 07:58:43
Alarm   Active  Italy   Lane2   00002   2019-08-03 07:58:01
Alarm   Active  Germany Lane1   00007   2019-08-03 12:36:48
Alarm   Active  Germany Lane1   00007   2019-08-03 12:31:19
Alarm   Active  Sweden  Lane1   00007   2019-08-03 12:27:33
Alarm   Active  Norway  Lane1   00007   2019-08-03 12:35:21
Alarm   Active  Norway  Lane1   00005   2019-08-03 12:35:21
Alarm   Active  Norway  Lane1   00002   2019-08-03 12:35:21
Alarm   Active  Norway  Lane1   00007   2019-08-03 12:28:50
Alarm   Active  Norway  Lane2   00007   2019-08-03 12:27:31
Alarm   Active  Norway  Lane2   00003   2019-08-03 12:27:31
Alarm   Active  Norway  Lane2   00006   2019-08-03 12:27:31
Alarm   Active  Norway  Lane2   00005   2019-08-03 09:24:53
Alarm   Active  Denmark Lane2   00003   2019-08-03 09:46:23
Alarm   Active  UK  Lane2   00003   2019-08-03 09:56:08
Alarm   Active  UK  Lane2   00004   2019-08-03 09:56:08
Alarm   Active  Brazil  Lane2   00002   2019-08-03 09:47:19
Alarm   Active  Brazil  Lane2   00003   2019-08-03 09:47:19

I would like the result to look like this:

Sign Descr  State   Country Lane    Unit    Datetime    Count
Alarm   Active  USA Lane1       2019-08-03 13:32:43 1
Alarm   Active  Spain   Lane1       2019-08-03 07:47:54 1
Alarm   Active  Spain   Lane1   00004   2019-08-03 07:26:16 1
Alarm   Active  Spain   Lane1   00003   2019-08-03 07:26:16 1
Alarm   Active  Italy   Lane2   00002   2019-08-03 12:09:34 3
Alarm   Active  Italy   Lane2   00004   2019-08-03 09:50:32 1
Alarm   Active  Italy   Lane2   00006   2019-08-03 09:50:32 1
Alarm   Active  Italy   Lane1   00007   2019-08-03 07:58:43 1
Alarm   Active  Germany Lane1   00007   2019-08-03 12:36:48 2
Alarm   Active  Sweden  Lane1   00007   2019-08-03 12:27:33 1
Alarm   Active  Norway  Lane1   00007   2019-08-03 12:35:21 1
Alarm   Active  Norway  Lane1   00005   2019-08-03 12:35:21 1
Alarm   Active  Norway  Lane1   00002   2019-08-03 12:35:21 1
Alarm   Active  Norway  Lane2   00007   2019-08-03 12:27:31 2
Alarm   Active  Norway  Lane2   00003   2019-08-03 12:27:31 1
Alarm   Active  Norway  Lane2   00006   2019-08-03 12:27:31 1
Alarm   Active  Norway  Lane2   00005   2019-08-03 09:24:53 1
Alarm   Active  Denmark Lane2   00003   2019-08-03 09:46:23 1
Alarm   Active  UK  Lane2   00003   2019-08-03 09:56:08 1
Alarm   Active  UK  Lane2   00004   2019-08-03 09:56:08 1
Alarm   Active  Brazil  Lane2   00002   2019-08-03 09:47:19 1
Alarm   Active  Brazil  Lane2   00003   2019-08-03 09:47:19 1

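Unit can be 00002 through 00007, Lane can be Lane1 or Lane2, and Country can be anything. The log covers 00:00 to 23:59.

If the country and lane are the same and all units fail within the same 1-2 minutes, group them and count them as 1, because it is the lane that failed. If the same lane fails several times during the day, count the number of failures for the whole lane.

If not all units failed, show the unit and count how many times that unit failed during the day.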
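One possible reading of this rule can be sketched in pandas as follows. This is only an illustration under stated assumptions: the two-minute WINDOW, the ALL_UNITS set (taken from the 00002 to 00007 range above), and the helper column names burst, n_units and lane_failure are all hypothetical, and the sample rows are a small subset of the log.

```python
import pandas as pd

# Assumptions for illustration: six units per lane, and a 2-minute burst window.
ALL_UNITS = {"00002", "00003", "00004", "00005", "00006", "00007"}
WINDOW = pd.Timedelta(minutes=2)

# A small subset of the log: all six USA/Lane1 units fail together,
# while only two Spain/Lane1 units fail.
df = pd.DataFrame({
    "Country": ["USA"] * 6 + ["Spain"] * 2,
    "Lane": ["Lane1"] * 8,
    "Unit": ["00003", "00005", "00006", "00004", "00002", "00007", "00004", "00003"],
    "Datetime": pd.to_datetime(["2019-08-03 13:32:43"] * 6 + ["2019-08-03 07:26:16"] * 2),
})

df = df.sort_values(["Country", "Lane", "Datetime"])

# Gap to the previous failure within the same Country/Lane (NaT for the first row).
gap = df.groupby(["Country", "Lane"])["Datetime"].diff()

# A new burst starts at the first row of each lane or after a gap longer than WINDOW.
df["burst"] = (gap.isna() | gap.gt(WINDOW)).cumsum()

bursts = (
    df.groupby(["Country", "Lane", "burst"])
      .agg(start=("Datetime", "min"), n_units=("Unit", "nunique"))
      .reset_index()
)

# A burst counts as a lane failure only if every unit appears in it.
bursts["lane_failure"] = bursts["n_units"] == len(ALL_UNITS)
print(bursts)
```

Bursts flagged as lane failures could then be counted once per lane, while the remaining rows would be counted per unit.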

P.S. What is the best way to add a table on Stack Overflow?

2 answers:

Answer 0 (score: 2)

Use groupby with pd.Grouper, together with Country and City as keys. I chose 60S as the frequency, but change it as needed.
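A minimal sketch of what grouping with pd.Grouper could look like, not necessarily the code this answer used. The sample rows are made up for illustration, the column names Country, City and Datetime follow the answer's wording, and "60s" is the lowercase spelling of the 60S alias that recent pandas versions prefer.

```python
import pandas as pd

# Hypothetical sample data; in the updated question the City column is called Lane.
df = pd.DataFrame({
    "Country": ["Spain", "Spain", "Spain", "Italy"],
    "City": ["Lane1", "Lane1", "Lane1", "Lane2"],
    "Datetime": pd.to_datetime([
        "2019-08-03 07:47:53",
        "2019-08-03 07:47:54",
        "2019-08-03 07:26:16",
        "2019-08-03 09:50:32",
    ]),
})

# Group by country, city, and fixed 60-second bins of the timestamp,
# then count the rows that fall into each bin.
counts = (
    df.groupby(["Country", "City", pd.Grouper(key="Datetime", freq="60s")])
      .size()
      .reset_index(name="count")
)
print(counts)
```

Each row of counts is labelled with the start of its 60-second bin, so the two Spain failures at 07:47:53 and 07:47:54 end up in the same row.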

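Answer 1 (score: 0)

If you take a group to mean "failures that happen within the same minute", then user3483203's answer works: failures at 9:00:01 and 9:00:59 belong to the same group, but one at 10:00:00 does not.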
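The difference between the two definitions shows up at bin boundaries. The sketch below uses made-up timestamps and hypothetical variable names (times, bins, gaps): two failures only 2 seconds apart land in different fixed 60-second bins, even though the gap between them is well under a minute.

```python
import pandas as pd

# Made-up timestamps for illustration.
times = pd.Series(pd.to_datetime([
    "2019-08-03 09:00:01",
    "2019-08-03 09:00:59",
    "2019-08-03 09:01:01",
]))

# Fixed bins: each timestamp is rounded down to the start of its minute.
bins = times.dt.floor("60s")

# Gap-based view: seconds elapsed since the previous failure.
gaps = times.diff().dt.total_seconds()

print(pd.DataFrame({"time": times, "bin": bins, "gap_seconds": gaps}))
```

Here the last two rows are 2 seconds apart but fall into the 09:00 and 09:01 bins respectively.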

If your definition is "within 60 seconds of the previous failure", use a different approach:

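import pandas as pd

def summarize(x):
    # Start a new burst whenever the gap to the previous failure exceeds 60 seconds
    s = (x['Datetime'].diff() / pd.Timedelta(seconds=1)).gt(60).cumsum()
    # For each burst, keep the first unit, the start time, and the number of failures
    result = x.groupby(s).agg({
        'Unit': 'first',
        'Datetime': ['first', 'count'],
    })
    result.columns = ['Unit', 'Datetime', 'count']

    return result

# 'City' is the column that the updated question renamed to 'Lane'; use the name your data has.
# 'Datetime' must be a datetime column (e.g. converted with pd.to_datetime).
df = df.sort_values(['Country', 'City', 'Datetime'])
df.groupby(['Country', 'City']).apply(summarize).droplevel(-1)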

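What summarize does:

  • For each group (a unique Country - City tuple), compute the time difference from the previous failure, in seconds
  • Whenever the difference is greater than 60 seconds, increase a running cumulative sum by 1, which starts a new subgroup
  • For each subgroup, count how many failures it contains and record when it started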
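The steps above can be traced on a handful of rows. This sketch uses four of the Spain/Lane1 rows from the question, already sorted by time; the intermediate names gap_seconds, new_burst and burst_id are hypothetical and only serve to make each step visible.

```python
import pandas as pd

# Four Spain/Lane1 rows from the question, sorted by Datetime.
x = pd.DataFrame({
    "Unit": ["00004", "00003", "00004", "00006"],
    "Datetime": pd.to_datetime([
        "2019-08-03 07:26:16",
        "2019-08-03 07:26:16",
        "2019-08-03 07:47:53",
        "2019-08-03 07:47:53",
    ]),
})

# Step 1: seconds since the previous failure (NaN for the first row).
gap_seconds = x["Datetime"].diff() / pd.Timedelta(seconds=1)

# Step 2: a gap above 60 seconds marks the start of a new burst;
# the cumulative sum turns those marks into a burst id.
new_burst = gap_seconds.gt(60)
burst_id = new_burst.cumsum()

# Step 3: per burst, keep the first unit, the start time, and the failure count.
summary = x.groupby(burst_id).agg(
    Unit=("Unit", "first"),
    Datetime=("Datetime", "first"),
    count=("Datetime", "count"),
)
print(summary)
```

The 1297-second gap between 07:26:16 and 07:47:53 is the only one above 60, so the four rows split into two bursts of two failures each.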