Question

I want to group values together if they fall within the same x seconds. For example, this is what I do at the moment:

m_failed = df[(df["Signal"] == "Alarm") & (df["State"] == "Active")]
dd_failed = m_failed.groupby(['Country', 'Lane', 'Unit', 'Datetime']).size().to_frame('count').reset_index()

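Update: Sorry, my question was vague and I even forgot to include important data, so I have updated the question and added part of the log. I changed City to Lane because it matches the real data better. (Sorry about that.)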

Sign Descr  State   Country Lane    Unit    Datetime
Alarm   Active  USA Lane1   00003   2019-08-03 13:32:43
Alarm   Active  USA Lane1   00005   2019-08-03 13:32:43
Alarm   Active  USA Lane1   00006   2019-08-03 13:32:43
Alarm   Active  USA Lane1   00004   2019-08-03 13:32:43
Alarm   Active  USA Lane1   00002   2019-08-03 13:32:43
Alarm   Active  USA Lane1   00007   2019-08-03 13:32:43
Alarm   Active  Spain   Lane1   00003   2019-08-03 07:47:54
Alarm   Active  Spain   Lane1   00002   2019-08-03 07:47:54
Alarm   Active  Spain   Lane1   00005   2019-08-03 07:47:54
Alarm   Active  Spain   Lane1   00007   2019-08-03 07:47:54
Alarm   Active  Spain   Lane1   00004   2019-08-03 07:47:53
Alarm   Active  Spain   Lane1   00006   2019-08-03 07:47:53
Alarm   Active  Spain   Lane1   00004   2019-08-03 07:26:16
Alarm   Active  Spain   Lane1   00003   2019-08-03 07:26:16
Alarm   Active  Italy   Lane2   00002   2019-08-03 12:09:34
Alarm   Active  Italy   Lane2   00004   2019-08-03 09:50:32
Alarm   Active  Italy   Lane2   00006   2019-08-03 09:50:32
Alarm   Active  Italy   Lane2   00002   2019-08-03 09:50:32
Alarm   Active  Italy   Lane1   00007   2019-08-03 07:58:43
Alarm   Active  Italy   Lane2   00002   2019-08-03 07:58:01
Alarm   Active  Germany Lane1   00007   2019-08-03 12:36:48
Alarm   Active  Germany Lane1   00007   2019-08-03 12:31:19
Alarm   Active  Sweden  Lane1   00007   2019-08-03 12:27:33
Alarm   Active  Norway  Lane1   00007   2019-08-03 12:35:21
Alarm   Active  Norway  Lane1   00005   2019-08-03 12:35:21
Alarm   Active  Norway  Lane1   00002   2019-08-03 12:35:21
Alarm   Active  Norway  Lane1   00007   2019-08-03 12:28:50
Alarm   Active  Norway  Lane2   00007   2019-08-03 12:27:31
Alarm   Active  Norway  Lane2   00003   2019-08-03 12:27:31
Alarm   Active  Norway  Lane2   00006   2019-08-03 12:27:31
Alarm   Active  Norway  Lane2   00005   2019-08-03 09:24:53
Alarm   Active  Denmark Lane2   00003   2019-08-03 09:46:23
Alarm   Active  UK  Lane2   00003   2019-08-03 09:56:08
Alarm   Active  UK  Lane2   00004   2019-08-03 09:56:08
Alarm   Active  Brazil  Lane2   00002   2019-08-03 09:47:19
Alarm   Active  Brazil  Lane2   00003   2019-08-03 09:47:19

I would like the result to look like this:

Sign Descr  State   Country Lane    Unit    Datetime    Count
Alarm   Active  USA Lane1       2019-08-03 13:32:43 1
Alarm   Active  Spain   Lane1       2019-08-03 07:47:54 1
Alarm   Active  Spain   Lane1   00004   2019-08-03 07:26:16 1
Alarm   Active  Spain   Lane1   00003   2019-08-03 07:26:16 1
Alarm   Active  Italy   Lane2   00002   2019-08-03 12:09:34 3
Alarm   Active  Italy   Lane2   00004   2019-08-03 09:50:32 1
Alarm   Active  Italy   Lane2   00006   2019-08-03 09:50:32 1
Alarm   Active  Italy   Lane1   00007   2019-08-03 07:58:43 1
Alarm   Active  Germany Lane1   00007   2019-08-03 12:36:48 2
Alarm   Active  Sweden  Lane1   00007   2019-08-03 12:27:33 1
Alarm   Active  Norway  Lane1   00007   2019-08-03 12:35:21 1
Alarm   Active  Norway  Lane1   00005   2019-08-03 12:35:21 1
Alarm   Active  Norway  Lane1   00002   2019-08-03 12:35:21 1
Alarm   Active  Norway  Lane2   00007   2019-08-03 12:27:31 2
Alarm   Active  Norway  Lane2   00003   2019-08-03 12:27:31 1
Alarm   Active  Norway  Lane2   00006   2019-08-03 12:27:31 1
Alarm   Active  Norway  Lane2   00005   2019-08-03 09:24:53 1
Alarm   Active  Denmark Lane2   00003   2019-08-03 09:46:23 1
Alarm   Active  UK  Lane2   00003   2019-08-03 09:56:08 1
Alarm   Active  UK  Lane2   00004   2019-08-03 09:56:08 1
Alarm   Active  Brazil  Lane2   00002   2019-08-03 09:47:19 1
Alarm   Active  Brazil  Lane2   00003   2019-08-03 09:47:19 1

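Unit can be 00002 to 00007. Lane can be Lane1 or Lane2, and Country can be anything. The log covers 00:00 -> 23:59.

If Country and Lane are the same and all units fail within the same 1-2 minutes, group them and count that as 1, because the whole lane failed. If the same lane fails several times during the day, count the number of failures for the whole lane.

If not all units failed, show the unit and count how many times that unit failed during the day.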
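One possible reading of the rule above, as a rough sketch rather than a definitive solution. It assumes that "all units" means the six units 00002 to 00007, that "the same 1-2 minutes" means no gap longer than 2 minutes between consecutive failures in the same Country/Lane, and that Datetime is already a datetime column. The names ALL_UNITS, MAX_GAP, label_bursts and count_failures are made up for this illustration, and the sample rows are a small hypothetical subset of the log. The Datetime column of the desired output is left out to keep it short.

```python
import pandas as pd

ALL_UNITS = {"00002", "00003", "00004", "00005", "00006", "00007"}  # assumed full set
MAX_GAP = pd.Timedelta(minutes=2)  # assumed meaning of "the same 1-2 minutes"


def label_bursts(df):
    """Add a 'burst' id: a new burst starts at the first failure of each
    Country/Lane and whenever the gap to the previous failure exceeds MAX_GAP."""
    df = df.sort_values(["Country", "Lane", "Datetime"]).copy()
    gap = df.groupby(["Country", "Lane"])["Datetime"].diff()
    df["burst"] = (gap.isna() | gap.gt(MAX_GAP)).cumsum()
    return df


def count_failures(df):
    df = label_bursts(df)
    # A burst is a whole-lane failure if every unit shows up in it.
    whole_lane = df.groupby("burst")["Unit"].transform("nunique") == len(ALL_UNITS)

    # Whole-lane failures: one count per burst, Unit left blank.
    lane = (df[whole_lane].groupby(["Country", "Lane"])["burst"].nunique()
            .rename("count").reset_index())
    lane["Unit"] = ""

    # Partial failures: count per unit over the whole day.
    unit = (df[~whole_lane].groupby(["Country", "Lane", "Unit"]).size()
            .rename("count").reset_index())

    cols = ["Country", "Lane", "Unit", "count"]
    return pd.concat([lane, unit], ignore_index=True)[cols]


# Hypothetical subset of the log: one whole-lane failure and some partial ones.
rows = [("USA", "Lane1", u, "2019-08-03 13:32:43") for u in sorted(ALL_UNITS)]
rows += [
    ("Italy", "Lane2", "00002", "2019-08-03 12:09:34"),
    ("Italy", "Lane2", "00004", "2019-08-03 09:50:32"),
    ("Italy", "Lane2", "00006", "2019-08-03 09:50:32"),
    ("Italy", "Lane2", "00002", "2019-08-03 09:50:32"),
    ("Italy", "Lane2", "00002", "2019-08-03 07:58:01"),
]
sample = pd.DataFrame(rows, columns=["Country", "Lane", "Unit", "Datetime"])
sample["Datetime"] = pd.to_datetime(sample["Datetime"])

result = count_failures(sample)
print(result)
```

With this sample, USA Lane1 collapses into a single whole-lane row, while Italy Lane2 is reported per unit.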

P.S. What is the best way to add a table on Stack Overflow?

Answer 1

Use groupby with Country, City, and pd.Grouper as the keys. I chose 60S as the frequency, but change it as needed.
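A minimal sketch of this kind of grouping, for illustration only. It assumes the Lane column from the updated question in place of City, a Datetime column that is already of datetime type, and a few hypothetical rows shaped like the log. The frequency is written as "60s" here; "60S" is the older spelling of the same alias.

```python
import pandas as pd

# Hypothetical rows in the shape of the question's log.
df = pd.DataFrame({
    "Country": ["Spain", "Spain", "Spain", "Spain"],
    "Lane": ["Lane1", "Lane1", "Lane1", "Lane1"],
    "Unit": ["00003", "00004", "00004", "00003"],
    "Datetime": pd.to_datetime([
        "2019-08-03 07:47:54", "2019-08-03 07:47:53",
        "2019-08-03 07:26:16", "2019-08-03 07:26:16",
    ]),
})

# Group by country, lane, and fixed 60-second bins of the Datetime column,
# then count the rows that land in each bin.
counts = (
    df.groupby(["Country", "Lane", pd.Grouper(key="Datetime", freq="60s")])
      .size()
      .to_frame("count")
      .reset_index()
)
print(counts)
```

The bins are aligned to the clock (07:47:00, 07:48:00, ...), so two failures a few seconds apart can still land in different bins if they straddle a minute boundary.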

Answer 2

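If you take a group to mean "failures that happen within the same minute", user3483203's answer works: failures at 9:00:01 and 9:00:59 belong to the same group, but one at 10:00:00 does not.

If your definition is "within 60 seconds of the previous failure", use a different approach: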

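import pandas as pd

def summarize(x):
    # Start a new group whenever the gap to the previous failure exceeds 60 seconds
    s = (x['Datetime'].diff() / pd.Timedelta(seconds=1)).gt(60).cumsum()
    # For each group keep the first unit, the start time, and the number of failures
    result = x.groupby(s).agg({
        'Unit': 'first',
        'Datetime': ['first', 'count'],
    })
    result.columns = ['Unit', 'Datetime', 'count']

    return result

# 'City' is the column the question later renamed to 'Lane'
df = df.sort_values(['Country', 'City', 'Datetime'])
df.groupby(['Country', 'City']).apply(summarize).droplevel(-1)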

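What summarize does:

For each group (a unique Country - City tuple), compute the time difference from the previous failure, in seconds
Whenever that difference is greater than 60 seconds, increase the cumulative sum by 1
Count how many failures there are in each resulting group and when that group started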
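To make the three steps concrete, here is a small worked example on hypothetical timestamps for a single Country/City pair. The variable names are only for illustration.

```python
import pandas as pd

# Hypothetical failure times for one Country/City pair, already sorted.
times = pd.Series(pd.to_datetime([
    "2019-08-03 09:00:01",
    "2019-08-03 09:00:59",
    "2019-08-03 09:01:30",
    "2019-08-03 10:00:00",
]))

gap_seconds = times.diff() / pd.Timedelta(seconds=1)  # NaN, 58.0, 31.0, 3510.0
new_group = gap_seconds.gt(60)                        # False, False, False, True
group_id = new_group.cumsum()                         # 0, 0, 0, 1

print(group_id.tolist())
```

Note that the groups chain: 09:01:30 is more than 60 seconds after 09:00:01, but it stays in the same group because it is within 60 seconds of the failure right before it.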

Pandas: calculating whether a time difference is within x seconds
