I want to group values together if they fall within the same x seconds. For example, this is what I am doing now:
m_failed = df[(df["Signal"] == "Alarm") & (df["State"] == "Active")]
dd_failed = m_failed.groupby(['Country', 'Lane', 'Unit', 'Datetime']).size().to_frame('count').reset_index()
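Update: Sorry, my question was vague, and I even forgot to include important data, so I have updated the question and added part of the log. I changed City to Lane because it matches the real data better. (Sorry about that.)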
Sign Descr State Country Lane Unit Datetime
Alarm Active USA Lane1 00003 2019-08-03 13:32:43
Alarm Active USA Lane1 00005 2019-08-03 13:32:43
Alarm Active USA Lane1 00006 2019-08-03 13:32:43
Alarm Active USA Lane1 00004 2019-08-03 13:32:43
Alarm Active USA Lane1 00002 2019-08-03 13:32:43
Alarm Active USA Lane1 00007 2019-08-03 13:32:43
Alarm Active Spain Lane1 00003 2019-08-03 07:47:54
Alarm Active Spain Lane1 00002 2019-08-03 07:47:54
Alarm Active Spain Lane1 00005 2019-08-03 07:47:54
Alarm Active Spain Lane1 00007 2019-08-03 07:47:54
Alarm Active Spain Lane1 00004 2019-08-03 07:47:53
Alarm Active Spain Lane1 00006 2019-08-03 07:47:53
Alarm Active Spain Lane1 00004 2019-08-03 07:26:16
Alarm Active Spain Lane1 00003 2019-08-03 07:26:16
Alarm Active Italy Lane2 00002 2019-08-03 12:09:34
Alarm Active Italy Lane2 00004 2019-08-03 09:50:32
Alarm Active Italy Lane2 00006 2019-08-03 09:50:32
Alarm Active Italy Lane2 00002 2019-08-03 09:50:32
Alarm Active Italy Lane1 00007 2019-08-03 07:58:43
Alarm Active Italy Lane2 00002 2019-08-03 07:58:01
Alarm Active Germany Lane1 00007 2019-08-03 12:36:48
Alarm Active Germany Lane1 00007 2019-08-03 12:31:19
Alarm Active Sweden Lane1 00007 2019-08-03 12:27:33
Alarm Active Norway Lane1 00007 2019-08-03 12:35:21
Alarm Active Norway Lane1 00005 2019-08-03 12:35:21
Alarm Active Norway Lane1 00002 2019-08-03 12:35:21
Alarm Active Norway Lane1 00007 2019-08-03 12:28:50
Alarm Active Norway Lane2 00007 2019-08-03 12:27:31
Alarm Active Norway Lane2 00003 2019-08-03 12:27:31
Alarm Active Norway Lane2 00006 2019-08-03 12:27:31
Alarm Active Norway Lane2 00005 2019-08-03 09:24:53
Alarm Active Denmark Lane2 00003 2019-08-03 09:46:23
Alarm Active UK Lane2 00003 2019-08-03 09:56:08
Alarm Active UK Lane2 00004 2019-08-03 09:56:08
Alarm Active Brazil Lane2 00002 2019-08-03 09:47:19
Alarm Active Brazil Lane2 00003 2019-08-03 09:47:19
I would like the result to look like this:
Sign Descr State Country Lane Unit Datetime Count
Alarm Active USA Lane1 2019-08-03 13:32:43 1
Alarm Active Spain Lane1 2019-08-03 07:47:54 1
Alarm Active Spain Lane1 00004 2019-08-03 07:26:16 1
Alarm Active Spain Lane1 00003 2019-08-03 07:26:16 1
Alarm Active Italy Lane2 00002 2019-08-03 12:09:34 3
Alarm Active Italy Lane2 00004 2019-08-03 09:50:32 1
Alarm Active Italy Lane2 00006 2019-08-03 09:50:32 1
Alarm Active Italy Lane1 00007 2019-08-03 07:58:43 1
Alarm Active Germany Lane1 00007 2019-08-03 12:36:48 2
Alarm Active Sweden Lane1 00007 2019-08-03 12:27:33 1
Alarm Active Norway Lane1 00007 2019-08-03 12:35:21 1
Alarm Active Norway Lane1 00005 2019-08-03 12:35:21 1
Alarm Active Norway Lane1 00002 2019-08-03 12:35:21 1
Alarm Active Norway Lane2 00007 2019-08-03 12:27:31 2
Alarm Active Norway Lane2 00003 2019-08-03 12:27:31 1
Alarm Active Norway Lane2 00006 2019-08-03 12:27:31 1
Alarm Active Norway Lane2 00005 2019-08-03 09:24:53 1
Alarm Active Denmark Lane2 00003 2019-08-03 09:46:23 1
Alarm Active UK Lane2 00003 2019-08-03 09:56:08 1
Alarm Active UK Lane2 00004 2019-08-03 09:56:08 1
Alarm Active Brazil Lane2 00002 2019-08-03 09:47:19 1
Alarm Active Brazil Lane2 00003 2019-08-03 09:47:19 1
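Unit can be 00002 through 00007. Lane can be Lane1 or Lane2, and Country can be anything. The log covers 00:00 -> 23:59.
If Country and Lane are the same and all units fail within the same 1-2 minutes, group them and count them as 1, because that is a failed lane. If the same lane fails several times during the day, count the number of failures for the whole lane.
If not all units failed, show the unit and count how many times that unit failed during the day.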
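To make the rule concrete, here is a rough sketch of the "did the whole lane fail?" check. It assumes fixed 2-minute windows (only one possible reading of "the same 1-2 minutes"), a Datetime column already parsed as datetimes, and a hypothetical `ALL_UNITS` set built from the unit range given above. The sample rows are a small subset of the log.

```python
import pandas as pd

# Assumption: a lane has exactly these six units (00002 through 00007).
ALL_UNITS = {"00002", "00003", "00004", "00005", "00006", "00007"}

# Subset of the log: all six USA Lane1 units fail together, only two UK Lane2 units do.
df = pd.DataFrame({
    "Country": ["USA"] * 6 + ["UK"] * 2,
    "Lane": ["Lane1"] * 6 + ["Lane2"] * 2,
    "Unit": ["00003", "00005", "00006", "00004", "00002", "00007", "00003", "00004"],
    "Datetime": pd.to_datetime(["2019-08-03 13:32:43"] * 6 + ["2019-08-03 09:56:08"] * 2),
})

# Assumption: a fixed 2-minute window stands in for "the same 1-2 minutes".
window = df["Datetime"].dt.floor("2min")

# For each Country/Lane/window, check whether every unit of the lane failed.
per_window = (
    df.groupby(["Country", "Lane", window])["Unit"]
      .agg(lambda units: set(units) == ALL_UNITS)
      .rename("lane_failed")
      .reset_index()
)
print(per_window)
```

Windows where `lane_failed` is True would be counted once per lane; the remaining rows would be counted per unit.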
Answer 0: (score: 2)
Use groupby with pd.Grouper as a key. I chose 60S as the frequency, but change it as needed.
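A minimal sketch of this approach, assuming the Lane column from the updated question and a Datetime column already parsed as datetimes. The sample rows are taken from the Spain entries in the log, and this is not necessarily the answerer's exact code.

```python
import pandas as pd

# Four Spain/Lane1 rows from the question's log.
df = pd.DataFrame({
    "Country": ["Spain", "Spain", "Spain", "Spain"],
    "Lane": ["Lane1", "Lane1", "Lane1", "Lane1"],
    "Unit": ["00003", "00004", "00004", "00003"],
    "Datetime": pd.to_datetime([
        "2019-08-03 07:47:54",
        "2019-08-03 07:47:53",
        "2019-08-03 07:26:16",
        "2019-08-03 07:26:16",
    ]),
})

# Bin rows into fixed 60-second windows per Country and Lane.
# "60s" is the lowercase spelling of the 60S alias, which newer pandas prefers.
counts = (
    df.groupby(["Country", "Lane", pd.Grouper(key="Datetime", freq="60s")])
      .size()
      .to_frame("count")
      .reset_index()
)
print(counts)
```

Each output row is labelled with the start of its 60-second window, so 07:47:53 and 07:47:54 both appear under 07:47:00.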
Answer 1: (score: 0)
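user3483203's answer works if failures should be grouped into fixed 60-second bins, i.e. failures at 9:00:01 and 9:00:59 fall in the same group, but 10:00:00 does not.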
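A small illustration of where fixed bins can surprise you. It uses `dt.floor` as a stand-in that produces the same minute boundaries as a 60-second Grouper, with made-up timestamps:

```python
import pandas as pd

# Hypothetical failures: the last two are only 2 seconds apart.
times = pd.Series(pd.to_datetime([
    "2019-08-03 09:00:01",
    "2019-08-03 09:00:59",
    "2019-08-03 09:01:01",
]))

# Label each failure with the start of its fixed 60-second bin.
bins = times.dt.floor("60s")
print(bins.dt.strftime("%H:%M:%S").tolist())
```

The failures at 09:00:59 and 09:01:01 land in different bins even though they are 2 seconds apart, because the bin edges sit on whole minutes.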
If your definition is "within 60 seconds of the previous failure", use a different approach:
import pandas as pd

def summarize(x):
    # Start a new group whenever the gap to the previous failure exceeds 60 seconds
    s = (x['Datetime'].diff() / pd.Timedelta(seconds=1)).gt(60).cumsum()
    result = x.groupby(s).agg({
        'Unit': 'first',
        'Datetime': ['first', 'count'],
    })
    result.columns = ['Unit', 'Datetime', 'count']
    return result

# 'City' is the column that the updated question renamed to 'Lane'
df = df.sort_values(['Country', 'City', 'Datetime'])
df.groupby(['Country', 'City']).apply(summarize).droplevel(-1)
What summarize does:
For each group (a Country - City tuple), compute the time difference from the previous failure (in seconds)
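The time-difference step can be seen in isolation with a small sketch. The timestamps are made up and stand for a single group that is already sorted by Datetime:

```python
import pandas as pd

# Hypothetical failures for one Country/City group, sorted by time.
times = pd.Series(pd.to_datetime([
    "2019-08-03 09:00:01",
    "2019-08-03 09:00:59",
    "2019-08-03 09:01:30",
    "2019-08-03 09:05:00",
]))

# Seconds elapsed since the previous failure (NaN for the first row).
gap = times.diff() / pd.Timedelta(seconds=1)

# A gap over 60 seconds starts a new group; cumsum turns those flags into group ids.
group_id = gap.gt(60).cumsum()
print(group_id.tolist())
```

Here 09:01:30 stays in the first group even though it is 89 seconds after 09:00:01, because each row is compared only with the row immediately before it.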