I want to calculate, from my dataset, how long my server was stopped. I know when it stopped, but not for how long.
I have this df:
index a b c reboot stop
2018-06-25 12:49:00 NaN NaN NaN 0 1
2018-06-25 12:50:00 NaN NaN NaN 0 1
2018-06-25 12:51:00 NaN NaN NaN 1 1
2018-06-25 12:52:00 NaN NaN NaN 0 1
2018-06-25 12:53:00 NaN NaN NaN 0 1
2018-06-25 12:54:00 NaN NaN NaN 0 1
2018-06-25 12:55:00 NaN NaN NaN 0 1
2018-06-25 12:56:00 NaN NaN 1.2 0 0
2018-06-25 12:57:00 NaN NaN NaN 0 1
2018-06-25 12:58:00 NaN NaN NaN 1 1
2018-06-25 12:59:00 NaN NaN NaN 0 1
2018-06-25 13:00:00 NaN NaN NaN 0 1
2018-06-25 13:01:00 NaN NaN NaN 0 0
My server is stopped when a, b, c = NaN and reboot, stop = 1, and it starts again when reboot, stop = 0.
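To make the rule concrete, here is a minimal sketch that rebuilds the sample frame and expresses the two conditions as boolean masks. It assumes `index` is an ordinary timestamp column (not the DataFrame index); the names `stop_begins` and `running` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the sample data; 'index' is assumed to be a regular column.
times = pd.date_range("2018-06-25 12:49:00", periods=13, freq="min")
df = pd.DataFrame({
    "index": times,
    "a": np.nan,
    "b": np.nan,
    "c": np.nan,
    "reboot": [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    "stop":   [1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0],
})
df.loc[7, "c"] = 1.2  # the 12:56 row is the only one with a value in c

# A stop begins where a, b, c are all NaN and reboot == stop == 1.
all_nan = df[["a", "b", "c"]].isna().all(axis=1)
stop_begins = all_nan & (df["reboot"] == 1) & (df["stop"] == 1)

# The server is running again where reboot == stop == 0.
running = (df["reboot"] == 0) & (df["stop"] == 0)

print(df.loc[stop_begins, "index"].tolist())
print(df.loc[running, "index"].tolist())
```

On the sample data the first mask selects the 12:51 and 12:58 rows, and the second selects 12:56 and 13:01.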
Desired output:
index period
2018-06-25 12:51:00 5
2018-06-25 12:58:00 3
Answer 0 (score: 1)
This will do what you want:
import numpy as np
import pandas as pd

# Create a new column which identifies stopped times
df['stopped'] = np.nan
idx_stopped = (pd.isnull(df.a)) & (pd.isnull(df.b)) & (pd.isnull(df.c)) & (df.reboot == 1) & (df.stop == 1)
df.loc[idx_stopped, 'stopped'] = 1
df.loc[(df.reboot == 0) & (df.stop == 0), 'stopped'] = 0
df.stopped = df.stopped.ffill()
df.stopped = df.stopped.fillna(0)
df.loc[df.stopped == 0, 'stopped'] = np.nan
# Count the number of periods for each stop instance
v = df.stopped[::-1]
cumsum = v.cumsum().ffill()  # fillna(method='pad') is deprecated in recent pandas
reset = -cumsum[v.isnull()].diff().fillna(cumsum)
result = v.where(v.notnull(), reset).cumsum()
df['period'] = result[::-1]
# Identify the time each stop incident began
df['first'] = (df.stopped == 1) & (pd.isnull(df.stopped.shift(1)))
df2 = df.loc[df['first'], ['index', 'period']].copy()  # copy avoids SettingWithCopyWarning below
df2.period = df2.period.astype(int)
print(df2)
index period
2 2018-06-25 12:51:00 5
9 2018-06-25 12:58:00 3
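For comparison, the same counts can probably be reached with a shorter run-labelling approach: mark every stopped row, give each run of stopped rows an id with `cumsum`, and count rows per id. This is only a sketch and not the method used above. The function name `count_stop_periods` and its local variable names are hypothetical, and it assumes `index` is a regular column and that rows are one minute apart, so a row count equals minutes.

```python
import numpy as np
import pandas as pd

def count_stop_periods(df):
    """Return one row per stop incident with the number of stopped rows."""
    all_nan = df[["a", "b", "c"]].isna().all(axis=1)
    starts = all_nan & (df["reboot"] == 1) & (df["stop"] == 1)
    ends = (df["reboot"] == 0) & (df["stop"] == 0)

    # 1 at a start, 0 at an end, NaN elsewhere; forward-fill carries the state.
    state = pd.Series(np.nan, index=df.index)
    state[starts] = 1.0
    state[ends] = 0.0
    stopped = state.ffill().fillna(0).astype(bool)

    # A new incident opens on a stopped row whose previous row was not stopped.
    new_incident = stopped & ~stopped.shift(fill_value=False)
    incident = new_incident.cumsum()

    # Count rows per incident and keep the timestamp of the first row.
    grouped = df.loc[stopped, "index"].groupby(incident[stopped])
    return pd.DataFrame({
        "index": grouped.first().to_numpy(),
        "period": grouped.size().to_numpy(),
    })

# Sample data from the question, rebuilt under the assumptions above.
sample = pd.DataFrame({
    "index": pd.date_range("2018-06-25 12:49:00", periods=13, freq="min"),
    "a": np.nan,
    "b": np.nan,
    "c": np.nan,
    "reboot": [0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0],
    "stop":   [1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0],
})
sample.loc[7, "c"] = 1.2

result = count_stop_periods(sample)
print(result)
```

On the sample data this gives a period of 5 for the stop beginning at 12:51 and 3 for the one beginning at 12:58, matching the desired output. If the rows were not evenly spaced, counting rows would no longer equal elapsed time, and a timestamp difference would be needed instead.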