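I have a time series representing regular queries of a system's functionality, where `1` = working and `0` = not working. For example, expressing the time series as a list: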
```python
U = [0,0,1,1,1,1,1,1,0,0,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0]
```
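I'm interested in calculating the mean time to failure (how long the system stays up) and the mean time to repair (how long the system stays down), along with other similar statistics, so what I want to do is count the runs of consecutive `1` entries and the runs of consecutive `0` entries. I want to trim off the first and last runs because, for the example above, I don't know when the system initially went down, nor when it will come back up in the future. So the output I'd want to generate in this case is: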
```python
uptime = [6, 4, 9, 2]  # 6 ones followed by zeros, then 4 ones followed by zeros, etc.
downtime = [3, 3, 2]   # like uptime, but ignoring the zeros at indices [0, 1] and [-1]
```
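Once these two lists exist, the mean time to failure and mean time to repair are simply their averages. A minimal sketch, assuming the queries are evenly spaced so that run lengths are measured in sampling intervals:

```python
from statistics import mean

# Run lengths from the example above, in sampling intervals
# (assumes the system is queried at a fixed, regular interval).
uptime = [6, 4, 9, 2]
downtime = [3, 3, 2]

mttf = mean(uptime)    # mean time to failure: average length of a complete "up" run
mttr = mean(downtime)  # mean time to repair: average length of a complete "down" run
print(mttf, mttr)
```

Multiplying by the polling interval converts these into wall-clock durations.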
I wrote a script to do this, but it seems a bit awkward, and I'm wondering whether there is a better, more Pythonic way to do it. Here's what I have:
```python
def count_times(U, down=False):
    if down:
        U = [1 - u for u in U]
    T = []
    # Skip the first entry as you don't know when it started
    m = U.index(0)
    m += U[m:].index(1)
    while m < len(U):
        try:
            T.append(U[m:].index(0))
            m += U[m:].index(0)
            m += U[m:].index(1)
        except ValueError:
            # skip the last entry as you don't know when it will end
            return T
```
Which gives:
```python
print(count_times(U))
# [6, 4, 9, 2]
print(count_times(U, down=True))
# [3, 3, 2]
```
This works, but I can't help wondering whether there is a cleaner way to do it.
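One small refinement that keeps the question's index-based idea: `list.index` accepts an optional start position, so the repeated `U[m:]` slices (each of which copies the rest of the list) can be avoided. This is a sketch assuming the series contains only `0` and `1`; the name `count_times_index` is hypothetical.

```python
def count_times_index(U, down=False):
    # Hypothetical variant of count_times: list.index(value, start) searches
    # from position `start` instead of copying the tail with U[m:].
    target, other = (0, 1) if down else (1, 0)
    T = []
    try:
        # Skip the partial leading run: find the first `other`,
        # then the first `target` after it.
        m = U.index(target, U.index(other))
        while True:
            end = U.index(other, m)   # first sample after the current run of `target`
            T.append(end - m)         # length of this complete run
            m = U.index(target, end)  # start of the next run of `target`
    except ValueError:
        # No terminating `other` (trailing run is partial) or no further run: stop.
        return T

U = [0,0,1,1,1,1,1,1,0,0,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0]
print(count_times_index(U))             # [6, 4, 9, 2]
print(count_times_index(U, down=True))  # [3, 3, 2]
```

Unlike the original, this returns an empty list instead of raising when there is no complete run at all.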
Answer 0 (score: 2)
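My approach is similar to Ruben's, but it initially keeps the up and down times in the same list after applying `groupby`, which makes it easier to trim the first and last runs.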
```python
import itertools

U = [0,0,1,1,1,1,1,1,0,0,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0]
run_lengths = [(value, len(list(group))) for value, group in itertools.groupby(U)]
# Discard the first and last runs
run_lengths = run_lengths[1:-1]
# Split the runs into separate uptime and downtime lists
uptime = [length for value, length in run_lengths if value == 1]
downtime = [length for value, length in run_lengths if value == 0]
print(uptime)
print(downtime)
```
Result:

```
[6, 4, 9, 2]
[3, 3, 2]
```
Answer 1 (score: 1)
You can use `groupby` from the `itertools` module:
```python
from itertools import groupby

testvalue = [0,0,1,1,1,1,1,1,0,0,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0]

def count_times(U, down=False):
    if down:
        return [len(list(group)) for key, group in groupby(U) if key == 0]
    else:
        return [len(list(group)) for key, group in groupby(U) if key == 1]

# Note: the first and last runs are not trimmed here
print(count_times(testvalue, True))   # [2, 3, 3, 2, 1]
print(count_times(testvalue, False))  # [6, 4, 9, 2]
```
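The version above counts every run, including the partial first and last ones, so its downtime result is `[2, 3, 3, 2, 1]` rather than the `[3, 3, 2]` the question asks for. A minimal sketch of a trimmed variant that keeps the same `down` flag; the name `count_times_trimmed` is hypothetical:

```python
from itertools import groupby

testvalue = [0,0,1,1,1,1,1,1,0,0,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0]

def count_times_trimmed(U, down=False):
    wanted = 0 if down else 1
    # Measure each group inside the comprehension, while groupby is still
    # positioned on it (the groups share one underlying iterator).
    runs = [(key, len(list(group))) for key, group in groupby(U)]
    # Drop the partial first and last runs, then keep only the wanted value.
    return [length for key, length in runs[1:-1] if key == wanted]

print(count_times_trimmed(testvalue, True))   # [3, 3, 2]
print(count_times_trimmed(testvalue, False))  # [6, 4, 9, 2]
```

The lengths have to be computed before slicing: per the `itertools` documentation, once `groupby` advances, the previous group is no longer visible, so slicing `list(groupby(U))` first would not leave usable groups.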
Answer 2 (score: 1)
Using `reduce`:
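```python
from functools import reduce  # reduce is no longer a builtin in Python 3

def groups(U, i):
    # Accumulator: (completed run lengths, length of the current run of i).
    # A run is appended only when a non-i value ends it, so the partial
    # trailing run is never added and needs no separate truncation.
    a = reduce(lambda u, v: (u[0], u[1] + 1) if v == i else (u[0] + [u[1]], 0) if u[1] > 0 else u, U, ([], 0))[0]
    if U[0] == i: a = a[1:]  # truncate the partial leading run
    return a

U = [0,0,1,1,1,1,1,1,0,0,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0]
uptime = groups(U, 1)    # [6, 4, 9, 2]
downtime = groups(U, 0)  # [3, 3, 2]
```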
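If the nested conditional inside the lambda is hard to follow, the same idea (a list of completed runs plus a counter for the current run) can be written as a plain loop. A sketch; `groups_loop` is a hypothetical name:

```python
def groups_loop(U, i):
    runs, count = [], 0
    for v in U:
        if v == i:
            count += 1          # extend the current run of i
        elif count > 0:
            runs.append(count)  # a run of i just ended: record its length
            count = 0
    # A run still open at the end (count > 0) is never appended,
    # so the partial trailing run is already excluded.
    if U and U[0] == i:
        runs = runs[1:]         # drop the partial leading run
    return runs

U = [0,0,1,1,1,1,1,1,0,0,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0]
print(groups_loop(U, 1))  # [6, 4, 9, 2]
print(groups_loop(U, 0))  # [3, 3, 2]
```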
Answer 3 (score: 1)
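This is sometimes called run-length encoding. R provides a nice built-in function for this, `rle()`. Anyway, here is my approach. I initially thought of using `takewhile()`, but this is the cleanest way I could come up with: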
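```python
from itertools import chain

def rle(x):
    x = chain(x)  # turn any iterable into an iterator
    try:
        last = next(x)
    except StopIteration:
        return  # empty input: no runs (avoids a RuntimeError under PEP 479)
    i = 1
    for item in x:
        if item != last:
            yield (last, i)
            i = 1
        else:
            i += 1
        last = item
    yield (last, i)
```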
Then you can get the uptime or downtime like this:
```python
[L for v, L in rle(U) if v == 1]  # uptime:   [6, 4, 9, 2]
[L for v, L in rle(U) if v == 0]  # downtime: [2, 3, 3, 2, 1]
```
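These two comprehensions do not trim the partial first and last runs. To keep the lazy, generator-based style and still trim them, one option is to skip the first run and hold each later run back until the next one arrives, so the final run is never emitted. A minimal sketch; it uses `itertools.groupby` for the counting instead of `rle()` so that it stands alone, and `complete_runs` is a hypothetical name:

```python
from itertools import groupby, islice

def complete_runs(samples):
    """Yield (value, length) for every run except the first and the last."""
    # Each length is computed while groupby is positioned on that group.
    runs = ((value, sum(1 for _ in group)) for value, group in groupby(samples))
    runs = islice(runs, 1, None)  # drop the partial leading run
    previous = None
    for run in runs:
        if previous is not None:
            yield previous        # a later run exists, so this one is complete
        previous = run
    # `previous` now holds the trailing run, which is deliberately not yielded

U = [0,0,1,1,1,1,1,1,0,0,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0]
uptime = [L for v, L in complete_runs(U) if v == 1]    # [6, 4, 9, 2]
downtime = [L for v, L in complete_runs(U) if v == 0]  # [3, 3, 2]
```

Because it never materializes the whole series, the same generator also works on an iterator of live monitoring samples.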