Counting the lengths of runs of identical entries in a Python list

Time: 2014-04-11 16:51:20

Tags: python

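I have a time series representing regular queries of a system's functionality, where 1 = working and 0 = not working. For example, representing the time series as a list: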

U = [0,0,1,1,1,1,1,1,0,0,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0]

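I am interested in computing the mean time to failure (how long the system stays up) and the mean time to repair (how long the system stays down), along with other similar statistics, so what I want to do is count the runs of consecutive 1 entries and consecutive 0 entries. I want to trim the first and last runs because, for the example above, I don't know when the system initially went down, nor when it will come back up in the future. So the output I would like to generate in this case is: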

uptime = [6, 4, 9, 2] # 6 ones followed by zeros, then 4 ones followed by zeros, etc.
downtime = [3, 3, 2] # like uptime but ignoring zeros at indices [0,1] and [-1] 

I wrote a script to do this, but it seems a bit awkward, and I am wondering whether there is a better, more Pythonic way to do it. This is what I have:

def count_times(U, down=False):
    if down:
        U = [1 - u for u in U]
    T = [] 
    # Skip the first entry as you don't know when it started
    m = U.index(0)
    m += U[m:].index(1)
    while m < len(U):
        try:
            T.append(U[m:].index(0))
            m += U[m:].index(0)
            m += U[m:].index(1)
        except ValueError:
            # skip the last entry as you don't know when it will end
            return T

This gives the following results:

print(count_times(U))
# [6, 4, 9, 2]
print(count_times(U, down=True))
# [3, 3, 2]

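This works, but I can't help wondering whether there is a cleaner way to do it.

4 answers:

Answer 0: (score: 2)

My approach is similar to Ruben's, but it initially keeps the up and down times in the same list after applying groupby, which makes it easier to trim the first and last runs.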

import itertools
U = [0,0,1,1,1,1,1,1,0,0,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0]
run_lengths = [(value, len(list(group))) for value, group in itertools.groupby(U)]

#discard first and last runs
run_lengths = run_lengths[1:-1]

#split runs into separate up and down time lists
uptime = [length for value, length in run_lengths if value == 1]
downtime = [length for value, length in run_lengths if value == 0]

print(uptime)
print(downtime)

Result:

[6, 4, 9, 2]
[3, 3, 2]

Answer 1: (score: 1)

You can use groupby from the itertools module:

from itertools import groupby

testvalue = [0,0,1,1,1,1,1,1,0,0,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0]

def count_times(U, down=False):
    if down:
        return [len(list(group)) for key, group in groupby(U) if key == 0]
    else:
        return [len(list(group)) for key, group in groupby(U) if key == 1]

print(count_times(testvalue, True)) # [2, 3, 3, 2, 1]
print(count_times(testvalue, False)) # [6, 4, 9, 2]
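Note that the downtime output above still includes the leading and trailing runs, which the question wanted to discard. A possible variant that keeps the same signature but drops the boundary runs is sketched below. The name `count_times_trimmed` is hypothetical, and the sketch assumes that the first and last runs should always be discarded, whatever their value.

```python
from itertools import groupby

def count_times_trimmed(U, down=False):
    # Hypothetical variant: same groupby idea, but the first and last runs
    # are dropped because their true lengths are unknown.
    target = 0 if down else 1
    runs = [(key, len(list(group))) for key, group in groupby(U)]
    return [length for key, length in runs[1:-1] if key == target]

testvalue = [0,0,1,1,1,1,1,1,0,0,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0]

print(count_times_trimmed(testvalue, True))   # [3, 3, 2]
print(count_times_trimmed(testvalue, False))  # [6, 4, 9, 2]
```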

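Answer 2: (score: 1)

Using reduce:

from functools import reduce  # reduce is not a builtin in Python 3

def groups(U,i):
    a = reduce(lambda u,v: (u[0],u[1]+1) if v==i else (u[0] + [u[1]], 0) if u[1]>0 else u, U,([],0))[0]
    if U[0]== i: a=a[1:]   # truncate beginning
    # No truncation at the end: a run still open when U ends is never appended by the reduce
    return a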


U = [0,0,1,1,1,1,1,1,0,0,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0]

uptime = groups(U,1)
downtime = groups(U,0)
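The reduce expression packs the whole accumulation into one lambda, which can be hard to follow. The same logic might be unrolled into an explicit loop, as in this sketch. The name `groups_loop` is hypothetical. Because a run that is still open when the list ends is never appended, the sketch applies no separate truncation at the end.

```python
def groups_loop(U, i):
    # Hypothetical helper: step-by-step version of the reduce-based accumulation.
    runs = []
    count = 0
    for v in U:
        if v == i:
            count += 1          # extend the current run of i
        elif count > 0:
            runs.append(count)  # a run of i just ended; record its length
            count = 0
    # A run still open at the end of U was never appended, so it is already dropped.
    if U and U[0] == i:
        runs = runs[1:]         # drop the leading run, whose start is unknown
    return runs

U = [0,0,1,1,1,1,1,1,0,0,0,1,1,1,1,0,0,0,1,1,1,1,1,1,1,1,1,0,0,1,1,0]

print(groups_loop(U, 1))  # [6, 4, 9, 2]
print(groups_loop(U, 0))  # [3, 3, 2]
```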

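Answer 3: (score: 1)

This is sometimes called run-length encoding. R has a nice built-in function for this, rle(). Anyway, here is my approach. I initially considered using takewhile(), but this is the cleanest way I could think of: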

from itertools import chain

def rle(x):
    x = chain(x)
    last = next(x)  # the next() builtin works in Python 2.6+ and Python 3
    i = 1
    for item in x:
        if item != last:
            yield (last, i)
            i = 1
        else:
            i += 1
        last = item
    yield (last, i)

Then you can get the uptime or downtime like this:

[L for v,L in rle(U) if v == 1]
[L for v,L in rle(U) if v == 0]