Counting items of the same length in a list

Time: 2011-12-02 17:57:28

Tags: python list

I am trying to port a CGI script to Python, written in a Pythonic style.

sequence = "aaaabbababbbbabbabb"
res = sequence.split("a") + sequence.split("b")
res = [l for l in res if l]

The result is

>>> res
['bb', 'b', 'bbbb', 'bb', 'bb', 'aaaa', 'a', 'a', 'a', 'a']

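This takes about 100 lines of code in C. Now I want to efficiently count the items in the res list that have the same length. For example, res contains 5 elements of length 1, 3 elements of length 2, and 2 elements of length 4.

The problem is that the sequence string can be very large.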

2 answers:

Answer 0: (score: 6)

The simplest way to build a histogram of string lengths from a list of strings is to use collections.Counter:

>>> from collections import Counter
>>> a = ["a", "b", "aaa", "bb", "aa", "bbb", "", "a", "b"]
>>> Counter(map(len, a))
Counter({1: 4, 2: 2, 3: 2, 0: 1})

Edit: There is also a better way to find runs of identical characters, namely itertools.groupby():

>>> from itertools import groupby
>>> sequence = "aaaabbababbbbabbabb"
>>> Counter(len(list(it)) for k, it in groupby(sequence))
Counter({1: 5, 2: 3, 4: 2})
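
For a very large sequence, len(list(it)) builds a temporary list for every run. A minimal sketch, assuming only the run lengths are needed, counts each group lazily instead. The function name run_length_histogram is hypothetical and not part of the answer above.

```python
from collections import Counter
from itertools import groupby


def run_length_histogram(sequence):
    """Map each run length to the number of runs of that length."""
    # sum(1 for _ in run) consumes the group iterator one item at a time,
    # so no run is ever materialised as a list.
    return Counter(sum(1 for _ in run) for _, run in groupby(sequence))


histogram = run_length_histogram("aaaabbababbbbabbabb")
print(sorted(histogram.items()))  # [(1, 5), (2, 3), (4, 2)]
```

Because groupby() accepts any iterable, the same function should also work on characters read lazily from a file, though that use is an assumption beyond what the question shows.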

Answer 1: (score: 1)

You could do something like this:

occurrences_by_length = {}  # maps string length -> number of strings with that length
# Skip the empty strings that split() yields between adjacent separators.
for i in (len(x) for x in (sequence.split("a") + sequence.split("b")) if x):
    # dict.get supplies 0 the first time a length is seen.
    occurrences_by_length[i] = occurrences_by_length.get(i, 0) + 1

Now occurrences_by_length maps each string length to the number of strings of that length.
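
The loop above can also be written with collections.Counter, which avoids concatenating the two split results into one list. This is only a sketch: the function name count_piece_lengths and its symbols parameter are hypothetical, and it assumes the sequence contains only the two symbols being split on.

```python
from collections import Counter


def count_piece_lengths(sequence, symbols="ab"):
    """Count non-empty pieces by length after splitting on each symbol."""
    counts = Counter()
    for symbol in symbols:
        # Splitting on one symbol leaves the runs of the other symbol,
        # plus empty strings wherever the separator repeats; skip those.
        counts.update(len(piece) for piece in sequence.split(symbol) if piece)
    return counts


print(sorted(count_piece_lengths("aaaabbababbbbabbabb").items()))
# [(1, 5), (2, 3), (4, 2)]
```

Note that each split() call still creates a list of pieces, so for very large inputs a streaming approach may use less memory.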