Finding consecutive repeated strings in a Python list

Date: 2014-08-22 01:51:36

Tags: python

What is the most efficient way to find consecutive repeated strings in a Python list?

For example, suppose I have the list ["a", "a", "b", "c", "b", "b", "b"]. I would like output similar to ["group of 2 a's found at index 0", "group of 3 b's found at index 4"].

Is there a built-in function for this task? I did find numpy.bincount, but it seems to work only on numeric values.

Thanks in advance for your help.

2 answers:

Answer 0 (score: 7)

It's funny you should call it a group, because the function best suited to this is itertools.groupby:

>>> import itertools
>>> items = ["a", "a", "b", "c", "b", "b", "b"]
>>> [(k, sum(1 for _ in vs)) for k, vs in itertools.groupby(items)]
[('a', 2), ('b', 1), ('c', 1), ('b', 3)]

(By the way, sum(1 for _ in vs) is a count: len doesn't work on arbitrary iterables, and len(list(…)) is wasteful.)
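To illustrate the remark about len, here is a minimal sketch. It assumes the same items list as above; the names len_supported and counts are only illustrative. Each group produced by itertools.groupby is a lazy iterator, so calling len on it raises TypeError, while consuming it with sum counts the run without building a temporary list.

```python
import itertools

items = ["a", "a", "b", "c", "b", "b", "b"]

# Each group yielded by groupby is a lazy iterator, not a list,
# so len() is not defined for it and raises TypeError.
key, group = next(itertools.groupby(items))
try:
    len(group)
    len_supported = True
except TypeError:
    len_supported = False

# Counting by consuming the iterator avoids building a temporary list.
counts = [(k, sum(1 for _ in vs)) for k, vs in itertools.groupby(items)]

print(len_supported)
print(counts)
```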

Getting the indices is a little more complicated; I'd just do it with a loop.

import itertools

def group_with_index(l):
    i = 0

    for k, vs in itertools.groupby(l):
        c = sum(1 for _ in vs)
        yield (k, c, i)
        i += c
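As a rough sketch of how the generator above might be used to build the messages asked for in the question, the block below restates it so that it runs on its own. The helper describe_repeats and its min_length parameter are hypothetical names introduced here for illustration; the assumption is that only runs of two or more items count as "repeats".

```python
import itertools


def group_with_index(items):
    """Yield (value, run_length, start_index) for each run of equal items."""
    start = 0
    for value, run in itertools.groupby(items):
        length = sum(1 for _ in run)
        yield (value, length, start)
        start += length


def describe_repeats(items, min_length=2):
    # min_length is an assumption: runs shorter than this are not reported.
    return [
        "group of {0} {1}'s found at index {2}".format(length, value, start)
        for value, length, start in group_with_index(items)
        if length >= min_length
    ]


messages = describe_repeats(["a", "a", "b", "c", "b", "b", "b"])
print(messages)
```

Filtering on the run length keeps the single "b" and "c" out of the result, which matches the output the question describes.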

Answer 1 (score: 1)

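This requires carrying state from one element to the next, so it doesn't fit easily into a list comprehension. Instead, you can keep track of the last value in a loop: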

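items = ["a", "a", "b", "c", "b", "b", "b"]
groups = []
for i, val in enumerate(items):
    if i == 0:
        cnt = 1
        loc = i
        last_val = val
    elif val == last_val:
        cnt += 1
    else:
        # The previous run just ended: record it and start a new one.
        groups.append((cnt, last_val, loc))
        cnt = 1
        loc = i
        last_val = val

# Record the final run, which never reaches the else branch above.
if items:
    groups.append((cnt, last_val, loc))

for group in groups:
    print("group of {0} {1}'s found at index {2}".format(*group))

Output:

group of 2 a's found at index 0
group of 1 b's found at index 2
group of 1 c's found at index 3
group of 3 b's found at index 4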