Python list: comparing characters and counting them

Time: 2016-04-10 21:23:42

Tags: python list count

I have a question about how to check and compare two or more characters in a list in Python.

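For example, I have the string "cdcdccddd". I created a list from this string so that the characters are easier to compare. The required output is: c:1 d:1 c:1 d:1 c:2 d:3. So it counts the characters: if the first is not the same as the second, the counter is 1; if the second is the same as the third, the counter is incremented by 1, then the third needs to be checked against the fourth, and so on.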

So far I have this algorithm:
text = "cdcdccddd"
l = []
l = list(text)
print list(text)

for n in range(0,len(l)):
    le = len(l[n])
    if l[n] == l[n+1]:
        le += 1
        if l[n+1] == l[n+2]:
            le += 1
        print l[n], ':' , le
    else: 
        print l[n], ':', le

But it doesn't work properly, because it counts the first and second elements but not the second and third. For this input the output is:

c : 1
d : 1
c : 1
d : 1
c : 2
c : 1
d : 3

How can I make this algorithm better?

Thanks!

1 answer:

Answer 0: (score: 3)

You can use itertools.groupby:

from itertools import groupby
s = "cdcdccddd"

print([(k, sum(1 for _ in v)) for k,v in groupby(s)])
[('c', 1), ('d', 1), ('c', 1), ('d', 1), ('c', 2), ('d', 3)]

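Consecutive characters are grouped together, so each k is the character for that group, and calling sum(1 for _ in v) gives the length of each group, so we end up with (char, len(group)) pairs.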

If we run it in ipython and call list on each v, it should be very clear what is happening:

In [3]: from itertools import groupby

In [4]: s = "cdcdccddd"

In [5]: [(k, list(v)) for k,v in groupby(s)]
Out[5]: 
[('c', ['c']),
 ('d', ['d']),
 ('c', ['c']),
 ('d', ['d']),
 ('c', ['c', 'c']),
 ('d', ['d', 'd', 'd'])]
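
To get the exact "c:1 d:1 ..." format asked for in the question, the pairs can be joined into a single string. This is only a minimal sketch; the helper name format_runs is made up for illustration and is not part of the original answer:

```python
from itertools import groupby


def format_runs(text):
    # Hypothetical helper: groupby yields (char, iterator over the run) for
    # each run of consecutive equal characters; count each run and format it.
    return " ".join(
        "{}:{}".format(char, sum(1 for _ in run)) for char, run in groupby(text)
    )


print(format_runs("cdcdccddd"))  # c:1 d:1 c:1 d:1 c:2 d:3
```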

We can also easily roll our own:

def my_groupby(s):
    # create an iterator
    it = iter(s)
    # set consec_count to one and pull the first char from s
    # (this assumes s is not empty)
    consec_count, prev = 1, next(it)
    # iterate over the rest of the string
    for ele in it:
        # if last and current char are different
        # yield previous char, consec_count and reset
        if prev != ele:
            yield prev, consec_count
            consec_count = 0
        prev = ele
        consec_count += 1
    # yield the final run; prev is used so a one-char string also works
    yield prev, consec_count

Which gives us the same thing:

In [8]: list(my_groupby(s))
Out[8]: [('c', 1), ('d', 1), ('c', 1), ('d', 1), ('c', 2), ('d', 3)]
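
For comparison, here is a sketch that stays closer to the index-based loop in the question. The function name count_runs is hypothetical and not from the answer above. The idea is to compare each character only with the one before it, which avoids both the double counting and reading past the end of the list:

```python
def count_runs(text):
    # Hypothetical helper, not part of the original answer.
    runs = []
    for i, char in enumerate(text):
        if i > 0 and char == text[i - 1]:
            # Same as the previous character: extend the current run.
            runs[-1][1] += 1
        else:
            # First character, or a different one: start a new run.
            runs.append([char, 1])
    return [tuple(run) for run in runs]


for char, count in count_runs("cdcdccddd"):
    print("{} : {}".format(char, count))
```

Because it only ever looks backwards, this version needs no special handling for the last element, and an empty string simply produces an empty list.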