Counting identical consecutive letters in an array of words

Time: 2018-08-10 21:42:50

Tags: python arrays string

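My goal is to count repeated letters in an array of words. For example, given words = ['capps','bat','hatt'], I want a counter array [1,0,1] as output, and for ['apple aab','gabb','ppl'] the output should be [2,1,1].

My strategy is to take the array, join it into a single str, split that into individual letters, and then iterate over the letters and count the duplicates I find. Is this the right way to solve the problem?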

words = ['apple','gabb','ppl']
words = " ".join(str(x) for x in words)
result = [character for character in words]
counter = 0
tmp = []
for i in range(len(result)-1):
    if result[i] == result[i+1]:
        if result[i] and result[i+1] != ' ':
            counter+=1
        else:
            tmp.append(0)
    tmp.append(counter)

print(tmp)

The output I get is [0,1,1,1,1,1,1,1,2,2,2,2,3,3]

3 Answers:

Answer 0: (score: 3)

I take "duplicate" to mean two consecutive characters that are the same.

You can use itertools.groupby to group identical adjacent elements.
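To make the grouping concrete, here is a minimal sketch of what groupby yields for a single word. The helper name run_lengths is hypothetical and is not part of the original answer.

```python
from itertools import groupby

def run_lengths(word):
    """Return (character, run length) for each run of identical adjacent characters."""
    # groupby starts a new group every time the character changes,
    # so each group is one run of the same letter.
    return [(char, sum(1 for _ in group)) for char, group in groupby(word)]

print(run_lengths('appple'))  # [('a', 1), ('p', 3), ('l', 1), ('e', 1)]
```

Note that each group is a one-shot iterator, so its length can only be measured once.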

Counting pairs

If you want to count the total number of consecutive pairs, so that 'appple' counts as 2, use the following.

from itertools import groupby

words = ['apple aab','gabb','ppl']

counter = []

for word in words:
    counter.append(0)
    for _, group in groupby(word):
        counter[-1] += sum(1 for _ in group) - 1

print(counter) # [2, 1, 1]

Counting sequences

If you need to count the number of sequences regardless of their length, so that 'appple' has only one sequence, use the following:

from itertools import groupby

words = ['apppple aab','gabb','ppl']
#          ^----- one long sequence

counter = []

for word in words:
    counter.append(0)
    for _, group in groupby(word):
        # Increment by one only for a run of length 2 or more
        if sum(1 for _ in group) > 1:
            counter[-1] += 1

print(counter) # [2, 1, 1]

Answer 1: (score: 2)

You can do this with a bit of functional magic:

words = ['apple aab','gabb','ppl']

# Count adjacent pairs of equal characters in a word
def duplicates(word):
    return sum(1 for x, y in zip(word, word[1:]) if x == y)

result = list(map(duplicates, words))

For the input ['apple aab','gabb','ppl'], the result is [2,1,1].
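As a rough illustration of why this works, zip(word, word[1:]) pairs each character with its successor, and the sum counts the pairs whose two members match. The helper name adjacent_pairs below is hypothetical and only exposes that intermediate step.

```python
def adjacent_pairs(word):
    """Pair every character with the one that follows it."""
    # word[1:] is the word shifted left by one, so zip stops at the last full pair.
    return list(zip(word, word[1:]))

print(adjacent_pairs('gabb'))  # [('g', 'a'), ('a', 'b'), ('b', 'b')]

# Only ('b', 'b') has equal members, so 'gabb' counts as 1.
print(sum(1 for x, y in adjacent_pairs('gabb') if x == y))  # 1
```

Because every matching pair is counted, a run of three identical letters such as the 'ppp' in 'appple' contributes 2.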

Answer 2: (score: 0)

Here is a way to count consecutive streaks in a word. I am assuming you accept "aaaaaaaa" as a single streak:

import re

def consecutive_streaks(w):
    w = re.sub(r'(.)(?=\1\1)', '', w)
    return sum([1 for i in range(1, len(w)) if w[i-1] == w[i]])

words = ['appppple aab','gabb','ppl']
print([consecutive_streaks(w) for w in words])

Output

[2, 1, 1]

Explanation

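The regex preprocesses the string, compressing any streak longer than 2 characters down to exactly 2 characters. The function then walks the string and counts each position where a character equals the one before it, so every streak contributes exactly 1.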
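The preprocessing step can be examined on its own. The sketch below isolates the same substitution in a helper whose name, compress_streaks, is hypothetical: the pattern matches any character that is followed by two more copies of itself and deletes it, which leaves the last two characters of every long streak.

```python
import re

def compress_streaks(w):
    """Shorten every run of 3 or more identical characters to exactly 2."""
    # (.) captures one character; the lookahead (?=\1\1) requires two more
    # copies right after it without consuming them.
    return re.sub(r'(.)(?=\1\1)', '', w)

print(compress_streaks('appppple aab'))  # apple aab
print(compress_streaks('aaaaaaaa'))      # aa
```

Runs of length 1 or 2 pass through unchanged, so after compression each streak holds exactly one adjacent equal pair.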