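My goal is to count repeated letters in an array of words. So given words = ['capps','bat','hatt'], I want to get a counter array of [1,0,1], or for ['apple aab','gabb','ppl'] it would be [2,1,1].
My strategy is to take this array, convert it to a str, and use the list function to break it into individual letters, so that I can iterate over them and count how many duplicates I find. Is this the right way to approach the problem?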
words = ['apple','gabb','ppl']
words = " ".join(str(x) for x in words)
result = [character for character in words]
counter = 0
tmp = []
for i in range(len(result)-1):
    if result[i] == result[i+1]:
        if result[i] and result[i+1] != ' ':
            counter+=1
        else:
            tmp.append(0)
    tmp.append(counter)
print(tmp)
The output I get is [0,1,1,1,1,1,1,1,2,2,2,2,3,3]
Answer 0 (score: 3)
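By duplicates I take it you mean two consecutive identical characters. You can use itertools.groupby to group identical elements.
If you want to count the total number of consecutive pairs, e.g. 'appple' has 2, use the following.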
from itertools import groupby
words = ['apple aab','gabb','ppl']
counter = []
for word in words:
    counter.append(0)
    for _, group in groupby(word):
        # A run of n identical characters contains n - 1 consecutive pairs
        counter[-1] += sum(1 for _ in group) - 1
print(counter) # [2, 1, 1]
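To see what groupby produces before any counting happens, here is a minimal sketch. The helper name run_lengths is hypothetical and not part of the answer; it simply lists each run of identical characters together with its length:

```python
from itertools import groupby

def run_lengths(word):
    """Return (character, run length) pairs for consecutive identical characters."""
    return [(char, sum(1 for _ in group)) for char, group in groupby(word)]

print(run_lengths('appple'))  # [('a', 1), ('p', 3), ('l', 1), ('e', 1)]
```

A run of length n contributes n - 1 adjacent pairs, which is why 'appple' counts as 2 under this rule.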
If you need to count the number of sequences regardless of their length, e.g. 'appple' has only one sequence, use this instead:
from itertools import groupby
words = ['apppple aab','gabb','ppl']
#          ^----- one long sequence
counter = []
for word in words:
    counter.append(0)
    for _, group in groupby(word):
        # Here we increment only by one for a sequence of length 2 or more
        if sum(1 for _ in group) > 1:
            counter[-1] += 1
print(counter) # [2, 1, 1]
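The two counting rules differ only for runs longer than two characters. As a rough sketch, assuming the hypothetical helper names count_pairs and count_runs (neither appears in the answer), both rules can be wrapped as functions and compared on the same input:

```python
from itertools import groupby

def count_pairs(word):
    # Each run of length n contributes n - 1 adjacent equal pairs.
    return sum(sum(1 for _ in group) - 1 for _, group in groupby(word))

def count_runs(word):
    # Each run of length 2 or more counts once, regardless of its length.
    return sum(1 for _, group in groupby(word) if sum(1 for _ in group) > 1)

print(count_pairs('appple'), count_runs('appple'))  # 2 1

words = ['apple aab', 'gabb', 'ppl']
print([count_pairs(w) for w in words])  # [2, 1, 1]
print([count_runs(w) for w in words])   # [2, 1, 1]
```

For the question's sample inputs the two rules agree, because no run there is longer than two characters.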
Answer 1 (score: 2)
You can do this with a bit of functional magic:
# counts duplicates in word: pairs each character with its successor
def duplicates(word):
    return sum(1 for x, y in zip(word, word[1:]) if x == y)
words = ['apple aab','gabb','ppl']
result = list(map(duplicates, words))
print(result)
For the input ['apple aab','gabb','ppl'], the result is [2,1,1].
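The zip(word, word[1:]) idiom pairs each character with the one that follows it, so the function counts the positions where the two are equal. A small illustration, using a sample word picked here for demonstration:

```python
word = 'gabb'

# Each tuple holds a character and its immediate successor.
pairs = list(zip(word, word[1:]))
print(pairs)  # [('g', 'a'), ('a', 'b'), ('b', 'b')]

# Count the tuples whose two characters match.
print(sum(1 for x, y in pairs if x == y))  # 1
```

Because every adjacent equal pair is counted, a word like 'appple' yields 2 here, matching the pair-counting rule rather than the one-per-sequence rule.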
Answer 2 (score: 0)
Here is one way to count consecutive streaks in a word. I am assuming you accept "aaaaaaaa" as a single streak:
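import re
def consecutive_streaks(w):
    # Collapse every streak of 3 or more identical characters down to 2
    w = re.sub(r'(.)(?=\1\1)', '', w)
    # Count adjacent equal pairs in the collapsed string
    return sum(1 for i in range(1, len(w)) if w[i-1] == w[i])
words = ['appppple aab','gabb','ppl']
print([consecutive_streaks(w) for w in words]) # [2, 1, 1]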
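The regex preprocesses the string, compressing any streak longer than 2 characters down to exactly 2. The function then walks through the string and counts each instance of a repeated character.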
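To make the preprocessing step visible, the sketch below isolates the substitution in a hypothetical helper named squeeze_streaks. It also shows an alternative that is not part of the answer: counting matches of the pattern (.)\1+ directly with re.findall, which should give the same streak count without the intermediate string.

```python
import re

def squeeze_streaks(w):
    # Drop any character that is followed by two more copies of itself,
    # so every streak of 3 or more shrinks to exactly 2 characters.
    return re.sub(r'(.)(?=\1\1)', '', w)

def count_streaks_findall(w):
    # Alternative (an assumption, not the answer's method): each match of
    # a character followed by one or more copies of itself is one streak.
    return len(re.findall(r'(.)\1+', w))

print(squeeze_streaks('appppple aab'))  # apple aab
print(squeeze_streaks('aaaaaaaa'))      # aa
print([count_streaks_findall(w) for w in ['appppple aab', 'gabb', 'ppl']])  # [2, 1, 1]
```

Note that . does not match a newline by default, so both patterns treat line breaks as streak boundaries.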