Question

def count_squences(string):

    i= 0 
    total = 0
    total_char_list = []

    while i < len(string):
        print(string[i])

        if string[i] == "x":
            total += 1
        if string[i] == "y":

            total_char_list.append(total)
            total = 0

        i = i + 1

    return total_char_list


print(count_squences("xxxxyyxyxx"))

I'm trying to return the counts of consecutive x characters as a list. For example, this function should return [4, 1, 2].

For example, if the string is "xxxxxyxxyxxx", it should return [5, 2, 3].

My function is not returning the correct list. Any help would be appreciated. Thanks.

Answer 1

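Reset the counter whenever you encounter a y character, and append to total_char_list only if at least one x character has been counted by then (y characters can be repeated too):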

total = 0
while i < len(string):
    if string[i] == "x":
        total += 1
    if string[i] == "y":
        if total:
            total_char_list.append(total)
        total = 0

    i = i + 1

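Next, when the loop ends and total is not zero, you need to append that value as well; otherwise a run of 'x' characters at the end of the string is never counted: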

while ...:
    # ...

if total:
    # x characters at the end
    total_char_list.append(total)

Next, you really want to use a for loop to iterate over the sequence. That gives you the individual characters directly:

total = 0
for char in string:
    if char == 'x':
        total += 1
    if char == 'y':
        if total:
            total_char_list.append(total)
        total = 0

if total:
    # x characters at the end
    total_char_list.append(total)
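
Putting the fragments above together, a complete function might look like the sketch below. The name count_x_runs is made up for illustration and is not from the question; the logic is the same loop with the trailing check added.

```python
def count_x_runs(string):
    """Return the length of each consecutive run of 'x' characters."""
    total = 0
    total_char_list = []
    for char in string:
        if char == 'x':
            total += 1
        elif char == 'y':
            if total:
                # a y ends the current run; skip if no x was counted
                total_char_list.append(total)
            total = 0
    if total:
        # x characters at the end of the string
        total_char_list.append(total)
    return total_char_list


print(count_x_runs("xxxxyyxyxx"))    # [4, 1, 2]
print(count_x_runs("xxxxxyxxyxxx"))  # [5, 2, 3]
```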

You can make this faster by using itertools.groupby():

from itertools import groupby

def count_squences(string):
    return [sum(1 for _ in group) for char, group in groupby(string) if char == 'x']

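groupby() divides an input iterable (such as a string) into a separate iterator per group, where a group is any run of consecutive values with the same key(value) result. The default key() function just returns the value itself, so groupby(string) gives you groups of consecutive identical characters. char is the repeated character, and sum(1 for _ in group) takes the length of the iterator.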
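
To make the grouping concrete, here is a small sketch of what groupby() produces for a short string. The sample string "xxyxx" is just an illustration, not from the question.

```python
from itertools import groupby

# Each item is a (key, group-iterator) pair; list() consumes the
# iterator so the contents of each group become visible.
groups = [(char, list(group)) for char, group in groupby("xxyxx")]
print(groups)
# [('x', ['x', 'x']), ('y', ['y']), ('x', ['x', 'x'])]

# Counting the items instead of storing them gives the run lengths.
lengths = [(char, sum(1 for _ in group)) for char, group in groupby("xxyxx")]
print(lengths)
# [('x', 2), ('y', 1), ('x', 2)]
```

Note that the two 'x' runs stay separate: groupby() only groups consecutive values and does not merge equal keys across the string.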

You can then make it more generic and count all groups:

def count_all_sequences(string):
    counts = {}
    for char, group in groupby(string):
        counts.setdefault(char, []).append(sum(1 for _ in group))
    return counts
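
For the question's sample string, the generic version should give one list of run lengths per character. The sketch below repeats the function so it runs on its own, and swaps setdefault() for collections.defaultdict; that swap is only a stylistic assumption on my part, not a change in behaviour.

```python
from collections import defaultdict
from itertools import groupby


def count_all_sequences(string):
    counts = defaultdict(list)
    for char, group in groupby(string):
        # one entry per consecutive run of this character
        counts[char].append(sum(1 for _ in group))
    return dict(counts)


print(count_all_sequences("xxxxyyxyxx"))
# {'x': [4, 1, 2], 'y': [2, 1]}
```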

You can do the same with a regular expression:

import re

def count_all_sequences(string):
    counts = {}
    # (.)(\1*) finds repeated characters; (.) matching one, \1 matching the same
    # This gives us (first, rest) tuples, so len(rest) + 1 is the total length
    for char, group in re.findall(r'(.)(\1*)', string):
        counts.setdefault(char, []).append(len(group) + 1)
    return counts
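
As a rough illustration of the (first, rest) tuples the pattern yields, and of one caveat: '.' does not match newlines unless re.DOTALL is passed, so newline runs are silently skipped by default. The strings used here are only examples.

```python
import re

pairs = re.findall(r'(.)(\1*)', "xxxxyyxyxx")
print(pairs)
# [('x', 'xxx'), ('y', 'y'), ('x', ''), ('y', ''), ('x', 'x')]

# Without re.DOTALL the two newlines are ignored entirely.
no_dotall = re.findall(r'(.)(\1*)', "a\n\nb")
print(no_dotall)    # [('a', ''), ('b', '')]

# With re.DOTALL they form a run of length 2 like any other character.
with_dotall = re.findall(r'(.)(\1*)', "a\n\nb", re.DOTALL)
print(with_dotall)  # [('a', ''), ('\n', '\n'), ('b', '')]
```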

Answer 2

You aren't resetting the value of total between sequences, so it keeps on counting.

def count_squences(string):
    i = 0
    total = 0
    total_char_list = []
    while i < len(string):
        if string[i] == "x":
            total += 1
        if string[i] == "y":
            if total != 0:
                total_char_list.append(total)
                total = 0
        i = i + 1
    if total != 0:
        total_char_list.append(total)
    return total_char_list

Update (17:00): having fixed the original program, I thought of a better solution:

import re
my_str = "xxxxyyxyxx"
# "if z" drops the empty strings re.split() yields for a leading or trailing "y"
[len(z) for z in re.split("y+", my_str) if z]
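
One thing to watch with re.split(): it yields empty strings when the input begins or ends with the separator, so unfiltered lengths would include zeros. A small sketch, using a made-up string with y characters at both ends:

```python
import re

my_str = "yxxxxyyxyxxy"

pieces = re.split("y+", my_str)
print(pieces)
# ['', 'xxxx', 'x', 'xx', '']

unfiltered = [len(z) for z in pieces]
print(unfiltered)  # [0, 4, 1, 2, 0]

# Skipping the empty pieces leaves only the real runs of x.
filtered = [len(z) for z in pieces if z]
print(filtered)    # [4, 1, 2]
```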

Answer 3

Edited to put it in function form:

import re

def count_sequences(string):
    return [len(x) for x in re.findall(r"x+", string)]

count_sequences("xxxxyyxyxx")

Returns [4, 1, 2]

Count identical characters occurring in sequence
