Question

I have a text file in the following format:

AAAAATTTTTT
AAATTTTTTGGG
TTTDDDCCVVVVV

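I am trying to count how many times the same character is repeated consecutively at the start and at the end of each line.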

I wrote the following function:

def getStartEnd(sequence):
    start = sequence[0]
    end = sequence[-1]
    startCount = 0
    endCount = 0

    for char in sequence:
        if char == start:
            startCount += 1
            if ( char != start):
                break

    for char in reversed(sequence):
        if char == end:
            endCount += 1
            if ( char != end):
                break

    return startCount, endCount

The function works fine on a standalone string. For example:

seq = "TTTDDDCCVVVVV"
a,b = getStartEnd(seq)
print a,b

But when I call it inside a for loop, it only gives the correct values for the last line of the file.

file = open("Test.txt", 'r')

for line in file:
    a,b = getStartEnd(str(line))
    print a, b

Answer 1

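Because every line except the last one ends with a newline character, so sequence[-1] is '\n' rather than the last letter.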

Try the following (it strips trailing whitespace):

with open("Test.txt", 'r') as f:
    for line in f:
        a, b = getStartEnd(line.rstrip())
        print a, b
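
To see the problem directly, here is a minimal sketch that simulates what iterating over the file yields and looks at the last character of each line. The `text` and `lines` names are illustrative stand-ins for the file, built from the sample data in the question, and the sketch assumes the file has no newline after its last line:

```python
# Stand-in for iterating over Test.txt: splitlines(True) keeps the line
# endings, just as file iteration does. The sample data is from the question.
text = "AAAAATTTTTT\nAAATTTTTTGGG\nTTTDDDCCVVVVV"
lines = text.splitlines(True)

# Last character of each raw line versus each stripped line.
raw_last = [line[-1] for line in lines]
stripped_last = [line.rstrip()[-1] for line in lines]

print(repr(raw_last))       # ['\n', '\n', 'V']
print(repr(stripped_last))  # ['T', 'G', 'V']
```

Only the last line ends in a letter, which is why only the last line gave the expected end count.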

BTW, ( char != end ) in the following code is always False, so the break is never reached. (The same goes for ( char != start ).)

for char in reversed(sequence):
    if char == end:
        endCount += 1
        if ( char != end): # always False because char == end
            break

Did you mean this?

for char in reversed(sequence):
    if char == end:
        endCount += 1
    else:
        break
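
The difference shows up whenever the first or last character appears again later in the line. A small sketch comparing the two loop shapes; the helper names `count_without_break` and `count_with_break` and the input string are made up for illustration:

```python
def count_without_break(sequence):
    """Loop shape from the question: the inner test can never be true."""
    start = sequence[0]
    count = 0
    for char in sequence:
        if char == start:
            count += 1
            if char != start:  # unreachable: char == start here
                break
    return count


def count_with_break(sequence):
    """Corrected shape: stop at the first character that differs."""
    start = sequence[0]
    count = 0
    for char in sequence:
        if char == start:
            count += 1
        else:
            break
    return count


sample = "TTTDDDTT"  # made-up input: 'T' reappears after the leading run
print(count_without_break(sample))  # 5, every 'T' in the string
print(count_with_break(sample))     # 3, only the leading run
```

The sample lines in the question never repeat their first or last character later on, which would explain why the missing break went unnoticed.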

How about using itertools.takewhile:

import itertools

def getStartEnd(sequence):
    start = sequence[0]
    end = sequence[-1]
    start_count = sum(1 for _ in itertools.takewhile(lambda ch: ch == start, sequence))
    end_count = sum(1 for _ in itertools.takewhile(lambda ch: ch == end, reversed(sequence)))
    return start_count, end_count
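
One edge case to keep in mind: a blank line becomes an empty string after rstrip(), and sequence[0] then raises IndexError. A hedged sketch of a guarded variant follows; the name `get_start_end_safe` and the choice to return (0, 0) for empty input are assumptions, not part of the original question:

```python
import itertools


def get_start_end_safe(sequence):
    """Like getStartEnd above, but returns (0, 0) for an empty string."""
    if not sequence:
        # Assumed convention: an empty line has no leading or trailing run.
        return 0, 0
    start, end = sequence[0], sequence[-1]
    start_count = sum(1 for _ in itertools.takewhile(lambda ch: ch == start, sequence))
    end_count = sum(1 for _ in itertools.takewhile(lambda ch: ch == end, reversed(sequence)))
    return start_count, end_count


print(get_start_end_safe("TTTDDDCCVVVVV"))  # (3, 5)
print(get_start_end_safe("\n".rstrip()))    # (0, 0)
```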

Answer 2

Three things. First, in your function, you probably want to break using the following structure.

for char in sequence:
    if char == start:
        startCount += 1
    else:
        break

for char in reversed(sequence):
    if char == end:
        endCount += 1
    else:
        break

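Second, when you loop over the lines of a file, there is no need to convert each line to a string with the str function. They are already strings!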

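Third, the lines include a newline character, written '\n'. It marks where one line ends and the next begins. To get rid of it, you can use the string's rstrip method, like this: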

file = open("Test.txt", 'r')

for line in file:
    a,b = getStartEnd(line.rstrip())
    print a, b
file.close()
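
Note that rstrip() with no argument removes all trailing whitespace, not just the newline. If trailing spaces or tabs should count as characters, passing the characters to strip is one option. A short sketch with made-up lines:

```python
line = "AAATTT  \n"  # made-up line: two trailing spaces before the newline

# No argument: strips every trailing whitespace character.
print(repr(line.rstrip()))      # 'AAATTT'

# Explicit argument: strips only trailing newline characters.
print(repr(line.rstrip("\n")))  # 'AAATTT  '

# The argument is a set of characters, so "\r\n" also covers Windows endings.
windows_line = "AAATTT\r\n"
print(repr(windows_line.rstrip("\r\n")))  # 'AAATTT'
```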

Function does not work correctly inside a for loop - Python
