I have a text file in the following format:
AAAAATTTTTT
AAATTTTTTGGG
TTTDDDCCVVVVV
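I am trying to count how many times the same character repeats consecutively at the start and at the end of each line.
I wrote the following function: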
def getStartEnd(sequence):
    start = sequence[0]
    end = sequence[-1]
    startCount = 0
    endCount = 0
    for char in sequence:
        if char == start:
            startCount += 1
            if ( char != start):
                break
    for char in reversed(sequence):
        if char == end:
            endCount += 1
            if ( char != end):
                break
    return startCount, endCount
This function works when called on a standalone string. For example:
seq = "TTTDDDCCVVVVV"
a,b = getStartEnd(seq)
print a,b
But when I call it inside a for loop over the file, it only gives the correct values for the last line of the file.
file = open("Test.txt", 'r')
for line in file:
    a,b = getStartEnd(str(line))
    print a, b
Answer 0: (score: 3)
Because every line except the last one ends with a newline character.
Try the following (stripping trailing whitespace):
with open("Test.txt", 'r') as f:
    for line in f:
        a, b = getStartEnd(line.rstrip())
        print a, b
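To see why only the last line worked, it can help to look at the last character of each raw line. This is a minimal sketch that uses io.StringIO as a stand-in for Test.txt and assumes the final line has no trailing newline, as the question's behavior suggests:

```python
import io

# Stand-in for Test.txt; assumes the final line has no trailing newline.
fake_file = io.StringIO(u"AAAAATTTTTT\nAAATTTTTTGGG\nTTTDDDCCVVVVV")

lines = list(fake_file)

# Last character of each raw line: a newline for every line but the last.
last_chars = [line[-1] for line in lines]

# After rstrip(), the last character is the real final letter again.
stripped_last_chars = [line.rstrip()[-1] for line in lines]

print(last_chars)
print(stripped_last_chars)
```

Without the strip, sequence[-1] is the newline for the first two lines, so the end count there measures the newline rather than the trailing letters.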
BTW, ( char != end ) in the following code is always False (the same goes for ( char != start)).
for char in reversed(sequence):
    if char == end:
        endCount += 1
        if ( char != end): # always False because char == end
            break
Did you mean this?
for char in reversed(sequence):
    if char == end:
        endCount += 1
    else:
        break
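A small sketch of why this matters, assuming the second check is nested inside the first as the "always False" comment implies: the loop then never breaks, so it counts every occurrence of the character rather than just the leading run. The names count_leading_buggy and count_leading_fixed are hypothetical, used only for this illustration:

```python
def count_leading_buggy(sequence):
    # Mirrors the nested check: the inner test can never be True,
    # so the loop runs over the whole string.
    start = sequence[0]
    count = 0
    for char in sequence:
        if char == start:
            count += 1
            if char != start:
                break
    return count


def count_leading_fixed(sequence):
    # Stops at the first character that differs from the first one.
    start = sequence[0]
    count = 0
    for char in sequence:
        if char == start:
            count += 1
        else:
            break
    return count


buggy = count_leading_buggy("AATA")  # counts all three A's
fixed = count_leading_fixed("AATA")  # counts only the two leading A's
print(buggy)
print(fixed)
```

The sample lines in the question never repeat their first or last character later in the line, which is why the nested version happened to give the right answer on them.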
How about using itertools.takewhile:
import itertools

def getStartEnd(sequence):
    start = sequence[0]
    end = sequence[-1]
    start_count = sum(1 for _ in itertools.takewhile(lambda ch: ch == start, sequence))
    end_count = sum(1 for _ in itertools.takewhile(lambda ch: ch == end, reversed(sequence)))
    return start_count, end_count
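One thing to watch for, in this version as in the original: sequence[0] raises IndexError on an empty string, which can happen once a blank line in the file has been stripped. Below is a minimal sketch of a guarded variant run on the sample lines from the question. The name get_start_end and the choice to return (0, 0) for an empty string are assumptions for illustration, not part of the answer above:

```python
import itertools


def get_start_end(sequence):
    # Assumption: an empty string has no leading or trailing run.
    if not sequence:
        return 0, 0
    start, end = sequence[0], sequence[-1]
    # Count characters until the first one that differs from `start`.
    start_count = sum(1 for _ in itertools.takewhile(lambda ch: ch == start, sequence))
    # Same idea from the other end of the string.
    end_count = sum(1 for _ in itertools.takewhile(lambda ch: ch == end, reversed(sequence)))
    return start_count, end_count


samples = ["AAAAATTTTTT", "AAATTTTTTGGG", "TTTDDDCCVVVVV", ""]
results = [get_start_end(s) for s in samples]
print(results)
```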
Answer 1: (score: 1)
Three things. First, in your function, you probably meant to break using the following structure.
for char in sequence:
    if char == start:
        startCount += 1
    else:
        break
for char in reversed(sequence):
    if char == end:
        endCount += 1
    else:
        break
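Second, when you loop over the lines in a file, you don't need to use the str function to convert each line to a string. They are already strings!
Third, the lines include newline characters, which look like this: '\n'
They tell the computer where one line ends and the next begins. To get rid of them, you can use the string's rstrip method, like so: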
file = open("Test.txt", 'r')
for line in file:
    a,b = getStartEnd(line.rstrip())
    print a, b
file.close()
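Note that rstrip() with no argument removes all trailing whitespace, not just the newline. If trailing spaces or tabs should count as part of the line, passing '\n' explicitly is one option. A short sketch, using a made-up line with two trailing spaces:

```python
# Hypothetical line with two trailing spaces before the newline.
line = "TTTDDDCCVVVVV  \n"

all_whitespace_stripped = line.rstrip()    # removes the spaces and the newline
only_newline_stripped = line.rstrip("\n")  # removes only the newline

print(repr(all_whitespace_stripped))
print(repr(only_newline_stripped))
```

For the file in the question either form should behave the same, assuming its lines carry no other trailing whitespace.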