Question

I have a text file named dna.txt that contains:

>A
ACG
>B
CCG
>C
CCG
>D
TCA

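I want to write a Python program that compares every sequence after the first one (ACG) against that first sequence, printing "Conserved" if they match and "Not Conserved" if they do not. The way I am doing this now is very inefficient. The file has only 30 sequences, and I would like to know how to use a loop to simplify the code. Here is a short example of the inefficient approach I am using: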

import linecache

f = open("dna.txt")
sequence_1 = linecache.getline('dna.txt', 2)
sequence_2 = linecache.getline('dna.txt', 4)
sequence_3 = linecache.getline('dna.txt', 6)
# ... and so on: sequence_x = linecache.getline('dna.txt', 2 * x)
f.close()
if sequence_2 == sequence_1:
    print("Conserved")
else:
    print("Not Conserved")
if sequence_3 == sequence_1:
    print("Conserved")
else:
    print("Not Conserved")
# ... and so on, with the same if/else repeated for every sequence_x

As you can clearly see, this is probably the worst possible way to do what I am trying to do. Any help would be much appreciated, thanks!

Answer 1

A loop will definitely make this more efficient. Here is one possibility:

with open("dna.txt", "r") as f:
    f.readline()                       # Skip the first header line (">A").
    sequence_1 = f.readline().strip()  # Get the actual sequence, without its newline.
    sequence_line = False              # This will switch back and forth to skip every other line.
    for line in f:                     # Iterate over all remaining lines.
        if sequence_line:              # Only test this every other line.
            # strip() also handles a last line that has no trailing newline.
            if line.strip() == sequence_1:
                print("Conserved")
            else:
                print("Not Conserved")
        sequence_line = not sequence_line  # Switch the boolean every iteration.

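The sequence_line boolean indicates whether we are currently looking at a sequence line. On every loop iteration, the line sequence_line = not sequence_line flips it, so it is True on every other iteration. That is how we skip the header lines and compare only the lines we care about.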
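
The toggle relies on the file strictly alternating header and sequence lines. As a rough sketch of an alternative, assuming every header starts with ">" as in the question's sample, you could test the prefix instead of flipping a boolean. The name compare_to_first is only an illustrative helper, and the sample data is wrapped in io.StringIO so the snippet runs without dna.txt:

```python
import io


def compare_to_first(lines):
    """Yield 'Conserved' or 'Not Conserved' for each sequence after the first."""
    reference = None
    for line in lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # Skip blank lines and ">" header lines.
        if reference is None:
            reference = line  # The first sequence becomes the reference.
            continue
        yield "Conserved" if line == reference else "Not Conserved"


# Sample data from the question; a real run would pass an open file instead.
sample = io.StringIO(">A\nACG\n>B\nCCG\n>C\nCCG\n>D\nTCA\n")
results = list(compare_to_first(sample))
for result in results:
    print(result)
```

With a real file you would call it as compare_to_first(f) inside a with open("dna.txt") as f: block. It still reads one line at a time, so the memory behaviour is the same as the toggle version.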

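This approach may not be as fast as a list comprehension, but it avoids storing the whole file in memory, which matters if the file is very large. If you can fit the file in memory, Emanuele Paolini's solution will probably be very fast.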
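
If you want something nearly as compact as a list comprehension without loading the whole file, one option is itertools.islice, which can pick every second line lazily. This is only a sketch, assuming the file strictly alternates header and sequence lines; conserved_flags is a made-up name, and it yields booleans rather than printing:

```python
import io
from itertools import islice


def conserved_flags(lines):
    """Yield True/False for each sequence after the first, reading lazily."""
    # islice(lines, 1, None, 2) takes items 1, 3, 5, ... which are the
    # sequence lines when headers and sequences strictly alternate.
    sequences = (line.strip() for line in islice(lines, 1, None, 2))
    reference = next(sequences, None)  # None if there is no sequence at all.
    for seq in sequences:
        yield seq == reference


# Sample data from the question; a real run would pass an open file instead.
sample = io.StringIO(">A\nACG\n>B\nCCG\n>C\nCCG\n>D\nTCA\n")
flags = list(conserved_flags(sample))
for flag in flags:
    print("Conserved" if flag else "Not Conserved")
```

Only one line is held at a time, so this works the same way on a file much larger than 30 sequences.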

Answer 2

with open("dna.txt") as f:
    # Keep only the sequence lines: drop ">" headers, blank lines, and newlines.
    lines = [line.strip() for line in f if line.strip() and line[0] != '>']
for line in lines[1:]:
    if line == lines[0]:
        print("Conserved")
    else:
        print("Not Conserved")
