Question

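The purpose of this script is to parse a text file (sys.argv[1]), extract certain strings, and print them in columns. I start by printing the header. Then I open the file and scan it line by line. I check that the line starts with a specific prefix or contains a specific string, then use a regular expression to extract the value I need. The matching and extraction work fine.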

My final print statement does not work properly.

import re
import sys

print("{}\t{}\t{}\t{}\t{}".format("#query", "target", "e-value",
"identity(%)", "score"))



with open(sys.argv[1], 'r') as blastR:
    for line in blastR:
        if line.startswith("Query="):
            queryIDMatch = re.match('Query= (([^ ])+)', line)
            queryID = queryIDMatch.group(1)
            queryID.rstrip
        if line[0] == '>':
            targetMatch = re.match('> (([^ ])+)', line)
            target = targetMatch.group(1)
            target.rstrip
        if "Score = " in line:
            eValue = re.search(r'Expect = (([^ ])+)', line)
            trueEvalue = eValue.group(1)
            trueEvalue = trueEvalue[:-1]
            trueEvalue.rstrip()
            print('{0}\t{1}\t{2}'.format(queryID, target, trueEvalue), end='')

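The problem appears when I try to print the columns. When I print only the first two columns, it works as expected (except that it still prints a newline):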

#query  target  e-value identity(%) score
YAL002W Paxin1_129011
YAL003W Paxin1_167503
YAL005C Paxin1_162475
YAL005C Paxin1_167442

Column 3 is a number in scientific notation, such as 2e-34.

But when I add column 3, the e-value, the output breaks:

#query  target  e-value identity(%) score
YAL002W Paxin1_129011
    4e-43YAL003W    Paxin1_167503
    1e-55YAL005C    Paxin1_162475
    0.0YAL005C      Paxin1_167442
    0.0YAL005C      Paxin1_73182

As far as I can tell, I have removed all the newlines using the rstrip() method.

Answer 1

There are at least three problems:

1) queryID.rstrip and target.rstrip are missing the trailing (), so the method is never actually called.

2) A call such as trueEvalue.rstrip() does not change the string in place; it returns a new one. You need

trueEvalue = trueEvalue.rstrip()

if you want to keep the change.

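3) This one may be a problem, but without seeing your data I can't be 100% sure. The "r" in rstrip stands for "right". If trueEvalue is 4e-43\n, then trueEvalue.rstrip() would indeed be free of newlines. But the problem is that your values seem to be \n4e-43. If you simply use .strip(), newlines are removed from both sides.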
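
The three points above can be sketched in a few lines. This is only an illustration, not the asker's actual script: parse_hits is a hypothetical helper, the sample lines are made up to resemble the values shown in the question, and the \S+ pattern is a swapped-in alternative to the original ([^ ])+ that happens to exclude the newline, so the captured IDs need no stripping at all.

```python
import re

# Point 1: without parentheses, the method is only looked up, never called.
target = "Paxin1_129011\n"
target.rstrip                      # no-op: evaluates to a bound method object
assert target == "Paxin1_129011\n"

# Point 2: strings are immutable, so rstrip() returns a new string.
target.rstrip()                    # result discarded, target is unchanged
assert target == "Paxin1_129011\n"
target = target.rstrip()           # rebinding keeps the cleaned value
assert target == "Paxin1_129011"

# Point 3: rstrip() trims only the right side; strip() trims both sides.
assert "\n4e-43".rstrip() == "\n4e-43"
assert "\n4e-43\n".strip() == "4e-43"


def parse_hits(lines):
    """Yield (query, target, e-value) tuples from BLAST-like report lines.

    Hypothetical helper; the line layout is assumed from the question.
    """
    query_id = target_id = None
    for line in lines:
        if line.startswith("Query="):
            # \S+ stops at any whitespace, including the trailing newline.
            query_id = re.match(r"Query=\s*(\S+)", line).group(1)
        elif line.startswith(">"):
            target_id = re.match(r">\s*(\S+)", line).group(1)
        elif "Score = " in line:
            match = re.search(r"Expect = (\S+)", line)
            if match:
                # Drop the comma that follows the e-value, if there is one.
                yield query_id, target_id, match.group(1).rstrip(",")


# Made-up sample lines, modelled on the values shown in the question.
sample = [
    "Query= YAL002W\n",
    "> Paxin1_129011\n",
    " Score = 100 bits,  Expect = 4e-43,\n",
]
rows = list(parse_hits(sample))
for row in rows:
    print("\t".join(row))          # print() adds exactly one newline per row
```

Joining the fields with a tab and letting print() supply the line ending keeps each record on a single line, so end='' is no longer needed.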
