Question

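I am trying to extract certain lines from a file if they meet certain conditions. Specifically, column [3] needs to start with Chr3: and column [13] needs to be "yes".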

Here are examples of lines that do and do not match the criteria:

XLOC_004170   XLOC_004170 -   Ch3:14770-25031 SC_JR32_Female  SC_JR32_Male    OK  55.8796 9.2575  -2.59363    -0.980118   0.49115 0.897554    no
XLOC_004387   XLOC_004387 -   Ch3:3072455-3073591 SC_JR32_Female  SC_JR32_Male    OK  0   35.4535 inf -nan    5e-05   0.0149954   yes

The Python script I am using is:

with open(input_file) as fp: # fp is the file handle
    for line in fp: #line is the iterator
        line=line.split("\t")
        locus = str(line[3])
        significance = str(line[13])
        print(locus)
        print(significance)

        if (re.match('Chr3:[0-9]+-[0-9]+',locus,flags=0) and re.match('yes',significance,flags=0)):
            output.write(("%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n")%(line[0],line[1],line[2],line[3],line[4],line[5],line[6],line[7],line[8],line[9],line[10],line[11],line[12],line[13]))

I would really appreciate it if someone could explain why this script returns no output.

Answer 1

Such a simple check does not need regular expressions. It is better to use startswith() and ==:

if locus.startswith('Chr3:') and significance == 'yes':

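UPD: You need to apply strip() to the locus and significance variables before the if condition, because the last field on each line still carries the trailing newline: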

locus = str(line[3]).strip()
significance = str(line[13]).strip()
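
To see why the strip() call matters, here is a small sketch using the second sample row from the question. It assumes the fields are tab-separated (as the script's split("\t") implies) and that the line ends with a newline, as it would when read from a file:

```python
# Second sample row from the question, rebuilt with tab separators.
# Assumption: the real file is tab-delimited and each line ends with "\n".
row = (
    "XLOC_004387\tXLOC_004387\t-\tCh3:3072455-3073591\t"
    "SC_JR32_Female\tSC_JR32_Male\tOK\t0\t35.4535\tinf\t-nan\t"
    "5e-05\t0.0149954\tyes\n"
)

fields = row.split("\t")
raw_significance = fields[13]       # last field, newline still attached
significance = fields[13].strip()   # newline removed

print(repr(raw_significance))       # shows the hidden newline
print(raw_significance == "yes")    # comparison without strip()
print(significance == "yes")        # comparison with strip()
```

Without strip(), the equality test compares "yes\n" against "yes" and fails, even though the printed value looks correct.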

Answer 2

There is no reason to use a regular expression here:

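with open(input_file) as handle:
    for line in handle:
        # Drop the trailing newline so the last field compares cleanly
        cells = line.rstrip('\n').split('\t')

        # Same 0-based column indices as in the question's script
        locus = cells[3]
        significance = cells[13]

        if locus.startswith('Ch3:') and significance == 'yes':
            output.write('\t'.join(cells) + '\n')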
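
One more thing worth checking: the sample rows in the question start with Ch3:, while the question's pattern looks for Chr3:. If the real file looks like the sample, that mismatch alone would explain the empty output. Below is a minimal sketch that runs against the two sample rows held in memory. The helper name filter_rows and its parameters are hypothetical, the rows are assumed to be tab-separated, and the 0-based indices 3 and 13 are taken from the question's script:

```python
import io
import re


def filter_rows(handle, prefix="Ch3:", wanted="yes"):
    """Hypothetical helper: keep rows whose field 3 starts with `prefix`
    and whose field 13 equals `wanted` (0-based indices)."""
    kept = []
    for line in handle:
        cells = line.rstrip("\n").split("\t")
        if len(cells) < 14:
            continue  # skip blank or short lines
        if cells[3].startswith(prefix) and cells[13] == wanted:
            kept.append(cells)
    return kept


# The two sample rows from the question, assumed tab-separated.
sample = (
    "XLOC_004170\tXLOC_004170\t-\tCh3:14770-25031\tSC_JR32_Female\t"
    "SC_JR32_Male\tOK\t55.8796\t9.2575\t-2.59363\t-0.980118\t0.49115\t"
    "0.897554\tno\n"
    "XLOC_004387\tXLOC_004387\t-\tCh3:3072455-3073591\tSC_JR32_Female\t"
    "SC_JR32_Male\tOK\t0\t35.4535\tinf\t-nan\t5e-05\t0.0149954\tyes\n"
)

# The question's pattern expects "Chr3:", so it cannot match "Ch3:...".
print(re.match(r"Chr3:[0-9]+-[0-9]+", "Ch3:3072455-3073591"))  # None

rows = filter_rows(io.StringIO(sample))
print([cells[0] for cells in rows])
```

Passing prefix="Chr3:" to the same helper returns an empty list for this sample, which mirrors the behaviour described in the question.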

RegEx in Python does not return a match
