How to use Python to delete the lines from certain lines up to certain other lines

Date: 2017-03-23 06:15:58

Tags: python

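My file looks like the one below. I want to delete every line from a line starting with >rev_ up to, but not including, the next line starting with >. I would like Python code that does this. Input file: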

>name1
fgrsagrhshsjtdkj
jfsdljgagdahdrah
gsag
>rev_name1                # delete from here
jfdsfjdlsgrgagrehdsah
fsagasfd                  # until here
>name2
jfosajgreajljioesfg
fjsdsagjljljlj
>rev_name2                # delete from here
jflsajgljkop
ljljasffdsa               # until here
>name3
.......

Output file:

>name1
fgrsagrhshsjtdkj
jfsdljgagdahdrah
gsag
>name2
jfosajgreajljioesfg
fjsdsagjljljlj
>name3
.......

My code is below, but it does not work.

mark = {}
with open("human.fasta") as inf, open("human_norev.fasta",'w') as outf:
    for line in inf:
        if line[0:5] == '>rev_':
            mark[line] = 1
        elif line[0] == '>':
            mark[line] = 0
    if mark[line] == 0:
        outf.write(line)

3 answers:

Answer 0 (score: 3)

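I would suggest at least trying to come up with a solution yourself before asking us. Ask yourself which approaches could solve the problem: would parsing the input character by character, line by line, or with a regex be sufficient?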

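In this case, though, the markers that decide when to start and stop deleting always sit at the beginning of a line, so it makes sense to go line by line and check the first few characters.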

i = """>name1
fgrsagrhshsjtdkj
jfsdljgagdahdrah
gsag
>rev_name1                # delete from here
jfdsfjdlsgrgagrehdsah
fsagasfd                  # until here
>name2
jfosajgreajljioesfg
fjsdsagjljljlj
>rev_name2                # delete from here"""

final_string = ""
keep_line = True

for line in i.split('\n'):

    # startswith() also copes with empty lines, where line[0] would raise IndexError
    if line.startswith(">rev_"):
        keep_line = False
    elif line.startswith('>'):
        keep_line = True

    if keep_line:
        final_string += line + '\n'

print(final_string)

If you want the lines to go straight to the console, you can remove the final print and replace final_string += line + '\n' with print(line).

Answer 1 (score: 1)

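Your code does not work because (among other things) you never mark the lines that start with neither >rev nor >. You would also need a second loop to output all the lines that have been marked for output; right now you only output the last line.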
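The two fixes described above could look roughly like the sketch below. It is only an illustration, not the asker's code: the function name filter_with_marks and the short sample list are made up for the example, and the marks are kept in a list with one flag per line (rather than a dict keyed by line text) so that identical lines cannot overwrite each other's mark.

```python
def filter_with_marks(lines):
    """Return the lines that do not belong to a '>rev_' record."""
    marks = []      # one flag per line: True = keep, False = drop
    keep = True     # lines before the first header are kept
    for line in lines:
        if line.startswith(">rev_"):
            keep = False
        elif line.startswith(">"):
            keep = True
        # every line gets a mark, not only the header lines
        marks.append(keep)

    # second loop: output everything that was marked as kept
    return [line for line, kept in zip(lines, marks) if kept]


# shortened sample built from lines of the input file in the question
sample = [
    ">name1\n", "gsag\n",
    ">rev_name1\n", "fsagasfd\n",
    ">name2\n", "fjsdsagjljljlj\n",
]
print("".join(filter_with_marks(sample)), end="")
```

In practice the marks list is not needed, because the flag can be checked as soon as it is updated, which is what the single-loop answers in this thread do.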

Alec's answer is fine, but I would suggest a different approach using a regular expression:

import re
regex = re.compile(r">rev_[^>]*")
with open("human.fasta") as inf, open("human_norev.fasta", "w") as outf:
    outf.write(regex.sub("", inf.read()))

Test the regex live on regex101.com.
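To see what the pattern does without touching any files, here is a small sketch that applies the same regex to an in-memory string. The sample text is a shortened version of the input in the question. Note two assumptions this approach relies on: the whole file fits in memory, and the character > never appears anywhere except at the start of a header line, since [^>]* simply consumes everything up to the next >.

```python
import re

# same pattern as above: ">rev_" followed by everything up to the next ">"
regex = re.compile(r">rev_[^>]*")

text = (
    ">name1\n"
    "gsag\n"
    ">rev_name1\n"
    "fsagasfd\n"
    ">name2\n"
    "fjsdsagjljljlj\n"
    ">rev_name2\n"
    "ljljasffdsa\n"
)

# each match covers a ">rev_" header and the lines under it, so replacing
# the matches with "" leaves only the other records
cleaned = regex.sub("", text)
print(cleaned, end="")
```

The last >rev_ record is removed as well, because [^>]* runs to the end of the string when no further > follows.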

Answer 2 (score: 1)

The code can also be written as follows:

with open("human.fasta") as inf, open("human_norev.fasta",'w') as outf:
    del_start = False
    for line in inf:
        if line.startswith('>rev_'):
            del_start = True
        elif line.startswith('>'):
            del_start = False

        if not del_start:
            outf.write(line)
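The same flag logic can also be wrapped in a generator, so it can be tried on in-memory data before being pointed at real files. This is a sketch under assumptions: the name drop_rev_records is hypothetical, and io.StringIO objects stand in for the human.fasta and human_norev.fasta file handles.

```python
import io


def drop_rev_records(lines):
    """Yield lines, skipping each '>rev_' header and the lines below it."""
    skipping = False
    for line in lines:
        if line.startswith(">"):
            # a header decides the fate of itself and of the lines that follow
            skipping = line.startswith(">rev_")
        if not skipping:
            yield line


# StringIO objects behave like open text files for iteration and writing
inf = io.StringIO(">name1\ngsag\n>rev_name1\nfsagasfd\n>name2\nfjsdsagjljljlj\n")
outf = io.StringIO()
outf.writelines(drop_rev_records(inf))
print(outf.getvalue(), end="")
```

With real files, the two StringIO objects would be replaced by the handles from the with open(...) statement above; the generator reads one line at a time, so the file never has to be held in memory as a whole.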