Delete specific text from a text file

Time: 2013-07-25 13:30:30

Tags: python python-2.7

I have a text file:

>E8|E2|E9D
Football is a good game
Its good for health
you can play it every day
>E8|E2|E10D
Sequence unavailable
>E8|E2|EKB
Cricket

I wrote the following code to detect unavailable sequences in the text file and write them to a new text file:

lastline = None
with open('output.txt', 'w') as W:
    with open('input.txt', 'r') as f:
        for line in f.readlines():
            if not lastline:
                lastline = line.rstrip('\n')
                continue
            if line.rstrip('\n') == 'Sequence unavailable':
                _, _, id = lastline.split('|')
                data= 'Sequence unavailable|' + id
                W.write(data)
                W.write('\n')
            lastline = None

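It works fine: it detects "Sequence unavailable" in the text file and writes it to a new file. But I also want it to delete those entries from the file it reads.

In input.txt: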

>E8|E2|E9D
Football is a good game
Its good for health
you can play it every day
>E8|E2|E10D
Sequence unavailable
>E8|E2|EKB
Cricket

After the code runs, input.txt should look like this:

>E8|E2|E9D
Football is a good game
Its good for health
you can play it every day
>E8|E2|EKB
Cricket

3 answers:

Answer 0 (score: 2)

I have not used the file.readlines method here because it reads every line of the file into a list, so it is not memory efficient.
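
As a small illustration of that point, iterating over the file object yields one line at a time, so only the current line is held in memory. This is only a sketch: the helper name count_unavailable is hypothetical and not part of the original answer.

```python
def count_unavailable(path):
    """Count 'Sequence unavailable' lines without loading the whole file."""
    count = 0
    with open(path) as f:
        # Iterating over f reads lazily, unlike f.readlines(),
        # which builds a list of every line first.
        for line in f:
            if line.rstrip('\n') == 'Sequence unavailable':
                count += 1
    return count
```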

Method 1: Use a temporary file.

import os
with open('input.txt') as f1, open('output.txt', 'w') as f2,\
                                                  open('temp_file','w') as f3:
    lines = []       # store lines between two `>` in this list
    for line in f1:
        if line.startswith('>'):
            if lines:
                f3.writelines(lines)
                lines = [line]
            else:
                lines.append(line)
        elif line.rstrip('\n') == 'Sequence unavailable':
            f2.writelines(lines + [line])
            lines = []
        else:
            lines.append(line)

    f3.writelines(lines)

os.remove('input.txt')
os.rename('temp_file', 'input.txt')

Demo:

$ cat input.txt
>E8|E2|E9D
Football is a good game
Its good for health
you can play it every day
>E8|E2|E10D
Sequence unavailable
>E8|E2|EKB
Cricket

$ python so.py

$ cat input.txt
>E8|E2|E9D
Football is a good game
Its good for health
you can play it every day
>E8|E2|EKB
Cricket
$ cat output.txt
>E8|E2|E10D
Sequence unavailable

To create the temporary file, you can also use the tempfile module.
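
A minimal sketch of what that could look like, assuming the same record layout as in the question (a header line starting with `>` followed by its body lines). The function name remove_unavailable and its parameters are hypothetical; the temporary file is created next to the input so the final rename stays on one filesystem.

```python
import os
import tempfile

def remove_unavailable(path, removed_path):
    """Rewrite `path` without 'Sequence unavailable' records and
    save the removed records to `removed_path`."""
    directory = os.path.dirname(os.path.abspath(path))
    # delete=False keeps the file on disk after it is closed,
    # so it can be renamed over the input afterwards.
    tmp = tempfile.NamedTemporaryFile(mode='w', dir=directory, delete=False)
    with tmp, open(path) as src, open(removed_path, 'w') as removed:
        record = []  # lines of the record currently being read
        for line in src:
            if line.startswith('>'):
                tmp.writelines(record)   # previous record is kept
                record = [line]
            elif line.rstrip('\n') == 'Sequence unavailable':
                removed.writelines(record + [line])
                record = []              # drop this record from the input
            else:
                record.append(line)
        tmp.writelines(record)           # flush the last record
    os.remove(path)
    os.rename(tmp.name, path)
```

Called as remove_unavailable('input.txt', 'output.txt'), this should leave the same two files as the demo above.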

Method 2: The fileinput module

With this method you do not need to create a temporary file yourself:

import fileinput
with open('output.txt', 'w') as f2:
    lines = []
    for line in fileinput.input('input.txt', inplace=True):
        # With inplace=True, print writes into input.txt instead of the console.
        if line.startswith('>'):
            if lines:
                print "".join(lines),
                lines = [line]
            else:
                lines.append(line)
        elif line.rstrip('\n') == 'Sequence unavailable':
            f2.writelines(lines + [line])
            lines = []
        else:
            lines.append(line)

    with open('input.txt','a') as f:
        f.writelines(lines)
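
The code above uses the Python 2 print statement, in line with the question's python-2.7 tag. A sketch of the same approach for Python 3 (3.2 or later, where fileinput.input can be used as a context manager) might look like the following. The function name strip_unavailable_inplace is hypothetical, and sys.stdout.write stands in for print.

```python
import fileinput
import sys

def strip_unavailable_inplace(path, removed_path):
    """Remove 'Sequence unavailable' records from `path` in place,
    saving them to `removed_path`."""
    record = []  # lines of the record currently being read
    with open(removed_path, 'w') as removed:
        with fileinput.input(path, inplace=True) as lines_in:
            for line in lines_in:
                # While inplace=True is active, sys.stdout points at `path`.
                if line.startswith('>'):
                    sys.stdout.write(''.join(record))
                    record = [line]
                elif line.rstrip('\n') == 'Sequence unavailable':
                    removed.writelines(record + [line])
                    record = []
                else:
                    record.append(line)
    # fileinput is closed here and sys.stdout is restored,
    # so the final record is appended by reopening the file.
    with open(path, 'a') as f:
        f.writelines(record)
```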

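Answer 1 (score: 0)

You are doing it the right way.

All you need to do when you are finished is rename the 'output.txt' file to 'input.txt'.

(No, there is no simple way to cut a line directly out of a file you have open.)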

Answer 2 (score: 0)

import os
os.system("cp output.txt input.txt")

This overwrites your input with the output file, which has the lines removed. mv can also be used to rename it:

os.system("mv output.txt input.txt")

This keeps only one file, whereas cp keeps both.

You should probably use os.rename()
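
A rough sketch of that suggestion, which avoids spawning a shell and does not depend on cp or mv being available. The helper name replace_file is hypothetical.

```python
import os

def replace_file(src, dst):
    """Move `src` over `dst` using only the os module."""
    # On Windows, os.rename raises an error if dst already exists,
    # so remove the old file first.
    if os.path.exists(dst):
        os.remove(dst)
    os.rename(src, dst)
```

For example, replace_file('output.txt', 'input.txt') has the same effect as the mv command above. On Python 3.3 and later, os.replace(src, dst) overwrites the destination on all platforms, which makes the explicit removal unnecessary.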