Question

I have a file that contains the following lines. (Note the blank lines.)

blah blah blah

ID:name1:1bj409ju9
how are you

Im good 100
blah blah

ID:name2:987krjtu
not so good

too bad 900
blah blah

some words blah blah

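As you may have noticed, the lines that start with "ID" follow a pattern. My approach is to search for ID:name[x] and delete the 5 lines that go with it (including blank lines). For example, I want to delete the lines below from the file.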

ID:name1:10.1.1.10
how are you

I'm good 100
blah blah

I tried the following code, but it only deletes the line that matches "somename1":

#!/usr/bin/python
import fileinput

filename = r"file.txt"
counter = -1
for linenum,line in enumerate(fileinput.FileInput(filename, inplace=1)):
    if "name1" in line:
        counter = linenum + 6
        if linenum == counter:
            line.strip()
    else:
        print line,

Note that I also want to get rid of the blank line between "blah blah" and "ID:somename2:987krjtu".

Answer 1

You could try:

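def delete_lines(name, finput):
    for line in finput:
        # str has no .contains() method; use the "in" operator instead
        if line.startswith('ID:') and name in line:
            # advance finput five times to skip the lines after the ID line
            for i in range(5):
                next(finput, None)  # the default avoids StopIteration at end of file
        else:
            # print the other lines (each line already ends with a newline)
            print(line, end='')
            # if you want to have the remaining lines in a variable you could also yield them
            yield line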

Then call the function:

lines = list(delete_lines('name1', fileinput.FileInput(filename, inplace=1)))

lines will contain all the lines that were not deleted.

Note that the same approach should also work with an open file object:

with open(filename, 'rt') as finput:
    # delete_lines is a generator, so it does nothing until it is iterated
    lines = list(delete_lines('name1', finput))

Or with an in-memory list of lines (if you don't mind loading the whole file into memory):

with open(filename, 'rt') as finput:
    lines = finput.readlines()
# pass an iterator over the list so that next() can advance it
remaining = list(delete_lines('name1', iter(lines)))
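
To experiment without touching the file, the same skip-ahead idea can be sketched as a plain function that takes any iterable of lines and returns the ones to keep. The name drop_blocks and the extra parameter are made up for this illustration, and the sketch assumes each block is the ID line plus five more lines, as in the sample from the question. Unlike the generator above, it compares the second colon-separated field exactly, which is an assumption about the file layout.

```python
def drop_blocks(lines, name, extra=5):
    """Return the lines to keep, dropping each matching ID line
    and the `extra` lines that follow it.

    `drop_blocks` and `extra` are hypothetical names for this sketch.
    """
    kept = []
    it = iter(lines)  # works for lists and for open file objects alike
    for line in it:
        # Assumes the layout ID:<name>:<rest>; compare the name field exactly.
        if line.startswith('ID:') and line.split(':')[1] == name:
            for _ in range(extra):
                next(it, None)  # the default avoids StopIteration at end of input
        else:
            kept.append(line)
    return kept


sample = [
    "blah blah blah\n", "\n",
    "ID:name1:1bj409ju9\n", "how are you\n", "\n",
    "Im good 100\n", "blah blah\n", "\n",
    "ID:name2:987krjtu\n", "not so good\n",
]
remaining = drop_blocks(sample, "name1")
print("".join(remaining), end="")
```

Because the function returns a list rather than printing, the result can be checked first and written back to the file in a separate step.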

Answer 2

If your file fits in memory, use a regular expression.

If you want to delete everything between two patterns:

import re
with open(fn) as f:
    # re.M makes ^ match at the start of every line, not only at the start of the text
    result = re.sub(r'^ID:name1[\s\S]*(?=^ID:name2.*)', '', f.read(), flags=re.M)
    print(result)

Explanation of the pattern:

^ID:name1[\s\S]*(?=^ID:name2.*)
^                                   start of line
    ^                               first pattern
           ^                        whitespace or non-whitespace:
                                      a way of saying any character, including newlines
               ^                    greedy: as many of them as possible
                     ^              lookahead: stop before the end pattern
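
As a quick check, the pattern can be run against the sample text from the question held in a string. This is only a sketch: the text variable stands in for the contents of the file.

```python
import re

# Stand-in for f.read(); copied from the sample in the question.
text = (
    "blah blah blah\n"
    "\n"
    "ID:name1:1bj409ju9\n"
    "how are you\n"
    "\n"
    "Im good 100\n"
    "blah blah\n"
    "\n"
    "ID:name2:987krjtu\n"
    "not so good\n"
)

# Delete from the ID:name1 line up to, but not including, the ID:name2 line.
pattern = re.compile(r'^ID:name1[\s\S]*(?=^ID:name2.*)', flags=re.M)
result = pattern.sub('', text)
print(result, end='')
```

Because [\s\S]* is greedy, a file with several ID:name2 lines would lose everything up to the last one; writing [\s\S]*? instead would stop at the first.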

If you would rather delete a fixed number of lines after the matching line (instead of using two anchors), you can use this regex:

with open(fn) as f:
    # removes the ID:name1 line and up to five lines after it
    result = re.sub(r'^ID:name1.*\s(^.*$\s){1,5}', '', f.read(), flags=re.M)
    print(result)

Explanation of this pattern:

^ID:name1.*\s(^.*$\s){1,5}
^          ^                       the line that starts the match, through its newline
             ^                     1 to 5 lines following
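
If the name and the line count vary, the same regex can be built from parameters. The helper name delete_after and its arguments are hypothetical; the pattern is the one above with the two values filled in.

```python
import re


def delete_after(text, name, n=5):
    """Remove the ID:<name> line and up to `n` lines after it.

    `delete_after` is a hypothetical helper for this sketch.
    """
    # re.escape guards against regex metacharacters in the name;
    # the doubled braces produce a literal {1,n} quantifier.
    pattern = rf'^ID:{re.escape(name)}.*\s(^.*$\s){{1,{n}}}'
    return re.sub(pattern, '', text, flags=re.M)


# Stand-in for f.read(); copied from the sample in the question.
text = (
    "blah blah blah\n"
    "\n"
    "ID:name1:1bj409ju9\n"
    "how are you\n"
    "\n"
    "Im good 100\n"
    "blah blah\n"
    "\n"
    "ID:name2:987krjtu\n"
    "not so good\n"
)
result = delete_after(text, "name1")
print(result, end='')
```

Note that ID:name1.* also matches a line such as ID:name10:...; adding a colon after the name in the pattern would make the match exact.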

Python: delete a certain number of lines from a file given a keyword
