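I have looked at and played with various examples, and my problem seems more complex than anything I have been able to find. What I need to do is search for a particular string, then delete the lines that follow it, and keep deleting lines until another string is found. Here is an example: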
a
b
color [
0 0 0,
1 1 1,
3 3 3,
] #color
y
z
Here, "color [" is match1 and "] #color" is match2. The desired result is then:
a
b
color [
] #color
y
z
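Answer 0 (score: 2)
This "plain and simple" code example should get you started; you can adapt it as needed. Note that it processes the file line by line, so it works for files of any size.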
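start_marker = 'startdel'
end_marker = 'enddel'

with open('data.txt') as inf:
    ignoreLines = False
    for line in inf:
        if start_marker in line:
            # Print the start marker line itself, then begin skipping.
            print(line, end='')
            ignoreLines = True
        if end_marker in line:
            # Stop skipping; the end marker line is printed below.
            ignoreLines = False
        if not ignoreLines:
            print(line, end='')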
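It uses startdel and enddel as the "markers" that start and stop the skipping of data.
Update:
Modified the code per a request in the comments; it now keeps/prints the lines that contain the "markers".
Given this input data (borrowed from @drewk):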
Beginning of the file...
stuff
startdel
delete this line
delete this line also
enddel
stuff as well
the rest of the file...
it produces:
Beginning of the file...
stuff
startdel
enddel
stuff as well
the rest of the file...
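As a rough sketch of how the same loop might be applied to the markers from the question, the filter can be wrapped in a generator so that it accepts any iterable of lines (an open file or a list). The name `drop_between` and its parameters are illustrative and not part of the answer above; the sketch assumes Python 3 and that the start and end markers never appear on the same line.

```python
def drop_between(lines, start_marker, end_marker):
    """Yield lines, skipping those strictly between the two marker lines.

    Hypothetical helper: both marker lines themselves are kept.
    """
    ignoring = False
    for line in lines:
        if ignoring:
            # While skipping, only the end marker line gets through.
            if end_marker in line:
                ignoring = False
                yield line
        else:
            yield line
            if start_marker in line:
                ignoring = True


sample = [
    "a\n", "b\n", "color [\n", "0 0 0,\n", "1 1 1,\n", "3 3 3,\n",
    "] #color\n", "y\n", "z\n",
]
result = list(drop_between(sample, "color [", "] #color"))
print("".join(result), end="")
```

Because it is a generator, passing an open file object instead of `sample` would still read one line at a time.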
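Answer 1 (score: 1)
You can do this with a single regular expression by using the non-greedy ".*?". For example, assuming you want to keep both the "look for this line" and "until this line is found" lines and discard only the lines between them, you can do the following: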
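>>> import re
>>> my_regex = re.compile("(look for this line)"+
...                       ".*?"+ # match as few chars as possible
...                       "(until this line is found)",
...                       re.DOTALL)
>>> # Raw string so \1 and \2 are group references; \n keeps the two marker lines separate.
>>> new_str = my_regex.sub(r"\1\n\2", old_str)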
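A few notes:
The re.DOTALL flag tells Python that "." may match newlines; by default it matches any character except a newline.
To keep only the first marker line, use my_regex.sub(r"\1", old_str); to get rid of both marker lines as well, use my_regex.sub("", old_str).
For a detailed explanation, see http://docs.python.org/library/re.html or search for "non-greedy regular expression" in your favorite search engine.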
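To make the non-greedy point concrete, here is a small sketch using the markers from the question. The sample text and the names `lazy` and `greedy` are assumptions made for illustration. `re.escape` is used so that "[" and "]" in the markers are treated literally.

```python
import re

start, end = re.escape("color ["), re.escape("] #color")
lazy = re.compile("(" + start + ").*?(" + end + ")", re.DOTALL)
greedy = re.compile("(" + start + ").*(" + end + ")", re.DOTALL)

# Two separate marked blocks, with an ordinary line ("y") between them.
block = "color [\n0 0 0,\n1 1 1,\n] #color\n"
old_str = "a\n" + block + "y\n" + block + "z\n"

# Non-greedy: each block is collapsed on its own, so "y" survives.
lazy_result = lazy.sub(r"\1\n\2", old_str)

# Greedy: one match runs from the first start marker to the last end
# marker, so "y" and the inner markers are removed as well.
greedy_result = greedy.sub(r"\1\n\2", old_str)

# The alternative replacements mentioned in the notes above.
keep_first = lazy.sub(r"\1", old_str)
keep_none = lazy.sub("", old_str)

print(lazy_result)
print(greedy_result)
```

Note that the match stops right after the end marker, so removing the whole match with `""` leaves an empty line behind where each block used to be.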
Answer 2 (score: 1)
This works:
s="""Beginning of the file...
stuff
look for this line
delete this line
delete this line also
until this line is found
stuff as well
the rest of the file... """
import re
print(re.sub(r'(^look for this line$).*?(^until this line is found$)',
             r'\1\n\2', s, count=1, flags=re.DOTALL | re.MULTILINE))
Prints:
Beginning of the file...
stuff
look for this line
until this line is found
stuff as well
the rest of the file...
You can also do this with list slicing:
mStart = 'look for this line'
mStop = 'until this line is found'
li = s.split('\n')
print('\n'.join(li[0:li.index(mStart)+1] + li[li.index(mStop):]))
Same output.
I like re better (being a Perl guy at heart...)
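One caveat with the list-slicing variant is that `list.index` raises `ValueError` when a marker line is missing, and it compares whole lines for equality. A hedged sketch of a more defensive version follows. The function name `cut_between` and the sample text are made up for illustration; like the original, it only handles the first marked block.

```python
def cut_between(text, start_line, stop_line):
    """Drop the lines strictly between two exact marker lines.

    Hypothetical helper: returns the text unchanged if either marker
    line is missing.
    """
    lines = text.split("\n")
    try:
        start = lines.index(start_line)
        # Only look for the stop marker after the start marker.
        stop = lines.index(stop_line, start + 1)
    except ValueError:
        return text
    return "\n".join(lines[:start + 1] + lines[stop:])


sample = (
    "stuff\n"
    "look for this line\n"
    "delete this line\n"
    "until this line is found\n"
    "stuff as well"
)
cleaned = cut_between(sample, "look for this line", "until this line is found")
print(cleaned)
```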