I am trying to implement a way to find all the lines between two marker strings, for example:
'X_0_Gui_Menu_322_Menu_Outputs_SDI_overlays'
'X_0_Gui_Menu_321_Menu_Outputs_SDI_processing'
12:Button 11 released.
Wheel 4 turned from 31 to 30.
Button 9 pressed.
'X_0_Gui_Menu_321_Menu_Outputs_SDI_processing'
'X_0_Gui_Menu_322_Menu_Outputs_SDI_overlays'
Button 9 released.
Setting parameters saved.
Wheel 4 turned from 29 to 34.
Button 9 pressed.
'X_0_Gui_Menu_322_Menu_Outputs_SDI_overlays'
'X_0_Gui_Menu_3231_Menu_Outputs_SDI_status'
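So the output files should be
File 1
Button 11 released. Wheel 4 turned from 31 to 30. Button 9 pressed.
File 2
Button 9 released. Setting parameters saved. Wheel 4 turned from 29 to 34. Button 9 pressed.
and so on...
I tried to implement this with the following code, but it does not work as expected: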
from itertools import groupby

with open("messages", "r") as fin:
    lines = (line.strip('PAGE(enter)\n') for line in fin)
    blocks = [list(g) for k, g in groupby(lines, bool) if k]
file = 0
for block in blocks:
    file = file + 1
    with open("commands_executed" + str(file), "a") as data_inside_page:
        data_inside_page.write(str(blocks))
Answer 0: (score: 0)
import re
p = re.compile(r'PAGE\(enter\)[\S ]+\s((?:(?![^\n]+PAGE\(leave\)).)*)', re.IGNORECASE | re.DOTALL)
test_str = u"Jan 01 01:25:08 AMIRA-134500021 user.notice gui-monitor[770]: ACTION:401b0836:8:PAGE(leave) 'X_0_Gui_Menu_322_Menu_Outputs_SDI_overlays'\nJan 01 01:25:08 AMIRA-134500021 user.notice gui-monitor[770]: ACTION:401b0836:8:PAGE(enter) 'X_0_Gui_Menu_321_Menu_Outputs_SDI_processing'\nJan 01 01:25:09 AMIRA-134500021 user.notice butler[774]: LOG:200708a0:12:Button 11 released.\nJan 01 01:25:12 AMIRA-134500021 user.notice butler[774]: LOG:200708a0:12:Wheel 4 turned from 31 to 30.\nJan 01 01:25:12 AMIRA-134500021 user.notice butler[774]: LOG:200708a0:12:Button 9 pressed.\nJan 01 01:25:12 AMIRA-134500021 user.notice gui-monitor[770]: ACTION:401b0836:8:PAGE(leave) 'X_0_Gui_Menu_321_Menu_Outputs_SDI_processing'\nJan 01 01:25:12 AMIRA-134500021 user.notice gui-monitor[770]: ACTION:401b0836:8:PAGE(enter) 'X_0_Gui_Menu_322_Menu_Outputs_SDI_overlays'\nJan 01 01:25:25 AMIRA-134500021 user.notice butler[774]: LOG:200708a0:12:Button 9 released.\nJan 01 01:25:25 AMIRA-134500021 user.notice concen[655]: LOG:2011088a:4:Setting parameters saved.\nJan 1 01:25:27 AMIRA-134500021 daemon.warn dnsmasq-dhcp[738]: DHCP packet received on eth0 which has no address\nJan 01 01:25:28 AMIRA-134500021 user.notice butler[774]: LOG:200708a0:12:Wheel 4 turned from 29 to 34.\nJan 01 01:25:28 AMIRA-134500021 user.notice butler[774]: LOG:200708a0:12:Button 9 pressed.\nJan 01 01:25:28 AMIRA-134500021 user.notice gui-monitor[770]: ACTION:401b0836:8:PAGE(leave) 'X_0_Gui_Menu_322_Menu_Outputs_SDI_overlays'\nJan 01 01:25:28 AMIRA-134500021 user.notice gui-monitor[770]: ACTION:401b0836:8:PAGE(enter) 'X_0_Gui_Menu_3231_Menu_Outputs_SDI_status'"
ll=re.findall(p, test_str)
Try this. Write each element of the list ll to a file. See the demo.
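Writing each element of ll to its own file could look like the minimal sketch below. The helper name write_blocks is hypothetical, and the "commands_executed" prefix is simply borrowed from the question; neither comes from this answer.

```python
def write_blocks(blocks, prefix):
    """Write each captured block to its own numbered file.

    `blocks` is any iterable of strings (for example the list returned by
    re.findall); `prefix` is an assumed file-name prefix. Returns the paths.
    """
    paths = []
    for number, block in enumerate(blocks, start=1):
        path = "%s%d" % (prefix, number)
        with open(path, "w") as out:
            out.write(block)
        paths.append(path)
    return paths

# Possible usage with the list from the snippet above:
# write_blocks(ll, "commands_executed")
```

Opening with "w" rather than "a" means that rerunning the script replaces earlier output instead of appending to it.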
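Answer 1: (score: 0)
You don't need a regular expression. Just read all the lines with readlines(), then slice from the line containing 'PAGE(enter)' to the line containing 'PAGE(leave)':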
new = []
start = 0
with open('your_file.txt', 'r') as f:
    a = f.readlines()
for i, j in enumerate(a):
    if 'PAGE(enter)' in j:
        start = i  # remember where the current block begins
        continue
    if 'PAGE(leave)' in j:
        # keep only non-empty slices between the two markers
        if len(a[start+1:i]):
            new.append(a[start+1:i])
with open('new_file1', 'w') as f1, open('new_file2', 'w') as f2:
    for line in new[0]:
        f1.write(line)
    for line in new[1]:
        f2.write(line)
new_file1:
Button 11 released.
Wheel 4 turned from 31 to 30.
Button 9 pressed.
new_file2:
Button 9 released.
Setting parameters saved.
Wheel 4 turned from 29 to 34.
Button 9 pressed.
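The snippet above writes exactly two blocks. For a log with any number of blocks, the same slicing idea might be generalized as in the following sketch. The function name split_blocks and its parameters are hypothetical; the marker strings are the ones used in this answer.

```python
def split_blocks(lines, start_marker="PAGE(enter)", end_marker="PAGE(leave)"):
    """Return a list of blocks; each block is the list of lines found
    between a start-marker line and the next end-marker line."""
    blocks = []
    current = None  # None means we are not inside a block
    for line in lines:
        if start_marker in line:
            current = []  # a new block begins after this line
        elif end_marker in line:
            if current:  # skip empty blocks and stray end markers
                blocks.append(current)
            current = None
        elif current is not None:
            current.append(line)
    return blocks

# Possible usage, writing one file per block:
# with open("your_file.txt") as f:
#     for n, block in enumerate(split_blocks(f), start=1):
#         with open("new_file%d" % n, "w") as out:
#             out.writelines(block)
```

Tracking `current` explicitly also avoids collecting lines that come before the first 'PAGE(enter)' marker.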
Answer 2: (score: 0)
Try this (Python 2.7):
import re

START_REGEX = re.compile(r"PAGE\(enter\)")
END_REGEX = re.compile(r"PAGE\(leave\)")

file = 0
with open("messages", "r") as fin:
    for line in fin:
        if not START_REGEX.search(line): continue
        commands = []
        for included_line in fin:
            if END_REGEX.search(included_line):
                file += 1
                with open("commands_executed" + str(file), "a") as data_inside_page:
                    for command in commands: data_inside_page.write(command)
                break
            else:
                commands.append(included_line)
Answer 3: (score: 0)
I suggest a Pythonic approach using generators, like this:
# Assumed marker tests (not defined in the original snippet).
def matchesStart(line):
    return 'PAGE(enter)' in line

def matchesEnd(line):
    return 'PAGE(leave)' in line

def eachBlock(inFile):
    lineIterator = iter(inFile)
    def eachBlockLine():
        for line in lineIterator:  # until an end-line is found
            if matchesEnd(line):
                break
            yield line
    for line in lineIterator:  # until the end of file
        if matchesStart(line):
            # each block must be consumed before asking for the next one
            yield eachBlockLine()

with open('input-file') as inFile:
    for i, block in enumerate(eachBlock(inFile)):
        with open('output-file-%d' % i, 'w') as outFile:
            for line in block:
                outFile.write(line)
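A solution like this streams the whole input without holding any (large) part of it in memory. But perhaps your input (the file, or the blocks within it) will never be large enough for that to matter.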
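One way to get the same streaming behaviour with a single flat generator is sketched below. The name numbered_block_lines and the default marker strings are assumptions for illustration, and this is a variant of the idea rather than the code from this answer. It yields one (block number, line) pair at a time, so only the current line is held in memory.

```python
import io

def numbered_block_lines(lines, start_marker="PAGE(enter)", end_marker="PAGE(leave)"):
    """Lazily yield (block_number, line) for every line inside a block.

    Block numbers start at 1 and increase with each start marker, so an
    empty block still uses up a number.
    """
    block_number = 0
    inside = False
    for line in lines:
        if start_marker in line:
            block_number += 1
            inside = True
        elif end_marker in line:
            inside = False
        elif inside:
            yield block_number, line

# Small in-memory demonstration; a real file object works the same way.
demo = io.StringIO(
    "PAGE(enter) 'b'\n"
    "Button 11 released.\n"
    "PAGE(leave) 'b'\n"
)
for number, line in numbered_block_lines(demo):
    print(number, line, end="")
```

Because the block number travels with each line, the caller can switch output files whenever the number changes, and there is no inner generator that has to be exhausted first.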