Get all lines between two strings and store the data in another file in Python 2.7

Date: 2014-11-26 14:46:53

Tags: python

I'm trying to implement a way to find all the lines between two strings (the PAGE(enter) and PAGE(leave) markers), for example:

'X_0_Gui_Menu_322_Menu_Outputs_SDI_overlays'
'X_0_Gui_Menu_321_Menu_Outputs_SDI_processing'
12:Button 11 released.
Wheel 4 turned from 31 to 30.
Button 9 pressed.
'X_0_Gui_Menu_321_Menu_Outputs_SDI_processing'
'X_0_Gui_Menu_322_Menu_Outputs_SDI_overlays'
Button 9 released.
Setting parameters saved.
Wheel 4 turned from 29 to 34.
Button 9 pressed.
'X_0_Gui_Menu_322_Menu_Outputs_SDI_overlays'
'X_0_Gui_Menu_3231_Menu_Outputs_SDI_status'

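So the output files should be:

File 1

Button 11 released.
Wheel 4 turned from 31 to 30.
Button 9 pressed.

File 2

Button 9 released.
Setting parameters saved.
Wheel 4 turned from 29 to 34.
Button 9 pressed.

and so on...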

I tried to implement this with the following code, but it doesn't work as expected:

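from itertools import groupby  # groupby is used below but was never imported

with open("messages", "r") as fin:
    # str.strip() trims any of these characters from both ends of each line
    lines = (line.strip('PAGE(enter)\n') for line in fin)
    # groupby(lines, bool) starts a new group only at empty lines
    blocks = [list(g) for k, g in groupby(lines, bool) if k]
    file = 0
    for block in blocks:
        file = file + 1
        with open("commands_executed" + str(file), "a") as data_inside_page:
            # note: this writes the whole list of blocks, not the current block
            data_inside_page.write(str(blocks))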
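One way to see why the code above ends up with a single block: str.strip() treats its argument as a set of characters to trim from both ends, so it never removes PAGE(enter) from the middle of a log line, and groupby(lines, bool) only splits at empty lines. A small check, using hypothetical shortened log lines (not the real file) and a made-up 'Enter pressed' string:

```python
from itertools import groupby

# Hypothetical, shortened log lines; none of them is empty
lines = [
    "Jan 01 ACTION:PAGE(enter) 'menu_321'\n",
    "Jan 01 LOG:Button 11 released.\n",
    "Jan 01 ACTION:PAGE(leave) 'menu_321'\n",
]

# strip() only trims characters at the ends: here just the trailing '\n'
stripped = [line.strip('PAGE(enter)\n') for line in lines]
print(stripped[0])  # Jan 01 ACTION:PAGE(enter) 'menu_321'

# No stripped line is empty, so bool() is True for all of them -> one group
blocks = [list(g) for k, g in groupby(stripped, bool) if k]
print(len(blocks))  # 1

# It can also eat ordinary text that starts or ends with those characters
print(repr('Enter pressed\n'.strip('PAGE(enter)\n')))  # ' pressed'
```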

4 answers:

Answer 0 (score: 0)

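import re
# Match a PAGE(enter) line, then capture everything up to (but not including)
# the next line containing PAGE(leave); re.DOTALL lets '.' cross newlines
p = re.compile(ur'PAGE\(enter\)[\S ]+\s((?:(?![^\n]+PAGE\(leave\)).)*)', re.IGNORECASE | re.DOTALL)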
test_str = u"Jan 01 01:25:08 AMIRA-134500021 user.notice gui-monitor[770]: ACTION:401b0836:8:PAGE(leave) 'X_0_Gui_Menu_322_Menu_Outputs_SDI_overlays'\nJan 01 01:25:08 AMIRA-134500021 user.notice gui-monitor[770]: ACTION:401b0836:8:PAGE(enter) 'X_0_Gui_Menu_321_Menu_Outputs_SDI_processing'\nJan 01 01:25:09 AMIRA-134500021 user.notice butler[774]: LOG:200708a0:12:Button 11 released.\nJan 01 01:25:12 AMIRA-134500021 user.notice butler[774]: LOG:200708a0:12:Wheel 4 turned from 31 to 30.\nJan 01 01:25:12 AMIRA-134500021 user.notice butler[774]: LOG:200708a0:12:Button 9 pressed.\nJan 01 01:25:12 AMIRA-134500021 user.notice gui-monitor[770]: ACTION:401b0836:8:PAGE(leave) 'X_0_Gui_Menu_321_Menu_Outputs_SDI_processing'\nJan 01 01:25:12 AMIRA-134500021 user.notice gui-monitor[770]: ACTION:401b0836:8:PAGE(enter) 'X_0_Gui_Menu_322_Menu_Outputs_SDI_overlays'\nJan 01 01:25:25 AMIRA-134500021 user.notice butler[774]: LOG:200708a0:12:Button 9 released.\nJan 01 01:25:25 AMIRA-134500021 user.notice concen[655]: LOG:2011088a:4:Setting parameters saved.\nJan 1 01:25:27 AMIRA-134500021 daemon.warn dnsmasq-dhcp[738]: DHCP packet received on eth0 which has no address\nJan 01 01:25:28 AMIRA-134500021 user.notice butler[774]: LOG:200708a0:12:Wheel 4 turned from 29 to 34.\nJan 01 01:25:28 AMIRA-134500021 user.notice butler[774]: LOG:200708a0:12:Button 9 pressed.\nJan 01 01:25:28 AMIRA-134500021 user.notice gui-monitor[770]: ACTION:401b0836:8:PAGE(leave) 'X_0_Gui_Menu_322_Menu_Outputs_SDI_overlays'\nJan 01 01:25:28 AMIRA-134500021 user.notice gui-monitor[770]: ACTION:401b0836:8:PAGE(enter) 'X_0_Gui_Menu_3231_Menu_Outputs_SDI_status'"

ll = re.findall(p, test_str)  # one string per page block

Try this, then write each element of the list ll to a file. See the demo:

http://regex101.com/r/oE6jJ1/37
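Answer 0 leaves the writing step to the reader. A minimal sketch of that step, assuming one output file per match named like the question's commands_executed1, commands_executed2, ...; the helper write_blocks and the shortened sample lines are hypothetical. The pattern is written as r'' instead of ur'' so the same code also runs on Python 3. Note that the lookahead needs at least one character before PAGE(leave) on its line, which holds for the timestamped log lines.

```python
import os
import re
import tempfile

# Same pattern as in Answer 0, as a plain raw string
PAGE_BLOCK = re.compile(r'PAGE\(enter\)[\S ]+\s((?:(?![^\n]+PAGE\(leave\)).)*)',
                        re.IGNORECASE | re.DOTALL)


def write_blocks(text, prefix):
    """Write each captured block to prefix1, prefix2, ... and return the file names."""
    names = []
    for n, block in enumerate(PAGE_BLOCK.findall(text), 1):
        name = '%s%d' % (prefix, n)
        with open(name, 'w') as out:
            out.write(block)
        names.append(name)
    return names


# Hypothetical, shortened log lines (each line starts with a dummy "x ")
sample = ("x PAGE(enter) 'menu_321'\n"
          "x LOG:Button 11 released.\n"
          "x LOG:Button 9 pressed.\n"
          "x PAGE(leave) 'menu_321'\n"
          "x PAGE(enter) 'menu_322'\n"
          "x LOG:Button 9 released.\n"
          "x PAGE(leave) 'menu_322'\n")

# Write into a temporary directory so the example does not clutter the cwd
out_dir = tempfile.mkdtemp()
for name in write_blocks(sample, os.path.join(out_dir, 'commands_executed')):
    print(name)
```

Opening each file with 'w' rather than 'a' means rerunning the script replaces the old output instead of appending to it.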

Answer 1 (score: 0)

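You don't need a regex. Just read all the lines with readlines(), then take the slice of lines between the line containing 'PAGE(enter)' and the line containing 'PAGE(leave)':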

new = []
start = 0
with open('your_file.txt', 'r') as f:
    a = f.readlines()
    for i, j in enumerate(a):
        if 'PAGE(enter)' in j:
            start = i  # remember where the current page starts
            continue
        if 'PAGE(leave)' in j:
            if len(a[start+1:i]):  # skip pages with no lines in between
                new.append(a[start+1:i])

# Write block n to new_file<n> (new_file1, new_file2, ...), however many there are
for n, block in enumerate(new, 1):
    with open('new_file%d' % n, 'w') as out:
        out.writelines(block)

new_file1:

Button 11 released.
Wheel 4 turned from 31 to 30.
Button 9 pressed.

new_file2:

Button 9 released.
Setting parameters saved.
Wheel 4 turned from 29 to 34.
Button 9 pressed.
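The output above assumes input lines that already hold only the message text, as in the question's sample. With the raw syslog lines shown in Answer 0, each slice would contain the full lines, including the dnsmasq-dhcp line that the question's expected output leaves out. A minimal sketch that keeps only the text after the LOG:<hex>:<number>: prefix; the regex is an assumption based on the sample lines in Answer 0, and messages_only is a hypothetical helper:

```python
import re

# Assumption: message lines look like the syslog sample in Answer 0, e.g.
# "... butler[774]: LOG:200708a0:12:Button 11 released."
LOG_MESSAGE = re.compile(r'LOG:[0-9A-Fa-f]+:\d+:(.*)')


def messages_only(block):
    """Return the text after the LOG:<hex>:<n>: prefix, skipping non-LOG lines."""
    result = []
    for line in block:
        m = LOG_MESSAGE.search(line)
        if m:
            result.append(m.group(1).strip())
    return result


# Three lines copied from the log in Answer 0
block = [
    "Jan 01 01:25:25 AMIRA-134500021 user.notice butler[774]: LOG:200708a0:12:Button 9 released.\n",
    "Jan 01 01:25:25 AMIRA-134500021 user.notice concen[655]: LOG:2011088a:4:Setting parameters saved.\n",
    "Jan 1 01:25:27 AMIRA-134500021 daemon.warn dnsmasq-dhcp[738]: DHCP packet received on eth0 which has no address\n",
]
print(messages_only(block))  # ['Button 9 released.', 'Setting parameters saved.']
```

Each block in new could be passed through messages_only before it is written out.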

Answer 2 (score: 0)

Try this (Python 2.7):

import re

START_REGEX = re.compile(r"PAGE\(enter\)")
END_REGEX = re.compile(r"PAGE\(leave\)")
file = 0
with open("messages", "r") as fin:
  for line in fin:
    # skip everything until a PAGE(enter) line
    if not START_REGEX.search(line): continue
    commands = []
    # keep reading from the same file iterator until PAGE(leave)
    for included_line in fin:
      if END_REGEX.search(included_line):
        file += 1
        # "a" appends, so rerunning the script adds to existing files
        with open("commands_executed" + str(file), "a") as data_inside_page:
          for command in commands: data_inside_page.write(command)
        break
      else:
        commands.append(included_line)

Answer 3 (score: 0)

I'd suggest a more Pythonic approach using generators, like this:

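def matchesStart(line):
    # a block begins after a line containing PAGE(enter)
    return 'PAGE(enter)' in line

def matchesEnd(line):
    # ...and ends at the next line containing PAGE(leave)
    return 'PAGE(leave)' in line

def eachBlock(inFile):

    lineIterator = iter(inFile)

    def eachBlockLine():
        # both loops share lineIterator, so every line is read only once
        for line in lineIterator:  # until an end-line is found
            if matchesEnd(line):
                break
            yield line

    for line in lineIterator:  # until the end of file
        if matchesStart(line):
            yield eachBlockLine()

with open('input-file') as inFile:
    for i, block in enumerate(eachBlock(inFile)):
        with open('output-file-%d' % i, 'w') as outFile:
            for line in block:
                outFile.write(line)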

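A solution like this streams the entire input without keeping (much of) it in memory. But maybe your input (the file, or the blocks in it) is never big enough for that to matter.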