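Python: Extract a random number of lines between markers in a text file

Time: 2018-04-19 06:41:43

Tags: python python-2.7 extraction text-extraction

I have some text files with more than 1000 lines each. They contain lines in the following format: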

seq open @ 2018/02/26 23:07:51 node: \nodes\wroot.nod (wroot)
seq call @ 2018/02/26 23:07:51 node: ttt
retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt
BCU is working
seq done @ 2018/02/26 23:07:55 node:ttt

seq call @ 2018/02/26 23:07:55 node: fff
Open the firewall
Firewall opened
seq done @ 2018/02/26 23:07:57 node: fff

seq call @ 2018/02/26 23:07:57 node: \nodes\wchkefierror.bat (wroot#9^wchkefierror)
seq done @ 2018/02/26 23:07:57 node: \nodes\wchkefierror.bat (wroot#9^wchkefierror)

seq call @ 2018/02/26 23:07:57 node: \nodes\wuutmont.bat PTEFIE (wroot#12^wuutmont)

SENDING UUTMonitor.exe /timeevent:PTEFIE
seq done @ 2018/02/26 23:07:58 node: \nodes\wuutmont.bat PTEFIE (wroot#12^wuutmont)

seq call @ 2018/02/26 23:07:58 node: \nodes\wProcessInit.bat (wroot#13^wProcessInit)

02/26/2018 23:07:59 : @@@@ begin_\process\ProcessInit.bat

<BISCON Version=xxxx">
x
y
</BISCON>
\process\ProcessInit.bat:::Parsing branding variables from INI files...
found \flags\custom.ini
PRODUCTIONLOCK not defined in custom.ini
\process\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data...
02/26/2018 23:08:04 : @@@@ end\process\ProcessInit.bat
seq done @ 2018/02/26 23:08:04 node: \nodes\wProcessInit.bat (wroot#13^wProcessInit)

seq log @ 2018/02/26 23:08:04 node: skipping wroot#14^wbios as \flags\bios_flash_wnd.trg file not exists

seq call @ 2018/02/26 23:08:04 node: aaa

Get SkeletonPO from \working\ubera.ini
seq done @ 2018/02/26 23:08:04 node: aaa

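I want to extract the lines between seq call and seq done into a list, and insert NULL into the list if a line starts with seq open or seq log.

As you can see, there can be any number of lines between seq call and seq done, even 0. I have been trying to find an answer, but to no avail. I am also new to Python.

Expected output for the sample above: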

NULL
retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt
BCU is working
Open the firewall
Firewall opened
NULL
SENDING UUTMonitor.exe /timeevent:PTEFIE
02/26/2018 23:07:59 : @@@@ begin_\process\ProcessInit.bat

<BISCON Version=xxxx">
x
y
</BISCON>
\process\ProcessInit.bat:::Parsing branding variables from INI files...
found \flags\custom.ini
PRODUCTIONLOCK not defined in custom.ini
\process\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data...
02/26/2018 23:08:04 : @@@@ end\process\ProcessInit.bat
NULL
Get SkeletonPO from \working\ubera.ini

2 answers:

Answer 0: (score: 1)

Here is a quick and dirty way to get what you want:

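def extractTxt(fpth, joinchar=' '):
    """Return one list entry per log event: 'NULL' or the joined call lines."""
    loglines = []
    with open(fpth) as f:
        incall = False   # True while between 'seq call' and 'seq done'
        calllines = []   # stripped lines collected for the current call

        for line in f:
            if line.startswith('seq open') or line.startswith('seq log'):
                loglines.append('NULL')
            elif line.startswith('seq call'):
                incall = True
            elif incall:
                if line.startswith('seq done'):
                    incall = False
                    # join the non-empty lines of this call into one entry
                    call = joinchar.join(l for l in calllines if l)
                    calllines = []

                    # a call with no content is recorded as NULL
                    if not call.strip():
                        loglines.append('NULL')
                    else:
                        loglines.append(call)
                else:
                    calllines.append(line.strip())

    return loglines

extractTxt('seq.txt')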

Output:

['NULL',
 'retrieve BIOS data using F:\\tools64\\BiosConfigUtility64.exe /GetConfig:\\working\\bcudump.txt BCU is working',
 'Open the firewall Firewall opened',
 'NULL',
 'SENDING UUTMonitor.exe /timeevent:PTEFIE',
 '02/26/2018 23:07:59 : @@@@ begin_\\process\\ProcessInit.bat <BISCON Version=xxxx"> x y </BISCON> \\process\\ProcessInit.bat:::Parsing branding variables from INI files... found \\flags\\custom.ini PRODUCTIONLOCK not defined in custom.ini \\process\\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data... 02/26/2018 23:08:04 : @@@@ end\\process\\ProcessInit.bat',
 'NULL',
 'Get SkeletonPO from \\working\\ubera.ini']

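You can change how the individual lines of each call are joined into a list entry by passing a different joinchar argument to extractTxt. I'll leave any further styling/organization as an exercise.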
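To make the effect of joinchar concrete, here is a minimal sketch. It assumes a hypothetical helper, extract_calls, that applies the same logic as extractTxt but reads from any iterable of lines instead of a file path, so it can be tried without a log file on disk.

```python
def extract_calls(lines, joinchar=' '):
    """Same logic as extractTxt, but takes an iterable of lines (hypothetical helper)."""
    entries, calllines, incall = [], [], False
    for line in lines:
        if line.startswith(('seq open', 'seq log')):
            entries.append('NULL')
        elif line.startswith('seq call'):
            incall = True
        elif incall:
            if line.startswith('seq done'):
                incall = False
                call = joinchar.join(l for l in calllines if l)
                calllines = []
                entries.append(call if call.strip() else 'NULL')
            else:
                calllines.append(line.strip())
    return entries


sample = [
    'seq call @ 2018/02/26 23:07:55 node: fff\n',
    'Open the firewall\n',
    'Firewall opened\n',
    'seq done @ 2018/02/26 23:07:57 node: fff\n',
]

print(extract_calls(sample))         # ['Open the firewall Firewall opened']
print(extract_calls(sample, ' | '))  # ['Open the firewall | Firewall opened']
print(extract_calls(sample, '\n'))   # ['Open the firewall\nFirewall opened']
```

Passing '\n' keeps the original line breaks inside each entry, which is probably the closest to the layout shown in the question.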

Details

The line:

call = joinchar.join(l for l in calllines if l)

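does a few different things. The join method concatenates a list of strings, using the string it is called on as the separator. For example, the following expression: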

', '.join(['foo', 'bar', 'baz', 'bof'])

will produce this output:

'foo, bar, baz, bof'

The part of the line inside the parentheses:

l for l in calllines if l

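is something called a generator expression. It is a little complicated to explain, but basically what it does here is produce a "list" of all the lines in calllines that are not empty. If you are curious, the Python documentation on generator expressions has more details. The line may be easier to follow when expanded. In summary, the following lines: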

call = ''
for l in calllines:
    # l evaluates to False if it is empty
    if l:
        call += l + joinchar

# remove the trailing joinchar (skipped for an empty joinchar, since call[:-0] would be '')
if joinchar and call.endswith(joinchar):
    call = call[:-len(joinchar)]

have the same effect as the one-liner call = joinchar.join(l for l in calllines if l).
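If you want to convince yourself of that equivalence, a small check along the following lines should do. The function names join_one_liner and join_expanded are made up for this illustration, and the guard on an empty joinchar in the expanded version is an addition here, because call[:-0] would otherwise empty the string.

```python
def join_one_liner(calllines, joinchar=' '):
    # Generator expression: only the non-empty entries reach join().
    return joinchar.join(l for l in calllines if l)


def join_expanded(calllines, joinchar=' '):
    call = ''
    for l in calllines:
        if l:  # an empty string is falsy, so it is skipped
            call += l + joinchar
    # Strip the single trailing joinchar; the `joinchar and` guard is an
    # addition for this sketch (call[:-0] would return an empty string).
    if joinchar and call.endswith(joinchar):
        call = call[:-len(joinchar)]
    return call


cases = [
    (['Open the firewall', '', 'Firewall opened'], ' '),
    (['x', 'y'], ', '),
    (['', ''], ' '),
    (['x', 'y'], ''),
]
for calllines, joinchar in cases:
    assert join_one_liner(calllines, joinchar) == join_expanded(calllines, joinchar)

print(join_one_liner(['Open the firewall', '', 'Firewall opened']))
# Open the firewall Firewall opened
```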

Answer 1: (score: 0)

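import re

# lines starting with these markers are replaced by NULL
begins_with_open_or_log = re.compile(r'seq open|seq log')
# the marker lines that delimit a call are dropped
begins_with_call_and_done = re.compile(r'seq call|seq done')

with open('log.txt') as f:
    lines = f.readlines()

for i, line in enumerate(lines):
    if begins_with_open_or_log.match(line):
        lines[i] = 'NULL\n'
    elif begins_with_call_and_done.match(line) or line == '\n':
        # drop marker lines and blank lines
        lines[i] = ''

# the lines keep their own newlines, so suppress the extra one (Python 3 syntax)
print (''.join(lines),end='')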
  

I want to extract the lines between seq call and seq done into a list, and insert NULL into the list if a line starts with seq open or seq log.

This is probably the output you want:

NULL
retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt
BCU is working
Open the firewall
Firewall opened
SENDING UUTMonitor.exe /timeevent:PTEFIE
02/26/2018 23:07:59 : @@@@ begin_\process\ProcessInit.bat
<BISCON Version=xxxx">
x
y
</BISCON>
\process\ProcessInit.bat:::Parsing branding variables from INI files...
found \flags\custom.ini
PRODUCTIONLOCK not defined in custom.ini
\process\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data...
02/26/2018 23:08:04 : @@@@ end\process\ProcessInit.bat
NULL
Get SkeletonPO from \working\ubera.ini

However, if you are serious about this:

  

I want to extract the lines between seq call and seq done

then note that, for example, the line

retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt

does not belong in your output... You need to be as precise as possible.
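One way to pin the requirement down is to write it as a small state machine. The sketch below is only one possible reading of the question: it keeps non-blank lines found strictly between a seq call and the next seq done, emits one list entry per line, and (an assumption based on the asker's expected output) records an empty call as NULL. The function name extract_between_markers is made up for this example.

```python
def extract_between_markers(lines):
    """Return a flat list with one entry per non-blank line inside a call block."""
    result = []
    block = None  # None means "not inside a seq call ... seq done block"
    for raw in lines:
        line = raw.rstrip('\r\n')
        if line.startswith(('seq open', 'seq log')):
            result.append('NULL')
        elif line.startswith('seq call'):
            block = []
        elif line.startswith('seq done'):
            if block is not None:
                # Assumption: a call with no content is reported as NULL.
                result.extend(block or ['NULL'])
            block = None
        elif block is not None and line.strip():
            block.append(line)
    return result


sample = [
    r'seq open @ 2018/02/26 23:07:51 node: \nodes\wroot.nod (wroot)',
    'seq call @ 2018/02/26 23:07:55 node: fff',
    'Open the firewall',
    'Firewall opened',
    'seq done @ 2018/02/26 23:07:57 node: fff',
    '',
    r'seq call @ 2018/02/26 23:07:57 node: \nodes\wchkefierror.bat (wroot#9^wchkefierror)',
    r'seq done @ 2018/02/26 23:07:57 node: \nodes\wchkefierror.bat (wroot#9^wchkefierror)',
    'a stray line outside any call block',
]

print(extract_between_markers(sample))
# ['NULL', 'Open the firewall', 'Firewall opened', 'NULL']
```

Lines that appear outside any call block are ignored here; whether that is what you want depends on how strictly you read the requirement.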

Note: for Python 2.7, change this line

print (''.join(lines),end='')

to this:

print ''.join(lines)
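If the script has to run unchanged on both Python 2.7 and Python 3, one option is to avoid print altogether and write the joined text with sys.stdout.write, which does not append a newline on either version. The sketch below wraps the same replacements in a hypothetical render function so the result can be inspected before it is written out; the sample lines are taken from the log in the question.

```python
import re
import sys

OPEN_OR_LOG = re.compile(r'seq open|seq log')
CALL_OR_DONE = re.compile(r'seq call|seq done')


def render(lines):
    """Apply the same replacements as the loop above and return one string."""
    out = []
    for line in lines:
        if OPEN_OR_LOG.match(line):
            out.append('NULL\n')
        elif CALL_OR_DONE.match(line) or line == '\n':
            continue  # marker lines and blank lines are dropped
        else:
            out.append(line)
    return ''.join(out)


sample = [
    'seq log @ 2018/02/26 23:08:04 node: skipping wroot#14^wbios\n',
    'seq call @ 2018/02/26 23:07:55 node: fff\n',
    'Open the firewall\n',
    '\n',
    'Firewall opened\n',
    'seq done @ 2018/02/26 23:07:57 node: fff\n',
]

# Works the same way on Python 2.7 and Python 3; no extra newline is added.
sys.stdout.write(render(sample))
```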