I have some text files with more than 1000 lines each. They contain lines in the following format:
seq open @ 2018/02/26 23:07:51 node: \nodes\wroot.nod (wroot)
seq call @ 2018/02/26 23:07:51 node: ttt
retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt
BCU is working
seq done @ 2018/02/26 23:07:55 node:ttt
seq call @ 2018/02/26 23:07:55 node: fff
Open the firewall
Firewall opened
seq done @ 2018/02/26 23:07:57 node: fff
seq call @ 2018/02/26 23:07:57 node: \nodes\wchkefierror.bat (wroot#9^wchkefierror)
seq done @ 2018/02/26 23:07:57 node: \nodes\wchkefierror.bat (wroot#9^wchkefierror)
seq call @ 2018/02/26 23:07:57 node: \nodes\wuutmont.bat PTEFIE (wroot#12^wuutmont)
SENDING UUTMonitor.exe /timeevent:PTEFIE
seq done @ 2018/02/26 23:07:58 node: \nodes\wuutmont.bat PTEFIE (wroot#12^wuutmont)
seq call @ 2018/02/26 23:07:58 node: \nodes\wProcessInit.bat (wroot#13^wProcessInit)
02/26/2018 23:07:59 : @@@@ begin_\process\ProcessInit.bat
<BISCON Version=xxxx">
x
y
</BISCON>
\process\ProcessInit.bat:::Parsing branding variables from INI files...
found \flags\custom.ini
PRODUCTIONLOCK not defined in custom.ini
\process\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data...
02/26/2018 23:08:04 : @@@@ end\process\ProcessInit.bat
seq done @ 2018/02/26 23:08:04 node: \nodes\wProcessInit.bat (wroot#13^wProcessInit)
seq log @ 2018/02/26 23:08:04 node: skipping wroot#14^wbios as \flags\bios_flash_wnd.trg file not exists
seq call @ 2018/02/26 23:08:04 node: aaa
Get SkeletonPO from \working\ubera.ini
seq done @ 2018/02/26 23:08:04 node: aaa
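I want to extract the lines between seq call and seq done into a list, and insert NULL into the list whenever a line starts with seq open or seq log.
As you can see, there can be any number of lines between seq call and seq done, even 0. I have been trying to find an answer, but to no avail. I am also new to Python.
Expected output for the sample above: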
NULL
retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt
BCU is working
Open the firewall
Firewall opened
NULL
SENDING UUTMonitor.exe /timeevent:PTEFIE
02/26/2018 23:07:59 : @@@@ begin_\process\ProcessInit.bat
<BISCON Version=xxxx">
x
y
</BISCON>
\process\ProcessInit.bat:::Parsing branding variables from INI files...
found \flags\custom.ini
PRODUCTIONLOCK not defined in custom.ini
\process\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data...
02/26/2018 23:08:04 : @@@@ end\process\ProcessInit.bat
NULL
Get SkeletonPO from \working\ubera.ini
Answer 0 (score: 1)
Here is a quick and dirty way to get what you want:
def extractTxt(fpth, joinchar=' '):
    loglines = []
    with open(fpth) as f:
        incall = False   # True while we are between 'seq call' and 'seq done'
        calllines = []   # lines collected for the current call block
        for line in f:
            if line.startswith('seq open') or line.startswith('seq log'):
                loglines.append('NULL')
            elif line.startswith('seq call'):
                incall = True
            elif incall:
                if line.startswith('seq done'):
                    incall = False
                    # join the non-empty lines of this block into one entry
                    call = joinchar.join(l for l in calllines if l)
                    calllines = []
                    if not call.strip():
                        # an empty call block also becomes NULL
                        loglines.append('NULL')
                    else:
                        loglines.append(call)
                else:
                    calllines.append(line.strip())
    return loglines

extractTxt('seq.txt')
Output:
['NULL',
'retrieve BIOS data using F:\\tools64\\BiosConfigUtility64.exe /GetConfig:\\working\\bcudump.txt BCU is working',
'Open the firewall Firewall opened',
'NULL',
'SENDING UUTMonitor.exe /timeevent:PTEFIE',
'02/26/2018 23:07:59 : @@@@ begin_\\process\\ProcessInit.bat <BISCON Version=xxxx"> x y </BISCON> \\process\\ProcessInit.bat:::Parsing branding variables from INI files... found \\flags\\custom.ini PRODUCTIONLOCK not defined in custom.ini \\process\\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data... 02/26/2018 23:08:04 : @@@@ end\\process\\ProcessInit.bat',
'NULL',
'Get SkeletonPO from \\working\\ubera.ini']
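You can change how the individual lines of each call are joined within a list entry by passing a different joinchar argument to extractTxt. I will leave any further styling/organization tasks as an exercise.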
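To illustrate the effect of joinchar, here is a minimal sketch. It assumes a hypothetical helper, extract_calls, that applies the same logic as extractTxt but reads from any iterable of lines instead of a file path, so it can be tried without a log file. The sample lines are taken from the log in the question.

```python
def extract_calls(lines, joinchar=' '):
    """Hypothetical variant of extractTxt that takes an iterable of lines."""
    loglines, calllines, incall = [], [], False
    for line in lines:
        if line.startswith(('seq open', 'seq log')):
            loglines.append('NULL')
        elif line.startswith('seq call'):
            incall = True
        elif incall:
            if line.startswith('seq done'):
                incall = False
                call = joinchar.join(l for l in calllines if l)
                calllines = []
                loglines.append(call if call.strip() else 'NULL')
            else:
                calllines.append(line.strip())
    return loglines


sample = [
    r'seq open @ 2018/02/26 23:07:51 node: \nodes\wroot.nod (wroot)',
    'seq call @ 2018/02/26 23:07:55 node: fff',
    'Open the firewall',
    'Firewall opened',
    'seq done @ 2018/02/26 23:07:57 node: fff',
]

joined_with_space = extract_calls(sample)                  # default separator
joined_with_newline = extract_calls(sample, joinchar='\n')  # keep one log line per text line
joined_with_pipe = extract_calls(sample, joinchar=' | ')

# Printing the newline-joined entries gives one log line per output line
print('\n'.join(joined_with_newline))
```

With joinchar='\n', printing the entries one after another should come close to the line-per-line layout shown in the question's expected output.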
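The line:
call = joinchar.join(l for l in calllines if l)
does a few different things. The join method concatenates a list of strings, using the string it is called on as the separator. For example, the following expression:
', '.join(['foo', 'bar', 'baz', 'bof'])
produces this output: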
'foo, bar, baz, bof'
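The part of the line inside the parentheses:
l for l in calllines if l
is something called a generator expression. It is a bit complicated to explain, but basically all it does here is create a "list" of all the lines in calllines that are not empty. If you are curious, look up generator expressions in the Python documentation for more details. You can make the line easier to follow by expanding it. All in all, the following lines: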
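call = ''
for l in calllines:
    # l evaluates to False if it is empty
    if l:
        call += l + joinchar
# remove the trailing joinchar (skipped for an empty joinchar,
# because call[:-0] would otherwise empty the whole string)
if joinchar and call.endswith(joinchar):
    call = call[:-len(joinchar)]
have the same effect as the one-liner call = joinchar.join(l for l in calllines if l).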
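The equivalence can be checked with a small sketch. The function names join_with_generator and join_with_loop are made up for this illustration. The `if joinchar` guard is included because slicing with `call[:-0]` would otherwise return an empty string when the separator is empty.

```python
def join_with_generator(calllines, joinchar=' '):
    # one-liner from the answer
    return joinchar.join(l for l in calllines if l)


def join_with_loop(calllines, joinchar=' '):
    # expanded form of the same logic
    call = ''
    for l in calllines:
        if l:  # skip empty strings
            call += l + joinchar
    # drop the single trailing separator; guard against an empty separator
    if joinchar and call.endswith(joinchar):
        call = call[:-len(joinchar)]
    return call


calllines = ['Open the firewall', '', 'Firewall opened']
for sep in (' ', ', ', '\n', ''):
    assert join_with_generator(calllines, sep) == join_with_loop(calllines, sep)

print(join_with_generator(calllines))  # Open the firewall Firewall opened
```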
Answer 1 (score: 0)
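import re

# re.match only matches at the start of the string,
# so these patterns act as "line starts with" tests
begins_with_open_or_log=re.compile(r'seq open|seq log')
begins_with_call_and_done=re.compile(r'seq call|seq done')

with open('log.txt') as f:
    lines=f.readlines()

i=0
for line in lines:
    if re.match(begins_with_open_or_log,line):
        lines[i]='NULL\n'   # replace 'seq open' / 'seq log' lines with NULL
    elif re.match(begins_with_call_and_done,line):
        lines[i]=''         # drop the 'seq call' / 'seq done' marker lines
    elif line=='\n':
        lines[i]=''         # drop blank lines
    i+=1

print (''.join(lines),end='')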
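I want to extract the lines between seq call and seq done into a list, and insert NULL into the list whenever a line starts with seq open or seq log.
This is probably the output you want: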
NULL
retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt
BCU is working
Open the firewall
Firewall opened
SENDING UUTMonitor.exe /timeevent:PTEFIE
02/26/2018 23:07:59 : @@@@ begin_\process\ProcessInit.bat
<BISCON Version=xxxx">
x
y
</BISCON>
\process\ProcessInit.bat:::Parsing branding variables from INI files...
found \flags\custom.ini
PRODUCTIONLOCK not defined in custom.ini
\process\ProcessInit.bat:::Calling SETVAR.BAT generated from INI data...
02/26/2018 23:08:04 : @@@@ end\process\ProcessInit.bat
NULL
Get SkeletonPO from \working\ubera.ini
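However, if you are serious about this:
I want to extract the lines between seq call and seq done
note that, for example, the line
retrieve BIOS data using F:\tools64\BiosConfigUtility64.exe /GetConfig:\working\bcudump.txt
would not belong to your output... You need to be as precise as possible.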
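For comparison, here is a sketch of one strict reading of the requirement: keep only lines that sit between a seq call and its seq done, emit NULL for seq open and seq log lines, and also emit NULL for a call block with no content. The function name extract_strict and the stray sample line are assumptions made for this illustration. The other sample lines come from the log in the question.

```python
def extract_strict(lines):
    """Return one entry per line found strictly between 'seq call' and 'seq done'.

    'seq open' / 'seq log' lines and empty call blocks become 'NULL'.
    """
    out = []
    in_call = False
    block_has_content = False
    for raw in lines:
        line = raw.rstrip('\n')
        if line.startswith(('seq open', 'seq log')):
            out.append('NULL')
        elif line.startswith('seq call'):
            in_call, block_has_content = True, False
        elif line.startswith('seq done'):
            if in_call and not block_has_content:
                out.append('NULL')  # nothing between call and done
            in_call = False
        elif in_call and line.strip():
            out.append(line)
            block_has_content = True
        # any other line is outside a call block and is ignored
    return out


sample = [
    r'seq open @ 2018/02/26 23:07:51 node: \nodes\wroot.nod (wroot)',
    'stray line outside any call block',  # hypothetical line, not in the original log
    'seq call @ 2018/02/26 23:07:55 node: fff',
    'Open the firewall',
    'Firewall opened',
    'seq done @ 2018/02/26 23:07:57 node: fff',
    r'seq call @ 2018/02/26 23:07:57 node: \nodes\wchkefierror.bat (wroot#9^wchkefierror)',
    r'seq done @ 2018/02/26 23:07:57 node: \nodes\wchkefierror.bat (wroot#9^wchkefierror)',
    r'seq log @ 2018/02/26 23:08:04 node: skipping wroot#14^wbios as \flags\bios_flash_wnd.trg file not exists',
]

result = extract_strict(sample)
print('\n'.join(result))
```

Unlike the regex version above, this sketch tracks whether it is inside a call block, so lines outside any block are dropped and an empty block still produces a NULL entry.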
Note: for Python 2.7, change this line
print (''.join(lines),end='')
to this:
print ''.join(lines)
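If the same script has to run on both Python 2.7 and Python 3, one option is to avoid print altogether and write to sys.stdout directly. The sketch below uses a hypothetical helper named emit. Note that the Python 2 print statement appends a trailing newline, whereas writing to the stream does not, which matches the behaviour of end='' in Python 3.

```python
import sys


def emit(lines, stream=None):
    """Write the joined lines without adding a trailing newline (hypothetical helper)."""
    if stream is None:
        stream = sys.stdout
    stream.write(''.join(lines))


# Entries as they would look after the loop above: replaced or blanked lines
lines = ['NULL\n', '', 'Open the firewall\n', 'Firewall opened\n']
emit(lines)
```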