I have a log containing SOAP request/response entries:
[2015-02-03 19:05:13] TIME:03.02.2015 19:05:13,
RAW_REQUEST:<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="pay_parent" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns2="providers"><SOAP-ENV:Body><!-- ... -->
</SOAP-ENV:Body></SOAP-ENV:Envelope>
,
uid:0de7d51a-abb6-11e4-a436-005056936d96,
===
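I want to extract all of the XML documents into one big XML file (extract the chunks and wrap them in a root ... tag), but I also need the date of each log record.
I would like to end up with something like this (I can add the root xmlns attributes by hand):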
<Records xmlns="" ...>
<Record datetime="2015-02-03 19:05:13">
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="pay_parent" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns2="providers"><SOAP-ENV:Body>
<!-- Other xml data -->
</SOAP-ENV:Body></SOAP-ENV:Envelope>
</Record>
...
</Records>
Answer 0 (score: 1)
You can do this with awk. For example, create a file named awkscript and add the following code:
BEGIN { print "<Records>" }  # add the root xmlns attributes by hand
# A "[YYYY-MM-DD HH:MM:SS]" prefix starts a new log record.
/^\[[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2}\]/ {
    # Characters 2 to 20 of the line hold the timestamp inside the brackets.
    print "\t<Record datetime=\"" substr($0, 2, 19) "\">"
    in_record = 1
    next
}
# Copy lines that start with a SOAP-ENV tag; RAW_REQUEST, "," and uid lines are skipped.
in_record && /^<\/?SOAP-ENV:/ { print "\t\t" $0 }
# The "===" separator closes the record.
in_record && /^===/ { print "\t</Record>"; in_record = 0 }
END { print "</Records>" }
Run the script against the file in a shell:
awk -f path_to_awkscript path_to_xml_file > path_to_new_file
Example
Using the script with a file that contains the following data:
[2015-02-03 19:05:13] TIME:03.02.2015 19:05:13,
RAW_REQUEST:<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="pay_parent" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns2="providers"><SOAP-ENV:Body><!-- ... -->
</SOAP-ENV:Body></SOAP-ENV:Envelope>
,
uid:0de7d51a-abb6-11e4-a436-005056936d96,
===
[2014-11-03 19:05:13] TIME:03.02.2015 19:05:13,
RAW_REQUEST:<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="pay_parent" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns2="providers"><SOAP-ENV:Body><!-- ... -->
</SOAP-ENV:Body></SOAP-ENV:Envelope>
,
uid:0de7d51a-abb6-11e4-a436-005056936d96,
===
[2014-12-15 19:05:13] TIME:03.02.2015 19:05:13,
RAW_REQUEST:<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="pay_parent" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns2="providers"><SOAP-ENV:Body><!-- ... -->
</SOAP-ENV:Body></SOAP-ENV:Envelope>
,
uid:0de7d51a-abb6-11e4-a436-005056936d96,
===
Result
<Records>
	<Record datetime="2015-02-03 19:05:13">
		<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="pay_parent" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns2="providers"><SOAP-ENV:Body><!-- ... -->
		</SOAP-ENV:Body></SOAP-ENV:Envelope>
	</Record>
	<Record datetime="2014-11-03 19:05:13">
		<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="pay_parent" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns2="providers"><SOAP-ENV:Body><!-- ... -->
		</SOAP-ENV:Body></SOAP-ENV:Envelope>
	</Record>
	<Record datetime="2014-12-15 19:05:13">
		<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/" xmlns:ns1="pay_parent" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:ns2="providers"><SOAP-ENV:Body><!-- ... -->
		</SOAP-ENV:Body></SOAP-ENV:Envelope>
	</Record>
</Records>
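Whichever tool produces the combined file, it is worth parsing it once to confirm that it is well-formed and that every record carries its date. The following is only a minimal sketch using Python's standard xml.etree.ElementTree module; the function name record_datetimes and the inline sample are assumptions made for illustration, not part of the answer above.

```python
import xml.etree.ElementTree as ET


def record_datetimes(xml_text):
    """Parse the combined document and return each Record's datetime value."""
    # ET.fromstring raises ET.ParseError when the text is not well-formed XML.
    root = ET.fromstring(xml_text)
    return [
        elem.get("datetime")
        for elem in root
        # Compare local names so a default xmlns on the root does not matter.
        if elem.tag.rsplit("}", 1)[-1] == "Record"
    ]


# Hypothetical sample in the layout asked for in the question.
sample = """<Records>
<Record datetime="2015-02-03 19:05:13">
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><SOAP-ENV:Body><!-- ... -->
</SOAP-ENV:Body></SOAP-ENV:Envelope>
</Record>
</Records>"""

print(record_datetimes(sample))
```

For a file on disk, the same check can start from ET.parse(path).getroot() instead of ET.fromstring.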
Answer 1 (score: 0)
I could not find a solution using Linux console tools such as grep or sed, so I wrote a Python script.
import re
import sys


def write_xml_log(out_path, lines):
    """Join the XML chunks into one document."""
    with open(out_path, 'w', encoding='utf-8') as out_fh:
        out_fh.write('<?xml version="1.0" encoding="UTF-8"?>\n')
        out_fh.write('<LogRecords>\n')
        out_fh.writelines(
            '<LogRecord>\n{}\n</LogRecord>\n'.format(line) for line in lines)
        out_fh.write('</LogRecords>')


def prepare_xml_chunks(log_path):
    """Yield each envelope with the record date added as a datetime attribute."""
    # Raw strings keep the regex backslashes from being read as string escapes.
    record_date_re = re.compile(r'^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]')
    envelope_start_re = re.compile(r'(<(?:[\w_-]+:)?Envelope)(.*)$')
    envelope_end_re = re.compile(r'(.*</(?:[\w_-]+:)?Envelope>)')
    envelope_complete_re = re.compile(
        r'(<(?:[\w_-]+:)?Envelope)(.*?>.*?</(?:[\w_-]+:)?Envelope>)')
    record_date = ''
    record_envelope = ''
    state_in_envelope = False
    with open(log_path, encoding='utf-8') as log_fh:
        for line in log_fh:
            match_date = record_date_re.match(line)
            match_envelope_start = envelope_start_re.match(line)
            match_envelope_end = envelope_end_re.match(line)
            match_envelope_complete = envelope_complete_re.match(line)
            if match_date:
                record_date = match_date.group(1)
            if not state_in_envelope:
                # One-line envelope.
                if match_envelope_complete:
                    state_in_envelope = False
                    record_envelope = ''
                    yield '{} datetime="{}" {}\n'.format(
                        match_envelope_complete.group(1),
                        record_date,
                        match_envelope_complete.group(2))
                # Multi-line envelope start.
                elif match_envelope_start:
                    state_in_envelope = True
                    record_envelope = '{} datetime="{}" {}\n'.format(
                        match_envelope_start.group(1),
                        record_date,
                        match_envelope_start.group(2))
                # Problem situation.
                elif match_envelope_end:
                    raise Exception('Envelope close tag without open tag.')
            else:
                # Multi-line envelope continue.
                if not match_envelope_end:
                    record_envelope += line
                # Multi-line envelope end.
                else:
                    record_envelope += match_envelope_end.group(1)
                    yield '{}\n'.format(record_envelope)
                    record_envelope = ''
                    state_in_envelope = False


if __name__ == '__main__':
    # First argument: path to the log; second argument: path of the XML file to write.
    write_xml_log(sys.argv[2], prepare_xml_chunks(sys.argv[1]))
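The script above puts the date on the Envelope element itself. If the layout from the question is preferred, with each envelope wrapped in a Record element that carries the datetime attribute, a shorter variant is possible. This is only a sketch under a few assumptions: records are separated by lines starting with ===, each record holds at most one envelope, and the log is small enough to read into memory. The names wrap_records, DATE_RE and ENVELOPE_RE are hypothetical.

```python
import re

# "[YYYY-MM-DD HH:MM:SS]" at the start of a line; group 1 is the timestamp.
DATE_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\]", re.MULTILINE)
# An Envelope element with an optional namespace prefix, possibly multi-line.
ENVELOPE_RE = re.compile(
    r"<(?:[\w-]+:)?Envelope\b.*?</(?:[\w-]+:)?Envelope>", re.DOTALL)


def wrap_records(log_text):
    """Return one XML document with a <Record datetime="..."> per log record."""
    parts = ["<Records>"]
    # Assumption: a line consisting of "===" separates the records.
    for chunk in re.split(r"^===[ \t]*$", log_text, flags=re.MULTILINE):
        date = DATE_RE.search(chunk)
        envelope = ENVELOPE_RE.search(chunk)
        if date and envelope:  # skip chunks without a date or an envelope
            parts.append('<Record datetime="{}">\n{}\n</Record>'.format(
                date.group(1), envelope.group(0)))
    parts.append("</Records>")
    return "\n".join(parts)


# Shortened copy of one record from the question's log.
sample_log = """[2015-02-03 19:05:13] TIME:03.02.2015 19:05:13,
RAW_REQUEST:<?xml version="1.0" encoding="UTF-8"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"><SOAP-ENV:Body><!-- ... -->
</SOAP-ENV:Body></SOAP-ENV:Envelope>
,
uid:0de7d51a-abb6-11e4-a436-005056936d96,
===
"""

print(wrap_records(sample_log))
```

Splitting on the separator first keeps a record that has no envelope from borrowing the envelope of the next record, which a single multi-line pattern over the whole log could do.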