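I have a log file that contains multiple XML excerpts. They are mostly the requests and responses of calls to SOAP services. I want to extract these excerpts with a regular expression and then parse them with an XML parsing library. Here is a sample section of the log file: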
DEBUG - GeronimoLog.debug(66) | GET MEX property org.apache.ode.bpel.myRoleSessionId = null
DEBUG - GeronimoLog.debug(66) | My-Role EPR not specified, SEP will not be used.
DEBUG - GeronimoLog.debug(66) | Axis2 sending message to http://localhost:8000/MagentoWS/services/InputsReceiver using MEX {PartnerRoleMex#hqejbhcnphrcdpwa4i059f [PID {ws.test}LogTestProc-558] calling org.apache.ode.bpel.epr.WSAEndpoint@4ced5499.receiveInputs(...)}
DEBUG - GeronimoLog.debug(66) | Message: <?xml version='1.0' encoding='utf-8'?><soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"><soapenv:Body><receiveInputs xmlns="http://InputsReceiver.magento.ws">
<impl:input xmlns:impl="http://InputsReceiver.magento.ws">
<sku>24-MB03</sku>
<customer_id>2</customer_id>
<first_name>Micheal</first_name>
<last_name>Bowman</last_name>
<qty>1</qty>
<shipping_method>ground</shipping_method>
<street>117 Park Ave</street>
<city>Newark</city>
<postcode>07104</postcode>
<country_id>US</country_id>
<country>United State</country>
<region_id>41</region_id>
<region_code>NJ</region_code>
<telephone>+123456789</telephone>
<email>micheal@bowman.com</email>
<base_currency_code>USD</base_currency_code>
<cc_cid>123</cc_cid>
<cc_owner>Micheal Bowman</cc_owner>
<cc_number>5105105105105100</cc_number>
<cc_type>MasterCard</cc_type>
<cc_exp_year>2019</cc_exp_year>
<cc_exp_month>5</cc_exp_month>
<payment_method>checkmo</payment_method>
<companyEmail>ourcompany@ourcompany.com</companyEmail>
<subject>Order Email</subject>
<body>Email body</body>
<PayerID>Micheal123</PayerID>
</impl:input>
</receiveInputs></soapenv:Body></soapenv:Envelope>
DEBUG - GeronimoLog.debug(66) | replyAsync mex=hqejbhcnphrcdpwa4i059f
DEBUG - GeronimoLog.debug(66) | Setting execution state on instance 177567
DEBUG - GeronimoLog.debug(66) | Sending stateful TO epr in message header using session null
DEBUG - GeronimoLog.debug(66) | Sending a message containing wsa endpoints in headers for session passing.
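I want to extract the part that starts at <?xml... and ends at </soapenv:Envelope>. However, I have not been able to come up with a regular expression that gives the expected result. Note that the file contains multiple such XML excerpts.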
The regex patterns I have tried so far, without success, are the following:
Message: \<.*\>\n
Message: (<.*>\n)+
<\?xml.*Envelope?
Message: (<.*>(\n))+
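For reference, here is a minimal sketch of the kind of extraction I am after, assuming Python with its standard re and xml.etree.ElementTree modules (the question itself does not fix a language, and the names ENVELOPE_RE, extract_envelopes and parse_envelope are made up for illustration). By default '.' does not match a newline in most regex engines, which may be why the patterns above stop at the end of the first line; the sketch therefore uses a dot-matches-newline flag together with a lazy quantifier so that each match ends at the nearest closing envelope tag.

```python
import re
import xml.etree.ElementTree as ET

# Lazy match from the XML declaration to the nearest closing envelope tag.
# re.DOTALL lets '.' match newlines, so multi-line excerpts are captured.
# Assumption: every excerpt starts with "<?xml" and uses the "soapenv" prefix.
ENVELOPE_RE = re.compile(r"<\?xml.*?</soapenv:Envelope>", re.DOTALL)


def extract_envelopes(log_text):
    """Return every XML excerpt found in log_text as a list of strings."""
    return ENVELOPE_RE.findall(log_text)


def parse_envelope(xml_text):
    """Parse one extracted excerpt and return its root element."""
    # Passing bytes is a cautious choice: some parsers (lxml, for example)
    # reject str input that carries an encoding declaration.
    return ET.fromstring(xml_text.encode("utf-8"))
```

Calling extract_envelopes on the whole file contents would return one string per excerpt, and each of those could then be passed to parse_envelope. The lazy .*? matters because a greedy .* would run from the first <?xml to the last </soapenv:Envelope> in the file and merge all excerpts into one match.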