Extracting XML fragments from a text file using Python and regular expressions

Time: 2017-07-10 13:19:08

Tags: python regex xml python-3.x

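I have a log file that contains multiple XML excerpts, mostly the requests and responses of calls to a SOAP service. I want to extract these excerpts with a regular expression and then parse them with an XML parsing library. Here is a sample section of the log file: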

DEBUG - GeronimoLog.debug(66) | GET MEX property org.apache.ode.bpel.myRoleSessionId = null
DEBUG - GeronimoLog.debug(66) | My-Role EPR not specified, SEP will not be used.
DEBUG - GeronimoLog.debug(66) | Axis2 sending message to http://localhost:8000/MagentoWS/services/InputsReceiver using MEX {PartnerRoleMex#hqejbhcnphrcdpwa4i059f [PID {ws.test}LogTestProc-558] calling org.apache.ode.bpel.epr.WSAEndpoint@4ced5499.receiveInputs(...)}
DEBUG - GeronimoLog.debug(66) | Message: <?xml version='1.0' encoding='utf-8'?><soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"><soapenv:Body><receiveInputs xmlns="http://InputsReceiver.magento.ws">
                            <impl:input xmlns:impl="http://InputsReceiver.magento.ws">
                                <sku>24-MB03</sku>
                                <customer_id>2</customer_id>
                                <first_name>Micheal</first_name>
                                <last_name>Bowman</last_name>
                                <qty>1</qty>
                                <shipping_method>ground</shipping_method>
                                <street>117 Park Ave</street>
                                <city>Newark</city>
                                <postcode>07104</postcode>
                                <country_id>US</country_id>
                                <country>United State</country>
                                <region_id>41</region_id>
                                <region_code>NJ</region_code>
                                <telephone>+123456789</telephone>
                                <email>micheal@bowman.com</email>
                                <base_currency_code>USD</base_currency_code>
                                <cc_cid>123</cc_cid>
                                <cc_owner>Micheal Bowman</cc_owner>
                                <cc_number>5105105105105100</cc_number>
                                <cc_type>MasterCard</cc_type>
                                <cc_exp_year>2019</cc_exp_year>
                                <cc_exp_month>5</cc_exp_month>
                                <payment_method>checkmo</payment_method>
                                <companyEmail>ourcompany@ourcompany.com</companyEmail>
                                <subject>Order Email</subject>
                                <body>Email body</body>
                                <PayerID>Micheal123</PayerID>
                            </impl:input>
                        </receiveInputs></soapenv:Body></soapenv:Envelope>
DEBUG - GeronimoLog.debug(66) | replyAsync mex=hqejbhcnphrcdpwa4i059f
DEBUG - GeronimoLog.debug(66) | Setting execution state on instance 177567
DEBUG - GeronimoLog.debug(66) | Sending stateful TO epr in message header using session null
DEBUG - GeronimoLog.debug(66) | Sending a message containing wsa endpoints in headers for session passing.

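I want to extract the part that starts at <?xml... and ends at </soapenv:Envelope>, but I have not been able to come up with a regular expression that gives the expected result. Note that the file contains multiple such XML excerpts.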
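One possible way to get the span described above is a non-greedy match with `re.DOTALL`, so that `.` also crosses line breaks and the match stops at the first closing tag. This is only a sketch under assumptions: every excerpt is a complete envelope whose closing tag is written exactly as `</soapenv:Envelope>`, and the helper names `extract_envelopes` and `parse_envelopes` are hypothetical, not part of any library.

```python
import re
import xml.etree.ElementTree as ET

# re.DOTALL lets "." match newlines; ".*?" is non-greedy, so each match
# ends at the first closing tag instead of the last one in the file.
ENVELOPE_RE = re.compile(r"<\?xml.*?</soapenv:Envelope>", re.DOTALL)


def extract_envelopes(log_text):
    """Return every fragment from '<?xml' through '</soapenv:Envelope>'."""
    return ENVELOPE_RE.findall(log_text)


def parse_envelopes(log_text):
    """Parse each extracted fragment into an ElementTree root element."""
    # Encoding to bytes keeps the fragment consistent with its
    # encoding='utf-8' declaration (assumed to be the same in every excerpt).
    return [ET.fromstring(fragment.encode("utf-8"))
            for fragment in extract_envelopes(log_text)]
```

Reading the whole log into one string (for example with `open(path, encoding="utf-8").read()`, where `path` is whatever your log file is) and passing it to `parse_envelopes` should then yield one root element per excerpt. Elements inside the body carry their namespace in the tag name, so lookups such as `find` need the namespace URI or a prefix mapping.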

The regex patterns I have tried so far, none of which worked, are:

Message: \<.*\>\n
Message: (<.*>\n)+
<\?xml.*Envelope?
Message: (<.*>(\n))+
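A likely reason these patterns fall short is that `.` does not match a newline by default, and the continuation lines in the log start with whitespace rather than `<`, so each match ends on the first line of the message. The sketch below checks this against a shortened, made-up sample in the same shape as the log; `SAMPLE`, `ATTEMPTS`, `CANDIDATE` and `reaches_closing_tag` are illustrative names, and the candidate pattern is one possible fix rather than the only one.

```python
import re

# Shortened stand-in for one log message (not the real log content).
SAMPLE = (
    "DEBUG - GeronimoLog.debug(66) | Message: <?xml version='1.0' encoding='utf-8'?>"
    '<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"><soapenv:Body>\n'
    "    <sku>24-MB03</sku>\n"
    "</soapenv:Body></soapenv:Envelope>\n"
    "DEBUG - GeronimoLog.debug(66) | replyAsync\n"
)

# The four patterns from the question, unchanged.
ATTEMPTS = [
    r"Message: \<.*\>\n",
    r"Message: (<.*>\n)+",
    r"<\?xml.*Envelope?",
    r"Message: (<.*>(\n))+",
]

# Candidate fix: non-greedy match, used together with re.DOTALL.
CANDIDATE = r"<\?xml.*?</soapenv:Envelope>"


def reaches_closing_tag(pattern, text, flags=0):
    """True if the first match ends exactly at '</soapenv:Envelope>'."""
    match = re.search(pattern, text, flags)
    return match is not None and match.group(0).endswith("</soapenv:Envelope>")


for pattern in ATTEMPTS:
    print(pattern, reaches_closing_tag(pattern, SAMPLE))
print(CANDIDATE, reaches_closing_tag(CANDIDATE, SAMPLE, re.DOTALL))
```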

0 answers:

No answers