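Parsing a deeply nested XML file using for loops

Time: 2021-02-21 00:07:43

Tags: python xml parsing xml-parsing elementtree

How can I efficiently extract data from nested XML? By efficiently I mean, for example, using for loops. Do I need a new data structure?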

Here is part of the XML file:

<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope
    xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
    xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <soapenv:Body>
        <ns:OTA_AirSeatMapRS Version="1"
            xmlns:ns="http://www.opentravel.org/OTA/2003/05/common/">
            <ns:Success/>
            <ns:SeatMapResponses>
                <ns:SeatMapResponse>
                    <ns:FlightSegmentInfo DepartureDateTime="2020-11-22T15:30:00" FlightNumber="1179">
                        <ns:DepartureAirport LocationCode="LAS"/>
                        <ns:ArrivalAirport LocationCode="IAH"/>
                        <ns:Equipment AirEquipType="739"/>
                    </ns:FlightSegmentInfo>
                    <ns:SeatMapDetails>
                        <ns:CabinClass Layout="AB EF" UpperDeckInd="false">
                            <ns:RowInfo CabinType="First" OperableInd="true" RowNumber="1">
                                <ns:SeatInfo BlockedInd="false" BulkheadInd="false" ColumnNumber="1" ExitRowInd="false" GalleyInd="false" GridNumber="1" PlaneSection="Left">
                                    <ns:Summary AvailableInd="false" InoperativeInd="false" OccupiedInd="false" SeatNumber="1A"/>
                                    <ns:Features>Window</ns:Features>
                                </ns:SeatInfo>

My end goal is to store the parsed data as JSON.

1 Answer:

Answer 0 (score: 0)

Look at the following instruction in your code:

for x in element.find(Service):

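The first flaw in your code sample is that:

  • Service is a variable (not a string literal),
  • you have probably initialized this variable to some string, but did not include that instruction in your code sample.

Another source of error is that find returns only the first element matching the given path (or None if nothing matches), so you should not use it as the target of a loop: iterating over its result walks that one element's children rather than all matches. findall is the call to loop over. You should probably also check that find returned something other than None, but that is a separate detail.

The third reason you got empty output is that your print shows only the text of the element in question (x.text), not its tag or attributes.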
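
The difference between find and findall can be sketched as follows. This is a minimal illustration, not your actual parser: the XML string is a hypothetical, namespace-free sample made up for the demonstration, and the variable names are mine.

```python
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free sample used only for illustration.
xml_text = """
<RowInfo RowNumber="1">
    <SeatInfo ColumnNumber="1"><Summary SeatNumber="1A"/></SeatInfo>
    <SeatInfo ColumnNumber="2"><Summary SeatNumber="1B"/></SeatInfo>
</RowInfo>
"""
root = ET.fromstring(xml_text)

# find() returns the first match only, or None when nothing matches.
first = root.find('.//Summary')
first_seat = first.get('SeatNumber') if first is not None else None

# findall() returns every match, so it is the call to loop over.
all_seats = [s.get('SeatNumber') for s in root.findall('.//Summary')]

print(first_seat)
print(all_seats)
```

With this sample, find yields only the seat from the first SeatInfo, while findall yields both.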

For a more general example, run:

Service = 'Summary'
x = root.find(f'.//{Service}')
print(f'{x.tag}, {x.text}, {x.attrib}')

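The first instruction sets the tag name.

The second instruction calls find, but note that I prepended './/' to the XPath so that the search covers any depth of the source XML tree.

The last instruction prints not only the text of the found element, but also its tag name and attributes.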
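
A side note on namespaces: ElementTree stores a namespace-qualified tag as '{uri}localname', so if the file is parsed with the ns prefix still bound to its namespace, as in the XML excerpt in the question, a bare name in the path will not match and a namespace mapping is needed. The sketch below assumes a trimmed-down sample that keeps only the namespace URI from the question; the 'ota' prefix and the NS dictionary are names I chose for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down sample that keeps the namespace URI from the question's XML.
xml_text = """
<ns:SeatInfo xmlns:ns="http://www.opentravel.org/OTA/2003/05/common/" ColumnNumber="1">
    <ns:Summary AvailableInd="false" SeatNumber="1A"/>
    <ns:Features>Window</ns:Features>
</ns:SeatInfo>
"""
root = ET.fromstring(xml_text)

# The prefix used here ('ota') is arbitrary; only the URI has to match.
NS = {'ota': 'http://www.opentravel.org/OTA/2003/05/common/'}

x = root.find('.//ota:Summary', NS)
if x is not None:
    # Drop the '{uri}' part of the tag to show just the local name.
    local_name = x.tag.split('}', 1)[-1]
    print(f'{local_name}, {x.text}, {x.attrib}')
```

The prefix in the path does not have to be the one used in the file, because the lookup goes through the URI in the mapping.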

The result I got (for your input XML) is:

Summary, None, {'AvailableInd': 'false', 'InoperativeInd': 'false', 'OccupiedInd': 'false', 'SeatNumber': '1A'}

text is simply None, which is why you saw nothing in your original output.
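
Since your end goal is JSON, here is one way the pieces might fit together: loop over the rows and seats with findall, collect the attributes into plain dictionaries, and hand the list to json.dumps. Treat it as a sketch under assumptions: the sample is a shortened, closed version of your excerpt, and the dictionary keys ('row', 'cabin', 'seat', 'features') are names I made up, not anything the XML schema prescribes.

```python
import json
import xml.etree.ElementTree as ET

# Shortened, well-formed sample modelled on the excerpt in the question.
xml_text = """
<ns:CabinClass xmlns:ns="http://www.opentravel.org/OTA/2003/05/common/" Layout="AB EF">
    <ns:RowInfo CabinType="First" RowNumber="1">
        <ns:SeatInfo ColumnNumber="1" PlaneSection="Left">
            <ns:Summary AvailableInd="false" SeatNumber="1A"/>
            <ns:Features>Window</ns:Features>
        </ns:SeatInfo>
    </ns:RowInfo>
</ns:CabinClass>
"""
NS = {'ota': 'http://www.opentravel.org/OTA/2003/05/common/'}
root = ET.fromstring(xml_text)

seats = []
# Outer loop: every row at any depth; inner loop: the seats directly in that row.
for row in root.findall('.//ota:RowInfo', NS):
    for seat in row.findall('ota:SeatInfo', NS):
        summary = seat.find('ota:Summary', NS)
        seats.append({
            # Key names are illustrative only.
            'row': row.get('RowNumber'),
            'cabin': row.get('CabinType'),
            'seat': summary.get('SeatNumber') if summary is not None else None,
            'features': [f.text for f in seat.findall('ota:Features', NS)],
        })

seats_json = json.dumps(seats, indent=2)
print(seats_json)
```

A list of flat dictionaries is usually enough here, so no special data structure is needed; nest the dictionaries only if the consumer of the JSON needs the row/seat hierarchy preserved.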