Question

I have several XML files with inline schemas. I tried to parse the XML data with Python, but it doesn't give me any results.

I want to get the values of the ogrid_cde and role elements in every ogridroles tag.

<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <xsd:schema targetNamespace="urn:schemas-microsoft-com:sql:SqlRowSet1" xmlns:schema="urn:schemas-microsoft-com:sql:SqlRowSet1" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:sqltypes="http://schemas.microsoft.com/sqlserver/2004/sqltypes" elementFormDefault="qualified">
        <xsd:import namespace="http://schemas.microsoft.com/sqlserver/2004/sqltypes" schemaLocation="http://schemas.microsoft.com/sqlserver/2004/sqltypes/sqltypes.xsd"/>
        <xsd:element name="ogridroles">
            <xsd:complexType>
                <xsd:sequence>
                    <xsd:element name="ogrid_cde" type="sqltypes:int" nillable="1"/>
                    <xsd:element name="role" nillable="1">
                        <xsd:simpleType>
                            <xsd:restriction base="sqltypes:char" sqltypes:localeId="1033" sqltypes:sqlCompareOptions="IgnoreCase IgnoreKanaType IgnoreWidth" sqltypes:sqlSortId="52">
                                <xsd:maxLength value="1"/>
                            </xsd:restriction>
                        </xsd:simpleType>
                    </xsd:element>
                </xsd:sequence>
            </xsd:complexType>
        </xsd:element>
    </xsd:schema>
    <ogridroles xmlns="urn:schemas-microsoft-com:sql:SqlRowSet1">
        <ogrid_cde>28</ogrid_cde>
        <role>T</role>
    </ogridroles>
    <ogridroles xmlns="urn:schemas-microsoft-com:sql:SqlRowSet1">
        <ogrid_cde>75</ogrid_cde>
        <role>T</role>
    </ogridroles>
    <ogridroles xmlns="urn:schemas-microsoft-com:sql:SqlRowSet1">
        <ogrid_cde>93</ogrid_cde>
        <role>O</role>
    </ogridroles>
    <ogridroles xmlns="urn:schemas-microsoft-com:sql:SqlRowSet1">
        <ogrid_cde>135</ogrid_cde>
        <role>O</role>
    </ogridroles>
</root>

Python code

import xml.etree.ElementTree as ET

tree = ET.parse('ogridroles.xml')
root = tree.getroot()

for a in root.findall('{urn:schemas-microsoft-com:sql:SqlRowSet1}ogridroles'):
    print (a.attrib)

Answer 1

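Unless I'm mistaken, the xmlns value is returned as part of the element's tag rather than as an attribute, which is why a.attrib comes back empty. Try inspecting the tag instead.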
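To make that concrete, here is a minimal sketch using a trimmed-down copy of the question's document (the inline schema block is omitted, and the name SAMPLE is only for illustration). It shows where ElementTree puts the namespace and why the attribute dictionary is empty.

```python
import xml.etree.ElementTree as ET

# Trimmed-down copy of the document from the question (schema block omitted).
SAMPLE = """<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <ogridroles xmlns="urn:schemas-microsoft-com:sql:SqlRowSet1">
        <ogrid_cde>28</ogrid_cde>
        <role>T</role>
    </ogridroles>
</root>"""

root = ET.fromstring(SAMPLE)
row = root[0]

# The namespace URI is folded into the tag as {uri}localname.
print(row.tag)

# The xmlns declaration is not exposed as an attribute, so this dict is empty.
print(row.attrib)

# Child elements inherit the default namespace too.
print([child.tag for child in row])
```

So the findall() call in the question does match the rows; it is only print(a.attrib) that has nothing to show.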

Answer 2

This code works.

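import xml.etree.ElementTree as ET

tree = ET.parse('ogridroles.xml')
root = tree.getroot()

# Namespace declared on each <ogridroles> row element
NS = '{urn:schemas-microsoft-com:sql:SqlRowSet1}'

for child in root:
    # Skip the inline <xsd:schema> block, which is also a child of <root>
    if child.tag != NS + 'ogridroles':
        continue
    # child[0] is <ogrid_cde>, child[1] is <role>
    print(child[0].text, "==", child[1].text)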

Thanks to Laughing Vergil for the hint.
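Positional access such as child[0] depends on the order of the child elements. As an alternative sketch, the values can be looked up by name with a namespace map. The prefix 's', the helper read_rows, and the SAMPLE string are assumptions made for illustration; only the namespace URI and the element names come from the question.

```python
import xml.etree.ElementTree as ET

# Illustrative sample: a stub schema element plus two rows from the question.
SAMPLE = """<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>
    <ogridroles xmlns="urn:schemas-microsoft-com:sql:SqlRowSet1">
        <ogrid_cde>28</ogrid_cde>
        <role>T</role>
    </ogridroles>
    <ogridroles xmlns="urn:schemas-microsoft-com:sql:SqlRowSet1">
        <ogrid_cde>93</ogrid_cde>
        <role>O</role>
    </ogridroles>
</root>"""

# The prefix 's' is arbitrary; it only has to map to the row namespace URI.
NS = {'s': 'urn:schemas-microsoft-com:sql:SqlRowSet1'}


def read_rows(root):
    """Return a list of (ogrid_cde, role) tuples, one per <ogridroles> row."""
    rows = []
    # Only elements in the row namespace match, so the schema block is ignored.
    for row in root.findall('s:ogridroles', NS):
        code = row.findtext('s:ogrid_cde', namespaces=NS)
        role = row.findtext('s:role', namespaces=NS)
        rows.append((code, role))
    return rows


rows = read_rows(ET.fromstring(SAMPLE))
print(rows)
```

For a file on disk, the same helper could be called as read_rows(ET.parse('ogridroles.xml').getroot()).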

Answer 3

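For large files, combine lxml.etree.iterparse() with the zipfile module to read directly from the compressed file. ZipFile.open() has to be called explicitly to get a file-like object for the archive member, and iterparse() then yields (event, element) pairs as it reads, so the whole file is not read into memory up front.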

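from lxml import etree
from zipfile import ZipFile

zipped_file = ZipFile(<your file>, 'r')
for event, element in etree.iterparse(zipped_file.open(<filename inside zip>)):
    # Compare the local name; a substring test on element.tag would also match 'role' inside 'ogridroles'
    local_name = etree.QName(element).localname
    if local_name in ('ogrid_cde', 'role'):
        print('{}: {}'.format(local_name, element.text))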

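This should print each field name (the name matched against the element tag) together with its value (the text retrieved from the element).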
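For anyone who wants to try the idea without installing lxml, here is a self-contained sketch that swaps in the standard library's xml.etree.ElementTree.iterparse, which follows the same pattern. The archive is built in memory purely for demonstration; the member name 'ogridroles.xml', the helper iter_rows, and the SAMPLE bytes are assumptions, not part of the original answer.

```python
import io
import xml.etree.ElementTree as ET
from zipfile import ZipFile

NS = '{urn:schemas-microsoft-com:sql:SqlRowSet1}'

# Illustrative sample: two rows from the question, schema block omitted.
SAMPLE = b"""<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <ogridroles xmlns="urn:schemas-microsoft-com:sql:SqlRowSet1">
        <ogrid_cde>28</ogrid_cde>
        <role>T</role>
    </ogridroles>
    <ogridroles xmlns="urn:schemas-microsoft-com:sql:SqlRowSet1">
        <ogrid_cde>75</ogrid_cde>
        <role>T</role>
    </ogridroles>
</root>"""


def iter_rows(xml_stream):
    """Yield (ogrid_cde, role) tuples, one per completed <ogridroles> element."""
    for event, element in ET.iterparse(xml_stream, events=('end',)):
        if element.tag == NS + 'ogridroles':
            yield (element.findtext(NS + 'ogrid_cde'),
                   element.findtext(NS + 'role'))
            # Discard the row's children once they have been read.
            element.clear()


# Build a small zip archive in memory (stand-in for a real file on disk).
buffer = io.BytesIO()
with ZipFile(buffer, 'w') as archive:
    archive.writestr('ogridroles.xml', SAMPLE)

# Read the member back as a stream, without extracting it to disk.
buffer.seek(0)
with ZipFile(buffer, 'r') as archive:
    with archive.open('ogridroles.xml') as xml_stream:
        rows = list(iter_rows(xml_stream))

print(rows)
```

Matching on the full row tag and reading both children at the row's end event keeps each ogrid_cde paired with its role. Calling clear() on finished rows is what keeps memory use low on large inputs, since iterparse otherwise still builds the full tree.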

Reading XML with an inline schema in Python
