Question

I have the following XML document (the real one is longer):

<?xml version ="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE fmresultset PUBLIC "-//FMI//DTD fmresultset//EN" "http://localhost:16020/fmi/xml/fmresultset.dtd">
<fmresultset xmlns="http://www.filemaker.com/xml/fmresultset" version="1.0">
<error code="0">
</error>
<product build="11/11/2014" name="FileMaker Web Publishing Engine" version="13.0.5.518">
</product>

I use the following Python to extract some tag names:

from lxml import etree

# resulttxt holds the XML response shown above
doc = etree.fromstring(resulttxt)
print(doc.attrib)
print(doc.tag)
print(doc[4][0][0].tag)
if doc[4][0][0].tag == 'field':
    print('hi')

What I get is:

{'version': '1.0'}
{http://www.filemaker.com/xml/fmresultset}fmresultset
{http://www.filemaker.com/xml/fmresultset}field

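The xmlns does not show up as an attribute of the root tag, but it is there.

It is prepended to every tag name, which makes it hard to loop over the elements and use conditionals. I would like doc.tag to show only the tag, not the namespace plus the tag.

This is day 1 for me. Can anyone help?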

Answer 1

You need to handle namespaces; in your case that is the empty (default) one, which has to be mapped to a prefix of your choosing:

from lxml import etree as ET

data = """<?xml version ="1.0" encoding="UTF-8" standalone="no" ?>
<!DOCTYPE fmresultset PUBLIC "-//FMI//DTD fmresultset//EN" "http://localhost:16020/fmi/xml/fmresultset.dtd">
<fmresultset xmlns="http://www.filemaker.com/xml/fmresultset" version="1.0">
    <error code="0">
    </error>
    <product build="11/11/2014" name="FileMaker Web Publishing Engine" version="13.0.5.518">
    </product>
</fmresultset>
"""

namespaces = {
  "myns": "http://www.filemaker.com/xml/fmresultset"
}

# lxml rejects str input that carries an encoding declaration, so pass bytes
tree = ET.fromstring(data.encode("utf-8"))
print(tree.find("myns:product", namespaces=namespaces).attrib.get("name"))

Prints:

FileMaker Web Publishing Engine
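
If the goal is only to compare or print tag names without the {namespace} part, one possible approach is to ask lxml for the local name of each element through etree.QName. The snippet below is a minimal sketch under a few assumptions: the XML is a shortened copy of the document from the question, and local_name is a hypothetical helper introduced here for illustration, not something lxml provides.

```python
from lxml import etree

# Shortened copy of the document from the question. It is given as bytes
# because lxml rejects str input that carries an encoding declaration.
data = b"""<?xml version="1.0" encoding="UTF-8"?>
<fmresultset xmlns="http://www.filemaker.com/xml/fmresultset" version="1.0">
    <error code="0"></error>
    <product build="11/11/2014" name="FileMaker Web Publishing Engine" version="13.0.5.518"></product>
</fmresultset>
"""


def local_name(element):
    """Return the tag without its '{namespace}' prefix (hypothetical helper)."""
    return etree.QName(element).localname


root = etree.fromstring(data)
print(root.tag)          # {http://www.filemaker.com/xml/fmresultset}fmresultset
print(local_name(root))  # fmresultset

# Local names of the direct children of the root element.
names = [local_name(child) for child in root]
print(names)             # ['error', 'product']

# A condition on the bare tag name, as wanted in the question.
for child in root:
    if local_name(child) == "product":
        print(child.get("name"))  # FileMaker Web Publishing Engine
```

This leaves the tree itself untouched, so namespace-aware lookups such as the find() call above keep working. Note that the sketch assumes every node it visits is a regular element; comments and processing instructions do not have a string tag and would need to be skipped.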

