Question

I have a very complex XML document that I want to parse. Here is a simplified version of the XML:

<file
    xmlns="http://www.namespace.co.il"
    Media="MetTTV"
    Date="2015-03-29"
    FileType="Consolidated"
    SchemaVersion="1.2">

    <H Id="1012532" W="2198.05">
        ///more tags
    </H>
    <H Id="623478" W="3215.05">
        ///more tags
    </H>
   etc.
</file>

I want to be able to access the <H> tags so that I can count them.

Here is my code:

import lxml.etree
tree=lxml.etree.parse(xml_file)
count=1 
for HH in tree.xpath('//H'):
   print count
   count=count+1

This code works fine if I remove the line

xmlns="http://www.namespace.co.il"

But if I leave it in, nothing is printed to the console.

I tried changing the loop in many ways, for example

for HH in tree.xpath('//{http://www.namespace.co.il}H'):

or

ns={'nmsp':'http://www.namespace.co.il'}
for HH in tree.xpath('//nmsp:H', ns)

but nothing seems to work.

Answer 1

lxml's xpath method needs the prefix mapping passed as a named (keyword) argument called namespaces.

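The findall method is similar but slightly different (it does not need the namespaces argument to be named, and it works with namespace URIs inside curly braces).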
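To see why the unqualified '//H' finds nothing, it may help to look at the tag names the parser actually produces. The sketch below uses a trimmed copy of the XML from the question; the variable names are illustrative only, and the standard-library fallback is an assumption added so the snippet runs without lxml installed.

```python
# Prefer lxml; fall back to the standard library, whose ElementTree API
# matches for the calls used here (fromstring, tag, findall).
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Trimmed copy of the document from the question.
XML = b"""<file xmlns="http://www.namespace.co.il" Media="MetTTV">
    <H Id="1012532" W="2198.05"/>
    <H Id="623478" W="3215.05"/>
</file>"""

root = etree.fromstring(XML)

# The default namespace becomes part of every element's tag name.
tags = [child.tag for child in root]
print(root.tag)
print(tags)

# A bare name only matches elements in no namespace, so this list is empty.
unqualified = root.findall('.//H')

# Qualifying the name with the URI in braces matches both <H> elements.
qualified = root.findall('.//{http://www.namespace.co.il}H')
print(len(unqualified), len(qualified))
```

Because the xmlns attribute applies to the whole subtree, every <H> is really named {http://www.namespace.co.il}H, which is why the search has to mention the namespace in some form.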

All of these variants work:

for HH in tree.xpath('//nmsp:H', namespaces=ns):

for HH in tree.findall('//{http://www.namespace.co.il}H'):

for HH in tree.findall('//nmsp:H', namespaces=ns):

for HH in tree.findall('//nmsp:H', ns):
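Put together, a self-contained version of the count could look roughly like this. It is a sketch that assumes lxml is installed and parses a trimmed copy of the question's XML from a bytes literal instead of xml_file. It searches from the root element with './/' so the findall calls do not depend on how a leading '//' is handled on a tree object.

```python
from lxml import etree

# Trimmed copy of the document from the question.
XML = b"""<file xmlns="http://www.namespace.co.il" Media="MetTTV">
    <H Id="1012532" W="2198.05"/>
    <H Id="623478" W="3215.05"/>
</file>"""

root = etree.fromstring(XML)

# The prefix 'nmsp' is arbitrary; only the URI has to match the document.
ns = {'nmsp': 'http://www.namespace.co.il'}

# xpath: the prefix mapping is passed by keyword.
via_xpath = root.xpath('//nmsp:H', namespaces=ns)

# findall: namespace URI in curly braces, no mapping needed.
via_findall_uri = root.findall('.//{http://www.namespace.co.il}H')

# findall: prefix plus mapping.
via_findall_prefix = root.findall('.//nmsp:H', namespaces=ns)

counts = [len(via_xpath), len(via_findall_uri), len(via_findall_prefix)]
print(counts)

# The matched elements are ordinary elements, so attributes are available.
ids = [h.get('Id') for h in via_xpath]
print(ids)

# XPath can also do the counting itself; count() yields a float.
total = root.xpath('count(//nmsp:H)', namespaces=ns)
print(total)
```

Using len() on the result list replaces the manual counter from the question; the count() form is an alternative when only the number is needed.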

See also http://lxml.de/xpathxslt.html#xpath.

How to access tags in XML with namespaces in Python
