Question

I'm trying to parse this XML; it is a YouTube feed. I'm basing my work on the code in the tutorial. I want to get all the entry nodes that are nested under the feed.

from lxml import etree
root = etree.fromstring(text)
entries = root.xpath("/feed/entry")
print(entries)

For some reason, entries is an empty list. Why?

Answer 1

feed and all of its children are actually in the http://www.w3.org/2005/Atom namespace. You need to tell your XPath that by binding a prefix to the namespace:

entries = root.xpath("/atom:feed/atom:entry", 
                     namespaces={'atom': 'http://www.w3.org/2005/Atom'})

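Or, if you would rather map the default (empty) prefix: xpath() rejects a None prefix, but findall() accepts one in reasonably recent lxml versions. Since root is already the feed element, the path is relative to it:

entries = root.findall("entry",
                       namespaces={None: 'http://www.w3.org/2005/Atom'})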

Or, if you don't want to use prefixes at all, spell out the namespace in braces. This syntax is understood by find()/findall(), not by xpath():

entries = root.findall("{http://www.w3.org/2005/Atom}entry")
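
To compare these spellings side by side, here is a minimal sketch run against a small hand-written feed (a stand-in for the real YouTube XML; `SAMPLE`, `ATOM_NS` and the titles are made up for illustration). As far as I can tell, the brace form and the `None` prefix belong to `find()`/`findall()` rather than to `xpath()`, so the sketch exercises them there and only checks that `xpath()` refuses them.

```python
from lxml import etree

ATOM_NS = "http://www.w3.org/2005/Atom"

# Hand-written stand-in for the real feed; the default xmlns puts every
# element into the Atom namespace.
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry><title>First video</title></entry>
  <entry><title>Second video</title></entry>
</feed>"""

root = etree.fromstring(SAMPLE)  # root is the <feed> element

# Unprefixed XPath names only match elements in no namespace.
unprefixed = root.xpath("/feed/entry")

# Binding a prefix to the Atom URI makes the same path match.
prefixed = root.xpath("/atom:feed/atom:entry", namespaces={"atom": ATOM_NS})

# findall() accepts {uri}tag names; the path is relative to root.
clark = root.findall("{%s}entry" % ATOM_NS)

# findall() also accepts None as the default-namespace prefix.
default_ns = root.findall("entry", namespaces={None: ATOM_NS})

# xpath() refuses an empty prefix ...
try:
    root.xpath("/feed/entry", namespaces={None: ATOM_NS})
    xpath_takes_none = True
except TypeError:
    xpath_takes_none = False

# ... and braces are not valid XPath 1.0 syntax either.
try:
    root.xpath("/{%s}feed/{%s}entry" % (ATOM_NS, ATOM_NS))
    xpath_takes_braces = True
except etree.XPathError:
    xpath_takes_braces = False

print(len(unprefixed), len(prefixed), len(clark), len(default_ns))
print(xpath_takes_none, xpath_takes_braces)
```

The explicit prefix map is the form that works everywhere, which is probably why it is the one most people reach for first.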

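Note that no "local namespace" is implicitly assumed for the node you are working with: relative queries on child nodes in the same namespace still need the prefix mapping. Keeping the mapping in a variable saves you from retyping it, so you can do something along these lines:

ns = {'atom': 'http://www.w3.org/2005/Atom'}
feed = root  # etree.fromstring() already returns the feed element

title = feed.xpath("atom:title", namespaces=ns)
entries = feed.xpath("atom:entry", namespaces=ns)
# etc...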
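
A quick way to check how relative queries behave on a namespaced node is to run them against a small hand-written feed. In this sketch `SAMPLE`, `NS` and the titles are made up for illustration; the unprefixed query comes back empty, while the prefixed ones match.

```python
from lxml import etree

# One prefix map, reused for every query below.
NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hand-written stand-in for the real feed.
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample feed</title>
  <entry><title>First video</title></entry>
  <entry><title>Second video</title></entry>
</feed>"""

feed = etree.fromstring(SAMPLE)

# Relative query without a prefix: looks for <entry> in no namespace.
bare_entries = feed.xpath("entry")

# Relative queries with the prefix map passed on each call.
feed_title = feed.findtext("atom:title", namespaces=NS)
entry_titles = [
    entry.findtext("atom:title", namespaces=NS)
    for entry in feed.xpath("atom:entry", namespaces=NS)
]

print(bare_entries)
print(feed_title, entry_titles)
```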

Answer 2

This is because of the namespaces in the XML. Here is an explanation: http://www.edankert.com/defaultnamespaces.html#Conclusion.
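
One way to see the namespace for yourself is to inspect the parsed root element. This is only a sketch against a hand-written sample (`SAMPLE` and its title are made up, not taken from the real feed):

```python
from lxml import etree

# Hand-written stand-in for the real feed.
SAMPLE = b"""<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>First video</title></entry>
</feed>"""

root = etree.fromstring(SAMPLE)

# The default xmlns declaration has no prefix, so lxml lists it under None.
print(root.nsmap)

# The element's full name carries the namespace URI in braces.
print(root.tag)

# QName splits that name into its namespace and local parts.
name = etree.QName(root)
print(name.namespace, name.localname)
```

The tag is not plain `feed` but `feed` qualified by the Atom URI, which is why an unqualified `/feed/entry` finds nothing.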

