Question

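I'm trying to use lxml.etree.parse and tree.xpath in Django to parse some content from an external RSS feed, but for some reason I'm not getting any results. I've used the approach below successfully on other XML files, but I'm having trouble with this one.

Here is the XML file I'm trying to read from: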

<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Open Library : Author Name</title>
    <link href="http://www.somedomain.org/people/atom/author_name" rel="self"/>
    <updated>2012-03-20T16:41:00Z</updated>
    <author>
        <name>somedomain.org</name>
    </author>
    <id>tag:somedomain.org,2007:/person_feed/123456</id>
    <entry>
        <link href="http://www.somedomain.org/roll_call/show/1234" rel="alternate"/>
        <id>
        tag:somedomain.org,2012-03-20:/roll_call_vote/1234
        </id>
        <updated>2012-03-20T16:41:00Z</updated>
        <title>Once upon a time</title>
        <content type="html">
        This is a book full of words
        </content>
    </entry>
</feed>

Here is my view in Django:

def openauthors(request):

    tree = lxml.etree.parse("http://www.somedomain.org/people/atom/author_name")
    listings = tree.xpath("//author")

    listings_info = []

    for listing in listings:
        this_value = {
            "name":listing.findtext("name"),
            }

        listings_info.append(this_value)


    json_listings = '{"listings":' + simplejson.dumps(listings_info) + '}'

    if("callback" in request.GET.keys()):
        callback = request.GET["callback"]
    else:
        callback = None

    if(callback):
        response = HttpResponse("%s(%s)" % (
                callback,
                simplejson.dumps(listings_info)
                ), mimetype="application/json"
            )
    else:
        response = HttpResponse(json_listings, mimetype="application/json")
    return response

I've also tried the following paths, hoping one of them would work, but without success:

    listings = tree.xpath("feed/author")
    listings = tree.xpath("/feed/author")
    listings = tree.xpath("/author")
    listings = tree.xpath("author")

Any help in the right direction would be appreciated.

Answer 1

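The problem is probably related to namespaces. lxml stores each tag name with its namespace URI in front, wrapped in braces, so the XPath expressions likely fail because they don't account for that namespace. If you iterate over the elements, look at the tag names, and get something like this, that's the problem: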

>>> for element in tree.getroot():
...     element
[...]
<Element {http://www.w3.org/2005/Atom}author at 7f14e75d1788>
[...]

Note the prefix "{http://www.w3.org/2005/Atom}" in front of the tag name "author". If that is what you see, take a look at:

Need Help using XPath in ElementTree, and here:

python: xml.etree.ElementTree, removing "namespaces"

Also check the official documentation, since there may be a parsing option that omits the namespace prefix.
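
As a rough illustration of the namespace issue described above, here is a minimal sketch that binds a prefix to the Atom namespace and uses it in the path. It is not necessarily what the linked answers do. The prefix "atom", the names FEED and NS, and the trimmed copy of the feed are assumptions made for this example. The fallback import is only there so the snippet runs without lxml installed.

```python
try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Fallback so the sketch still runs where lxml is not installed;
    # the calls used below exist in both libraries.
    import xml.etree.ElementTree as etree

# Trimmed copy of the feed from the question (assumption: parsed from a
# string here instead of being fetched from the URL).
FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
    <title>Open Library : Author Name</title>
    <author>
        <name>somedomain.org</name>
    </author>
</feed>"""

# "atom" is an arbitrary prefix chosen for this sketch; only the URI has to
# match the xmlns declared on <feed>.
NS = {"atom": "http://www.w3.org/2005/Atom"}

root = etree.fromstring(FEED)

# Unprefixed names only match elements that are in no namespace.
no_match = root.findall(".//author")

# Prefixed names are resolved through the NS mapping. With lxml,
# root.xpath("//atom:author", namespaces=NS) selects the same elements.
listings = root.findall(".//atom:author", namespaces=NS)

listings_info = [
    {"name": listing.findtext("atom:name", namespaces=NS)}
    for listing in listings
]
print(len(no_match), listings_info)
```

In the view from the question, the same mapping would presumably be passed as tree.xpath("//atom:author", namespaces=NS), with listing.findtext("atom:name", namespaces=NS) inside the loop.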

GL
