Question

How can I use Python to parse this feed (http://www.tvspielfilm.de/tv-programm/rss/heute2015.xml) to get today's 20:15 TV programme on SAT? I have already tried the Python library lxml.etree, but it did not work:

#!/usr/bin/python
import lxml.etree as ET 
import urllib2

response = urllib2.urlopen('http://www.tvspielfilm.de/tv-programm/rss/heute2015.xml')
xml = response.read()

root = ET.fromstring(xml)

for item in root.findall('SAT'):
    title = item.find('title').text
    print title

Answer 1

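The method Element.findall takes an XPath-style path expression as its argument (lxml supports a limited subset of XPath here, called ElementPath). 'SAT' only matches elements named SAT that are direct children of the root node, which in this feed is 'rss'. If you need to find a tag anywhere in the document, use './/SAT'.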
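To make the difference concrete, here is a minimal sketch (written for Python 3) that runs both kinds of path against a tiny made-up feed. The sample XML is invented for illustration and only mimics the usual rss/channel/item nesting; it is not the real tvspielfilm data. The fallback import is there only so the snippet also runs without lxml installed, since the standard library offers the same findall call.

```python
try:
    import lxml.etree as ET
except ImportError:
    # The standard library's ElementTree offers the same findall() call.
    import xml.etree.ElementTree as ET

# Made-up miniature feed with the usual rss/channel/item nesting.
SAMPLE = """<rss version="2.0">
  <channel>
    <title>Sample feed</title>
    <item><title>First show</title></item>
    <item><title>Second show</title></item>
  </channel>
</rss>"""

root = ET.fromstring(SAMPLE)  # root is the <rss> element

direct = root.findall('item')        # only direct children of <rss>
anywhere = root.findall('.//item')   # descendants at any depth

print(len(direct), len(anywhere))
```

The plain name finds nothing because the item elements sit inside channel, not directly under rss, while the './/' form searches the whole subtree.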

The expression './/item' is what you are looking for:

#!/usr/bin/python
import lxml.etree as ET 
import urllib2

response = urllib2.urlopen('some/url/to.xml')
xml = response.read()

root = ET.fromstring(xml)

for item in root.findall('.//item'):
    title = item.find('title').text
    print title
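The loop above prints every title, so picking out one channel at 20:15 still needs a filter. Below is a rough Python 3 sketch of one way to do that. It assumes each item title has the form "time | channel | programme"; that layout, the channel label 'SAT.1', the helper name find_programmes and the sample XML are all assumptions made for illustration, so check them against the real feed before relying on this.

```python
try:
    import lxml.etree as ET
except ImportError:
    # The standard library's ElementTree offers the same calls used here.
    import xml.etree.ElementTree as ET

# Invented sample; the "time | channel | programme" title layout is an assumption.
SAMPLE = """<rss version="2.0">
  <channel>
    <item><title>20:15 | Das Erste | Example Film</title></item>
    <item><title>20:15 | SAT.1 | Example Show</title></item>
    <item><title>22:00 | SAT.1 | Late Example</title></item>
  </channel>
</rss>"""


def find_programmes(root, channel, start):
    """Hypothetical helper: programme names matching a channel and start time."""
    matches = []
    for item in root.findall('.//item'):
        title = item.findtext('title')
        if not title:
            continue  # skip items without a usable <title>
        parts = [part.strip() for part in title.split('|')]
        if len(parts) < 3:
            continue  # title does not follow the assumed layout
        if parts[0] == start and parts[1].lower() == channel.lower():
            # Re-join in case the programme name itself contains '|'.
            matches.append(' | '.join(parts[2:]))
    return matches


root = ET.fromstring(SAMPLE)
result = find_programmes(root, 'SAT.1', '20:15')
print(result)
```

If the real titles are laid out differently, only the splitting logic inside the loop needs to change; the './/item' lookup stays the same.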

Parsing XML in Python with lxml.etree
