Why can't I retrieve a URL from an XPath query?

Asked: 2014-10-24 13:37:36

Tags: python xpath lxml

I have the following script, which finds an image on a page and downloads it:

from lxml import html
import urllib
import urllib2

url = 'http://www.example.com/pages/page0987/'
usock = urllib2.urlopen(url)
data = usock.read()
usock.close()

tree = html.fromstring(data)

src = tree.xpath('/html/body/div[2]/div[4]/div/div/img/@src')
urllib.urlretrieve(src, "local-filename.jpg")

I fetch a web page, locate the <img> element on it with an XPath query, read that element's src attribute, and then try to download the image from that URL.

But something goes wrong; Python reports:

Traceback (most recent call last):
  File "C:\Users\Sergey\Desktop\dlImg.py", line 15, in <module>
    urllib.urlretrieve(src, "local-filename.jpg")
  File "C:\Python27\lib\urllib.py", line 94, in urlretrieve
    return _urlopener.retrieve(url, filename, reporthook, data)
  File "C:\Python27\lib\urllib.py", line 228, in retrieve
    url = unwrap(toBytes(url))
  File "C:\Python27\lib\urllib.py", line 1060, in unwrap
    url = url.strip()
AttributeError: 'list' object has no attribute 'strip'

1 Answer:

Answer 0 (score: 2)

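Your tree.xpath() query returns a list of matches, not a single match, so urlretrieve() receives a list where it expects a URL string. At minimum, index the first item: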

urllib.urlretrieve(src[0], "local-filename.jpg")

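Or loop over the results. Keep in mind that the list can also be empty (no matches found), in which case src[0] raises an IndexError.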
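
A minimal sketch of both options, guarding against the empty case. The helper names first_match and numbered_filenames are hypothetical, and the src list is a stand-in for what tree.xpath() might return, so the example runs without lxml or network access:

```python
def first_match(matches):
    """Return the first XPath match, or None when the list is empty."""
    return matches[0] if matches else None


def numbered_filenames(matches, prefix="local-filename"):
    """Pair every matched URL with a distinct local filename."""
    return [(url, "%s-%d.jpg" % (prefix, i)) for i, url in enumerate(matches)]


# Stand-in for the list returned by tree.xpath('.../img/@src');
# these URLs are made-up illustration values, not taken from a real page.
src = ["http://www.example.com/img/a.jpg", "http://www.example.com/img/b.jpg"]

first = first_match(src)
if first is None:
    print("no <img> matched the XPath query")
else:
    # In the original script this is where
    # urllib.urlretrieve(first, "local-filename.jpg") would go.
    print("would download: " + first)

# Looping variant: one distinct filename per match, so that
# later downloads do not overwrite earlier ones.
for url, filename in numbered_filenames(src):
    print(url + " -> " + filename)
```

With an empty list, first_match() returns None and the loop body simply never runs, so neither path raises an exception.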