Question

I'm trying to scrape chart images from stockcharts.com, for example from: http://stockcharts.com/h-sc/ui?s=AMZN

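However, when I inspect the element in question, its src is not a proper image URL ending in .jpg, .png, etc. For example, the src of the chart element on the page above is: http://stockcharts.com/c-sc/sc?s=AMZN&p=D&b=5&g=0&i=0&r=1479451634864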

So when I run the following code in Python 2.7, I get an empty file in the script's directory:

import urllib
url = "http://stockcharts.com/c-sc/sc?s=AMZN&p=D&b=5&g=0&i=0&r=1479451634864"
filename = "testimg.jpg"
urllib.urlretrieve(url, filename)

Is this a JavaScript-rendered page, or is there something I'm missing, such as a reference to somewhere else?

Answer 1

The site checks the User-Agent header; it only allows certain user agents.

You need to change that header to get the image. Otherwise, the site returns a 403 Forbidden response.

urllib.urlretrieve does not accept extra headers, so you need to use urllib2.Request / urllib2.urlopen to specify a custom header and save the file yourself:

import urllib2

url = "http://stockcharts.com/c-sc/sc?s=AMZN&p=D&b=5&g=0&i=0&r=1479451634864"
filename = "sc.png"
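# Send a browser-like User-Agent; otherwise the site answers 403 Forbidden
req = urllib2.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
u = urllib2.urlopen(req)
# Write the raw image bytes in binary mode
with open(filename, 'wb') as f:
    f.write(u.read())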
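The code above is Python 2. A rough Python 3 equivalent is sketched below, assuming the server still accepts a generic `Mozilla/5.0` User-Agent in place of Python's default `Python-urllib/x.y`. It uses `urllib.request` instead of `urllib2`, and, because the chart URL carries no file suffix, it guesses the extension from the image's leading magic bytes. `build_request`, `guess_extension`, and `download_chart` are hypothetical helper names for illustration, not part of any library.

```python
import urllib.request

# Assumption: any browser-like User-Agent is accepted; "Mozilla/5.0" is the
# value used in the answer above.
HEADERS = {"User-Agent": "Mozilla/5.0"}

# Leading "magic" bytes of common image formats, mapped to an extension.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"\xff\xd8\xff", ".jpg"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
]


def build_request(url):
    """Return a Request carrying the custom User-Agent header."""
    return urllib.request.Request(url, headers=HEADERS)


def guess_extension(data):
    """Guess a file extension from the first bytes of the downloaded data."""
    for magic, ext in SIGNATURES:
        if data.startswith(magic):
            return ext
    return ".bin"  # placeholder for unrecognized content


def download_chart(url, basename="sc"):
    """Fetch the chart and save it with an extension matching its content.

    urlopen raises urllib.error.HTTPError on a 403 instead of silently
    writing an error body to disk.
    """
    with urllib.request.urlopen(build_request(url)) as resp:
        data = resp.read()
    filename = basename + guess_extension(data)
    with open(filename, "wb") as f:
        f.write(data)
    return filename
```

Calling `download_chart(url)` with the chart URL from the question should save the image next to the script and return the filename it chose.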

Web scraping images with Python but unable to find the image