How to save an image from a web page locally

Time: 2017-08-18 11:27:37

Tags: python selenium-webdriver web-scraping

How can I download the image from the following link?

https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?SceneView&ImageID=247572955&Version=-1

The code I tried:

import shutil
import sys
from contextlib import closing

# Pick the right urlopen for the running interpreter
if sys.version_info[0] < 3:
    from urllib import urlopen  # Python 2
else:
    from urllib.request import urlopen  # Python 3

imglink = "https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?ImageView&ImageID=247247011&Desc=Front%2FLeft+Oblique&Title=Vehicle+1+-+Frontleftoblique&Version=0&Extend=jpg"
savelink = "C:/Users/VM82958/Desktop/Nass_Extract/abcd.jpg"

# closing() is used because the Python 2 response object is not a context manager
with closing(urlopen(imglink)) as response, open(savelink, 'wb') as out_file:
    shutil.copyfileobj(response, out_file)

The image is not downloaded; only a 1 KB file is saved.

Any help would be appreciated.

1 answer:

Answer 0: (score: 0)

The response you get from imglink is not an image file but an HTML page that displays the image:

<script type="text/javascript">
function init()
{
 document.getElementById("loading").src = "GetBinary.aspx?Image&ImageID=247247011&CaseID=&Version=0";
 document.getElementById("loading").onload = ";"
}
</script>
<body>
<table width="100%">
<tr><td align="center" style="font-size:large">Images may not be to scale.</td></tr>
<tr><td align="center">Vehicle 1 - Frontleftoblique</td></tr>
<tr><td align="center">Front/Left Oblique</td></tr>
<tr><td align="center"><img onload="javascript:init();" id="loading" width="640px"  heigth="480px" src="img/loading.gif"/></td></tr>
<tr><td align="center">Image ID: 247247011</td></tr>
<tr><td align="center"><a href='javascript:close()'>Close</a></td></tr>
</table>
</body>
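
One way to confirm from code that the saved 1 KB file is HTML rather than an image is to look at its first bytes. The sketch below is only an illustration: the function name `sniff_payload` is hypothetical, and the check covers just a few common signatures (JPEG, PNG, GIF, and some typical HTML openings).

```python
def sniff_payload(data):
    """Roughly classify downloaded bytes as 'jpeg', 'png', 'gif', 'html' or 'unknown'."""
    # Well-known magic numbers at the start of common image formats
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    # Heuristic only: markup usually opens with one of these tags
    head = data[:512].lstrip().lower()
    if head.startswith((b"<!doctype", b"<html", b"<script", b"<body")):
        return "html"
    return "unknown"
```

Calling it on the bytes read from the response (or from the saved file) before writing them out as `.jpg` would show `'html'` for a page like the one above, which starts with a `<script>` tag.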

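The actual URL of the image is https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?Image&ImageID=247247011&CaseID=&Version=0. To have the JavaScript run and insert the real image location into the img tag, you will most likely need to use Selenium and then parse the resulting HTML.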
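
For this particular page, the real location is already visible as a string in the `init()` script, so it may be possible to extract it without running any JavaScript. The following is a minimal sketch under that assumption: `find_image_url` and `SRC_ASSIGNMENT` are hypothetical names, and the regular expression is tailored to the exact markup shown above, so it would need adjusting if the page differs.

```python
import re
from urllib.parse import urljoin

# Matches: document.getElementById("loading").src = "<relative url>";
SRC_ASSIGNMENT = re.compile(r'getElementById\("loading"\)\.src\s*=\s*"([^"]+)"')

def find_image_url(page_url, html):
    """Return the absolute image URL assigned in the page's script, or None."""
    match = SRC_ASSIGNMENT.search(html)
    if match is None:
        return None
    # The script uses a relative URL, so resolve it against the page URL
    return urljoin(page_url, match.group(1))
```

Here `page_url` is the address that returned the HTML (the `imglink` from the question) and `html` is the decoded response body. The result can then be passed to `urlopen` in place of the original link.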

Here is a script that downloads the image (without using Selenium), so you can see one way to do it.

import urllib.request

# Direct URL of the image, taken from the script in the HTML page
imglink = "https://www-nass.nhtsa.dot.gov/nass/cds/GetBinary.aspx?Image&ImageID=247247011&CaseID=&Version=0"
savelink = "C:/Users/John/Desktop/abcd.jpg"

# Python 3: fetch the image bytes and write them to disk
with urllib.request.urlopen(imglink) as response, open(savelink, 'wb') as out_file:
    out_file.write(response.read())