Question

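I am trying to download an image from the NGA.gov website using Python 3 and urllib.

The site does not serve its images through a standard .jpg URL, and I am getting an error.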

import urllib.request
from bs4 import BeautifulSoup


try:
    with urllib.request.urlopen("http://images.nga.gov/?service=asset&action=show_preview&asset=33643") as url:
        s = url.read()

    soup = BeautifulSoup(s, 'html.parser') 


    img = soup.find("img")
    urllib.request.urlretrieve(img,"C:\art.jpg")

except Exception as e:
    print (e)

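Error: Some characters could not be decoded, and were replaced with REPLACEMENT CHARACTER. expected string or bytes-like object

Can someone please tell me why I am getting this error and how to get the image onto my computer?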

Answer 1

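BeautifulSoup is a library for parsing HTML/XML. This URL already returns the image itself, so what is there to parse? This works fine (note the raw string for the path: in a plain string literal, "\a" is the escape for the bell character): urllib.request.urlretrieve("http://images.nga.gov/?service=asset&action=show_preview&asset=33643", r"C:\art.jpg")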
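
To confirm that the URL returns raw image data rather than an HTML page, one option is to look at the first bytes of the response before deciding whether to parse anything. The following is only a sketch: `sniff_image_type` and `fetch_bytes` are hypothetical helper names, not part of urllib, and the signature table covers just JPEG, PNG and GIF.

```python
import urllib.request

# Leading "magic" bytes of a few common image formats (not exhaustive).
IMAGE_SIGNATURES = {
    b"\xff\xd8\xff": "jpeg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_type(data):
    """Return 'jpeg', 'png' or 'gif' if data starts like that format, else None."""
    for signature, name in IMAGE_SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


def fetch_bytes(url):
    """Download the raw response body (hypothetical helper, needs network access)."""
    with urllib.request.urlopen(url) as response:
        return response.read()
```

If `sniff_image_type(fetch_bytes(url))` returns a format name, the response is already the image, so there is no HTML for BeautifulSoup to parse. That would also be consistent with the "could not be decoded" warning in the question, which appears when binary data is fed to the parser.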

Answer 2

There is no need for BeautifulSoup! Just do:

import urllib.request

with urllib.request.urlopen("http://images.nga.gov/?service=asset&action=show_preview&asset=33643") as url:
    s = url.read()
# The response is closed once the block above ends, so write the bytes already read.
with open("art.jpg", 'wb') as fp:
    fp.write(s)
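
Because the URL carries no file extension, the file name above simply assumes the image is a JPEG. One possible refinement is to pick the extension from the response's Content-Type header instead. This is a minimal sketch under assumptions: `extension_for`, `download_image`, the small `EXTENSIONS` table and the `.bin` fallback are all illustrative choices, and whether this server sends a useful Content-Type has not been checked here.

```python
import urllib.request

# Small mapping from MIME type to file extension (assumed list, extend as needed).
EXTENSIONS = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
}


def extension_for(content_type, default=".bin"):
    """Map a Content-Type value such as 'image/jpeg; charset=binary' to an extension."""
    if not content_type:
        return default
    # Drop any parameters after ';' and normalise case before the lookup.
    mime = content_type.split(";", 1)[0].strip().lower()
    return EXTENSIONS.get(mime, default)


def download_image(url, basename):
    """Save url to basename + extension and return the file name (needs network)."""
    with urllib.request.urlopen(url) as response:
        filename = basename + extension_for(response.headers.get("Content-Type"))
        data = response.read()
    with open(filename, "wb") as fp:
        fp.write(data)
    return filename
```

Calling `download_image(url, "art")` would then write `art.jpg`, `art.png` or `art.gif` depending on what the server reports, and `art.bin` if the type is missing or not in the table.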

How do I download an image from a website when the URL has no explicit extension?
