from urllib.request import Request, urlopen, urlretrieve
from bs4 import BeautifulSoup
def save_picture(self, word):
    search_string = "https://www.google.nl/search?q={}&tbm=isch&tbs=isz:m".format(word)
    request = Request(search_string, headers={'User-Agent': 'Mozilla/5.0'})
    raw_website = urlopen(request).read()
    soup = BeautifulSoup(raw_website, "html.parser")
    image = soup.find("img").get("src")
    urlretrieve(image, "{}.jpg".format(word))
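I wrote the function above to save the first thumbnail image from Google Images. The problem is that it fails when I pass a word containing non-ASCII characters, for example: mañana
The error message comes from the urllib module. I am using Python 3.6.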
Traceback (most recent call last):
  File "c:\users\xxx\Desktop\script.py", line 19, in <module>
    main()
  File "c:\users\xxx\Desktop\script.py", line 16, in main
    save_picture("mañana")
  File "c:\users\xxx\Desktop\script.py", line 8, in save_picture
    raw_website = urlopen(request).read()
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 223, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 526, in open
    response = self._open(req, data)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 544, in _open
    '_open', req)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 504, in _call_chain
    result = func(*args)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 1361, in https_open
    context=self._context, check_hostname=self._check_hostname)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 1318, in do_open
    encode_chunked=req.has_header('Transfer-encoding'))
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\http\client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\http\client.py", line 1250, in _send_request
    self.putrequest(method, url, **skips)
  File "C:\Users\xxx\AppData\Local\Programs\Python\Python36\lib\http\client.py", line 1117, in putrequest
    self._output(request.encode('ascii'))
UnicodeEncodeError: 'ascii' codec can't encode character '\xf1' in position 16: ordinal not in range(128)
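Edit: After reading up on this, I found that there are several libraries for this task: urllib, urllib2 and requests (and, on pip, urllib3). Am I getting this error because I am using a deprecated library?
Edit 2: Added the full traceback.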
Answer 0 (score: 0)
import requests
import mimetypes
from bs4 import BeautifulSoup
def save_picture(self, word):
    search_string = "https://www.google.nl/search?q={}&tbm=isch&tbs=isz:m".format(word)
    response = requests.get(search_string, headers={'User-Agent': 'Mozilla/5.0'})
    # find the thumbnail for the first hit
    soup = BeautifulSoup(response.text, "html.parser")
    image_location = soup.find("img").get("src")
    # download the image
    image = requests.get(image_location)
    content_type = image.headers.get('content-type', '')
    # guess_extension expects a bare MIME type and may return None,
    # so drop any parameters and fall back to .jpg (an assumed default)
    ext = mimetypes.guess_extension(content_type.split(';')[0].strip()) or '.jpg'
    # write the image to disk in small chunks
    with open(f"{word}{ext}", 'wb') as fd:
        for chunk in image.iter_content(chunk_size=128):
            fd.write(chunk)
I rewrote the function using requests, and it handles Unicode strings as expected. Saving the file is a bit more verbose.
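For completeness, the urllib version can probably be kept as well. The traceback in the question ends in `request.encode('ascii')`, which suggests the URL reached `http.client` with the raw `ñ` still in it. A minimal sketch, assuming the only problem is the unescaped search term: percent-encode the word with `urllib.parse.quote` before formatting it into the URL. The helper name `build_search_url` is hypothetical and not part of the original code.

```python
from urllib.parse import quote


def build_search_url(word):
    """Hypothetical helper: build the search URL with the term percent-encoded."""
    # quote() encodes the string as UTF-8 and escapes every non-ASCII byte,
    # so the resulting URL contains only ASCII characters.
    return "https://www.google.nl/search?q={}&tbm=isch&tbs=isz:m".format(quote(word))


url = build_search_url("mañana")
# This is the step that failed inside http.client; it should no longer raise.
url.encode("ascii")
print(url)
```

The result can then be passed to `Request(...)` exactly as `search_string` was in the question. Note that the output file name still uses the unencoded word, which is fine on file systems that accept Unicode names.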