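How can I download this CAPTCHA image using PIL or another image-processing library? I have tried several approaches, but I cannot download the image.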
from PIL import Image
import urllib2 as urllib
import io
fd = urllib.urlopen("https://notacarioca.rio.gov.br/senhaweb/CaptchaImage.aspx?guid=9759fc80-d385-480a-aa6e-8e00ef20be7b&s=1")
image_file = io.BytesIO(fd.read())
im = Image.open(image_file)
print im
Answer 0 (score: 0)
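The image you are trying to download does not have a static URL.
This means you cannot fetch it by hard-coding the address: urllib.urlopen("https://notacarioca.rio.gov.br/senhaweb/CaptchaImage.aspx?guid=9759fc80-d385-480a-aa6e-8e00ef20be7b&s=1") will not work. Instead, load the login page inside a session and read the image's current src from it.
Here is a solution using Requests and BeautifulSoup: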
import requests
from mimetypes import guess_extension
from bs4 import BeautifulSoup
from urllib.parse import urljoin
# from PIL import Image
# from io import BytesIO
s = requests.session()
r = s.get("https://notacarioca.rio.gov.br/senhaweb/login.aspx")
if r.status_code == 200:
    soup = BeautifulSoup(r.content, "html.parser")
    div = soup.find("div", attrs={"class": "captcha", "style": "color:Red;width:100%;"})
    r = s.get(urljoin("https://notacarioca.rio.gov.br/senhaweb/", div.img["src"]))
    if r.status_code == 200:
        guess = guess_extension(r.headers['content-type'])
        if guess:
            with open("captcha" + guess, "wb") as f:
                f.write(r.content)
            # Image.open(BytesIO(r.content)).show()
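One possible refinement, not part of the original answer: guess_extension returns None when the Content-Type value carries parameters (for example "image/png; charset=utf-8") or names an unknown type, and in that case the code above silently writes nothing. The sketch below strips any parameters before the lookup and falls back to a default extension. The helper name captcha_extension and the ".bin" fallback are hypothetical choices made for illustration, and whether this server actually sends such parameters is an assumption.

```python
from mimetypes import guess_extension


def captcha_extension(content_type, default=".bin"):
    """Return a file extension for a Content-Type header value.

    Hypothetical helper: the name and the ".bin" fallback are
    illustrative choices, not taken from the original answer.
    """
    # Drop parameters such as "; charset=utf-8" before the lookup,
    # because guess_extension expects a bare MIME type.
    mime_type = (content_type or "").split(";", 1)[0].strip().lower()
    # guess_extension returns None for unknown types; use the default then.
    return guess_extension(mime_type) or default
```

In the answer's code this would replace the guess_extension call, e.g. guess = captcha_extension(r.headers.get("content-type")), so the file is always written even when the type cannot be identified.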