How to scrape images and save them to a file

Time: 2019-09-05 03:44:54

Tags: python image screen-scraping

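I'm not sure how to save scraped images to a folder on my desktop.

I'm trying to download the images from the site listed in the code, but I only know the basics, such as importing BeautifulSoup and request. I don't understand what everything means.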

from bs4 import BeautifulSoup
import urllib.request as request


folder = r'C:\Users\rlook\Desktop\scrape' + '\\'
url = "https://www.butterfliesofamerica.com/t/Phocides_belus_a.htm"
response = request.urlopen(url)
soup = BeautifulSoup(response, 'html.parser')
for res in soup.findAll('img'):
    print(res.get('src'))  # placeholder body: list each image's src

I was able to follow some code from other sites, but I couldn't adapt it to my purpose.

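import urllib.request as request
from urllib.parse import urljoin
from bs4 import BeautifulSoup

folder = r'C:\Users\rlook\Desktop\scrape' + '\\'
URL = 'https://www.butterfliesofamerica.com/t/Phocides_belus_a.htm'
response = request.urlopen(URL)
soup = BeautifulSoup(response, 'html.parser')

# First <a class="y"> link; assumed to wrap an <img> that has src and alt attributes.
icon = soup.find('a', {'class': 'y'})

# Resolve the src against the page URL in case it is relative, then download it.
request.urlretrieve(urljoin(URL, icon.img['src']), folder + icon.img['alt'] + '.jpg')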

1 answer:

Answer 0 (score: 0)

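I used requests and BeautifulSoup (along with re and shutil) to fetch the full-size images rather than just the thumbnails. You need to run pip install requests and pip install bs4 on the command line for this solution to work.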

Code

import requests, re, shutil
from bs4 import BeautifulSoup

# Send a browser-like User-Agent with every request.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36',
}
base_url = 'https://www.butterfliesofamerica.com'
all_imgs = requests.get(base_url + '/t/Phocides_belus_a.htm', headers=headers)
parsed_imgs = BeautifulSoup(all_imgs.text, 'html.parser')

# Collect the href of every <a class="y"> link on the listing page.
img_hrefs = [img['href'] for img in parsed_imgs.find_all('a', class_='y')]
for img_href in img_hrefs:
    # Turn the relative href ('../...') into an absolute URL and fetch that page.
    real_img_href = img_href.replace('..', base_url)
    image_page = requests.get(real_img_href, headers=headers)

    # Take the src of the first <img> on the linked page.
    page_soup = BeautifulSoup(image_page.text, 'html.parser')
    source_image = page_soup.find('img')['src']
    # Extract the file name (the last path segment ending in .jpg or .JPG).
    img_name = re.search(r'/([\w\-\.]+?\.(?:jpg|JPG))', source_image).group(1)

    # Stream the image and copy its raw bytes to a file in the current working directory.
    img = requests.get(base_url + source_image, stream=True, headers=headers)
    with open(img_name, 'wb') as img_file:
        shutil.copyfileobj(img.raw, img_file)
    print(img_name, 'found')
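
The code above writes each file to the current working directory, while the question asks for a specific folder on the desktop. Below is a minimal sketch of one way to handle that, under some assumptions: the helper names `absolute_url`, `image_filename` and `save_image` are made up for illustration, the image path in the usage note is hypothetical, and the download uses the standard library's `urllib.request` instead of requests so the snippet has no third-party dependencies. It resolves relative links with `urljoin` rather than string replacement, and reuses the answer's regex for the file name with a fallback for paths that are not .jpg.

```python
import re
import shutil
import urllib.request
from pathlib import Path
from posixpath import basename
from urllib.parse import urljoin, urlsplit

# Same pattern as in the answer: capture the file name of a .jpg/.JPG path.
JPG_NAME = re.compile(r'/([\w\-\.]+?\.(?:jpg|JPG))')


def absolute_url(page_url, link):
    """Resolve a possibly relative link (e.g. '../images/x.jpg') against its page URL."""
    return urljoin(page_url, link)


def image_filename(src):
    """Return a file name for an image URL, falling back to the last path segment."""
    match = JPG_NAME.search(src)
    if match:
        return match.group(1)
    return basename(urlsplit(src).path) or 'image.jpg'


def save_image(image_url, folder, user_agent='Mozilla/5.0'):
    """Download image_url into folder and return the path written (makes a network request)."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    target = target_dir / image_filename(image_url)
    req = urllib.request.Request(image_url, headers={'User-Agent': user_agent})
    with urllib.request.urlopen(req) as response, open(target, 'wb') as out:
        shutil.copyfileobj(response, out)
    return target
```

Inside the answer's loop, something like `save_image(absolute_url(real_img_href, source_image), r'C:\Users\rlook\Desktop\scrape')` could then replace the final `requests.get` and `open` steps. Resolving against the page the link came from means both `../images/...` and absolute paths are handled the same way.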