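I'm not sure how to save scraped images to files on my desktop.
I'm trying to download the images from the site listed in the code, but I only know the basics, such as importing BeautifulSoup and urllib.request. I don't understand what everything means.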
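from bs4 import BeautifulSoup
import urllib.request as request

# A raw string cannot end in a backslash, hence the separate '\\'
folder = r'C:\Users\rlook\Desktop\scrape' + '\\'
url = "https://www.butterfliesofamerica.com/t/Phocides_belus_a.htm"
response = request.urlopen(url)
soup = BeautifulSoup(response, 'html.parser')
# The loop body was never written; printing each src shows what the page offers
for res in soup.findAll('img'):
    print(res.get('src'))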
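To actually write an image into the desktop folder, the loop has to open each image URL and copy its bytes into a file opened in binary mode. Below is a minimal sketch, assuming the image URL is already absolute. `save_stream` is a hypothetical helper name. Because `urlopen()` returns a file-like object, the helper accepts any binary stream, which also lets it be tried out with `io.BytesIO` instead of a live download.

```python
import shutil
from pathlib import Path
from typing import BinaryIO


def save_stream(stream: BinaryIO, folder: str, filename: str) -> Path:
    """Copy a binary stream (e.g. what urlopen() returns) into folder/filename."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if it is missing
    target = target_dir / filename  # pathlib adds the separator, no trailing '\\' needed
    with open(target, "wb") as out:  # 'wb': image data is binary
        shutil.copyfileobj(stream, out)
    return target


# Hypothetical usage inside the loop, assuming src is an absolute image URL:
# with request.urlopen(src) as resp:
#     save_stream(resp, r'C:\Users\rlook\Desktop\scrape', 'butterfly.jpg')
```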
I was able to follow some code from other sites, but I couldn't adapt it to my purpose:
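import os
import urllib.request as request
from urllib.parse import urljoin
from bs4 import BeautifulSoup

folder = r'C:\Users\rlook\Desktop\scrape' + '\\'
URL = 'https://www.butterfliesofamerica.com/t/Phocides_belus_a.htm'
response = request.urlopen(URL)
soup = BeautifulSoup(response, 'html.parser')
# find() returned one tag (iconTable) while the next line used an undefined name, icon; loop over every match instead
for icon in soup.find_all('a', {'class': 'y'}):
    if icon.img is None:  # skip links that contain no thumbnail
        continue
    src = urljoin(URL, icon.img['src'])  # src may be relative to the page
    name = icon.img.get('alt') or os.path.splitext(os.path.basename(src))[0]
    request.urlretrieve(src, folder + name + '.jpg')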
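The second attempt builds each file name from the image's `alt` text. Alt text is free-form, so it can contain characters that Windows refuses in file names (such as `:` or `?`), and it can be empty. Below is a small sketch of a hypothetical `safe_filename` helper. It assumes that replacing each forbidden character with `_` and falling back to a fixed name is acceptable.

```python
import re


def safe_filename(alt_text: str, fallback: str = "image", ext: str = ".jpg") -> str:
    """Turn <img> alt text into a name Windows accepts as a file name."""
    # Replace characters Windows forbids in file names: < > : " / \ | ? * and control chars
    name = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", alt_text)
    # Windows also rejects names that end in a dot or a space
    name = name.strip().rstrip(". ")
    return (name or fallback) + ext
```

In the loop above it could replace the manual concatenation, e.g. `request.urlretrieve(src, folder + safe_filename(icon.img.get('alt', '')))`.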
Answer (score: 0)
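I used requests and BeautifulSoup (along with re and shutil) to get the full-size images rather than just the thumbnails. For this solution to work, first run these on the command line:
pip install requests
pip install bs4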
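import requests, re, shutil
from bs4 import BeautifulSoup

# Send a browser-like User-Agent so the requests look like they come from a browser
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36',
}
base_url = 'https://www.butterfliesofamerica.com'
# Fetch the species page that lists the thumbnails
all_imgs = requests.get(base_url + '/t/Phocides_belus_a.htm', headers=headers)
parsed_imgs = BeautifulSoup(all_imgs.text, 'html.parser')
# Each <a class="y"> links to a separate page that shows the full-size image
img_hrefs = [img['href'] for img in parsed_imgs.find_all('a', class_='y')]
for img_href in img_hrefs:
    # The links are relative ("../..."), so ".." is swapped for the site root
    real_img_href = img_href.replace('..', base_url)
    image_page = requests.get(real_img_href, headers=headers)
    page_soup = BeautifulSoup(image_page.text, 'html.parser')
    # Assumes the first <img> on that page is the full-size picture
    source_image = page_soup.find('img')['src']
    # Pull the file name (something.jpg or something.JPG) out of the image path
    img_name = re.search(r'/([\w\-\.]+?\.(?:jpg|JPG))', source_image).group(1)
    # stream=True lets the body be copied to disk without loading it all first
    img = requests.get(base_url + source_image, stream=True, headers=headers)
    # Saves into the current working directory; prepend a folder path to save elsewhere
    with open(img_name, 'wb') as img_file:
        shutil.copyfileobj(img.raw, img_file)
    print(img_name, ' found')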
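The answer rebuilds absolute URLs with `img_href.replace('..', base_url)` and `base_url + source_image`. That only works while hrefs start with `..` and `src` values start with `/`. Below is a sketch of a more general alternative, assuming the standard library's `urllib.parse` is acceptable. `urljoin` resolves any relative link against the page it appeared on, and the file name can be taken from the URL path instead of from a regex. `absolute_url` and `filename_from_url` are hypothetical helper names.

```python
import posixpath
from urllib.parse import urljoin, urlparse

PAGE_URL = "https://www.butterfliesofamerica.com/t/Phocides_belus_a.htm"


def absolute_url(page_url: str, href: str) -> str:
    """Resolve an href or src (relative or absolute) against the page it came from."""
    return urljoin(page_url, href)


def filename_from_url(url: str) -> str:
    """Return the last path segment of a URL, ignoring any query string."""
    return posixpath.basename(urlparse(url).path)
```

Inside the loop, `source_image` would be resolved against the page it was found on, e.g. `absolute_url(real_img_href, source_image)`, rather than always against `base_url`.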