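I want to scrape any website and download only its images. With the code below, however, the program also downloads GIFs that appear in img tags. How can I restrict the download to PNG and JPEG files only?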
def fetch_url():
    url = _url.get()
    config['images'] = []
    _images.set(())
    try:
        page = requests.get(url)
    except requests.RequestException as rex:
        _sb(str(rex))
    else:
        soup = BeautifulSoup(page.content, 'html.parser')
        images = fetch_images(soup, url)
        if images:
            _images.set(tuple(img['name'] for img in images))
            _sb('Images found: {}'.format(len(images)))
        else:
            _sb('No images found!.')
        config['images'] = images

def fetch_images(soup, base_url):
    images = []
    for img in soup.findAll('img'):
        src = img.get('src')
        img_url = ('{base_url}/{src}'.format(base_url=base_url, src=src))
        name = img_url.split('/')[-1]
        images.append(dict(name=name, url=img_url))
    return images
Answer 0 (score: 0)
Have you tried adding only the formats you want?
def fetch_images(soup, base_url):
    images = []
    for img in soup.findAll('img'):
        src = img.get('src')
        img_url = ('{base_url}/{src}'.format(base_url=base_url, src=src))
        name = img_url.split('/')[-1]
        # Keep only PNG and JPEG files; the check ignores case and requires the dot.
        if name.lower().endswith(('.png', '.jpg', '.jpeg')):  ### <- here
            images.append(dict(name=name, url=img_url))
    return images
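A suffix check on the file name can miss URLs that carry a query string (for example `pic.png?v=2`). As a rough sketch, assuming the standard library's `urllib.parse` is acceptable here, the extension can be read from the URL path instead. The helper name `allowed_image_url` and the constant `ALLOWED_EXTENSIONS` are hypothetical and not part of the original code.

```python
from posixpath import splitext
from urllib.parse import urljoin, urlparse

# Hypothetical constant: the extensions the question wants to keep.
ALLOWED_EXTENSIONS = {'.png', '.jpg', '.jpeg'}


def allowed_image_url(src, base_url):
    """Return the absolute image URL if it points at a PNG/JPEG, else None."""
    if not src:
        # An <img> tag without a src attribute gives None from img.get('src').
        return None
    img_url = urljoin(base_url, src)      # resolve relative paths against the page URL
    path = urlparse(img_url).path         # drop the query string and fragment
    ext = splitext(path)[1].lower()       # e.g. '.PNG' -> '.png'
    return img_url if ext in ALLOWED_EXTENSIONS else None
```

Inside `fetch_images`, the loop could call `allowed_image_url(img.get('src'), base_url)` and append only when the result is not None. Note that this checks the URL only; it does not verify what the server actually returns.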
Answer 1 (score: 0)
I would look for href values that end in .jpeg or .png:
soup.select("[href$='.png'], [href$='.jpeg']")
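The selector above matches on href. Since the question is about img tags, which keep the image location in src, a variant that targets src may be closer to what is needed. The sketch below uses made-up sample HTML and also adds `.jpg`, which is an assumption about the desired formats.

```python
from bs4 import BeautifulSoup

# Sample markup for illustration only (not from the question).
html = """
<img src="a.png"><img src="b.gif"><img src="c.jpeg">
<img src="d.jpg"><img alt="no src attribute">
"""

soup = BeautifulSoup(html, 'html.parser')

# $= is the CSS "attribute value ends with" operator.
matches = soup.select("img[src$='.png'], img[src$='.jpg'], img[src$='.jpeg']")
sources = [img['src'] for img in matches]
```

Here `sources` holds the PNG and JPEG entries and skips the GIF and the tag without a src. This sketch does not handle uppercase extensions or URLs with a query string after the file name.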
Answer 2 (score: 0)
You can also use a regular expression when you look up the tags.
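One way this could look, as a sketch only: BeautifulSoup's `find_all` accepts a compiled regular expression as an attribute filter, so the extension test can be part of the tag lookup. The pattern name `IMAGE_SRC` and the sample HTML are assumptions made for illustration.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical pattern: src values ending in .png, .jpg or .jpeg, ignoring case.
IMAGE_SRC = re.compile(r'\.(?:png|jpe?g)$', re.IGNORECASE)

# Sample markup for illustration only (not from the question).
html = """
<img src="a.PNG"><img src="b.gif"><img src="c.jpeg"><img alt="no src attribute">
"""

soup = BeautifulSoup(html, 'html.parser')

# Only <img> tags whose src attribute matches the pattern are returned.
matches = soup.find_all('img', src=IMAGE_SRC)
sources = [img['src'] for img in matches]
```

Tags without a src attribute are skipped by this filter, so the loop in `fetch_images` would not need a separate None check.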