I'm trying to download all of the images across multiple pages of the same site. I have code that scrapes all of the images from a single page, but I can't come up with a simple way to repeat the process over several URLs.
import re
import requests
from bs4 import BeautifulSoup

site = 'SiteNameHere'
response = requests.get(site)
soup = BeautifulSoup(response.text, 'html.parser')
img_tags = soup.find_all('img')
urls = [img['src'] for img in img_tags]

for url in urls:
    # Pull a usable filename out of the image URL
    filename = re.search(r'/([\w_-]+[.](jpg|gif|png))$', url)
    if not filename:
        continue  # skip sources that don't end in .jpg/.gif/.png
    with open(filename.group(1), 'wb') as f:
        if 'http' not in url:
            # Relative path: prepend the site root
            url = '{}{}'.format(site, url)
        response = requests.get(url)
        f.write(response.content)
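Ideally I want to wrap that in something I can call once per page, roughly like this sketch (the page URLs and the download_images wrapper are placeholders, not working code):

pages = [
    'SiteNameHere/page1',
    'SiteNameHere/page2',
]

for page in pages:
    download_images(page)  # hypothetical wrapper around the code above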
Answer 0 (score: 0)
I would try adding/changing each URL request inside a for loop. See the example below (using the requests module and lxml):
import lxml.html
import requests

# List of CSID numbers to look up
CSIDList = ['12', '132', '455']

for CSID in CSIDList:
    UrlCompleted = 'http://www.chemspider.com/ChemicalStructure.%s.html?rid' % CSID
    ChemSpiderPage = requests.get(UrlCompleted)
    html = lxml.html.fromstring(ChemSpiderPage.content)
    # XPath describing the location of the string (see "text()" at the end)
    MolecularWeight = html.xpath('//*[@id="ctl00_ctl00_ContentSection_ContentPlaceHolder1_RecordViewDetails_rptDetailsView_ctl00_structureHead"]/ul[1]/li[2]/text()')
    try:
        print(CSID, MolecularWeight[0])
    except IndexError:
        print('ERROR')
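The same loop pattern carries over directly to the original image problem. Below is a minimal sketch, assuming the single-page logic from the question is wrapped in a download_images function and that the page URLs are placeholders; it also uses urllib.parse.urljoin so relative image paths resolve against the page they came from instead of the site root:

import re
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

def download_images(page_url):
    # Download every .jpg/.gif/.png <img> found on one page
    response = requests.get(page_url)
    soup = BeautifulSoup(response.text, 'html.parser')
    for img in soup.find_all('img'):
        src = img.get('src')
        if not src:
            continue
        match = re.search(r'/([\w_-]+[.](jpg|gif|png))$', src)
        if not match:
            continue
        # Resolve relative paths against the page URL
        full_url = urljoin(page_url, src)
        with open(match.group(1), 'wb') as f:
            f.write(requests.get(full_url).content)

# Placeholder page URLs -- substitute the real pages of the site
pages = ['SiteNameHere/page1', 'SiteNameHere/page2', 'SiteNameHere/page3']
for page in pages:
    download_images(page)

The only structural change from the question's code is the function boundary: everything page-specific becomes the page_url parameter, and the outer loop supplies one page per iteration.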