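I'm trying out this code because I'm learning how to scrape all the images from a website. I got the code from a book, and the program runs without any errors, but after it finishes, no images have been saved to the 'xkcd' folder. I've been looking at it for hours and still can't figure it out, so I'd like help spotting what I've overlooked. Any help is greatly appreciated.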
import requests, os, bs4

url = 'http://xkcd.com'  # starting url
os.makedirs('xkcd', exist_ok=True)  # store comics in ./xkcd
while not url.endswith('1790/'):
    # Download the page.
    print('Downloading page %s...' % url)
    res = requests.get(url)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    # Find the URL of the comic image.
    comicElem = soup.select('#comic img')
    if comicElem == []:
        print('Could not find comic image.')
    else:
        try:
            comicUrl = 'http:' + comicElem[0].get('src')
            # Download the image.
            print('Downloading image %s...' % (comicUrl))
            res = requests.get(comicUrl)
            res.raise_for_status()
        except requests.exceptions.MissingSchema:
            # skip this comic
            prevLink = soup.select('a[rel="prev"]')[0]
            url = 'http://xkcd.com' + prevLink.get('href')
            continue
        # Save the image to ./xkcd.
        imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
        for chunk in res.iter_content(100000):
            imageFile.write(chunk)
        imageFile.close()
    # Get the Prev button's url.
    prevLink = soup.select('a[rel="prev"]')[0]
    url = 'http://xkcd.com' + prevLink.get('href')
print('Done.')
Edit: the code above now works.
Answer 0 (score: 0)
Your if block should probably be:
if comicElem == []:
    print('Could not find comic image.')
else:
    try:
        comicUrl = 'http:' + comicElem[0].get('src')
        # Download the image.
        print('Downloading image %s...' % (comicUrl))
        res = requests.get(comicUrl)
        res.raise_for_status()
        # Save the image to ./xkcd.
        imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
        for chunk in res.iter_content(100000):
            imageFile.write(chunk)
        imageFile.close()
    except requests.exceptions.MissingSchema:
        # skip this comic
        prevLink = soup.select('a[rel="prev"]')[0]
        url = 'http://xkcd.com' + prevLink.get('href')
        continue
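I'm not entirely sure why you were saving the file inside the exception handler, but in any case the 'continue' statement means the actual save code would never run.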
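To illustrate the point about 'continue', here is a minimal sketch that needs no network access. The function names and the use of int() as the step that may fail are hypothetical stand-ins for the download step in the question. The first version places the "save" after continue inside the handler, so it silently saves nothing; the second places it after the try/except.

```python
def save_in_handler(items):
    """Mirrors the bug: the save sits after `continue` in the handler."""
    saved = []
    for item in items:
        try:
            value = int(item)  # stand-in for the download step
        except ValueError:
            continue
            saved.append(value)  # unreachable: `continue` already left this iteration
    return saved


def save_after_try(items):
    """The save runs whenever the try block finished without an exception."""
    saved = []
    for item in items:
        try:
            value = int(item)  # stand-in for the download step
        except ValueError:
            continue  # skip this item
        saved.append(value)
    return saved


print(save_in_handler(['1', 'x', '2']))
print(save_after_try(['1', 'x', '2']))
```

Neither version raises an error, which matches the symptom in the question: the script runs cleanly but the folder stays empty.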
Answer 1 (score: 0)
Your write operation should be inside the try block.
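One possible way to keep the write inside a try block is sketched below. The helper name save_image and its parameters are assumptions made for illustration, not part of the original code. It takes any iterable of byte chunks, so in the question's script the chunks would come from res.iter_content(100000).

```python
import os


def save_image(chunks, folder, filename):
    """Write byte chunks to folder/filename; return the path, or None on failure."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, filename)
    try:
        # The with statement closes the file even if a write fails.
        with open(path, 'wb') as image_file:
            for chunk in chunks:
                image_file.write(chunk)
    except OSError as err:
        print('Could not save %s: %s' % (path, err))
        return None
    return path
```

In the loop from the question, a call might look like save_image(res.iter_content(100000), 'xkcd', os.path.basename(comicUrl)).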