Python - file not being saved after writing

Asked: 2017-02-14 14:41:29

Tags: python web-scraping beautifulsoup python-3.6

I'm trying out this code while learning how to scrape all the images from a website. It comes from a book, and the program runs without any errors, but the problem is that after it finishes, no images have been saved in the 'xkcd' folder. I've been staring at it for hours and still can't figure it out, so I'd like to ask for help with whatever I'm overlooking. Any help is much appreciated.

import requests, os, bs4

url = 'http://xkcd.com'              # starting url
os.makedirs('xkcd', exist_ok=True)   # store comics in ./xkcd
while not url.endswith('1790/'):
    # Download the page.
    print('Downloading page %s...' % url)
    res = requests.get(url)
    res.raise_for_status()

    soup = bs4.BeautifulSoup(res.text, 'html.parser')
    # Find the URL of the comic image.
    comicElem = soup.select('#comic img')

    if comicElem == []:
        print('Could not find comic image.')
    else:
        try:
            comicUrl = 'http:' + comicElem[0].get('src')
            # Download the image.
            print('Downloading image %s...' % (comicUrl))
            res = requests.get(comicUrl)
            res.raise_for_status()

        except requests.exceptions.MissingSchema:
            # skip this comic
            prevLink = soup.select('a[rel="prev"]')[0]
            url = 'http://xkcd.com' + prevLink.get('href')
            continue

        # Save the image to ./xkcd.
        imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
        for chunk in res.iter_content(100000):
            imageFile.write(chunk)
        imageFile.close()

    # Get the Prev button's url.
    prevLink = soup.select('a[rel="prev"]')[0]
    url = 'http://xkcd.com' + prevLink.get('href')
print('Done.')

Edit: the code above is now working fine.

2 Answers:

Answer 0 (score: 0)

Your if should probably be:

if comicElem == []:
    print('Could not find comic image.')
else:
    try:
        comicUrl = 'http:' + comicElem[0].get('src')
        # Download the image.
        print('Downloading image %s...' % (comicUrl))
        res = requests.get(comicUrl)
        res.raise_for_status()

        # Save the image to ./xkcd.
        imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
        for chunk in res.iter_content(100000):
            imageFile.write(chunk)
        imageFile.close()

    except requests.exceptions.MissingSchema:
        # skip this comic
        prevLink = soup.select('a[rel="prev"]')[0]
        url = 'http://xkcd.com' + prevLink.get('href')
        continue

I'm not entirely sure why you want to save the file inside the exception handler, but in any case the 'continue' statement means the actual save code will never be run.
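The control-flow point is worth spelling out: continue jumps straight back to the top of the enclosing loop, so any statement placed after it in the same block can never run. A tiny standalone illustration (not the scraper code itself, just the continue behaviour):

for n in range(3):
    if n == 1:
        continue
        print('this line is unreachable')  # dead code: continue has already moved to the next iteration
    print('handling', n)                   # prints for 0 and 2, but is skipped for 1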

Answer 1 (score: 0)

Your write operation should be inside the try block.
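Concretely, that means the download and the file write both sit under the same try, so a MissingSchema error skips the comic before any file is created. A rough sketch, assuming it sits inside the question's while loop (so continue is valid) and reusing the question's variable names; the with block is simply one way to make sure the file is always closed:

try:
    comicUrl = 'http:' + comicElem[0].get('src')
    res = requests.get(comicUrl)
    res.raise_for_status()
    # Write the image only once the download has succeeded.
    with open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb') as imageFile:
        for chunk in res.iter_content(100000):
            imageFile.write(chunk)
except requests.exceptions.MissingSchema:
    # Bad image URL: nothing was written, move on to the previous comic.
    prevLink = soup.select('a[rel="prev"]')[0]
    url = 'http://xkcd.com' + prevLink.get('href')
    continue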