XKCD Web Scraper - Automate the Boring Stuff

Time: 2019-07-08 04:31:51

Tags: python python-3.x web-scraping beautifulsoup

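I am currently working through Chapter 11 of ATBS (Automate the Boring Stuff) and completing the web scraper project. The script runs without errors, but it never actually downloads the comics to my Mac.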

#! /usr/bin/env python3

#downloadXkcd.py - Downloads every single XKCD comic.

import requests, os, bs4

url = 'http://xkcd.com'             # starting URL
os.makedirs('xkcd', exist_ok=True)  # store comics in ./xkcd

while not url.endswith('#'):

    #TODO: DL the page
    print('Downloading page %s...' % url)
    res = requests.get(url)
    res.raise_for_status()

    soup = bs4.BeautifulSoup(res.text)

    #TODO: Find URL of image
    comicElem = soup.select('#comic img')
    if comicElem == []:
        print('Could not find comic image.')
    else:
        comicUrl = 'http:' + comicElem[0].get('src')

        #TODO: Download Image
        print('Downloading image %s' % (comicUrl))
        res = requests.get(comicUrl)
        res.raise_for_status()

        #TODO: Save image to ./xkcd
        imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
        for chunk in res.iter_content(100000):
            imageFile.write(chunk)
        imageFile.close()

    #TODO: Get prev button URL
    prevLink = soup.select('a[rel="prev"]')[0]
    url = 'http://xkcd.com' + prevLink.get('href')

print('Done.')

What do I need to fix so that the comics are downloaded? Thanks.

1 Answer:

Answer 0: (score: 0)

It looks like you left out html.parser in the BeautifulSoup call. It should be passed as the second argument, like this:

soup = bs4.BeautifulSoup(res.text, 'html.parser')
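
Separately from the parser argument, it may be worth checking how the script builds its URLs. The question concatenates strings ('http:' + src and 'http://xkcd.com' + href), which only works if the attribute values have exactly the expected shape. A minimal sketch of a more tolerant approach uses urljoin from the standard library. The helper names absolute_url and prev_page_url, the page URL, and the example.png path are made up for illustration, and the '#' end marker is assumed from the loop condition in the question.

```python
from urllib.parse import urljoin


def absolute_url(page_url, link):
    """Resolve a relative or protocol-relative link against the page it came from."""
    return urljoin(page_url, link)


def prev_page_url(page_url, href):
    """Return the previous comic's URL, or None when href is the '#' end marker.

    The '#' check happens before joining, so the loop does not have to rely
    on how a bare '#' is resolved against the page URL.
    """
    if href == '#':
        return None
    return urljoin(page_url, href)


# Illustrative values only; real values would come from soup.select(...).
page = 'https://xkcd.com/2170/'
image_url = absolute_url(page, '//imgs.xkcd.com/comics/example.png')
previous = prev_page_url(page, '/2169/')
first = prev_page_url('https://xkcd.com/1/', '#')

print(image_url)
print(previous)
print(first)
```

With helpers like these, the loop in the question could run while the URL is not None instead of testing url.endswith('#'), and the image URL would inherit the scheme of the page it was found on.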