Multithreading in Python

Date: 2016-11-05 22:12:13

Tags: python multithreading

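I am following the book "Automate the Boring Stuff with Python" and trying to write a program that downloads multiple comics from http://xkcd.com at the same time, but I have run into some problems. I copied the program exactly as it appears in the book.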

Here is my code:

# multidownloadXkcd.py   - Downloads XKCD comics using multiple threads.

import requests, os ,bs4, threading

os.chdir('c:\\users\\patty\\desktop')
os.makedirs('xkcd', exist_ok=True)   # store comics on ./xkcd

def downloadXkcd(startComic, endComic):             
    for urlNumber in range(startComic, endComic):                   
        #Download the page
        print('Downloading page http://xkcd.com/%s...' %(urlNumber))
        res = requests.get('http://xkcd.com/%s' % (urlNumber))
        res.raise_for_status()

        soup= bs4.BeautifulSoup(res.text, "html.parser")        

        #Find the URL of the comic image.
        comicElem = soup.select('#comic img')
        if comicElem == []:
            print('Could not find comic image.')
        else:
            comicUrl = comicElem[0].get('src')
            #Download the image.
            print('Downloading image %s...' % (comicUrl))
            res = requests.get(comicUrl, "html.parser")
            res.raise_for_status()

            #Save the image to ./xkcd.
            imageFile = open(os.path.join('xkcd', os.path.basename(comicUrl)), 'wb')
            for chunk in res.iter_content(100000):
                imageFile.write(chunk)
            imageFile.close()

downloadThreads = []                # a list of all the Thread objects
for i in range(0,1400, 100):        # loops 14 times, creates 14 threads
    downloadThread = threading.Thread(target=downloadXkcd, args=(i, i + 99))
    downloadThreads.append(downloadThread)
    downloadThread.start()

# Wait for all threads to end.
for downloadThread in downloadThreads:
    downloadThread.join()
print('Done.')

I get the following exceptions:

Exception in thread Thread-1:
Traceback (most recent call last):
  File "C:\Python\Python35\lib\threading.py", line 914, in _bootstrap_inner
    self.run()
  File "C:\Python\Python35\lib\threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\PATTY\PycharmProjects\CH15_TASKS\practice.py", line 13, in downloadXkcd
    res.raise_for_status()
  File "C:\Python\Python35\lib\site-packages\requests\models.py", line 862, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: http://xkcd.com/0
Exception in thread Thread-2:
Traceback (most recent call last):
  File "C:\Python\Python35\lib\threading.py", line 914, in _bootstrap_inner
    self.run()
  File "C:\Python\Python35\lib\threading.py", line 862, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\PATTY\PycharmProjects\CH15_TASKS\practice.py", line 25, in downloadXkcd
    res = requests.get(comicUrl, "html.parser")
  File "C:\Python\Python35\lib\site-packages\requests\api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Python\Python35\lib\site-packages\requests\api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Python\Python35\lib\site-packages\requests\sessions.py", line 461, in request
    prep = self.prepare_request(req)
  File "C:\Python\Python35\lib\site-packages\requests\sessions.py", line 394, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "C:\Python\Python35\lib\site-packages\requests\models.py", line 294, in prepare
    self.prepare_url(url, params)
  File "C:\Python\Python35\lib\site-packages\requests\models.py", line 354, in prepare_url
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL '//imgs.xkcd.com/comics/family_circus.jpg': No schema supplied. Perhaps you meant http:////imgs.xkcd.com/comics/family_circus.jpg?

It says the URL is invalid, but whenever I paste that URL into a web browser it appears to be valid. Does anyone know how to fix this? Thanks.

1 answer:

Answer 0: (score: 2)

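Yes, as @spectras said, just because your browser fixes up the URL does not mean it is valid. The URL in the error starts with "//", so it has no scheme; try putting "http:" in front of it and see if it works.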
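
A minimal sketch of one way to do this without hard-coding a prefix, assuming the image `src` should be resolved against the page it was scraped from. The `src` value in the traceback starts with `//`, which makes it a protocol-relative URL: only the scheme is missing. The standard library's `urllib.parse.urljoin` can fill it in from the page URL. The helper names `absolute_image_url` and `comic_ranges`, the page URL and the comic number bounds below are illustrative assumptions, not part of the original program. The second helper addresses the other exception in the traceback (the 404 for http://xkcd.com/0) by starting the numbering at 1.

```python
from urllib.parse import urljoin


def absolute_image_url(page_url, src):
    """Resolve an <img src> value against the URL of the page it came from.

    Hypothetical helper: urljoin copies whatever src lacks (here, the
    scheme) from page_url and leaves already-absolute URLs unchanged.
    """
    return urljoin(page_url, src)


def comic_ranges(first, last, step):
    """Split comic numbers first..last (inclusive) into (start, end) pairs.

    Hypothetical helper: each pair is meant to be passed to
    range(start, end), so end is exclusive and no number is skipped.
    """
    return [(start, min(start + step, last + 1))
            for start in range(first, last + 1, step)]


# The page URL is illustrative; the src value is the one from the traceback.
print(absolute_image_url('http://xkcd.com/100/',
                         '//imgs.xkcd.com/comics/family_circus.jpg'))
# -> http://imgs.xkcd.com/comics/family_circus.jpg

# Assumed bounds: start at 1 because http://xkcd.com/0 returns 404.
print(comic_ranges(1, 1399, 100)[0])
# -> (1, 101)
```

In the question's code this would mean replacing `requests.get(comicUrl, "html.parser")` with something like `requests.get(absolute_image_url(res.url, comicUrl))`. The second positional argument of `requests.get` is `params`, so `"html.parser"` does not belong in that call. The thread loop could then iterate over `comic_ranges(...)` and pass each pair as `args`.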