Python multiprocessing error when scraping data from multiple URLs

Time: 2018-10-20 17:50:52

Tags: python

Error:

Traceback (most recent call last):
  File "ask.py", line 23, in <module>
    perform()
  File "ask.py", line 21, in perform
    results = pool.map(display, ['http://www.freejobalert.com/upsc-advt-no-18/33742/', 'http://www.freejobalert.com/upsc-recruitment/16960/#Engg-Services2019'],[1,2,3,3,345,78,96,78])
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 260, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 354, in _map_async
    error_callback=error_callback)
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 633, in __init__
    if chunksize <= 0:
TypeError: unorderable types: list() <= int()

Code:

import requests
from bs4 import BeautifulSoup

def display(url,ids):
  page = requests.get(url)
  c = page.content
  soup = BeautifulSoup(c,"html5lib")
  row = soup.find_all("table",{"style":"width: 500px;"})[0].find_all('tr')
  dict = {}
  for i in row:
      for title in i.find_all('span', attrs={
      'style':'color: #008000;'}):
          dict['Title'] = title.text
      for link in i.find_all('a',attrs={'title':'UPSC'}, href=True):
          dict['Link'] = link['href']
          print(dict)

def perform():
    from multiprocessing.dummy import Pool as ThreadPool
    pool = ThreadPool(4)
    results = pool.map(
        display,
        ['http://www.freejobalert.com/upsc-advt-no-18/33742/', 'http://www.freejobalert.com/upsc-recruitment/16960/#Engg-Services2019'],
        [1,2,3,3,345,78,96,78]
     )

perform()

Here I want to scrape data from multiple URLs and use multiprocessing to improve performance.

The error appears as soon as I add the list [1,2,3,3,345,78,96,78] after the first list in the pool.map call.
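Judging from the traceback, the likely cause is that pool.map takes a function, one iterable, and an optional chunksize. The second list therefore lands in the chunksize slot, and pool.py compares it with an integer. Below is a minimal sketch of two possible workarounds. It assumes either that each URL should be paired with one id, or that every call should receive the whole id list; the question does not say which is intended. The display function here is a stand-in that only returns its arguments, so the example runs without network access.

```python
from functools import partial
from multiprocessing.dummy import Pool as ThreadPool


def display(url, ids):
    # Stand-in for the scraping function in the question (assumption):
    # it only returns its arguments so the argument pairing is visible.
    return (url, ids)


urls = [
    'http://www.freejobalert.com/upsc-advt-no-18/33742/',
    'http://www.freejobalert.com/upsc-recruitment/16960/#Engg-Services2019',
]
ids = [1, 2, 3, 3, 345, 78, 96, 78]

with ThreadPool(4) as pool:
    # Option 1: one id per URL. starmap unpacks each (url, id) tuple
    # into two positional arguments. zip() stops at the shorter list,
    # so only the first two ids are used here.
    paired = pool.starmap(display, zip(urls, ids))

    # Option 2: every call receives the whole id list. partial() fixes
    # the second argument so that map only iterates over the URLs.
    shared = pool.map(partial(display, ids=ids), urls)
```

With the real scraping function, display would need to return its result (for example the dictionary it builds) instead of only printing it, otherwise the lists returned by the pool contain nothing but None.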

0 answers:

No answers yet