Parallel web requests in Python (requests.get and BeautifulSoup)

Date: 2017-08-31 21:09:15

Tags: python get beautifulsoup

I have a simple Python script that loops over a dictionary whose keys are URL links. I have to extract some information from each link and store it in another dictionary. The code below is the first part of the function, and it seems to work as expected. But it opens only one link at a time, and I think I could improve the runtime by doing this in parallel. Do you have any suggestions for a simple way to achieve this in Python?

import re
import requests
from bs4 import BeautifulSoup

def updater(local):
    links = myItems['links']
    for link in links.keys():
        page = requests.get(link)
        soup = BeautifulSoup(page.content, 'html.parser')
        newsoup = soup.find("div", {"id": "overviewQuickstatsBenchmarkDiv"})
        rows = newsoup.findAll('tr')[1]
        counter = 0
        date = ""
        for td in rows.findAll('td'):
            counter += 1
            if td.contents[0] == 'Date':
                date = td.text.replace("Date", "")
            elif counter == 2:
                pass
            elif counter == 3:
                price = re.findall(r"\d+\.\d+", td.string)[0]

Here is my attempt with multiprocessing (but I can't get any results, and the code doesn't seem to run):

    from multiprocessing import Pool

    def read(url):
        result = {'link': url, 'data': requests.get(url)}
        print("Reading: " + url)
        return result


    def updater(local):
        links = myItems['links']
        pool = Pool(processes=5)
        results = pool.map(read, links.keys())
        for result in results:
            # need to read the results and store data into a dictionary
            pass
0 Answers:

No answers yet