I'm scraping two different websites with BeautifulSoup - how can I run them from one script?

Asked: 2017-09-02 23:57:18

Tags: python-2.7 web-scraping beautifulsoup screen-scraping

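I'm using BeautifulSoup to scrape job postings from several company websites (I have permission). Their HTML structures differ slightly, so I wrote a separate scraper for each site. The scrapers all produce the same kind of output: the URLs of the job postings.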

Problem

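The scrapers each work fine on their own, but for efficiency I'd like to run them all together instead of launching them one by one. What is the simplest way to do that?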

Scraper 1

import requests
from bs4 import BeautifulSoup 

base = "http://implementconsultinggroup.com"
url = "http://implementconsultinggroup.com/career/#/1143"

req = requests.get(url).text
soup = BeautifulSoup(req,'html.parser')
links = soup.select("a")

for link in links:
    # Some <a> tags have no href; fall back to "" so the "in" test cannot fail
    href = link.get("href") or ""
    if "career" in href and 'COPENHAGEN' in link.text:
        res = requests.get(base + href).text
        soup = BeautifulSoup(res, 'html.parser')
        # Test the same element that is read (the original tested h1.section__title)
        title_tag = soup.select_one("h1.page-intro__title")
        title = title_tag.get_text() if title_tag else ""
        overview = soup.select_one("p.page-intro__longDescription").get_text()
        details = soup.select_one("div.rte").get_text()
        print(title, link, details)

Scraper 2

import requests
from bs4 import BeautifulSoup 

url = "http://deloittedk.easycruit.com/_sp=136ecff9b65625bf.1504382903200&icid=top_"
r = requests.get(url)

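# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(r.content, 'html.parser')

links = soup.find_all("a")

for link in links:
    # Parenthesized single-argument print works on Python 2.7 and 3
    print("<a href='%s'>%s</a>" % (link.get("href"), link.text))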
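
One possible way to combine the two scrapers above, offered as a minimal sketch rather than a definitive answer: move each scraper into a function that returns a list of URLs, then call the functions in turn from one small runner. The names `fetch_soup`, `scrape_implement`, `scrape_deloitte` and `run_all` are hypothetical and do not appear in the original scripts. The sketch also assumes that collecting the links (instead of printing them, and skipping `<a>` tags without an `href`) is acceptable.

```python
def fetch_soup(url):
    """Download a page and parse it (hypothetical helper, not in the original)."""
    # Imported here so the runner below can be used without these packages.
    import requests
    from bs4 import BeautifulSoup
    return BeautifulSoup(requests.get(url).text, "html.parser")


def scrape_implement():
    """Scraper 1 as a function: Copenhagen career links on the Implement site."""
    base = "http://implementconsultinggroup.com"
    soup = fetch_soup(base + "/career/#/1143")
    urls = []
    for link in soup.select("a"):
        href = link.get("href") or ""  # some <a> tags have no href
        if "career" in href and "COPENHAGEN" in link.text:
            urls.append(base + href)
    return urls


def scrape_deloitte():
    """Scraper 2 as a function: every link on the Deloitte listing page."""
    soup = fetch_soup(
        "http://deloittedk.easycruit.com/_sp=136ecff9b65625bf.1504382903200&icid=top_"
    )
    return [link.get("href") for link in soup.find_all("a") if link.get("href")]


def run_all(scrapers):
    """Run (name, function) pairs in order and collect each list of URLs."""
    results = {}
    for name, scraper in scrapers:
        try:
            results[name] = scraper()
        except Exception as exc:  # keep going if one site fails
            print("%s failed: %s" % (name, exc))
            results[name] = []
    return results
```

Calling `run_all([("implement", scrape_implement), ("deloitte", scrape_deloitte)])` would then return a dictionary mapping each name to its list of URLs. The scrapers run one after another here; running them concurrently would need threads or processes, which this sketch does not attempt.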

0 answers:

No answers