Scraping the first 100 job results from Indeed with BeautifulSoup (Python)

Time: 2019-03-11 08:19:01

Tags: python web-scraping beautifulsoup

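I'm new to web scraping with Python. I want to scrape the first 100 job results from Indeed, but I can only scrape the first page of results (i.e., the first 10). I'm using the BeautifulSoup library. Here is my code; can anyone help me solve this?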

from urllib.request import urlopen
from bs4 import BeautifulSoup

URL = "https://www.indeed.co.in/jobs?q=software+developer&l=Bengaluru%2C+Karnataka"
soup = BeautifulSoup(urlopen(URL).read(), 'html.parser')

# Each job on the results page is rendered as one of these cards
results = soup.find_all('div', attrs={'class': 'jobsearch-SerpJobCard'})

for x in results:
    company = x.find('span', attrs={"class": "company"})
    print('company:', company.text.strip())

    job = x.find('a', attrs={'data-tn-element': "jobTitle"})
    print('job:', job.text.strip())

1 answer:

Answer 0 (score: 1)

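Your URL only requests the first results page. If you put your code inside a `range` loop and append the `start` offset to the URL for each page, you can do the following: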

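from urllib.request import urlopen
from bs4 import BeautifulSoup

# Search URL ending in `start=`; the result offset for each page is appended to it
URL = "https://www.indeed.co.in/jobs?q=software+developer&l=Bengaluru%2C+Karnataka&start="

# Step by 10 (one results page), so offsets 0, 10, ..., 90 cover the first 100 results
for i in range(0, 100, 10):
    soup = BeautifulSoup(urlopen(URL + str(i)).read(), 'html.parser')
    results = soup.find_all('div', attrs={'class': 'jobsearch-SerpJobCard'})
    for x in results:
        company = x.find('span', attrs={"class": "company"})
        job = x.find('a', attrs={'data-tn-element': "jobTitle"})
        # Skip cards missing a company or title element instead of crashing on None
        if company is None or job is None:
            continue
        print('company:', company.text.strip())
        print('job:', job.text.strip())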
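A variation on the loop above, sketched under a few assumptions: it builds each page URL with `urllib.parse.urlencode` instead of string concatenation, and stops early when a page comes back empty. `page_urls`, `collect_jobs`, and the `fetch_page` callable are hypothetical names; `fetch_page` stands in for the download-and-parse step from the answer (returning the list of jobs found at one URL), and the page size of 10 is taken from the answer's `range` step, not from any Indeed documentation.

```python
from urllib.parse import urlencode

# Assumption: base URL and 10-results-per-page step taken from the answer above
BASE_URL = "https://www.indeed.co.in/jobs"
PAGE_SIZE = 10


def page_urls(query, location, total=100, page_size=PAGE_SIZE):
    """Yield one search URL per results page, advancing the `start` offset."""
    for start in range(0, total, page_size):
        # urlencode encodes spaces as '+' and ',' as '%2C', matching the original URL
        params = {"q": query, "l": location, "start": start}
        yield BASE_URL + "?" + urlencode(params)


def collect_jobs(fetch_page, urls, limit=100):
    """Gather up to `limit` jobs, stopping early if a page returns nothing.

    `fetch_page` is a hypothetical callable: URL -> list of jobs on that page
    (for example, the BeautifulSoup parsing step from the answer).
    """
    jobs = []
    for url in urls:
        page = fetch_page(url)
        if not page:  # empty page: stop requesting further pages
            break
        jobs.extend(page)
        if len(jobs) >= limit:
            break
    return jobs[:limit]


if __name__ == "__main__":
    urls = list(page_urls("software developer", "Bengaluru, Karnataka"))
    print(urls[0])
    # https://www.indeed.co.in/jobs?q=software+developer&l=Bengaluru%2C+Karnataka&start=0
```

Passing the fetch step in as a function keeps the pagination logic separate from the network and HTML parsing, so it can be checked without hitting the site.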