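I am new to web scraping in Python. I want to scrape the first 100 job results from Indeed, but I can only get the results on the first page (i.e. the first 10). I am using BeautifulSoup. Here is my code; can anyone help me solve this?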
import urllib2
from bs4 import BeautifulSoup
import json

URL = "https://www.indeed.co.in/jobs?q=software+developer&l=Bengaluru%2C+Karnataka"
soup = BeautifulSoup(urllib2.urlopen(URL).read(), 'html.parser')
results = soup.find_all('div', attrs={'class': 'jobsearch-SerpJobCard'})

for x in results:
    company = x.find('span', attrs={"class":"company"})
    print 'company:', company.text.strip()
    job = x.find('a', attrs={'data-tn-element': "jobTitle"})
    print 'job:', job.text.strip()
Answer 0 (score: 1)
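You can do this by wrapping your code in a loop over a range, appending a start offset to the URL that increases by 10 for each page: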
from bs4 import BeautifulSoup
import json
import urllib2

# The URL ends with "start=" so the result offset can be appended per page.
URL = "https://www.indeed.co.in/jobs?q=software+developer&l=Bengaluru%2C+Karnataka&start="

# Offsets 0, 10, ..., 90 cover the first 100 results at 10 per page.
for i in range(0 , 100 , 10):
    soup = BeautifulSoup(urllib2.urlopen(URL+str(i)).read(), 'html.parser')
    results = soup.find_all('div', attrs={'class': 'jobsearch-SerpJobCard'})
    for x in results:
        company = x.find('span', attrs={"class":"company"})
        print 'company:', company.text.strip()
        job = x.find('a', attrs={'data-tn-element': "jobTitle"})
        print 'job:', job.text.strip()
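
The code above is Python 2 (urllib2 and the print statement). As a rough sketch of the same pagination idea in Python 3, the page URLs could be generated with urllib.parse.urlencode instead of string concatenation. The function name build_page_urls and its parameters are hypothetical, chosen only for illustration, and the 10-results-per-page step is taken from the answer above rather than from any documented Indeed behaviour.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.indeed.co.in/jobs"


def build_page_urls(query, location, total=100, per_page=10):
    """Return one search URL per results page, using the `start` offset.

    Hypothetical helper: assumes the site pages results in steps of
    `per_page` through a `start` query parameter, as in the answer above.
    """
    urls = []
    for start in range(0, total, per_page):
        # urlencode escapes spaces as "+" and commas as "%2C".
        params = {"q": query, "l": location, "start": start}
        urls.append(BASE_URL + "?" + urlencode(params))
    return urls


urls = build_page_urls("software developer", "Bengaluru, Karnataka")
for url in urls:
    print(url)
```

Each URL in the list can then be fetched and passed to BeautifulSoup exactly as in the loop above (in Python 3, urllib.request.urlopen takes the place of urllib2.urlopen). It may also be worth checking that x.find(...) did not return None before reading .text, since a job card without a company span would otherwise raise an AttributeError.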