I am trying to scrape resumes with BeautifulSoup (bs4), but I have run into a problem. This is the sample site: https://www.indeed.com/resumes?q=java&l=&cb=jt
Here is my code:
import requests
from bs4 import BeautifulSoup

URL = "https://www.indeed.com/resumes?q=java&l=&cb=jt"
page = requests.get(URL)
soup = BeautifulSoup(page.text, 'html.parser')

def scrap_job_title(soup):
    # Collect the title attribute of every result link
    job = []
    for div in soup.find_all(name='li', attrs={'class': 'sre'}):
        for a in div.find_all(name='a', attrs={'class': 'app-link'}):
            job.append(a['title'])
    return job

print(scrap_job_title(soup))
It prints nothing but an empty list: []
As you can see in the picture, I want to get the job title "Java Developer".
Answer 0: (score: 1)
The class is app_link, not app-link. Also, a['title'] will not give you what you want. Use a.contents[0] instead.
import requests
from bs4 import BeautifulSoup

URL = "https://www.indeed.com/resumes?q=java&l=&cb=jt"
page = requests.get(URL)
soup = BeautifulSoup(page.text, 'html.parser')

def scrape_job_title(soup):
    job = []
    for div in soup.find_all(name='li', attrs={'class': 'sre'}):
        # The class uses an underscore: app_link
        for a in div.find_all(name='a', attrs={'class': 'app_link'}):
            # The job title is the link's first child node, not a title attribute
            job.append(a.contents[0])
    return job

print(scrape_job_title(soup))
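To see why the two changes matter without hitting the live site, here is a minimal offline sketch. The HTML snippet is hypothetical: it only assumes that a result looks roughly like an li with class sre containing an a with class app_link and no title attribute, as this answer describes. The real page markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble one search result on the page.
SAMPLE_HTML = """
<ul>
  <li class="sre">
    <a class="app_link" href="/r/123">Java Developer</a>
  </li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The hyphenated class name matches nothing, so the result list stays empty.
wrong = soup.find_all("a", attrs={"class": "app-link"})

# The underscore spelling matches the anchor.
right = soup.find_all("a", attrs={"class": "app_link"})

# a['title'] would raise KeyError here because the tag has no title attribute;
# .get() returns None instead of raising.
title_attr = right[0].get("title")

# The visible text is the tag's first child node.
first_child = right[0].contents[0]

print(len(wrong))    # number of matches for the misspelled class
print(title_attr)    # value of the missing attribute
print(first_child)   # the link text
```

If the link text might be wrapped in nested tags, a.get_text(strip=True) is usually a safer choice than a.contents[0], since contents[0] only returns the first child node.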
Answer 1: (score: 1)
Try this to get all the job titles:
import requests
from bs4 import BeautifulSoup

URL = "https://www.indeed.com/resumes?q=java&l=&cb=jt"
page = requests.get(URL)
soup = BeautifulSoup(page.text, 'html5lib')

for items in soup.select('.sre'):
    data = [item.text for item in items.select('.app_link')]
    print(data)
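The same CSS-selector idea can be sketched as a small function that returns one flat list instead of printing a list per result. This is only an illustration under assumptions: the sample markup and the name extract_titles are hypothetical, and it uses the built-in html.parser so that html5lib does not need to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble the result list on the page.
SAMPLE_HTML = """
<ul>
  <li class="sre"><a class="app_link" href="/r/1"> Java Developer </a></li>
  <li class="sre"><a class="app_link" href="/r/2">Senior Java Engineer</a></li>
  <li class="other"><a class="app_link" href="/r/3">Not a result</a></li>
</ul>
"""

def extract_titles(html):
    """Return the text of every app_link anchor nested inside an li.sre element."""
    soup = BeautifulSoup(html, "html.parser")
    # A descendant selector replaces the nested loops; strip=True trims whitespace.
    return [a.get_text(strip=True) for a in soup.select("li.sre a.app_link")]

titles = extract_titles(SAMPLE_HTML)
print(titles)
```

To run it against the live page, pass page.text from requests.get(URL) to extract_titles. Whether the server returns the same markup to a script as it does to a browser is not something this sketch can guarantee.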