Scraping data with BeautifulSoup

Date: 2018-02-06 04:06:24

Tags: python beautifulsoup

I'm trying to use bs (BeautifulSoup) to scrape resumes, but I'm running into some problems. Here is the example site: https://www.indeed.com/resumes?q=java&l=&cb=jt

Here is my code:

import requests
from bs4 import BeautifulSoup

URL = "https://www.indeed.com/resumes?q=java&l=&cb=jt"
page = requests.get(URL)
soup = BeautifulSoup(page.text, 'html.parser')

def scrap_job_title(soup): 
    job = []
    for div in soup.find_all(name='li', attrs={'class':'sre'}):
        for a in div.find_all(name='a', attrs={'class':'app-link'}):
            job.append(a['title'])
        return(job)
scrap_job_title(soup)

It returns nothing but an empty list: []


As you can see in the image, I want to get the job title "Java Developer".
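A quick way to see why the loop comes back empty is to print the class names BeautifulSoup actually parsed and compare them with the one being searched for. The sketch below runs on a hypothetical snippet (`SAMPLE_HTML`) that is only assumed to resemble one result on Indeed's page; it is not the real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble one search result:
# an <li class="sre"> wrapping an <a class="app_link"> whose text is the title.
SAMPLE_HTML = """
<ul>
  <li class="sre"><a class="app_link" href="/r/abc">Java Developer</a></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# "class" is multi-valued, so BeautifulSoup returns it as a list of names.
for a in soup.find_all("a"):
    print(a.get("class"))  # ['app_link']

# A hyphenated class name matches nothing, so the result list stays empty.
print(soup.find_all("a", class_="app-link"))       # []
print(len(soup.find_all("a", class_="app_link")))  # 1
```

Note also that in the code above, `return(job)` is indented inside the outer `for` loop, so even with the correct class name the function would return after the first `<li>` it visits.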

2 Answers:

Answer 0 (score: 1)

The class is app_link, not app-link. Also, a['title'] won't get you what you want. Use a.contents[0] instead:

import requests
from bs4 import BeautifulSoup

URL = "https://www.indeed.com/resumes?q=java&l=&cb=jt"
page = requests.get(URL)
soup = BeautifulSoup(page.text, 'html.parser')

def scrape_job_title(soup):
    job = []
    for div in soup.find_all(name='li', attrs={'class':'sre'}):
        for a in div.find_all(name='a', attrs={'class':'app_link'}):
            job.append(a.contents[0])  # first child of <a>: the title text
    return job  # after both loops, so every <li> is visited

print(scrape_job_title(soup))
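To see why `a['title']` does not help, the sketch below reads the same link three ways. It uses hypothetical markup that is assumed to have no `title` attribute on the link; `get_text(strip=True)` is shown as an alternative to `a.contents[0]` and is not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the link carries text but no title attribute.
SAMPLE_HTML = '<li class="sre"><a class="app_link" href="/r/abc"> Java Developer </a></li>'

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
a = soup.find("a", class_="app_link")

# a['title'] raises KeyError when the attribute is missing; .get() returns None.
print(a.get("title"))  # None

# contents[0] is the first child node: here the raw text, spaces included.
print(a.contents[0])

# get_text(strip=True) joins all descendant text and trims the whitespace.
print(a.get_text(strip=True))  # Java Developer
```

`contents[0]` only returns the title text when that text is the link's first child; if the link started with a nested tag such as `<b>`, it would return that tag instead, which is why `get_text()` tends to be the sturdier choice.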

Answer 1 (score: 1)

Try this to get all the job titles:

import requests
from bs4 import BeautifulSoup

URL = "https://www.indeed.com/resumes?q=java&l=&cb=jt"
page = requests.get(URL)
soup = BeautifulSoup(page.text, 'html5lib')

for items in soup.select('.sre'):
    data = [item.text for item in items.select('.app_link')]
    print(data)
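The loop above prints one list per result. If a single flat list of titles is preferred, one CSS descendant selector can apply both class filters at once. This sketch runs on hypothetical markup (assumed, not Indeed's real page) and uses the built-in `html.parser` instead of `html5lib`, which is a separate package that must be installed; the function name `job_titles` is illustrative only.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real results page.
SAMPLE_HTML = """
<ul>
  <li class="sre"><a class="app_link" href="/r/1">Java Developer</a></li>
  <li class="sre"><a class="app_link" href="/r/2">Senior Java Engineer</a></li>
  <li class="other"><a class="app_link" href="/r/3">Not a result</a></li>
</ul>
"""

def job_titles(html):
    """Return the text of every a.app_link nested inside an li.sre (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    # "li.sre a.app_link" matches app_link anchors only when they sit under an li.sre.
    return [a.get_text(strip=True) for a in soup.select("li.sre a.app_link")]

print(job_titles(SAMPLE_HTML))  # ['Java Developer', 'Senior Java Engineer']
```

The third `<li>` is skipped because its class is not `sre`, even though its link has the `app_link` class.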