How do I get the next "href" for pagination?

Asked: 2017-07-09 02:56:30

Tags: html python-3.x beautifulsoup

I can't get the href link for the next page of the URL. I can retrieve all the text and everything the tag contains, but I can't seem to strip out the text I don't need and just grab the href so I can move through the pages.

Here is my code:

import requests
from bs4 import BeautifulSoup
import webbrowser
import time

jobsearch = input("What type of job?: ")
location = input("What is your location: ")
url = ("https://ca.indeed.com/jobs?q=" + jobsearch + "&l=" + location)
base_url = 'https://ca.indeed.com/'

r = requests.get(url)
rcontent = r.content
prettify = BeautifulSoup(rcontent, "html.parser")

filter_words = ['engineering', 'instrumentation', 'QA']
all_job_url = []
nextpages = []
filtered_job_links = []
http_flinks = []
flinks = []

def all_next_pages():
    pages = prettify.find_all('div', {'class':'pagination'})
    for next_page in pages:
        next_page.find_all('a')
        nextpages.append(next_page)
        print(next_page)

all_next_pages()

1 Answer:

Answer 0 (score: 1)

Here is how to get the links for the search-result items. Find the elements with the `row result` class, then find the `a` tag inside each one; it holds all the information you need.

import requests
from bs4 import BeautifulSoup
import webbrowser
import time

jobsearch = input("What type of job?: ")
location = input("What is your location: ")
url = ("https://ca.indeed.com/jobs?q=" + jobsearch + "&l=" + location)
base_url = 'https://ca.indeed.com/'

r = requests.get(url)
rcontent = r.text
prettify = BeautifulSoup(rcontent, "lxml")

filter_words = ['engineering', 'instrumentation', 'QA']
all_job_url = []
nextpages = []
filtered_job_links = []
http_flinks = []
flinks = []

def all_next_pages():
    # each search result is a <div class="  row  result"> block
    pages = prettify.find_all('div', {'class':'  row  result'})
    for next_page in pages:
        info = next_page.find('a')   # the first <a> holds the job link
        url = info.get('href')       # note: this href is relative to base_url
        title = info.get('title')
        print(title, url)

all_next_pages()
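To address the original pagination question directly: the `div` with class `pagination` holds `a` tags whose `href` attributes are relative URLs, so you can collect them and join each one with the base URL. The sketch below runs against a simplified, assumed copy of Indeed's pagination markup (not taken from a live page), so the class names and URL shapes are illustrative only.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# Simplified stand-in for Indeed's pagination block (assumed structure).
html = """
<div class="pagination">
  <a href="/jobs?q=QA&amp;start=10"><span class="pn">2</span></a>
  <a href="/jobs?q=QA&amp;start=20"><span class="pn">3</span></a>
  <a href="/jobs?q=QA&amp;start=10"><span class="pn">Next</span></a>
</div>
"""

base_url = 'https://ca.indeed.com/'
soup = BeautifulSoup(html, "html.parser")

def next_page_links(soup):
    """Return absolute URLs for every link in the pagination block."""
    links = []
    pagination = soup.find('div', {'class': 'pagination'})
    if pagination is None:          # last results page has no pagination div
        return links
    for a in pagination.find_all('a'):
        href = a.get('href')        # relative href, e.g. /jobs?q=QA&start=10
        if href:
            links.append(urljoin(base_url, href))
    return links

for link in next_page_links(soup):
    print(link)
```

With a real page you would pass `BeautifulSoup(r.text, "html.parser")` instead of the sample markup, then request each returned URL in turn to walk through the result pages.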