Beautiful Soup: Glassdoor pages

Time: 2020-08-01 03:51:54

Tags: python beautifulsoup

I have a Glassdoor link that I'm trying to access with requests.get(): https://www.glassdoor.com/Job/jobs.htm?sc.generalKeyword=%22teaching%22&sc.locationSeoString=new+york&locId=1132348&locT=C

I noticed that when I click through to the next page, lo_IP{page_number}.htm is appended to the URL. For example, page 4 is: https://www.glassdoor.com/Job/jobs.htm?sc.generalKeyword=%22teaching%22&sc.locationSeoString=new+york&locId=1132348&lo_IP4.htm

But when I open that link directly (e.g. the one for page 4), it doesn't take me to page 4. Is there a way to go to the nth page?

    import requests
    from bs4 import BeautifulSoup

    pages = 2
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36"}

    # range() excludes its upper bound, so use pages + 1 to fetch pages 1..pages
    for x in range(1, pages + 1):
        # Append lo_IP{x}.htm, as seen in the address bar after clicking "next"
        page_url = "https://www.glassdoor.com/Job/jobs.htm?sc.generalKeyword=%22teaching%22&sc.locationSeoString=new+york&locId=1132348&lo_IP{}.htm".format(x)
        page = requests.get(page_url, headers=headers)
        soup = BeautifulSoup(page.content, 'html.parser')

1 answer:

Answer 0 (score: 1)

Look at the page's pagination markup:

    <li class="page">
        <a href="/Job/jobs.htm?sc.generalKeyword=%22teaching%22&amp;sc.locationSeoString=new+york&amp;locId=1132348&amp;locT=C&amp;p=4">
            <span class="link">4</span>
        </a>
    </li>

The link https://www.glassdoor.com/Job/jobs.htm?sc.generalKeyword=%22teaching%22&sc.locationSeoString=new+york&locId=1132348&locT=C&p=4 takes you to page 4.
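Rather than guessing page numbers, you can read the `p` values straight out of the pagination links. Below is a minimal sketch using Beautiful Soup on the snippet above. It assumes the `li.page > a` structure shown there is what Glassdoor serves (that was the markup in 2020 and may have changed since), and the helper name `page_numbers` is purely illustrative.

```python
from urllib.parse import parse_qs, urlparse

from bs4 import BeautifulSoup

# The pagination snippet quoted above (Glassdoor's 2020 markup; it may differ today).
SNIPPET = """
<li class="page">
    <a href="/Job/jobs.htm?sc.generalKeyword=%22teaching%22&amp;sc.locationSeoString=new+york&amp;locId=1132348&amp;locT=C&amp;p=4">
        <span class="link">4</span>
    </a>
</li>
"""


def page_numbers(html):
    """Hypothetical helper: return the `p` query value of every pagination link."""
    soup = BeautifulSoup(html, "html.parser")
    pages = []
    for a in soup.select("li.page a[href]"):
        # The parser has already decoded &amp; to &, so the href parses as a normal URL.
        query = parse_qs(urlparse(a["href"]).query)
        if "p" in query:
            pages.append(int(query["p"][0]))
    return pages


print(page_numbers(SNIPPET))  # [4]
```

On a real results page you would pass `page.content` from `requests.get()` instead of `SNIPPET`. The largest number returned then tells you how many pages the loop needs to cover.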

Following the same logic, &p=n goes to page n. So to get page n:

    url = f'https://www.glassdoor.com/Job/jobs.htm?sc.generalKeyword=%22teaching%22&sc.locationSeoString=new+york&locId=1132348&locT=C&p={n}'
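You can also build the page-n URL from a dictionary of query parameters instead of hand-quoting it. Below is a minimal sketch using the standard library's `urlencode`. `BASE_URL`, `SEARCH_PARAMS` and `page_url` are hypothetical names, and the parameter values are copied from the question's URL. It only constructs URLs; whether Glassdoor still honours `p=n` today has not been checked.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.glassdoor.com/Job/jobs.htm"

# Search parameters copied from the question's URL; "p" (added per page) selects the page.
SEARCH_PARAMS = {
    "sc.generalKeyword": '"teaching"',
    "sc.locationSeoString": "new york",
    "locId": 1132348,
    "locT": "C",
}


def page_url(n):
    """Hypothetical helper: URL for results page n, i.e. the search query plus p=n."""
    # urlencode quotes values for us: '"' becomes %22 and spaces become +.
    return BASE_URL + "?" + urlencode({**SEARCH_PARAMS, "p": n})


for n in range(1, 4):
    print(page_url(n))
```

With requests, `requests.get(BASE_URL, params={**SEARCH_PARAMS, "p": n}, headers=headers)` does the same encoding for you.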

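The original site works through JavaScript: it requests the data in the background and then updates the URL and the page. So https://www.glassdoor.com/Job/jobs.htm?sc.generalKeyword=%22teaching%22&sc.locationSeoString=new+york&locId=1132348&lo_IP4.htm is just what the script writes into the address bar. It is not a URL that loads page 4 when requested directly.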
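One way to see why the address-bar URL fails on its own is to parse both URLs the way a standard query-string parser would. The sketch below uses Python's `parse_qs`. It assumes the server reads the page from a `p` parameter, as the pagination link suggests; Glassdoor's server code is not visible, so this shows only what a typical parser extracts. `requested_page` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlparse

# URL shown in the address bar after clicking "page 4" (from the question).
ADDRESS_BAR_URL = ("https://www.glassdoor.com/Job/jobs.htm?sc.generalKeyword=%22teaching%22"
                   "&sc.locationSeoString=new+york&locId=1132348&lo_IP4.htm")
# URL taken from the pagination link's href (from the answer).
PAGINATION_URL = ("https://www.glassdoor.com/Job/jobs.htm?sc.generalKeyword=%22teaching%22"
                  "&sc.locationSeoString=new+york&locId=1132348&locT=C&p=4")


def requested_page(url):
    """Hypothetical helper: the page number in the `p` query parameter, or None."""
    values = parse_qs(urlparse(url).query).get("p")
    return int(values[0]) if values else None


print(requested_page(ADDRESS_BAR_URL))  # None: "lo_IP4.htm" has no "=", so parse_qs drops it
print(requested_page(PAGINATION_URL))   # 4
```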