Question

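I am trying to scrape a website with BeautifulSoup. My script can't get the site's content, but when I inspect the site in the browser, the content is there in the source.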

import requests
from bs4 import BeautifulSoup


url1 = 'https://recruiting.ultipro.com/usg1006/JobBoard/dfc53730-57d1-3460-336f-ddafabd108f3/?q=&o=postedDateDesc'

# Fetch the page and show the start of the returned HTML
response1 = requests.get(url1)
print(response1.text[:500])

html_soup1 = BeautifulSoup(response1.text, 'html.parser')

# Container that should hold the job list
all_info1 = html_soup1.find("div", {"data-bind": "foreach: opportunities"})
print(all_info1)

# Individual job entries (find() returns None if the container is missing)
all_automation1 = all_info1.find_all("div", {"data-automation": "opportunity"}) if all_info1 else []
print(all_automation1)

In the browser's source there are "job-title", "location", "description" and other details, but I can't see the same details in the HTML content my script gets back.
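The `data-bind="foreach: opportunities"` attribute looks like Knockout.js binding syntax, which suggests the job list is filled in by JavaScript in the browser. `requests` only receives the HTML as it is before any script runs, while the browser's inspector shows the DOM after the scripts have run. Below is a minimal stdlib sketch for telling the two apart. It collects the text of every element with the `opportunity-link` class (the class the answer's selector targets) from an HTML string. The names `OpportunityLinkCollector`, `opportunity_titles`, `RAW` and `RENDERED` are hypothetical, and the two sample strings only illustrate the template pattern. They are not the site's real markup.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not change the nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class OpportunityLinkCollector(HTMLParser):
    """Collect the text inside every element whose class list contains
    "opportunity-link"."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._depth = 0       # > 0 while inside an opportunity-link element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1  # nested element inside the link
            return
        classes = (dict(attrs).get("class") or "").split()
        if "opportunity-link" in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            self.titles.append("".join(self._buffer).strip())

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def opportunity_titles(html):
    """Return the text of each opportunity-link. Empty strings mean the
    element exists only as an unrendered template."""
    parser = OpportunityLinkCollector()
    parser.feed(html)
    parser.close()
    return parser.titles


# Hypothetical markup: a Knockout-style template before JavaScript runs,
# versus the same list after two jobs have been rendered into it.
RAW = ('<div data-bind="foreach: opportunities">'
       '<div data-automation="opportunity"><h3>'
       '<a class="opportunity-link" data-bind="text: Title"></a>'
       '</h3></div></div>')
RENDERED = ('<div data-bind="foreach: opportunities">'
            '<div data-automation="opportunity"><h3>'
            '<a class="opportunity-link">Data Analyst</a></h3></div>'
            '<div data-automation="opportunity"><h3>'
            '<a class="opportunity-link">QA Engineer</a></h3></div></div>')

print(opportunity_titles(RAW))       # [''] -> template only, no job data
print(opportunity_titles(RENDERED))  # ['Data Analyst', 'QA Engineer']
```

Passing `response1.text` to `opportunity_titles` and getting only empty strings or an empty list would be consistent with a client-rendered page. The actual markup the site returns has not been checked here.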

Answer 1

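You should try something like this to get the titles from that page. It uses Selenium, which loads the page in a real browser so the page's JavaScript runs before you read the HTML: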

import time
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get('https://recruiting.ultipro.com/usg1006/JobBoard/dfc53730-57d1-3460-336f-ddafabd108f3/?q=&o=postedDateDesc')
time.sleep(3)       # let the browser load its content
soup = BeautifulSoup(driver.page_source,'lxml')
for item in soup.select("h3 .opportunity-link"):
    print(item.text)
driver.quit()
