I'm trying to scrape a website with BeautifulSoup. I can't get the page's content, even though it shows up in the source when I inspect the page.
import requests
from bs4 import BeautifulSoup
url1 = 'https://recruiting.ultipro.com/usg1006/JobBoard/dfc53730-57d1-3460-336f-ddafabd108f3/?q=&o=postedDateDesc'
response1 = requests.get(url1)  # get() was undefined; it lives in the requests module
print(response1.text[:500])
html_soup1 = BeautifulSoup(response1.text, 'html.parser')
print(type(html_soup1))
# Container that should hold the job listings
all_info1 = html_soup1.find("div", {"data-bind": "foreach: opportunities"})
print(all_info1)
# find() returns None when nothing matches, so check before calling find_all()
if all_info1 is not None:
    all_automation1 = all_info1.find_all("div", {"data-automation": "opportunity"})
    print(all_automation1)
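In the inspected source there are "job-title", "location", "description" and other details, but I can't see the same details in the HTML content that requests returns.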
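To see why the parse comes back empty, here is a minimal offline sketch. The HTML string is a hypothetical, simplified stand-in for what requests might receive from a Knockout.js page (it is not copied from the real UltiPro page, and the attribute values are assumptions): the foreach container holds a single template card whose text is supplied by data-bind attributes at runtime, so BeautifulSoup finds the markup but no job data.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified server response for a Knockout.js job board.
# The real page's markup may differ; only the pattern matters here.
RAW_HTML = """
<div data-bind="foreach: opportunities">
  <div data-automation="opportunity">
    <h3><a class="opportunity-link" data-automation="job-title"
           data-bind="text: Title, attr: { href: Url }"></a></h3>
    <span data-automation="location" data-bind="text: Location"></span>
  </div>
</div>
"""


def describe_template(html):
    """Return (number of opportunity cards, list of their visible texts)."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"data-bind": "foreach: opportunities"})
    if container is None:
        return 0, []
    cards = container.find_all("div", {"data-automation": "opportunity"})
    # get_text(strip=True) drops whitespace-only strings, so an unfilled card yields ""
    texts = [card.get_text(strip=True) for card in cards]
    return len(cards), texts


count, texts = describe_template(RAW_HTML)
print(count, texts)  # one template card, and its text is empty
```

In the browser, Knockout evaluates the data-bind attributes and clones the card once per job; without a JavaScript engine, requests only ever sees the empty template.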
Answer (score: 0)
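The job listings are filled in by JavaScript after the page loads (data-bind="foreach: opportunities" is a Knockout.js binding), so requests only receives the unrendered template. Try something like this, which renders the page in a real browser with Selenium before parsing, to get the titles from that page: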
import time
from selenium import webdriver
from bs4 import BeautifulSoup
driver = webdriver.Chrome()
driver.get('https://recruiting.ultipro.com/usg1006/JobBoard/dfc53730-57d1-3460-336f-ddafabd108f3/?q=&o=postedDateDesc')
time.sleep(3)  # give the browser time to run the page's JavaScript and load its content
soup = BeautifulSoup(driver.page_source, 'lxml')
for item in soup.select("h3 .opportunity-link"):
    print(item.text)
driver.quit()
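A fixed time.sleep(3) can be too short on a slow connection and wastes time on a fast one. A minimal alternative sketch, assuming Selenium 4 and the same "h3 .opportunity-link" selector as above, waits until at least one title link is present and then hands the HTML to a separate parsing function that can be tested without a browser. The 15-second timeout and the function names are illustrative choices, not part of the original answer.

```python
from bs4 import BeautifulSoup

URL = ("https://recruiting.ultipro.com/usg1006/JobBoard/"
       "dfc53730-57d1-3460-336f-ddafabd108f3/?q=&o=postedDateDesc")
TITLE_SELECTOR = "h3 .opportunity-link"


def extract_titles(html):
    """Parse rendered HTML and return the stripped text of each title link."""
    soup = BeautifulSoup(html, "html.parser")
    return [a.get_text(strip=True) for a in soup.select(TITLE_SELECTOR)]


def fetch_rendered_html(url, timeout=15):
    """Open the page in Chrome and wait for the job titles to be rendered."""
    # Imported here so extract_titles() can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one title link exists, or raise TimeoutException.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, TITLE_SELECTOR))
        )
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even if the wait times out
```

Usage: loop over extract_titles(fetch_rendered_html(URL)) and print each title. Keeping parsing separate from fetching means the selector logic can be checked against a saved copy of the page.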