Unable to get hidden content from a website

Time: 2017-12-20 08:38:21

Tags: beautifulsoup python-3.5 spyder

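I am trying to scrape a website with BeautifulSoup. I cannot get the site's content from the response, even though it shows up in the source when I inspect the page in the browser.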

import requests
import urllib 

from bs4 import BeautifulSoup


url1 = 'https://recruiting.ultipro.com/usg1006/JobBoard/dfc53730-57d1-3460-336f-ddafabd108f3/?q=&o=postedDateDesc'

response1 = requests.get(url1)  # get() alone is undefined; only the requests module is imported

print(response1.text[:500])
html_soup1 = BeautifulSoup(response1.text, 'html.parser')
type(html_soup1)

all_info1 = html_soup1.find("div", {"data-bind": "foreach: opportunities"})
all_info1

all_automation1 = all_info1.find_all("div",{"data-automation":"opportunity"})

all_automation1

The inspected source contains "job-title", "location", "description", and other details, but I cannot see those details in the HTML I get back.

1 Answer:

Answer 0: (score: 0)
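
A likely reason the scrape comes back empty: `data-bind="foreach: opportunities"` looks like Knockout-style binding markup, which would mean the server sends only a template and JavaScript fills in the job rows inside the browser. `requests` does not run JavaScript, so BeautifulSoup only ever sees the unfilled template. Here is a minimal sketch of that situation. The `RAW_HTML` string is a made-up stand-in for the server response (an assumption, not the page's actual markup), and `describe_items` is a hypothetical helper.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for what the server sends before any JavaScript runs.
RAW_HTML = """
<div data-bind="foreach: opportunities">
  <div data-automation="opportunity">
    <h3><a class="opportunity-link" data-bind="text: Title"></a></h3>
  </div>
</div>
"""


def describe_items(html):
    """Return the stripped text of every opportunity block found in html."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", {"data-bind": "foreach: opportunities"})
    if container is None:
        return []
    items = container.find_all("div", {"data-automation": "opportunity"})
    # An unrendered template yields an empty string for each block.
    return [item.get_text(strip=True) for item in items]


texts = describe_items(RAW_HTML)
print(texts)
```

If the strings you get back from the real response are all empty like this, the markup is there but the data is not, and the page has to be rendered before it is parsed.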

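Try something like this, which loads the page in a real browser via Selenium before parsing it, to get the titles from that page: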

import time
from selenium import webdriver
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
driver.get('https://recruiting.ultipro.com/usg1006/JobBoard/dfc53730-57d1-3460-336f-ddafabd108f3/?q=&o=postedDateDesc')
time.sleep(3)       # let the browser load its content
soup = BeautifulSoup(driver.page_source,'lxml')
for item in soup.select("h3 .opportunity-link"):
    print(item.text)
driver.quit()
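
A fixed `time.sleep(3)` can be too short on a slow connection and wasteful on a fast one. As a possible variation (a sketch, not tested against this site), you could poll with Selenium's `WebDriverWait` until at least one title link has text. The function names `fetch_rendered_html` and `extract_titles` and the 10-second timeout are my own choices, and I use `html.parser` instead of `lxml` only to avoid an extra dependency.

```python
from bs4 import BeautifulSoup

LINK_SELECTOR = "h3 .opportunity-link"  # same selector as in the code above


def fetch_rendered_html(url, timeout=10):
    """Load url in Chrome and return the HTML once a job title has text."""
    # Imported here so extract_titles() stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Poll until some link has non-empty text; raises TimeoutException otherwise.
        WebDriverWait(driver, timeout).until(
            lambda d: any(
                el.text.strip()
                for el in d.find_elements(By.CSS_SELECTOR, LINK_SELECTOR)
            )
        )
        return driver.page_source
    finally:
        # Close the browser even if the wait times out.
        driver.quit()


def extract_titles(page_source):
    """Return the text of every title link in the rendered HTML."""
    soup = BeautifulSoup(page_source, "html.parser")
    return [link.get_text(strip=True) for link in soup.select(LINK_SELECTOR)]
```

Calling `extract_titles(fetch_rendered_html(url))` with the job board URL from the question should then give the list of titles, assuming the selector still matches the page.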