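(New to Python, and this is my first post)
See the code below. Here is the problem: I'm trying to scrape all the job titles on the page, but when I print the list I get no values. I've tried different XPaths to see whether I can print anything at all, but the list is empty every time.
Does anyone know whether there's a problem with my code, or something about the site's structure that I haven't considered?
Thanks in advance!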
from lxml import html
import requests
page = requests.get("https://careers.homedepot.com/job-search-results/?location=Atlanta%2C%20GA%2C%20United%20States&latitude=33.7489954&longitude=-84.3879824&radius=15&parent_category=Corporate%2FOther")
tree = html.fromstring(page.content)
Job_Title = tree.xpath('//*[@id="widget-jobsearch-results-list"]/div/div/div/div[@class="jobTitle"]/a/text()')
print(Job_Title)
Answer 0 (score: 1)
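The information you're looking for is generated dynamically with JavaScript, whereas requests only fetches the initial HTML page source.
You will probably need selenium (plus chromedriver) to get the data you want: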
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get("https://careers.homedepot.com/job-search-results/?location=Atlanta%2C%20GA%2C%20United%20States&latitude=33.7489954&longitude=-84.3879824&radius=15&parent_category=Corporate%2FOther")
# The job links are injected by JavaScript; their ids start with "job-results"
xpath = "//a[starts-with(@id, 'job-results')]"
# Wait up to 10 seconds for at least one job link to become clickable
wait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, xpath)))
# Selenium 4 style; the older find_elements_by_xpath helper has been removed
jobs = [job.text for job in driver.find_elements(By.XPATH, xpath)]
print(jobs)
driver.quit()
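If you would rather keep your original lxml parsing, one option is to let Selenium render the page and then hand the rendered HTML (driver.page_source) to lxml. A minimal sketch, assuming the job links really do carry ids starting with "job-results" as in the answer above; the helper name extract_job_titles is hypothetical:

```python
from typing import List

from lxml import html

# XPath taken from the answer above; assumed to match the job links
# only after JavaScript has rendered the results list.
JOB_LINK_XPATH = "//a[starts-with(@id, 'job-results')]"


def extract_job_titles(page_source: str) -> List[str]:
    """Parse already-rendered HTML and return the stripped text of each job link."""
    tree = html.fromstring(page_source)
    return [link.text_content().strip() for link in tree.xpath(JOB_LINK_XPATH)]
```

After the wait call, titles = extract_job_titles(driver.page_source) should give the same list as the Selenium-only version, while keeping the parsing code testable without a browser.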
Answer 1 (score: 1)
Try a library that can execute JavaScript (dryscrape is a lightweight alternative).
Here is a code example:
from lxml import html
import dryscrape

session = dryscrape.Session()
# dryscrape drives a headless WebKit browser, so the page's JavaScript runs
session.visit("https://careers.homedepot.com/job-search-results/?location=Atlanta%2C%20GA%2C%20United%20States&latitude=33.7489954&longitude=-84.3879824&radius=15&parent_category=Corporate%2FOther")
# body() returns the rendered HTML as a string, so there is no .content attribute
page = session.body()
tree = html.fromstring(page)
Job_Title = tree.xpath('//*[@id="widget-jobsearch-results-list"]/div/div/div/div[@class="jobTitle"]/a/text()')
print(Job_Title)
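Unlike the Selenium answer, this snippet reads session.body() immediately, so if the results list is filled in asynchronously it may still be empty at that moment. A generic polling loop is one way to wait for it. This is a plain-Python sketch, not part of dryscrape's API; poll_until and its parameters are hypothetical names, and whether dryscrape needs this at all on this site is an assumption:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(fetch: Callable[[], T], is_ready: Callable[[T], bool],
               timeout: float = 10.0, interval: float = 0.5) -> T:
    """Call fetch() repeatedly until is_ready(result) is true, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_ready(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)  # give the page's JavaScript time to run
```

With dryscrape this could look like page = poll_until(session.body, lambda body: bool(html.fromstring(body).xpath(XPATH))), where XPATH is whatever expression matches the job titles.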
Answer 2 (score: 0)
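The page builds its HTML (the results table) with JavaScript. In other words, the target block does not exist as HTML in the page that requests downloads. Open the page source and check it yourself: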
<div class="entry-content-wrapper clearfix">
<div id="widget-jobsearch-results-list"></div> <!-- Target block is empty! -->
<div id="widget-jobsearch-results-pages"></div>
</div>
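The same check can be done in code instead of by eye. A minimal sketch using lxml; container_is_empty is a hypothetical helper, and it only inspects the HTML you pass in (for example requests.get(url).content), so it tells you whether the container arrives empty before any JavaScript runs:

```python
from lxml import html


def container_is_empty(page_source, container_id: str) -> bool:
    """Return True if the element with this id exists but has no child elements or text."""
    matches = html.fromstring(page_source).xpath(f'//*[@id="{container_id}"]')
    if not matches:
        raise ValueError(f"no element with id {container_id!r}")
    element = matches[0]
    return len(element) == 0 and not (element.text or "").strip()
```

If container_is_empty(page.content, "widget-jobsearch-results-list") returns True for the downloaded page, no XPath into that block can match, which is why the list in the question is always empty.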