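I am trying to get the highlighted dates from the calendar at https://web.archive.org/web/20110101000000*/area51.stackexchange.com. I can see the "calendar-day" class in the Chrome inspector, but it does not show up in the page source. I have also tried looking up other class elements such as "month-week", without success. Can anyone help me diagnose what is wrong? I looked into Shadow DOM, but that does not seem to be the issue here (though I may be wrong).
In addition, I am trying to get the URL "/web/20110430/area51.stackexchange.com", but I don't know how to locate it by class, tag name, CSS, or XPath.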
driver = webdriver.Firefox()
driver.get("https://web.archive.org/web/20110101000000*/area51.stackexchange.com")
element=driver.find_element_by_class_name("calendar-day")
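Thanks!
Answer 0 (score: 1)
Just wait for the calendar's div element to appear and then print it. Your class name has an extra space, and the page needs some time after it loads before the calendar is there.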
element=WebDriverWait(driver, 20).until(EC.presence_of_element_located((By.XPATH, "//div[contains(@class,'calendar-day')]")))
print(element.text)
Output: 11
To grab multiple elements:
elements=WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[contains(@class,'calendar-day')]")))
for element in elements:
    print(element.text)
Output: 11 30 7 16 18 10 11 7 12
Imports:
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
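For the second part of the question (the "/web/20110430/area51.stackexchange.com" URL): if the located calendar element wraps a link, its href could presumably be read with get_attribute("href") on that anchor, though this was not checked against the page. Below is a minimal standard-library sketch for splitting such a path into a date and the archived address. The names parse_snapshot and SNAPSHOT_RE are hypothetical, and the pattern assumes an 8-digit date optionally followed by a 6-digit time.

```python
import re
from datetime import datetime

# Assumed shape: /web/<YYYYMMDD>[hhmmss]/<original address>.
# Calendar URLs that carry a "*" after the timestamp do not match.
SNAPSHOT_RE = re.compile(r"/web/(\d{8})(\d{6})?/(.+)$")


def parse_snapshot(href):
    """Return (date, original_address) for a snapshot href, or None if it does not match."""
    match = SNAPSHOT_RE.search(href)
    if match is None:
        return None
    date_part, _time_part, original = match.groups()
    return datetime.strptime(date_part, "%Y%m%d").date(), original


print(parse_snapshot("/web/20110430/area51.stackexchange.com"))
```

Returning None for anything that does not look like a snapshot keeps the helper safe to run over every link on the page.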
Answer 1 (score: 1)
The page you are working with takes a while to load, so it is better to introduce an explicit wait before extracting anything.
A sample script could be the one below. To extract multiple elements, only the wait condition needs to change, to the presence_of_all_elements_located call given after the script.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
driver = webdriver.Firefox()
driver.get("https://web.archive.org/web/20110101000000*/area51.stackexchange.com")
wait = WebDriverWait(driver, 10)
element = wait.until(EC.presence_of_element_located((By.CLASS_NAME, "calendar-day")))
print(element.text)
driver.quit()
The sample script can therefore be updated by replacing its wait line with:
elements = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CLASS_NAME, "calendar-day")))
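To make the idea of an explicit wait concrete, here is a simplified, standard-library sketch of the polling pattern. It is not Selenium's implementation: wait_until, WaitTimeout, and calendar_days are hypothetical names, and calendar_days only imitates a page whose calendar shows up on the third check.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never becomes truthy (hypothetical name)."""


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise WaitTimeout(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Stand-in for a page whose calendar appears only on the third check.
attempts = {"count": 0}


def calendar_days():
    attempts["count"] += 1
    return ["11", "30", "7"] if attempts["count"] >= 3 else []


days = wait_until(calendar_days, timeout=5.0, poll_interval=0.01)
print(days)
```

An immediate find_element call corresponds to checking the condition once, which is why it fails on a page that fills in its calendar after loading.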