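I've written a Python script that uses Selenium to parse some names from a lazy-loading webpage, which reveals more of its content each time you scroll to the bottom. The script runs without errors. The one problem I can't solve is getting rid of the hardcoded delay. I don't see how to use an explicit wait instead of the hardcoded delay while keeping the logic of the script as it is, so that it runs more efficiently. Thanks in advance for any help.
This is what I've tried so far (working version):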
import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("find_the_link_above")
last_len = len(driver.find_elements_by_class_name("listing__name--link"))
new_len = last_len
while True:
    last_len = new_len
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    time.sleep(3)  # I wish to kick out this hardcoded delay and use an explicit wait in its place
    items = driver.find_elements_by_class_name("listing__name--link")
    new_len = len(items)
    if last_len == new_len:
        break
for item in items:
    print(item.text)
driver.quit()
Answer 0 (score: 1)
This is how you can implement an explicit wait:
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.common.exceptions import TimeoutException

driver = webdriver.Chrome()
driver.get("https://www.yellowpages.ca/search/si/1/coffee/all%20states")
# Collect the initial items so `items` is defined even if the first wait times out
items = driver.find_elements_by_class_name("listing__name--link")
last_len = len(items)
while True:
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    try:
        # Wait up to 3 seconds for more items than before to be present
        wait(driver, 3).until(lambda d: len(d.find_elements_by_class_name("listing__name--link")) > last_len)
        items = driver.find_elements_by_class_name("listing__name--link")
        last_len = len(items)
    except TimeoutException:
        break  # no new items appeared in time, so the bottom has been reached
for item in items:
    print(item.text)
driver.quit()
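On each iteration this scrolls down and waits up to 3 seconds (increase the timeout if needed) for the number of elements to grow. If the count stays the same, the resulting TimeoutException breaks out of the while loop.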
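To make the idea behind the explicit wait more concrete, here is a minimal sketch in plain Python of a polling wait and of the same scroll loop built on it. It is only an illustration, not Selenium's implementation: `wait_until`, `FakePage`, and `load_all` are hypothetical names, and `FakePage` merely simulates a page that loads a few more items on each scroll.

```python
import time


def wait_until(condition, timeout=3.0, poll=0.1):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper that illustrates an explicit wait; not Selenium code.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = condition()
        if value:
            return value  # return as soon as the condition holds
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


class FakePage:
    """Stand-in for a lazy-loading page: each scroll adds items up to a cap."""

    def __init__(self, batch=3, total=8):
        self.batch, self.total, self.loaded = batch, total, batch

    def scroll(self):
        self.loaded = min(self.loaded + self.batch, self.total)


def load_all(page, timeout=0.3):
    """Scroll until the item count stops growing, mirroring the loop above."""
    last_len = page.loaded
    while True:
        page.scroll()
        try:
            wait_until(lambda: page.loaded > last_len, timeout=timeout, poll=0.05)
        except TimeoutError:
            break  # nothing new appeared in time: assume the end was reached
        last_len = page.loaded
    return last_len


if __name__ == "__main__":
    print(load_all(FakePage()))
```

Unlike a fixed `time.sleep(3)`, the wait returns as soon as the condition becomes true, so the full timeout is only spent once, on the final scroll that loads nothing new.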
Answer 1 (score: 0)
To parse the names from the webpage you can use the following code block:
Code Block:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
options.add_argument("--no-sandbox")
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\path\to\chromedriver.exe')
driver.get('https://www.yellowpages.ca/search/si/1/coffee/all%20states')
items = driver.find_elements_by_css_selector("h3[itemprop='name']>a.listing__name--link")
# Caveat: this scrollTo script has no `return`, so execute_script() yields None
# and the loop body never runs; only the initially loaded names are printed.
while driver.execute_script("window.scrollTo(0, document.body.scrollHeight);"):
    items.append(driver.find_elements_by_css_selector("h3[itemprop='name']>a.listing__name--link"))
for item in items:
    print(item.text)
Console Output:
Tim Hortons
Downtown Expresso Café
Tim Hortons
Tim Hortons
Tim Hortons
Starbucks
Tim Hortons
Tim Hortons
Tim Hortons
Tim Hortons
Tim Hortons
Tim Hortons
Tim Hortons
Starbucks
Tim Hortons
Tim Hortons
Budokan
Anchor Cafe House
Starbucks
Tim Hortons
Tim Hortons
Starbucks
Tim Hortons
Starbucks
Tim Hortons
Tim Hortons
Colonial Coffee Co Ltd
Personal Service Coffee
Tim Hortons
Suzie's Grill Cafe Inc
Loaves N Fishes Catering & Cafe
Tim Hortons
Tim Hortons
Tim Hortons
Tim Hortons
Elizabeth Houte Coiffure
The Grind House Cafe
Tim Hortons
Black Bench Coffee Roasters
Tim Hortons
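One detail of the code block in this answer is worth spelling out: `items.append(...)` is given the whole list returned by the element lookup, and `list.append` adds that list as a single nested element rather than adding its members. The sketch below shows the difference with plain strings standing in for elements; the `batches` data is made up for illustration only.

```python
# Each inner list stands in for the result of one element lookup (made-up data).
batches = [["Tim Hortons", "Starbucks"], ["Budokan"]]

nested = []
for batch in batches:
    nested.append(batch)  # adds the list itself as one element

flat = []
for batch in batches:
    flat.extend(batch)  # adds each member of the list

print(len(nested))  # number of lists collected
print(flat)         # flat list of names
```

Iterating over `nested` yields lists, which have no `.text` attribute, whereas `flat` yields the individual entries. In the Selenium case each lookup returns every matching element currently in the DOM, so simply reassigning `items` after the last scroll avoids both nesting and duplicates.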