Question

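I am trying to scrape the Domino's Pizza Canada promotions page. Essentially, I want the name and price of each promotion. My code below does not return any results.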

xpth = "//div[@class='relative flex' and /span[@class='dn dib-l']]//@href"
links = browser.find_elements_by_xpath(xpth)

Thanks in advance.

Answer 1

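As others have mentioned, Selenium lets you grab the page's HTML after the scripts have loaded all the elements. For this particular page, however, the soup still contained no span elements even after I loaded the page with Selenium. It turned out that my computer needed a few seconds to load the scripts and elements properly, so I had to add a sleep call to wait for the page to finish loading.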

from bs4 import BeautifulSoup
from selenium import webdriver
import time

browser = webdriver.Firefox()
url = "https://www.dominos.ca/"
browser.get(url)
time.sleep(10) # wait ten seconds for all elements to load
html = browser.page_source
soup = BeautifulSoup(html,features='html.parser')
spans = soup.find_all('span', {"class": "promo__title__emphasis"})
print(spans)

Returns:

[<span class="promo__title__emphasis">2-Topping<br/>Pizzas</span>, <span class="promo__title__emphasis">2-Topping<br/>Pizzas</span>]
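A fixed ten-second sleep can be too long on a fast machine and too short on a slow one. As a possible alternative, the sketch below uses Selenium's explicit wait to block only until the first promo title span appears, then pulls the plain text out of each span. The helper names `fetch_rendered_html` and `promo_titles` and the 15-second timeout are assumptions made for illustration, not part of the original answer; the class name `promo__title__emphasis` is taken from the code above.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, timeout=15):
    """Load url in Firefox and return the HTML once a promo title exists.

    Hypothetical helper; the timeout value is an arbitrary assumption.
    """
    # Selenium is imported here so that promo_titles() below can be used
    # on saved HTML without a browser installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    browser = webdriver.Firefox()
    try:
        browser.get(url)
        # Poll until at least one promo title span is in the DOM; raises
        # TimeoutException if none appears within `timeout` seconds.
        WebDriverWait(browser, timeout).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "span.promo__title__emphasis")
            )
        )
        return browser.page_source
    finally:
        # Always close the browser, even if the wait times out.
        browser.quit()


def promo_titles(html):
    """Return the promo title texts found in html, in order, without repeats."""
    soup = BeautifulSoup(html, features="html.parser")
    titles = []
    for span in soup.find_all("span", {"class": "promo__title__emphasis"}):
        # <br/> carries no text, so join the surrounding pieces with a space.
        text = span.get_text(separator=" ", strip=True)
        if text not in titles:
            titles.append(text)
    return titles
```

Calling `promo_titles(fetch_rendered_html("https://www.dominos.ca/"))` would then return a list of title strings. Applied to the two spans shown in the output above, `promo_titles` gives `['2-Topping Pizzas']`, since the duplicate is dropped. The prices would need a similar lookup against whichever element holds them on the live page, which is not shown here.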

Scraping the Domino's promotions page with BS4
