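I am using Selenium together with Scrapy to scrape content from a dynamic website, and I am new to Selenium. I am extracting the wine list from the page listed in start_urls below. The site has a "show more" button that reveals more wines when clicked. So far I can only click the button once and then extract the list, but I need to keep clicking it until the "show more" button is no longer displayed. Any help with this would be greatly appreciated. Here is my code so far: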
# -*- coding: utf-8 -*-
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor
from selenium import webdriver
from scrapy.selector import Selector
import time

class WineSpider(CrawlSpider):
    name = "wspider"
    allowed_domains = ["vivino.com"]
    start_urls = ["http://www.vivino.com/wineries/francis-ford-coppola/"] #hloru

    def __init__(self):
        self.driver = webdriver.Firefox()

    def parse(self,response):
        sel = Selector(self.driver.get(response.url))
        self.driver.get(response.url)
        links = []
        time.sleep(5)
        #this is for selecting the show more button
        click = self.driver.find_elements_by_xpath("//*[@id='btn-more-wines']")
        click[0].click()
        time.sleep(5)
        wines = self.driver.find_elements_by_xpath('//a[@class = "link-muted"]')
        for w in wines:
            links.append(w.get_attribute("href"))
        print len(links)
        self.driver.close()
Any help would be very useful.
Answer 0 (score: 3)
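Make an infinite loop and use an Explicit Wait to wait for the "show more" button to appear. Break out of the loop once "show more" is no longer visible (no wines left), and only then parse the results: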
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://www.vivino.com/wineries/francis-ford-coppola/")

while True:
    try:
        # wait up to 5 seconds for the "show more" button to become visible
        button = WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.ID, "btn-more-wines")))
    except TimeoutException:
        break  # no more wines

    button.click()  # load more wines

# find_elements(By.XPATH, ...) also works on Selenium 4, where find_elements_by_xpath was removed
wines = driver.find_elements(By.XPATH, '//a[@class = "link-muted"]')
links = [w.get_attribute("href") for w in wines]

driver.close()
Note that Explicit Waits are a real game changer here: compared with hard-coded time.sleep delays, they make your code both more reliable and faster.
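To make the difference from time.sleep concrete, here is a minimal sketch of the polling idea behind an explicit wait. It is an illustration only, not Selenium's implementation; the name wait_until and its parameters are made up for this example, and it assumes Python 3.

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Hypothetical helper that mimics the idea of an explicit wait;
    it is not part of Selenium.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            # Ready: return right away instead of sleeping a fixed amount.
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll_interval)
```

A fixed time.sleep(5) always costs five seconds and still fails if the page needs six. A polling wait returns as soon as the condition holds and only gives up after the timeout.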
Answer 1 (score: -1)
If I were you, I would try the following. Keep the action emulation, i.e. clicking the show-more button, in a separate function, for example:
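# requires: from selenium.common.exceptions import ElementNotVisibleException
def emulate_action(self):
    try:
        click = self.driver.find_elements_by_xpath("//*[@id='btn-more-wines']")
        click[0].click()
        time.sleep(5.0)
        return True
    except (ElementNotVisibleException, IndexError):
        # the button is hidden, or no longer in the DOM (empty result list)
        print(" All elements displayed")
        return False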
Then call it until the whole wine list has been loaded:
while 1:
    flag = self.emulate_action()
    if flag:
        continue
    else:
        break
After that, if I am not mistaken, the following code should hopefully solve your problem:
# `links` is the list created earlier in parse()
wines = self.driver.find_elements_by_xpath('//a[@class = "link-muted"]')
for w in wines:
    links.append(w.get_attribute("href"))
print(len(links))
self.driver.close()
Please let me know whether this approach works for you!
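A loop like the ones above keeps clicking for as long as the button keeps reappearing, so it can help to cap the number of clicks. The following is a minimal sketch under assumptions: click_until_gone, find_button and max_clicks are hypothetical names introduced here, and find_button stands for whatever lookup you use (for instance, a wrapper around the WebDriverWait call that returns None on TimeoutException).

```python
def click_until_gone(find_button, max_clicks=200):
    """Click the element returned by `find_button` until it returns None.

    `find_button` is a hypothetical caller-supplied callable: it should return
    a clickable element, or None once the button is no longer shown.
    `max_clicks` is a safety cap so a page that never hides the button
    cannot loop forever. Returns the number of clicks performed.
    """
    clicks = 0
    while clicks < max_clicks:
        button = find_button()
        if button is None:
            break  # button gone: everything has been loaded
        button.click()
        clicks += 1
    return clicks
```

Because the lookup is passed in as a callable, the loop itself does not depend on Selenium and can be exercised with a fake button object before pointing it at a real driver.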