Selenium: check whether an element exists and click it

Time: 2015-03-03 09:16:16

Tags: python python-2.7 selenium selenium-webdriver scrapy

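I am using Selenium with Scrapy to scrape content from a dynamic website, and I am new to Selenium. I am extracting the wine list from here. The site has a show more button that reveals more of the wine list when clicked. Right now I can click the button only once and then extract the wine list, but I need to keep clicking it until the show more button is no longer displayed. Any help with this would be greatly appreciated. Here is my code so far: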

# -*- coding: utf-8 -*-

from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.contrib.linkextractors import LinkExtractor
from selenium import webdriver
from scrapy.selector import Selector
import time




class WineSpider(CrawlSpider):
    name = "wspider"
    allowed_domains = ["vivino.com"]



    start_urls = ["http://www.vivino.com/wineries/francis-ford-coppola/"] #hloru
    def __init__(self):
        self.driver = webdriver.Firefox()

    def parse(self,response):

        sel = Selector(self.driver.get(response.url))

        self.driver.get(response.url)
        links = []

        time.sleep(5)

        #this is for selecting the show more button

        click = self.driver.find_elements_by_xpath("//*[@id='btn-more-wines']")
        click[0].click()
        time.sleep(5)
        wines = self.driver.find_elements_by_xpath('//a[@class = "link-muted"]')
        for w in wines:
                links.append(w.get_attribute("href"))



        print len(links)
        self.driver.close()

Any help would be much appreciated.

2 answers:

Answer 0 (score: 3)

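Make an endless loop, use an Explicit Wait to wait for the "show more" button to appear, and break out of the loop once "show more" is no longer visible (no wines left). Only then parse the results: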

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Firefox()
driver.get("http://www.vivino.com/wineries/francis-ford-coppola/")

while True:
    try:
        button = WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.ID, "btn-more-wines")))
    except TimeoutException:
        break  # no more wines

    button.click()  # load more wines


wines = driver.find_elements_by_xpath('//a[@class = "link-muted"]')

links = [w.get_attribute("href") for w in wines]

driver.close()

Note that the explicit wait is the real game changer here: compared with hard-coded time.sleep delays, it makes your code both more reliable and faster.
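To make the difference with a fixed sleep concrete, here is a rough, standard-library-only sketch of the polling idea behind an explicit wait. It is not Selenium's implementation: wait_until, WaitTimeout and button_appears are hypothetical names made up for this illustration, and the 0.5 second default poll interval is only meant to mirror WebDriverWait's documented default.

```python
import time


class WaitTimeout(Exception):
    """Raised when the condition never became truthy (hypothetical name)."""


def wait_until(condition, timeout=5.0, poll=0.5):
    """Call condition() every `poll` seconds until it returns a truthy value.

    Returns that value, or raises WaitTimeout once `timeout` seconds pass.
    """
    deadline = time.time() + timeout
    while True:
        result = condition()
        if result:
            return result  # stop as soon as the condition holds
        if time.time() >= deadline:
            raise WaitTimeout("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Stand-in for "is the button visible yet?": truthy on the third check.
checks = []


def button_appears():
    checks.append(1)
    return "button" if len(checks) >= 3 else None


found = wait_until(button_appears, timeout=2.0, poll=0.01)
```

The wait returns as soon as the condition holds instead of always sleeping for the full delay, and it fails loudly when the condition never holds. A fixed time.sleep(5) gives neither guarantee.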

Answer 1 (score: -1)

If I were you, I would try the following. Keep the action emulation, i.e. clicking the show-more button, in a separate function, for example:

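# this import belongs at the top of the spider module
from selenium.common.exceptions import ElementNotVisibleException

def emulate_action(self):
    """Click the show-more button once; return False when it cannot be clicked."""
    try:
        buttons = self.driver.find_elements_by_xpath("//*[@id='btn-more-wines']")
        if not buttons:  # find_elements returns an empty list when nothing matches
            return False
        buttons[0].click()
        time.sleep(5.0)  # give the newly loaded wines time to render
        return True
    except ElementNotVisibleException:
        print " All elements displayed"
        return False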

Then call it until the whole wine list has loaded:

# keep clicking until emulate_action() reports that nothing is left to load
while True:
    flag = self.emulate_action()
    if not flag:
        break
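One thing the loop above does not guard against is a button that never goes away, for example when a request fails and the page keeps showing it. Below is a minimal sketch of the same loop with a cap on the number of clicks. click_until_gone, max_clicks and fake_click are hypothetical names introduced only for this illustration, and the fake click function stands in for the real browser so the snippet runs on its own.

```python
def click_until_gone(click_once, max_clicks=50):
    """Call click_once() until it returns False or max_clicks is reached.

    click_once is any callable that performs one click and returns True,
    or returns False when there is nothing left to click.
    Returns the number of successful clicks.
    """
    clicks = 0
    while clicks < max_clicks and click_once():
        clicks += 1
    return clicks


# Stand-in for the browser: pretend three more pages of wines can be loaded.
remaining = [3]


def fake_click():
    if remaining[0] == 0:
        return False  # the button is gone
    remaining[0] -= 1
    return True


total = click_until_gone(fake_click)
```

In the spider, the call would presumably be click_until_gone(self.emulate_action), with the cap chosen generously enough for the longest list you expect.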

After that, if I am not mistaken, this code should hopefully solve your problem:

wines = self.driver.find_elements_by_xpath('//a[@class = "link-muted"]')
for w in wines:
    links.append(w.get_attribute("href"))

print len(links)
self.driver.close()

Let me know whether this approach works for you!