Unable to scrape data using Selenium

Time: 2016-07-16 16:16:27

Tags: python selenium web-scraping scrapy

I am using Selenium to get data from angel.co, but I still cannot get any data from the site.

from scrapy import Request,Spider

import urllib
from selenium import webdriver

class AngelSpider(Spider):
    name = "angel"
    allowed_domains = ["angel.co"]
    AJAXCRAWL_ENABLED = True
    start_urls = (
        "https://angel.co/companies?locations[]=India",
    )

    def __init__(self):
        self.path ='/usr/lib/chromium-browser/chromedriver'
        self.driver = webdriver.Chrome(self.path)

    def parse(self,response):
        self.driver.get(response.url)
        self.driver.implicitly_wait(50)
        while True:
            next = self.driver.find_element_by_css_selector("div.more")
            try:
                next.click()
                self.driver.implicitly_wait(10)
                divs = self.driver.find_element_by_xpath("//div[@class= 'results']")
                for div in divs:
                    name =divs.find_element_by_css_selector("div.name")
                    print name.text
            except:
                break

1 answer:

Answer 0 (score: 0)

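The reason you don't see anything printed is that you are using a bare except clause, which silently ignores every exception raised inside the try block.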

The problem lies in how you locate elements on the page. On this line, the find_element_by_xpath() method locates a single div element:

divs = self.driver.find_element_by_xpath("//div[@class= 'results']")

divs is now a single WebElement instance, which is not iterable, so iterating over it fails on the next line:

for div in divs:
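To see why nothing is printed and no error appears, here is a minimal sketch that does not need a browser. FakeElement is a hypothetical stand-in for a single WebElement (it is not part of Selenium), and both function names are made up for illustration. The first function mirrors the bare except in the question; the second catches only TypeError and keeps the exception so it can be reported.

```python
class FakeElement:
    """Hypothetical stand-in for one WebElement: it has text but is not iterable."""
    text = "Example Co"


def loop_with_bare_except(element):
    names = []
    try:
        for item in element:  # raises TypeError: the object is not iterable
            names.append(item.text)
    except:  # bare except: the TypeError is swallowed without a trace
        pass
    return names


def loop_catching_specific(element):
    names = []
    error = None
    try:
        for item in element:
            names.append(item.text)
    except TypeError as exc:  # catch only what is expected and keep it
        error = exc
    return names, error


silent = loop_with_bare_except(FakeElement())
names, error = loop_catching_specific(FakeElement())
print(silent)                 # empty list, no hint of what went wrong
print(type(error).__name__)   # the exception type is now visible
```

In the spider itself, narrowing the except clause to the exceptions you actually expect (or at least logging the caught exception) would have surfaced this failure immediately.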

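Instead, locate all of the result elements with find_elements_by_css_selector() (plural), which returns a list, and loop over them: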

results = self.driver.find_elements_by_css_selector(".results > div")
for result in results:
    name = result.find_element_by_css_selector(".name")
    print(name.text)
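In Selenium 4 the find_element_by_* and find_elements_by_* helpers were deprecated and later removed in favour of find_element(by, value) and find_elements(by, value). The following is a rough sketch of the same loop in that style, not a tested scraper for angel.co. The function name collect_names is hypothetical, the selectors are the ones from the answer above, and the locator strategy is written as the plain string "css selector", which is the value of By.CSS_SELECTOR, so the function can be exercised without a browser.

```python
# Same string as selenium.webdriver.common.by.By.CSS_SELECTOR
CSS = "css selector"


def collect_names(driver):
    """Return the text of every .name element found inside a result row.

    `driver` is assumed to expose find_elements(by, value), as a
    Selenium 4 WebDriver does.
    """
    names = []
    # find_elements (plural) returns a list, which is empty if nothing matches
    for result in driver.find_elements(CSS, ".results > div"):
        # Plural again: a row without a .name element is skipped
        # instead of raising NoSuchElementException
        for name in result.find_elements(CSS, ".name"):
            names.append(name.text)
    return names
```

With a real browser session you would pass the WebDriver instance after the page has loaded, and would normally import By from selenium.webdriver.common.by rather than spelling out the string.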