Selenium scrapy python: no data in csv / json

Asked: 2014-06-09 20:01:10

Tags: python selenium web-scraping scrapy

I am trying to scrape information from http://www.qchp.org.qa/en/Pages/searchpractitioners.aspx. I want to do the following:

  • Select "Dentist" from the drop-down list at the top of the page
  • Click Search
  • Note that the information at the bottom of the page is updated dynamically with JavaScript
  • Click a practitioner's name hyperlink, which opens a pop-up window
  • Save all of the information for each practitioner to a JSON / CSV file

I also want to capture the information in that div as it changes when I follow the other page links at the bottom of the page.
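
For reference, the interaction I am trying to drive with plain Selenium looks roughly like the sketch below. It is only a sketch: the wait condition at the end is a guess on my part, since I do not know which element actually signals that the results have refreshed.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select, WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.get("http://www.qchp.org.qa/en/Pages/searchpractitioners.aspx")

# Choose "Dentist" (option value "4") from the practitioner-type drop-down.
Select(driver.find_element_by_name(
    "ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$drp_practitionerType"
)).select_by_value("4")

# Run the search.
driver.find_element_by_name(
    "ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$Searchbtn"
).click()

# Wait until the JavaScript-refreshed result cells are present before reading them.
# (Guessed condition: any <td> on the page; a more specific locator may be needed.)
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.TAG_NAME, "td"))
)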

I tried exporting the data to a JSON file, but it produces an empty file, and I don't see any errors in the console.

spider.py

from scrapy.spider import Spider
from scrapy.selector import Selector
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from scrapytutorial.items import SchItem
from selenium.webdriver.support.ui import Select

class DmozSpider(Spider):
    name = "sch"

    driver = webdriver.Firefox()
    driver.get("http://www.qchp.org.qa/en/Pages/searchpractitioners.aspx")

    dropdown = driver.find_element_by_name("ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$drp_practitionerType")
    all_options = dropdown.find_elements_by_tag_name("option")

    for option in all_options:
        if option.get_attribute("value") == "4":  #Dentist
            option.click()
            break

    driver.find_element_by_name("ctl00$m$g_28bc0e11_4b8f_421f_84b7_d671de504bc3$ctl00$Searchbtn").click()


    def parse(self, response):

        all_docs = element.find_elements_by_tag_name("td")
        for name in all_docs:
            name.click()
            alert = driver.switch_to_alert()
            sel = Selector(response)
            ma = sel.xpath('//table')
            items = []
            for site in ma:
                item = SchItem()
                item['name'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Name']/text()").extract()
                item['profession'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Profession']/text()").extract()
                item['scope_of_practise'] = site.xpath("//span[@id='PractitionerDetails1_lbl_sop']/text()").extract()
                item['instituition'] = site.xpath("//span[@id='PractitionerDetails1_lbl_institution']/text()").extract()
                item['license'] = site.xpath("//span[@id='PractitionerDetails1_lbl_LicenceNo']/text()").extract()
                item['license_expiry_date'] = site.xpath("//span[@id='PractitionerDetails1_lbl_LicenceExpiry']/text()").extract()
                item['qualification'] = site.xpath("//span[@id='PractitionerDetails1_lbl_Qualification']/text()").extract()

                items.append(item)
            return items

Here is items.py:

from scrapy.item import Item, Field

class SchItem(Item):

    name = Field()
    profession = Field()
    scope_of_practise = Field()
    instituition = Field()
    license = Field()
    license_expiry_date = Field()
    qualification = Field()

1 Answer:

Answer 0 (score: 0):

Two possible ideas:

  • Dedent the "return": move it 4 spaces to the left, so the items are returned once after the loop over practitioners finishes rather than on its first pass.
  • Instead of sel = Selector(response), try sel = Selector(response.url): don't parse the Scrapy response here, parse what Selenium has loaded.
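
Roughly, something like the untested sketch below is what I mean. I build the Selector from driver.page_source as one way to parse the Selenium-rendered HTML (my own substitution, not literally Selector(response.url)), and the __init__ / start_urls wiring is assumed rather than taken from your code.

from scrapy.spider import Spider
from scrapy.selector import Selector
from selenium import webdriver

from scrapytutorial.items import SchItem


class DmozSpider(Spider):
    name = "sch"
    start_urls = ["http://www.qchp.org.qa/en/Pages/searchpractitioners.aspx"]

    def __init__(self, *args, **kwargs):
        super(DmozSpider, self).__init__(*args, **kwargs)
        # Keep the driver on the spider so parse() can reach it as self.driver.
        self.driver = webdriver.Firefox()

    def parse(self, response):
        # Drive the browser to the same page Scrapy just requested.
        self.driver.get(response.url)
        # ... select "Dentist", click Search and wait for the results here ...

        items = []
        for cell in self.driver.find_elements_by_tag_name("td"):
            cell.click()
            # Parse the HTML Selenium has rendered, not the Scrapy response.
            sel = Selector(text=self.driver.page_source)
            item = SchItem()
            item['name'] = sel.xpath(
                "//span[@id='PractitionerDetails1_lbl_Name']/text()").extract()
            # ... fill the remaining fields the same way ...
            items.append(item)
        # The dedented "return": give back all items once, after the loop.
        return items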