Find and get elements from a site using Selenium

Posted: 2017-10-07 16:49:32

Tags: python-2.7 selenium web-scraping

I need your help. There is a website I have to get information from. Example of the site and its HTML (images not reproduced here).

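I need to get the data from the elements with class inputField, but I have to sort it by label: if the element with class key says Type of Work, the data from the matching inputField goes into var1; if key says Application No., the inputField data goes into var2; and if key says Date Lodged, the inputField data goes into var3. Code: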

    import scrapy
    from tasks.items import TasksItem
    from selenium import webdriver
    from selenium.webdriver.common.by import By


    class MySpider(scrapy.Spider):
        title = []
        type = []
        name = 'Spider'
        allowed_domains = ['https://ecouncil.bayside.vic.gov.au/']

        driver = webdriver.Chrome('C:/TEMP/Scrapy/chromedriver')

        driver.get('https://ecouncil.bayside.vic.gov.au/eservice/daEnquiryInit.do?docType=5&nodeNum=1118')
        driver.get('https://ecouncil.bayside.vic.gov.au/eservice/daEnquiry.do?number=&lodgeRangeType=on&dateFrom=01%2F09%2F2017&dateTo=30%2F09%2F2017&detDateFromString=&detDateToString=&streetName=&suburb=0&unitNum=&houseNum=0%0D%0A%09%09%09%09%09&planNumber=&strataPlan=&lotNumber=&propertyName=&searchMode=A&submitButton=Search')

        title = driver.find_elements_by_css_selector('a.plain_header')
        type = driver.find_elements_by_css_selector('p.rowDataOnly')
        for i in type:
            t1 = i.find_element_by_class_name('key').text
            if t1 == 'Type of Work':
                var1 = t1
            elif t1 == 'some_text':
                var2 = t1
            else:
                var3 = t1

But I don't know how to get the data from inputField.

2 answers:

Answer 0 (score: 0)

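Your current logic isn't going to work well. What you want to do is get a count of the properties (search results) and then loop through each one. For each one, grab the three items you're interested in and store them in three variables (you really should use more descriptive names, by the way).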

Something like the following should do it.

    import scrapy
    from selenium import webdriver
    from selenium.webdriver.common.by import By


    class MySpider(scrapy.Spider):
        name = 'Spider'
        # allowed_domains expects bare domain names, not full URLs
        allowed_domains = ['ecouncil.bayside.vic.gov.au']

        driver = webdriver.Chrome('C:/TEMP/Scrapy/chromedriver')

        # Open the enquiry form first, then request the search results
        driver.get('https://ecouncil.bayside.vic.gov.au/eservice/daEnquiryInit.do?docType=5&nodeNum=1118')
        driver.get('https://ecouncil.bayside.vic.gov.au/eservice/daEnquiry.do?number=&lodgeRangeType=on&dateFrom=01%2F09%2F2017&dateTo=30%2F09%2F2017&detDateFromString=&detDateToString=&streetName=&suburb=0&unitNum=&houseNum=0%0D%0A%09%09%09%09%09&planNumber=&strataPlan=&lotNumber=&propertyName=&searchMode=A&submitButton=Search')

        # One a.plain_header link per application in the results
        titles = driver.find_elements(By.CSS_SELECTOR, 'a.plain_header')

        # range(len(titles)) visits every result; range(0, len(titles) - 1) skipped the last one.
        # Each XPath selects the inputField span that follows the key span with that exact text.
        for i in range(len(titles)):
            var1 = driver.find_elements(By.XPATH, "//span[@class='key'][.='Type of Work']/following-sibling::span[@class='inputField']")[i].text
            var2 = driver.find_elements(By.XPATH, "//span[@class='key'][.='Application No.']/following-sibling::span[@class='inputField']")[i].text
            var3 = driver.find_elements(By.XPATH, "//span[@class='key'][.='Date Lodged']/following-sibling::span[@class='inputField']")[i].text

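To make this easier to maintain (and read), you could take the code in the last three lines and turn it into a function that you pass a field name, e.g. Date Lodged, and that returns the field value, e.g. 01/09/2017. I'll leave that as an exercise for you.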
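One possible take on that exercise, offered as a sketch rather than the answerer's own code: build the XPath from the label once, and fetch every matching value in page order instead of indexing into the list on each pass. The helper names `field_xpath`, `get_field_values` and `collect_applications` are hypothetical. `XPATH = "xpath"` is the same string as Selenium's `By.XPATH`, spelled out so the sketch does not need Selenium installed to import; `driver` is assumed to be the WebDriver from the code above.

```python
# Same value as selenium.webdriver.common.by.By.XPATH; spelled out so this
# sketch has no import-time dependency on Selenium.
XPATH = "xpath"


def field_xpath(label):
    """XPath for the inputField span that follows the key span reading `label`."""
    # Assumes the label has no single quote, which would end the XPath string early
    return ("//span[@class='key'][.='%s']"
            "/following-sibling::span[@class='inputField']" % label)


def get_field_values(driver, label):
    """Return the text of every value paired with `label`, in page order."""
    return [element.text for element in driver.find_elements(XPATH, field_xpath(label))]


def collect_applications(driver, fields):
    """Build one dict per search result.

    `fields` maps an output name to a key label, for example
    {'date_lodged': 'Date Lodged'}.
    """
    names = list(fields)
    columns = [get_field_values(driver, fields[name]) for name in names]
    # zip() stops at the shortest column, and a result that lacks one field
    # shifts every later value of that field, so compare the column lengths
    # if the page might omit fields.
    return [dict(zip(names, values)) for values in zip(*columns)]
```

For example, `collect_applications(driver, {'work_type': 'Type of Work', 'application_no': 'Application No.', 'date_lodged': 'Date Lodged'})` would give one dict per result, with descriptive keys in place of var1, var2 and var3.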

Answer 1 (score: -1)

I tried this in Java. You can use the same approach in Python.

You can get all the span elements with class = key and class = inputField, iterate over them, and pick out the information you're interested in.
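Since this answer has no code, here is a minimal Python sketch of the same idea, under stated assumptions: the spans have exactly the class `key` or `inputField` (as in the XPath from the other answer), each value span comes after its key span, and a field that appears a second time means the next search result has started. The function names are hypothetical; `XPATH = "xpath"` matches Selenium's `By.XPATH`, and `get_attribute('class')` and `.text` are standard WebElement members.

```python
# Same value as selenium.webdriver.common.by.By.XPATH
XPATH = "xpath"
# Every key and inputField span, returned in document order
KEY_AND_VALUE_XPATH = "//span[@class='key' or @class='inputField']"


def pair_key_values(spans, wanted):
    """Pair each key span with the inputField span that follows it.

    `spans` are elements in document order; `wanted` maps a key label to the
    name its value is stored under. A new record starts when a field that is
    already filled shows up again (assumed to mean the next result began).
    """
    records, current, label = [], {}, None
    for span in spans:
        css_class = span.get_attribute('class')
        if css_class == 'key':
            label = span.text.strip()
        elif css_class == 'inputField':
            name = wanted.get(label)
            if name is not None:
                if name in current:
                    records.append(current)
                    current = {}
                current[name] = span.text.strip()
            label = None  # each key pairs with at most one value
    if current:
        records.append(current)
    return records


def scrape(driver):
    """Hypothetical entry point: `driver` is a WebDriver already on the results page."""
    wanted = {'Type of Work': 'var1', 'Application No.': 'var2', 'Date Lodged': 'var3'}
    return pair_key_values(driver.find_elements(XPATH, KEY_AND_VALUE_XPATH), wanted)
```

Walking the spans in a single pass avoids re-running one XPath query per field per result, and labels you don't list in `wanted` are simply skipped.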
