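I need your help. I have a website that I need to extract information from. Example of the page: Image HTML
I need to get the data from the elements with class inputField, but the data has to be sorted into variables according to its label. For example: if the element with class key reads Type of Work, the data from inputField goes into var1; if key reads Application No., the data from inputField goes into var2; and if key reads Date Lodged, the data from inputField goes into var3.
Code: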
import scrapy
from tasks.items import TasksItem
from selenium import webdriver
from selenium.webdriver.common.by import By

class MySpider(scrapy.Spider):
    title = []
    type = []
    name = 'Spider'
    allowed_domains = ['https://ecouncil.bayside.vic.gov.au/']
    driver = webdriver.Chrome('C:/TEMP/Scrapy/chromedriver')
    driver.get('https://ecouncil.bayside.vic.gov.au/eservice/daEnquiryInit.do?docType=5&nodeNum=1118')
    driver.get('https://ecouncil.bayside.vic.gov.au/eservice/daEnquiry.do?number=&lodgeRangeType=on&dateFrom=01%2F09%2F2017&dateTo=30%2F09%2F2017&detDateFromString=&detDateToString=&streetName=&suburb=0&unitNum=&houseNum=0%0D%0A%09%09%09%09%09&planNumber=&strataPlan=&lotNumber=&propertyName=&searchMode=A&submitButton=Search')
    title = driver.find_elements_by_css_selector('a.plain_header')
    type = driver.find_elements_by_css_selector('p.rowDataOnly')
    for i in type:
        t1 = i.find_element_by_class_name('key').text
        if t1 == 'Type of Work':
            var1 = t1
        elif t1 == 'some_text':
            var2 = t1
        else:
            var3 = t1
But I don't know how to get the data from inputField.
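Answer 0 (score: 0)
Your current logic won't work well. What you want to do is get a count of the number of properties and then loop through each one. As you loop through each, grab the three items you are interested in and store them in the three variables (you really should use more descriptive names, by the way).
The code below should do the job.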
class MySpider(scrapy.Spider):
    title = []
    type = []
    name = 'Spider'
    allowed_domains = ['https://ecouncil.bayside.vic.gov.au/']
    driver = webdriver.Chrome('C:/TEMP/Scrapy/chromedriver')
    driver.get('https://ecouncil.bayside.vic.gov.au/eservice/daEnquiryInit.do?docType=5&nodeNum=1118')
    driver.get('https://ecouncil.bayside.vic.gov.au/eservice/daEnquiry.do?number=&lodgeRangeType=on&dateFrom=01%2F09%2F2017&dateTo=30%2F09%2F2017&detDateFromString=&detDateToString=&streetName=&suburb=0&unitNum=&houseNum=0%0D%0A%09%09%09%09%09&planNumber=&strataPlan=&lotNumber=&propertyName=&searchMode=A&submitButton=Search')
    # Each a.plain_header link marks one property in the search results.
    titles = driver.find_elements_by_css_selector('a.plain_header')
    # Visit every property; range(len(titles)) includes the last one.
    for i in range(len(titles)):
        # For each label (span.key), read the i-th matching value (span.inputField).
        var1 = driver.find_elements_by_xpath("//span[@class='key'][.='Type of Work']/following-sibling::span[@class='inputField']")[i].text
        var2 = driver.find_elements_by_xpath("//span[@class='key'][.='Application No.']/following-sibling::span[@class='inputField']")[i].text
        var3 = driver.find_elements_by_xpath("//span[@class='key'][.='Date Lodged']/following-sibling::span[@class='inputField']")[i].text
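To make this easier to maintain (and read), you could take the code in the last three lines and turn it into a function that is passed a field name, e.g. Date Lodged, and returns the field value, in that case the date the application was lodged. I'll leave that as an exercise for you.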
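One possible shape for that function is sketched below. The names field_xpath and get_field_values are made up for illustration, and the sketch assumes the same span.key / span.inputField markup and the same find_elements_by_xpath call used in the code above; it has not been run against the live site.

```python
def field_xpath(field_name):
    """Build the XPath for the inputField span that follows a given key span.

    Assumes field_name contains no single quote, since it is placed
    inside a single-quoted XPath string literal.
    """
    return (
        "//span[@class='key'][.='{0}']"
        "/following-sibling::span[@class='inputField']"
    ).format(field_name)


def get_field_values(driver, field_name):
    """Return the text of every inputField whose key equals field_name.

    driver is assumed to be a Selenium WebDriver that still offers
    find_elements_by_xpath, as in the code above; newer Selenium
    releases spell it driver.find_elements(By.XPATH, ...).
    """
    elements = driver.find_elements_by_xpath(field_xpath(field_name))
    return [element.text for element in elements]
```

With that in place, get_field_values(driver, 'Date Lodged') would give one string per property on the page, so each XPath query runs once instead of once per loop iteration.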
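Answer 1 (score: -1)
I tried this in Java. You can use the same approach in Python.
You can get all the span elements with class = key and class = inputField. Iterate over them and pick out the information you are interested in.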
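A rough Python sketch of that idea follows. It works on plain lists of strings, so it is independent of the Selenium version; in practice the two lists would come from the text of the span.key and span.inputField elements, in page order. The function name group_fields is hypothetical, and the sketch assumes every key span has exactly one inputField span and that a repeated key marks the start of the next property.

```python
def group_fields(key_texts, value_texts, wanted):
    """Pair each key with its value and group the pairs per property.

    key_texts and value_texts are parallel lists in page order.
    wanted is the set of key labels to keep; other keys are skipped.
    A new record starts whenever a wanted key shows up a second time.
    """
    records = []
    current = {}
    for key, value in zip(key_texts, value_texts):
        if key not in wanted:
            continue
        if key in current:
            # Same label seen again: the previous property is complete.
            records.append(current)
            current = {}
        current[key] = value
    if current:
        records.append(current)
    return records
```

Each returned dictionary then holds the Type of Work, Application No. and Date Lodged values for one property, which replaces the separate var1, var2 and var3 variables.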