Wrong values returned when web scraping with Python Selenium

Time: 2018-12-21 14:50:12

Tags: python selenium web-scraping

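I want to scrape the table shown under the Chennai weather history for 18 December 2018. The table has the following columns: time, temperature, weather, wind, humidity, barometer, and visibility. Here is my attempt: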

import csv
from datetime import date

from selenium import webdriver
from selenium.webdriver.support.ui import Select


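def valueGetter(date_list, _type):
    """Return the innerHTML of each <option> element in date_list.

    The second argument is unused; it is kept so existing calls still work.
    """
    return [anchor.get_attribute('innerHTML') for anchor in date_list]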


chrome_path = "/usr/bin/chromedriver"
d = date.today()
weather_data = "/home/kcube/Desktop/Selenium-Project/timeanddate.com/met_verif_"+str(d.strftime("%d-%m-%Y"))+".csv"
fieldnames = ['humidity']
driver = webdriver.Chrome(chrome_path)
QueryFormatString = 'https://www.timeanddate.com/weather/india/chennai/historic'
driver.get(QueryFormatString)
driver.find_element_by_id('wt-his-select').click()
select = Select(driver.find_element_by_id("wt-his-select"))
option = select.options
previous_dates = valueGetter(option , "null")

with open(weather_data, 'a', newline='', encoding='utf8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()

    for number in range(3,4):
        driver.find_element_by_id('wt-his-select').click()
        select = Select(driver.find_element_by_id("wt-his-select"))
        select.select_by_index(number)
        selected_date = previous_dates[number]
        print(selected_date)
        dataValue = driver.find_element_by_xpath('//*[@id="wt-his"]/tbody/tr[4]')
        print(dataValue.get_attribute('innerHTML'))

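The problem is that the values in the result are wrong. For example, I tried to scrape the fourth row of the table, but it returns the following: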

<th>17:00</th><td class="wt-ic"><img class="mtt" title="Broken clouds." src="//c.tadst.com/gfx/w/40/wt-6.png" width="40" height="40"></td><td>28&nbsp;°C</td><td class="small">Broken clouds.</td><td class="sep">19 km/h</td><td><span class="comp sa18" title="Wind blowing from 30° North-northeast to South-southwest">↑</span></td><td>70%</td><td class="sep">1011 mbar</td><td>5&nbsp;km</td>
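
To compare the returned row with what the page displays, it may help to split the innerHTML into named fields first. The following is a minimal sketch using only the standard library. The names RowCellParser, parse_weather_row, and COLUMNS are hypothetical, and the column order is an assumption inferred from the sample row above, not something documented by the site.

```python
from html.parser import HTMLParser

# Assumed column order, inferred from the sample row in this question.
COLUMNS = ("time", "icon", "temperature", "weather", "wind",
           "wind_direction", "humidity", "barometer", "visibility")

# The row exactly as printed by the script above.
SAMPLE_ROW = (
    '<th>17:00</th><td class="wt-ic"><img class="mtt" title="Broken clouds." '
    'src="//c.tadst.com/gfx/w/40/wt-6.png" width="40" height="40"></td>'
    '<td>28&nbsp;°C</td><td class="small">Broken clouds.</td>'
    '<td class="sep">19 km/h</td><td><span class="comp sa18" '
    'title="Wind blowing from 30° North-northeast to South-southwest">↑</span></td>'
    '<td>70%</td><td class="sep">1011 mbar</td><td>5&nbsp;km</td>'
)


class RowCellParser(HTMLParser):
    """Collect the visible text of every <th>/<td> cell in one table row."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False

    def handle_data(self, data):
        # Text inside nested tags (e.g. the <span> arrow) is appended too.
        if self._in_cell:
            self.cells[-1] += data


def parse_weather_row(row_html):
    """Map the cells of one row to the assumed column names."""
    parser = RowCellParser()
    parser.feed(row_html)
    parser.close()
    # &nbsp; is decoded to U+00A0; normalise it to a plain space.
    cells = [c.replace("\xa0", " ").strip() for c in parser.cells]
    if len(cells) != len(COLUMNS):
        raise ValueError("expected %d cells, got %d" % (len(COLUMNS), len(cells)))
    return dict(zip(COLUMNS, cells))


row = parse_weather_row(SAMPLE_ROW)
print(row["time"], row["temperature"], row["humidity"])
```

With the fields separated like this, each value can be checked against the row shown on the page for the selected date, which makes it easier to tell whether the wrong row or the wrong date is being read.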

Thanks.

0 answers:

No answers yet.