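I want to scrape the table under the Chennai weather history for 18 December 2018. The table has the following variables: time, temperature, weather, wind, humidity, barometer, and visibility. Here is my attempt: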
import csv
from datetime import date

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import Select


def valueGetter(date_list, type):
    # Collect the innerHTML (the visible label) of every <option> element.
    dates = []
    for anchor in date_list:
        dates.append(anchor.get_attribute('innerHTML'))
    return dates


chrome_path = "/usr/bin/chromedriver"
d = date.today()
weather_data = "/home/kcube/Desktop/Selenium-Project/timeanddate.com/met_verif_" + d.strftime("%d-%m-%Y") + ".csv"
fieldnames = ['humidity']

# Selenium 3 style; newer Selenium releases take the driver path via a Service object.
driver = webdriver.Chrome(chrome_path)
QueryFormatString = 'https://www.timeanddate.com/weather/india/chennai/historic'
driver.get(QueryFormatString)

# Read the list of available dates from the history dropdown.
driver.find_element(By.ID, 'wt-his-select').click()
select = Select(driver.find_element(By.ID, 'wt-his-select'))
option = select.options
previous_dates = valueGetter(option, "null")

with open(weather_data, 'a', newline='', encoding='utf8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    for number in range(3, 4):
        # Select one date from the dropdown, then print the fourth table row.
        driver.find_element(By.ID, 'wt-his-select').click()
        select = Select(driver.find_element(By.ID, 'wt-his-select'))
        select.select_by_index(number)
        selected_date = previous_dates[number]
        print(selected_date)
        dataValue = driver.find_element(By.XPATH, '//*[@id="wt-his"]/tbody/tr[4]')
        print(dataValue.get_attribute('innerHTML'))
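As a side note, the script declares only a single `humidity` column and writes nothing but the header. A minimal sketch of how all seven variables could be written with `csv.DictWriter`, assuming each table row has already been collected as a dict. The column names in `FIELDNAMES` and the `write_rows` helper are hypothetical and not part of the original script; the sample values come from the row printed in this question.

```python
import csv
import io

# Hypothetical column names for the seven variables listed in the question.
FIELDNAMES = ["time", "temp", "weather", "wind",
              "humidity", "barometer", "visibility"]


def write_rows(csvfile, rows):
    """Write a header line plus one CSV line per row dict; extra keys are ignored."""
    writer = csv.DictWriter(csvfile, fieldnames=FIELDNAMES, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)


# Usage with an in-memory buffer; a real file would be opened with newline=''.
buffer = io.StringIO()
write_rows(buffer, [{
    "time": "17:00", "temp": "28 °C", "weather": "Broken clouds.",
    "wind": "19 km/h", "humidity": "70%",
    "barometer": "1011 mbar", "visibility": "5 km",
}])
print(buffer.getvalue())
```

Passing the open file object into the helper keeps the writing logic independent of the Selenium loop, so it can be checked without a browser.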
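The problem is that the values in the result are wrong. For example, I tried to scrape the fourth row of the table, but it returns the following: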
<th>17:00</th><td class="wt-ic"><img class="mtt" title="Broken clouds." src="//c.tadst.com/gfx/w/40/wt-6.png" width="40" height="40"></td><td>28 °C</td><td class="small">Broken clouds.</td><td class="sep">19 km/h</td><td><span class="comp sa18" title="Wind blowing from 30° North-northeast to South-southwest">↑</span></td><td>70%</td><td class="sep">1011 mbar</td><td>5 km</td>
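A row like the one above could be split into named fields along these lines. This is only a sketch using the standard library's `html.parser`: the `RowParser` class, the `parse_row` helper, and the names in `COLUMNS` are hypothetical, and the column order is assumed from this single sample row.

```python
from html.parser import HTMLParser

# Assumed column order, taken from the single sample row in the question.
COLUMNS = ["time", "icon", "temp", "weather", "wind", "wind_dir",
           "humidity", "barometer", "visibility"]


class RowParser(HTMLParser):
    """Collect the text content of each <th>/<td> cell in one table row."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._in_cell = True
            self.cells.append("")

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._in_cell = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append to the current cell.
        if self._in_cell:
            self.cells[-1] += data


def parse_row(row_html):
    """Map the cells of one row's innerHTML onto the assumed column names."""
    parser = RowParser()
    parser.feed(row_html)
    parser.close()
    # Collapse whitespace (including non-breaking spaces) inside each cell.
    cells = [" ".join(cell.split()) for cell in parser.cells]
    row = dict(zip(COLUMNS, cells))
    row.pop("icon", None)  # the icon cell holds only an image, no text
    return row
```

In the script, the string returned by `dataValue.get_attribute('innerHTML')` could be passed straight to `parse_row`.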
Thanks