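I want to get the daily weather for a given region and date from the website "https://www.timeanddate.com/weather". However, I cannot reach the div classes I need with the code below. What should I do?
I tried to extract the data with BeautifulSoup. The information I want is in the div elements with the classes temp and wdesc (the temperature and the weather description, for example "Passing clouds"). So I tried the following code: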
import requests
from bs4 import BeautifulSoup

url = 'https://www.timeanddate.com/weather/spain/salou/historic?month=1&year=2014'
# verify=False disables TLS certificate checking
result = requests.get(url, verify=False)
soup = BeautifulSoup(result.text, "html.parser")
# Look for the temperature and weather-description divs
w1 = soup.find_all('div', attrs={'class': 'temp'})
w2 = soup.find_all('div', attrs={'class': 'wdesc'})
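I expected to get the temperature (13/11°C) from w1 and the weather description (scattered clouds) from w2. Instead, w1 and w2 are both empty lists.
Answer 0: (score: 1)
Your data is inside a script, so one solution is to use Selenium. If you do not have the Chrome driver installed yet, you can get it here: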
https://chromedriver.storage.googleapis.com/index.html?path=2.35/
Here is the code:
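from selenium import webdriver
from selenium.webdriver.chrome.service import Service
# Path to the chromedriver executable downloaded from the link above.
driver_path = r'chromedriverpath'
# Selenium 4 takes the driver path via Service; older releases used
# webdriver.Chrome(executable_path=driver_path).
browser = webdriver.Chrome(service=Service(driver_path))
browser.get("https://www.timeanddate.com/weather/spain/salou/historic?month=1&year=2014")
# The page keeps its weather data in a JavaScript variable named data.
meta = browser.execute_script('return data')
my_json_string = meta['detail']
print(my_json_string)
browser.quit()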
Output (printed under Python 2, hence the u'' prefixes):
[{u'hlsh': u'1 Oca', u'templow': 4, u'temp': 14, u'hum': 77, u'hls': u'1 Oca \xc7ar', u'ts': u'06:00', u'wd': 30, u'wind': 5, u'hl': True, u'date': 1388556000000, u'icon': 2, u'ds': u'1 Ocak 2014 \xc7ar\u015famba, 06:00 \u2014 12:00', u'baro': 1019, u'desc': u'Passing clouds.'},
{u'templow': 11, u'temp': 15, u'hum': 72, u'ts': u'12:00', u'wd': 210, u'wind': 9, u'date': 1388577600000, u'desc': u'Passing clouds.', u'ds': u'1 Ocak 2014 \xc7ar\u015famba, 12:00 \u2014 18:00', u'baro': 1016, u'icon': 2},
{u'templow': 9, u'temp': 11, u'hum': 90, u'ts': u'18:00', u'wd': 0, u'wind': 5, u'date': 1388599200000, u'desc': u'Passing clouds.', u'ds': u'1 Ocak 2014 \xc7ar\u015famba, 18:00 \u2014 00:00', u'baro': 1015, u'icon': 14},
{u'hlsh': u'2 Oca', u'wd': 0, u'hum': 0, u'hls': u'2 Oca Per', u'ts': u'00:00', u'wind': 0, u'hl': True, u'date': 1388620800000, u'icon': 36, u'ds': u'2 Ocak 2014 Per\u015fembe, 00:00 \u2014 06:00', u'baro': 0, u'desc': u'No weather data available'},
{u'templow': 6, u'temp': 15, u'hum': 93, u'ts': u'06:00', u'wd': 0, u'wind': 6, u'date': 1388642400000, u'desc': u'Passing clouds.', u'ds': u'2 Ocak 2014 Per\u015fembe, 06:00 \u2014 12:00', u'baro': 1013, u'icon': 2},
{u'templow': 15, u'temp': 18, u'hum': 61, u'ts': u'12:00', u'wd': 0, u'wind': 7, u'date': 1388664000000, u'desc': u'Passing clouds.', u'ds': u'2 Ocak 2014 Per\u015fembe, 12:00 \u2014 18:00', u'baro': 1013, u'icon': 2},
{u'templow': 13, u'temp': 15, u'hum': 80, u'ts': u'18:00', u'wd': 0, u'wind': 4, u'date': 1388685600000, u'desc': u'Passing clouds.', u'ds': u'2 Ocak 2014 Per\u015fembe, 18:00 \u2014 00:00', u'baro': 1014, u'icon': 14},
{u'hlsh': u'3 Oca', u'wd': 0, u'hum': 0, u'hls': u'3 Oca Cum', u'ts': u'00:00', u'wind': 0, u'hl': True, u'date': 1388707200000, u'icon': 36, u'ds': u'3 Ocak 2014 Cuma, 00:00 \u2014 06:00', u'baro': 0, u'desc': u'No weather data available'},
{u'templow': 9, u'temp': 18, u'hum': 76, u'ts': u'06:00', u'wd': 0, u'wind': 6, u'date': 1388728800000, u'desc': u'Passing clouds.', u'ds': u'3 Ocak 2014 Cuma, 06:00 \u2014 12:00', u'baro': 1015, u'icon': 2},
{u'templow': 17, u'temp': 20, u'hum': 55, u'ts': u'12:00', u'wd': 290, u'wind': 11, u'date': 1388750400000, u'desc': u'Passing clouds.', u'ds': u'3 Ocak 2014 Cuma, 12:00 \u2014 18:00', u'baro': 1016, u'icon': 2}, ... and so on to the end
Once you have this list, you can parse it with json or any other method. Selenium is just one of the options.
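Since execute_script hands back ordinary Python objects, the list can also be walked directly. The following is only a minimal sketch, assuming each entry carries the keys visible in the output above and that date is in milliseconds since the Unix epoch, formatted here as UTC. The detail sample is copied from the first two entries, the temperatures are assumed to be in °C, and summarize is a hypothetical helper name.

```python
from datetime import datetime, timezone

# Sample copied from the first two entries of the output above.
detail = [
    {'date': 1388556000000, 'temp': 14, 'templow': 4, 'desc': 'Passing clouds.'},
    {'date': 1388577600000, 'temp': 15, 'templow': 11, 'desc': 'Passing clouds.'},
]


def summarize(entries):
    """Turn raw entries into (timestamp, 'high/low°C', description) tuples."""
    rows = []
    for entry in entries:
        # Assumption: 'date' is milliseconds since the Unix epoch.
        when = datetime.fromtimestamp(entry['date'] / 1000, tz=timezone.utc)
        # Entries without weather data lack 'temp'/'templow', so use .get().
        temps = '%s/%s°C' % (entry.get('temp'), entry.get('templow'))
        rows.append((when.strftime('%Y-%m-%d %H:%M'), temps, entry['desc']))
    return rows


for row in summarize(detail):
    print(row)
```

Entries such as the "No weather data available" ones would show None/None for the temperatures, so you may want to filter them out first.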
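Answer 1: (score: 0)
Answer 2: (score: 0)
If you only want the table (assuming it is under <table> tags), then using Pandas to pull the table is much easier than using BeautifulSoup directly.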
import pandas as pd
url = 'https://www.timeanddate.com/weather/spain/salou/historic?month=1&year=2014'
tables = pd.read_html(url)
df = tables[-1]
Output:
print (df.to_string())
Unnamed: 0_level_0 Conditions Comfort Unnamed: 7_level_0 Unnamed: 8_level_0
Time Unnamed: 1_level_1 Temp Weather Wind Unnamed: 5_level_1 Humidity Barometer Visibility
0 7:00 amWed, Jan 1 NaN 39 °F Clear. 3 mph ↑ 93% 30.07 "Hg 10 mi
1 7:30 am NaN 41 °F Clear. 3 mph ↑ 87% 30.07 "Hg 10 mi
2 8:00 am NaN 41 °F Passing clouds. 5 mph ↑ 87% 30.07 "Hg NaN
3 8:30 am NaN 43 °F Passing clouds. 6 mph ↑ 81% 30.07 "Hg NaN
4 9:00 am NaN 43 °F Passing clouds. 2 mph ↑ 87% 30.07 "Hg NaN
5 9:30 am NaN 46 °F Passing clouds. 5 mph ↑ 76% 30.07 "Hg NaN
6 10:00 am NaN 48 °F Passing clouds. 3 mph ↑ 76% 30.09 "Hg NaN
7 10:30 am NaN 54 °F Passing clouds. No wind ↑ 67% 30.09 "Hg NaN
8 11:00 am NaN 55 °F Passing clouds. No wind ↑ 63% 30.09 "Hg NaN
9 11:30 am NaN 55 °F Passing clouds. 3 mph ↑ 63% 30.09 "Hg NaN
10 12:00 pm NaN 57 °F Passing clouds. 6 mph ↑ 63% 30.07 "Hg NaN
11 12:30 pm NaN 57 °F Passing clouds. 8 mph ↑ 67% 30.07 "Hg NaN
12 1:00 pm NaN 59 °F Passing clouds. 9 mph ↑ 68% 30.04 "Hg NaN
13 2:00 pm NaN 59 °F Passing clouds. 10 mph ↑ 68% 30.01 "Hg NaN
14 2:30 pm NaN 59 °F Passing clouds. 9 mph ↑ 68% 30.01 "Hg NaN
15 3:00 pm NaN 57 °F Passing clouds. 7 mph ↑ 67% 30.01 "Hg NaN
16 3:30 pm NaN 57 °F Passing clouds. 6 mph ↑ 67% 29.98 "Hg NaN
17 4:00 pm NaN 57 °F Passing clouds. 3 mph ↑ 72% 30.01 "Hg NaN
18 4:30 pm NaN 55 °F Passing clouds. 3 mph ↑ 77% 30.01 "Hg NaN
19 5:00 pm NaN 55 °F Passing clouds. 1 mph ↑ 77% 29.98 "Hg NaN
20 5:30 pm NaN 54 °F Passing clouds. No wind ↑ 82% 30.01 "Hg NaN
21 6:00 pm NaN 52 °F Passing clouds. 1 mph ↑ 88% 29.98 "Hg NaN
22 6:30 pm NaN 52 °F Passing clouds. 1 mph ↑ 88% 29.98 "Hg NaN
23 7:30 pm NaN 50 °F Passing clouds. 3 mph ↑ 94% 29.98 "Hg NaN
24 8:00 pm NaN 50 °F Passing clouds. 3 mph ↑ 94% 29.98 "Hg NaN
25 8:30 pm NaN 52 °F Passing clouds. 7 mph ↑ 88% 29.98 "Hg NaN
26 9:00 pm NaN 52 °F Passing clouds. 5 mph ↑ 82% 29.98 "Hg NaN
27 9:30 pm NaN 50 °F Passing clouds. 5 mph ↑ 88% 29.98 "Hg NaN
28 10:00 pm NaN 48 °F Light rain. Passing clouds. 1 mph ↑ 94% 29.95 "Hg NaN
29
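Additional:
To get multiple days, we fetch the data through the site's ajax endpoint and loop over the requests. The returned text also needs a little manipulation because it is not quite valid JSON, but the format looks consistent, so that should not be a problem.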
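To illustrate the kind of manipulation meant here: the response uses bare keys (c, h, s) where JSON requires quoted ones. The sketch below quotes them with a regular expression instead of chained str.replace calls. The sample string is a hypothetical, shortened stand-in for a real response, and quote_bare_keys is an illustrative name, not something the site or any library provides.

```python
import json
import re

# Hypothetical, shortened stand-in for the ajax response text.
sample = '[{c:[{h:"7:00 am"},{s:1,h:"39 °F"}]}]'


def quote_bare_keys(text):
    """Quote identifiers that directly follow '{' or ',' and precede ':'."""
    # Caveat: a string value containing ', word:' would also be rewritten,
    # so this remains a heuristic, like the replace-based version.
    return re.sub(r'([{,])\s*([A-Za-z_]\w*)\s*:', r'\1"\2":', text)


data = json.loads(quote_bare_keys(sample))
print(data[0]['c'][1]['h'])
```

Anchoring the match to a preceding { or , keeps text such as the 7:00 inside a value from being touched.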
Note: you need to change start_date and num_of_days to get what you want. The example starts on January 1, 2014 and covers that day plus the following 9 days (10 days in total).
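How start_date and num_of_days turn into one request per day can be sketched on its own, without touching the network. The parameter names mirror the payload used in the code below. build_payloads is a hypothetical helper, and the place argument is an assumed generalization of the hard-coded 'spain/salou'.

```python
import datetime


def build_payloads(start_date, num_of_days, place='spain/salou'):
    """Return one query-parameter dict per day, starting at start_date."""
    start = datetime.datetime.strptime(start_date, '%Y%m%d')
    payloads = []
    for offset in range(num_of_days):
        day = start + datetime.timedelta(days=offset)
        payloads.append({
            'n': place,
            'mode': 'historic',
            'hd': day.strftime('%Y%m%d'),
            'month': str(day.month),  # no leading zero, on any platform
            'year': str(day.year),
            'json': '1',
        })
    return payloads


for payload in build_payloads('20140128', 5):
    print(payload['hd'], payload['month'], payload['year'])
```

Because the dates come from timedelta arithmetic, a range that crosses a month boundary picks up the new month value automatically.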
import datetime
import json
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

start_date = '20140101'
num_of_days = 10

url = 'https://www.timeanddate.com/scripts/cityajax.php'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}
columns = ['Date', 'Time', 'Temp', 'Weather', 'Wind Speed', 'Wind Direction', 'Wind Description', 'Humidity', 'Barometer', 'Visibility']
time_pattern = r'\b((1[0-2]|0?[1-9]):([0-5][0-9]) ([AaPp][Mm]))'

datetime_object = datetime.datetime.strptime(start_date, '%Y%m%d')
rows = []
date = None

for x in range(num_of_days):
    parse_time = datetime_object + datetime.timedelta(days=x)
    str_time = parse_time.strftime('%Y%m%d')
    month = str(parse_time.month)  # '%#m' only works on Windows
    year = parse_time.strftime('%Y')
    payload = {
        'n': 'spain/salou',
        'mode': 'historic',
        'hd': str_time,
        'month': month,
        'year': year,
        'json': '1'}
    jsonStr = requests.get(url, headers=headers, params=payload).text
    # The response uses unquoted keys (c, h, s); quote them so json.loads accepts it.
    jsonStr = jsonStr.replace('c:', '"c":')
    jsonStr = jsonStr.replace('h:', '"h":')
    jsonStr = jsonStr.replace('s:', '"s":')
    jsonData = json.loads(jsonStr)
    for alpha in jsonData:
        row = alpha['c']
        first_cell = BeautifulSoup(row[0]['h'], 'html.parser')
        # Only some rows carry the date in a <span>; otherwise reuse the last one seen.
        span = first_cell.find('span')
        if span is not None:
            date = span.text
        time = re.findall(time_pattern, first_cell.text)[0][0]
        condition = BeautifulSoup(row[3]['h'], 'html.parser').text
        temp = BeautifulSoup(row[2]['h'], 'html.parser').text.replace('\xa0', ' ')
        wspd = BeautifulSoup(row[4]['h'], 'html.parser').text
        wdir = BeautifulSoup(row[5]['h'], 'html.parser').text
        wdesc = BeautifulSoup(row[5]['h'], 'html.parser').find('span')['title']
        humd = BeautifulSoup(row[6]['h'], 'html.parser').text
        barm = BeautifulSoup(row[7]['h'], 'html.parser').text
        vis = BeautifulSoup(row[8]['h'], 'html.parser').text.replace('\xa0', ' ')
        rows.append([date, time, temp, condition, wspd, wdir, wdesc, humd, barm, vis])
        print('Processed: %s %s' % (date, time))

# Build the DataFrame once at the end; DataFrame.append was removed in pandas 2.0.
results = pd.DataFrame(rows, columns=columns)
Output:
print (results)
Date Time Temp ... Humidity Barometer Visibility
0 Wed, Jan 1 7:00 am 39 °F ... 93% 30.07 "Hg 10 mi
1 Wed, Jan 1 7:30 am 41 °F ... 87% 30.07 "Hg 10 mi
2 Wed, Jan 1 8:00 am 41 °F ... 87% 30.07 "Hg N/A
3 Wed, Jan 1 8:30 am 43 °F ... 81% 30.07 "Hg N/A
4 Wed, Jan 1 9:00 am 43 °F ... 87% 30.07 "Hg N/A
5 Wed, Jan 1 9:30 am 46 °F ... 76% 30.07 "Hg N/A
6 Wed, Jan 1 10:00 am 48 °F ... 76% 30.09 "Hg N/A
7 Wed, Jan 1 10:30 am 54 °F ... 67% 30.09 "Hg N/A
8 Wed, Jan 1 11:00 am 55 °F ... 63% 30.09 "Hg N/A
9 Wed, Jan 1 11:30 am 55 °F ... 63% 30.09 "Hg N/A
10 Wed, Jan 1 12:00 pm 57 °F ... 63% 30.07 "Hg N/A
11 Wed, Jan 1 12:30 pm 57 °F ... 67% 30.07 "Hg N/A
12 Wed, Jan 1 1:00 pm 59 °F ... 68% 30.04 "Hg N/A
13 Wed, Jan 1 2:00 pm 59 °F ... 68% 30.01 "Hg N/A
14 Wed, Jan 1 2:30 pm 59 °F ... 68% 30.01 "Hg N/A
15 Wed, Jan 1 3:00 pm 57 °F ... 67% 30.01 "Hg N/A
16 Wed, Jan 1 3:30 pm 57 °F ... 67% 29.98 "Hg N/A
17 Wed, Jan 1 4:00 pm 57 °F ... 72% 30.01 "Hg N/A
18 Wed, Jan 1 4:30 pm 55 °F ... 77% 30.01 "Hg N/A
19 Wed, Jan 1 5:00 pm 55 °F ... 77% 29.98 "Hg N/A
20 Wed, Jan 1 5:30 pm 54 °F ... 82% 30.01 "Hg N/A
21 Wed, Jan 1 6:00 pm 52 °F ... 88% 29.98 "Hg N/A
22 Wed, Jan 1 6:30 pm 52 °F ... 88% 29.98 "Hg N/A
23 Wed, Jan 1 7:30 pm 50 °F ... 94% 29.98 "Hg N/A
24 Wed, Jan 1 8:00 pm 50 °F ... 94% 29.98 "Hg N/A
25 Wed, Jan 1 8:30 pm 52 °F ... 88% 29.98 "Hg N/A
26 Wed, Jan 1 9:00 pm 52 °F ... 82% 29.98 "Hg N/A
27 Wed, Jan 1 9:30 pm 50 °F ... 88% 29.98 "Hg N/A
28 Wed, Jan 1 10:00 pm 48 °F ... 94% 29.95 "Hg N/A
29 Thu, Jan 2 7:00 am 43 °F ... 100% 29.89 "Hg N/A
.. ... ... ... ... ... ... ...
307 Sat, Jan 11 7:30 am 52 °F ... 82% 30.07 "Hg N/A
308 Sat, Jan 11 8:00 am 52 °F ... 82% 30.07 "Hg N/A
309 Sat, Jan 11 8:30 am 54 °F ... 82% 30.07 "Hg N/A
310 Sat, Jan 11 9:00 am 54 °F ... 77% 30.09 "Hg N/A
311 Sat, Jan 11 9:30 am 54 °F ... 82% 30.09 "Hg N/A
312 Sat, Jan 11 10:00 am 54 °F ... 82% 30.12 "Hg 4 mi
313 Sat, Jan 11 10:30 am 54 °F ... 82% 30.12 "Hg 4 mi
314 Sat, Jan 11 11:00 am 54 °F ... 82% 30.12 "Hg 4 mi
315 Sat, Jan 11 11:30 am 55 °F ... 77% 30.12 "Hg 4 mi
316 Sat, Jan 11 12:00 pm 57 °F ... 72% 30.12 "Hg 4 mi
317 Sat, Jan 11 12:30 pm 57 °F ... 72% 30.12 "Hg N/A
318 Sat, Jan 11 1:00 pm 57 °F ... 72% 30.09 "Hg N/A
319 Sat, Jan 11 1:30 pm 57 °F ... 72% 30.09 "Hg N/A
320 Sat, Jan 11 2:00 pm 57 °F ... 72% 30.09 "Hg N/A
321 Sat, Jan 11 2:30 pm 59 °F ... 72% 30.09 "Hg N/A
322 Sat, Jan 11 3:00 pm 59 °F ... 72% 30.07 "Hg N/A
323 Sat, Jan 11 3:30 pm 59 °F ... 72% 30.09 "Hg N/A
324 Sat, Jan 11 4:00 pm 57 °F ... 77% 30.09 "Hg N/A
325 Sat, Jan 11 4:30 pm 57 °F ... 77% 30.09 "Hg N/A
326 Sat, Jan 11 5:00 pm 55 °F ... 88% 30.09 "Hg N/A
327 Sat, Jan 11 5:30 pm 55 °F ... 88% 30.09 "Hg 6 mi
328 Sat, Jan 11 6:00 pm 55 °F ... 88% 30.12 "Hg 3 mi
329 Sat, Jan 11 6:30 pm 55 °F ... 94% 30.12 "Hg 3 mi
330 Sat, Jan 11 7:00 pm 55 °F ... 94% 30.12 "Hg 4 mi
331 Sat, Jan 11 7:30 pm 54 °F ... 100% 30.12 "Hg 4 mi
332 Sat, Jan 11 8:00 pm 54 °F ... 100% 30.15 "Hg 6 mi
333 Sat, Jan 11 8:30 pm 54 °F ... 100% 30.15 "Hg 6 mi
334 Sat, Jan 11 9:00 pm 54 °F ... 100% 30.15 "Hg 6 mi
335 Sat, Jan 11 9:30 pm 54 °F ... 94% 30.15 "Hg 6 mi
336 Sat, Jan 11 10:00 pm 54 °F ... 94% 30.15 "Hg 6 mi
[337 rows x 10 columns]
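Every column in the result above is still a string ('39 °F', '93%', 'N/A'). If numbers are needed, for instance to get °C as in the question, a small post-processing step can pull them out. This is a rough sketch under the assumption that each value starts with its number. leading_number and fahrenheit_to_celsius are hypothetical helper names, and the sample values are taken from the table above.

```python
import re


def leading_number(text):
    """Return the number a cell starts with, or None (e.g. 'N/A', 'No wind')."""
    match = re.match(r'-?\d+(?:\.\d+)?', text.strip())
    return float(match.group()) if match else None


def fahrenheit_to_celsius(degrees_f):
    """Convert a Fahrenheit reading to Celsius."""
    return (degrees_f - 32) * 5 / 9


for cell in ['39 °F', '93%', '30.07 "Hg', '10 mi', 'N/A']:
    print(cell, '->', leading_number(cell))

print(round(fahrenheit_to_celsius(leading_number('59 °F')), 1))
```

With the DataFrame from above, something like results['Temp'].map(leading_number) would apply the same helper to a whole column.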