Unable to extract certain div tags using BeautifulSoup

Time: 2019-06-13 09:58:30

Tags: python web-scraping beautifulsoup

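I want to get the daily weather for a given region and date from the website "https://www.timeanddate.com/weather". However, I cannot reach the div classes I need with the code below. What should I do?

I tried extracting with BeautifulSoup. The information I want is in div elements with the classes temp and wdesc (the temperature and the weather description, e.g. "Passing clouds"). So I tried the following code: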


import requests
from bs4 import BeautifulSoup

url = 'https://www.timeanddate.com/weather/spain/salou/historic?month=1&year=2014'

# verify=False skips TLS certificate verification
result = requests.get(url, verify=False)
soup = BeautifulSoup(result.text, "html.parser")
w1 = soup.find_all('div', attrs={'class': 'temp'})
w2 = soup.find_all('div', attrs={'class': 'wdesc'})

I expected to get the temperature (13/11°C) from w1 and the weather description (scattered clouds) from w2. Instead, I got two empty lists from w1 and w2.

3 Answers:

Answer 0: (score: 1)

Your data is in a script, so one solution is to use Selenium. If you don't have the driver installed yet, you can get it here:

https://chromedriver.storage.googleapis.com/index.html?path=2.35/

Here is the code:

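from selenium import webdriver

driver_path = r'chromedriverpath'  # path to the downloaded chromedriver executable
# Note: Selenium 4 passes the driver path through a Service object instead of executable_path.
browser = webdriver.Chrome(executable_path=driver_path)
browser.get("https://www.timeanddate.com/weather/spain/salou/historic?month=1&year=2014")
# The page keeps its weather records in a JavaScript variable named data.
meta = browser.execute_script('return data')

my_json_string = meta['detail']

print(my_json_string)
browser.quit()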

Output:

  

[{u'hlsh': u'1 Oca', u'templow': 4, u'temp': 14, u'hum': 77, u'hls': u'1 Oca \xc7ar', u'ts': u'06:00', u'wd': 30, u'wind': 5, u'hl': True, u'date': 1388556000000, u'icon': 2, u'ds': u'1 Ocak 2014 \xc7ar\u015famba, 06:00 \u2014 12:00', u'baro': 1019, u'desc': u'Passing clouds.'},
 {u'templow': 11, u'temp': 15, u'hum': 72, u'ts': u'12:00', u'wd': 210, u'wind': 9, u'date': 1388577600000, u'desc': u'Passing clouds.', u'ds': u'1 Ocak 2014 \xc7ar\u015famba, 12:00 \u2014 18:00', u'baro': 1016, u'icon': 2},
 {u'templow': 9, u'temp': 11, u'hum': 90, u'ts': u'18:00', u'wd': 0, u'wind': 5, u'date': 1388599200000, u'desc': u'Passing clouds.', u'ds': u'1 Ocak 2014 \xc7ar\u015famba, 18:00 \u2014 00:00', u'baro': 1015, u'icon': 14},
 {u'hlsh': u'2 Oca', u'wd': 0, u'hum': 0, u'hls': u'2 Oca Per', u'ts': u'00:00', u'wind': 0, u'hl': True, u'date': 1388620800000, u'icon': 36, u'ds': u'2 Ocak 2014 Per\u015fembe, 00:00 \u2014 06:00', u'baro': 0, u'desc': u'No weather data available'},
 {u'templow': 6, u'temp': 15, u'hum': 93, u'ts': u'06:00', u'wd': 0, u'wind': 6, u'date': 1388642400000, u'desc': u'Passing clouds.', u'ds': u'2 Ocak 2014 Per\u015fembe, 06:00 \u2014 12:00', u'baro': 1013, u'icon': 2},
 {u'templow': 15, u'temp': 18, u'hum': 61, u'ts': u'12:00', u'wd': 0, u'wind': 7, u'date': 1388664000000, u'desc': u'Passing clouds.', u'ds': u'2 Ocak 2014 Per\u015fembe, 12:00 \u2014 18:00', u'baro': 1013, u'icon': 2},
 {u'templow': 13, u'temp': 15, u'hum': 80, u'ts': u'18:00', u'wd': 0, u'wind': 4, u'date': 1388685600000, u'desc': u'Passing clouds.', u'ds': u'2 Ocak 2014 Per\u015fembe, 18:00 \u2014 00:00', u'baro': 1014, u'icon': 14},
 {u'hlsh': u'3 Oca', u'wd': 0, u'hum': 0, u'hls': u'3 Oca Cum', u'ts': u'00:00', u'wind': 0, u'hl': True, u'date': 1388707200000, u'icon': 36, u'ds': u'3 Ocak 2014 Cuma, 00:00 \u2014 06:00', u'baro': 0, u'desc': u'No weather data available'},
 {u'templow': 9, u'temp': 18, u'hum': 76, u'ts': u'06:00', u'wd': 0, u'wind': 6, u'date': 1388728800000, u'desc': u'Passing clouds.', u'ds': u'3 Ocak 2014 Cuma, 06:00 \u2014 12:00', u'baro': 1015, u'icon': 2},
 {u'templow': 17, u'temp': 20, u'hum': 55, u'ts': u'12:00', u'wd': 290, u'wind': 11, u'date': 1388750400000, u'desc': u'Passing clouds.', u'ds': u'3 Ocak 2014 Cuma, 12:00 \u2014 18:00', u'baro': 1016, u'icon': 2}, ... and so on to the end

Once you have this list, you can parse it with json or by other means. Using Selenium is one of the options.
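As a rough illustration of that last step, the records can be reduced to a few readable fields. This is only a sketch: the two sample records are copied from the output above and trimmed, the helper name summarize is made up for this example, and treating 'date' as a Unix timestamp in milliseconds is an assumption based on how the values look.

```python
from datetime import datetime, timezone

# Two records copied from the output above (trimmed to the fields used here).
records = [
    {'temp': 14, 'templow': 4, 'ts': '06:00', 'date': 1388556000000, 'desc': 'Passing clouds.'},
    {'temp': 15, 'templow': 11, 'ts': '12:00', 'date': 1388577600000, 'desc': 'Passing clouds.'},
]


def summarize(record):
    """Turn one raw record into a small, readable dict (hypothetical helper)."""
    # Assumption: 'date' is a Unix timestamp in milliseconds.
    when = datetime.fromtimestamp(record['date'] / 1000, tz=timezone.utc)
    return {
        'date': when.strftime('%Y-%m-%d'),
        'slot': record['ts'],
        # Records without weather data lack 'temp'/'templow', hence .get().
        'temp': '{} / {}'.format(record.get('temp'), record.get('templow')),
        'desc': record.get('desc', ''),
    }


summaries = [summarize(r) for r in records]
for s in summaries:
    print(s)
```

Using .get() keeps the loop from failing on the entries that have no temperature fields.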

Answer 1: (score: 0)

I think you want to look at the table called wt-his. It seems to contain rows with all the values you are looking for.
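A minimal sketch of reading such a table with BeautifulSoup is shown here. The markup is a simplified, hypothetical stand-in for the real page, and it assumes that "wt-his" is the table's id attribute; check the page source before relying on either.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; the real table has more columns.
html = """
<table id="wt-his">
  <thead><tr><th>Time</th><th>Temp</th><th>Weather</th></tr></thead>
  <tbody>
    <tr><th>7:00 am</th><td>39 °F</td><td>Clear.</td></tr>
    <tr><th>8:00 am</th><td>41 °F</td><td>Passing clouds.</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')
table = soup.find('table', id='wt-his')  # assumption: "wt-his" is the id

rows = []
for tr in table.find('tbody').find_all('tr'):
    # Collect the text of every cell in the row, header cell first.
    cells = [cell.get_text(strip=True) for cell in tr.find_all(['th', 'td'])]
    rows.append(cells)

print(rows)
```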


Answer 2: (score: 0)

If you only want the table (assuming it is under a <table> tag), pulling it with Pandas is much easier than going through BeautifulSoup directly.

import pandas as pd

url = 'https://www.timeanddate.com/weather/spain/salou/historic?month=1&year=2014'
tables = pd.read_html(url)
df = tables[-1]

Output:

print (df.to_string())
                  Unnamed: 0_level_0                        Conditions                                                                                               Comfort                                                                                    Unnamed: 7_level_0                Unnamed: 8_level_0
                                Time                Unnamed: 1_level_1                              Temp                           Weather                              Wind                Unnamed: 5_level_1                          Humidity                         Barometer                        Visibility
0                  7:00 amWed, Jan 1                               NaN                             39 °F                            Clear.                             3 mph                                 ↑                               93%                         30.07 "Hg                             10 mi
1                            7:30 am                               NaN                             41 °F                            Clear.                             3 mph                                 ↑                               87%                         30.07 "Hg                             10 mi
2                            8:00 am                               NaN                             41 °F                   Passing clouds.                             5 mph                                 ↑                               87%                         30.07 "Hg                               NaN
3                            8:30 am                               NaN                             43 °F                   Passing clouds.                             6 mph                                 ↑                               81%                         30.07 "Hg                               NaN
4                            9:00 am                               NaN                             43 °F                   Passing clouds.                             2 mph                                 ↑                               87%                         30.07 "Hg                               NaN
5                            9:30 am                               NaN                             46 °F                   Passing clouds.                             5 mph                                 ↑                               76%                         30.07 "Hg                               NaN
6                           10:00 am                               NaN                             48 °F                   Passing clouds.                             3 mph                                 ↑                               76%                         30.09 "Hg                               NaN
7                           10:30 am                               NaN                             54 °F                   Passing clouds.                           No wind                                 ↑                               67%                         30.09 "Hg                               NaN
8                           11:00 am                               NaN                             55 °F                   Passing clouds.                           No wind                                 ↑                               63%                         30.09 "Hg                               NaN
9                           11:30 am                               NaN                             55 °F                   Passing clouds.                             3 mph                                 ↑                               63%                         30.09 "Hg                               NaN
10                          12:00 pm                               NaN                             57 °F                   Passing clouds.                             6 mph                                 ↑                               63%                         30.07 "Hg                               NaN
11                          12:30 pm                               NaN                             57 °F                   Passing clouds.                             8 mph                                 ↑                               67%                         30.07 "Hg                               NaN
12                           1:00 pm                               NaN                             59 °F                   Passing clouds.                             9 mph                                 ↑                               68%                         30.04 "Hg                               NaN
13                           2:00 pm                               NaN                             59 °F                   Passing clouds.                            10 mph                                 ↑                               68%                         30.01 "Hg                               NaN
14                           2:30 pm                               NaN                             59 °F                   Passing clouds.                             9 mph                                 ↑                               68%                         30.01 "Hg                               NaN
15                           3:00 pm                               NaN                             57 °F                   Passing clouds.                             7 mph                                 ↑                               67%                         30.01 "Hg                               NaN
16                           3:30 pm                               NaN                             57 °F                   Passing clouds.                             6 mph                                 ↑                               67%                         29.98 "Hg                               NaN
17                           4:00 pm                               NaN                             57 °F                   Passing clouds.                             3 mph                                 ↑                               72%                         30.01 "Hg                               NaN
18                           4:30 pm                               NaN                             55 °F                   Passing clouds.                             3 mph                                 ↑                               77%                         30.01 "Hg                               NaN
19                           5:00 pm                               NaN                             55 °F                   Passing clouds.                             1 mph                                 ↑                               77%                         29.98 "Hg                               NaN
20                           5:30 pm                               NaN                             54 °F                   Passing clouds.                           No wind                                 ↑                               82%                         30.01 "Hg                               NaN
21                           6:00 pm                               NaN                             52 °F                   Passing clouds.                             1 mph                                 ↑                               88%                         29.98 "Hg                               NaN
22                           6:30 pm                               NaN                             52 °F                   Passing clouds.                             1 mph                                 ↑                               88%                         29.98 "Hg                               NaN
23                           7:30 pm                               NaN                             50 °F                   Passing clouds.                             3 mph                                 ↑                               94%                         29.98 "Hg                               NaN
24                           8:00 pm                               NaN                             50 °F                   Passing clouds.                             3 mph                                 ↑                               94%                         29.98 "Hg                               NaN
25                           8:30 pm                               NaN                             52 °F                   Passing clouds.                             7 mph                                 ↑                               88%                         29.98 "Hg                               NaN
26                           9:00 pm                               NaN                             52 °F                   Passing clouds.                             5 mph                                 ↑                               82%                         29.98 "Hg                               NaN
27                           9:30 pm                               NaN                             50 °F                   Passing clouds.                             5 mph                                 ↑                               88%                         29.98 "Hg                               NaN
28                          10:00 pm                               NaN                             48 °F       Light rain. Passing clouds.                             1 mph                                 ↑                               94%                         29.95 "Hg                               NaN
29  
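The two-row header in the output above means the DataFrame has MultiIndex columns, which is where names like "Unnamed: 0_level_0" come from. One possible way to tidy that up is sketched here on a small hand-built stand-in for the scraped table (the sample rows are typed in from the output above, not fetched); it keeps only the second header row and turns the temperature into a number.

```python
import pandas as pd

# Small stand-in for the scraped table, with the same kind of two-row header.
columns = pd.MultiIndex.from_tuples([
    ('Unnamed: 0_level_0', 'Time'),
    ('Conditions', 'Temp'),
    ('Conditions', 'Weather'),
    ('Comfort', 'Humidity'),
])
df = pd.DataFrame(
    [['7:00 am', '39 °F', 'Clear.', '93%'],
     ['7:30 am', '41 °F', 'Clear.', '87%']],
    columns=columns,
)

# Keep only the second header row as the column names.
df.columns = df.columns.get_level_values(1)

# Strip the unit and convert the temperature to an integer.
df['Temp'] = df['Temp'].str.replace('°F', '', regex=False).str.strip().astype(int)

print(df)
```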

Additional:

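To get multiple days, we go after the data the page fetches through ajax and iterate through those requests. We also need to manipulate the returned text a little, because it is not quite valid JSON (the keys are unquoted), but the format looks consistent, so that should not be a problem.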
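The "not quite valid JSON" repair can be sketched in isolation. The sample string here is hypothetical and only mimics the shape described (unquoted keys c, h and s); the regex is a stricter alternative to plain string replacement, not the approach used in the script that follows.

```python
import json
import re

# Hypothetical sample: the keys c, h and s are unquoted.
raw = '[{c:[{h:"7:00 am"},{s:1},{h:"Passing clouds: light"}]}]'

# Quote a key only when it directly follows "{" or ",", so an "s:" inside
# a string value (as in "clouds:") is left alone.
fixed = re.sub(r'([{,])\s*([chs]):', r'\1"\2":', raw)

data = json.loads(fixed)
print(data[0]['c'][2]['h'])
```

A plain replace of 's:' would also rewrite the text inside "clouds:" in this sample and break the string, which is why the pattern is anchored to the preceding brace or comma.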

Note: you need to change start_date and num_of_days to get what you want. The example starts on January 1, 2014 and covers that day plus the following 9 days (10 days in total).

import datetime
import json
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

start_date = '20140101'
num_of_days = 10

url = 'https://www.timeanddate.com/scripts/cityajax.php'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.131 Safari/537.36'}

datetime_object = datetime.datetime.strptime(start_date, '%Y%m%d')

columns = ['Date', 'Time', 'Temp', 'Weather', 'Wind Speed', 'Wind Direction', 'Wind Description', 'Humidity', 'Barometer', 'Visibility']
rows = []
date = None  # only the first row of each day carries the date, so it is carried forward

for x in range(num_of_days):
    parse_time = datetime_object + datetime.timedelta(days=x)
    str_time = parse_time.strftime('%Y%m%d')
    month = str(parse_time.month)  # portable; the '%#m' format code only works on Windows
    year = parse_time.strftime('%Y')

    payload = {
        'n': 'spain/salou',
        'mode': 'historic',
        'hd': str_time,
        'month': month,
        'year': year,
        'json': '1'}

    jsonStr = requests.get(url, headers=headers, params=payload).text
    # The response uses unquoted keys (c, h, s); quote them to get valid JSON.
    jsonStr = jsonStr.replace('c:', '"c":')
    jsonStr = jsonStr.replace('h:', '"h":')
    jsonStr = jsonStr.replace('s:', '"s":')

    jsonData = json.loads(jsonStr)

    for alpha in jsonData:
        row = alpha['c']

        # Rows without a <span> in the first cell reuse the last date seen.
        span = BeautifulSoup(row[0]['h'], 'html.parser').find('span')
        if span is not None:
            date = span.text

        time = re.findall(r'\b((1[0-2]|0?[1-9]):([0-5][0-9]) ([AaPp][Mm]))', BeautifulSoup(row[0]['h'], 'html.parser').text)[0][0]
        condition = BeautifulSoup(row[3]['h'], 'html.parser').text
        temp = BeautifulSoup(row[2]['h'], 'html.parser').text.replace('\xa0', ' ')
        wspd = BeautifulSoup(row[4]['h'], 'html.parser').text
        wdir = BeautifulSoup(row[5]['h'], 'html.parser').text
        wdesc = BeautifulSoup(row[5]['h'], 'html.parser').find('span')['title']
        humd = BeautifulSoup(row[6]['h'], 'html.parser').text
        barm = BeautifulSoup(row[7]['h'], 'html.parser').text
        vis = BeautifulSoup(row[8]['h'], 'html.parser').text.replace('\xa0', ' ')

        rows.append([date, time, temp, condition, wspd, wdir, wdesc, humd, barm, vis])
        print('Processed: %s %s' % (date, time))

# Build the DataFrame once at the end; DataFrame.append was removed in pandas 2.0.
results = pd.DataFrame(rows, columns=columns)

Output:

print (results)
            Date      Time   Temp  ... Humidity  Barometer Visibility
0     Wed, Jan 1   7:00 am  39 °F  ...      93%  30.07 "Hg      10 mi
1     Wed, Jan 1   7:30 am  41 °F  ...      87%  30.07 "Hg      10 mi
2     Wed, Jan 1   8:00 am  41 °F  ...      87%  30.07 "Hg        N/A
3     Wed, Jan 1   8:30 am  43 °F  ...      81%  30.07 "Hg        N/A
4     Wed, Jan 1   9:00 am  43 °F  ...      87%  30.07 "Hg        N/A
5     Wed, Jan 1   9:30 am  46 °F  ...      76%  30.07 "Hg        N/A
6     Wed, Jan 1  10:00 am  48 °F  ...      76%  30.09 "Hg        N/A
7     Wed, Jan 1  10:30 am  54 °F  ...      67%  30.09 "Hg        N/A
8     Wed, Jan 1  11:00 am  55 °F  ...      63%  30.09 "Hg        N/A
9     Wed, Jan 1  11:30 am  55 °F  ...      63%  30.09 "Hg        N/A
10    Wed, Jan 1  12:00 pm  57 °F  ...      63%  30.07 "Hg        N/A
11    Wed, Jan 1  12:30 pm  57 °F  ...      67%  30.07 "Hg        N/A
12    Wed, Jan 1   1:00 pm  59 °F  ...      68%  30.04 "Hg        N/A
13    Wed, Jan 1   2:00 pm  59 °F  ...      68%  30.01 "Hg        N/A
14    Wed, Jan 1   2:30 pm  59 °F  ...      68%  30.01 "Hg        N/A
15    Wed, Jan 1   3:00 pm  57 °F  ...      67%  30.01 "Hg        N/A
16    Wed, Jan 1   3:30 pm  57 °F  ...      67%  29.98 "Hg        N/A
17    Wed, Jan 1   4:00 pm  57 °F  ...      72%  30.01 "Hg        N/A
18    Wed, Jan 1   4:30 pm  55 °F  ...      77%  30.01 "Hg        N/A
19    Wed, Jan 1   5:00 pm  55 °F  ...      77%  29.98 "Hg        N/A
20    Wed, Jan 1   5:30 pm  54 °F  ...      82%  30.01 "Hg        N/A
21    Wed, Jan 1   6:00 pm  52 °F  ...      88%  29.98 "Hg        N/A
22    Wed, Jan 1   6:30 pm  52 °F  ...      88%  29.98 "Hg        N/A
23    Wed, Jan 1   7:30 pm  50 °F  ...      94%  29.98 "Hg        N/A
24    Wed, Jan 1   8:00 pm  50 °F  ...      94%  29.98 "Hg        N/A
25    Wed, Jan 1   8:30 pm  52 °F  ...      88%  29.98 "Hg        N/A
26    Wed, Jan 1   9:00 pm  52 °F  ...      82%  29.98 "Hg        N/A
27    Wed, Jan 1   9:30 pm  50 °F  ...      88%  29.98 "Hg        N/A
28    Wed, Jan 1  10:00 pm  48 °F  ...      94%  29.95 "Hg        N/A
29    Thu, Jan 2   7:00 am  43 °F  ...     100%  29.89 "Hg        N/A
..           ...       ...    ...  ...      ...        ...        ...
307  Sat, Jan 11   7:30 am  52 °F  ...      82%  30.07 "Hg        N/A
308  Sat, Jan 11   8:00 am  52 °F  ...      82%  30.07 "Hg        N/A
309  Sat, Jan 11   8:30 am  54 °F  ...      82%  30.07 "Hg        N/A
310  Sat, Jan 11   9:00 am  54 °F  ...      77%  30.09 "Hg        N/A
311  Sat, Jan 11   9:30 am  54 °F  ...      82%  30.09 "Hg        N/A
312  Sat, Jan 11  10:00 am  54 °F  ...      82%  30.12 "Hg       4 mi
313  Sat, Jan 11  10:30 am  54 °F  ...      82%  30.12 "Hg       4 mi
314  Sat, Jan 11  11:00 am  54 °F  ...      82%  30.12 "Hg       4 mi
315  Sat, Jan 11  11:30 am  55 °F  ...      77%  30.12 "Hg       4 mi
316  Sat, Jan 11  12:00 pm  57 °F  ...      72%  30.12 "Hg       4 mi
317  Sat, Jan 11  12:30 pm  57 °F  ...      72%  30.12 "Hg        N/A
318  Sat, Jan 11   1:00 pm  57 °F  ...      72%  30.09 "Hg        N/A
319  Sat, Jan 11   1:30 pm  57 °F  ...      72%  30.09 "Hg        N/A
320  Sat, Jan 11   2:00 pm  57 °F  ...      72%  30.09 "Hg        N/A
321  Sat, Jan 11   2:30 pm  59 °F  ...      72%  30.09 "Hg        N/A
322  Sat, Jan 11   3:00 pm  59 °F  ...      72%  30.07 "Hg        N/A
323  Sat, Jan 11   3:30 pm  59 °F  ...      72%  30.09 "Hg        N/A
324  Sat, Jan 11   4:00 pm  57 °F  ...      77%  30.09 "Hg        N/A
325  Sat, Jan 11   4:30 pm  57 °F  ...      77%  30.09 "Hg        N/A
326  Sat, Jan 11   5:00 pm  55 °F  ...      88%  30.09 "Hg        N/A
327  Sat, Jan 11   5:30 pm  55 °F  ...      88%  30.09 "Hg       6 mi
328  Sat, Jan 11   6:00 pm  55 °F  ...      88%  30.12 "Hg       3 mi
329  Sat, Jan 11   6:30 pm  55 °F  ...      94%  30.12 "Hg       3 mi
330  Sat, Jan 11   7:00 pm  55 °F  ...      94%  30.12 "Hg       4 mi
331  Sat, Jan 11   7:30 pm  54 °F  ...     100%  30.12 "Hg       4 mi
332  Sat, Jan 11   8:00 pm  54 °F  ...     100%  30.15 "Hg       6 mi
333  Sat, Jan 11   8:30 pm  54 °F  ...     100%  30.15 "Hg       6 mi
334  Sat, Jan 11   9:00 pm  54 °F  ...     100%  30.15 "Hg       6 mi
335  Sat, Jan 11   9:30 pm  54 °F  ...      94%  30.15 "Hg       6 mi
336  Sat, Jan 11  10:00 pm  54 °F  ...      94%  30.15 "Hg       6 mi

[337 rows x 10 columns]
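The output above is in °F, while the question asked for °C. A small post-processing sketch for that conversion follows; the helper name fahrenheit_to_celsius is made up for this example, and the input strings are assumed to look like the Temp column above.

```python
import re


def fahrenheit_to_celsius(text):
    """Convert a string such as '39 °F' to Celsius, rounded to one decimal."""
    match = re.search(r'-?\d+(?:\.\d+)?', text)
    if match is None:
        return None  # no number in the cell, e.g. 'N/A'
    return round((float(match.group()) - 32) * 5 / 9, 1)


temps = ['39 °F', '41 °F', '59 °F', 'N/A']
print([fahrenheit_to_celsius(t) for t in temps])
```

With a DataFrame like results, the same function could be applied to the Temp column, for example through Series.map.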