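Having trouble with BeautifulSoup in Python

Time: 2016-07-19 17:29:19

Tags: python beautifulsoup

I'm new to Python and the code below isn't working. I'm trying to get the temperature or the date from a website, but I can't get any output. I've tried many variations and still can't get it right.

Thanks for your help!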

#Code below: 
import requests,bs4
r = requests.get('http://www.hko.gov.hk/contente.htm')
print r.raise_for_status()
hkweather = bs4.BeautifulSoup(r.text)
print hkweather.select('div left_content fnd_day fnd_date')

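2 answers:

Answer 0: (score: 1)

Your CSS selector is incorrect: you need a . between a tag name and a CSS class, and a # before an id. The tags you want are the divs with class fnd_day inside the element whose id is fnd_content: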

# hkweather is the BeautifulSoup object from the question
divs = hkweather.select("#fnd_content div.fnd_day")
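
To see the difference between the two selectors without hitting the site, here is a minimal sketch in Python 3. The HTML string is made up for illustration (it only mimics the id and class names mentioned here, and the dates in it are invented), and it assumes bs4 is installed; "html.parser" is the parser that ships with Python.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: same id/class names as discussed, invented contents.
html = """
<div id="fnd_content">
  <div class="fnd_day"><div class="fnd_date">July 20</div></div>
  <div class="fnd_day"><div class="fnd_date">July 21</div></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Without "." or "#", every word is read as a tag name, so nothing matches.
broken = soup.select("div left_content fnd_day fnd_date")

# "#" selects by id, "." selects by class.
days = soup.select("#fnd_content div.fnd_day")
dates = [d.get_text(strip=True)
         for d in soup.select("#fnd_content div.fnd_day div.fnd_date")]

print(broken)
print(len(days))
print(dates)
```

The first selector looks for a tag named fnd_date nested inside tags named fnd_day and left_content, which do not exist, so it comes back empty.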

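But because that content is generated dynamically through an ajax request, the selector still won't return the data. You can get all of the data in JSON format with the following code: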

u = "http://www.hko.gov.hk/wxinfo/json/one_json.xml?_=1468955579991"

data = requests.get(u).json()

from pprint import pprint as pp
pp(data)

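That returns almost all of the dynamic content, including the dates and temperatures.

If you access the key F9D, you can see all the temperatures and dates along with a general weather description: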

from pprint import pprint as pp

pp(data['F9D'])

Output:

{'BulletinDate': '20160720',
 'BulletinTime': '0315',
 'GeneralSituation': 'A southwesterly airstream will bring showers to the '
                     'coast of Guangdong today. Under the dominance of an '
                     'upper-air anticyclone, it will be generally fine and '
                     'very hot over southern China in the latter part of this '
                     'week and early next week.',
 'NPTemp': '25',
 'WeatherForecast': [{'ForecastDate': '20160720',
                      'ForecastIcon': 'pic53.png',
                      'ForecastMaxrh': '95',
                      'ForecastMaxtemp': '32',
                      'ForecastMinrh': '70',
                      'ForecastMintemp': '26',
                      'ForecastWeather': 'Sunny periods and a few showers. '
                                         'Isolated squally thunderstorms at '
                                         'first.',
                      'ForecastWind': 'South to southwest force 4.',
                      'IconDesc': 'Sunny Periods with A Few Showers',
                      'WeekDay': '3'},
                     {'ForecastDate': '20160721',
                      'ForecastIcon': 'pic90.png',
                      'ForecastMaxrh': '90',
                      'ForecastMaxtemp': '33',
                      'ForecastMinrh': '65',
                      'ForecastMintemp': '28',
                      'ForecastWeather': 'Mainly fine and very hot apart from '
                                         'isolated showers in the morning.',
                      'ForecastWind': 'South to southwest force 3 to 4.',
                      'IconDesc': 'Hot',
                      'WeekDay': '4'},
                     {'ForecastDate': '20160722',
                      'ForecastIcon': 'pic90.png',
                      'ForecastMaxrh': '90',
                      'ForecastMaxtemp': '33',
                      'ForecastMinrh': '65',
                      'ForecastMintemp': '28',
                      'ForecastWeather': 'Mainly fine and very hot apart from '
                                         'isolated showers in the morning.',
                      'ForecastWind': 'Southwest force 3.',
                      'IconDesc': 'Hot',
                      'WeekDay': '5'},
                     {'ForecastDate': '20160723',
                      'ForecastIcon': 'pic90.png',
                      'ForecastMaxrh': '90',
                      'ForecastMaxtemp': '34',
                      'ForecastMinrh': '65',
                      'ForecastMintemp': '29',
                      'ForecastWeather': 'Fine and very hot.',
                      'ForecastWind': 'Southwest force 3.',
                      'IconDesc': 'Hot',
                      'WeekDay': '6'},
                     {'ForecastDate': '20160724',
                      'ForecastIcon': 'pic90.png',
                      'ForecastMaxrh': '90',
                      'ForecastMaxtemp': '34',
                      'ForecastMinrh': '65',
                      'ForecastMintemp': '29',
                      'ForecastWeather': 'Fine and very hot.',
                      'ForecastWind': 'Southwest force 3.',
                      'IconDesc': 'Hot',
                      'WeekDay': '0'},
                     {'ForecastDate': '20160725',
                      'ForecastIcon': 'pic90.png',
                      'ForecastMaxrh': '90',
                      'ForecastMaxtemp': '33',
                      'ForecastMinrh': '65',
                      'ForecastMintemp': '29',
                      'ForecastWeather': 'Mainly fine and very hot apart from '
                                         'isolated showers in the morning.',
                      'ForecastWind': 'South to southwest force 3.',
                      'IconDesc': 'Hot',
                      'WeekDay': '1'},
                     {'ForecastDate': '20160726',
                      'ForecastIcon': 'pic90.png',
                      'ForecastMaxrh': '90',
                      'ForecastMaxtemp': '33',
                      'ForecastMinrh': '65',
                      'ForecastMintemp': '29',
                      'ForecastWeather': 'Mainly fine and very hot apart from '
                                         'isolated showers in the morning.',
                      'ForecastWind': 'South to southwest force 3.',
                      'IconDesc': 'Hot',
                      'WeekDay': '2'},
                     {'ForecastDate': '20160727',
                      'ForecastIcon': 'pic90.png',
                      'ForecastMaxrh': '90',
                      'ForecastMaxtemp': '33',
                      'ForecastMinrh': '65',
                      'ForecastMintemp': '28',
                      'ForecastWeather': 'Mainly fine and very hot apart from '
                                         'isolated showers in the morning.',
                      'ForecastWind': 'Southwest force 3 to 4.',
                      'IconDesc': 'Hot',
                      'WeekDay': '3'},
                     {'ForecastDate': '20160728',
                      'ForecastIcon': 'pic90.png',
                      'ForecastMaxrh': '90',
                      'ForecastMaxtemp': '33',
                      'ForecastMinrh': '65',
                      'ForecastMintemp': '28',
                      'ForecastWeather': 'Mainly fine and very hot apart from '
                                         'isolated showers in the morning.',
                      'ForecastWind': 'Southwest force 3 to 4.',
                      'IconDesc': 'Hot',
                      'WeekDay': '4'}]}
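
To pull out just the dates and temperatures, something along these lines should work. It is a minimal sketch in Python 3: sample_f9d holds two entries copied from the output above and stands in for data['F9D'], and summarize_forecast is a name made up for illustration. It assumes every entry carries the three keys shown and that the temperature fields are numeric strings.

```python
from datetime import datetime

# Two entries copied from the output above; a live response has more fields.
sample_f9d = {
    "WeatherForecast": [
        {"ForecastDate": "20160720", "ForecastMintemp": "26", "ForecastMaxtemp": "32"},
        {"ForecastDate": "20160721", "ForecastMintemp": "28", "ForecastMaxtemp": "33"},
    ]
}


def summarize_forecast(f9d):
    """Return (ISO date, min temp, max temp) tuples from an F9D-style dict."""
    rows = []
    for day in f9d["WeatherForecast"]:
        # ForecastDate is a string like '20160720'
        date = datetime.strptime(day["ForecastDate"], "%Y%m%d").date()
        rows.append((date.isoformat(),
                     int(day["ForecastMintemp"]),
                     int(day["ForecastMaxtemp"])))
    return rows


for date, low, high in summarize_forecast(sample_f9d):
    print("{}: min {}, max {}".format(date, low, high))
```

With live data you would call summarize_forecast(data['F9D']) instead of passing the sample dict.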

The only query-string parameter is an epoch timestamp, which you can generate with the time lib:

from time import time
u = "http://www.hko.gov.hk/wxinfo/json/one_json.xml?_={}".format(int(time()))

data = requests.get(u).json()

Leaving out the timestamp returns the same data, so I'll let you investigate its significance.
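
One detail worth noticing: the value in the first URL (1468955579991) has 13 digits, so it looks like milliseconds since the epoch, whereas int(time()) gives seconds. A parameter named _ holding a millisecond timestamp is often a cache-buster added by client-side JavaScript, but that is a guess about this site and not something confirmed here. A minimal sketch in Python 3 that builds the URL with a millisecond value (build_url and its now parameter are hypothetical names):

```python
from time import time

BASE_URL = "http://www.hko.gov.hk/wxinfo/json/one_json.xml"


def build_url(now=None):
    """Return the JSON URL with a millisecond epoch timestamp in the `_` parameter."""
    if now is None:
        now = time()  # seconds since the epoch, as a float
    millis = int(round(now * 1000))
    return "{}?_={}".format(BASE_URL, millis)


print(build_url())
```

The result can be passed to requests.get(...).json() exactly as before.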

Answer 1: (score: 0)

I was able to get the date divs:

>>> import requests,bs4
>>> r = requests.get('http://www.hko.gov.hk/contente.htm')
>>> hkweather = bs4.BeautifulSoup(r.text)
>>> print hkweather.select('div[class="fnd_date"]')
# [<div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>]

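But the text is missing. This does not appear to be a BeautifulSoup problem, because I looked at r.text myself and all I see is <div class="fnd_date"></div> rather than <div class="fnd_date">July 20</div>

You can use a regular expression to check whether the text is there (although using regular expressions on HTML is frowned upon):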

>>> import re
>>> re.findall(r'<[^<>]*fnd_date[^<>]*>[^>]*>', r.text)
# [u'<div id="fnd_date" class="date"></div>', ... repeated 10 times]
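
The same check can be tried offline. This is a minimal sketch in Python 3: raw_html and rendered_html are made-up strings standing in for what requests receives and what the browser shows after the JavaScript has run, and the pattern is a variation on the one above that captures the text inside each div.

```python
import re

# Hypothetical stand-ins: the page as downloaded vs. as rendered in a browser.
raw_html = '<div class="fnd_date"></div><div class="fnd_date"></div>'
rendered_html = ('<div class="fnd_date">July 20</div>'
                 '<div class="fnd_date">July 21</div>')

# Capture whatever text sits between the opening tag and </div>.
pattern = re.compile(r'<div[^<>]*fnd_date[^<>]*>([^<]*)</div>')

print(pattern.findall(raw_html))
print(pattern.findall(rendered_html))
```

Empty strings in the first result mean the divs are present in the downloaded HTML but have not been filled in yet.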