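I'm new to Python, and the code below isn't working. I'm trying to get the temperature or the date from a website, but I can't get any output. I've tried many variations and still can't get it right.
Thanks for your help!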
#Code below:
import requests,bs4
r = requests.get('http://www.hko.gov.hk/contente.htm')
print r.raise_for_status()
hkweather = bs4.BeautifulSoup(r.text)
print hkweather.select('div left_content fnd_day fnd_date')
Answer 0 (score: 1)
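Your CSS selector is incorrect: you need a . between the tag name and the CSS class. The tags you want are the divs with class fnd_day inside the element whose id is fnd_content: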
divs = soup.select("#fnd_content div.fnd_day")
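To see why the original selector matches nothing, here is a minimal Python 3 sketch run against a made-up HTML fragment. The markup is an assumption that only mimics the id and class names mentioned above; it is not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that only mimics the ids/classes discussed above.
html = """
<div id="fnd_content">
  <div class="fnd_day"><div class="fnd_date">July 20</div></div>
  <div class="fnd_day"><div class="fnd_date">July 21</div></div>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Space-separated bare words are read as nested *tag names*, so this looks
# for <fnd_date> inside <fnd_day> inside <left_content> inside <div>.
wrong = soup.select("div left_content fnd_day fnd_date")

# "#id" matches an id, "tag.class" matches a tag carrying that class.
right = soup.select("#fnd_content div.fnd_day")

# Text is only available here because the sample fragment contains it.
dates = [d.get_text(strip=True) for d in soup.select("div.fnd_day div.fnd_date")]
print(len(wrong), len(right), dates)
```

On the live page the same selector would find the divs but not their text, for the reason given next.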
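However, because that content is generated dynamically through an ajax request, the selector alone still won't get you the data. You can use the following code to fetch all of the data in JSON format instead: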
u = "http://www.hko.gov.hk/wxinfo/json/one_json.xml?_=1468955579991"
data = requests.get(u).json()
from pprint import pprint as pp
pp(data)
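That returns pretty much all of the dynamic content, including the dates, temps, and so on.
If you access the key F9D, you can see a general weather description along with all of the temps and dates: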
from pprint import pprint as pp
pp(data['F9D'])
Output:
{'BulletinDate': '20160720',
'BulletinTime': '0315',
'GeneralSituation': 'A southwesterly airstream will bring showers to the '
'coast of Guangdong today. Under the dominance of an '
'upper-air anticyclone, it will be generally fine and '
'very hot over southern China in the latter part of this '
'week and early next week.',
'NPTemp': '25',
'WeatherForecast': [{'ForecastDate': '20160720',
'ForecastIcon': 'pic53.png',
'ForecastMaxrh': '95',
'ForecastMaxtemp': '32',
'ForecastMinrh': '70',
'ForecastMintemp': '26',
'ForecastWeather': 'Sunny periods and a few showers. '
'Isolated squally thunderstorms at '
'first.',
'ForecastWind': 'South to southwest force 4.',
'IconDesc': 'Sunny Periods with A Few Showers',
'WeekDay': '3'},
{'ForecastDate': '20160721',
'ForecastIcon': 'pic90.png',
'ForecastMaxrh': '90',
'ForecastMaxtemp': '33',
'ForecastMinrh': '65',
'ForecastMintemp': '28',
'ForecastWeather': 'Mainly fine and very hot apart from '
'isolated showers in the morning.',
'ForecastWind': 'South to southwest force 3 to 4.',
'IconDesc': 'Hot',
'WeekDay': '4'},
{'ForecastDate': '20160722',
'ForecastIcon': 'pic90.png',
'ForecastMaxrh': '90',
'ForecastMaxtemp': '33',
'ForecastMinrh': '65',
'ForecastMintemp': '28',
'ForecastWeather': 'Mainly fine and very hot apart from '
'isolated showers in the morning.',
'ForecastWind': 'Southwest force 3.',
'IconDesc': 'Hot',
'WeekDay': '5'},
{'ForecastDate': '20160723',
'ForecastIcon': 'pic90.png',
'ForecastMaxrh': '90',
'ForecastMaxtemp': '34',
'ForecastMinrh': '65',
'ForecastMintemp': '29',
'ForecastWeather': 'Fine and very hot.',
'ForecastWind': 'Southwest force 3.',
'IconDesc': 'Hot',
'WeekDay': '6'},
{'ForecastDate': '20160724',
'ForecastIcon': 'pic90.png',
'ForecastMaxrh': '90',
'ForecastMaxtemp': '34',
'ForecastMinrh': '65',
'ForecastMintemp': '29',
'ForecastWeather': 'Fine and very hot.',
'ForecastWind': 'Southwest force 3.',
'IconDesc': 'Hot',
'WeekDay': '0'},
{'ForecastDate': '20160725',
'ForecastIcon': 'pic90.png',
'ForecastMaxrh': '90',
'ForecastMaxtemp': '33',
'ForecastMinrh': '65',
'ForecastMintemp': '29',
'ForecastWeather': 'Mainly fine and very hot apart from '
'isolated showers in the morning.',
'ForecastWind': 'South to southwest force 3.',
'IconDesc': 'Hot',
'WeekDay': '1'},
{'ForecastDate': '20160726',
'ForecastIcon': 'pic90.png',
'ForecastMaxrh': '90',
'ForecastMaxtemp': '33',
'ForecastMinrh': '65',
'ForecastMintemp': '29',
'ForecastWeather': 'Mainly fine and very hot apart from '
'isolated showers in the morning.',
'ForecastWind': 'South to southwest force 3.',
'IconDesc': 'Hot',
'WeekDay': '2'},
{'ForecastDate': '20160727',
'ForecastIcon': 'pic90.png',
'ForecastMaxrh': '90',
'ForecastMaxtemp': '33',
'ForecastMinrh': '65',
'ForecastMintemp': '28',
'ForecastWeather': 'Mainly fine and very hot apart from '
'isolated showers in the morning.',
'ForecastWind': 'Southwest force 3 to 4.',
'IconDesc': 'Hot',
'WeekDay': '3'},
{'ForecastDate': '20160728',
'ForecastIcon': 'pic90.png',
'ForecastMaxrh': '90',
'ForecastMaxtemp': '33',
'ForecastMinrh': '65',
'ForecastMintemp': '28',
'ForecastWeather': 'Mainly fine and very hot apart from '
'isolated showers in the morning.',
'ForecastWind': 'Southwest force 3 to 4.',
'IconDesc': 'Hot',
'WeekDay': '4'}]}
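As a rough sketch of how the dates and temps could be pulled out of a structure like the one above, the Python 3 snippet below walks the WeatherForecast list. It uses a trimmed-down sample copied from that output rather than a live request, and `summarize` is a hypothetical helper name.

```python
from datetime import datetime

# Trimmed-down sample in the same shape as the output shown above.
sample = {
    "BulletinDate": "20160720",
    "WeatherForecast": [
        {"ForecastDate": "20160720", "ForecastMaxtemp": "32", "ForecastMintemp": "26"},
        {"ForecastDate": "20160721", "ForecastMaxtemp": "33", "ForecastMintemp": "28"},
    ],
}

def summarize(f9d):
    """Return (date, min_temp, max_temp) tuples from an F9D-style dict."""
    rows = []
    for day in f9d.get("WeatherForecast", []):
        # Dates arrive as YYYYMMDD strings and temps as numeric strings.
        date = datetime.strptime(day["ForecastDate"], "%Y%m%d").date()
        rows.append((date, int(day["ForecastMintemp"]), int(day["ForecastMaxtemp"])))
    return rows

for date, low, high in summarize(sample):
    print("{}: low {}, high {}".format(date.isoformat(), low, high))
```

With real data you would pass `data['F9D']` instead of `sample`, assuming the response keeps this shape.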
The only query string parameter is an epoch timestamp, which you can generate with the time lib:
from time import time
u = "http://www.hko.gov.hk/wxinfo/json/one_json.xml?_={}".format(int(time()))
data = requests.get(u).json()
Leaving out the timestamp returns the same data, so I'll let you investigate its significance.
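One detail worth noting: the `_` value in the first URL has 13 digits, which suggests milliseconds, whereas `int(time())` gives seconds. A parameter of this shape is often a cache-buster (jQuery appends `_=<timestamp>` when caching is disabled), though whether that is its role here is an assumption. The sketch below only builds the URL and makes no request; `build_url` is a hypothetical helper.

```python
from time import time

BASE = "http://www.hko.gov.hk/wxinfo/json/one_json.xml"

def build_url(with_timestamp=True, now=None):
    """Build the JSON URL, optionally adding a millisecond timestamp as `_`."""
    if not with_timestamp:
        return BASE
    # `now` (seconds since the epoch) can be injected to get a repeatable URL.
    seconds = time() if now is None else now
    return "{}?_={}".format(BASE, int(seconds * 1000))

print(build_url())                      # timestamped with the current time
print(build_url(with_timestamp=False))  # bare URL, no query string
```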
Answer 1 (score: 0)
I was able to get the date elements:
>>> import requests,bs4
>>> r = requests.get('http://www.hko.gov.hk/contente.htm')
>>> hkweather = bs4.BeautifulSoup(r.text)
>>> print hkweather.select('div[class="fnd_date"]')
# [<div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>, <div class="fnd_date"></div>]
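But the text is missing. This doesn't appear to be a BeautifulSoup problem, because I looked at r.text myself and all I see is <div class="fnd_date"></div> rather than <div class="fnd_date">July 20</div>.
You can use a regular expression to check whether the text is there (although using regular expressions on HTML is frowned upon):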
>>> import re
>>> re.findall(r'<[^<>]*fnd_date[^<>]*>[^>]*>', r.text)
# [u'<div id="fnd_date" class="date"></div>', ... repeated 10 times]
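For an offline illustration of that check, the Python 3 sketch below runs a capturing variant of the pattern against two made-up strings: one shaped like the raw response described above, and one shaped like what the browser shows once JavaScript has filled in the date. Both strings and the `date_texts` helper are assumptions for illustration only.

```python
import re

# Captures whatever sits between the opening tag that mentions fnd_date
# and the start of the next tag.
PATTERN = re.compile(r'<[^<>]*fnd_date[^<>]*>([^<>]*)<')

def date_texts(html):
    """Return the non-empty text found right after each fnd_date tag."""
    return [t.strip() for t in PATTERN.findall(html) if t.strip()]

raw = '<div class="fnd_date"></div><div class="fnd_date"></div>'
rendered = '<div class="fnd_date">July 20</div><div class="fnd_date">July 21</div>'

print(date_texts(raw))       # nothing to extract from the empty divs
print(date_texts(rendered))  # text is present once the divs are filled in
```

An empty result for the raw response is consistent with the dates being filled in after the page loads rather than being part of the HTML that requests receives.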