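I want to scrape the weather for the whole of December 2018 from https://www.timeanddate.com/weather/usa/new-york/historic?month=12&year=2018
The page has two select fields. I am completely new to HTML and POST requests. I have read the answers to "Filling out a select tag with requests Python", and as I understand it I need to include every field's id-value pair. My code is below.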
import requests

r = requests.post(
    "https://www.timeanddate.com/weather/usa/new-york/historic?month=12&year=2018",
    data={
        "month": r'2018-12',
        "wt-his-select": r"20181205",
    })
Given the id-value pairs above, I expected to get the weather records for December 5, 2018, but I always get the weather for December 1.
Answer 0 (score: 2)
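The data is embedded in the page as JSON, so we can use BeautifulSoup to pull out the <script> tag that contains it, then load it into a dictionary and convert that to a DataFrame: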
import requests
from bs4 import BeautifulSoup
import json
import pandas as pd

r = requests.get("https://www.timeanddate.com/weather/usa/new-york/historic?month=12&year=2018")
soup = BeautifulSoup(r.text, 'html.parser')

# Look for the <script> tag that assigns the JSON to `var data`.
scripts = soup.find_all('script')
for script in scripts:
    if 'var data=' in script.text:
        jsonStr = script.text
        # Keep only the text between 'var data=' and the following ';window.'.
        jsonStr = jsonStr.split('var data=')[-1].split(';window.')[0]
        jsonData = json.loads(jsonStr)

# 'detail' holds one record per observation.
weather = jsonData['detail']

# DataFrame.append was removed in pandas 2.0, so build the frame from the
# list of records directly and sort the columns alphabetically.
results = pd.DataFrame(weather).sort_index(axis=1)
Output:
print (results)
baro date desc ... ts wd wind
0 30.14 1.543622e+12 Clear. ... 12 am 0 0
1 30.21 1.543644e+12 Sunny. ... 6 am 0 0
2 30.17 1.543666e+12 Sunny. ... 12 pm 0 0
3 30.13 1.543687e+12 Light rain. Overcast. ... 6 pm 0 0
4 29.96 1.543709e+12 Light rain. Fog. ... 12 am 0 0
5 29.80 1.543730e+12 Light rain. Fog. ... 6 am 0 0
6 29.65 1.543752e+12 Fog. ... 12 pm 0 0
7 29.62 1.543774e+12 Fog. ... 6 pm 0 0
8 29.58 1.543795e+12 Passing clouds. ... 12 am 0 0
9 29.63 1.543817e+12 Sunny. ... 6 am 0 0
10 29.66 1.543838e+12 Overcast. ... 12 pm 0 0
11 29.72 1.543860e+12 Clear. ... 6 pm 0 0
12 29.80 1.543882e+12 Overcast. ... 12 am 0 0
13 29.93 1.543903e+12 Overcast. ... 6 am 0 0
14 29.96 1.543925e+12 Sunny. ... 12 pm 0 0
15 30.06 1.543946e+12 Clear. ... 6 pm 0 0
16 30.08 1.543968e+12 Clear. ... 12 am 0 0
17 30.09 1.543990e+12 Sunny. ... 6 am 0 0
18 30.03 1.544011e+12 Sunny. ... 12 pm 0 0
19 30.09 1.544033e+12 Clear. ... 6 pm 0 0
20 30.14 1.544054e+12 Clear. ... 12 am 0 0
21 30.19 1.544076e+12 Sunny. ... 6 am 0 0
22 30.15 1.544098e+12 Sunny. ... 12 pm 0 0
23 30.14 1.544119e+12 Mostly cloudy. ... 6 pm 0 0
24 30.18 1.544141e+12 Passing clouds. ... 12 am 0 0
25 30.32 1.544162e+12 Sunny. ... 6 am 0 0
26 30.34 1.544184e+12 Sunny. ... 12 pm 0 0
27 30.44 1.544206e+12 Clear. ... 6 pm 0 0
28 30.45 1.544227e+12 Clear. ... 12 am 0 0
29 30.48 1.544249e+12 Passing clouds. ... 6 am 0 0
.. ... ... ... ... ... .. ...
94 30.03 1.545653e+12 Partly sunny. ... 12 pm 0 0
95 30.09 1.545674e+12 Clear. ... 6 pm 0 0
96 30.17 1.545696e+12 Clear. ... 12 am 0 0
97 30.26 1.545718e+12 Overcast. ... 6 am 0 0
98 30.27 1.545739e+12 Sunny. ... 12 pm 0 0
99 30.34 1.545761e+12 Clear. ... 6 pm 0 0
100 30.40 1.545782e+12 Clear. ... 12 am 0 0
101 30.47 1.545804e+12 Overcast. ... 6 am 0 0
102 30.43 1.545826e+12 Partly sunny. ... 12 pm 0 0
103 30.47 1.545847e+12 Clear. ... 6 pm 0 0
104 30.52 1.545869e+12 Overcast. ... 12 am 0 0
105 30.60 1.545890e+12 Sunny. ... 6 am 0 0
106 30.56 1.545912e+12 Sunny. ... 12 pm 0 0
107 30.51 1.545934e+12 Overcast. ... 6 pm 0 0
108 30.34 1.545955e+12 Light rain. Fog. ... 12 am 0 0
109 30.14 1.545977e+12 Rain. Fog. ... 6 am 0 0
110 29.91 1.545998e+12 Light rain. Fog. ... 12 pm 0 0
111 29.83 1.546020e+12 Fog. ... 6 pm 0 0
112 29.85 1.546042e+12 Mostly cloudy. ... 12 am 0 0
113 29.97 1.546063e+12 Scattered clouds. ... 6 am 0 0
114 30.07 1.546085e+12 Partly sunny. ... 12 pm 0 0
115 30.16 1.546106e+12 Overcast. ... 6 pm 0 0
116 30.17 1.546128e+12 Clear. ... 12 am 0 0
117 30.23 1.546150e+12 Light snow. Overcast. ... 6 am 0 0
118 30.21 1.546171e+12 Overcast. ... 12 pm 0 0
119 30.27 1.546193e+12 Mostly cloudy. ... 6 pm 0 0
120 30.30 1.546214e+12 Clear. ... 12 am 0 0
121 30.34 1.546236e+12 Overcast. ... 6 am 0 0
122 30.23 1.546258e+12 Light rain. Mostly cloudy. ... 12 pm 0 0
123 30.00 1.546279e+12 Heavy rain. Fog. ... 6 pm 0 0
[124 rows x 14 columns]
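The date column in the output above is printed as a large float, which looks like a Unix timestamp in milliseconds. As a rough sketch of both the string-splitting step and the timestamp conversion, the example below runs the same logic on a made-up stand-in for the script text. The sample string and its values are assumptions for illustration only, not real site data, and the real page's layout may differ or change.

```python
import json

import pandas as pd

# Made-up stand-in for the text of the page's <script> tag (not real site data).
sample_script = (
    'var data={"detail":['
    '{"date":1543622400000,"desc":"Clear.","baro":30.14},'
    '{"date":1543644000000,"desc":"Sunny.","baro":30.21}'
    ']};window.month=12;'
)

# Same extraction as in the answer: keep what sits between the two markers.
json_str = sample_script.split('var data=')[-1].split(';window.')[0]
records = json.loads(json_str)['detail']

df = pd.DataFrame(records)

# Assumption: 'date' is milliseconds since the Unix epoch.
df['date'] = pd.to_datetime(df['date'], unit='ms')
print(df)
```

With this assumption the first two rows come out as midnight and 6 am on December 1, 2018, which lines up with the 12 am and 6 am labels in the ts column, so the timestamps appear to encode local time.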
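Additional:
You can also get the hourly data for an individual day by requesting it directly. Just change the parameters in payload to get a specific date: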
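from io import StringIO

import pandas as pd
import requests

url = 'https://www.timeanddate.com/scripts/cityajax.php'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}

# Change these to pick the day to fetch.
year = 2018
month = 12
day = 1

payload = {
    'n': 'usa/new-york',
    'mode': 'historic',
    'hd': '%d%02d%02d' % (year, month, day),  # date as YYYYMMDD
    'month': '%02d' % month,
    'year': '%d' % year,
}

data = requests.get(url, headers=headers, params=payload).text
# Wrap the response in <table> tags so read_html can parse it; [:-1] drops the last row.
# Recent pandas versions expect a file-like object rather than a literal HTML string.
table = pd.read_html(StringIO('<table>' + data + '</table>'))[0][:-1]
# Drop columns that contain missing values.
table = table.dropna(axis=1)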
Output:
print (table.to_string())
Unnamed: 0_level_0 Conditions Comfort Unnamed: 7_level_0 Unnamed: 8_level_0
Time Temp Weather Unnamed: 5_level_1 Humidity Barometer Visibility
0 12:51 amSat, Dec 1 40 °F Overcast. ↑ 80% 30.11 "Hg 10 mi
1 1:51 am 40 °F Passing clouds. ↑ 77% 30.12 "Hg 10 mi
2 2:51 am 39 °F Clear. ↑ 79% 30.12 "Hg 10 mi
3 3:51 am 39 °F Clear. ↑ 79% 30.13 "Hg 10 mi
4 4:51 am 38 °F Passing clouds. ↑ 79% 30.16 "Hg 10 mi
5 5:51 am 37 °F Clear. ↑ 82% 30.17 "Hg 9 mi
6 6:51 am 37 °F Clear. ↑ 86% 30.19 "Hg 10 mi
7 7:51 am 38 °F Sunny. ↑ 79% 30.21 "Hg 10 mi
8 8:51 am 40 °F Sunny. ↑ 73% 30.21 "Hg 10 mi
9 9:51 am 42 °F Sunny. ↑ 68% 30.22 "Hg 10 mi
10 10:51 am 44 °F Scattered clouds. ↑ 63% 30.21 "Hg 10 mi
11 11:51 am 44 °F Sunny. ↑ 60% 30.21 "Hg 10 mi
12 12:51 pm 45 °F Sunny. ↑ 58% 30.18 "Hg 10 mi
13 1:51 pm 46 °F Passing clouds. ↑ 56% 30.17 "Hg 10 mi
14 2:51 pm 45 °F Sunny. ↑ 58% 30.17 "Hg 10 mi
15 3:51 pm 45 °F Sunny. ↑ 56% 30.17 "Hg 10 mi
16 4:51 pm 44 °F Clear. ↑ 63% 30.17 "Hg 10 mi
17 5:51 pm 43 °F Passing clouds. ↑ 62% 30.16 "Hg 10 mi
18 6:51 pm 42 °F Light rain. Mostly cloudy. ↑ 82% 30.16 "Hg 7 mi
19 7:51 pm 42 °F Light rain. Overcast. ↑ 79% 30.15 "Hg 7 mi
20 8:51 pm 41 °F Light rain. Mostly cloudy. ↑ 86% 30.15 "Hg 10 mi
21 9:51 pm 42 °F Mostly cloudy. ↑ 82% 30.14 "Hg 10 mi
22 10:32 pm 42 °F Light rain. Overcast. ↑ 85% 30.15 "Hg 8 mi
23 10:51 pm 42 °F Light rain. Overcast. ↑ 89% 30.11 "Hg 8 mi
24 11:51 pm 42 °F Fog. ↑ 92% 30.07 "Hg 4 mi
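Since the question asks for the whole month, one possible extension is to repeat the single-day request for each day of December and concatenate the results. The sketch below assumes the endpoint and parameters behave for every day as they do in the example above. The helper names build_payload, fetch_day and fetch_month, and the one-second pause between requests, are hypothetical choices for illustration and not part of the original answer.

```python
import calendar
import time
from io import StringIO

import pandas as pd
import requests

URL = 'https://www.timeanddate.com/scripts/cityajax.php'
HEADERS = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36'}


def build_payload(year, month, day, place='usa/new-york'):
    """Build the query parameters for one day, mirroring the payload above."""
    return {
        'n': place,
        'mode': 'historic',
        'hd': '%d%02d%02d' % (year, month, day),  # date as YYYYMMDD
        'month': '%02d' % month,
        'year': '%d' % year,
    }


def fetch_day(year, month, day):
    """Fetch one day's hourly table (network call, parsed as in the answer)."""
    payload = build_payload(year, month, day)
    html = requests.get(URL, headers=HEADERS, params=payload).text
    table = pd.read_html(StringIO('<table>' + html + '</table>'))[0][:-1]
    return table.dropna(axis=1)


def fetch_month(year, month, pause=1.0):
    """Fetch every day of the month and stack the tables into one frame."""
    days_in_month = calendar.monthrange(year, month)[1]
    frames = []
    for day in range(1, days_in_month + 1):
        frames.append(fetch_day(year, month, day))
        time.sleep(pause)  # assumed courtesy delay between requests
    return pd.concat(frames, ignore_index=True)
```

Calling fetch_month(2018, 12) would then issue 31 requests. If dropna removes different columns on different days, the concatenated frame will contain the union of the columns with gaps filled by NaN, so the result may need some cleanup.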