I'm having trouble getting the highest value for each year from a large dataset in Python.
import pandas as pd

with open('GlobalLandTemperaturesByCity.csv') as csvfile:
    data = pd.read_csv(csvfile)
# Keep only the rows for Pokhara, Nepal
changedata = data[data['City'].str.match('Pokhara') & data['Country'].str.match('Nepal')]
changedata = changedata.set_index(changedata['dt'])
#changedata.index = pd.to_datetime(changedata.index)
# Note: 'dt' holds strings here, so this is a string comparison, not a date comparison
usedata = changedata[changedata['dt'] > '1970-1-1 01:00:00']
print(usedata)
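A minimal sketch of the same filtering step done with real dates, assuming the CSV has the `dt`, `AverageTemperature`, `City` and `Country` columns shown in the output below. The small inline DataFrame is a hypothetical stand-in for `pd.read_csv(...)` and its values are illustrative only. Because `dt` is read as text, the original cutoff `'1970-1-1 01:00:00'` is compared character by character: `'1970-01-01'` through `'1970-09-01'` sort before it and are dropped, which is why the output starts at 1970-10-01. Parsing `dt` with `pd.to_datetime` makes the cutoff a date comparison and also produces the `DatetimeIndex` that resampling expects.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv('GlobalLandTemperaturesByCity.csv');
# the values below are illustrative only.
data = pd.DataFrame({
    'dt': ['1969-12-01', '1970-01-01', '1970-06-01', '1971-01-01'],
    'AverageTemperature': [6.0, 5.0, 21.0, 5.5],
    'City': ['Pokhara', 'Pokhara', 'Pokhara', 'Kathmandu'],
    'Country': ['Nepal'] * 4,
})

# Exact equality instead of str.match, which only anchors at the start of the string
mask = (data['City'] == 'Pokhara') & (data['Country'] == 'Nepal')
changedata = data[mask].copy()

# Parse the date strings and use them as a DatetimeIndex
changedata['dt'] = pd.to_datetime(changedata['dt'])
changedata = changedata.set_index('dt')

# Date comparison: January 1970 onwards is kept
usedata = changedata[changedata.index >= '1970-01-01']
print(usedata)
```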
This produces:
dt AverageTemperature AverageTemperatureUncertainty \
dt
1970-10-01 1970-10-01 16.388 0.395
1970-11-01 1970-11-01 10.569 1.017
1970-12-01 1970-12-01 7.455 0.194
1971-01-01 1971-01-01 5.508 0.435
1971-02-01 1971-02-01 7.458 0.413
... ... ... ...
2013-05-01 2013-05-01 20.069 0.719
2013-06-01 2013-06-01 21.168 0.407
2013-07-01 2013-07-01 21.503 0.316
2013-08-01 2013-08-01 21.541 0.478
2013-09-01 2013-09-01 NaN NaN
City Country Latitude Longitude
dt
1970-10-01 Pokhara Nepal 28.13N 84.55E
1970-11-01 Pokhara Nepal 28.13N 84.55E
1970-12-01 Pokhara Nepal 28.13N 84.55E
1971-01-01 Pokhara Nepal 28.13N 84.55E
1971-02-01 Pokhara Nepal 28.13N 84.55E
... ... ... ... ...
2013-05-01 Pokhara Nepal 28.13N 84.55E
2013-06-01 Pokhara Nepal 28.13N 84.55E
2013-07-01 Pokhara Nepal 28.13N 84.55E
2013-08-01 Pokhara Nepal 28.13N 84.55E
2013-09-01 Pokhara Nepal 28.13N 84.55E
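Instead of getting a value for every month, I need a way to get the highest or lowest value for each year. Any help would be greatly appreciated!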
Answer (score: 2)
If your index is a real DatetimeIndex:
# Optional fix for a datetime-like string index:
# df.index = pd.to_datetime(df.index)
# '1y' bins by calendar year; pandas 2.2+ prefers the year-end alias 'YE'
df \
    .resample('1y') \
    .AverageTemperature \
    .agg(['min', 'max'])
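If you would rather not depend on resample frequency aliases (pandas 2.2 deprecated `'Y'` and `'A'` in favour of `'YE'`), grouping by the index's calendar year gives the same per-year minimum and maximum. A minimal sketch, assuming the index is already a `DatetimeIndex`; the monthly data is made up for illustration. Unlike `resample`, the result is labelled by integer year rather than by year-end date, and years with no rows are simply absent instead of appearing as NaN.

```python
import pandas as pd

# Illustrative monthly series (values made up), indexed by a DatetimeIndex
idx = pd.date_range('1970-01-01', periods=24, freq='MS')
df = pd.DataFrame({'AverageTemperature': range(24)}, index=idx)

# Group rows by calendar year and take the min and max of each group
yearly = df.groupby(df.index.year)['AverageTemperature'].agg(['min', 'max'])
print(yearly)
```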
Example
import pandas as pd

# Daily index from 2010-01-01 to 2020-01-01 with values 0, 1, 2, ...
dr = pd.date_range('2010-01-01', '2020-01-01')
df = pd.DataFrame(range(len(dr)), index=dr, columns=['AverageTemperature'])
df.resample('1y').AverageTemperature.agg(['min', 'max'])
Result
min max
2010-12-31 0 364
2011-12-31 365 729
2012-12-31 730 1095
2013-12-31 1096 1460
2014-12-31 1461 1825
2015-12-31 1826 2190
2016-12-31 2191 2556
2017-12-31 2557 2921
2018-12-31 2922 3286
2019-12-31 3287 3651
2020-12-31 3652 3652
Plotting
To plot this, simply call:
df \
    .resample('1y') \
    .AverageTemperature \
    .agg(['min', 'max']) \
    .plot()