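I want to get the month of maximum runoff for each year and for the whole time series. The idea is to characterise global seasonality by looking at the month of maximum runoff. After that, I would like to work out whether each pixel has a unimodal or bimodal regime.
I want to create a map like the one in the Pangeo example here.
That example shows the hour of maximum rainfall. I want to show the MONTH of maximum runoff (as an integer).
Here I download the GRUN runoff data and create an xarray object. NOTE: the dataset is > 1GB. I am using it so that this example is fully reproducible.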
# get the data
import subprocess
command = """
wget -O grun.nc https://www.research-collection.ethz.ch/bitstream/handle/20.500.11850/324386/GRUN_v1_GSWP3_WGS84_05_1902_2014.nc?sequence=1&isAllowed=y
"""
import os
if not os.path.exists('grun.nc'):
    process = subprocess.Popen(command.split(), stdout=subprocess.PIPE)
    output, error = process.communicate()
# read the data
import xarray as xr
ds = xr.open_dataset('grun.nc')
# select a subset so we can work with it more quickly
ds = ds.isel(time=slice(-100,-1))
ds
Out[]:
<xarray.Dataset>
Dimensions: (lat: 360, lon: 720, time: 99)
Coordinates:
* lon (lon) float64 -179.8 -179.2 -178.8 -178.2 ... 178.8 179.2 179.8
* lat (lat) float64 -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
* time (time) datetime64[ns] 2006-09-01 2006-10-01 ... 2014-11-01
Data variables:
Runoff (time, lat, lon) float32 ...
Attributes:
title: GRUN
version: GRUN 1.0
meteorological_forcing: GSWP3
temporal_resolution: monthly
spatial_resolution: 0.5x0.5
crs: WGS84
proj4: +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs
EPSG: 4326
references: Ghiggi et al.,2019. GRUN: An observation-based g...
authors: Gionata Ghiggi; Lukas Gudmundsson
contacts: gionata.ghiggi@gmail.com; lukas.gudmundsson@env....
institution: Land-Climate Dynamics, Institute for Atmospheric...
institution_id: IAC ETHZ
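I have nan values, so I can't simply apply argmax() to the dataset. I use the same method as @jhamman here, together with the Pangeo example above. I'm not sure what this actually gives me; this is what I ran and what it returns: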
# Apply argmax where you have NAN values
def my_func(ds, dim=None):
    return ds.isel(**{dim: ds['Runoff'].argmax(dim)})
mask = ds['Runoff'].isel(time=0).notnull() # determine where you have valid data
ds2 = ds.fillna(-9999) # fill nans with a missing flag of some kind
new = ds2.reset_coords(drop=True).groupby('time.month').apply(my_func, dim='time').where(mask) # do the groupby operation/reduction and reapply the mask
new
Out[]:
<xarray.Dataset>
Dimensions: (lat: 360, lon: 720, month: 12)
Coordinates:
* lon (lon) float64 -179.8 -179.2 -178.8 -178.2 ... 178.8 179.2 179.8
* lat (lat) float64 -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
* month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
Data variables:
Runoff (month, lat, lon) float32 nan nan nan nan nan ... nan nan nan nan
Attributes:
title: GRUN
version: GRUN 1.0
meteorological_forcing: GSWP3
temporal_resolution: monthly
spatial_resolution: 0.5x0.5
crs: WGS84
proj4: +proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs
EPSG: 4326
references: Ghiggi et al.,2019. GRUN: An observation-based g...
authors: Gionata Ghiggi; Lukas Gudmundsson
contacts: gionata.ghiggi@gmail.com; lukas.gudmundsson@env....
institution: Land-Climate Dynamics, Institute for Atmospheric...
institution_id: IAC ETHZ
Plotting one month of the result gives me:
import matplotlib.pyplot as plt
fig,ax = plt.subplots(figsize=(12,8))
new.Runoff.sel(month=10).plot(ax=ax, cmap='twilight')
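To make it easier to see what the `groupby('time.month')` reduction returns, here is a minimal sketch on made-up data. It assumes a single pixel with strictly increasing values, and it calls `.map`, which newer xarray versions recommend in place of `.apply`. Under those assumptions the result holds, for each calendar month, the largest Runoff value across years; it does not hold a month index.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two years of made-up monthly values for a single pixel (assumption:
# strictly increasing, so the second year always wins within a month).
time = pd.date_range("2013-01-01", periods=24, freq="MS")
values = np.arange(24, dtype="float32")
toy = xr.Dataset({"Runoff": ("time", values)}, coords={"time": time})

def my_func(ds, dim=None):
    # same reduction as in the question: pick the time step of the maximum
    return ds.isel(**{dim: ds["Runoff"].argmax(dim)})

# one group per calendar month, each containing one value per year
by_month = toy.groupby("time.month").map(my_func, dim="time")
print(by_month["Runoff"].values)
```

Each entry along `month` is a runoff value, so a plot of `sel(month=10)` shows the largest October runoff in the record, not the month in which runoff peaks.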
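I'm happy to convert to pandas if necessary.
So what I want to end up with is an xr.Dataset containing the integer month of maximum runoff. Ideally, I would also like the month of maximum runoff for each year, so that I can see how this seasonality changes over time.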
<xarray.Dataset>
Dimensions: (lat: 360, lon: 720)
Coordinates:
* lon (lon) float64 -179.8 -179.2 -178.8 -178.2 ... 178.8 179.2 179.8
* lat (lat) float64 -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
Data variables:
Month_of_max (lat, lon) int32 ...
# OR EVEN BETTER
<xarray.Dataset>
Dimensions: (lat: 360, lon: 720, year: 10)
Coordinates:
* lon (lon) float64 -179.8 -179.2 -178.8 -178.2 ... 178.8 179.2 179.8
* lat (lat) float64 -89.75 -89.25 -88.75 -88.25 ... 88.75 89.25 89.75
* year (year) float64 2010 2011 2012 2013 ...
Data variables:
Month_of_max (lat, lon, year) int32 ...
Answer 0 (score: 0)
Answer 1 (score: 0)
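The best solution I have found is to convert the data to a pandas.DataFrame object and do the calculation there. I have wrapped the solution in the functions below.
First, let's work with a subset of the data (otherwise it takes a while). This is a bounding box around Kenya.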
import xarray as xr
ds = xr.open_dataset('grun.nc')
ds = ds.isel(time=slice(-20,-1))
ds = ds.sel(lat=slice(-5.202,6.002),lon=slice(33.501,42.283))
ds.attrs = {}  # clear the global attributes to keep the printed output short
ds
Out[]:
<xarray.Dataset>
Dimensions: (lat: 22, lon: 18, time: 19)
Coordinates:
* lon (lon) float64 33.75 34.25 34.75 35.25 ... 40.75 41.25 41.75 42.25
* lat (lat) float64 -4.75 -4.25 -3.75 -3.25 -2.75 ... 4.25 4.75 5.25 5.75
* time (time) datetime64[ns] 2013-05-01 2013-06-01 ... 2014-11-01
Data variables:
Runoff (time, lat, lon) float32 ...
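All of the work is bundled into calculate_annual_month_of_max(). Essentially, it converts the xr.Dataset to a pd.DataFrame object and then extracts the time step of maximum runoff for each lat-lon-year combination. The advantage of this approach is that it returns both the Runoff value and the month integer.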
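As a toy illustration of the groupby/idxmax step described above, here is a minimal sketch. The single pixel, its coordinates, and the runoff values are invented for the example and are not taken from GRUN.

```python
import pandas as pd

# Toy long-format table: one pixel, two years, made-up runoff values.
df = pd.DataFrame({
    "lat": [0.25] * 6,
    "lon": [36.25] * 6,
    "time": pd.to_datetime([
        "2013-03-01", "2013-04-01", "2013-05-01",
        "2014-03-01", "2014-04-01", "2014-05-01",
    ]),
    "Runoff": [0.1, 0.9, 0.4, 0.7, 0.2, 0.3],
})
df["year"] = df["time"].dt.year
df["month"] = df["time"].dt.month

# idxmax returns the row label of the maximum within each group;
# .loc then pulls the full rows, so value and month come out together.
idx = df.groupby(["lat", "lon", "year"])["Runoff"].idxmax()
max_rows = df.loc[idx, ["year", "month", "Runoff"]].reset_index(drop=True)
print(max_rows)
```

Because whole rows are selected, any other column in the table (for example the original `time` stamp) could be carried along in the same way.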
import pandas as pd

def convert_to_df(ds):
    """Flatten an xr.Dataset into a long-format table.

    Returns
    -------
    pd.DataFrame
    """
    df = ds.to_dataframe()
    df.reset_index(inplace=True)
    return df
def calculate_year_month_cols(df):
    """Add integer 'year' and 'month' columns derived from the 'time' column."""
    assert 'time' in df.columns, f"time should be in df.columns. Currently: {list(df.columns)}"
    df['year'] = df['time'].dt.year
    df['month'] = df['time'].dt.month
    return df
def calculate_month_of_max_value(df, value_col):
    """
    Arguments
    ---------
    df : pd.DataFrame
        dataframe converted from xarray with ['lat', 'lon', 'year', value_col] columns
    value_col : str
        column that you want to find the month of maximum for
        e.g. Which month (int) in each pixel (lat, lon) has the highest runoff
    """
    # drop missing values first: idxmax on an all-NaN group (e.g. an ocean
    # pixel) yields a label that .loc cannot look up in recent pandas
    valid = df.dropna(subset=[value_col])
    max_months = valid.loc[valid.groupby(["lat", "lon", "year"])[value_col].idxmax()]
    return max_months
def convert_dataframe_to_xarray(df, index_cols=['lat','lon']):
    """
    Arguments
    ---------
    df: pd.DataFrame
        the dataframe to convert to xr.Dataset
    index_cols: List[str]
        the columns that will become the coordinates
        of the output xr.Dataset

    Returns
    -------
    xr.Dataset
    """
    out = df.set_index(index_cols).dropna()
    ds = out.to_xarray()
    return ds
def calculate_annual_month_of_max(ds, variable):
    """for the `variable` in the `ds` calculate the
    month of maximum for a given pixel-year.

    Returns
    -------
    xr.Dataset
    """
    # convert to a dataframe
    df = convert_to_df(ds)
    df = calculate_year_month_cols(df)
    # calculate the month of maximum
    df = calculate_month_of_max_value(df, value_col=variable)
    # convert the dataframe back into an xr.Dataset
    ds_out = convert_dataframe_to_xarray(df, index_cols=['lat','lon','year'])
    return ds_out

mon_of_max = calculate_annual_month_of_max(ds, variable='Runoff')
mon_of_max
Out[]:
<xarray.Dataset>
Dimensions: (lat: 22, lon: 18, year: 2)
Coordinates:
* lat (lat) float64 -4.75 -4.25 -3.75 -3.25 -2.75 ... 4.25 4.75 5.25 5.75
* lon (lon) float64 33.75 34.25 34.75 35.25 ... 40.75 41.25 41.75 42.25
* year (year) float64 2.013e+03 2.014e+03
Data variables:
time (lat, lon, year) datetime64[ns] 2013-12-01 ... 2014-10-01
Runoff (lat, lon, year) float32 0.5894838 0.9081207 ... 0.2789653
month (lat, lon, year) float64 12.0 1.0 12.0 1.0 ... 11.0 10.0 11.0 10.0
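For comparison, the same per-year reduction can probably be done without leaving xarray. The sketch below is not part of the pandas solution. It runs on synthetic data rather than GRUN, `month_of_max` is a hypothetical helper name, and filling NaNs with `-inf` before `argmax` is one assumed way of handling pixels with no data.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the runoff cube: 24 months, 2 x 2 pixels.
time = pd.date_range("2013-01-01", periods=24, freq="MS")
rng = np.random.default_rng(0)
data = rng.random((24, 2, 2)).astype("float32")
data[:, 0, 0] = np.nan   # an all-NaN pixel (e.g. ocean)
data[6, 1, 1] = 5.0      # force July 2013 to be the maximum
data[14, 1, 1] = 5.0     # force March 2014 to be the maximum
da = xr.DataArray(
    data,
    coords={"time": time, "lat": [-0.25, 0.25], "lon": [36.25, 36.75]},
    dims=("time", "lat", "lon"),
    name="Runoff",
)

def month_of_max(group):
    """Month (1-12) of the maximum along time; NaN where no data exist."""
    valid = group.notnull().any("time")
    # fill NaNs so argmax never sees an all-NaN slice
    idx = group.fillna(-np.inf).argmax("time")
    # look up the calendar month at the position of the maximum
    months = group["time"].dt.month.isel(time=idx).reset_coords(drop=True)
    return months.where(valid)

# one (lat, lon) map of month integers per year
month_of_max_by_year = da.groupby("time.year").map(month_of_max)
print(month_of_max_by_year.sel(year=2013).values)
```

The result is stored as floats so that pixels without data can stay NaN. Grouping the full series by nothing at all, that is calling `month_of_max(da)` directly, would give the month of the single largest value in the record.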