DataArray
和时间(geo_df.geometry
和geo_df.plant_date
)从xarray geo_df.cut_date
中选择值GeoDataFrame
。我想将它们作为“功能”加入输出GeoDataFrame
中。我正在使用的包裹:
import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely import geometry
import xarray as xr
我有一个存储纬度/经度点的地理数据框,该点对应于住户。 index
列是家庭的ID。
geo_df.head()
Out[]:
crop_name xxx cut_date plant_date geometry
0 SORGHUM 0.061029 2011-11-10 2011-11-10 POINT (37.89087631 14.35381619)
1 MILLET -0.104342 2011-10-19 2011-10-19 POINT (37.89087631 14.35381619)
2 SORGHUM -0.031697 2013-11-26 2013-11-26 POINT (37.89087631 14.35381619)
我有一个xarray对象,用于存储GRIDDED植被健康数据(NDVI)。
ndvi_df = xr.open_dataset(geo_data_dir+ndvi_dir).ndvi
Out[]: <xarray.DataArray 'ndvi' (time: 212, lat: 200, lon: 220)>
[9328000 values with dtype=float32]
Coordinates:
* lon (lon) float32 35.024994 35.074997 35.125 35.174988 35.22499 ...
* lat (lat) float32 14.974998 14.924995 14.875 14.824997 14.775002 ...
* time (time) datetime64[ns] 2000-02-14 2000-03-16 2000-04-15 ...
Attributes:
long_name: Normalized Difference Vegetation Index
units: 1
_fillvalue: -3000
我有一个存储对应于一个国家的POLYGON的地理数据框。
world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
ethiopia = world.loc[world["name"] == "Ethiopia"]
我的数据集相互重叠显示如下(出于演示目的,每年进行绘制)。
(ndvi_df.loc[f'{year}-01-16T00:00:00.000000000':f'{year}-12-16T00:00:00.000000000']
.mean(dim='time')
.plot(cmap='gist_earth_r', vmin=-0.1, vmax=1)
)
ax = plt.gca()
ethiopia.plot(alpha=0.2, color='black', ax=ax)
(geo_df
.loc[ (lsms_geo_1["cut_date"] > f'{year}-01-01') & (lsms_geo_1["cut_date"] < f'{year+1}-01-01') ]
.plot(markersize=6 ,ax=ax, color="#FEF731")
)
ax.set_title(f'{year} Mean NDVI and Households')
plt.show()
我想要输出一个带有额外列的地理数据框,该数据框告诉我住户内像素的“前一个月”中的NDVI值。
index
列是家庭的ID。
像这样:
crop_name xxx cut_date plant_date geometry ndvi_month_0 ndvi_month_1 ndvi_month_2
0 SORGHUM 0.061029 2011-11-10 2011-11-10 POINT (37.89087631 14.35381619) 0.3 0.3 0.3
1 MILLET -0.104342 2011-10-19 2011-10-19 POINT (37.89087631 14.35381619) 0.6 0.6 0.6
2 SORGHUM -0.031697 2013-11-26 2013-11-26 POINT (37.89087631 14.35381619) 0.1 0.1 0.1
我还想知道如何通过使用地理数据框多边形ethiopia
在xarray对象中对数据进行子集化。
(重新发布在GIS Stack Exchange here上)
答案 0 :(得分:0)
因此,在@om_henners here的帮助下,有一个有效的解决方案。
以下功能可以应用于geopandas.GeoDataFrame
对象。它将选择之前的12个月,并为lat,lon
中的GeoDataFrame
点选择NEAREST值。
def geo_var_for_point(row, geovar_df, geovar_name):
"""
Return a pandas series of geovariable values (NDVI or LST) which will be
indexed by the time index.
Usage:
-----
`geo_df.apply(ndvi_for_point, axis=1, **{"geovar_df":ndvi_df})`
Arguments:
---------
:df (geopandas.GeoDataFrame) : dataframe with `geometry` and `cut_date` cols
:geovar_df (xarray.DataArray): the geographic variable you want information from
:geovar_name (str): how to label to columns with the correct geovariable
Returns:
-------
:(pd.Series) : series object of geo_Var values for the 12 months prior to cut_date
Variables:
---------
:point (shapely.Point): geometry of the point (x, y coords)
:cut_date (pd.datetime): the date at which the crop was cut
:start_date (pd.datetime): the first month to select geovars from
"""
# get the times
cut_date = row['cut_date']
start_date = cut_date - pd.DateOffset(months=12)
# subset the geovar dataframe by time
limited_geovar = geovar_df.loc[start_date: cut_date]
# get the location
point = row['geometry']
# select the values from the xarray.DataArray for that location
series = limited_geovar.sel(lat=point.y, lon=point.x, method='nearest').to_series()
# create the output with columns labelled
columns = [f"{geovar_name}_month_t-{i}" for i in np.arange(len(series))]
return pd.Series(series.values , index=columns)
此功能可以像这样应用:
ndvi_extract = geo_df.head().apply(geo_var_for_point, axis=1, **{"geovar_df":ndvi_df, "geovar_name": "ndvi"})
哪个返回:
ndvi_month_t-0 ndvi_month_t-1 ndvi_month_t-2 ndvi_month_t-3 ndvi_month_t-4 ndvi_month_t-5 ndvi_month_t-6 ndvi_month_t-7 ndvi_month_t-8 ndvi_month_t-9 ndvi_month_t-10 ndvi_month_t-11
0 0.3141 0.2559 0.2287 0.2056 0.1993 0.2015 0.1970 0.2187 0.2719 0.3669 0.4647 0.3563
1 0.3141 0.2559 0.2287 0.2056 0.1993 0.2015 0.1970 0.2187 0.2719 0.3669 0.4647 0.3563
2 0.2257 0.2065 0.1967 0.1949 0.1878 0.1861 0.1987 0.2801 0.4338 0.5667 0.4209 0.2880
3 0.2866 0.2257 0.2065 0.1967 0.1949 0.1878 0.1861 0.1987 0.2801 0.4338 0.5667 0.4209
4 0.4044 0.2866 0.2257 0.2065 0.1967 0.1949 0.1878 0.1861 0.1987 0.2801 0.4338 0.5667
然后可以将其连接到原始数据帧:
pd.concat([geo_df.head(), ndvi_extract.head()], axis=1)
这将返回带有网格产品中该点的geovariable值的geopandas.GeoDataFrame。