我的数据集：

我正在使用的包裹：

import numpy as np
import pandas as pd
import geopandas as gpd
import matplotlib.pyplot as plt
from shapely import geometry
import xarray as xr

我有一个存储纬度/经度点的地理数据框，该点对应于住户。 index列是家庭的ID。

geo_df.head()

Out[]:
  crop_name     xxx     cut_date plant_date                       geometry
0   SORGHUM  0.061029 2011-11-10 2011-11-10 POINT (37.89087631 14.35381619)
1    MILLET -0.104342 2011-10-19 2011-10-19 POINT (37.89087631 14.35381619)
2   SORGHUM -0.031697 2013-11-26 2013-11-26 POINT (37.89087631 14.35381619)

我有一个xarray对象，用于存储GRIDDED植被健康数据（NDVI）。

ndvi_df = xr.open_dataset(geo_data_dir+ndvi_dir).ndvi

Out[]: <xarray.DataArray 'ndvi' (time: 212, lat: 200, lon: 220)>
[9328000 values with dtype=float32]
Coordinates:
  * lon      (lon) float32 35.024994 35.074997 35.125 35.174988 35.22499 ...
  * lat      (lat) float32 14.974998 14.924995 14.875 14.824997 14.775002 ...
  * time     (time) datetime64[ns] 2000-02-14 2000-03-16 2000-04-15 ...
Attributes:
    long_name:   Normalized Difference Vegetation Index
    units:       1
    _fillvalue:  -3000

我有一个存储对应于一个国家的POLYGON的地理数据框。

world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))
ethiopia = world.loc[world["name"] == "Ethiopia"]

视觉摘要：

我的数据集相互重叠显示如下（出于演示目的，每年进行绘制）。

(ndvi_df.loc[f'{year}-01-16T00:00:00.000000000':f'{year}-12-16T00:00:00.000000000']
 .mean(dim='time')
 .plot(cmap='gist_earth_r', vmin=-0.1, vmax=1)
)

ax = plt.gca()

ethiopia.plot(alpha=0.2, color='black', ax=ax)

(geo_df
 .loc[ (lsms_geo_1["cut_date"] > f'{year}-01-01') & (lsms_geo_1["cut_date"] < f'{year+1}-01-01') ]
 .plot(markersize=6 ,ax=ax, color="#FEF731")
)
ax.set_title(f'{year} Mean NDVI and Households')
plt.show()

理想的输出：

我想要输出一个带有额外列的地理数据框，该数据框告诉我住户内像素的“前一个月”中的NDVI值。

index列是家庭的ID。

像这样：

  crop_name     xxx     cut_date plant_date                       geometry  ndvi_month_0  ndvi_month_1  ndvi_month_2
0   SORGHUM  0.061029 2011-11-10 2011-11-10 POINT (37.89087631 14.35381619)          0.3           0.3           0.3
1    MILLET -0.104342 2011-10-19 2011-10-19 POINT (37.89087631 14.35381619)          0.6           0.6           0.6
2   SORGHUM -0.031697 2013-11-26 2013-11-26 POINT (37.89087631 14.35381619)          0.1           0.1           0.1

我还想知道如何通过使用地理数据框多边形ethiopia在xarray对象中对数据进行子集化。

（重新发布在GIS Stack Exchange here上）

Answer 1

因此，在@om_henners here的帮助下，有一个有效的解决方案。

以下功能可以应用于geopandas.GeoDataFrame对象。它将选择之前的12个月，并为lat,lon中的GeoDataFrame点选择NEAREST值。

def geo_var_for_point(row, geovar_df, geovar_name):
    """
      Return a pandas series of geovariable values (NDVI or LST) which will be 
        indexed by the time index.

      Usage:
      -----
      `geo_df.apply(ndvi_for_point, axis=1, **{"geovar_df":ndvi_df})`

      Arguments:
      ---------
      :df (geopandas.GeoDataFrame) : dataframe with `geometry` and `cut_date` cols
      :geovar_df (xarray.DataArray): the geographic variable you want information from
      :geovar_name (str): how to label to columns with the correct geovariable

      Returns:
      -------
      :(pd.Series) : series object of geo_Var values for the 12 months prior to cut_date

      Variables:
      ---------
      :point (shapely.Point): geometry of the point (x, y coords)
      :cut_date (pd.datetime): the date at which the crop was cut
      :start_date (pd.datetime): the first month to select geovars from
    """
    # get the times
    cut_date = row['cut_date']
    start_date = cut_date - pd.DateOffset(months=12)

    # subset the geovar dataframe by time
    limited_geovar = geovar_df.loc[start_date: cut_date]

    # get the location
    point = row['geometry']

    # select the values from the xarray.DataArray for that location
    series = limited_geovar.sel(lat=point.y, lon=point.x, method='nearest').to_series()

    # create the output with columns labelled
    columns = [f"{geovar_name}_month_t-{i}" for i in np.arange(len(series))]
    return pd.Series(series.values , index=columns)

此功能可以像这样应用：

ndvi_extract = geo_df.head().apply(geo_var_for_point, axis=1, **{"geovar_df":ndvi_df, "geovar_name": "ndvi"})

哪个返回：

  ndvi_month_t-0  ndvi_month_t-1  ndvi_month_t-2  ndvi_month_t-3  ndvi_month_t-4  ndvi_month_t-5  ndvi_month_t-6  ndvi_month_t-7  ndvi_month_t-8  ndvi_month_t-9  ndvi_month_t-10 ndvi_month_t-11
0         0.3141          0.2559          0.2287          0.2056          0.1993          0.2015          0.1970          0.2187          0.2719          0.3669           0.4647          0.3563
1         0.3141          0.2559          0.2287          0.2056          0.1993          0.2015          0.1970          0.2187          0.2719          0.3669           0.4647          0.3563
2         0.2257          0.2065          0.1967          0.1949          0.1878          0.1861          0.1987          0.2801          0.4338          0.5667           0.4209          0.2880
3         0.2866          0.2257          0.2065          0.1967          0.1949          0.1878          0.1861          0.1987          0.2801          0.4338           0.5667          0.4209
4         0.4044          0.2866          0.2257          0.2065          0.1967          0.1949          0.1878          0.1861          0.1987          0.2801           0.4338          0.5667

然后可以将其连接到原始数据帧：

pd.concat([geo_df.head(), ndvi_extract.head()], axis=1)

这将返回带有网格产品中该点的geovariable值的geopandas.GeoDataFrame。

python Point-In-Polygon操作。根据网格内的点将网格数据与点数据连接起来

我想知道如何根据Geopandas中行的位置（`DataArray`和时间（`geo_df.geometry`和`geo_df.plant_date`）从xarray `geo_df.cut_date`中选择值`GeoDataFrame`。我想将它们作为“功能”加入输出`GeoDataFrame`中。

我的数据集：

视觉摘要：

理想的输出：

1 个答案:

python Point-In-Polygon操作。根据网格内的点将网格数据与点数据连接起来

我想知道如何根据Geopandas中行的位置（DataArray和时间（geo_df.geometry和geo_df.plant_date）从xarray geo_df.cut_date中选择值GeoDataFrame。我想将它们作为“功能”加入输出GeoDataFrame中。

我的数据集：

视觉摘要：

理想的输出：

1 个答案:

我想知道如何根据Geopandas中行的位置（`DataArray`和时间（`geo_df.geometry`和`geo_df.plant_date`）从xarray `geo_df.cut_date`中选择值`GeoDataFrame`。我想将它们作为“功能”加入输出`GeoDataFrame`中。