How to select objects within a geographic area in a pandas DataFrame

Asked: 2019-04-11 16:22:41

Tags: python pandas geolocation selection geopandas

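I am trying to select the objects that lie within an area from a pandas DataFrame containing a list of item IDs and latitude/longitude pairs. Is there a way to do this selection? I think it is similar to the following SO question, but using pandas instead of SQL: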

Selecting geographical points within area

Here is my table, saved in locations.csv:

ID, LAT, LON
001,35.00,-75.00
002,35.01,-80.00 
...
999,25.76,-64.00

I can load the DataFrame and then select a rectangular area:

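import pandas as pd

# skipinitialspace=True strips the blank after each comma, so the header
# yields 'LAT' and 'LON' rather than ' LAT' and ' LON'
df = pd.read_csv('locations.csv', skipinitialspace=True)
lat_max = 32.323496
lat_min = 25.712767
lon_max = -72.863358
lon_min = -74.729456
# Combine the four conditions into one boolean mask instead of chaining df[...][...]
in_box = ((df['LAT'] > lat_min) & (df['LAT'] < lat_max)
          & (df['LON'] > lon_min) & (df['LON'] < lon_max))
small_df = df[in_box]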

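How can I select the objects inside an irregular area?

How should I construct the DataFrame selection command?

I could build a lambda function that returns True for LAT and LON values inside the area, but I am not sure how to use it with a pandas DataFrame.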

1 answer:

Answer 0 (score: 2)

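The working code below selects the points within an area by first creating two GeoDataFrames. The first contains the polygon, and the second contains all the points that will be spatially joined with the first. The spatial join with the `within` operator selects the points that lie inside the polygon. The result of the operation is also a GeoDataFrame, and it contains only the points that fall within the polygon.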

Contents of locations.csv: 6 lines, including the column header. Note: there are no spaces in the first line.

ID,LAT,LON
1, 15.1, 10.0
2, 15.2, 15.1
3, 15.3, 20.2
4, 15.4, 25.3
5, 15.5, 30.4
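The note about spaces in the header matters because, by default, pandas keeps the whitespace that follows each comma as part of the column name. The sketch below illustrates this with a small inline string (a stand-in for the file, not the real data set) whose header does contain spaces, and shows that `skipinitialspace=True` is one way to get clean names.

```python
import io
import pandas as pd

# Inline stand-in for a CSV whose header has a space after each comma
raw = "ID, LAT, LON\n1, 15.1, 10.0\n2, 15.2, 15.1\n"

# Default parsing keeps the leading space in the header names
default_cols = pd.read_csv(io.StringIO(raw)).columns.tolist()
print(default_cols)  # ['ID', ' LAT', ' LON']

# skipinitialspace=True drops whitespace that follows a delimiter
clean_cols = pd.read_csv(io.StringIO(raw), skipinitialspace=True).columns.tolist()
print(clean_cols)  # ['ID', 'LAT', 'LON']
```

With the default parsing, an expression such as `df['LAT']` raises a `KeyError`, because the column is actually named `' LAT'`.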

Code:

import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from shapely.wkt import loads

# Create a geo-dataframe `polygon_df` with a single polygon row.
# This polygon will be used to select points in a geodataframe.
d = {'poly_id': [1], 'wkt': ['POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))']}
df = pd.DataFrame(data=d)
geometry = [loads(pgon) for pgon in df.wkt]
# Recent geopandas versions expect an authority string such as 'EPSG:4326';
# the {'init': 'epsg:4326'} dict form is deprecated.
polygon_df = gpd.GeoDataFrame(df, crs='EPSG:4326', geometry=geometry)

# One can plot this polygon with the command:
# polygon_df.plot()

# Read the file with `pandas`
locs = pd.read_csv('locations.csv', sep=',')

# Build one Point per row; shapely expects (x, y) = (LON, LAT)
locs_geom = [Point(xy) for xy in zip(locs.LON, locs.LAT)]

# Make it a geo-dataframe with new name: `geo_locs`
geo_locs = gpd.GeoDataFrame(locs, crs='EPSG:4326', geometry=locs_geom)

# Do a spatial join of `point` within `polygon`, get the result in `pts_in_poly` GeodataFrame.
# (`predicate` replaces the older `op` keyword in geopandas 0.10 and later.)
pts_in_poly = gpd.sjoin(geo_locs, polygon_df, predicate='within', how='inner')

# Print the ID of the points that fall within the polygon.
print(pts_in_poly.ID)

# The output will be:
#2    3
#3    4
#4    5
#Name: ID, dtype: int64

# Plot the polygon and all the points.
ax1 = polygon_df.plot(color='lightgray', zorder=1)
geo_locs.plot(ax=ax1, zorder=5, color="red")

Output plot: [image: the polygon in light gray with all the points in red]

In the plot, the points with IDs 3, 4, and 5 fall inside the polygon.
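The question also asks how a True/False function could be used directly with a pandas DataFrame. A minimal sketch of that idea, assuming only pandas and shapely are installed, builds a boolean mask with `DataFrame.apply` and shapely's `Polygon.contains`. This is an alternative to the spatial join, not part of the original answer. The names `region`, `inside`, and `selected` are illustrative, and the sample rows are inlined so that the block does not depend on locations.csv.

```python
import pandas as pd
from shapely.geometry import Point, Polygon

# Same sample points as locations.csv, inlined for a self-contained sketch
locs = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'LAT': [15.1, 15.2, 15.3, 15.4, 15.5],
    'LON': [10.0, 15.1, 20.2, 25.3, 30.4],
})

# Same polygon as above; shapely expects (x, y) = (LON, LAT)
region = Polygon([(30, 10), (40, 40), (20, 40), (10, 20)])

# Row-wise predicate: True when the point lies strictly inside the polygon
inside = locs.apply(lambda row: region.contains(Point(row.LON, row.LAT)), axis=1)

# Use the boolean Series as an ordinary pandas mask
selected = locs[inside]
print(selected.ID.tolist())  # [3, 4, 5]
```

The row-wise `apply` runs a Python call per row, so for large tables the spatial join is likely to be faster. The mask approach is convenient when the result needs to stay a plain DataFrame.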