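I am trying to select the objects that lie within a region from a pandas DataFrame containing a list of item IDs and latitude/longitude pairs. Is there a way to do this selection? I think this is similar to the following SO question, but using pandas instead of SQL: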
Selecting geographical points within area
Here is my table, saved in locations.csv:
ID, LAT, LON
001,35.00,-75.00
002,35.01,-80.00
...
999,25.76,-64.00
I can load the DataFrame and then select a rectangular region:
import pandas as pd
df = pd.read_csv('locations.csv', delimiter=',')
lat_max = 32.323496
lat_min = 25.712767
lon_max = -72.863358
lon_min = -74.729456
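# Combine all four conditions into one boolean mask; chaining df[...][...] reindexes each mask and raises a UserWarning
small_df = df[(df['LAT'] > lat_min) & (df['LAT'] < lat_max) & (df['LON'] > lon_min) & (df['LON'] < lon_max)]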
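How can I select the objects within an irregular region?
How should I construct the DataFrame selection command?
I could build a lambda function that returns True for LAT and LON values inside the region, but I am not sure how to use it with a pandas DataFrame.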
Answer 0 (score: 2)
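The working code below selects the points within an area. It starts by creating 2 geodataframes: the first contains the polygon, and the second contains all the points that will be spatially joined (spatial join) with the first. Using the spatial join operator within, the points that lie inside the polygon are selected. The result of the operation is also a geodataframe, containing only the wanted points that fall within the polygon's area.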
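To make the meaning of the within predicate concrete before the full example, here is a minimal sketch that tests single points against the same polygon using shapely directly. The variable names are illustrative only, and the two sample coordinates are taken from the locations.csv listing in this answer. Note that the point is built as (LON, LAT), i.e. x first.

```python
from shapely.geometry import Point
from shapely.wkt import loads

# Same polygon as in the answer's code, parsed from its WKT string
polygon = loads('POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))')

inside_pt = Point(20.2, 15.3)   # (LON, LAT) of ID 3 in the sample file
outside_pt = Point(10.0, 15.1)  # (LON, LAT) of ID 1 in the sample file
corner_pt = Point(30, 10)       # a vertex of the polygon itself

is_inside = inside_pt.within(polygon)        # True
is_outside = outside_pt.within(polygon)      # False
corner_within = corner_pt.within(polygon)    # False: boundary points are not "within"
corner_touch = corner_pt.intersects(polygon) # True: the boundary still intersects

print(is_inside, is_outside, corner_within, corner_touch)
```

A point lying exactly on the polygon's edge is not considered within it, so if boundary points should be kept, a predicate such as intersects may be a better fit.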
Contents of locations.csv (6 lines, including the column header).
Note: there are no spaces in the header line.
ID,LAT,LON
1, 15.1, 10.0
2, 15.2, 15.1
3, 15.3, 20.2
4, 15.4, 25.3
5, 15.5, 30.4
Code:
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point
from shapely.wkt import loads
# Create a geo-dataframe `polygon_df` having 1 row of polygon
# This polygon will be used to select points in a geodataframe
d = {'poly_id':[1], 'wkt':['POLYGON ((30 10, 40 40, 20 40, 10 20, 30 10))']}
df = pd.DataFrame( data=d )
geometry = [loads(pgon) for pgon in df.wkt]
# Note: the {'init': 'epsg:4326'} dict form is deprecated; pass the CRS as a string
polygon_df = gpd.GeoDataFrame(df,
                              crs='EPSG:4326',
                              geometry=geometry)
# One can plot this polygon with the command:
# polygon_df.plot()
# Read the file with `pandas`
locs = pd.read_csv('locations.csv', sep=',')
# Build one point geometry per row; note the (x, y) = (LON, LAT) order
locs_geom = [Point(xy) for xy in zip(locs.LON, locs.LAT)]
# Making it a geo-dataframe with new name: `geo_locs`
# (recent geopandas versions reject a CRS without a geometry column, so both are passed together)
geo_locs = gpd.GeoDataFrame(locs, crs='EPSG:4326', geometry=locs_geom)
# Do a spatial join of `point` within `polygon`, get the result in `pts_in_poly` GeodataFrame.
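# Note: geopandas >= 0.10 uses `predicate`; older versions used the now-removed `op` keyword.
pts_in_poly = gpd.sjoin(geo_locs, polygon_df, predicate='within', how='inner')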
# Print the ID of the points that fall within the polygon.
print(pts_in_poly.ID)
# The output will be:
# 2    3
# 3    4
# 4    5
# Name: ID, dtype: int64
# Plot the polygon and all the points.
ax1 = polygon_df.plot(color='lightgray', zorder=1)
geo_locs.plot(ax=ax1, zorder=5, color="red")
Output plot:
In the plot, the points with IDs 3, 4 and 5 fall within the polygon.
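If installing geopandas is not an option, the lambda idea from the question can also be sketched in plain Python. The block below is a minimal ray-casting point-in-polygon test, not part of the original answer. The helper name point_in_polygon and the variable names are hypothetical. It treats longitude and latitude as planar x/y coordinates, which is an assumption that is only reasonable for small regions away from the poles and the antimeridian.

```python
def point_in_polygon(lon, lat, vertices):
    """Ray casting: count how many polygon edges a ray going right from the point crosses."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line y = lat can be crossed
        if (y1 > lat) != (y2 > lat):
            # x-coordinate where the edge meets that horizontal line
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


# Polygon from the answer as (LON, LAT) vertices, without the repeated closing point
vertices = [(30, 10), (40, 40), (20, 40), (10, 20)]

# (ID, LAT, LON) rows copied from the sample locations.csv
rows = [(1, 15.1, 10.0), (2, 15.2, 15.1), (3, 15.3, 20.2),
        (4, 15.4, 25.3), (5, 15.5, 30.4)]

selected_ids = [point_id for point_id, lat, lon in rows
                if point_in_polygon(lon, lat, vertices)]
print(selected_ids)  # [3, 4, 5]
```

With a DataFrame, the same function could be used as a row-wise mask, for example mask = df.apply(lambda row: point_in_polygon(row.LON, row.LAT, vertices), axis=1) followed by df[mask]. This is slower than the spatial join for large tables, and points exactly on the boundary are not handled consistently.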