Suppose I have two disjoint groups, or "islands", of polygons (think census tracts in two non-adjacent counties). My data looks like this:
>>> import geopandas as gpd
>>> from shapely.geometry import Polygon
>>>
>>> p1=Polygon([(0,0),(10,0),(10,10),(0,10)])
>>> p2=Polygon([(10,10),(20,10),(20,20),(10,20)])
>>> p3=Polygon([(10,10),(10,20),(0,10)])
>>>
>>> p4=Polygon([(40,40),(50,40),(50,30),(40,30)])
>>> p5=Polygon([(40,40),(50,40),(50,50),(40,50)])
>>> p6=Polygon([(40,40),(40,50),(30,50)])
>>>
>>> df=gpd.GeoDataFrame(geometry=[p1,p2,p3,p4,p5,p6])
>>> df
geometry
0 POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0))
1 POLYGON ((10 10, 20 10, 20 20, 10 20, 10 10))
2 POLYGON ((10 10, 10 20, 0 10, 10 10))
3 POLYGON ((40 40, 50 40, 50 30, 40 30, 40 40))
4 POLYGON ((40 40, 50 40, 50 50, 40 50, 40 40))
5 POLYGON ((40 40, 40 50, 30 50, 40 40))
>>>
>>> df.plot()
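I want the polygons within each island to carry an ID that identifies their group (the value can be arbitrary). For example, the 3 polygons in the lower left could have IslandID = 1 and the 3 polygons in the upper right could have IslandID = 2.
I have worked out a way to do this, but I would like to know whether it is the best / most efficient way. Here is what I did:
1) Create a GeoDataFrame whose geometries are the individual polygons inside the unary union of the original polygons. This gives me two polygons, one per "island".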
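>>> # .geoms is needed on Shapely 2.x, where a MultiPolygon can no longer be iterated directly
>>> SepIslands=gpd.GeoDataFrame(geometry=list(df.unary_union.geoms))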
>>> SepIslands.plot()
2) Create an ID for each group.
>>> SepIslands['IslandID']=SepIslands.index+1
3) Spatially join the islands to the original polygons, so that each polygon gets the appropriate island ID.
>>> Final=gpd.tools.sjoin(df, SepIslands, how='left').drop(columns='index_right')
>>> Final
geometry IslandID
0 POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0)) 1
1 POLYGON ((10 10, 20 10, 20 20, 10 20, 10 10)) 1
2 POLYGON ((10 10, 10 20, 0 10, 10 10)) 1
3 POLYGON ((40 40, 50 40, 50 30, 40 30, 40 40)) 2
4 POLYGON ((40 40, 50 40, 50 50, 40 50, 40 40)) 2
5 POLYGON ((40 40, 40 50, 30 50, 40 40)) 2
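For reference, the three steps can be wrapped into one function. This is only a minimal sketch: `label_islands` is a hypothetical helper name, it calls `shapely.ops.unary_union` rather than the `unary_union` attribute, and it assumes the islands are far enough apart that each polygon intersects exactly one dissolved island.

```python
import geopandas as gpd
from shapely.geometry import Polygon
from shapely.ops import unary_union


def label_islands(gdf, id_column="IslandID"):
    """Return a copy of gdf with an arbitrary per-island ID column.

    Hypothetical helper; assumes every polygon intersects exactly one island.
    """
    merged = unary_union(list(gdf.geometry))
    # One island dissolves to a Polygon, several islands to a MultiPolygon.
    parts = list(getattr(merged, "geoms", [merged]))
    islands = gpd.GeoDataFrame(
        {id_column: list(range(1, len(parts) + 1))}, geometry=parts, crs=gdf.crs
    )
    # Left join keeps every original polygon and attaches its island's ID.
    joined = gpd.sjoin(gdf, islands, how="left")
    return joined.drop(columns="index_right")


polygons = [
    Polygon([(0, 0), (10, 0), (10, 10), (0, 10)]),
    Polygon([(10, 10), (20, 10), (20, 20), (10, 20)]),
    Polygon([(10, 10), (10, 20), (0, 10)]),
    Polygon([(40, 40), (50, 40), (50, 30), (40, 30)]),
    Polygon([(40, 40), (50, 40), (50, 50), (40, 50)]),
    Polygon([(40, 40), (40, 50), (30, 50)]),
]
labeled = label_islands(gpd.GeoDataFrame(geometry=polygons))
print(labeled)
```

Which island receives which number depends on the order of the parts in the dissolved geometry, so the IDs should be treated as arbitrary labels.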
Is this really the best / most efficient way to do it?
Answer 0: (score: 1)
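If the gaps between the groups are fairly large, another option is sklearn.cluster.DBSCAN: cluster the centroids of the polygons and use the cluster labels as island IDs.
DBSCAN stands for Density-Based Spatial Clustering of Applications with Noise, and it groups together points that are closely packed. In our case, the polygons of one island end up in the same cluster.
This also works for more than two islands.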
import geopandas as gpd
import pandas as pd
from shapely.geometry import Polygon
from sklearn.cluster import DBSCAN
# Note: EPS_DISTANCE = 20 is a magic number. It is compared against distances
# between polygon centroids, so it needs to be
# * smaller than the centroid distance between any two islands
# * large enough to link the polygons of one island into the same cluster
EPS_DISTANCE = 20
MIN_SAMPLE_POLYGONS = 1
p1=Polygon([(0,0),(10,0),(10,10),(0,10)])
p2=Polygon([(10,10),(20,10),(20,20),(10,20)])
p3=Polygon([(10,10),(10,20),(0,10)])
p4=Polygon([(40,40),(50,40),(50,30),(40,30)])
p5=Polygon([(40,40),(50,40),(50,50),(40,50)])
p6=Polygon([(40,40),(40,50),(30,50)])
df = gpd.GeoDataFrame(geometry=[p1, p2, p3, p4, p5, p6])
# preparation for dbscan
df['x'] = df['geometry'].centroid.x
df['y'] = df['geometry'].centroid.y
# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it
coords = df[['x', 'y']].to_numpy()
# dbscan
dbscan = DBSCAN(eps=EPS_DISTANCE, min_samples=MIN_SAMPLE_POLYGONS)
clusters = dbscan.fit(coords)
# add labels back to dataframe
labels = pd.Series(clusters.labels_).rename('IslandID')
df = pd.concat([df, labels], axis=1)
> df
geometry ... IslandID
0 POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0)) ... 0
1 POLYGON ((10 10, 20 10, 20 20, 10 20, 10 10)) ... 0
2 POLYGON ((10 10, 10 20, 0 10, 10 10)) ... 0
3 POLYGON ((40 40, 50 40, 50 30, 40 30, 40 40)) ... 1
4 POLYGON ((40 40, 50 40, 50 50, 40 50, 40 40)) ... 1
5 POLYGON ((40 40, 40 50, 30 50, 40 40)) ... 1
[6 rows x 4 columns]
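To see why the choice of eps matters: with min_samples=1 every centroid is a core point, so the DBSCAN clusters are just the connected components of the graph that links centroids no more than eps apart. The sketch below reproduces that grouping in plain Python so different eps values can be tried without scikit-learn. The helpers `centroid` and `group_within` are hypothetical names written for this illustration, not part of any library.

```python
from itertools import combinations
from math import dist


def centroid(ring):
    """Area-weighted centroid of a simple polygon given as (x, y) vertices."""
    area2 = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring, ring[1:] + ring[:1]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    return cx / (3.0 * area2), cy / (3.0 * area2)


def group_within(points, eps):
    """Label points so that any two points within eps of each other share a label."""
    parent = list(range(len(points)))  # every point starts as its own group

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= eps:
            parent[find(j)] = find(i)  # merge the two groups

    # Renumber the group roots as 0, 1, 2, ... in order of first appearance.
    ids = {}
    return [ids.setdefault(find(i), len(ids)) for i in range(len(points))]


rings = [
    [(0, 0), (10, 0), (10, 10), (0, 10)],
    [(10, 10), (20, 10), (20, 20), (10, 20)],
    [(10, 10), (10, 20), (0, 10)],
    [(40, 40), (50, 40), (50, 30), (40, 30)],
    [(40, 40), (50, 40), (50, 50), (40, 50)],
    [(40, 40), (40, 50), (30, 50)],
]
centroids = [centroid(r) for r in rings]

print(group_within(centroids, 20))  # [0, 0, 0, 1, 1, 1]
print(group_within(centroids, 5))   # too small: every polygon is its own group
print(group_within(centroids, 40))  # too large: both islands merge into one
```

For this data the closest pair of centroids from different islands is about 36 units apart, and the centroids within an island form a chain with links of about 8.5 units, so any eps between those two values separates the islands correctly.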