I have a shapefile containing both Polygons and MultiPolygons, as shown below:
name geometry
0 AB10 POLYGON ((-2.116454759005259 57.14656265903432...
1 AB11 (POLYGON ((-2.052573095588467 57.1342600856536...
2 AB12 (POLYGON ((-2.128066321470298 57.0368357386797...
3 AB13 POLYGON ((-2.261525922489881 57.10693578217748...
4 AB14 POLYGON ((-2.261525922489879 57.10693578217748...
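The 2nd and 3rd rows (AB11 and AB12) are MultiPolygons; the rest are Polygons. I want to expand each row whose geometry is a MultiPolygon into several Polygon rows, as shown below.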
name geometry
0 AB10 POLYGON ((-2.116454759005259 57.14656265903432...
1 AB11 POLYGON ((-2.052573095588467 57.1342600856536...
2 AB11 POLYGON ((-2.045849648028651 57.13076387483844...
3 AB12 POLYGON ((-2.128066321470298 57.0368357386797...
4 AB12 POLYGON ((-2.096125852304303 57.14808092585477...
5 AB13 POLYGON ((-2.261525922489881 57.10693578217748...
6 AB14 POLYGON ((-2.261525922489879 57.10693578217748...
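Note that the AB11 and AB12 MultiPolygons have been expanded into multiple rows, each holding a single Polygon.
I think this counts as a geodata manipulation. Is there a Pythonic way to achieve the above?
Thanks!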
Answer 0 (score: 0)
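My current solution to the problem above has two steps.
Step 1. Iterate over the rows and, if the geometry type is MultiPolygon, use a list comprehension to replace it with a list of its Polygons: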
name geometry
0 AB10 POLYGON ((-2.116454759005259 57.14656265903432...
1 AB11 [POLYGON ((-2.052573095588467 57.1342600856536...
2 AB12 [POLYGON ((-2.128066321470298 57.0368357386797...
3 AB13 POLYGON ((-2.261525922489881 57.10693578217748...
4 AB14 POLYGON ((-2.261525922489879 57.10693578217748...
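Step 1 might be written as the following minimal sketch. It assumes the geometries are shapely objects held in a plain pandas DataFrame, and it reaches the parts of a MultiPolygon through `.geoms` (direct iteration over a MultiPolygon was removed in shapely 2.0). The toy squares and the helper name `split_multipolygon` are hypothetical stand-ins for the real postcode shapes.

```python
import pandas as pd
from shapely.geometry import MultiPolygon, box

# Hypothetical toy data: AB11 is a MultiPolygon made of two unit squares.
df = pd.DataFrame({
    "name": ["AB10", "AB11"],
    "geometry": [box(0, 0, 1, 1), MultiPolygon([box(2, 0, 3, 1), box(4, 0, 5, 1)])],
})

def split_multipolygon(geom):
    """Hypothetical helper: a list of Polygons for a MultiPolygon, else the geometry unchanged."""
    if geom.geom_type == "MultiPolygon":
        return [part for part in geom.geoms]
    return geom

# Series.apply keeps the returned lists as single cell values (object column).
df["geometry"] = df["geometry"].apply(split_multipolygon)
print(df)
```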
Step 2. Use the trick for expanding a list of elements in a row into multiple rows:
df.set_index(['name'])['geometry'].apply(pd.Series).stack().reset_index()
name level_1 0
0 AB10 0 POLYGON ((-2.116454759005259 57.14656265903432...
1 AB11 0 POLYGON ((-2.052573095588467 57.13426008565365...
2 AB11 1 POLYGON ((-2.045849648028651 57.13076387483844...
3 AB12 0 POLYGON ((-2.128066321470298 57.0368357386797,...
4 AB12 1 POLYGON ((-2.096125852304303 57.14808092585477...
5 AB13 0 POLYGON ((-2.261525922489881 57.10693578217748...
6 AB14 0 POLYGON ((-2.261525922489879 57.10693578217748...
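To see what the `apply(pd.Series).stack()` trick does in isolation, here is a sketch on plain strings standing in for the polygons (the values `"p10"`, `"p11a"` and so on are hypothetical placeholders). `apply(pd.Series)` spreads each list across numbered columns, padding shorter rows with NaN, and `stack()` turns that wide frame into one row per element, keyed by `name` plus the position `level_1`. Newer pandas releases change how `stack()` handles NaN (see its `future_stack` parameter), so the sketch adds an explicit `.dropna()`, which is a no-op when `stack()` already drops the padding.

```python
import pandas as pd

# Hypothetical stand-ins: strings play the role of Polygons, lists the role of split MultiPolygons.
df = pd.DataFrame({
    "name": ["AB10", "AB11", "AB12"],
    "geometry": ["p10", ["p11a", "p11b"], ["p12a", "p12b"]],
})

out = (
    df.set_index("name")["geometry"]
    .apply(pd.Series)   # one column per list position, NaN where a row is shorter
    .stack()            # long format: one value per (name, position)
    .dropna()           # drop NaN padding in case this pandas version's stack() keeps it
    .reset_index()      # columns: name, level_1 (position), 0 (value)
)
print(out)
```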
Please let me know if there is a way to do this in one step!
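One way to get close to a single step, assuming pandas 0.25 or later (which added `DataFrame.explode`) and shapely geometries, is to wrap every geometry in a list of its parts and explode that column. The helper name `parts` and the toy data are hypothetical. If `df` is a geopandas GeoDataFrame, geopandas also provides `GeoDataFrame.explode()` for splitting multi-part geometries, which may make the helper unnecessary; its index handling has changed between versions, so check the documentation for the version you use.

```python
import pandas as pd
from shapely.geometry import MultiPolygon, box

# Hypothetical toy data standing in for the postcode shapes.
df = pd.DataFrame({
    "name": ["AB10", "AB11", "AB13"],
    "geometry": [
        box(0, 0, 1, 1),
        MultiPolygon([box(2, 0, 3, 1), box(4, 0, 5, 1)]),
        box(6, 0, 7, 1),
    ],
})

def parts(geom):
    """Hypothetical helper: list of single-part geometries (a Polygon becomes a one-item list)."""
    return list(geom.geoms) if geom.geom_type == "MultiPolygon" else [geom]

exploded = (
    df.assign(geometry=df["geometry"].apply(parts))
    .explode("geometry")      # one row per list element; the original index value is repeated
    .reset_index(drop=True)   # renumber the rows 0..n-1
)
print(exploded)
```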
Answer 1 (score: 0)
If you only have two columns, we can use NumPy to speed things up.
If you have a dataframe like
name geometry
0 0 polygn(x)
1 2 (polygn(x), polygn(x))
2 3 polygn(x)
3 4 (polygn(x), polygn(x))
then NumPy's meshgrid helps:
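import numpy as np
import pandas as pd
def cartesian(x):
    # For each row, meshgrid pairs the name with every geometry; vstack joins all rows' pairs.
    return np.vstack([np.array(np.meshgrid(*row)).T.reshape(-1, 2) for row in x.values])
ndf = pd.DataFrame(cartesian(df), columns=df.columns)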
Output:
name geometry
0 0 polygn(x)
1 2 polygn(x)
2 2 polygn(x)
3 3 polygn(x)
4 4 polygn(x)
5 4 polygn(x)
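To see why `meshgrid` performs the expansion, here is what `cartesian` computes for a single row, using the strings `"AB11"`, `"p1"` and `"p2"` as hypothetical stand-ins for a name and its two polygons. The single name is broadcast against every geometry, and `.T.reshape(-1, 2)` turns the two grids into `(name, geometry)` pairs.

```python
import numpy as np

# One hypothetical row: a name and a tuple of two "polygons".
name, geoms = "AB11", ("p1", "p2")

# meshgrid broadcasts the single name against every geometry; both grids have shape (2, 1).
grid_name, grid_geom = np.meshgrid(name, geoms)

# Stack the grids, transpose, and flatten into one (name, geometry) pair per row.
pairs = np.array([grid_name, grid_geom]).T.reshape(-1, 2)
print(pairs)
```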
%%timeit
ndf = pd.DataFrame(cartesian(df),columns=df.columns)
1000 loops, best of 3: 679 µs per loop
%%timeit
df.set_index(['name'])['geometry'].apply(pd.Series).stack().reset_index()
100 loops, best of 3: 5.44 ms per loop