Question

I have a shapefile that contains both Polygons and MultiPolygons, as shown below:

   name                                           geometry
0  AB10  POLYGON ((-2.116454759005259 57.14656265903432...
1  AB11  (POLYGON ((-2.052573095588467 57.1342600856536...
2  AB12  (POLYGON ((-2.128066321470298 57.0368357386797...
3  AB13  POLYGON ((-2.261525922489881 57.10693578217748...
4  AB14  POLYGON ((-2.261525922489879 57.10693578217748...

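The 2nd and 3rd rows (AB11 and AB12) hold MultiPolygon geometries; the rest are Polygons. I want to expand each row whose geometry is a MultiPolygon into one row per Polygon, as shown below.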

   name                                           geometry
0  AB10  POLYGON ((-2.116454759005259 57.14656265903432...
1  AB11  POLYGON ((-2.052573095588467 57.1342600856536...
2  AB11  POLYGON ((-2.045849648028651 57.13076387483844...
3  AB12  POLYGON ((-2.128066321470298 57.0368357386797...
4  AB12  POLYGON ((-2.096125852304303 57.14808092585477...
5  AB13  POLYGON ((-2.261525922489881 57.10693578217748...
6  AB14  POLYGON ((-2.261525922489879 57.10693578217748...

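Note that the AB11 and AB12 MultiPolygons have each been expanded into multiple rows, one row per Polygon.

I think this is a geopandas data-manipulation problem. Is there a Pythonic way to achieve this?

Thanks!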

Answer 1

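My current solution to the problem above has two parts.

Step 1. Iterate over every row and, if the geometry type is MultiPolygon, apply a list comprehension to turn it into a list of Polygons.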

   name                                           geometry
0  AB10  POLYGON ((-2.116454759005259 57.14656265903432...
1  AB11  [POLYGON ((-2.052573095588467 57.1342600856536...
2  AB12  [POLYGON ((-2.128066321470298 57.0368357386797...
3  AB13  POLYGON ((-2.261525922489881 57.10693578217748...
4  AB14  POLYGON ((-2.261525922489879 57.10693578217748...
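
Step 1 is described without code, so here is a minimal sketch of what it might look like. It assumes shapely-style geometries that expose `geom_type` and, for multi-part types, a `geoms` attribute; the helper names `split_multipolygon` and `listify_multipolygons` are made up for illustration and are not part of geopandas.

```python
import pandas as pd


def split_multipolygon(geom):
    """Return the member polygons of a MultiPolygon as a list; pass anything else through."""
    if geom.geom_type == "MultiPolygon":
        # The list comprehension from Step 1: one entry per member polygon.
        return [part for part in geom.geoms]
    return geom


def listify_multipolygons(df, column="geometry"):
    """Return a plain DataFrame whose `column` holds either a Polygon or a list of Polygons."""
    new_values = [split_multipolygon(geom) for geom in df[column]]
    # Rebuild as an ordinary DataFrame so the column can hold Python lists.
    out = pd.DataFrame(df.drop(columns=[column]))
    out[column] = new_values
    return out
```

Calling `listify_multipolygons(df)` on the frame from the question should give a frame shaped like the one printed above, ready for Step 2.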

Step 2. Use the trick for expanding a list stored in one row into multiple rows.

df.set_index(['name'])['geometry'].apply(pd.Series).stack().reset_index()

  name  level_1                                                  0
0  AB10        0  POLYGON ((-2.116454759005259 57.14656265903432...
1  AB11        0  POLYGON ((-2.052573095588467 57.13426008565365...
2  AB11        1  POLYGON ((-2.045849648028651 57.13076387483844...
3  AB12        0  POLYGON ((-2.128066321470298 57.0368357386797,...
4  AB12        1  POLYGON ((-2.096125852304303 57.14808092585477...
5  AB13        0  POLYGON ((-2.261525922489881 57.10693578217748...
6  AB14        0  POLYGON ((-2.261525922489879 57.10693578217748...
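
As a possible alternative to the `apply(pd.Series).stack()` trick, pandas 1.1 and later offer `DataFrame.explode` with `ignore_index`, which expands list-like cells directly and leaves scalar cells alone. A small sketch, with `explode_lists` as a hypothetical wrapper name:

```python
import pandas as pd


def explode_lists(df, column="geometry"):
    """Give every element of a list-valued cell its own row; scalar cells are kept as-is."""
    # ignore_index=True renumbers the result 0..n-1 instead of repeating the source index.
    return df.explode(column, ignore_index=True)
```

Unlike the stacked version, this keeps the original column names, so no `level_1` or `0` columns need renaming afterwards.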

Please let me know if there is a way to do this in a single step!
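
One candidate for a single step, assuming geopandas 0.10 or newer, is `GeoDataFrame.explode`, which splits multi-part geometries into single-part rows. The toy squares and the `square` helper below are invented for illustration; the real data would come from the shapefile.

```python
import geopandas as gpd
from shapely.geometry import MultiPolygon, Polygon


def square(x, y, size=1.0):
    """Build an axis-aligned square polygon (illustrative data only)."""
    return Polygon([(x, y), (x + size, y), (x + size, y + size), (x, y + size)])


gdf = gpd.GeoDataFrame(
    {
        "name": ["AB10", "AB11"],
        "geometry": [square(0, 0), MultiPolygon([square(2, 0), square(4, 0)])],
    },
    geometry="geometry",
)

# index_parts=False avoids a MultiIndex; ignore_index=True renumbers rows 0..n-1.
exploded = gdf.explode(index_parts=False, ignore_index=True)
print(exploded)
```

The non-geometry columns (here `name`) are repeated for every part, which is the layout requested in the question.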

Answer 2

If you only have two columns, we can use numpy to speed this up.

If you have a dataframe like

    name                geometry
0     0               polygn(x)
1     2  (polygn(x), polygn(x))
2     3               polygn(x)
3     4  (polygn(x), polygn(x))

then numpy's meshgrid will help:

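import numpy as np
import pandas as pd

def cartesian(x):
    # For each row, pair the name with every geometry in that row, then stack all pairs.
    return np.vstack([np.array(np.meshgrid(*i)).T.reshape(-1, 2) for i in x.values])

ndf = pd.DataFrame(cartesian(df), columns=df.columns)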

Output:

  name   geometry
0    0  polygn(x)
1    2  polygn(x)
2    2  polygn(x)
3    3  polygn(x)
4    4  polygn(x)
5    4  polygn(x)
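
The same row-repetition idea can also be written with `np.repeat`, which may be easier to follow and is not limited to two columns. This is a sketch under the assumption that multi-part entries are stored as tuples or lists, as in the toy frame above; `explode_with_repeat` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def explode_with_repeat(df, column="geometry"):
    """Repeat each row once per element of its list/tuple cell in `column`."""
    # Wrap scalar cells in a one-element list so every row has a list of parts.
    parts = [list(v) if isinstance(v, (list, tuple)) else [v] for v in df[column].tolist()]
    counts = np.array([len(p) for p in parts], dtype=int)
    # Positional row numbers, each repeated as many times as the row has parts.
    positions = np.repeat(np.arange(len(df)), counts)
    out = df.drop(columns=[column]).iloc[positions].reset_index(drop=True)
    # Flatten the parts in row order so they line up with the repeated rows.
    out[column] = [item for p in parts for item in p]
    return out
```

Because the other columns are taken with `iloc` rather than rebuilt from a stacked array, they keep their original dtypes.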

%%timeit
ndf = pd.DataFrame(cartesian(df),columns=df.columns)

1000 loops, best of 3: 679 µs per loop

%%timeit
df.set_index(['name'])['geometry'].apply(pd.Series).stack().reset_index()

100 loops, best of 3: 5.44 ms per loop
