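I have 5868 points in a GeoDataFrame, which also contains several columns/attributes. Among points that are less than 10 m apart, I want to keep only one point as the representative of that area. I have done this with the following code: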
import geopandas as gpd

ships = gpd.read_file(r"D:\Suhendra\Riset BARATA\data ais\lego_python\kepri_201812_ship.shp")
# 'ships' has 5868 rows. It is a GeoDataFrame with several columns.
# Remove the 'ships' geometries that are less than 10 m from an earlier kept point
point_nodes = list(ships['geometry'])
for i in range(len(point_nodes) - 1):
    if point_nodes[i] is None:
        continue
    for j in range(i + 1, len(point_nodes)):
        if point_nodes[j] is None:
            continue
        if point_nodes[i].distance(point_nodes[j]) < 10:  # in meters
            point_nodes[j] = None
new_point_nodes = gpd.GeoSeries([node for node in point_nodes if node is not None])
# 'new_point_nodes' has 5321 rows; it is just a GeoSeries with geometry information
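The result has 5321 points (fewer than the original data), but it is only a GeoSeries, not a GeoDataFrame like the original data. How can I do this so that the result has the same form as the original data?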
Answer 0: (score: 0)
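Instead of merging the result back into the original DataFrame, I suggest filtering the original DataFrame based on the result of the calculation. To do that, I recommend doing the distance comparisons on a pandas Series rather than a list. This preserves the index of the original DataFrame, which in turn makes the final filtering easy: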
d = 10  # distance threshold in meters, as in the question
# Copy so that setting entries to None does not modify 'ships' itself
point_nodes = ships['geometry'].copy()
# Do distance comparisons on a pandas Series instead of a list
for i in range(len(point_nodes) - 1):
    if point_nodes.iloc[i] is None:
        continue
    for j in range(i + 1, len(point_nodes)):
        if point_nodes.iloc[j] is None:
            continue
        if point_nodes.iloc[i].distance(point_nodes.iloc[j]) < d:
            point_nodes.iloc[j] = None
# Now filter the original DataFrame according to point_nodes (select all rows whose entry in point_nodes is not None)
new_point_nodes = ships.loc[~point_nodes.isnull()]
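The same idea can also be expressed with a separate boolean "keep" mask, so that no geometry has to be overwritten with None. The following is only a minimal sketch of that variant, not the code from the answer: it works on plain (x, y) tuples using the standard library, and the function name `thin_points` and the sample coordinates are made up for illustration.

```python
import math

def thin_points(coords, min_dist):
    """Greedy thinning: return a boolean mask, True for points that are kept.

    A point is dropped if it lies closer than min_dist to an earlier kept point.
    """
    keep = [True] * len(coords)
    for i in range(len(coords) - 1):
        if not keep[i]:
            continue  # point i was already dropped, so it cannot drop others
        xi, yi = coords[i]
        for j in range(i + 1, len(coords)):
            if keep[j] and math.hypot(coords[j][0] - xi, coords[j][1] - yi) < min_dist:
                keep[j] = False
    return keep

# Hypothetical coordinates in meters (not taken from the ship data)
coords = [(0.0, 0.0), (3.0, 4.0), (20.0, 0.0), (25.0, 0.0), (100.0, 100.0)]
keep = thin_points(coords, 10)
print(keep)  # [True, False, True, False, True]
```

With the real data, and assuming every geometry is a Point with no missing values, the coordinates could be built from `ships.geometry.x` and `ships.geometry.y`, and `ships[keep]` would then return a GeoDataFrame with all the original columns.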