Question

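I start with a list of tuples containing the x and y coordinates of points on a graph, and I want to remove duplicate points from the list. For my purposes, however, points within a distance of 10 of each other count as duplicates.

I wrote a function that seems to do the job, but I'm sure there's a better way. In the sample data below, points 1, 2, and 5 are duplicates (within a distance of 10 of each other). I don't care which of the three survives the culling. I expect to process no more than 100 points, with roughly 50% of them culled. Thanks!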

def is_close(pointA, pointB, closeness):
    x1, y1 = pointA
    x2, y2 = pointB
    # Euclidean distance; not truncated to int, so a fractional closeness also works
    distance = ((x2 - x1)**2 + (y2 - y1)**2)**0.5
    return distance < closeness

def remove_close_duplicated(data, closeness):
    if len(data) < 2: # can't have duplicates if there aren't at least 2 points
        return data
    new_list_points = []
    for i, point in enumerate(data):
        if i == 0:
            new_list_points.append(point)
            continue
        close = False
        for new_point in new_list_points:
            if is_close(new_point, point, closeness):
                close = True
                break
        if not close:
            new_list_points.append(point)
    return new_list_points

sample_data = [
    (600, 400), # 1
    (601, 401), # 2
    (725, 300), # 3
    (800, 900), # 4
    (601, 400), # 5
]

closeness = 10                  
print(remove_close_duplicated(sample_data, closeness))
'''
output is:
[(600, 400), (725, 300), (800, 900)]
'''

Answer 1

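There are two parts to this: finding the close pairs, and finding the well-separated sets (the equivalence classes of the transitive closure of the closeness relation, or equivalently the connected components of the closeness graph).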

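With only 100 points you can do the first part by brute force, but efficient options include grouping the points into square bins of side 10, so that all of a point's close neighbors must lie in its own bin or an adjacent one, or storing the points in something like a k-d tree.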
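As a rough illustration of the binning option, here is a minimal sketch. The function name `close_pairs` and its return format (sorted index pairs) are my own choices for the example, and it assumes Python 3.8+ for `math.dist`:

```python
from collections import defaultdict
from itertools import product
from math import dist, floor

def close_pairs(points, closeness):
    """Return sorted index pairs (i, j), i < j, whose points are closer than `closeness`."""
    # Bucket point indices by grid cell; each cell is a square of side `closeness`.
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(floor(x / closeness), floor(y / closeness))].append(i)

    pairs = set()
    for (cx, cy), members in grid.items():
        # A point closer than `closeness` to a member can only be in this
        # cell or in one of its 8 neighbors, so only those cells are scanned.
        for dx, dy in product((-1, 0, 1), repeat=2):
            # .get() avoids creating empty cells while iterating over the grid.
            for j in grid.get((cx + dx, cy + dy), ()):
                for i in members:
                    if i < j and dist(points[i], points[j]) < closeness:
                        pairs.add((i, j))
    return sorted(pairs)

sample_data = [(600, 400), (601, 401), (725, 300), (800, 900), (601, 400)]
print(close_pairs(sample_data, 10))  # [(0, 1), (0, 4), (1, 4)]
```

For 100 points this is unlikely to beat the brute-force double loop in practice; it only starts to pay off as the list grows.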

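For the second part, a standard solution is to build a disjoint-set forest, applying the union operation for each close pair (choosing arbitrarily which point is stored at the (new) root). The points associated with roots at the end are the desired reduced set.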
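A minimal sketch of the disjoint-set approach might look like the following. The name `reduce_points` is hypothetical, the close pairs are found by brute force (reasonable for about 100 points), and `math.dist` again assumes Python 3.8+:

```python
from itertools import combinations
from math import dist

def reduce_points(points, closeness):
    """Keep one representative point per connected group of close points."""
    parent = list(range(len(points)))  # each point starts as its own root

    def find(i):
        # Follow parent links to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Brute-force search over all pairs; union the sets of each close pair.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) < closeness:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[rj] = ri  # arbitrary choice of which root survives

    # The points still sitting at a root are the reduced set.
    return [p for i, p in enumerate(points) if parent[i] == i]

sample_data = [(600, 400), (601, 401), (725, 300), (800, 900), (601, 400)]
print(reduce_points(sample_data, 10))  # [(600, 400), (725, 300), (800, 900)]
```

Note that this merges chains transitively, which differs from the greedy loop in the question: for `[(0, 0), (8, 0), (16, 0)]` with a closeness of 10, it keeps a single point, whereas the greedy version keeps both `(0, 0)` and `(16, 0)`. Which behavior you want depends on what "duplicate" should mean for chained points.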
