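Given a list of tuples/points, I'm trying to figure out how to group together the tuples that lie within a given bound (distance) of one another. It's hard to explain, but the short code below should show what I mean... I simply can't find a solution, and I can't explain the problem well either.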
E.g.:
```
TPL = [(1, 1), (2, 1), (3, 2), (7, 5), (2, 7), (6, 4), (2, 3), (2, 6), (3, 1)]
print(GroupTPL(TPL, distance=1))
> [
>  [(2, 7), (2, 6)],
>  [(6, 4), (7, 5)],
>  [(3, 2), (3, 1), (2, 3), (1, 1), (2, 1)]
> ]
```
Everything I've tried has been garbage... so I see no reason to share it. I hope you have some hints and tricks.
Answer 0 (score: 3)
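I assume you mean Chebyshev distance when you want to cluster the points together.
In that case, the most straightforward approach is to use a Union Find data structure.
Here is an implementation I have used: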
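For reference, the Chebyshev distance between two points is the largest of the per-axis differences, so diagonal neighbours count as distance 1, whereas under Manhattan distance they count as 2. A minimal sketch comparing the two on points from the question (the function names `chebyshev` and `manhattan` are just illustrative):

```python
def chebyshev(p, q):
    # Largest per-coordinate difference: diagonal neighbours are at distance 1
    return max(abs(a - b) for a, b in zip(p, q))

def manhattan(p, q):
    # Sum of per-coordinate differences: diagonal neighbours are at distance 2
    return sum(abs(a - b) for a, b in zip(p, q))

# (3, 2) and (2, 3) are diagonal neighbours, so they end up in the same group
print(chebyshev((3, 2), (2, 3)), manhattan((3, 2), (2, 3)))  # 1 2
# (1, 1) and (3, 1) are two steps apart on the x axis
print(chebyshev((1, 1), (3, 1)))  # 2
```

This is why (2, 3) belongs to the large group in the expected output even though it differs from (3, 2) on both axes.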
```python
class UnionFind:
    """Union-find data structure. Items must be hashable."""

    def __init__(self):
        """Create a new empty union-find structure."""
        self.weights = {}
        self.parents = {}

    def __getitem__(self, obj):
        """X[item] will return the token object of the set which contains `item`"""
        # check for previously unknown object
        if obj not in self.parents:
            self.parents[obj] = obj
            self.weights[obj] = 1
            return obj
        # find path of objects leading to the root
        path = [obj]
        root = self.parents[obj]
        while root != path[-1]:
            path.append(root)
            root = self.parents[root]
        # compress the path and return
        for ancestor in path:
            self.parents[ancestor] = root
        return root

    def union(self, obj1, obj2):
        """Merges sets containing obj1 and obj2."""
        roots = [self[obj1], self[obj2]]
        heavier = max([(self.weights[r], r) for r in roots])[1]
        for r in roots:
            if r != heavier:
                self.weights[heavier] += self.weights[r]
                self.parents[r] = heavier
```
Writing the function groupTPL is then simple:
```python
def groupTPL(TPL, distance=1):
    U = UnionFind()
    # Union every pair of points whose Chebyshev distance is within `distance`
    for (i, x) in enumerate(TPL):
        for j in range(i + 1, len(TPL)):
            y = TPL[j]
            if max(abs(x[0] - y[0]), abs(x[1] - y[1])) <= distance:
                U.union(x, y)
    # Collect the points by the root (token) of the set they belong to
    disjSets = {}
    for x in TPL:
        s = disjSets.get(U[x], set())
        s.add(x)
        disjSets[U[x]] = s
    return [list(x) for x in disjSets.values()]
```
Running it on your set produces:
```
>>> groupTPL([(1, 1), (2, 1), (3, 2), (7, 5), (2, 7), (6, 4), (2, 3), (2, 6), (3, 1)])
[
 [(2, 7), (2, 6)],
 [(6, 4), (7, 5)],
 [(3, 2), (3, 1), (2, 3), (1, 1), (2, 1)]
]
```
However, this implementation, while simple, is still O(n^2). If the number of points grows very large, an efficient implementation would use k-d trees.
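If pulling in a k-d tree library is not an option, one common way to avoid scanning all O(n^2) pairs is a different technique, grid bucketing (spatial hashing). A sketch, assuming 2-D points and `distance > 0`; `group_grid` is a hypothetical name, and it uses a breadth-first flood fill instead of union-find. Each point goes into a cell of side `distance`; two points within Chebyshev distance `distance` can only sit in the same or adjacent cells, so each point only checks the 3x3 block of cells around it.

```python
from collections import defaultdict, deque

def group_grid(points, distance=1):
    # Bucket each distinct point into a grid cell of side `distance`
    cells = defaultdict(list)
    for p in set(points):
        cells[(p[0] // distance, p[1] // distance)].append(p)

    def neighbours(p):
        cx, cy = p[0] // distance, p[1] // distance
        # Only the 3x3 block of cells around p can hold points within `distance`
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for q in cells.get((cx + dx, cy + dy), ()):
                    if q != p and max(abs(p[0] - q[0]), abs(p[1] - q[1])) <= distance:
                        yield q

    seen, groups = set(), []
    for start in points:
        if start in seen:
            continue
        # Breadth-first flood fill from `start` collects one connected group
        seen.add(start)
        group, queue = [], deque([start])
        while queue:
            p = queue.popleft()
            group.append(p)
            for q in neighbours(p):
                if q not in seen:
                    seen.add(q)
                    queue.append(q)
        groups.append(group)
    return groups
```

When points are spread out, each point only looks at a handful of candidates, so the work grows roughly linearly with the number of points; if many points fall into the same cells it degrades towards the pairwise scan. Like the union-find version, repeated points appear only once in the output, and the order of points inside a group is arbitrary.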
Answer 1 (score: 1)
My answer is very late, but it's short and it works!
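```python
from itertools import combinations

def groupTPL(inputlist):
    # For a pair of points, return (p1, p2, Manhattan distance, squared Euclidean distance).
    # Using the squared distance avoids comparing floats against sqrt(2).
    def ptdiff(pair):
        p1, p2 = pair
        dx, dy = p2[0] - p1[0], p2[1] - p1[1]
        return (p1, p2, abs(dx) + abs(dy), dx * dx + dy * dy)

    # Keep only neighbouring pairs (a distance of 1 is hard-coded):
    # orthogonal neighbours (Manhattan 1) or diagonal neighbours (squared distance 2)
    diffs = [x for x in map(ptdiff, combinations(inputlist, 2))
             if x[2] == 1 or x[3] == 2]

    nk1 = []
    for p1, p2, _, _ in diffs:
        # Merge every existing group touching either endpoint, so a pair that
        # bridges two groups joins them instead of leaving overlapping groups
        touching = [g for g in nk1 if p1 in g or p2 in g]
        merged = {p1, p2}.union(*touching)
        nk1 = [g for g in nk1 if not (p1 in g or p2 in g)] + [merged]
    # Note: points with no neighbour at all do not appear in any group
    return [list(x) for x in nk1]

print(groupTPL([(1, 1), (2, 1), (3, 2), (7, 5), (2, 7), (6, 4), (2, 3), (2, 6), (3, 1)]))
```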
This outputs:
```
[[(3, 2), (3, 1), (2, 3), (1, 1), (2, 1)], [(6, 4), (7, 5)], [(2, 7), (2, 6)]]
```
Answer 2 (score: 0)
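Just to fill in an alternative: by default it is not faster than the Union-Find code given by musically-ut, but it is simple to use with Cython, which gives a 3x speedup, and there are cases where it is faster even by default. This is not my own work; it's something I found here: https://github.com/MerlijnWajer/Simba/blob/master/Units/MMLCore/tpa.pas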
Cython code (for plain Python, remove the `cdef int ...` line and the `int` declarations in `int w, int h`):
```cython
def group_pts(pts, int w, int h):
    cdef int t1, t2, c, ec, tc, l
    l = len(pts)-1
    if (l < 0): return False
    result = [list() for i in range(l+1)]
    c = 0
    ec = 0
    while ((l - ec) >= 0):
        result[c].append(pts[0])
        pts[0] = pts[l - ec]
        ec += 1
        tc = 1
        t1 = 0
        while (t1 < tc):
            t2 = 0
            while (t2 <= (l - ec)):
                if (abs(result[c][t1][0] - pts[t2][0]) <= w) and \
                   (abs(result[c][t1][1] - pts[t2][1]) <= h):
                    result[c].append(pts[t2])
                    pts[t2] = pts[l - ec]
                    ec += 1
                    tc += 1
                    t2 -= 1
                t2 += 1
            t1 += 1
        c += 1
    return result[0:c]
```
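This could probably be optimized a bit, but I haven't spent the time on it. It also allows duplicate points, which the Union-Find structure is not very happy with.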
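On the duplicate issue: the union-find code in the first answer keys its sets by the points themselves, so repeated points collapse into a single entry in the output. One possible workaround (a sketch, not from any of the answers) is to run the union-find over list indices and map back to the points at the end. `group_indices` is a hypothetical name; it uses a plain list-based union-find with path halving, the same `w`/`h` neighbour test as `group_pts`, and still scans all O(n^2) pairs.

```python
def group_indices(pts, w=1, h=1):
    # One union-find node per list position, so duplicate points stay separate entries
    parent = list(range(len(pts)))

    def find(i):
        # Path halving: point each visited node at its grandparent
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(pts)):
        for j in range(i + 1, len(pts)):
            # Same neighbour test as group_pts: within w on x and within h on y
            if abs(pts[i][0] - pts[j][0]) <= w and abs(pts[i][1] - pts[j][1]) <= h:
                parent[find(i)] = find(j)

    # Collect the points by the root index of their set
    groups = {}
    for i, p in enumerate(pts):
        groups.setdefault(find(i), []).append(p)
    return list(groups.values())

print(group_indices([(1, 1), (1, 1), (5, 5)]))  # [[(1, 1), (1, 1)], [(5, 5)]]
```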
It would be interesting to handle this with SciPy's kd-tree; it would no doubt speed things up for larger datasets.
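A sketch of what that might look like, assuming NumPy and SciPy are installed: `cKDTree.query_pairs` with `p=np.inf` returns every index pair whose Chebyshev distance is at most `distance`, and `scipy.sparse.csgraph.connected_components` then labels the groups of the resulting neighbour graph. `group_kdtree` is a hypothetical name, and it assumes a non-empty list of 2-D points.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.csgraph import connected_components
from scipy.spatial import cKDTree

def group_kdtree(points, distance=1):
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    tree = cKDTree(pts)
    # All index pairs (i, j), i < j, within Chebyshev (p=inf) distance `distance`
    pairs = np.array(sorted(tree.query_pairs(distance, p=np.inf)), dtype=int).reshape(-1, 2)
    # Sparse adjacency matrix of the neighbour graph
    graph = coo_matrix((np.ones(len(pairs)), (pairs[:, 0], pairs[:, 1])), shape=(n, n))
    # Treat edges as undirected; labels[k] is the group number of point k
    n_groups, labels = connected_components(graph, directed=False)
    groups = [[] for _ in range(n_groups)]
    for point, label in zip(points, labels):
        groups[label].append(point)
    return groups
```

Because it works on indices, duplicate points are kept (they are at distance 0 from each other), and isolated points come back as single-element groups.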