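Let x denote a vector of p values (i.e., a data point in p-dimensional space).
I have two sets: a set A of n elements, A = {xi, .., xn}, and a set B of m elements, B = {xj, .., xm}, where |A| > 1 and |B| > 1. Given an integer k > 0, let dist(x, k, A) be a function that returns the mean Euclidean distance from x to its k nearest points in A, and dist(x, k, B) the mean Euclidean distance from x to its k nearest points in B.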
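As an illustration of this definition in p dimensions, here is a minimal Python 3.8+ sketch. It assumes points are stored as tuples of floats, uses `math.dist` for the Euclidean distance, and excludes x itself from S as the note further down specifies. Raising `ValueError` when S has fewer than k other points is an assumption; the question does not say what should happen in that case.

```python
import math

def dist(x, k, S):
    """Mean Euclidean distance from point x to its k nearest points in S.

    Points are tuples of p floats; x itself is excluded from S.
    """
    others = [s for s in S if s != x]
    if len(others) < k:
        # Assumed behaviour: the question leaves this case undefined.
        raise ValueError("S needs at least k points other than x")
    # Sort all distances to the remaining points and average the k smallest.
    nearest = sorted(math.dist(x, s) for s in others)[:k]
    return sum(nearest) / k

A = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
print(dist((0.0, 0.0), 1, A))  # 5.0
print(dist((0.0, 0.0), 2, A))  # 7.5
```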
I have the following algorithm:
Repeat
{
    A' = { x in A, such that dist(x, k, A) > dist(x, k, B) }
    B' = { x in B, such that dist(x, k, A) < dist(x, k, B) }
    A = { x in A such that x not in A' } U B'
    B = { x in B such that x not in B' } U A'
}
Until CONDITION == True
Termination: CONDITION is True when no more elements move from A to B or from B to A (i.e., A' and B' become empty), or when |A| or |B| becomes less than or equal to 1.
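To experiment with the loop directly in p dimensions, here is a minimal Python 3.8+ sketch of the Repeat/Until pseudocode with the termination condition above. It assumes A and B are disjoint sets of tuples, each starting with more than one element. The `max_iter` cap is a hypothetical safety parameter that is not part of the original algorithm; it is there because termination is exactly what is in question.

```python
import math

def dist(x, k, S):
    # Mean Euclidean distance from x to its k nearest points in S (x excluded).
    nearest = sorted(math.dist(x, s) for s in S if s != x)[:k]
    return sum(nearest) / len(nearest)

def run(A, B, k, max_iter=1000):
    """Apply the Repeat/Until loop and return (A, B, iterations).

    max_iter is an added cap (hypothetical, not in the original algorithm).
    """
    A, B = set(A), set(B)
    for it in range(1, max_iter + 1):
        A_moved = {x for x in A if dist(x, k, A) > dist(x, k, B)}  # A'
        B_moved = {x for x in B if dist(x, k, A) < dist(x, k, B)}  # B'
        A = (A - A_moved) | B_moved
        B = (B - B_moved) | A_moved
        # CONDITION: nothing moved, or one set shrank to at most one point.
        if (not A_moved and not B_moved) or len(A) <= 1 or len(B) <= 1:
            return A, B, it
    raise RuntimeError("no termination within max_iter iterations")

# Two well-separated clusters: nothing moves, so the loop stops after one pass.
print(run({(0.0, 0.0), (0.0, 1.0)}, {(10.0, 10.0), (10.0, 11.0)}, 1))
```

Because the CONDITION is checked after the body, as in the pseudocode, the reported iteration count includes the final pass in which nothing moved.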
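1) Is it possible to prove that this algorithm terminates?
2) If so, is it also possible to give an upper bound on the number of iterations needed to terminate?
Note: the k nearest points of x in a set S are the k points in S (other than x) with the smallest Euclidean distance to x.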
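Answer 0 (score: 2)
It looks like this algorithm can loop forever, oscillating between two or more states. I determined this experimentally with the following Python program: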
def mean(seq):
    if len(seq) == 0:
        raise IndexError("didn't expect empty sequence for mean")
    return sum(seq) / float(len(seq))

def dist(a, b):
    return abs(a - b)

def mean_dist(x, k, a):
    # Mean distance from x to its k nearest points in a, excluding x itself.
    neighbors = {p for p in a if p != x}
    neighbors = sorted(neighbors, key=lambda p: dist(p, x))
    return mean([dist(x, p) for p in neighbors[:k]])
def frob(a, b, k, verbose=False):
    def show(msg):
        if verbose:
            print(msg)

    seen_pairs = set()
    iterations = 0
    while True:
        iterations += 1
        show("Iteration #{}".format(iterations))
        a_star = {x for x in a if mean_dist(x, k, a) > mean_dist(x, k, b)}
        b_star = {x for x in b if mean_dist(x, k, a) < mean_dist(x, k, b)}
        a_temp = {x for x in a if x not in a_star} | b_star
        b_temp = {x for x in b if x not in b_star} | a_star
        show("\tA`: {}".format(list(a_star)))
        show("\tB`: {}".format(list(b_star)))
        show("\tA becomes {}".format(list(a_temp)))
        show("\tB becomes {}".format(list(b_temp)))
        # Nothing moved: the algorithm has terminated.
        if a_temp == a and b_temp == b:
            return a, b
        # A state we have already visited means the loop will cycle forever.
        key = (tuple(sorted(a_temp)), tuple(sorted(b_temp)))
        if key in seen_pairs:
            raise Exception("Infinite loop for values {} and {}".format(list(a_temp), list(b_temp)))
        seen_pairs.add(key)
        a = a_temp
        b = b_temp
import random

# Creates a set of random integers in [0, 10], with the given number of elements.
def randSet(size):
    a = set()
    while len(a) < size:
        a.add(random.randint(0, 10))
    return a

size = 2
k = 1
# p equals one because I don't feel like doing vector math today
while True:
    a = randSet(size)
    b = randSet(size)
    try:
        frob(a, b, k)
    except IndexError:
        # mean_dist found no points other than x (e.g. a set shrank to one element)
        continue
    except Exception:
        print("infinite loop detected for initial inputs {} and {}".format(list(a), list(b)))
        # run the algorithm again, but showing our work this time
        try:
            frob(a, b, k, True)
        except Exception:
            pass
        break
Results:
infinite loop detected for initial inputs [10, 4] and [1, 5]
Iteration #1
A`: [10, 4]
B`: [1, 5]
A becomes [1, 5]
B becomes [10, 4]
Iteration #2
A`: [1, 5]
B`: [10, 4]
A becomes [10, 4]
B becomes [1, 5]
Iteration #3
A`: [10, 4]
B`: [1, 5]
A becomes [1, 5]
B becomes [10, 4]
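In this case the loop never terminates, because A and B keep swapping places completely. When trying larger set sizes, I found cases where only some of the elements switch: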
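The complete swap can be checked by hand. This short Python 3 sketch (with p = 1 and k = 1, as in the program above) prints, for each point of the first counterexample, its mean distance to A and to B. `movers` is a hypothetical helper name that returns the sets A' and B'.

```python
def mean_dist(x, k, S):
    # p = 1 as in the answer: mean |x - s| over the k nearest s in S, s != x.
    nearest = sorted(abs(x - s) for s in S if s != x)[:k]
    return sum(nearest) / len(nearest)

def movers(A, B, k):
    # The sets A' and B' from the question's loop body.
    A_star = {x for x in A if mean_dist(x, k, A) > mean_dist(x, k, B)}
    B_star = {x for x in B if mean_dist(x, k, A) < mean_dist(x, k, B)}
    return A_star, B_star

A, B, k = {10, 4}, {1, 5}, 1
for x in sorted(A | B):
    side = "A" if x in A else "B"
    print(side, x, mean_dist(x, k, A), mean_dist(x, k, B))
# B 1 3.0 4.0
# A 4 6.0 1.0
# B 5 1.0 4.0
# A 10 6.0 5.0
print(movers(A, B, k) == (A, B))  # True: every point moves
```

Each point of A is closer on average to B than to A, and each point of B is closer to A, so both sets move in full. The swapped configuration is the mirror image of the original, so the next iteration swaps them back.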
infinite loop detected for initial inputs [8, 1, 0] and [9, 4, 5]
Iteration #1
A`: [8]
B`: [9]
A becomes [0, 1, 9]
B becomes [8, 4, 5]
Iteration #2
A`: [9]
B`: [8]
A becomes [0, 1, 8]
B becomes [9, 4, 5]
Iteration #3
A`: [8]
B`: [9]
A becomes [0, 1, 9]
B becomes [8, 4, 5]
Here, elements 8 and 9 move back and forth while the other elements stay where they are.
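The two-state cycle for the second counterexample can be confirmed in a few lines of Python 3. This is a sketch with p = 1 and k = 1; `step` is a hypothetical helper that applies one pass of the loop body and returns the new (A, B).

```python
def mean_dist(x, k, S):
    # p = 1: mean |x - s| over the k nearest s in S, excluding x itself.
    nearest = sorted(abs(x - s) for s in S if s != x)[:k]
    return sum(nearest) / len(nearest)

def step(A, B, k):
    # One pass of the loop body: compute A' and B', then exchange them.
    A_star = {x for x in A if mean_dist(x, k, A) > mean_dist(x, k, B)}
    B_star = {x for x in B if mean_dist(x, k, A) < mean_dist(x, k, B)}
    return (A - A_star) | B_star, (B - B_star) | A_star

start = ({8, 1, 0}, {9, 4, 5})
once = step(*start, k=1)
twice = step(*once, k=1)
print(sorted(once[0]), sorted(once[1]))  # [0, 1, 9] [4, 5, 8]
print(twice == start)                    # True: the state repeats
```

For 8 in A, its nearest neighbour in A is 1 (distance 7) but in B it is 9 (distance 1), so it moves; symmetrically, 9 in B is at distance 1 from 8 in A but 4 from 5 in B. After the exchange the same comparison sends them back.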