Question

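Let x denote a vector of p values (i.e., a data point in p-dimensional space).

I have two sets: a set A = {xi, ..., xn} of n elements and a set B = {xj, ..., xm} of m elements, where |A| > 1 and |B| > 1. Given an integer k > 0, let dist(x, k, A) be a function that returns the mean Euclidean distance from x to its k nearest points in A, and dist(x, k, B) the mean Euclidean distance from x to its k nearest points in B.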
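To make the definition of dist concrete, here is a minimal Python sketch. It assumes points are stored as tuples of floats and that Python 3.8+ is available for `math.dist`; averaging over fewer than k neighbours when the set is small is my own assumption, not part of the definition above.

```python
import heapq
import math

def dist(x, k, S):
    """Mean Euclidean distance from x to its k nearest points in S, excluding x itself."""
    others = [math.dist(x, p) for p in S if p != x]
    if not others:
        raise ValueError("S contains no point other than x")
    # Assumption: if S has fewer than k other points, average over those available.
    nearest = heapq.nsmallest(k, others)
    return sum(nearest) / len(nearest)

A = {(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)}
print(dist((0.0, 0.0), 1, A))  # 5.0
print(dist((0.0, 0.0), 2, A))  # 7.5
```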

I have the following algorithm:

Repeat
{
   A' = { x in A, such that dist(x, k, A) > dist(x, k, B) }
   B' = { x in B, such that dist(x, k, A) < dist(x, k, B) }
   A = { x in A such that x not in A' } U B'
   B = { x in B such that x not in B' } U A'
}
Until CONDITION == True

Termination: CONDITION is True when no more elements move from A to B or from B to A (i.e., A' and B' become empty), or when |A| or |B| becomes less than or equal to 1.
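The loop and its CONDITION could be sketched in Python roughly as follows. The names `step`, `run` and `max_iters` are hypothetical, and the iteration cap is an assumption added only so the sketch cannot spin forever if the algorithm does not terminate. Points are assumed to be tuples, with A and B disjoint.

```python
import math

def dist(x, k, S):
    # Mean Euclidean distance from x to its k nearest points in S (x excluded).
    d = sorted(math.dist(x, p) for p in S if p != x)[:k]
    return sum(d) / len(d)

def step(A, B, k):
    """One pass of the loop body: returns the new A, the new B, and the moved sets."""
    A_moved = {x for x in A if dist(x, k, A) > dist(x, k, B)}
    B_moved = {x for x in B if dist(x, k, A) < dist(x, k, B)}
    return (A - A_moved) | B_moved, (B - B_moved) | A_moved, A_moved, B_moved

def run(A, B, k, max_iters=1000):
    """Repeat until CONDITION holds; max_iters is a hypothetical safety cap."""
    for i in range(1, max_iters + 1):
        A, B, A_moved, B_moved = step(A, B, k)
        if (not A_moved and not B_moved) or len(A) <= 1 or len(B) <= 1:
            return A, B, i
    raise RuntimeError("CONDITION not reached within max_iters iterations")

# Two well-separated clusters, with (10, 10) initially placed in the wrong set.
A0 = {(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)}
B0 = {(10.0, 11.0), (11.0, 10.0), (11.0, 11.0)}
A1, B1, n = run(A0, B0, k=2)
print(n)           # 2
print(sorted(A1))  # [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
```

On this input the misplaced point moves in the first pass and the second pass moves nothing, so CONDITION holds after two iterations. That says nothing about other inputs.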

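1) Is it possible to prove that this algorithm terminates?

2) If so, is it also possible to put an upper bound on the number of iterations required for termination?

Note: the k nearest points to x in a set S are the k points in S (other than x itself) with the smallest Euclidean distance to x.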

Answer 1

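It looks like this algorithm can loop forever, oscillating between two or more states. I determined this experimentally with the following Python program: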

def mean(seq):
    if len(seq) == 0:
        raise IndexError("didn't expect empty sequence for mean")
    return sum(seq) / float(len(seq))

def dist(a,b):
    return abs(a-b)

def mean_dist(x, k, a):
    neighbors = {p for p in a if p != x}
    neighbors = sorted(neighbors, key=lambda p: dist(p,x))
    return mean([dist(x, p) for p in neighbors[:k]])

def frob(a,b,k, verbose = False):
    def show(msg):
        if verbose:
            print(msg)
    seen_pairs = set()
    iterations = 0
    while True:
        iterations += 1
        show("Iteration #{}".format(iterations))
        a_star = {x for x in a if mean_dist(x, k, a) > mean_dist(x,k,b)}
        b_star = {x for x in b if mean_dist(x, k, a) < mean_dist(x,k,b)}
        a_temp = {x for x in a if x not in a_star} | b_star
        b_temp = {x for x in b if x not in b_star} | a_star
        show("\tA`: {}".format(list(a_star)))
        show("\tB`: {}".format(list(b_star)))
        show("\tA becomes {}".format(list(a_temp)))
        show("\tB becomes {}".format(list(b_temp)))
        if a_temp == a and b_temp == b:
            return a, b
        key = (tuple(sorted(a_temp)), tuple(sorted(b_temp)))
        if key in seen_pairs:
            raise Exception("Infinite loop for values {} and {}".format(list(a_temp),list(b_temp)))
        seen_pairs.add(key)
        a = a_temp
        b = b_temp

import random
#creates a set of random integers, with the given number of elements.
def randSet(size):
    a = set()
    while len(a) < size:
        a.add(random.randint(0, 10))
    return a

size = 2
k = 1
#p equals one because I don't feel like doing vector math today

while True:
    a = randSet(size)
    b = randSet(size)
    try:
        frob(a,b, k)
    except IndexError as e:
        continue
    except Exception as e:
        print("infinite loop detected for initial inputs {} and {}".format(list(a), list(b)))
        #run the algorithm again, but showing our work this time
        try:
            frob(a,b,k, True)
        except Exception:
            pass
        break

Result:

infinite loop detected for initial inputs [10, 4] and [1, 5]
Iteration #1
        A`: [10, 4]
        B`: [1, 5]
        A becomes [1, 5]
        B becomes [10, 4]
Iteration #2
        A`: [1, 5]
        B`: [10, 4]
        A becomes [10, 4]
        B becomes [1, 5]
Iteration #3
        A`: [10, 4]
        B`: [1, 5]
        A becomes [1, 5]
        B becomes [10, 4]

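In this case the loop never terminates, because A and B keep swapping completely. When trying larger set sizes, I found cases where only some of the elements swap: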

infinite loop detected for initial inputs [8, 1, 0] and [9, 4, 5]
Iteration #1
        A`: [8]
        B`: [9]
        A becomes [0, 1, 9]
        B becomes [8, 4, 5]
Iteration #2
        A`: [9]
        B`: [8]
        A becomes [0, 1, 8]
        B becomes [9, 4, 5]
Iteration #3
        A`: [8]
        B`: [9]
        A becomes [0, 1, 9]
        B becomes [8, 4, 5]

Here, elements 8 and 9 move back and forth while the other elements stay in place.
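Both oscillations above can be checked independently of the random search. The following is a minimal sketch for the one-dimensional case with k = 1; `step` and `cycle_period` are hypothetical helper names, and `max_iters` is an assumed cap. It records every (A, B) state it visits and reports the period once a state repeats.

```python
def mean_dist(x, k, s):
    # Mean distance from x to its k nearest points in s, excluding x (1-D case).
    d = sorted(abs(x - p) for p in s if p != x)[:k]
    return sum(d) / len(d)

def step(a, b, k):
    # One simultaneous reassignment of all points, as in the loop body.
    a_star = {x for x in a if mean_dist(x, k, a) > mean_dist(x, k, b)}
    b_star = {x for x in b if mean_dist(x, k, a) < mean_dist(x, k, b)}
    return frozenset((a - a_star) | b_star), frozenset((b - b_star) | a_star)

def cycle_period(a, b, k, max_iters=100):
    """Return the period of the first repeated state, or 0 if the algorithm stops."""
    state = (frozenset(a), frozenset(b))
    seen = {state: 0}  # state -> iteration at which it was first reached
    for i in range(1, max_iters + 1):
        nxt = step(state[0], state[1], k)
        # The algorithm stops on a fixed point or when a set shrinks to size <= 1.
        if nxt == state or min(len(nxt[0]), len(nxt[1])) <= 1:
            return 0
        if nxt in seen:
            return i - seen[nxt]
        seen[nxt] = i
        state = nxt
    raise RuntimeError("no repeat found within max_iters iterations")

print(cycle_period({10, 4}, {1, 5}, 1))        # 2
print(cycle_period({8, 1, 0}, {9, 4, 5}, 1))   # 2
```

Because the update is deterministic and there are only finitely many ways to split the points into two sets, any run that does not stop must eventually revisit a state, so a repeated state is enough to show non-termination for that input.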

How can I check whether this algorithm might fail to terminate?
