Partitioning neighbor points given a Euclidean distance range

Time: 2010-12-05 22:47:49

Tags: algorithm cluster-analysis partitioning spatial euclidean-distance

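Given two points P, Q and a delta, I define the equivalence relation ~=, where P ~= Q if EuclideanDistance(P,Q) <= delta. Now, given a set S of n points, in the example S = (A, B, C, D, E, F) and n = 6 (the fact that the points are actually endpoints of segments is negligible), is there an algorithm with better than O(n^2) complexity in the average case that finds a partition of the set (the representative element of each subset is unimportant)?

So far, my attempts to find a theoretical definition of this problem have been unsuccessful: k-means clustering, nearest neighbor search and others seem to me to be different problems. The picture shows what I need to do in my application.

Any hints? Thanks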


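EDIT: While the actual problem (clustering nearby points given some kind of invariant) should be solvable in better than O(n^2) in the average case, there is a serious flaw in my problem definition: ~= is not an equivalence relation, for the simple reason that it does not respect the transitive property. I think this is the main reason the problem is not easy to solve and needs advanced techniques. I will post my actual solution soon: it should work when nearby points all satisfy the ~= definition. It can fail when points that are far apart do not satisfy the relation with each other but do satisfy it with the center of gravity of the clustered points. It works well with my input data space; it may not with yours. Does anyone know a complete formal statement of this problem (with a solution)?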
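To make the transitivity failure concrete, here is a minimal C# sketch of the relation. The class and method names (`DeltaRelation`, `Related`) are made up for illustration and are not part of the original post.

```csharp
using System;

public static class DeltaRelation
{
    // True when the Euclidean distance between (x1, y1) and (x2, y2)
    // is at most delta. Squared values are compared to avoid a square root.
    public static bool Related(double x1, double y1,
                               double x2, double y2, double delta)
    {
        double dx = x1 - x2;
        double dy = y1 - y2;
        return dx * dx + dy * dy <= delta * delta;
    }
}
```

With delta = 1 and three collinear points A = (0, 0), B = (0.75, 0), C = (1.5, 0), `Related` holds for A and B and for B and C, but not for A and C. So the "classes" of ~= are not well defined without an extra rule, such as choosing representatives.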

5 Answers:

Answer 0: (score: 1)

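One way to restate the problem is as follows: given a set of n 2D points, for each point p, find the set of points contained in the circle of radius delta centered at p.

A naive linear search for each point gives the O(n^2) algorithm you allude to.

It seems to me that this is the best one can do in the worst case. When all points in the set are contained within a circle of diameter <= delta, each of the n queries has to return O(n) points, giving O(n^2) overall complexity.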
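For reference, the naive baseline can be sketched as below. This is only an illustration: the `NaiveNeighbors` name and the parallel-array representation of the points are assumptions, not something taken from the thread.

```csharp
using System;
using System.Collections.Generic;

public static class NaiveNeighbors
{
    // For each point index i, returns the indices j != i whose point
    // lies within distance delta of point i. Always O(n^2) comparisons.
    public static List<int>[] FindAll(double[] xs, double[] ys, double delta)
    {
        int n = xs.Length;
        List<int>[] result = new List<int>[n];
        for (int i = 0; i < n; i++)
            result[i] = new List<int>();

        double deltaSquared = delta * delta;
        for (int i = 0; i < n; i++)
        {
            for (int j = i + 1; j < n; j++)
            {
                double dx = xs[i] - xs[j];
                double dy = ys[i] - ys[j];
                if (dx * dx + dy * dy <= deltaSquared)
                {
                    // The relation is symmetric, so record it both ways.
                    result[i].Add(j);
                    result[j].Add(i);
                }
            }
        }
        return result;
    }
}
```

When every point is within delta of every other, the output alone has n(n-1) entries, which is the worst case described above.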

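However, one should be able to do better on more reasonable datasets. Have a look at this (especially the section on space partitioning) and at KD-trees. The latter should give you a sub-O(n^2) algorithm in reasonable cases.

There may be a different way of looking at the problem that would give better complexity; I can't think of one off the top of my head.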
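Space partitioning can take many forms. As one simple illustration (a uniform grid, which is neither a KD-tree nor necessarily what the linked page describes), points can be bucketed into square cells of side delta, so that each point only needs to be compared against the 3x3 block of cells around it. The names below are hypothetical, and the sketch assumes delta > 0.

```csharp
using System;
using System.Collections.Generic;

public static class GridNeighbors
{
    // Returns all index pairs (i, j), i < j, whose points are within delta.
    public static List<int[]> FindPairs(double[] xs, double[] ys, double delta)
    {
        int n = xs.Length;

        // Bucket point indices by the grid cell (side = delta) they fall in.
        var cells = new Dictionary<(long, long), List<int>>();
        for (int i = 0; i < n; i++)
        {
            var key = CellOf(xs[i], ys[i], delta);
            List<int> bucket;
            if (!cells.TryGetValue(key, out bucket))
            {
                bucket = new List<int>();
                cells[key] = bucket;
            }
            bucket.Add(i);
        }

        var pairs = new List<int[]>();
        double deltaSquared = delta * delta;
        for (int i = 0; i < n; i++)
        {
            var (cx, cy) = CellOf(xs[i], ys[i], delta);
            // Points within delta can only be in the same or an adjacent cell.
            for (long gx = cx - 1; gx <= cx + 1; gx++)
            {
                for (long gy = cy - 1; gy <= cy + 1; gy++)
                {
                    List<int> bucket;
                    if (!cells.TryGetValue((gx, gy), out bucket))
                        continue;
                    foreach (int j in bucket)
                    {
                        if (j <= i)
                            continue; // report each pair once
                        double dx = xs[i] - xs[j];
                        double dy = ys[i] - ys[j];
                        if (dx * dx + dy * dy <= deltaSquared)
                            pairs.Add(new int[] { i, j });
                    }
                }
            }
        }
        return pairs;
    }

    private static (long, long) CellOf(double x, double y, double delta)
    {
        return ((long)Math.Floor(x / delta), (long)Math.Floor(y / delta));
    }
}
```

On evenly spread data each cell holds few points, so the total work stays close to linear. On the degenerate input described earlier (all points inside one small circle) it falls back to O(n^2), as any approach must.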

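Answer 1: (score: 0)

This is definitely a problem for a quadtree.

You could also try sorting on each coordinate and working with the two sorted lists: sorting is n*log(n), and you then only need to check the points that satisfy dx <= delta && dy <= delta. You could also put them in a sorted list with two levels of pointers: one for traversing along OX and another for OY.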
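A minimal sketch of the sorting idea, simplified to a single sort on x with the dy test applied as a filter (an assumption on my part; the answer suggests using both sorted lists). `SweepNeighbors` is a made-up name.

```csharp
using System;
using System.Collections.Generic;

public static class SweepNeighbors
{
    // Returns all index pairs (i, j), i < j, whose points are within delta.
    public static List<int[]> FindPairs(double[] xs, double[] ys, double delta)
    {
        int n = xs.Length;
        int[] order = new int[n];
        for (int i = 0; i < n; i++)
            order[i] = i;

        // Sort the point indices by x coordinate: O(n log n).
        Array.Sort(order, (a, b) => xs[a].CompareTo(xs[b]));

        List<int[]> pairs = new List<int[]>();
        double deltaSquared = delta * delta;
        for (int a = 0; a < n; a++)
        {
            int i = order[a];
            for (int b = a + 1; b < n; b++)
            {
                int j = order[b];
                double dx = xs[j] - xs[i];
                if (dx > delta)
                    break; // sorted by x: every later point is farther still
                double dy = ys[j] - ys[i];
                if (Math.Abs(dy) > delta)
                    continue; // cheap rejection before the full distance test
                if (dx * dx + dy * dy <= deltaSquared)
                    pairs.Add(new int[] { Math.Min(i, j), Math.Max(i, j) });
            }
        }
        return pairs;
    }
}
```

The inner loop only visits points inside a vertical strip of width delta, so the saving depends on how the data is spread along x.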

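Answer 2: (score: 0)

For each point, compute its distance D(n) from the origin; this is an O(n) operation.

Use an O(n^2) algorithm to find the matches where D(a-b) < delta, skipping the pairs where |D(a) - D(b)| > delta (by the triangle inequality, such pairs cannot be within delta of each other).

Because of the (hopefully large) number of pairs skipped, the result should on average be better than O(n^2).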
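One possible way to realize this pruning is sketched below. Sorting by D first is my own addition, not part of the answer: it turns the "skip" into an early `break`. The `OriginPruning` name is hypothetical, and the sketch keeps the question's <= delta convention.

```csharp
using System;
using System.Collections.Generic;

public static class OriginPruning
{
    // Returns all index pairs (i, j), i < j, whose points are within delta.
    public static List<int[]> FindPairs(double[] xs, double[] ys, double delta)
    {
        int n = xs.Length;
        double[] d = new double[n];   // distance of each point from the origin
        int[] order = new int[n];
        for (int i = 0; i < n; i++)
        {
            d[i] = Math.Sqrt(xs[i] * xs[i] + ys[i] * ys[i]);
            order[i] = i;
        }

        // Visit points in increasing distance from the origin.
        Array.Sort(order, (a, b) => d[a].CompareTo(d[b]));

        List<int[]> pairs = new List<int[]>();
        double deltaSquared = delta * delta;
        for (int a = 0; a < n; a++)
        {
            int i = order[a];
            for (int b = a + 1; b < n; b++)
            {
                int j = order[b];
                // |D(j) - D(i)| <= distance(i, j), so once the gap exceeds
                // delta no later point can match either.
                if (d[j] - d[i] > delta)
                    break;
                double dx = xs[i] - xs[j];
                double dy = ys[i] - ys[j];
                if (dx * dx + dy * dy <= deltaSquared)
                    pairs.Add(new int[] { Math.Min(i, j), Math.Max(i, j) });
            }
        }
        return pairs;
    }
}
```

The pruning is weak when many points sit at a similar distance from the origin (for example, on a ring), since they all survive the filter.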

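Answer 3: (score: 0)

Here is a C# KdTree implementation that should solve "find all neighbors of a point P within a delta". It makes heavy use of functional programming techniques (yes, I love Python). It is tested, but I still have doubts about my understanding of _TreeFindNearest(). The code (or pseudocode) that solves the problem "find a partition of a set of n points given a ~= relation in better than O(n^2) in the average case" is posted in another answer.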

/*
Stripped C# 2.0 port of ``kdtree'', a library for working with kd-trees.
Copyright (C) 2007-2009 John Tsiombikas <nuclear@siggraph.org>
Copyright (C) 2010 Francesco Pretto <ceztko@gmail.com>

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.
3. The name of the author may not be used to endorse or promote products
   derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO
EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT
OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING
IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY
OF SUCH DAMAGE.
*/

using System;
using System.Collections.Generic;
using System.Text;

namespace ITR.Data.NET
{
    public class KdTree<T>
    {
        #region Fields

        private Node _Root;
        private int _Count;
        private int _Dimension;
        private CoordinateGetter<T>[] _GetCoordinate;

        #endregion // Fields

        #region Constructors

        public KdTree(params CoordinateGetter<T>[] coordinateGetters)
        {
            _Dimension = coordinateGetters.Length;
            _GetCoordinate = coordinateGetters;
        }

        #endregion // Constructors

        #region Public methods

        public void Insert(T location)
        {
            _TreeInsert(ref _Root, 0, location);
            _Count++;
        }

        public void InsertAll(IEnumerable<T> locations)
        {
            foreach (T location in locations)
                Insert(location);
        }

        public IEnumerable<T> FindNeighborsRange(T location, double range)
        {
            return _TreeFindNeighborsRange(_Root, 0, location, range);
        }

        #endregion // Public methods

        #region Tree traversal

        private void _TreeInsert(ref Node current, int currentPlane, T location)
        {
            if (current == null)
            {
                current = new Node(location);
                return;
            }

            int nextPlane = (currentPlane + 1) % _Dimension;

            if (_GetCoordinate[currentPlane](location) <
                    _GetCoordinate[currentPlane](current.Location))
                _TreeInsert(ref current._Left, nextPlane, location);
            else
                _TreeInsert(ref current._Right, nextPlane, location);
        }

        private IEnumerable<T> _TreeFindNeighborsRange(Node current, int currentPlane,
            T referenceLocation, double range)
        {
            if (current == null)
                yield break;

            double squaredDistance = 0;
            for (int it = 0; it < _Dimension; it++)
            {
                double referenceCoordinate = _GetCoordinate[it](referenceLocation);
                double currentCoordinate = _GetCoordinate[it](current.Location);
                squaredDistance +=
                    (referenceCoordinate - currentCoordinate)
                    * (referenceCoordinate - currentCoordinate);
            }

            if (squaredDistance <= range * range)
                yield return current.Location;

            double coordinateRelativeDistance =
                _GetCoordinate[currentPlane](referenceLocation)
                    - _GetCoordinate[currentPlane](current.Location);
            Direction nextDirection = coordinateRelativeDistance <= 0.0
                ? Direction.LEFT : Direction.RIGHT;
            int nextPlane = (currentPlane + 1) % _Dimension;
            IEnumerable<T> subTreeNeighbors =
                _TreeFindNeighborsRange(current[nextDirection], nextPlane,
                    referenceLocation, range);
            foreach (T location in subTreeNeighbors)
                yield return location;

            if (Math.Abs(coordinateRelativeDistance) <= range)
            {
                subTreeNeighbors =
                    _TreeFindNeighborsRange(current.GetOtherChild(nextDirection),
                        nextPlane, referenceLocation, range);
                foreach (T location in subTreeNeighbors)
                    yield return location;
            }
        }

        #endregion // Tree traversal

        #region Node class

        public class Node
        {
            #region Fields

            private T _Location;
            internal Node _Left;
            internal Node _Right;

            #endregion // Fields

            #region Constructors

            internal Node(T nodeValue)
            {
                _Location = nodeValue;
                _Left = null;
                _Right = null;
            }

            #endregion // Constructors

            #region Children Indexers

            public Node this[Direction direction]
            {
                get { return direction == Direction.LEFT ? _Left : _Right; }
            }

            public Node GetOtherChild(Direction direction)
            {
                return direction == Direction.LEFT ? _Right : _Left;
            }

            #endregion // Children Indexers

            #region Properties

            public T Location
            {
                get { return _Location; }
            }

            public Node Left
            {
                get { return _Left; }
            }

            public Node Right
            {
                get { return _Right; }
            }

            #endregion // Properties
        }

        #endregion // Node class

        #region Properties

        public int Count
        {
            get { return _Count; }
            set { _Count = value; }
        }

        public Node Root
        {
            get { return _Root; }
            set { _Root = value; }
        }

        #endregion // Properties
    }

    #region Enums, delegates

    public enum Direction
    {
        LEFT = 0,
        RIGHT
    }

    public delegate double CoordinateGetter<T>(T location);

    #endregion // Enums, delegates
}

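Answer 4: (score: 0)

The following C# method, together with the KdTree class and the Join() (enumerates all the collections passed as arguments) and Shuffled() (returns a shuffled version of the passed collection) methods, solves my problem. There may be some flawed cases (read the EDIT in the question) when referenceVectors and vectorsToRelocate are the very same vectors, as they are in my problem. It should work perfectly if you really have some reference vectors. I really like this approach. :D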

public static Dictionary<Vector2D, Vector2D> FindRelocationMap(
    IEnumerable<Vector2D> referenceVectors,
    IEnumerable<Vector2D> vectorsToRelocate)
{
    Dictionary<Vector2D, Vector2D> ret = new Dictionary<Vector2D, Vector2D>();

    // Preliminary filling
    IEnumerable<Vector2D> allVectors =
        Utils.Join(referenceVectors, vectorsToRelocate);
    foreach (Vector2D vector in allVectors)
        ret[vector] = vector;

    KdTree<Vector2D> kdTree = new KdTree<Vector2D>(
        delegate(Vector2D vector) { return vector.X; },
        delegate(Vector2D vector) { return vector.Y; });
    kdTree.InsertAll(Utils.Shuffled(ret.Keys));

    HashSet<Vector2D> relocatedVectors = new HashSet<Vector2D>();
    foreach (Vector2D vector in referenceVectors)
    {
        if (relocatedVectors.Contains(vector))
            continue;

        relocatedVectors.Add(vector);

        IEnumerable<Vector2D> neighbors =
            kdTree.FindNeighborsRange(vector, Tolerances.EUCLID_DIST_TOLERANCE);

        foreach (Vector2D neighbor in neighbors)
        {
            ret[neighbor] = vector;
            relocatedVectors.Add(neighbor);
        }
    }

    return ret;
}
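The Join() and Shuffled() helpers are described above but not shown. A minimal sketch of what they might look like follows. The signatures are guesses based on how FindRelocationMap calls them, not the author's actual code.

```csharp
using System;
using System.Collections.Generic;

public static class Utils
{
    // Enumerates every element of every collection, in argument order.
    public static IEnumerable<T> Join<T>(params IEnumerable<T>[] collections)
    {
        foreach (IEnumerable<T> collection in collections)
            foreach (T item in collection)
                yield return item;
    }

    // Returns a new list holding the same elements in random order
    // (Fisher-Yates shuffle). The source collection is not modified.
    public static List<T> Shuffled<T>(IEnumerable<T> source)
    {
        List<T> items = new List<T>(source);
        Random random = new Random();
        for (int i = items.Count - 1; i > 0; i--)
        {
            int j = random.Next(i + 1); // 0 <= j <= i
            T tmp = items[i];
            items[i] = items[j];
            items[j] = tmp;
        }
        return items;
    }
}
```

Shuffling before InsertAll() presumably matters because the KdTree above never rebalances: inserting points in sorted order would degrade it into a long chain, while a random insertion order tends to keep the depth close to logarithmic.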