pyflann returns negative nearest-neighbor indices on Ubuntu when passed the 'kmeans' argument

Posted: 2014-06-17 16:06:37

Tags: python ubuntu k-means nearest-neighbor flann

Goal

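I would like to get:

  1. the approximate nearest neighbors library FLANN and
  2. its Python bindings, pyflann,

to run properly on an AWS EC2 instance running Ubuntu. My goal is to compare FLANN with other approximate nearest neighbor (ANN) implementations, such as ANNOY and the ANN implementations in scikit-learn, to see which is best suited to the company I work for. We are working with millions of vectors of dimension ~500.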

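For this reason, it is important to me to get FLANN itself working, rather than to receive suggestions for alternative ANN implementations. I am aware of Radim Rehurek's nice blog post, but we have a specific dataset on which we want to check the performance of various ANN algorithms, so his post does not remove the need for us to benchmark on our own data.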

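Problem

I have successfully installed versions of flann and pyflann, but pyflann returns nonsensical results when asked to build an ANN index with the 'kmeans' argument. For example, consider the following Python code and its output: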

    >>> from pyflann import *
    >>> from numpy import *
    >>> from numpy.random import *
    >>> dataset = rand(1000, 100)
    >>> testset = rand(10, 100)
    >>> flann = FLANN()
    >>> result,dists = flann.nn(dataset,testset, 5, algorithm="kmeans")
    >>> print result
    [[ -278697864       32687  -278697864       32687  1677721700]
     [   40632322           6    16778074  1677721700           9]
     [     285184  1509950821          12       25600  1811940196]
     [         15   426661632   140837888          18    16801138]
     [   16779610          21    23986182   107304960          24]
     [-2080373660   190447616          27  1694501978   224002059]
     [         30  1694502490   257556491          33 -2080373404]
     [  207224832          36  1509949572          49           0]
     [   43668848           0  -278698024       32687     8650760]
     [    1006080  1392509796  1397948499         208           0]]
    >>>
    

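On its own, the line:

    result,dists = flann.nn(dataset,testset, 5, algorithm="kmeans")
    

asks for five neighbors for each of the ten 100-dimensional vectors in "testset", and the output array has the right shape: ten rows, one per vector in "testset", each of length five, reflecting the fact that I asked for five neighbors. However, the entries cannot be correct, because some are negative and many lie outside the range 0-999, which is the range of possible nearest-neighbor indices. For comparison, here is the output from my terminal using nearly the same code as above, changing only "kmeans" to "kdtree":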

    >>> from pyflann import *
    >>> from numpy import *
    >>> from numpy.random import *
    >>> dataset = rand(1000, 100)
    >>> testset = rand(10, 100)
    >>> flann = FLANN()
    >>> result,dists = flann.nn(dataset,testset, 5, algorithm="kdtree")
    >>> print result
    [[189 363 397 723 685]
     [400 952 892 332 477]
     [560 959 295 591 394]
     [596 652 250  43 448]
     [498 706 543 761 323]
     [334 974 591 620 766]
     [435 386  58 962 421]
     [234 301 189 355 191]
     [857 133 420 544 612]
     [978 995 439 648 627]]
    >>>
    

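This time, all entries are non-negative integers between 0 and 999, as expected. Of course, the data is randomly generated, so the results will vary, but the "kmeans" argument consistently produces nonsensical results, whereas "kdtree" consistently produces sensible ones.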
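The validity criterion described above (integer indices in the range 0 to 999) can be checked mechanically. The following is only a minimal sketch that uses NumPy and does not call pyflann at all; the helper names `check_nn_indices` and `exact_neighbors` are hypothetical and introduced here for illustration. The brute-force search gives an exact reference against which the output of any ANN library could be compared.

```python
import numpy as np


def check_nn_indices(result, n_points):
    """Return True if every entry of `result` is a valid row index into the dataset.

    Hypothetical helper, not part of pyflann.
    """
    result = np.asarray(result)
    return bool(np.issubdtype(result.dtype, np.integer)
                and result.size > 0
                and result.min() >= 0
                and result.max() < n_points)


def exact_neighbors(dataset, testset, k):
    """Brute-force k nearest neighbors by squared Euclidean distance.

    Hypothetical helper; only practical for small inputs like the ones here.
    """
    # Pairwise differences, shape (n_queries, n_points, n_dims).
    diffs = testset[:, None, :] - dataset[None, :, :]
    # Squared distances, shape (n_queries, n_points).
    sq_dists = np.einsum("qnd,qnd->qn", diffs, diffs)
    # Sort each row and keep the indices of the k closest dataset points.
    return np.argsort(sq_dists, axis=1)[:, :k]


rng = np.random.RandomState(0)
dataset = rng.rand(1000, 100)
testset = rng.rand(10, 100)

reference = exact_neighbors(dataset, testset, 5)
print(check_nn_indices(reference, len(dataset)))  # True

# First row of the "kmeans" output quoted in this question.
bad_row = np.array([[-278697864, 32687, -278697864, 32687, 1677721700]])
print(check_nn_indices(bad_row, len(dataset)))  # False
```

Passing the `result` array returned by `flann.nn(...)` to `check_nn_indices(result, len(dataset))` would flag the "kmeans" output above as invalid, while the "kdtree" output would pass.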

Software and OS details

(0) Ubuntu release:

    Ubuntu 14.04 LTS

(1) libflann-dev:

Typing:

    sudo aptitude show libflann-dev

produces:

    Package: libflann-dev
    State: installed
    Automatically installed: no
    Version: 1.8.4-3
    Priority: optional
    Section: universe/libdevel
    Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
    Architecture: amd64
    Uncompressed Size: 11.2 M
    Depends: libflann1.8 (= 1.8.4-3)
    Description: Fast Library for Approximate Nearest Neighbors - development
     FLANN is a library for performing fast approximate nearest neighbor searches in high dimensional spaces. It contains a collection of algorithms found to work best for
     nearest neighbor search and a system for automatically choosing the best algorithm and optimum parameters depending on the dataset.
    
     This package contains development files needed to build FLANN applications.
    Homepage: http://www.cs.ubc.ca/~mariusm/index.php/FLANN/FLANN
    

(2) Typing:

    sudo aptitude show python

produces:

    Package: python
    State: installed
    Automatically installed: no
    Multi-Arch: allowed
    Version: 2.7.5-5ubuntu3
    Priority: optional
    Section: python
    Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
    Architecture: amd64
    Uncompressed Size: 687 k
    Depends: python2.7 (>= 2.7.5-1~), python-minimal (= 2.7.5-5ubuntu3), libpython-stdlib (= 2.7.5-5ubuntu3)
    Suggests: python-doc (= 2.7.5-5ubuntu3), python-tk (>= 2.7.5-1~)
    Conflicts: python-central (< 0.5.5)
    Breaks: python-bz2 (< 1.1-8), python-csv (< 1.0-4), python-email (< 2.5.5-3), update-manager-core (< 0.200.5-2)
    Replaces: python-dev (< 2.6.5-2)
    Provides: python-ctypes, python-email, python-importlib, python-profiler, python-wsgiref, python:any
    Description: interactive high-level object-oriented language (default version)
     Python, the high-level, interactive object oriented language, includes an extensive class library with lots of goodies for network programming, system administration,
     sounds and graphics.
    
     This package is a dependency package, which depends on Debian's default Python version (currently v2.7).
    Homepage: http://www.python.org/
    

Installation method

I first tried installing FLANN with:

    sudo apt-get install libflann1.8
    

After then installing pyflann with:

    sudo pip install -e git+git://github.com/Captricity/pyflann.git#egg=pyflann
    

I typed:

    python -c 'import pyflann'
    

and received the error message:

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/mnt/working/src/pyflann/pyflann/__init__.py", line 27, in <module>
        from index import *
      File "/mnt/working/src/pyflann/pyflann/index.py", line 27, in <module>
        from bindings.flann_ctypes import *
      File "/mnt/working/src/pyflann/pyflann/bindings/__init__.py", line 30, in <module>
        from flann_ctypes import *
      File "/mnt/working/src/pyflann/pyflann/bindings/flann_ctypes.py", line 169, in <module>
        raise ImportError('Cannot load dynamic library. Did you compile FLANN?')
    ImportError: Cannot load dynamic library. Did you compile FLANN?
    

Then, on a fresh EC2 instance, I typed:

    sudo apt-get install libflann-dev
    sudo pip install -e git+git://github.com/Captricity/pyflann.git#egg=pyflann
    

and then ran

    python -c 'import pyflann'
    
with no complaints. However, I then ran into the "kmeans" problem described above.
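Since the two installation attempts differ only in which FLANN package was installed, it may help to check whether the system can locate and load the FLANN shared library at all. The sketch below uses only the standard `ctypes` module. It does not reproduce pyflann's own lookup logic, the helper name `find_shared_library` is hypothetical, and the library names `flann` and `flann_cpp` are assumptions about what the Ubuntu packages install.

```python
import ctypes
import ctypes.util


def find_shared_library(name):
    """Return (found_name, loadable) for a shared library, or (None, False).

    Hypothetical helper: asks the system linker machinery for `name`,
    then tries to load whatever it found.
    """
    found = ctypes.util.find_library(name)
    if found is None:
        return None, False
    try:
        ctypes.CDLL(found)
    except OSError:
        return found, False
    return found, True


# Assumed library names; adjust to whatever the installed packages provide.
for candidate in ("flann", "flann_cpp"):
    print(candidate, find_shared_library(candidate))
```

Running this on both instances (the one with only `libflann1.8` and the one with `libflann-dev`) would show whether the difference in the import behaviour corresponds to a difference in what the system can find.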

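Note

I have successfully installed FLANN and pyflann on my MacBook Pro, where everything works fine: even nearest-neighbor queries with the "kmeans" argument produce sensible results.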
