Goal
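I want to:
get FLANN running properly on an AWS EC2 instance running Ubuntu. My goal is to compare FLANN with other ANN (approximate nearest neighbor) implementations, such as ANNOY and the scikit-learn ANN implementation, to find out which one is best suited to the company I work for. We are working with millions of vectors of dimension ~500.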
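For this reason, it is important to me to get FLANN itself working, rather than to accept suggestions for alternative ANN implementations. I am aware of Radim Rehurek's good blog post, but we have a specific dataset on which we want to check the performance of various ANN algorithms, so his blog post does not remove the need to benchmark on our own data.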
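For benchmarking on your own data, a common yardstick is recall: the fraction of the exact nearest neighbors that an ANN method actually returns, with the exact neighbors computed by brute force on a small sample of queries. The sketch below is one way to do that, assuming NumPy and squared Euclidean distance; exact_knn and recall_at_k are hypothetical helper names, not part of FLANN, ANNOY or scikit-learn. It loops over queries so memory stays proportional to the dataset size rather than to queries × dataset.

```python
import numpy as np


def exact_knn(dataset, queries, k):
    """Brute-force k nearest neighbors under squared Euclidean distance.

    Returns an integer array of shape (len(queries), k) holding row indices
    into `dataset`, nearest first. Assumes 1 <= k <= len(dataset).
    """
    dataset = np.asarray(dataset, dtype=np.float64)
    sq_norms = (dataset ** 2).sum(axis=1)
    out = np.empty((len(queries), k), dtype=np.int64)
    for i, q in enumerate(np.asarray(queries, dtype=np.float64)):
        # ||q - x||^2 = ||q||^2 - 2 q.x + ||x||^2; the ||q||^2 term is the same
        # for every x, so dropping it does not change the ranking.
        d2 = sq_norms - 2.0 * np.dot(dataset, q)
        nearest = np.argpartition(d2, k - 1)[:k]   # the k smallest, unordered
        out[i] = nearest[np.argsort(d2[nearest])]  # sort them nearest first
    return out


def recall_at_k(approx, exact):
    """Fraction of the exact neighbors recovered by the approximate result."""
    approx_rows = np.asarray(approx).tolist()
    exact_rows = np.asarray(exact).tolist()
    hits = sum(len(set(a) & set(e)) for a, e in zip(approx_rows, exact_rows))
    return float(hits) / np.asarray(exact).size
```

With pyflann, approx would be the index array returned by flann.nn(dataset, testset, 5, algorithm=...), and exact would be exact_knn(dataset, testset, 5) on the same arrays; the same two helpers work unchanged for ANNOY or scikit-learn results once they are converted to index arrays.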
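Problem
I have successfully installed versions of flann and pyflann, but when asked to build an ANN index with the "kmeans" parameter, pyflann returns nonsensical results. For example, consider the following Python code and its output: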
>>> from pyflann import *
>>> from numpy import *
>>> from numpy.random import *
>>> dataset = rand(1000, 100)
>>> testset = rand(10, 100)
>>> flann = FLANN()
>>> result,dists = flann.nn(dataset,testset, 5, algorithm="kmeans")
>>> print result
[[ -278697864 32687 -278697864 32687 1677721700]
[ 40632322 6 16778074 1677721700 9]
[ 285184 1509950821 12 25600 1811940196]
[ 15 426661632 140837888 18 16801138]
[ 16779610 21 23986182 107304960 24]
[-2080373660 190447616 27 1694501978 224002059]
[ 30 1694502490 257556491 33 -2080373404]
[ 207224832 36 1509949572 49 0]
[ 43668848 0 -278698024 32687 8650760]
[ 1006080 1392509796 1397948499 208 0]]
>>>
The line:
result,dists = flann.nn(dataset,testset, 5, algorithm="kmeans")
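asks for five neighbors for each of the ten 100-dimensional vectors in "testset". The output array has the correct dimensions: ten rows, corresponding to the ten vectors in "testset", each of length five, reflecting the fact that I asked for five neighbors. However, the values of the entries cannot be correct, because some are negative and many lie outside the range 0 to 999, which is the range of indices of the possible nearest neighbors. For comparison, here is the output from my terminal for nearly identical code, with only "kmeans" changed to "kdtree":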
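The plausibility argument above can be turned into a quick automated check. A minimal sketch, assuming NumPy: check_knn_result is a hypothetical helper that counts entries outside the valid index range 0 to n_points - 1, and rows that list the same neighbor more than once (the first "kmeans" row above repeats both -278697864 and 32687).

```python
import numpy as np


def check_knn_result(result, n_points):
    """Count obviously impossible entries in a k-NN index array.

    Every entry should be a row index into the dataset (0 <= idx < n_points),
    and a single query's row should not name the same neighbor twice.
    """
    result = np.atleast_2d(np.asarray(result))
    out_of_range = (result < 0) | (result >= n_points)
    dup_rows = sum(len(set(row)) < len(row) for row in result.tolist())
    return {"out_of_range": int(out_of_range.sum()),
            "rows_with_duplicates": int(dup_rows)}


# First rows of the "kmeans" and "kdtree" outputs shown in the question.
kmeans_row = [-278697864, 32687, -278697864, 32687, 1677721700]
kdtree_row = [189, 363, 397, 723, 685]
print(check_knn_result(kmeans_row, 1000)["out_of_range"])  # 5
print(check_knn_result(kdtree_row, 1000)["out_of_range"])  # 0
```

Passing the full result array from flann.nn checks every query at once.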
>>> from pyflann import *
>>> from numpy import *
>>> from numpy.random import *
>>> dataset = rand(1000, 100)
>>> testset = rand(10, 100)
>>> flann = FLANN()
>>> result,dists = flann.nn(dataset,testset, 5, algorithm="kdtree")
>>> print result
[[189 363 397 723 685]
[400 952 892 332 477]
[560 959 295 591 394]
[596 652 250 43 448]
[498 706 543 761 323]
[334 974 591 620 766]
[435 386 58 962 421]
[234 301 189 355 191]
[857 133 420 544 612]
[978 995 439 648 627]]
>>>
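This time all entries are non-negative integers between 0 and 999, as expected. Of course, the data is randomly generated, so the results differ from run to run, but the "kmeans" parameter consistently produces nonsensical results, while "kdtree" consistently produces sensible ones.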
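To put a number on "consistently", the same comparison can be repeated over several freshly generated random datasets. A sketch, assuming NumPy; count_bad_runs is a hypothetical helper, and nn is any callable with the call shape used in the sessions above (for example FLANN().nn from pyflann) that returns (indices, dists).

```python
import numpy as np


def count_bad_runs(nn, algorithms=("kdtree", "kmeans"), trials=5,
                   n_points=1000, n_queries=10, dim=100, k=5, seed=0):
    """For each algorithm, count trials that return an impossible index array.

    A trial is "bad" if the array has the wrong shape or contains any index
    outside [0, n_points). Each trial uses a new random dataset and testset.
    """
    rng = np.random.RandomState(seed)
    bad = dict((algo, 0) for algo in algorithms)
    for _ in range(trials):
        dataset = rng.rand(n_points, dim)
        testset = rng.rand(n_queries, dim)
        for algo in algorithms:
            result, _dists = nn(dataset, testset, k, algorithm=algo)
            result = np.asarray(result)
            if (result.shape != (n_queries, k)
                    or (result < 0).any()
                    or (result >= n_points).any()):
                bad[algo] += 1
    return bad
```

On the EC2 instance this would be called as count_bad_runs(FLANN().nn) after from pyflann import FLANN, and the same call on the MacBook Pro gives a like-for-like comparison between the two machines.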
Software and OS details
(0) Ubuntu release:
Ubuntu 14.04 LTS
(1) libflann-dev:
Typing:
sudo aptitude show libflann-dev
produces:
Package: libflann-dev
State: installed
Automatically installed: no
Version: 1.8.4-3
Priority: optional
Section: universe/libdevel
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: amd64
Uncompressed Size: 11.2 M
Depends: libflann1.8 (= 1.8.4-3)
Description: Fast Library for Approximate Nearest Neighbors - development
FLANN is a library for performing fast approximate nearest neighbor searches in high dimensional spaces. It contains a collection of algorithms found to work best for
nearest neighbor search and a system for automatically choosing the best algorithm and optimum parameters depending on the dataset.
This package contains development files needed to build FLANN applications.
Homepage: http://www.cs.ubc.ca/~mariusm/index.php/FLANN/FLANN
(2) python:
Typing:
sudo aptitude show python
produces:
Package: python
State: installed
Automatically installed: no
Multi-Arch: allowed
Version: 2.7.5-5ubuntu3
Priority: optional
Section: python
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
Architecture: amd64
Uncompressed Size: 687 k
Depends: python2.7 (>= 2.7.5-1~), python-minimal (= 2.7.5-5ubuntu3), libpython-stdlib (= 2.7.5-5ubuntu3)
Suggests: python-doc (= 2.7.5-5ubuntu3), python-tk (>= 2.7.5-1~)
Conflicts: python-central (< 0.5.5)
Breaks: python-bz2 (< 1.1-8), python-csv (< 1.0-4), python-email (< 2.5.5-3), update-manager-core (< 0.200.5-2)
Replaces: python-dev (< 2.6.5-2)
Provides: python-ctypes, python-email, python-importlib, python-profiler, python-wsgiref, python:any
Description: interactive high-level object-oriented language (default version)
Python, the high-level, interactive object oriented language, includes an extensive class library with lots of goodies for network programming, system administration,
sounds and graphics.
This package is a dependency package, which depends on Debian's default Python version (currently v2.7).
Homepage: http://www.python.org/
Installation method
I first tried installing FLANN with the following command:
sudo apt-get install libflann1.8
After then installing pyflann with:
sudo pip install -e git+git://github.com/Captricity/pyflann.git#egg=pyflann
I typed:
python -c 'import pyflann'
and received the error message:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/mnt/working/src/pyflann/pyflann/__init__.py", line 27, in <module>
from index import *
File "/mnt/working/src/pyflann/pyflann/index.py", line 27, in <module>
from bindings.flann_ctypes import *
File "/mnt/working/src/pyflann/pyflann/bindings/__init__.py", line 30, in <module>
from flann_ctypes import *
File "/mnt/working/src/pyflann/pyflann/bindings/flann_ctypes.py", line 169, in <module>
raise ImportError('Cannot load dynamic library. Did you compile FLANN?')
ImportError: Cannot load dynamic library. Did you compile FLANN?
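One way to see what the dynamic loader can actually find is to ask it directly from Python with the standard ctypes module. This is a diagnostic sketch, not pyflann's own loading logic: probe_flann is a hypothetical helper, and the guess that the file that matters is the unversioned libflann.so is an assumption. On Debian and Ubuntu that unversioned symlink is conventionally shipped in the -dev package rather than in the runtime library package, which can be checked with dpkg -L libflann1.8 and dpkg -L libflann-dev.

```python
import ctypes
import ctypes.util


def probe_flann(filenames=("libflann.so",)):
    """Report how the dynamic loader resolves FLANN on this machine.

    find_library("flann") asks the system for any library named flann (on
    Linux it consults ldconfig's cache, among other places) and returns a
    file name or None; CDLL(filename) tries to dlopen that exact file name.
    """
    report = {"find_library": ctypes.util.find_library("flann")}
    for filename in filenames:
        try:
            ctypes.CDLL(filename)
            report[filename] = "loaded"
        except OSError as exc:
            report[filename] = "failed: %s" % exc
    return report


for key, value in sorted(probe_flann().items()):
    print("%s: %s" % (key, value))
```

If find_library reports a versioned name such as a libflann.so.* file while loading "libflann.so" fails, the runtime library is present but the unversioned name is not.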
Then, on a fresh EC2 instance, I typed:
sudo apt-get install libflann-dev
sudo pip install -e git+git://github.com/Captricity/pyflann.git#egg=pyflann
and then ran
python -c 'import pyflann'
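without any complaint. However, I then ran into the "kmeans" problem described above.
Note
I have successfully installed FLANN and pyflann on my MacBook Pro, and everything works there: even using "kmeans" as the nearest-neighbor query parameter produces sensible results.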