Hamming distance: Matlab to Python

Time: 2017-04-09 14:23:51

Tags: python matlab numpy

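OK, so I'm computing the Hamming distance between two files as part of a k-NN method. I'm trying to translate Matlab code into Python, but I've been staring at it for hours and can't figure out what is causing the error.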

The Matlab code:

function [ Dist ] = hamming_distance( X,Xtrain )
% Function calculates Hamming distances of elements in set X from elements in set Xtrain. Distances of objects are returned as matrix Dist
% X - set of objects we are comparing N1xD
% Xtrain - set of objects to which X objects are compared N2xD
% Dist - matrix of distances between X and Xtrain objects N1xN2
% N1 - number of elements in X
% N2 - number of elements in Xtrain
% D - number of features (key words)

N1 = size(X,1);
N2 = size(Xtrain,1);
Dist = zeros(N1,N2);
D1 = size(X,2);
for i=1:N1
    for j=1:N2
        temp_matrix = xor(X(i,1:D1),Xtrain(j,1:D1));
        Dist(i,j) = sum(temp_matrix);
    end
end
end

Here is what I have written in Python so far:

def hamming_distance(X, X_train):
    """
    :param X: set of objects that are going to be compared N1xD
    :param X_train: set of objects compared against param X N2xD
    Functions calculates Hamming distances between all objects from set X  and all object from set X_train.
    Resulting distances are returned as matrices.
    :return: Distance matrix between objects X and X_train X i X_train N1xN2
    """
    N1 = X.shape[0]
    N2 = X_train.shape[0]
    hdist = np.zeros(shape =(N1, N2))
    D1 = X.shape[1]
    for i in range (1,N1):
        for j in range (1, N2):
            temp_matrix = np.logical_xor(X[i,1:D1], X_train[j, 1:D1])
            hdist[i, j] = np.sum(temp_matrix)
    return hdist

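The error seems to be in the xor part of the Python code. I don't understand what could be wrong there; I tried writing it as (X[i,1:D1]) ^ (X_train[j, 1:D1]), but that didn't change anything. I checked the logical_xor function, and it looks like I'm passing the right inputs. I don't understand where the error comes from. Could it be because the matrices have different shapes? I got confused trying to resize them. Should I convert X and X_train to arrays? I tried that once, but it didn't help.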

The error:

Traceback (most recent call last):
  File "C:\...\test.py", line 90, in test_hamming_distance
    out = hamming_distance(data['X'], data['X_train'])
  File "C:\...\content.py", line 28, in hamming_distance
    temp_matrix = np.logical_xor(X[i,1:D1], X_train[j, 1:D1])
  File "C:\...\Anaconda3\lib\site-packages\scipy\sparse\base.py", line 559, in __getattr__
    raise AttributeError(attr + " not found")
AttributeError: logical_xor not found

I can't change test.py, only content.py. test.py should work fine, so I'm sure the mistake is in my function. Any help would be greatly appreciated!

Edit: at the top of my file I have:

import numpy as np

Writing numpy instead of np didn't change anything; I got the error 'numpy wasn't defined'.

1 answer:

Answer 0: (score: 2)

This doesn't work because X and X_train are both scipy sparse matrices. Scipy sparse matrices don't yet support logical operations, although work on this is in progress.

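The error is raised in scipy rather than in numpy, even though you are calling a numpy function, because logical_xor is a numpy ufunc ("universal function"). Python classes designed to interact with numpy ufuncs can override a ufunc's behavior, and scipy sparse matrices do so to avoid running unsupported operations that would convert the matrix to a dense array and potentially use up all available memory.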

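You need to convert the matrices to dense arrays, for example with X.toarray(). If the result is too large to fit in memory, you should use a package such as dask or bcolz to handle the memory management for you.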
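As a rough sketch of that suggestion, the function from the question could densify its inputs before calling logical_xor. It assumes both matrices fit in memory once converted, and the small matrices at the bottom are made-up illustration data, not the asker's test data. The loops also start at 0 and compare whole rows, because Python indexing is 0-based; the `range(1, N1)` and `1:D1` bounds copied over from Matlab would skip the first row and the first feature.

```python
import numpy as np
from scipy import sparse


def hamming_distance(X, X_train):
    """Return the N1 x N2 matrix of Hamming distances between rows of X and X_train."""
    # Assumption: both inputs fit in memory once converted to dense arrays.
    if sparse.issparse(X):
        X = X.toarray()
    if sparse.issparse(X_train):
        X_train = X_train.toarray()
    X = np.asarray(X, dtype=bool)
    X_train = np.asarray(X_train, dtype=bool)

    n1, n2 = X.shape[0], X_train.shape[0]
    dist = np.zeros((n1, n2), dtype=int)
    # 0-based loops over every row; each row is compared across all D features.
    for i in range(n1):
        for j in range(n2):
            dist[i, j] = np.sum(np.logical_xor(X[i], X_train[j]))
    return dist


# Made-up example data for illustration only.
X = sparse.csr_matrix(np.array([[1, 0, 1], [0, 0, 0]]))
X_train = sparse.csr_matrix(np.array([[1, 1, 1], [0, 0, 1], [1, 0, 1]]))
print(hamming_distance(X, X_train))
# [[1 1 0]
#  [3 1 2]]
```

The explicit loops mirror the original Matlab code; for larger inputs a vectorized formulation would likely be faster, but that goes beyond the fix described here.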

Edit: scipy sparse matrices are not subclasses of ndarray.
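A quick way to check this for yourself is shown below. The matrix `M` is a hypothetical stand-in for the `data['X']` object from the test file, which is not shown in the question.

```python
import numpy as np
from scipy import sparse

# Hypothetical stand-in for the sparse input passed in by test.py.
M = sparse.csr_matrix(np.eye(3))

print(isinstance(M, np.ndarray))            # False: not an ndarray subclass
print(sparse.issparse(M))                   # True: it is a scipy sparse matrix
print(isinstance(M.toarray(), np.ndarray))  # True: toarray() yields a dense ndarray
```

Checking with `sparse.issparse` inside content.py is one way to decide whether a conversion is needed before calling numpy functions.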