使用numpy的欧式距离

时间:2019-10-23 12:35:38

标签: python numpy

我正在尝试使用numpy计算两个二进制数据(图像)的欧几里得距离,但结果中却得到nan

def eculideanDistance(features, predict, dist):
    dist += (float(features[0]) - float(predict[0]))
    return math.sqrt(dist)

Output

我正在使用此二进制数据

train_set = {
    0: [
        ["0000000000000111100000000000000000000000000011111110000000000000000000000011111111110000000000000000000111111111111110000000000000000001111111011111100000000000000000111111100000111100000000000000001111111000000011100000000000000011111110000000111100000000000000111111100000000111000000000000001111111000000001110000000000000011111100000000011110000000000000111111000000000011100000000000001111110000000000111000000000000001111110000000000111000000000000011111100000000001110000000000000111111000000000011100000000000001111110000000000111000000000000111111100000000011110000000000001111011000000000111100000000000011110000000000011110000000000000011110000000000011110000000000000111100000000001111100000000000001111000000000111110000000000000011110000000011111000000000000000011100000011111100000000000000000111100011111110000000000000000001111111111111100000000000000000001111111111111000000000000000000011111111111100000000000000000000011111111100000000000000000000000011111000000000000000000000000000011000000000000000000"],
        ["0000000000011111000000000000000000000000001111111000000000000000000000000111111111000000000000000000000011111111111000000000000000000001111111111111000000000000000000111111101111111000000000000000001111110001111111000000000000000011111100001111110000000000000001111111000001111110000000000000011111110000001111100000000000000011111100000001111110000000000001111111000000001111110000000000011111100000000001111100000000000111111000000000011111100000000001111110000000000111111000000000011111100000000000111110000000000111111000000000001111100000000001111110000000000011111000000000011111100000000000111110000000000111111000000000001111100000000000111110000000000011111000000000001111110000000001111110000000000011111100000000111111000000000000011111100000001111111000000000000011111000000111111100000000000000111110000011111110000000000000001111110001111111000000000000000011111111111111100000000000000000111111111111110000000000000000000111111111111000000000000000000000111111111100000000000000000000000111111110000000000000"]
    ],
    1: [
        ["0000000000000000111100000000000000000000000000011111111000000000000000000000000111111111000000000000000000000001111111111000000000000000000000011111111110000000000000000000001111111111000000000000000000000011111111110000000000000000000001111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111000000000000000000000011111111110000000000000000000001111111111100000000000000000000111111111110000000000000000000011111111111110000000000000000111111111111111100000000000000011111111111111110000000000000001111111111111111100000000000000011111111111111111000000000000000001111111111111110000000000000000011111110111111100000000000000000011110001111111000000000000000000000000011111110000000000000000000000000111111000000000000000000000000011111111000000000000000000000000011111110000000000000000000000000111111100000000000000000000000001111111100000000000000000000000011111111000000000000000000000000111111110000000000000000000000000011111111000000000000000000000000111111100000000"],
        ["0000000001111100000000000000000000000000001111100000000000000000000000000011111100000000000000000000000000111111100000000000000000000000001111111000000000000000000000000011111111000000000000000000000000111111110000000000000000000000001111111000000000000000000000000001111111000000000000000000000000111111110000000000000000000000001111111110000000000000000000000011111111100000000000000000000001111111110000000000000000000000011111111110000000000000000000001111111111000000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000000111111111100000000000000000000000111111111000000000000000000000000001111111000000000000000000000000011111110000000000000000000000000111111100000000000000000000000000111111100000000000000000000000001111111000000000000000000000000001111111000000000000000000001111111111111111111000000000000111111111111111111111000000000001111111111111111111111000000000011111111111111111111110000000000001111111111111111111100000000000001111111111111111111"],
    ]
}

test_set = ["0000000000000000011000000000000000000000000000011111111000000000000000000000011111111111000000000000000000000011111111111000000000000000000001111111111110000000000000000000011111111111100000000000000000011111111111110000000000000000000111111111111100000000000000000001111111111111000000000000000000111111111111110000000000000000111111111111111100000000000000001111111111111111000000000000000001111111111111111000000000000001111111111111111110000000000000111111111111111111100000000000001111111111111111111000000000000001111111111111111110000000000000000010000111111111100000000000000000000001111111110000000000000000000000011111111100000000000000000000000111111111000000000000000000000000111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000011111111111100000000000000000000000111111111100000000000000000000000111111000000000"]

2 个答案:

答案 0 :(得分:0)

您用于欧几里德距离的公式不正确。您将最终计算出负数的平方根,这就是为什么得到NaN的原因。我认为您的意思是:

def euclideanDistance(features, predict, dist):
    diff = (float(features[0]) - float(predict[0]))
    dist += diff * diff 
    return math.sqrt(dist)

(我不确定为什么总是使用索引0,为什么dist变量不仅是返回值,而且是参数,我怀疑这可能也是问题,但我不确定缺乏判断力的背景。)

但是,如果您将图像编码为Numpy数组而不是字符串,Numpy提供了一种直接方法来计算欧几里得范数,如果您进行编码:

a = numpy.array([0,0,1,1])
b = numpy.array([1,0,0,1])
euclidean_norm = numpy.linalg.norm(a-b)

答案 1 :(得分:0)

这不是二进制数据。这是一个存储为字符串的二进制图像,其中像素用0(黑色)或1(白色)表示。

为了使事情变得简单,让我们将数据转换为32 x 32 numpy array并对其进行可视化。

train_set转换为numpy array

train_img = {label: [np.uint8([*sample[0]]).reshape(32, 32) 
    for sample in samples] 
        for label, samples in train_set.items()}

enter image description here

test_set转换为numpy array

test_img = np.uint8([*test_set[0]]).reshape(32, 32)

enter image description here

从这一点出发,使用numpy.linalg.norm使用numpy计算欧几里得距离非常简单。例如:

In [5]: np.linalg.norm(test_img - train_img[0][0])

Out[5]: 2984.7336564591487


In [6]: np.linalg.norm(test_img - train_img[0][1])

Out[6]: 3459.016189612301


In [7]: np.linalg.norm(test_img - train_img[1][0])

Out[7]: 1691.5064291926294


In [8]: np.linalg.norm(test_img - train_img[1][1])

Out[8]: 2650.0669802855928

此答案的完整代码

In [1]: import numpy as np


In [2]: train_set = {

   ...:     0: [

   ...:         ["0000000000000111100000000000000000000000000011111110000000000000000000000011111111110000000000000000000111111111111110000000000000000001111111011111100000000000000000111111100000111100000000000000001111111000000011100000000000000011111110000000111100000000000000111111100000000111000000000000001111111000000001110000000000000011111100000000011110000000000000111111000000000011100000000000001111110000000000111000000000000001111110000000000111000000000000011111100000000001110000000000000111111000000000011100000000000001111110000000000111000000000000111111100000000011110000000000001111011000000000111100000000000011110000000000011110000000000000011110000000000011110000000000000111100000000001111100000000000001111000000000111110000000000000011110000000011111000000000000000011100000011111100000000000000000111100011111110000000000000000001111111111111100000000000000000001111111111111000000000000000000011111111111100000000000000000000011111111100000000000000000000000011111000000000000000000000000000011000000000000000000"],

   ...:         ["0000000000011111000000000000000000000000001111111000000000000000000000000111111111000000000000000000000011111111111000000000000000000001111111111111000000000000000000111111101111111000000000000000001111110001111111000000000000000011111100001111110000000000000001111111000001111110000000000000011111110000001111100000000000000011111100000001111110000000000001111111000000001111110000000000011111100000000001111100000000000111111000000000011111100000000001111110000000000111111000000000011111100000000000111110000000000111111000000000001111100000000001111110000000000011111000000000011111100000000000111110000000000111111000000000001111100000000000111110000000000011111000000000001111110000000001111110000000000011111100000000111111000000000000011111100000001111111000000000000011111000000111111100000000000000111110000011111110000000000000001111110001111111000000000000000011111111111111100000000000000000111111111111110000000000000000000111111111111000000000000000000000111111111100000000000000000000000111111110000000000000"]

   ...:     ],

   ...:     1: [

   ...:         ["0000000000000000111100000000000000000000000000011111111000000000000000000000000111111111000000000000000000000001111111111000000000000000000000011111111110000000000000000000001111111111000000000000000000000011111111110000000000000000000001111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111000000000000000000000011111111110000000000000000000001111111111100000000000000000000111111111110000000000000000000011111111111110000000000000000111111111111111100000000000000011111111111111110000000000000001111111111111111100000000000000011111111111111111000000000000000001111111111111110000000000000000011111110111111100000000000000000011110001111111000000000000000000000000011111110000000000000000000000000111111000000000000000000000000011111111000000000000000000000000011111110000000000000000000000000111111100000000000000000000000001111111100000000000000000000000011111111000000000000000000000000111111110000000000000000000000000011111111000000000000000000000000111111100000000"],

   ...:         ["0000000001111100000000000000000000000000001111100000000000000000000000000011111100000000000000000000000000111111100000000000000000000000001111111000000000000000000000000011111111000000000000000000000000111111110000000000000000000000001111111000000000000000000000000001111111000000000000000000000000111111110000000000000000000000001111111110000000000000000000000011111111100000000000000000000001111111110000000000000000000000011111111110000000000000000000001111111111000000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000000111111111100000000000000000000000111111111000000000000000000000000001111111000000000000000000000000011111110000000000000000000000000111111100000000000000000000000000111111100000000000000000000000001111111000000000000000000000000001111111000000000000000000001111111111111111111000000000000111111111111111111111000000000001111111111111111111111000000000011111111111111111111110000000000001111111111111111111100000000000001111111111111111111"],

   ...:     ]

   ...: }

   ...: 

   ...: test_set = ["0000000000000000011000000000000000000000000000011111111000000000000000000000011111111111000000000000000000000011111111111000000000000000000001111111111110000000000000000000011111111111100000000000000000011111111111110000000000000000000111111111111100000000000000000001111111111111000000000000000000111111111111110000000000000000111111111111111100000000000000001111111111111111000000000000000001111111111111111000000000000001111111111111111110000000000000111111111111111111100000000000001111111111111111111000000000000001111111111111111110000000000000000010000111111111100000000000000000000001111111110000000000000000000000011111111100000000000000000000000111111111000000000000000000000000111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000011111111111100000000000000000000000111111111100000000000000000000000111111000000000"]

   ...: 


In [3]: train_img = {label: [np.uint8([*sample[0]]).reshape(32, 32) 

   ...:     for sample in samples] 

   ...:         for label, samples in train_set.items()}


In [4]: test_img = np.uint8([*test_set[0]]).reshape(32, 32)


In [5]: np.linalg.norm(test_img - train_img[0][0])

Out[5]: 2984.7336564591487


In [6]: np.linalg.norm(test_img - train_img[0][1])

Out[6]: 3459.016189612301


In [7]: np.linalg.norm(test_img - train_img[1][0])

Out[7]: 1691.5064291926294


In [8]: np.linalg.norm(test_img - train_img[1][1])

Out[8]: 2650.0669802855928