I am trying to compute the Euclidean distance between two binary images using numpy, but I get nan as the result.
def eculideanDistance(features, predict, dist):
    dist += (float(features[0]) - float(predict[0]))
    return math.sqrt(dist)
I am using this binary data:
train_set = {
0: [
["0000000000000111100000000000000000000000000011111110000000000000000000000011111111110000000000000000000111111111111110000000000000000001111111011111100000000000000000111111100000111100000000000000001111111000000011100000000000000011111110000000111100000000000000111111100000000111000000000000001111111000000001110000000000000011111100000000011110000000000000111111000000000011100000000000001111110000000000111000000000000001111110000000000111000000000000011111100000000001110000000000000111111000000000011100000000000001111110000000000111000000000000111111100000000011110000000000001111011000000000111100000000000011110000000000011110000000000000011110000000000011110000000000000111100000000001111100000000000001111000000000111110000000000000011110000000011111000000000000000011100000011111100000000000000000111100011111110000000000000000001111111111111100000000000000000001111111111111000000000000000000011111111111100000000000000000000011111111100000000000000000000000011111000000000000000000000000000011000000000000000000"],
["0000000000011111000000000000000000000000001111111000000000000000000000000111111111000000000000000000000011111111111000000000000000000001111111111111000000000000000000111111101111111000000000000000001111110001111111000000000000000011111100001111110000000000000001111111000001111110000000000000011111110000001111100000000000000011111100000001111110000000000001111111000000001111110000000000011111100000000001111100000000000111111000000000011111100000000001111110000000000111111000000000011111100000000000111110000000000111111000000000001111100000000001111110000000000011111000000000011111100000000000111110000000000111111000000000001111100000000000111110000000000011111000000000001111110000000001111110000000000011111100000000111111000000000000011111100000001111111000000000000011111000000111111100000000000000111110000011111110000000000000001111110001111111000000000000000011111111111111100000000000000000111111111111110000000000000000000111111111111000000000000000000000111111111100000000000000000000000111111110000000000000"]
],
1: [
["0000000000000000111100000000000000000000000000011111111000000000000000000000000111111111000000000000000000000001111111111000000000000000000000011111111110000000000000000000001111111111000000000000000000000011111111110000000000000000000001111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111000000000000000000000011111111110000000000000000000001111111111100000000000000000000111111111110000000000000000000011111111111110000000000000000111111111111111100000000000000011111111111111110000000000000001111111111111111100000000000000011111111111111111000000000000000001111111111111110000000000000000011111110111111100000000000000000011110001111111000000000000000000000000011111110000000000000000000000000111111000000000000000000000000011111111000000000000000000000000011111110000000000000000000000000111111100000000000000000000000001111111100000000000000000000000011111111000000000000000000000000111111110000000000000000000000000011111111000000000000000000000000111111100000000"],
["0000000001111100000000000000000000000000001111100000000000000000000000000011111100000000000000000000000000111111100000000000000000000000001111111000000000000000000000000011111111000000000000000000000000111111110000000000000000000000001111111000000000000000000000000001111111000000000000000000000000111111110000000000000000000000001111111110000000000000000000000011111111100000000000000000000001111111110000000000000000000000011111111110000000000000000000001111111111000000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000000111111111100000000000000000000000111111111000000000000000000000000001111111000000000000000000000000011111110000000000000000000000000111111100000000000000000000000000111111100000000000000000000000001111111000000000000000000000000001111111000000000000000000001111111111111111111000000000000111111111111111111111000000000001111111111111111111111000000000011111111111111111111110000000000001111111111111111111100000000000001111111111111111111"],
]
}
test_set = ["0000000000000000011000000000000000000000000000011111111000000000000000000000011111111111000000000000000000000011111111111000000000000000000001111111111110000000000000000000011111111111100000000000000000011111111111110000000000000000000111111111111100000000000000000001111111111111000000000000000000111111111111110000000000000000111111111111111100000000000000001111111111111111000000000000000001111111111111111000000000000001111111111111111110000000000000111111111111111111100000000000001111111111111111111000000000000001111111111111111110000000000000000010000111111111100000000000000000000001111111110000000000000000000000011111111100000000000000000000000111111111000000000000000000000000111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000011111111111100000000000000000000000111111111100000000000000000000000111111000000000"]
Answer 0 (score: 0)
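The formula you are using for Euclidean distance is incorrect: the difference is never squared, so dist can be negative when you take its square root. (Note that math.sqrt raises ValueError for a negative argument rather than returning NaN. The NaN more likely arises because float() applied to a 1024-digit string overflows to inf, and inf - inf is nan.) I think you meant: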
import math

def euclideanDistance(features, predict, dist):
    diff = (float(features[0]) - float(predict[0]))
    dist += diff * diff  # square the difference so the sum is never negative
    return math.sqrt(dist)
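(I am not sure why index 0 is always used, or why dist is a parameter as well as the return value. I suspect these may also be problems, but I cannot tell without more context.)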
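For what it is worth, the NaN in the question can be reproduced without any negative square root. Here is a minimal sketch, assuming the image strings are passed to float() exactly as in the question; the 400-digit strings are made-up stand-ins for the 1024-digit ones:

```python
import math

# Stand-ins for the 1024-character image strings in the question:
# digit strings this long are far beyond the largest finite float.
features = ["0" * 10 + "1" * 390]
predict = ["0" * 5 + "1" * 395]

a = float(features[0])   # overflows to inf instead of raising an error
b = float(predict[0])    # also inf
diff = a - b             # inf - inf is nan

print(a, b, diff)                   # inf inf nan
print(math.isnan(math.sqrt(diff)))  # True: sqrt(nan) is nan, no exception
```

By contrast, math.sqrt(-1.0) raises ValueError (math domain error), so a negative argument alone would not produce a NaN here.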
However, if you encode your images as NumPy arrays instead of strings, NumPy offers a direct way to compute the Euclidean norm:
import numpy

a = numpy.array([0,0,1,1])
b = numpy.array([1,0,0,1])
euclidean_norm = numpy.linalg.norm(a-b)  # sqrt(2), about 1.414
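To apply this to the strings in the question, each character first has to become one array element. A rough sketch, where bits_to_array is a hypothetical helper (not part of NumPy) and the short strings stand in for the 1024-character ones:

```python
import numpy as np

def bits_to_array(bits):
    """Hypothetical helper: turn a string of '0'/'1' characters into a 1-D array."""
    # A signed dtype keeps 0 - 1 equal to -1 instead of wrapping around.
    return np.array([int(c) for c in bits], dtype=np.int16)

a = bits_to_array("0011")
b = bits_to_array("1001")

# Two pixels differ, so the distance is sqrt(2).
distance = np.linalg.norm(a - b)
print(distance)
```

The same helper could be called on train_set[0][0][0] and test_set[0] from the question, since each of those is a single string of digits.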
Answer 1 (score: 0)
This is not binary data. It is a binary image stored as a string, in which each pixel is represented by 0 (black) or 1 (white).
To keep things simple, let's convert the data to 32 x 32 numpy arrays and visualize them.
train_set converted to numpy arrays:
train_img = {label: [np.uint8([*sample[0]]).reshape(32, 32)
                     for sample in samples]
             for label, samples in train_set.items()}
test_set converted to a numpy array:
test_img = np.uint8([*test_set[0]]).reshape(32, 32)
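Since the arrays are small, one quick way to look at them is to print them as text. A minimal sketch, assuming a 2-D array of 0s and 1s; render_ascii is a made-up helper name, and the tiny 4 x 4 image is for illustration only:

```python
import numpy as np

def render_ascii(img):
    """Hypothetical helper: draw a 0/1 image with '#' for 1 and '.' for 0."""
    return "\n".join("".join("#" if px else "." for px in row) for row in img)

# Tiny 4 x 4 stand-in for a 32 x 32 image such as test_img.
demo = np.array([int(c) for c in "0110100110010110"], dtype=np.uint8).reshape(4, 4)
print(render_ascii(demo))
```

Calling render_ascii(test_img) should print the 32 x 32 test image in the same way. If matplotlib is available, plt.imshow(test_img, cmap='gray') is a graphical alternative.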
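From here, computing the Euclidean distance with numpy.linalg.norm is straightforward. (One caveat: the arrays are uint8, so the subtraction below wraps around, with 0 - 1 giving 255, and the printed values are inflated; cast to a signed type such as np.int16 first to get the true distance.) For example: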
In [5]: np.linalg.norm(test_img - train_img[0][0])
Out[5]: 2984.7336564591487
In [6]: np.linalg.norm(test_img - train_img[0][1])
Out[6]: 3459.016189612301
In [7]: np.linalg.norm(test_img - train_img[1][0])
Out[7]: 1691.5064291926294
In [8]: np.linalg.norm(test_img - train_img[1][1])
Out[8]: 2650.0669802855928
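The distances above are far larger than 32, the maximum possible for two 32 x 32 binary images, which points to unsigned wraparound in the subtraction. Here is a small sketch of the effect and of one possible fix (casting to a signed type before subtracting), using made-up 2 x 2 arrays rather than the real images:

```python
import numpy as np

a = np.array([[0, 1], [1, 0]], dtype=np.uint8)
b = np.array([[1, 1], [0, 0]], dtype=np.uint8)

# uint8 arithmetic wraps: 0 - 1 becomes 255, which inflates the norm.
wrapped = np.linalg.norm(a - b)

# Casting to a signed type first keeps the differences at -1, 0 or 1.
true_dist = np.linalg.norm(a.astype(np.int16) - b.astype(np.int16))

print((a - b).tolist())   # [[255, 0], [1, 0]]
print(wrapped, true_dist)
```

Applied to the real data, this would be np.linalg.norm(test_img.astype(np.int16) - train_img[0][0].astype(np.int16)), and likewise for the other training images.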
The full IPython session, for reference:
In [1]: import numpy as np
In [2]: train_set = {
...: 0: [
...: ["0000000000000111100000000000000000000000000011111110000000000000000000000011111111110000000000000000000111111111111110000000000000000001111111011111100000000000000000111111100000111100000000000000001111111000000011100000000000000011111110000000111100000000000000111111100000000111000000000000001111111000000001110000000000000011111100000000011110000000000000111111000000000011100000000000001111110000000000111000000000000001111110000000000111000000000000011111100000000001110000000000000111111000000000011100000000000001111110000000000111000000000000111111100000000011110000000000001111011000000000111100000000000011110000000000011110000000000000011110000000000011110000000000000111100000000001111100000000000001111000000000111110000000000000011110000000011111000000000000000011100000011111100000000000000000111100011111110000000000000000001111111111111100000000000000000001111111111111000000000000000000011111111111100000000000000000000011111111100000000000000000000000011111000000000000000000000000000011000000000000000000"],
...: ["0000000000011111000000000000000000000000001111111000000000000000000000000111111111000000000000000000000011111111111000000000000000000001111111111111000000000000000000111111101111111000000000000000001111110001111111000000000000000011111100001111110000000000000001111111000001111110000000000000011111110000001111100000000000000011111100000001111110000000000001111111000000001111110000000000011111100000000001111100000000000111111000000000011111100000000001111110000000000111111000000000011111100000000000111110000000000111111000000000001111100000000001111110000000000011111000000000011111100000000000111110000000000111111000000000001111100000000000111110000000000011111000000000001111110000000001111110000000000011111100000000111111000000000000011111100000001111111000000000000011111000000111111100000000000000111110000011111110000000000000001111110001111111000000000000000011111111111111100000000000000000111111111111110000000000000000000111111111111000000000000000000000111111111100000000000000000000000111111110000000000000"]
...: ],
...: 1: [
...: ["0000000000000000111100000000000000000000000000011111111000000000000000000000000111111111000000000000000000000001111111111000000000000000000000011111111110000000000000000000001111111111000000000000000000000011111111110000000000000000000001111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111000000000000000000000011111111110000000000000000000001111111111100000000000000000000111111111110000000000000000000011111111111110000000000000000111111111111111100000000000000011111111111111110000000000000001111111111111111100000000000000011111111111111111000000000000000001111111111111110000000000000000011111110111111100000000000000000011110001111111000000000000000000000000011111110000000000000000000000000111111000000000000000000000000011111111000000000000000000000000011111110000000000000000000000000111111100000000000000000000000001111111100000000000000000000000011111111000000000000000000000000111111110000000000000000000000000011111111000000000000000000000000111111100000000"],
...: ["0000000001111100000000000000000000000000001111100000000000000000000000000011111100000000000000000000000000111111100000000000000000000000001111111000000000000000000000000011111111000000000000000000000000111111110000000000000000000000001111111000000000000000000000000001111111000000000000000000000000111111110000000000000000000000001111111110000000000000000000000011111111100000000000000000000001111111110000000000000000000000011111111110000000000000000000001111111111000000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000000111111111100000000000000000000000111111111000000000000000000000000001111111000000000000000000000000011111110000000000000000000000000111111100000000000000000000000000111111100000000000000000000000001111111000000000000000000000000001111111000000000000000000001111111111111111111000000000000111111111111111111111000000000001111111111111111111111000000000011111111111111111111110000000000001111111111111111111100000000000001111111111111111111"],
...: ]
...: }
...:
...: test_set = ["0000000000000000011000000000000000000000000000011111111000000000000000000000011111111111000000000000000000000011111111111000000000000000000001111111111110000000000000000000011111111111100000000000000000011111111111110000000000000000000111111111111100000000000000000001111111111111000000000000000000111111111111110000000000000000111111111111111100000000000000001111111111111111000000000000000001111111111111111000000000000001111111111111111110000000000000111111111111111111100000000000001111111111111111111000000000000001111111111111111110000000000000000010000111111111100000000000000000000001111111110000000000000000000000011111111100000000000000000000000111111111000000000000000000000000111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000011111111111100000000000000000000000111111111100000000000000000000000111111000000000"]
...:
In [3]: train_img = {label: [np.uint8([*sample[0]]).reshape(32, 32)
...: for sample in samples]
...: for label, samples in train_set.items()}
In [4]: test_img = np.uint8([*test_set[0]]).reshape(32, 32)
In [5]: np.linalg.norm(test_img - train_img[0][0])
Out[5]: 2984.7336564591487
In [6]: np.linalg.norm(test_img - train_img[0][1])
Out[6]: 3459.016189612301
In [7]: np.linalg.norm(test_img - train_img[1][0])
Out[7]: 1691.5064291926294
In [8]: np.linalg.norm(test_img - train_img[1][1])
Out[8]: 2650.0669802855928