from numpy import genfromtxt, linalg, array, append, hstack, vstack

#Euclidean distance function
def euclidean(v1, v2):
    dist = linalg.norm(v1 - v2)
    return dist

#get the .csv files and eliminate heading and unused columns from test
BMUs = genfromtxt('BMU3.csv', delimiter=',')
data = genfromtxt('test.csv', delimiter=',')
data = data[1:, :-2]

i = 0
for obj in data:
    D = 0
    for BMU in BMUs:
        Dist = append(euclidean(obj, BMU[: -2]), BMU[-2:])
        D = hstack(Dist)
    Map = vstack(D)
    #iteration counter
    i += 1
    if not i % 1000:
        print (i, ' of ', len(data))
print (Map)
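What I want to do is:
The problem here, or at least what I think is the problem, is that hstack and vstack want a tuple of arrays as input rather than a single array. I am trying to use them the way I would use List.append() on a list; unfortunately I am a beginner and do not know how to do this differently.
Any help would be great, thanks in advance :)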
Answer 0 (score: 1)
First, a usage note.
Instead of:
from numpy import genfromtxt, linalg, array, append, hstack, vstack
use
import numpy as np
....
data = np.genfromtxt(....)
....
np.hstack...
Second, stay away from np.append. It is too easy to misuse. Use np.concatenate instead, so that you fully understand what it does.
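To illustrate why np.append is easy to misuse, here is a small sketch with made-up arrays (a and b are not from the question). Without an axis argument, np.append flattens its inputs, and in every case it returns a new array rather than growing one in place:

```python
import numpy as np

# Made-up arrays for illustration only
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# Without axis, np.append flattens both inputs and returns a new 1-D array
flat = np.append(a, b)
print(flat.shape)     # (6,)

# np.concatenate takes a sequence of arrays and an explicit axis
stacked = np.concatenate((a, b), axis=0)
print(stacked.shape)  # (3, 2)
```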
List append is better suited to incremental work:
alist = []
for ....
    alist.append(....)
arr = np.array(alist)
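Applied to the loop in the question, that pattern might look like the sketch below. The small arrays stand in for the CSV files, which are not shown, so their values and shapes are assumptions; the only thing taken from the question is that the last two columns of each BMU row are carried along next to the distance.

```python
import numpy as np

def euclidean(v1, v2):
    return np.linalg.norm(v1 - v2)

# Made-up stand-ins for the CSV contents (assumption: the real files are not shown).
# Each BMU row holds 2 feature columns followed by 2 extra columns.
BMUs = np.array([[0.0, 0.0, 10.0, 11.0],
                 [3.0, 4.0, 20.0, 21.0]])
data = np.array([[0.0, 0.0],
                 [3.0, 4.0],
                 [6.0, 8.0]])

rows = []
for obj in data:
    row = []
    for BMU in BMUs:
        # distance first, then the two trailing BMU columns
        row.append(np.concatenate(([euclidean(obj, BMU[:-2])], BMU[-2:])))
    rows.append(row)

# one array built at the end, instead of hstack/vstack inside the loops
Map = np.array(rows)
print(Map.shape)  # (3, 2, 3)
```

Each Map[i, j] then holds the distance from data row i to BMU j, followed by that BMU's two extra columns.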
==================
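Without sample arrays (or at least their shapes) I am guessing, but (n,2) arrays sound reasonable. Taking the distance between every pair of "points", I can collect the values in a nested list comprehension: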
In [121]: data = np.arange(6).reshape(3,2)
In [122]: [[euclidean(d,b) for b in data] for d in data]
Out[122]:
[[0.0, 2.8284271247461903, 5.6568542494923806],
 [2.8284271247461903, 0.0, 2.8284271247461903],
 [5.6568542494923806, 2.8284271247461903, 0.0]]
And to make it an array:
In [123]: np.array([[euclidean(d,b) for b in data] for d in data])
Out[123]:
array([[ 0.        ,  2.82842712,  5.65685425],
       [ 2.82842712,  0.        ,  2.82842712],
       [ 5.65685425,  2.82842712,  0.        ]])
The equivalent with nested loops:
alist = []
for d in data:
    sublist=[]
    for b in data:
        sublist.append(euclidean(d,b))
    alist.append(sublist)
arr = np.array(alist)
There are ways to do this without loops, but let's first make sure the basic Python loop approach works.
===============
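If I want the difference (along the last axis) between every element (row) of data and every element of bmu (or here, data), I can use array broadcasting. The result is a (3,3,2) array: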
In [130]: data[None,:,:]-data[:,None,:]
Out[130]:
array([[[ 0,  0],
        [ 2,  2],
        [ 4,  4]],

       [[-2, -2],
        [ 0,  0],
        [ 2,  2]],

       [[-4, -4],
        [-2, -2],
        [ 0,  0]]])
norm can handle higher-dimensional arrays and takes an axis parameter.
In [132]: np.linalg.norm(data[None,:,:]-data[:,None,:],axis=-1)
Out[132]:
array([[ 0.        ,  2.82842712,  5.65685425],
       [ 2.82842712,  0.        ,  2.82842712],
       [ 5.65685425,  2.82842712,  0.        ]])
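The same broadcasting should also work when the two arrays have different numbers of rows, as data and the BMUs presumably do. A minimal sketch, using made-up arrays (the real shapes are not given in the question):

```python
import numpy as np

# Made-up arrays (assumption): 3 data rows and 2 BMU rows, 2 features each
data = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
bmu = np.array([[0.0, 0.0], [3.0, 4.0]])

# (3,1,2) - (1,2,2) broadcasts to (3,2,2); the norm over the last axis gives (3,2)
dist = np.linalg.norm(data[:, None, :] - bmu[None, :, :], axis=-1)
print(dist.shape)  # (3, 2)

# index of the closest BMU for each data row
nearest = dist.argmin(axis=1)
print(nearest)
```

Here dist[i, j] is the distance from data row i to BMU row j, so reductions along axis=1 pick out the nearest BMU for each data row.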
Answer 1 (score: 0)
Thanks for the help. I managed to implement the pseudocode; here is the final program:
import numpy as np

def euclidean(v1, v2):
    dist = np.linalg.norm(v1 - v2)
    return dist

def makeKNN(dataSet, BMUSet, k, fileOut, test=False):
    # take input files
    BMUs = np.genfromtxt(BMUSet, delimiter=',')
    data = np.genfromtxt(dataSet, delimiter=',')
    final = data[1:, :]
    if test == False:
        data = data[1:, :]
    else:
        data = data[1:, :-2]
    # Calculate all the distances between data and BMUs, then reorder the BMUs by distance
    dist = np.array([[euclidean(d, b[:-2]) for b in BMUs] for d in data])
    BMU_K = np.array([BMUs[np.argsort(d)] for d in dist])
    # mean of column 5 over the closest k BMUs
    Z = np.array([[np.sum(b[:k].T[5]) / k] for b in BMU_K])
    # error propagation
    Z_err = np.array([[np.sqrt(np.sum(np.power(b[:k].T[5], 2)))] for b in BMU_K])
    # Adding z estimates and errors to the data
    final = np.concatenate((final, Z, Z_err), axis=1)
    # print output file
    np.savetxt(fileOut, final, delimiter=',')
    print('So long, and thanks for all the fish')
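The per-row list comprehensions over the sorted BMUs could probably also be written with a single argsort along axis=1. The sketch below uses made-up numbers: dist, BMUs and k here are illustrative stand-ins, not the contents of the real files, and column 5 is used only because the program above reads that column.

```python
import numpy as np

# Made-up inputs (assumption): dist is (n_data, n_bmu); BMUs keeps the value of interest in column 5
dist = np.array([[0.0, 5.0, 2.0],
                 [5.0, 0.0, 1.0]])
BMUs = np.array([[0, 0, 0, 0, 0, 1.0],
                 [0, 0, 0, 0, 0, 2.0],
                 [0, 0, 0, 0, 0, 6.0]])
k = 2

# indices of the k closest BMUs for every data row, shape (n_data, k)
order = np.argsort(dist, axis=1)[:, :k]
# column 5 of those BMUs, shape (n_data, k)
vals = BMUs[order, 5]
# mean over the k neighbours, kept as a column so it can be concatenated
Z = vals.mean(axis=1, keepdims=True)
# same quadrature sum as in the program above
Z_err = np.sqrt((vals ** 2).sum(axis=1, keepdims=True))
print(Z.shape, Z_err.shape)  # (2, 1) (2, 1)
```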
Thank you very much; I hope this code will help others in the future :)