Question

from numpy import genfromtxt, linalg, array, append, hstack, vstack

#Euclidean distance function
def euclidean(v1, v2):
    dist = linalg.norm(v1 - v2)
    return dist

#get the .csv files and eliminate heading and unused columns from test
BMUs = genfromtxt('BMU3.csv', delimiter=',')
data = genfromtxt('test.csv', delimiter=',')
data = data[1:, :-2]

i = 0
for obj in data:
    D = 0
    for BMU in BMUs:
        Dist = append(euclidean(obj, BMU[: -2]), BMU[-2:])
    D = hstack(Dist)

Map = vstack(D)

#iteration counter
i += 1
if not i % 1000:
    print (i, ' of ', len(data))

print (Map)

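What I want to do is:

Take an object from data
Calculate its distance to a BMU (euclidean(obj, BMU[:-2]))
Append the last two items of the BMU array to that distance
Create a 2d matrix that contains all the distances plus the last two items of every BMU for one data object (D = hstack(Dist))
Create an array of those matrices with a length equal to the number of objects in data (Map = vstack(D))

The problem here, or at least what I think is the problem, is that hstack and vstack want a tuple of arrays as input rather than a single array. I am trying to use them the way I would use List.append() on a list. Sadly I'm a beginner and I don't know how to do this differently.

Any help would be great, thanks in advance :)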

Answer 1

First, a usage note:

Instead of:

from numpy import genfromtxt, linalg, array, append, hstack, vstack

use

import numpy as np
....
data = np.genfromtxt(....)
....
     np.hstack...

Second, stay away from np.append. It is too easy to misuse. Use np.concatenate instead, so that you know exactly what it is doing.
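To illustrate why np.append is easy to misuse, here is a small sketch with made-up arrays (a and b are not from the question). Called without an axis, np.append flattens both inputs, while np.concatenate makes you state the axis you are joining along:

```python
import numpy as np

# Made-up arrays for illustration only
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# Without an axis argument, np.append flattens both inputs first
flat = np.append(a, b)

# np.concatenate takes a tuple of arrays and joins them along an explicit axis
stacked = np.concatenate((a, b), axis=0)

print(flat.shape)     # (6,)
print(stacked.shape)  # (3, 2)
```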

List append is better suited to incremental work:

alist = []
for ....
    alist.append(....)
arr = np.array(alist)
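Applied to the question, the list-append pattern might look like the sketch below. The small data and BMUs arrays are hypothetical stand-ins for the CSV contents, and the (objects, BMUs, 3) result shape is my reading of what the question asks for, not something the question confirms:

```python
import numpy as np

def euclidean(v1, v2):
    return np.linalg.norm(v1 - v2)

# Hypothetical stand-ins for test.csv and BMU3.csv
data = np.array([[0.0, 0.0], [3.0, 4.0]])
BMUs = np.array([[0.0, 0.0, 10.0, 0.5],
                 [3.0, 0.0, 20.0, 0.7]])

rows = []
for obj in data:
    row = []
    for BMU in BMUs:
        # one entry per BMU: [distance, second-to-last item, last item]
        entry = np.concatenate(([euclidean(obj, BMU[:-2])], BMU[-2:]))
        row.append(entry)
    rows.append(row)

# Build the array once, after the loops have finished
Map = np.array(rows)
print(Map.shape)  # (2, 2, 3)
```

Collecting in lists and converting once at the end avoids rebuilding an array on every iteration.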

==================

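Without sample arrays (or at least their shapes) I'm guessing. But (n,2) arrays sound reasonable. Considering the distance between every pair of 'points', I can collect the values in a nested list comprehension: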

In [121]: data = np.arange(6).reshape(3,2)
In [122]: [[euclidean(d,b) for b in data] for d in data]
Out[122]: 
[[0.0, 2.8284271247461903, 5.6568542494923806],
 [2.8284271247461903, 0.0, 2.8284271247461903],
 [5.6568542494923806, 2.8284271247461903, 0.0]]

and make that an array:

In [123]: np.array([[euclidean(d,b) for b in data] for d in data])
Out[123]: 
array([[ 0.        ,  2.82842712,  5.65685425],
       [ 2.82842712,  0.        ,  2.82842712],
       [ 5.65685425,  2.82842712,  0.        ]])

The equivalent with nested loops:

alist = []
for d in data:
    sublist=[]
    for b in data:
        sublist.append(euclidean(d,b))
    alist.append(sublist)
arr = np.array(alist)

There are ways of doing this without loops, but let's make sure the basic Python looping approach works first.

===============

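If I want the difference (along the last axis) between every element (row) in data and every element in bmu (or here data), I can use array broadcasting. The result is a (3,3,2) array: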

In [130]: data[None,:,:]-data[:,None,:]
Out[130]: 
array([[[ 0,  0],
        [ 2,  2],
        [ 4,  4]],

       [[-2, -2],
        [ 0,  0],
        [ 2,  2]],

       [[-4, -4],
        [-2, -2],
        [ 0,  0]]])

norm can handle higher-dimensional arrays and takes an axis parameter.

In [132]: np.linalg.norm(data[None,:,:]-data[:,None,:],axis=-1)
Out[132]: 
array([[ 0.        ,  2.82842712,  5.65685425],
       [ 2.82842712,  0.        ,  2.82842712],
       [ 5.65685425,  2.82842712,  0.        ]])
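The same broadcasting idea should carry over to two different arrays. Below is a minimal sketch, assuming the BMU rows hold the feature columns followed by two extra columns as in the question; the small data and bmus arrays are invented for illustration:

```python
import numpy as np

# Invented sample values: 3 data objects with 2 features each
data = np.array([[0.0, 0.0],
                 [3.0, 4.0],
                 [1.0, 1.0]])
# 2 BMUs: 2 feature columns followed by 2 extra columns
bmus = np.array([[0.0, 0.0, 10.0, 0.5],
                 [3.0, 0.0, 20.0, 0.7]])

# (3,1,2) - (1,2,2) broadcasts to (3,2,2): one difference vector per (object, BMU) pair
diff = data[:, None, :] - bmus[None, :, :-2]

# Taking the norm along the last axis leaves one distance per pair
dist = np.linalg.norm(diff, axis=-1)
print(dist.shape)  # (3, 2)
```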

Answer 2

Thanks for your help, I managed to implement the pseudocode. Here is the final program:

import numpy as np


def euclidean(v1, v2):
    dist = np.linalg.norm(v1 - v2)
    return dist


def makeKNN(dataSet, BMUSet, k, fileOut, test=False):
    # take input files
    BMUs = np.genfromtxt(BMUSet, delimiter=',')
    data = np.genfromtxt(dataSet, delimiter=',')

    # drop the header row; 'final' keeps every column for the output file
    final = data[1:, :]
    if not test:
        data = data[1:, :]
    else:
        # test files: also drop the last two columns
        data = data[1:, :-2]

    # Calculate all the distances between data and BMUs, then reorder the
    # BMUs for each data row by increasing distance
    dist = np.array([[euclidean(d, b[:-2]) for b in BMUs] for d in data])
    BMU_K = np.array([BMUs[np.argsort(d)] for d in dist])

    # mean of column 5 over the closest k BMUs
    Z = np.array([[np.sum(b[:k].T[5]) / k] for b in BMU_K])

    # error propagation
    Z_err = np.array([[np.sqrt(np.sum(np.power(b[:k].T[5], 2)))] for b in BMU_K])

    # Adding z estimates and errors to the data
    final = np.concatenate((final, Z, Z_err), axis=1)

    # write the output file
    np.savetxt(fileOut, final, delimiter=',')
    print('So long, and thanks for all the fish')
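As a possible follow-up, the per-row sorting and averaging could probably be done without list comprehensions. This is only a sketch on random stand-in arrays (dist, BMUs and k here are invented, not read from the CSV files); it checks that an argsort along axis 1 gives the same Z as the loop form used in the program:

```python
import numpy as np

rng = np.random.default_rng(0)
dist = rng.random((4, 6))   # invented distances: 4 data rows, 6 BMUs
BMUs = rng.random((6, 8))   # invented BMU table with 8 columns
k = 3

# indices of the BMUs for each data row, nearest first
order = np.argsort(dist, axis=1)

# the k nearest BMU rows per data row, shape (4, k, 8)
nearest = BMUs[order[:, :k]]

# mean of column 5 over the k nearest BMUs, kept as a column vector
Z = nearest[:, :, 5].mean(axis=1, keepdims=True)

# the loop form from the program, for comparison
Z_loop = np.array([[np.sum(BMUs[np.argsort(d)][:k].T[5]) / k] for d in dist])
print(np.allclose(Z, Z_loop))
```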

Thank you very much, I hope this code will help someone else in the future :)

Python: constructing a matrix by iterating over arrays
