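My code computes the Euclidean distance between every pair of points in a set of samples. What I would like to know is whether this is, in general, the most efficient way to apply some operation between all elements of a set and then plot the result, for example to build a correlation matrix.
The sample index is used to initialize the DataFrame and provide the labels. The 3D coordinates are then supplied as tuples in three_D_coordinate_tuple_list, but these could just as easily be any measurement, and the distance variable could hold the result of any operation. I am curious whether there is a more efficient solution than building each column and then merging them again with pandas or numpy. Is my solution hogging memory? How can I make it cleaner?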
import pandas as pd

# euclidean_dist_threeD_for_tuples is the asker's own helper; its definition is not shown here.
def euclidean_distance_matrix_maker(three_D_coordinate_tuple_list, index_of_samples):
    # three_D_coordinate_tuple_list: list of (x, y, z) tuples
    # index_of_samples: well_id or index, as a Series or list
    n = len(three_D_coordinate_tuple_list)
    distance_matrix_df = pd.DataFrame(index_of_samples)
    for i in range(n):
        column = []
        # iterate over all elements and compute each one's distance to element i
        for j in range(n):
            distance = euclidean_dist_threeD_for_tuples(
                three_D_coordinate_tuple_list[i],
                three_D_coordinate_tuple_list[j])
            column.append(distance)
        # wrap the distances in a one-column DataFrame and append it
        # column-wise to the output matrix with concat
        new_column = pd.DataFrame(column)
        distance_matrix_df = pd.concat([distance_matrix_df, new_column], axis=1)
    # use the first column (the sample labels) as the index, then drop it
    distance_matrix_df = distance_matrix_df.set_index(distance_matrix_df.iloc[:, 0])
    distance_matrix_df = distance_matrix_df.iloc[:, 1:]
    distance_matrix_df.columns = distance_matrix_df.index
    return distance_matrix_df
Answer 0 (score: 2)
import numpy as np
x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
Using scipy.spatial.distance_matrix:
from scipy.spatial import distance_matrix
distance_matrix(x, x)
array([[ 0.        ,  5.19615242, 10.39230485],
       [ 5.19615242,  0.        ,  5.19615242],
       [10.39230485,  5.19615242,  0.        ]])
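Since the question asks for a labeled DataFrame, the array returned by distance_matrix can be wrapped directly instead of being assembled column by column. The following is a minimal sketch under stated assumptions: the function name labeled_distance_frame and the sample ids "A1", "A2", "A3" are hypothetical and do not come from the question.

```python
import numpy as np
import pandas as pd
from scipy.spatial import distance_matrix


def labeled_distance_frame(coords, labels):
    """Return a square DataFrame of Euclidean distances, labeled on both axes.

    Hypothetical helper for illustration; not part of the original question.
    """
    points = np.asarray(coords, dtype=float)  # shape (n, 3) for 3D tuples
    dists = distance_matrix(points, points)   # shape (n, n)
    return pd.DataFrame(dists, index=labels, columns=labels)


coords = [(1, 2, 3), (4, 5, 6), (7, 8, 9)]
labels = ["A1", "A2", "A3"]  # hypothetical sample ids
df = labeled_distance_frame(coords, labels)
print(df)
```

This builds the whole matrix in one call and allocates the DataFrame once, which avoids the repeated pd.concat inside the loop.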
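Alternatively, compute each pair only once with NumPy, using the upper-triangle indices (squareform is imported for the next step):
from scipy.spatial.distance import squareform
i, j = np.triu_indices(len(x), 1)
((x[i] - x[j]) ** 2).sum(-1) ** .5
array([ 5.19615242, 10.39230485,  5.19615242])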
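To see which pair of rows each entry of that flat result belongs to, the index arrays from np.triu_indices can be zipped together. This is only an illustrative sketch; the names pairs and dists are introduced here and are not part of the original answer.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Row and column indices of the strict upper triangle (i < j)
i, j = np.triu_indices(len(x), 1)

# One Euclidean distance per unordered pair of rows
dists = np.sqrt(((x[i] - x[j]) ** 2).sum(axis=-1))

# Pair each distance with the rows it was computed from
pairs = list(zip(i.tolist(), j.tolist()))
for (a, b), d in zip(pairs, dists):
    print(a, b, round(float(d), 4))
```

For n points this stores n(n-1)/2 distances rather than n², because the zero diagonal and the mirrored lower triangle are skipped.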
The condensed result can be turned into a square matrix with squareform:
squareform(((x[i] - x[j]) ** 2).sum(-1) ** .5)
array([[ 0.        ,  5.19615242, 10.39230485],
       [ 5.19615242,  0.        ,  5.19615242],
       [10.39230485,  5.19615242,  0.        ]])
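SciPy also provides pdist, which returns the same condensed vector directly and accepts other metrics. That may suit the question's point that the distance could be replaced by another operation. The following is a hedged sketch; the choice of "cityblock" as a second metric is only an illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

x = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Condensed Euclidean distances: one value per unordered pair (i < j)
condensed = pdist(x, metric="euclidean")

# Expand to the full symmetric matrix with a zero diagonal
square = squareform(condensed)

# Swapping the metric changes the pairwise operation, e.g. Manhattan distance
manhattan = squareform(pdist(x, metric="cityblock"))

print(square)
print(manhattan)
```

Keeping the condensed form until a square matrix is actually needed roughly halves the memory used for large sample sets.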