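I have code that computes the pairwise distances and pairwise residuals of my data (X, Y, Z). The data is quite large (about 7000 rows on average), so I care about efficiency. My initial code is: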
import tkinter as tk
from tkinter import filedialog
import pandas as pd
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Pick the Excel file through a dialog, keeping the empty Tk root window hidden
root = tk.Tk()
root.withdraw()
file_path = filedialog.askopenfilename()
data = pd.read_excel(file_path)
data = np.array(data, dtype=float)  # builtin float; the np.float alias is gone from recent NumPy
npoints, cols = data.shape
pwdistance = np.zeros((npoints, npoints))
pwresidual = np.zeros((npoints, npoints))
for i in range(npoints):
    for j in range(npoints):
        # Euclidean distance in the X-Y plane between points i and j
        pwdistance[i][j] = np.sqrt((data[:,0][i]-data[:,0][j])**2 + (data[:,1][i]-data[:,1][j])**2)
        # Squared difference of the Z values of points i and j
        pwresidual[i][j] = (data[:,2][i]-data[:,2][j])**2
For pwdistance, I changed it to the following, which works very well:
pwdistance = squareform(pdist(data[:,:2]))
Is there a pythonic way to compute my pwresidual as well, so that I don't need the loops and my code runs faster?
Answer 0 (score: 1)
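One approach is to extend the dimensions of the data[:,2] slice (the column at index 2, i.e. Z) to form a 2D array of shape (n, 1), and then subtract the 1D slice itself from it. Following the rules of broadcasting, the (n, 1) and (n,) operands are expanded to (n, n), so all the pairwise subtractions are performed in a vectorized manner.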
So, simply do:
pwresidual = (data[:,2,None] - data[:,2])**2
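As a sanity check, here is a minimal sketch that compares the broadcasted expression against the double loop from the question. The random `data` array is a hypothetical stand-in for the real X, Y, Z table, not the asker's data.

```python
import numpy as np

# Hypothetical stand-in for the real data: 5 points with X, Y, Z columns
rng = np.random.default_rng(0)
data = rng.random((5, 3))

# Reference result using the explicit double loop from the question
npoints = data.shape[0]
expected = np.zeros((npoints, npoints))
for i in range(npoints):
    for j in range(npoints):
        expected[i, j] = (data[i, 2] - data[j, 2]) ** 2

# Broadcasted version: (n, 1) minus (n,) gives an (n, n) array
pwresidual = (data[:, 2, None] - data[:, 2]) ** 2

# Both approaches should agree up to floating-point tolerance
assert np.allclose(pwresidual, expected)
```

The result is symmetric with a zero diagonal, as expected for squared pairwise differences.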
Step-by-step run on a small 4-row sample:
In [132]: data[:,2,None].shape # Slice extended to a 2D array
Out[132]: (4, 1)
In [133]: data[:,2].shape # Slice as 1D array
Out[133]: (4,)
In [134]: data[:,2,None] - data[:,2] # Subtractions with broadcasting
Out[134]:
array([[ 0. , 0.67791602, 0.13298141, 0.61579315],
[-0.67791602, 0. , -0.54493461, -0.06212288],
[-0.13298141, 0.54493461, 0. , 0.48281174],
[-0.61579315, 0.06212288, -0.48281174, 0. ]])
In [137]: (data[:,2,None] - data[:,2]).shape # Verify output shape
Out[137]: (4, 4)
In [138]: (data[:,2,None] - data[:,2])**2 # Finally elementwise square
Out[138]:
array([[ 0. , 0.45957013, 0.01768406, 0.3792012 ],
[ 0.45957013, 0. , 0.29695373, 0.00385925],
[ 0.01768406, 0.29695373, 0. , 0.23310717],
[ 0.3792012 , 0.00385925, 0.23310717, 0. ]])
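Beyond the answer above, a couple of equivalent spellings may be worth knowing. The sketch below is an addition, not part of the original answer: it assumes a random `data` array as a stand-in and shows `np.subtract.outer` and `pdist` with the `"sqeuclidean"` metric on the single Z column, which mirrors how pwdistance was computed in the question.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical stand-in for the real data: 6 points with X, Y, Z columns
rng = np.random.default_rng(1)
data = rng.random((6, 3))
z = data[:, 2]

# 1) Broadcasting, as in the answer
broadcast = (z[:, None] - z) ** 2

# 2) Outer subtraction: every z[i] - z[j], then squared elementwise
outer = np.subtract.outer(z, z) ** 2

# 3) pdist on the Z column alone; data[:, 2:3] keeps it 2D with shape (n, 1).
#    For one column, squared Euclidean distance equals (z[i] - z[j])**2.
condensed = pdist(data[:, 2:3], metric="sqeuclidean")
via_pdist = squareform(condensed)

assert np.allclose(broadcast, outer)
assert np.allclose(broadcast, via_pdist)
```

If memory becomes a concern, the condensed vector returned by `pdist` holds only the n*(n-1)/2 unique pairs, whereas a full 7000 x 7000 float64 matrix takes roughly 390 MB.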