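To create a data density image, I am following the calculation proposed in the Matlab code datadensity.m. It seems much simpler than any of the [python code][1] I have found. However, computing over the data points takes quite a long time. Is there a way to speed this up? Is there a more efficient way to use Python syntax and/or to accelerate the for loops? My x and y data have thousands of data points.
Here is my code: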
import math
import numpy as np
import pandas as pd
# create random data
df_density = pd.DataFrame(np.random.randn(100000, 2), columns=list('xy'))
# width, height - dimensions of the density plot
width = 256
height = 256
# minimum and maximum of the input data
limits_1 = min(df_density.x)
limits_2 = max(df_density.x)
limits_3 = min(df_density.y)
limits_4 = max(df_density.y)
# resolution
deltax = (limits_2 - limits_1) / width
deltay = (limits_4 - limits_3) / height
# amount of smear, defaults to size of pixel diagonal
fudge = math.sqrt(deltax**2 + deltay**2)
dmap = np.zeros((height, width))
for ii in range(height-1):
    yi = limits_3 + ii * deltay + deltay/2
    for jj in range(width-1):
        xi = limits_1 + jj * deltax + deltax/2
        dd = 0
        for kk in range(len(df_density)):
            dist2 = (df_density.x[kk] - xi)**2 + (df_density.y[kk] - yi)**2
            dd = dd + 1 / (dist2 + fudge)
        dmap[ii,jj] = dd
[1]: e.g. Efficient method of calculating density of irregularly spaced points
Answer 0 (score: 1)
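First, you should use range(width) and range(height) rather than range(width-1) and range(height-1). Matlab ranges include the last element, whereas Python's range excludes it.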
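The off-by-one can be seen on a tiny example. This is only an illustrative sketch; the value `height = 4` and the variable names are made up for the demonstration.

```python
# Hypothetical small grid, just to show which indices each range visits.
height = 4

# A direct port of Matlab's 1:height-1 idea drops the last row in Python.
ported_rows = list(range(height - 1))

# Python's range already excludes its end point, so this covers every row.
all_rows = list(range(height))

print(ported_rows)  # three indices: the last row of dmap would stay zero
print(all_rows)     # four indices: every row of dmap gets filled
```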
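As for performance, there are several things you can do.
First, do not use Python's built-in min and max functions. Since you are using pandas, use the pandas versions. For comparison, here is the original with the built-ins, followed by the pandas version: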
limits_1 = min(df_density.x)
limits_2 = max(df_density.x)
limits_3 = min(df_density.y)
limits_4 = max(df_density.y)
# resolution
deltax = (limits_2 - limits_1) / width
deltay = (limits_4 - limits_3) / height
# amount of smear, defaults to size of pixel diagonal
fudge = np.sqrt(deltax**2 + deltay**2)
Execution time: 34.5 ms
# minimum and maximum of the input data
mins = df_density.min()
maxs = df_density.max()
# resolution
deltas = maxs-mins
deltax = deltas.x/width
deltay = deltas.y/height
# amount of smear, defaults to size of pixel diagonal
fudge = math.sqrt(deltax**2 + deltay**2)
Execution time: 1.96 ms
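As a sanity check, the two approaches should return the same limits; only the speed differs, since the built-ins iterate over the Series element by element in Python. A minimal sketch, assuming a small random frame (the name `df` and its size are chosen here for illustration):

```python
import numpy as np
import pandas as pd

# Small random frame, standing in for df_density.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((1000, 2)), columns=list("xy"))

# pandas reductions: one call per statistic, covering both columns.
mins = df.min()
maxs = df.max()

# Python built-ins: same values, computed by iterating in Python.
builtin_limits = (min(df.x), max(df.x), min(df.y), max(df.y))
pandas_limits = (mins.x, maxs.x, mins.y, maxs.y)
```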
You can also vectorize the computation of the xi and yi values:
for ii in range(height-1):
    yi = limits_3 + ii * deltay + deltay/2
    for jj in range(width-1):
        xi = limits_1 + jj * deltax + deltax/2
Execution time: 47.1 ms
%%timeit
yis = limits_3 + np.arange(height-1)*deltay + deltay/2
xis = limits_1 + np.arange(width-1)*deltax + deltax/2
Execution time: 20.3 µs (a speedup of more than 2000×).
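You can make it faster still by ensuring that most of the arithmetic happens on scalars, so that only one multiplication and one addition touch the array: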
%%timeit
yis = limits_3+deltay/2+deltay*np.arange(height-1)
xis = limits_1+deltax/2+deltax*np.arange(width-1)
Execution time: 14.2 µs
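To check that the vectorized expression produces the same pixel centers as the loop, a small comparison can help. This is a sketch with made-up limits and grid size; it uses np.arange(width) rather than width-1, in line with the range correction noted at the start of this answer.

```python
import numpy as np

# Hypothetical limits and grid size, chosen only for the demonstration.
width, height = 8, 6
x_min, x_max = -2.0, 2.0
y_min, y_max = -1.0, 1.0
deltax = (x_max - x_min) / width
deltay = (y_max - y_min) / height

# Loop form, as in the question (one pixel center per iteration).
xis_loop = [x_min + jj * deltax + deltax / 2 for jj in range(width)]
yis_loop = [y_min + ii * deltay + deltay / 2 for ii in range(height)]

# Vectorized form: scalars are combined first, then applied to the array.
xis = x_min + deltax / 2 + deltax * np.arange(width)
yis = y_min + deltay / 2 + deltay * np.arange(height)
```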
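However, whatever you do, the final loop will be slow, because you will probably run out of memory if you try to vectorize the whole thing: a full array of 100000 × 256 × 256 float64 values would need about 52 GB. It can be partially vectorized, which speeds it up considerably (although it still takes a few minutes to run):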
import numpy as np
import pandas as pd
# create random data
df_density = pd.DataFrame(np.random.randn(100000, 2), columns=list('xy'))
# width, height - dimensions of the density plot
width = 256
height = 256
# minimum and maximum of the input data
df_max = df_density.max()
df_min = df_density.min()
x_min = df_min.x
y_min = df_min.y
# resolution
deltas = df_max-df_min
deltax = deltas.x/width
deltay = deltas.y/height
# amount of smear, defaults to size of pixel diagonal
fudge = np.sqrt(deltax**2 + deltay**2)
dmap = np.zeros((height, width))
yis = y_min+deltay/2+deltay*np.arange(height)
xis = x_min+deltax/2+deltax*np.arange(width)
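# np.meshgrid(xis, yis) returns the x grid first, then the y grid
xiss, yiss = np.meshgrid(xis, yis)
# one vectorized update of the whole map per data point
for x, y in df_density.values:
    dmap += 1. / (fudge + (x - xiss)**2 + (y - yiss)**2)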
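A possible middle ground between the per-point loop above and full vectorization is to process the points in chunks, so that each step broadcasts a block of points against the grid. The sketch below is one way to do that, not a tuned implementation: the function name `density_map` and the `chunk` parameter are made up here, and the temporary array holds chunk × height × width floats, so `chunk` has to be sized to the available memory.

```python
import numpy as np

def density_map(x, y, width=256, height=256, chunk=64):
    """Sum 1 / (dist^2 + fudge) over all points, for every pixel center."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_min, y_min = x.min(), y.min()
    # resolution
    deltax = (x.max() - x_min) / width
    deltay = (y.max() - y_min) / height
    # amount of smear, defaults to size of pixel diagonal
    fudge = np.sqrt(deltax**2 + deltay**2)
    # pixel centers
    xis = x_min + deltax / 2 + deltax * np.arange(width)
    yis = y_min + deltay / 2 + deltay * np.arange(height)

    dmap = np.zeros((height, width))
    for start in range(0, len(x), chunk):
        xc = x[start:start + chunk]
        yc = y[start:start + chunk]
        dx2 = (xc[:, None] - xis[None, :]) ** 2  # shape (n, width)
        dy2 = (yc[:, None] - yis[None, :]) ** 2  # shape (n, height)
        # squared distances for this block, shape (n, height, width)
        dist2 = dy2[:, :, None] + dx2[:, None, :]
        dmap += (1.0 / (dist2 + fudge)).sum(axis=0)
    return dmap
```

With the data from the question this would be called as `density_map(df_density.x.values, df_density.y.values)`. The total amount of arithmetic is unchanged, so any gain comes from fewer Python-level iterations; it is worth timing on your own data before relying on it.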