How to speed up the calculation of data density in Python (following Matlab's datadensity.m)

Time: 2015-02-26 11:00:11

Tags: python matlab

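To create a data density image, I am following the calculation proposed in the Matlab code datadensity.m. It seems much simpler than any [Python code][1] I have found. However, the computation over the data points takes quite a long time. Is there a way to speed it up? Is there a more efficient way to use Python syntax and/or to speed up the for loops? My x and y data have thousands of data points.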

Here is my code:

import math

import numpy as np
import pandas as pd

# create random data
df_density = pd.DataFrame(np.random.randn(100000, 2), columns=list('xy'))

# width, height - dimensions of the density plot
width = 256
height = 256
# minimum and maximum of the input data
limits_1 = min(df_density.x)
limits_2 = max(df_density.x)
limits_3 = min(df_density.y)
limits_4 = max(df_density.y)
# resolution
deltax = (limits_2 - limits_1) / width
deltay = (limits_4 - limits_3) / height
# amount of smear, defaults to size of pixel diagonal
fudge = math.sqrt(deltax**2 + deltay**2)

dmap = np.zeros((height, width))
for ii in range(height-1):
    yi = limits_3 + ii * deltay + deltay/2
    for jj in range(width-1):
        xi = limits_1 + jj * deltax + deltax/2
        dd = 0
        for kk in range(len(df_density)):
            dist2 = (df_density.x[kk] - xi)**2 + (df_density.y[kk] - yi)**2
            dd = dd + 1 / (dist2 + fudge)
        dmap[ii,jj] = dd

[1]: e.g. Efficient method of calculating density of irregularly spaced points

1 answer:

Answer 0 (score: 1)

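First, you should use range(width) and range(height), not range(width-1) and range(height-1). This is because Matlab includes the last element of a range, whereas Python does not.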
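To make the off-by-one concrete, here is a minimal sketch. It assumes the Matlab loop runs over something like 0:height-1 (I have not quoted the Matlab source here), and the grid size of 4 is a hypothetical value chosen only for illustration.

```python
# Matlab's 0:height-1 includes its end point, so it visits `height` values.
# Python's range(height) visits the same values, 0 .. height-1,
# whereas range(height - 1) stops one short and leaves the last row empty.
height = 4  # hypothetical small grid size, for illustration only

full = list(range(height))
short = list(range(height - 1))

print(full)   # [0, 1, 2, 3]
print(short)  # [0, 1, 2]
```

With range(height-1) and range(width-1), the last row and last column of dmap would simply stay at zero.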

As for performance, there are many things you can do.

First, don't use Python's built-in min and max functions. Since you are using pandas, use the pandas versions instead. With the built-ins:

limits_1 = min(df_density.x)
limits_2 = max(df_density.x)
limits_3 = min(df_density.y)
limits_4 = max(df_density.y)
# resolution
deltax = (limits_2 - limits_1) / width
deltay = (limits_4 - limits_3) / height
# amount of smear, defaults to size of pixel diagonal
fudge = np.sqrt(deltax**2 + deltay**2)

Execution time: 34.5 ms

With the pandas methods:

# minimum and maximum of the input data
mins = df_density.min()
maxs = df_density.max()
# resolution
deltas = maxs-mins
deltax = deltas.x/width
deltay = deltas.y/height
# amount of smear, defaults to size of pixel diagonal
fudge = math.sqrt(deltax**2 + deltay**2)

Execution time: 1.96 ms

You can also vectorize the computation of the pixel coordinates. The loop version:

for ii in range(height):
    yi = limits_3 + ii * deltay + deltay/2
    for jj in range(width):
        xi = limits_1 + jj * deltax + deltax/2

Execution time: 47.1 ms

The vectorized version:

%%timeit
yis = limits_3 + np.arange(height)*deltay + deltay/2
xis = limits_1 + np.arange(width)*deltax + deltax/2

Execution time: 20.3 μs (a speed-up of more than a factor of 2000).

You can make it faster still by making sure that most of the arithmetic happens on scalars rather than on arrays:

%%timeit
yis = limits_3+deltay/2+deltay*np.arange(height)
xis = limits_1+deltax/2+deltax*np.arange(width)

Execution time: 14.2 μs
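As a sanity check, the following sketch confirms that the vectorized expression gives the same pixel centres as the loop. The values of x_min, deltax and width are hypothetical stand-ins for limits_1, deltax and width above, chosen so the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical values standing in for limits_1, deltax and width.
x_min, deltax, width = -1.0, 0.5, 4

# Loop version: one pixel centre per iteration.
xis_loop = [x_min + jj * deltax + deltax / 2 for jj in range(width)]

# Vectorized version: the scalar sum is formed first, so only one
# array multiplication and one array addition are needed.
xis_vec = (x_min + deltax / 2) + deltax * np.arange(width)

print(np.allclose(xis_loop, xis_vec))  # True
```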

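However, whatever you do, the final loop will be slow, because fully vectorizing it would probably exhaust your memory (with the sizes above, a 256 × 256 × 100000 array of 8-byte floats is roughly 52 GB). It can be partially vectorized, which speeds things up considerably (although it still takes a few minutes to run):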

import numpy as np
import pandas as pd

# create random data
df_density = pd.DataFrame(np.random.randn(100000, 2), columns=list('xy'))

# width, height - dimensions of the density plot
width = 256
height = 256

# minimum and maximum of the input data
df_max = df_density.max()
df_min = df_density.min()
x_min = df_min.x
y_min = df_min.y

# resolution
deltas = df_max-df_min
deltax = deltas.x/width
deltay = deltas.y/height
# amount of smear, defaults to size of pixel diagonal
fudge = np.sqrt(deltax**2 + deltay**2)

dmap = np.zeros((height, width))
yis = y_min+deltay/2+deltay*np.arange(height)
xis = x_min+deltax/2+deltax*np.arange(width)

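# np.meshgrid(xis, yis) returns the x grid first, then the y grid;
# both have shape (height, width)
xiss, yiss = np.meshgrid(xis, yis)
for x, y in df_density.values:
    dmap += 1. / (fudge + (x - xiss)**2 + (y - yiss)**2)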
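If the per-point Python loop is still the bottleneck, one possible variation is to process the points in small chunks and let broadcasting handle each chunk at once. The sketch below is only an illustration under assumptions: the function name density_map and the chunk parameter are my own, it has not been benchmarked against the loop above, and the chunk size has to be chosen so that a (chunk, height, width) array fits in memory (64 × 256 × 256 floats is about 34 MB).

```python
import numpy as np

def density_map(x, y, width, height, chunk=64):
    """Sum 1 / (fudge + squared distance) over all points, on a height x width grid.

    Hypothetical helper: points are processed `chunk` at a time to bound memory use.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_min, y_min = x.min(), y.min()
    deltax = (x.max() - x_min) / width
    deltay = (y.max() - y_min) / height
    # amount of smear, defaults to size of pixel diagonal
    fudge = np.sqrt(deltax**2 + deltay**2)

    xis = x_min + deltax / 2 + deltax * np.arange(width)    # pixel-centre x, shape (width,)
    yis = y_min + deltay / 2 + deltay * np.arange(height)   # pixel-centre y, shape (height,)

    dmap = np.zeros((height, width))
    for start in range(0, len(x), chunk):
        xc = x[start:start + chunk]
        yc = y[start:start + chunk]
        dx2 = (xc[:, None] - xis[None, :]) ** 2   # shape (chunk, width)
        dy2 = (yc[:, None] - yis[None, :]) ** 2   # shape (chunk, height)
        # Broadcast to (chunk, height, width), then sum over the points in the chunk.
        dmap += (1.0 / (fudge + dy2[:, :, None] + dx2[:, None, :])).sum(axis=0)
    return dmap

# Usage on a small random data set (sizes chosen only to keep the example quick).
rng = np.random.default_rng(0)
pts = rng.standard_normal((500, 2))
dmap = density_map(pts[:, 0], pts[:, 1], width=32, height=16)
print(dmap.shape)  # (16, 32)
```

A larger chunk means fewer Python iterations but a larger temporary array, so the best value depends on the machine.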