Why does scipy linear interpolation run faster than nearest-neighbor interpolation?

Asked: 2015-09-29 20:25:09

Tags: python scipy interpolation

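I've written a routine that interpolates point data onto a regular grid. However, I find that scipy's nearest-neighbor interpolation is almost twice as slow as the radial basis function I'm using for linear interpolation (scipy.interpolate.Rbf).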

The relevant code includes how the interpolators are constructed:

if interpolation_mode == 'linear':
    interpolator = scipy.interpolate.Rbf(
        point_array[:, 0], point_array[:, 1], value_array,
        function='linear', smooth=.01)
elif interpolation_mode == 'nearest':
    interpolator = scipy.interpolate.NearestNDInterpolator(
        point_array, value_array)

and how the interpolator is called:

result = interpolator(col_coords.ravel(), row_coords.ravel())

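The sample I'm running has 27 input points with values, and I'm interpolating across a nearly 20000 x 20000 grid. (I'm doing this in memory-sized blocks, so I'm not blowing up the computer.)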
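The block-wise evaluation mentioned above could look roughly like the sketch below. It is not the pygeoprocessing implementation: `interpolate_in_blocks`, `block_rows`, and the small 600 x 400 grid are hypothetical names and sizes chosen for illustration, and the whole result is kept in memory here, whereas a real raster workflow would write each band of rows to disk.

```python
import numpy as np
from scipy.interpolate import NearestNDInterpolator


def interpolate_in_blocks(interpolator, n_rows, n_cols, block_rows=256):
    """Evaluate `interpolator` on an n_rows x n_cols grid, one band of rows at a time.

    Hypothetical helper: only one band of coordinates exists in memory at once.
    """
    result = np.empty((n_rows, n_cols), dtype=float)
    col_index = np.arange(n_cols, dtype=float)
    for row_start in range(0, n_rows, block_rows):
        row_stop = min(row_start + block_rows, n_rows)
        row_index = np.arange(row_start, row_stop, dtype=float)
        # Both arrays have shape (rows in this band, n_cols).
        col_coords, row_coords = np.meshgrid(col_index, row_index)
        block = interpolator(col_coords.ravel(), row_coords.ravel())
        result[row_start:row_stop, :] = block.reshape(col_coords.shape)
    return result


rng = np.random.default_rng(0)
point_array = rng.uniform(0, 400, size=(27, 2))  # 27 scattered (col, row) points
value_array = rng.uniform(0, 1, size=27)
interpolator = NearestNDInterpolator(point_array, value_array)

grid = interpolate_in_blocks(interpolator, n_rows=600, n_cols=400)
print(grid.shape)
```

The band height only trades memory for the number of interpolator calls; the total number of query points is unchanged.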

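Below are the results of two cProfile runs on the relevant code. Note that the nearest-neighbor scheme runs in about 406 seconds, while the linear scheme runs in about 226 seconds. The nearest scheme is dominated by calls to scipy's cKDTree, which seems reasonable, except that the Rbf outperforms it by a wide margin. Any ideas why, or what I could do to make my nearest scheme run faster than linear?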

Linear run:

     25362 function calls in 225.886 seconds

   Ordered by: internal time
   List reduced from 328 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      253  169.302    0.669  207.516    0.820 C:\Python27\lib\site-packages\scipy\interpolate\rbf.py:112(_euclidean_norm)
      258   38.211    0.148   38.211    0.148 {method 'reduce' of 'numpy.ufunc' objects}
      252    6.069    0.024    6.069    0.024 {numpy.core._dotblas.dot}
        1    5.077    5.077  225.332  225.332 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post28+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:333(interpolate_points_uri)
      252    1.849    0.007    2.137    0.008 C:\Python27\lib\site-packages\numpy\lib\function_base.py:3285(meshgrid)
      507    1.419    0.003    1.419    0.003 {method 'flatten' of 'numpy.ndarray' objects}
     1268    1.368    0.001    1.368    0.001 {numpy.core.multiarray.array}
      252    1.018    0.004    1.018    0.004 {_gdal_array.BandRasterIONumPy}
        1    0.533    0.533  225.886  225.886 pygeoprocessing\tests\helper_driver.py:10(interpolate)
      252    0.336    0.001  216.716    0.860 C:\Python27\lib\site-packages\scipy\interpolate\rbf.py:225(__call__)

Nearest-neighbor run:

     27539 function calls in 405.624 seconds

   Ordered by: internal time
   List reduced from 309 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      252  397.806    1.579  397.822    1.579 {method 'query' of 'ckdtree.cKDTree' objects}
      252    1.875    0.007    1.881    0.007 {scipy.interpolate.interpnd._ndim_coords_from_arrays}
      252    1.831    0.007    2.101    0.008 C:\Python27\lib\site-packages\numpy\lib\function_base.py:3285(meshgrid)
      252    1.034    0.004  400.739    1.590 C:\Python27\lib\site-packages\scipy\interpolate\ndgriddata.py:60(__call__)
        1    1.009    1.009  405.030  405.030 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post28+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:333(interpolate_points_uri)
      252    0.719    0.003    0.719    0.003 {_gdal_array.BandRasterIONumPy}
        1    0.509    0.509  405.624  405.624 pygeoprocessing\tests\helper_driver.py:10(interpolate)
      252    0.261    0.001    0.261    0.001 {numpy.core.multiarray.copyto}
       27    0.125    0.005    0.125    0.005 {_ogr.Layer_CreateFeature}
        1    0.116    0.116    0.254    0.254 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post28+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:362(_parse_point_data)
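For anyone wanting to reproduce listings of this shape, a minimal sketch with the standard-library `cProfile` and `pstats` modules follows. The function `run_interpolation` and its small 300 x 300 grid are stand-ins invented for the example, not the code that was profiled above.

```python
import cProfile
import io
import pstats

import numpy as np
from scipy.interpolate import NearestNDInterpolator


def run_interpolation():
    """Hypothetical stand-in workload: 27 scattered points, 300 x 300 grid."""
    rng = np.random.default_rng(0)
    points = rng.random((27, 2))
    values = rng.random(27)
    grid_x, grid_y = np.mgrid[0:1:300j, 0:1:300j]
    interpolator = NearestNDInterpolator(points, values)
    return interpolator(grid_x.ravel(), grid_y.ravel())


profiler = cProfile.Profile()
profiler.enable()
result = run_interpolation()
profiler.disable()

# Sort by internal time (the tottime column) and keep the top 10 entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('time').print_stats(10)
report = stream.getvalue()
print(report)
```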

For reference, I'm also including the visual results of these two test cases.

Nearest:

[image: Nearest]

Linear:

[image: Linear]

1 answer:

Answer 0 (score: 2)

Running the example from the griddata doc:

In [47]: def func(x, y):
   ....:     return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2
   ....: 
In [48]: points = np.random.rand(1000, 2)
In [49]: values = func(points[:,0], points[:,1])
In [50]: grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j]

So we have 1000 scattered points and will be interpolating at 20,000 grid points.

In [52]: timeit interpolate.griddata(points, values, (grid_x, grid_y),
    method='nearest')
10 loops, best of 3: 83.6 ms per loop

In [53]: timeit interpolate.griddata(points, values, (grid_x, grid_y),
    method='linear')
1 loops, best of 3: 24.6 ms per loop

In [54]: timeit interpolate.griddata(points, values, (grid_x, grid_y), 
    method='cubic')
10 loops, best of 3: 42.7 ms per loop

And with the two-stage interpolators (construct first, then evaluate):

In [55]: %%timeit 
rbfi = interpolate.Rbf(points[:,0],points[:,1],values)
dl = rbfi(grid_x.ravel(),grid_y.ravel())
   ....: 
1 loops, best of 3: 3.89 s per loop

In [56]: %%timeit 
ndi=interpolate.NearestNDInterpolator(points, values)
dl=ndi(grid_x.ravel(),grid_y.ravel())
   ....: 
10 loops, best of 3: 82.6 ms per loop

In [57]: %%timeit 
ldi=interpolate.LinearNDInterpolator(points, values)
dl=ldi(grid_x.ravel(),grid_y.ravel())
 ....
10 loops, best of 3: 25.1 ms per loop

griddata is actually a one-step wrapper around the last two versions.

griddata describes its methods as:

nearest
return the value at the data point closest to the point of
   interpolation. See NearestNDInterpolator for more details.
   Uses scipy.spatial.cKDTree

linear
tesselate the input point set to n-dimensional simplices, 
   and interpolate linearly on each simplex. 
   LinearNDInterpolator details are:
      The interpolant is constructed by triangulating the 
      input data with Qhull [R37], and on each triangle 
      performing linear barycentric interpolation.

cubic (2-D)
return the value determined from a piecewise cubic, continuously 
   differentiable (C1), and approximately curvature-minimizing 
   polynomial surface. See CloughTocher2DInterpolator for more details.

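Further tests on the two-stage versions show that setting up the cKDTree for the nearest interpolator is very fast; most of the time is spent in the second stage, the interpolation itself.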

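On the other hand, for the linear interpolator, setting up the triangulated surface takes longer than the interpolation itself.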
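The split between setup and evaluation can be measured with something like the sketch below. It reuses the griddata doc example, but `time_stages` is a hypothetical helper, the random points are seeded only so the run is repeatable, and the absolute numbers will depend on the machine and the scipy version, so no timings are quoted.

```python
import time

import numpy as np
from scipy import interpolate


def func(x, y):
    return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2


rng = np.random.default_rng(0)
points = rng.random((1000, 2))
values = func(points[:, 0], points[:, 1])
grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j]


def time_stages(interpolator_class):
    """Return (setup seconds, evaluation seconds, output) for one interpolator class."""
    t0 = time.perf_counter()
    interp = interpolator_class(points, values)       # stage 1: build tree / triangulation
    t1 = time.perf_counter()
    out = interp(grid_x.ravel(), grid_y.ravel())      # stage 2: evaluate on the grid
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, out


timings = {}
for cls in (interpolate.NearestNDInterpolator, interpolate.LinearNDInterpolator):
    setup_s, eval_s, out = time_stages(cls)
    timings[cls.__name__] = (setup_s, eval_s, out.shape)
    print("{}: setup {:.4f} s, evaluate {:.4f} s".format(cls.__name__, setup_s, eval_s))
```

A single `perf_counter` measurement is noisy; `timeit` on each stage separately gives steadier numbers.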

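I don't know enough about the Rbf method to explain why it is so much slower here. The underlying methods are so different that intuitions developed from simple, by-hand interpolation don't carry much weight.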
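As a rough illustration of what a linear radial basis function does, the sketch below rebuilds one by hand. It rests on an assumption about the classic `scipy.interpolate.Rbf` with `function='linear'` and the default `smooth=0`: weights are solved once from the node-to-node distance matrix, and each evaluation is the query-to-node distance matrix times those weights. That would be consistent with `_euclidean_norm` dominating the linear profile in the question. `pairwise_distances` is a hypothetical helper, not a scipy function.

```python
import numpy as np
from scipy.interpolate import Rbf

rng = np.random.default_rng(0)
nodes = rng.random((27, 2))        # scattered input points
node_values = rng.random(27)


def pairwise_distances(a, b):
    """Euclidean distance from every row of `a` to every row of `b`."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


# Setup: with phi(r) = r, solve A w = values, where A holds node-to-node distances.
A = pairwise_distances(nodes, nodes)
weights = np.linalg.solve(A, node_values)

# Evaluation: one distance per (query point, node) pair, then a matrix-vector product.
grid_x, grid_y = np.mgrid[0:1:50j, 0:1:60j]
queries = np.column_stack([grid_x.ravel(), grid_y.ravel()])
manual = pairwise_distances(queries, nodes).dot(weights)

# Cross-check against scipy's Rbf (assumed equivalent for function='linear', smooth=0).
reference = Rbf(nodes[:, 0], nodes[:, 1], node_values, function='linear')(
    grid_x.ravel(), grid_y.ravel())
max_diff = np.abs(manual - reference).max()
print("max difference from scipy Rbf:", max_diff)
```

Under that assumption, evaluation cost grows with (number of query points) x (number of nodes), so 1000 nodes is a very different workload from 27.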

Your example starts with far fewer scattered points and interpolates onto a much finer grid.
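To test in a regime closer to the question's, something like the sketch below could be used: 27 scattered points and a finer grid, here 400 x 400 so it runs quickly. It times `NearestNDInterpolator` and the linear `Rbf` from the question, and also a plain NumPy brute-force nearest-neighbor lookup. The brute-force function is a swapped-in alternative to the kd-tree, not something scipy provides, and whether it is faster is left for the measurement to show. `timed` and `brute_force_nearest` are hypothetical helpers.

```python
import time

import numpy as np
from scipy.interpolate import NearestNDInterpolator, Rbf

rng = np.random.default_rng(0)
point_array = rng.random((27, 2))
value_array = rng.random(27)
grid_x, grid_y = np.mgrid[0:1:400j, 0:1:400j]
qx, qy = grid_x.ravel(), grid_y.ravel()


def timed(fn):
    """Run fn() once and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    out = fn()
    return out, time.perf_counter() - t0


def brute_force_nearest(points, values, x, y):
    """Nearest neighbor by computing all squared distances; memory is O(queries x points)."""
    d2 = (x[:, None] - points[None, :, 0]) ** 2 + (y[:, None] - points[None, :, 1]) ** 2
    return values[d2.argmin(axis=1)]


nearest = NearestNDInterpolator(point_array, value_array)
linear_rbf = Rbf(point_array[:, 0], point_array[:, 1], value_array,
                 function='linear', smooth=.01)

nearest_out, nearest_s = timed(lambda: nearest(qx, qy))
rbf_out, rbf_s = timed(lambda: linear_rbf(qx, qy))
brute_out, brute_s = timed(lambda: brute_force_nearest(point_array, value_array, qx, qy))

print("NearestNDInterpolator: {:.4f} s".format(nearest_s))
print("Rbf (linear):          {:.4f} s".format(rbf_s))
print("brute-force nearest:   {:.4f} s".format(brute_s))
```

The brute-force version allocates a queries x points array, so on a grid the size of the one in the question it would need the same block-wise treatment.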