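I wrote a routine that interpolates point data onto a regular grid. However, I found that scipy's implementation of nearest-neighbor interpolation runs almost twice as slowly as the radial basis function I'm using for linear interpolation (scipy.interpolate.Rbf).
The relevant code includes how the interpolator is constructed: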
if interpolation_mode == 'linear':
    interpolator = scipy.interpolate.Rbf(
        point_array[:, 0], point_array[:, 1], value_array,
        function='linear', smooth=.01)
elif interpolation_mode == 'nearest':
    interpolator = scipy.interpolate.NearestNDInterpolator(
        point_array, value_array)
and how the interpolator is called:
result = interpolator(col_coords.ravel(), row_coords.ravel())
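The sample I'm running has 27 input points to interpolate from, and I'm interpolating over a nearly 20000 x 20000 grid. (I do this in memory-block-sized chunks, so I don't blow up the computer.)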
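For readers who want to reproduce the block-wise evaluation described above, here is a minimal sketch. It is not the pygeoprocessing implementation: the helper name `interpolate_in_blocks`, the `block_rows` parameter, the random test points, and the small grid size are all assumptions made for illustration.

```python
import numpy as np
from scipy.interpolate import NearestNDInterpolator


def interpolate_in_blocks(interpolator, n_rows, n_cols, block_rows=256):
    """Evaluate `interpolator` on an n_rows x n_cols grid, one band of rows at a time.

    Hypothetical helper: only one band of coordinates is held in memory at once.
    """
    result = np.empty((n_rows, n_cols), dtype=float)
    col_index = np.arange(n_cols)
    for row_start in range(0, n_rows, block_rows):
        row_stop = min(row_start + block_rows, n_rows)
        row_index = np.arange(row_start, row_stop)
        # Coordinates for this band only, shape (band_rows, n_cols).
        col_coords, row_coords = np.meshgrid(col_index, row_index)
        block = interpolator(col_coords.ravel(), row_coords.ravel())
        result[row_start:row_stop, :] = block.reshape(col_coords.shape)
    return result


# Assumed test data: 27 scattered (col, row) points, as in the question.
rng = np.random.default_rng(0)
point_array = rng.uniform(0, 400, size=(27, 2))
value_array = rng.uniform(0, 1, size=27)

interpolator = NearestNDInterpolator(point_array, value_array)
grid = interpolate_in_blocks(interpolator, n_rows=500, n_cols=400, block_rows=64)
```

The same helper accepts an `Rbf` instance, since both interpolators can be called with separate coordinate arrays.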
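Below are the results of two cProfile runs on the relevant code. Note that the nearest-neighbor scheme runs in 406 seconds, while the linear scheme runs in about 226 seconds. The nearest scheme is dominated by calls to scipy's cKDTree, which seems reasonable, except that the rbf outperforms it by a significant margin. Any ideas why, or what I could do to make my nearest scheme run faster than linear?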
Linear run:
25362 function calls in 225.886 seconds
Ordered by: internal time
List reduced from 328 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
253 169.302 0.669 207.516 0.820 C:\Python27\lib\site-packages\scipy\interpolate\rbf.py:112(_euclidean_norm)
258 38.211 0.148 38.211 0.148 {method 'reduce' of 'numpy.ufunc' objects}
252 6.069 0.024 6.069 0.024 {numpy.core._dotblas.dot}
1 5.077 5.077 225.332 225.332 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post28+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:333(interpolate_points_uri)
252 1.849 0.007 2.137 0.008 C:\Python27\lib\site-packages\numpy\lib\function_base.py:3285(meshgrid)
507 1.419 0.003 1.419 0.003 {method 'flatten' of 'numpy.ndarray' objects}
1268 1.368 0.001 1.368 0.001 {numpy.core.multiarray.array}
252 1.018 0.004 1.018 0.004 {_gdal_array.BandRasterIONumPy}
1 0.533 0.533 225.886 225.886 pygeoprocessing\tests\helper_driver.py:10(interpolate)
252 0.336 0.001 216.716 0.860 C:\Python27\lib\site-packages\scipy\interpolate\rbf.py:225(__call__)
Nearest neighbor run:
27539 function calls in 405.624 seconds
Ordered by: internal time
List reduced from 309 to 10 due to restriction <10>
ncalls tottime percall cumtime percall filename:lineno(function)
252 397.806 1.579 397.822 1.579 {method 'query' of 'ckdtree.cKDTree' objects}
252 1.875 0.007 1.881 0.007 {scipy.interpolate.interpnd._ndim_coords_from_arrays}
252 1.831 0.007 2.101 0.008 C:\Python27\lib\site-packages\numpy\lib\function_base.py:3285(meshgrid)
252 1.034 0.004 400.739 1.590 C:\Python27\lib\site-packages\scipy\interpolate\ndgriddata.py:60(__call__)
1 1.009 1.009 405.030 405.030 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post28+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:333(interpolate_points_uri)
252 0.719 0.003 0.719 0.003 {_gdal_array.BandRasterIONumPy}
1 0.509 0.509 405.624 405.624 pygeoprocessing\tests\helper_driver.py:10(interpolate)
252 0.261 0.001 0.261 0.001 {numpy.core.multiarray.copyto}
27 0.125 0.005 0.125 0.005 {_ogr.Layer_CreateFeature}
1 0.116 0.116 0.254 0.254 C:\Python27\lib\site-packages\pygeoprocessing-0.3.0a8.post28+n5b1ee2de0d07-py2.7-win32.egg\pygeoprocessing\geoprocessing.py:362(_parse_point_data)
For reference, I'm also including the visual results of these two test cases.
[image: nearest]
[image: linear]
Answer 0 (score: 2)
Running the example from the griddata doc:
In [47]: def func(x, y):
   ....:     return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2
   ....:
In [48]: points = np.random.rand(1000, 2)
In [49]: values = func(points[:,0], points[:,1])
In [50]: grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j]
So we have 1000 scattered points and will interpolate at 20,000 grid points.
In [52]: timeit interpolate.griddata(points, values, (grid_x, grid_y),
method='nearest')
10 loops, best of 3: 83.6 ms per loop
In [53]: timeit interpolate.griddata(points, values, (grid_x, grid_y),
method='linear')
1 loops, best of 3: 24.6 ms per loop
In [54]: timeit interpolate.griddata(points, values, (grid_x, grid_y),
method='cubic')
10 loops, best of 3: 42.7 ms per loop
And the two-stage interpolators:
In [55]: %%timeit
rbfi = interpolate.Rbf(points[:,0],points[:,1],values)
dl = rbfi(grid_x.ravel(),grid_y.ravel())
....:
1 loops, best of 3: 3.89 s per loop
In [56]: %%timeit
ndi=interpolate.NearestNDInterpolator(points, values)
dl=ndi(grid_x.ravel(),grid_y.ravel())
....:
10 loops, best of 3: 82.6 ms per loop
In [57]: %%timeit
ldi=interpolate.LinearNDInterpolator(points, values)
dl=ldi(grid_x.ravel(),grid_y.ravel())
....
10 loops, best of 3: 25.1 ms per loop
griddata is actually a one-step cover call for the last two versions.
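A quick way to check that claim is to compare the one-step and two-stage results directly. This is a sketch under the same setup as the doc example; the seeded generator and the `same_*` variable names are assumptions added here.

```python
import numpy as np
from scipy import interpolate


def func(x, y):
    return x*(1-x)*np.cos(4*np.pi*x) * np.sin(4*np.pi*y**2)**2


rng = np.random.default_rng(0)  # seeded so the comparison is repeatable
points = rng.random((1000, 2))
values = func(points[:, 0], points[:, 1])
grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j]

# One-step calls.
nearest_one_step = interpolate.griddata(
    points, values, (grid_x, grid_y), method='nearest')
linear_one_step = interpolate.griddata(
    points, values, (grid_x, grid_y), method='linear')

# Two-stage calls: build the interpolator, then evaluate it.
nearest_two_stage = interpolate.NearestNDInterpolator(points, values)(grid_x, grid_y)
linear_two_stage = interpolate.LinearNDInterpolator(points, values)(grid_x, grid_y)

same_nearest = np.array_equal(nearest_one_step, nearest_two_stage)
# Grid points outside the convex hull are NaN in both linear results.
same_linear = np.allclose(linear_one_step, linear_two_stage, equal_nan=True)
print(same_nearest, same_linear)
```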
griddata describes its methods as:
nearest
return the value at the data point closest to the point of
interpolation. See NearestNDInterpolator for more details.
Uses scipy.spatial.cKDTree
linear
tesselate the input point set to n-dimensional simplices,
and interpolate linearly on each simplex.
LinearNDInterpolator details are:
The interpolant is constructed by triangulating the
input data with Qhull [R37], and on each triangle
performing linear barycentric interpolation.
cubic (2-D)
return the value determined from a piecewise cubic, continuously
differentiable (C1), and approximately curvature-minimizing
polynomial surface. See CloughTocher2DInterpolator for more details.
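To make the difference between the first two methods concrete, the following sketch reproduces them by hand: a k-d tree lookup for nearest, and a Delaunay triangulation with barycentric weights for linear. It only illustrates the approach the docs describe and is not scipy's internal code. The random data and the choice to keep query points well inside the unit square (so none fall outside the convex hull) are assumptions.

```python
import numpy as np
from scipy.interpolate import LinearNDInterpolator, NearestNDInterpolator
from scipy.spatial import Delaunay, cKDTree

rng = np.random.default_rng(0)
points = rng.random((1000, 2))
values = rng.random(1000)
# Query points kept away from the border so no fill values are involved.
xi = rng.uniform(0.2, 0.8, size=(5000, 2))

# nearest: find the index of the closest data point in a k-d tree.
_, nearest_index = cKDTree(points).query(xi)
nearest_by_hand = values[nearest_index]

# linear: triangulate, locate the triangle containing each query point,
# then weight the three vertex values by the barycentric coordinates.
tri = Delaunay(points)
simplex = tri.find_simplex(xi)
transform = tri.transform[simplex]  # shape (n, 3, 2)
partial = np.einsum('ijk,ik->ij', transform[:, :2], xi - transform[:, 2])
bary = np.c_[partial, 1 - partial.sum(axis=1)]  # shape (n, 3)
linear_by_hand = np.einsum('ij,ij->i', values[tri.simplices[simplex]], bary)

nearest_matches = np.array_equal(
    nearest_by_hand, NearestNDInterpolator(points, values)(xi))
linear_matches = np.allclose(
    linear_by_hand, LinearNDInterpolator(points, values)(xi))
print(nearest_matches, linear_matches)
```

The nearest method does all of its real work per query point, while the linear method pays for the triangulation once and then does a cheap lookup per query point.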
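Further testing on the two-stage versions shows that setting up the nearest cKDTree is very fast; most of the time is spent in the second stage, the interpolation itself.
On the other hand, for the linear version, setting up the triangulated surface takes longer than the interpolation.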
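The split between setup and evaluation can be measured with a small helper like the one below. This is a sketch, not the test the answer actually ran: the `time_stages` name is hypothetical, and the absolute numbers will depend on the machine and the scipy version.

```python
import time

import numpy as np
from scipy import interpolate


def time_stages(interpolator_class, points, values, xi):
    """Return (setup_seconds, evaluation_seconds, result) for a two-stage interpolator."""
    t0 = time.perf_counter()
    interpolator = interpolator_class(points, values)  # stage 1: setup
    t1 = time.perf_counter()
    result = interpolator(xi)                          # stage 2: evaluation
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, result


rng = np.random.default_rng(0)
points = rng.random((1000, 2))
values = rng.random(1000)
grid_x, grid_y = np.mgrid[0:1:100j, 0:1:200j]
xi = np.column_stack([grid_x.ravel(), grid_y.ravel()])

timings = {}
for cls in (interpolate.NearestNDInterpolator, interpolate.LinearNDInterpolator):
    setup, evaluate, result = time_stages(cls, points, values, xi)
    timings[cls.__name__] = (setup, evaluate, result.shape)
    print('%s: setup %.4f s, evaluation %.4f s' % (cls.__name__, setup, evaluate))
```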
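I don't know enough about the Rbf method to say why it is so much slower. The underlying methods are so different that intuition developed with simple manual interpolation methods doesn't mean much.
Your example starts with fewer scattered points and interpolates on a much finer grid.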
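In that regime (a few dozen scattered points, a very large number of query points), one alternative worth timing is a brute-force nearest-neighbor lookup with plain NumPy. This is a swapped-in technique, not something the question or scipy's `NearestNDInterpolator` does. The `brute_force_nearest` helper and its `chunk` parameter are hypothetical, and whether it is actually faster has to be measured on the real data.

```python
import numpy as np
from scipy.interpolate import NearestNDInterpolator


def brute_force_nearest(points, values, xi, chunk=100000):
    """Nearest-neighbor lookup by computing every point-to-query distance.

    Hypothetical helper: memory per chunk is chunk * len(points) floats,
    which stays small when there are only a few dozen data points.
    """
    out = np.empty(len(xi), dtype=values.dtype)
    for start in range(0, len(xi), chunk):
        block = xi[start:start + chunk]
        # Squared distances, shape (block_size, n_points).
        d2 = ((block[:, None, :] - points[None, :, :]) ** 2).sum(axis=2)
        out[start:start + chunk] = values[d2.argmin(axis=1)]
    return out


# Assumed test data mimicking the question: 27 points, a finer grid.
rng = np.random.default_rng(0)
points = rng.uniform(0, 200, size=(27, 2))
values = rng.random(27)
cols, rows = np.meshgrid(np.arange(200), np.arange(200))
xi = np.column_stack([cols.ravel(), rows.ravel()]).astype(float)

brute = brute_force_nearest(points, values, xi, chunk=10000)
reference = NearestNDInterpolator(points, values)(xi)
print(np.array_equal(brute, reference))
```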