
Question

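I'm working with 2D geographical data. I have a long list of contour paths. Now I want to determine, for every point in my domain, how many contours it lies inside (i.e. I want to compute the spatial frequency distribution of the features represented by the contours).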

To illustrate what I want to do, here is a first, very naive implementation:

import numpy as np
from shapely.geometry import Polygon, Point

def comp_frequency(paths,lonlat):
    """
    - paths: list of contour paths, made up of (lon,lat) tuples
    - lonlat: array containing the lon/lat coordinates; shape (nx,ny,2)
    """
    frequency = np.zeros(lonlat.shape[:2])
    contours = [Polygon(path) for path in paths]

    # Very naive and accordingly slow implementation
    for (i,j),v in np.ndenumerate(frequency):
        pt = Point(lonlat[i,j,:])
        for contour in contours:
            if contour.contains(pt):
                frequency[i,j] += 1

    return frequency

Of course, written with explicit loops throughout, this is about as inefficient as it can be, and therefore takes forever.

How can I do this efficiently?

EDIT: Added some sample data as requested. Note that my real domain is 150**2 times larger (in terms of resolution), because I created the sample coordinates by slicing the original arrays:

lon = np.array([
    [-1.10e+1,-7.82e+0,-4.52e+0,-1.18e+0, 2.19e+0,5.59e+0,9.01e+0,1.24e+1,1.58e+1,1.92e+1,2.26e+1],
    [-1.20e+1,-8.65e+0,-5.21e+0,-1.71e+0, 1.81e+0,5.38e+0,8.97e+0,1.25e+1,1.61e+1,1.96e+1,2.32e+1],
    [-1.30e+1,-9.53e+0,-5.94e+0,-2.29e+0, 1.41e+0,5.15e+0,8.91e+0,1.26e+1,1.64e+1,2.01e+1,2.38e+1],
    [-1.41e+1,-1.04e+1,-6.74e+0,-2.91e+0, 9.76e-1,4.90e+0,8.86e+0,1.28e+1,1.67e+1,2.06e+1,2.45e+1],
    [-1.53e+1,-1.15e+1,-7.60e+0,-3.58e+0, 4.98e-1,4.63e+0,8.80e+0,1.29e+1,1.71e+1,2.12e+1,2.53e+1],
    [-1.66e+1,-1.26e+1,-8.55e+0,-4.33e+0,-3.00e-2,4.33e+0,8.73e+0,1.31e+1,1.75e+1,2.18e+1,2.61e+1],
    [-1.81e+1,-1.39e+1,-9.60e+0,-5.16e+0,-6.20e-1,3.99e+0,8.66e+0,1.33e+1,1.79e+1,2.25e+1,2.70e+1],
    [-1.97e+1,-1.53e+1,-1.07e+1,-6.10e+0,-1.28e+0,3.61e+0,8.57e+0,1.35e+1,1.84e+1,2.33e+1,2.81e+1],
    [-2.14e+1,-1.69e+1,-1.21e+1,-7.16e+0,-2.05e+0,3.17e+0,8.47e+0,1.37e+1,1.90e+1,2.42e+1,2.93e+1],
    [-2.35e+1,-1.87e+1,-1.36e+1,-8.40e+0,-2.94e+0,2.66e+0,8.36e+0,1.40e+1,1.97e+1,2.52e+1,3.06e+1],
    [-2.58e+1,-2.08e+1,-1.54e+1,-9.86e+0,-3.99e+0,2.05e+0,8.22e+0,1.44e+1,2.05e+1,2.65e+1,3.22e+1]])

lat = np.array([
    [ 29.6, 30.3, 30.9, 31.4, 31.7, 32.0, 32.1, 32.1, 31.9, 31.6, 31.2],
    [ 32.4, 33.2, 33.8, 34.4, 34.7, 35.0, 35.1, 35.1, 34.9, 34.6, 34.2],
    [ 35.3, 36.1, 36.8, 37.3, 37.7, 38.0, 38.1, 38.1, 37.9, 37.6, 37.1],
    [ 38.2, 39.0, 39.7, 40.3, 40.7, 41.0, 41.1, 41.1, 40.9, 40.5, 40.1],
    [ 41.0, 41.9, 42.6, 43.2, 43.7, 44.0, 44.1, 44.0, 43.9, 43.5, 43.0],
    [ 43.9, 44.8, 45.6, 46.2, 46.7, 47.0, 47.1, 47.0, 46.8, 46.5, 45.9],
    [ 46.7, 47.7, 48.5, 49.1, 49.6, 49.9, 50.1, 50.0, 49.8, 49.4, 48.9],
    [ 49.5, 50.5, 51.4, 52.1, 52.6, 52.9, 53.1, 53.0, 52.8, 52.4, 51.8],
    [ 52.3, 53.4, 54.3, 55.0, 55.6, 55.9, 56.1, 56.0, 55.8, 55.3, 54.7],
    [ 55.0, 56.2, 57.1, 57.9, 58.5, 58.9, 59.1, 59.0, 58.8, 58.3, 57.6],
    [ 57.7, 59.0, 60.0, 60.8, 61.5, 61.9, 62.1, 62.0, 61.7, 61.2, 60.5]])

lonlat = np.dstack((lon,lat))

paths = [
    [(-1.71,34.4),(1.81,34.7),(5.15,38.0),(4.9,41.0),(4.63,44.0),(-0.03,46.7),(-4.33,46.2),(-9.6,48.5),(-8.55,45.6),(-3.58,43.2),(-2.91,40.3),(-2.29,37.3),(-1.71,34.4)],
    [(0.976,40.7),(-4.33,46.2),(-0.62,49.6),(3.99,49.9),(4.33,47.0),(4.63,44.0),(0.976,40.7)],
    [(2.9,55.8),(2.37,56.0),(8.47,56.1),(3.17,55.9),(-2.05,55.6),(-1.28,52.6),(-0.62,49.6),(4.33,47.0),(8.8,44.1),(2.29,44.0),(2.71,43.9),(3.18,46.5),(3.25,49.4),(3.33,52.4),(2.9,55.8)],
    [(2.25,35.1),(2.26,38.1),(8.86,41.1),(5.15,38.0),(5.38,35.0),(9.01,32.1),(2.25,35.1)]]

frequency = comp_frequency(paths,lonlat)

Answer 1

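If your input polygons really are contours, then you're better off working directly with your input grids rather than computing contours and testing whether a point is inside them.

Contours follow a constant value of the gridded data. Each contour is a polygon enclosing the areas of the input grid that are greater than that value.

If you need to know how many contours a given point is inside, it's faster to sample the input grid at the point's location and work with the returned "z" value. If you know which contour values you created, the number of contours the point is inside can be read directly from it.

For example: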

import numpy as np
from scipy.interpolate import RegularGridInterpolator
import matplotlib.pyplot as plt

# One of your input gridded datasets
y, x = np.mgrid[-5:5:100j, -5:5:100j]
z = np.sin(np.hypot(x, y)) + np.hypot(x, y) / 10

contour_values = [-1, -0.5, 0, 0.5, 1, 1.5, 2]

# A point location...
x0, y0 = np.random.normal(0, 2, 2)

# Visualize what's happening...
fig, ax = plt.subplots()
cont = ax.contourf(x, y, z, contour_values, cmap='gist_earth')
ax.plot([x0], [y0], marker='o', ls='none', color='salmon', ms=12)
fig.colorbar(cont)

# Instead of working with whether or not the point intersects the
# contour polygons we generated, we'll turn the problem on its head:

# Sample the grid at the point location. z is indexed as z[row, col],
# i.e. z[y, x], so the interpolator axes are given in (y, x) order.
interp = RegularGridInterpolator((y[:,0], x[0,:]), z)
z0 = interp([y0, x0])

# How many contours would the point be inside?
num_inside = sum(z0 > c for c in contour_values)[0]

ax.set(title='Point is inside {} contours'.format(num_inside))
plt.show()
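The same idea extends from a single point to the whole grid. The following is only a minimal sketch, assuming the contour levels are known and that "inside" means the grid value is strictly greater than the level, as in the example above. The function name `count_enclosing_levels` is made up for illustration.

```python
import numpy as np

def count_enclosing_levels(z, contour_values):
    """Count, for every grid value, how many contour levels lie strictly below it."""
    levels = np.sort(np.asarray(contour_values, dtype=float))
    # side="left" returns the number of levels that are strictly less than z,
    # which matches the `z0 > c` test used for a single point.
    return np.searchsorted(levels, z, side="left")

# Small hand-made grid instead of real data
z = np.array([[-2.0, 0.0],
              [0.7, 3.0]])
counts = count_enclosing_levels(z, [-1, -0.5, 0, 0.5, 1, 1.5, 2])
print(counts)
```

This only applies when the contours come from one gridded field with known levels. It does not help if the contour paths are all you have.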

Answer 2

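So in the meantime I have found a nice solution, thanks to a colleague who implemented something similar at some point (based on this blog post).

Old, very slow approach using shapely

First, here is my reference implementation using shapely, which is just a somewhat refined version of my first "naive" approach in the opening post. It is easy to understand and works, but is really slow.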

import numpy as np
from shapely.geometry import Polygon, Point

def spatial_contour_frequency_shapely(paths,lon,lat):

    frequency = np.zeros(lon.shape)
    contours = [Polygon(path) for path in paths]

    for (i,j),v in np.ndenumerate(frequency):
        pt = Point([lon[i,j],lat[i,j]])
        for contour in contours:
            if contour.contains(pt):
                frequency[i,j] += 1

    return frequency
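For comparison, the per-point loop can be replaced by a test that handles all grid points at once for each contour. The sketch below swaps in a plain NumPy even-odd ray casting test instead of shapely. It is not the method used in this answer, the function names are invented for illustration, and points lying exactly on a polygon edge may be classified differently than by shapely's `contains`.

```python
import numpy as np

def points_in_polygon(px, py, path):
    """Even-odd ray casting for arrays of points against one polygon path."""
    px = np.asarray(px, dtype=float)
    py = np.asarray(py, dtype=float)
    poly = np.asarray(path, dtype=float)
    x1, y1 = poly[:, 0], poly[:, 1]
    x2, y2 = np.roll(x1, -1), np.roll(y1, -1)  # end point of every edge
    inside = np.zeros(px.shape, dtype=bool)
    for xa, ya, xb, yb in zip(x1, y1, x2, y2):
        if ya == yb:
            continue  # horizontal (or degenerate) edges never cross the ray
        # Does the horizontal ray starting at each point cross this edge?
        crosses = (ya > py) != (yb > py)
        x_at_py = xa + (py - ya) * (xb - xa) / (yb - ya)
        inside ^= crosses & (px < x_at_py)
    return inside

def contour_frequency_raycast(paths, lon, lat):
    """Number of contour polygons containing each (lon, lat) grid point."""
    frequency = np.zeros(lon.shape, dtype=int)
    for path in paths:
        frequency += points_in_polygon(lon, lat, path)
    return frequency

# Two overlapping squares and four test points
squares = [
    [(0, 0), (2, 0), (2, 2), (0, 2), (0, 0)],
    [(1.25, 1.25), (4, 1.25), (4, 4), (1.25, 4)],
]
lon_demo = np.array([[1.0, 3.0], [0.5, 1.5]])
lat_demo = np.array([[1.0, 1.0], [0.5, 1.5]])
freq_demo = contour_frequency_raycast(squares, lon_demo, lat_demo)
print(freq_demo)
```

The loop now runs over polygon edges rather than grid points, so the cost per contour grows with the number of vertices times the number of grid points, but all of it is done in array operations.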

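New, very fast solution using PIL

My (almost) final solution no longer uses shapely, but instead employs image manipulation techniques from PIL (Python Imaging Library). This solution is much, much faster, although in this form it only works for regular rectangular grids (see the remark below).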

import numpy as np
import scipy as sp
import scipy.spatial  # makes sp.spatial.cKDTree available
from PIL import Image, ImageDraw

def _spatial_contour_frequency_pil(paths,lon,lat,regular_grid=False,
        method_ind=None):

    def get_indices(points,lon,lat,tree=None,regular=False):

        def get_indices_regular(points,lon,lat):
            lon,lat = lon.T,lat.T
            def _get_ij(lon,lat,x,y):
                lon0 = lon[0,0]
                lat0 = lat[0,0]
                lon1 = lon[-1,-1]
                lat1 = lat[-1,-1]
                nx,ny = lon.shape
                dx = (lon1-lon0)/nx
                dy = (lat1-lat0)/ny
                i = int((x-lon0)/dx)
                j = int((y-lat0)/dy)
                return i,j
            return [_get_ij(lon,lat,x,y) for x,y in points]

        def get_indices_irregular(points,tree,shape):

            dist,idx = tree.query(points,k=1)
            ind = np.column_stack(np.unravel_index(idx,lon.shape))
            return [(i,j) for i,j in ind]

        if regular:
            return get_indices_regular(points,lon,lat)
        return get_indices_irregular(points,tree,lon.T.shape)

    tree = None
    if not regular_grid:
        lonlat = np.column_stack((lon.T.ravel(),lat.T.ravel()))
        tree = sp.spatial.cKDTree(lonlat)

    frequency = np.zeros(lon.shape)
    for path in paths:
        path_ij = get_indices(path,lon,lat,tree=tree,regular=regular_grid)
        raster_poly = Image.new("L",lon.shape,0)
        rasterize = ImageDraw.Draw(raster_poly)
        rasterize.polygon(path_ij,1)
        # One signed byte per pixel; PIL's size is (width, height)
        mask = np.frombuffer(raster_poly.tobytes(),dtype='b').reshape(
                raster_poly.size[1],raster_poly.size[0])
        frequency += mask

    return frequency

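It should be noted that the results of the two approaches are not exactly the same. The features identified with the PIL approach are slightly larger than those identified with the shapely approach, but neither is really better than the other.

Timings

Here are some timings created with a reduced dataset (not the semi-artificial sample data from the opening post, though):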

spatial_contour_frequency/shapely             :   191.8843
spatial_contour_frequency/pil                 :     0.3287
spatial_contour_frequency/pil-tree-inside     :     2.3629
spatial_contour_frequency/pil-regular_grid    :     0.3276

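The most time-consuming step is finding the indices of the contour points on the irregular lon/lat grid. The most time-consuming part of that was the construction of the cKDTree, which is why I moved it out of get_indices. To put this into perspective, pil-tree-inside is the version where the tree is created inside get_indices. pil-regular_grid is the run with regular_grid=True, which yields wrong results for my dataset but gives an idea of how fast it would run on a regular grid.

Overall, I have now managed to essentially eliminate the impact of the irregular grid (pil vs. pil-regular_grid), which is all I could hope for in the beginning! :)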
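The "build the tree once, then query" pattern described above can be sketched on its own roughly like this. The helper names `build_grid_tree` and `nearest_grid_indices` are hypothetical, and the nearest neighbour is taken in plain lon/lat degrees, which is only an approximation of true distance on the sphere.

```python
import numpy as np
from scipy.spatial import cKDTree

def build_grid_tree(lon, lat):
    """Build the k-d tree once for the whole (possibly irregular) grid."""
    return cKDTree(np.column_stack((lon.ravel(), lat.ravel())))

def nearest_grid_indices(tree, shape, points):
    """Map (lon, lat) points to the (row, col) of their nearest grid node."""
    _, flat_idx = tree.query(np.asarray(points, dtype=float), k=1)
    rows, cols = np.unravel_index(flat_idx, shape)
    return np.column_stack((rows, cols))

# Tiny demo grid: lat increases along rows, lon along columns
lat_demo, lon_demo = np.mgrid[0:3, 0:4]
tree = build_grid_tree(lon_demo, lat_demo)          # expensive step, done once
idx = nearest_grid_indices(tree, lon_demo.shape,    # cheap, repeated per path
                           [(2.1, 0.9), (0.2, 1.8)])
print(idx)
```

Because the tree depends only on the grid, it can be reused for every contour path, which is what keeps the irregular-grid variant close to the regular-grid timing.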

Answer 3


Here are the timing results ... where comp_frequency_lc is the version that uses a list comprehension:

def comp_frequency_lc(paths,lonlat):

    import operator
    frequency = np.zeros(lonlat.shape[:2])
    contours = [Polygon(path) for path in paths]

    for (i,j),v in np.ndenumerate(frequency):
        pt = Point(lonlat[i,j,:])
        [
            operator.setitem(frequency,(i,j),
                    operator.getitem(frequency,(i,j))+1)
            for contour in contours if contour.contains(pt)
         ]

    return frequency

print(comp_frequency(paths,lonlat))

which results in:

[[ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  1.  0.  0.  1.  2.  2.  2.]
 [ 0.  1.  0.  0.  1.  0.  0.  1.  1.  1.  1.]
 [ 0.  2.  0.  0.  2.  0.  0.  2.  2.  2.  1.]
 [ 0.  2.  0.  0.  1.  0.  0.  1.  1.  1.  2.]
 [ 0.  1.  0.  0.  0.  0.  0.  1.  2.  1.  1.]
 [ 0.  1.  1.  0.  0.  0.  0.  1.  1.  0.  0.]
 [ 0.  1.  1.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]

Computing how often each field point lies inside a contour
