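RTree: Count points in the neighbourhoods within each point of another set of points

Time: 2017-06-19 04:22:26

Tags: gis geospatial spatial-index r-tree geopandas

Why doesn't this return the number of points in each neighbourhood (bounding box)?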


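Each loop iteration gives the same result.

Second question: how can I find the k-th nearest neighbour?

More information about the problem itself:

  • We are doing this at a very small scale, e.g. Washington State, USA, or British Columbia, Canada

  • We want to use geopandas as much as possible, since it is similar to pandas and supports spatial indexing: RTree

  • For example, sindex here has methods such as nearest, intersection, etc.

Please comment if you need more information. This is the code in the GeoPandasBase class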


I tried Richard's example, but it didn't work

import geopandas as gpd

def radius(points_neighbour, points_center, new_field_name, r):
    """
    :param points_neighbour:
    :param points_center:
    :param new_field_name: new field_name attached to points_center
    :param r: radius around points_center
    :return:
    """
    sindex = points_neighbour.sindex
    pts_in_neighbour = []
    for i, pt_center in points_center.iterrows():
        nearest_index = list(sindex.intersection((pt_center.LATITUDE-r, pt_center.LONGITUDE-r, pt_center.LATITUDE+r, pt_center.LONGITUDE+r)))
        pts_in_this_neighbour = points_neighbour[nearest_index]
        pts_in_neighbour.append(len(pts_in_this_neighbour))
    points_center[new_field_name] = gpd.GeoSeries(pts_in_neighbour)

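To download the shapefile, go to https://catalogue.data.gov.bc.ca/dataset/hellobc-activities-and-attractions-listing and select the ArcView download

2 Answers:

Answer 0: (score: 3)

Rather than answering your question directly, I'd argue that you're doing this wrong. After arguing this, I'll give a better answer.

Why you're doing it wrong

An r-tree is great for bounding-box queries in two or three Euclidean dimensions.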

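You are looking up longitude-latitude points on a two-dimensional surface curved in a three-dimensional space. The upshot is that your coordinate system will yield singularities and discontinuities: 180°W is the same as 180°E, and 2°E by 90°N is close to 2°W by 90°N. The r-tree does not capture these sorts of things!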
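To make the discontinuity concrete, here is a minimal sketch (not part of the original answer) that compares the naive longitude difference with the great-circle distance for two points on either side of the antimeridian. The `haversine_km` helper and the 6371 km Earth radius are assumptions chosen for illustration.

```python
import math

R_EARTH_KM = 6371.0  # mean Earth radius; an assumption for illustration

def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * R_EARTH_KM * math.asin(math.sqrt(a))

# Two points on the equator, one on each side of the antimeridian
west = (-179.9, 0.0)
east = (179.9, 0.0)

naive_gap_deg = abs(east[0] - west[0])     # what a lon-lat bounding box "sees"
true_gap_km = haversine_km(*west, *east)   # the actual separation on the sphere

print(naive_gap_deg, true_gap_km)
```

The two points are nearly 360 degrees apart in longitude, yet only a few tens of kilometres apart on the ground, so a small lon-lat box around one of them would never contain the other.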

But, even if an r-tree were a good solution, your idea of taking lat±r and lon±r yields a square region; rather, you probably want a circular region around your point.
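As a rough illustration of how much the square overshoots the circle, the following sketch (plain geometry, with `r` as an arbitrary assumed radius) computes the distance to a corner of the box and the share of the box that lies inside the circle.

```python
import math

r = 1.0  # search radius, in whatever units the coordinates use

# Distance from the centre to a corner of the box [x-r, x+r] x [y-r, y+r]
corner_distance = math.hypot(r, r)

# Share of the box's area that lies inside the circle of radius r
circle_share = (math.pi * r ** 2) / ((2 * r) ** 2)

print(corner_distance, circle_share)
```

A corner of the box lies about 1.41 r from the centre, and only about 79% of the box is inside the circle, so a bounding-box hit still needs a distance check before it counts as a neighbour.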

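How to do it right

  1. Rather than keeping the points in lon-lat format, convert them to xyz format using a spherical coordinate conversion. Now they are in a 3D Euclidean space and there are no singularities or discontinuities.

  2. Place the points in a three-dimensional kd-tree. This lets you quickly ask, in roughly O(log n) time per query, questions like "What are the k nearest neighbours of this point?" and "What are all the points within a radius r of this point?" SciPy comes with an implementation.

  3. For your radius search, convert from a Great Circle radius to a chord: this makes the search in 3-space equivalent to a radius search on a circle wrapped onto the surface of a sphere (in this case, the Earth).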

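Code that does it right

I've implemented the above in Python as a demonstration. Note that all spherical points are stored in (longitude, latitude) / (x-y) format using a lon=[-180,180], lat=[-90,90] scheme. All 3D points are stored in (x,y,z) format.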

    #!/usr/bin/env python3
    
    import numpy as np
    import scipy as sp
    import scipy.spatial
    
    Rearth = 6371
    
    #Generate uniformly-distributed lon-lat points on a sphere
    #See: http://mathworld.wolfram.com/SpherePointPicking.html
    def GenerateUniformSpherical(num):
      #Generate random variates
      pts      = np.random.uniform(low=0, high=1, size=(num,2))
      #Convert to sphere space
      pts[:,0] = 2*np.pi*pts[:,0]          #0-360 degrees
      pts[:,1] = np.arccos(2*pts[:,1]-1)   #0-180 degrees
      #Convert to degrees
      pts = np.degrees(pts)
      #Shift ranges to lon-lat
      pts[:,0] -= 180
      pts[:,1] -= 90
      return pts
    
    def ConvertToXYZ(lonlat):
      theta  = np.radians(lonlat[:,0])+np.pi
      phi    = np.radians(lonlat[:,1])+np.pi/2
      x      = Rearth*np.cos(theta)*np.sin(phi)
      y      = Rearth*np.sin(theta)*np.sin(phi)
      z      = Rearth*np.cos(phi)
      return np.transpose(np.vstack((x,y,z)))
    
    #Get all points which lie within `r_km` Great Circle kilometres of the query
    #points `qpts`.
    def GetNeighboursWithinR(qpts,kdtree,r_km):
      #We need to convert Great Circle kilometres into chord length kilometres in
      #order to use the kd-tree
      #See: http://mathworld.wolfram.com/CircularSegment.html
      angle        = r_km/Rearth
      chord_length = 2*Rearth*np.sin(angle/2)
      pts3d        = ConvertToXYZ(qpts)
      #See: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.query_ball_point.html#scipy.spatial.KDTree.query_ball_point
      #p=2 implies Euclidean distance, eps=0 implies no approximation (slower)
      return kdtree.query_ball_point(pts3d,chord_length,p=2,eps=0) 
    
    
    ##############################################################################
    #WARNING! Do NOT alter pts3d or kdtree will malfunction and need to be rebuilt
    ##############################################################################
    
    ##############################
    #Correctness tests on the North, South, East, and West poles, along with Kolkata
    ptsll = np.array([[0,90],[0,-90],[0,0],[-180,0],[88.3639,22.5726]])
    pts3d = ConvertToXYZ(ptsll)
    kdtree = sp.spatial.KDTree(pts3d, leafsize=10) #Stick points in kd-tree for fast look-up
    
    qptsll = np.array([[-3,88],[5,-85],[10,10],[-178,3],[175,4]])
    GetNeighboursWithinR(qptsll, kdtree, 2000)
    
    ##############################
    #Stress tests
    ptsll = GenerateUniformSpherical(100000)    #Generate uniformly-distributed lon-lat points on a sphere
    pts3d = ConvertToXYZ(ptsll)                 #Convert points to 3d
    #See: https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.html
    kdtree = sp.spatial.KDTree(pts3d, leafsize=10) #Stick points in kd-tree for fast look-up
    
    qptsll = GenerateUniformSpherical(100)      #We'll find neighbours near these points
    GetNeighboursWithinR(qptsll, kdtree, 500)
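The code above demonstrates the radius search but not the k-nearest-neighbour query mentioned in step 2. The following is a minimal sketch of how that might look with SciPy's `cKDTree.query`, converting the returned chord lengths back into great-circle distances. The helper names (`lonlat_to_xyz`, `chord_to_great_circle_km`, `k_nearest`) are hypothetical, and the sketch uses the standard lon-lat to xyz conversion rather than the `ConvertToXYZ` function above; distances between points are unaffected by that choice.

```python
import numpy as np
from scipy.spatial import cKDTree

R_EARTH_KM = 6371.0  # same Earth radius as in the code above

def lonlat_to_xyz(lonlat):
    """Convert an (n, 2) array of (lon, lat) degrees to 3D points in km."""
    lon = np.radians(lonlat[:, 0])
    lat = np.radians(lonlat[:, 1])
    x = R_EARTH_KM * np.cos(lat) * np.cos(lon)
    y = R_EARTH_KM * np.cos(lat) * np.sin(lon)
    z = R_EARTH_KM * np.sin(lat)
    return np.column_stack((x, y, z))

def chord_to_great_circle_km(chord_km):
    """Invert chord = 2 R sin(d / (2 R)) to recover the great-circle distance d."""
    ratio = np.clip(chord_km / (2 * R_EARTH_KM), 0.0, 1.0)
    return 2 * R_EARTH_KM * np.arcsin(ratio)

def k_nearest(tree, query_lonlat, k):
    """Return (great-circle distances in km, indices) of the k nearest points."""
    chord, idx = tree.query(lonlat_to_xyz(query_lonlat), k=k)
    return chord_to_great_circle_km(chord), idx

# The same five reference points used in the correctness test above
ptsll = np.array([[0, 90], [0, -90], [0, 0], [-180, 0], [88.3639, 22.5726]],
                 dtype=float)
tree = cKDTree(lonlat_to_xyz(ptsll))

# Two nearest reference points to a query just across the antimeridian
dist_km, idx = k_nearest(tree, np.array([[-178.0, 3.0]]), k=2)
print(idx, dist_km)
```

Because chord length grows monotonically with great-circle distance, the ordering of neighbours in 3D matches their ordering on the sphere; the k-th nearest neighbour is simply the last column of `idx`.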
    

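Answer 1: (score: 3)

The code I've attached should do what you want with some minor modifications.

I think your problem arose for one of two reasons:

  1. You did not construct the spatial index correctly. Your responses to my comments suggest that you don't fully understand how the spatial index is made.

  2. The bounding box for your spatial query was not built correctly.

I'll discuss both possibilities below.

Building the spatial index

As it turns out, the spatial index is built simply by typing: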

    sindex = gpd_df.sindex
    

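Magic.

Where does gpd_df.sindex get its data from? It assumes that the data is stored in a column named geometry, in shapely format. If you have not added data to such a column, it will raise a warning.

A correct initialization of the data frame looks like this: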

    #Generate random points throughout Oregon
    x = np.random.uniform(low=oregon_xmin, high=oregon_xmax, size=10000)
    y = np.random.uniform(low=oregon_ymin, high=oregon_ymax, size=10000)
    
    #Turn the lat-long points into a geodataframe
    gpd_df = gpd.GeoDataFrame(data={'x':x, 'y':y})
    #Set up point geometries so that we can index the data frame
    #Note that I am using x-y points!
    gpd_df['geometry'] = gpd_df.apply(lambda row: shapely.geometry.Point((row['x'], row['y'])), axis=1)
    
    #Automagically constructs a spatial index from the `geometry` column
    gpd_df.sindex 
    

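Seeing example code of the above sort in your question would have helped in diagnosing your problem and getting started on solving it.

Since you did not get the extremely obvious warning geopandas raises when a geometry column is missing:

    AttributeError: No geometry data set yet (expected in column 'geometry'.

I think you have probably done this part right.

Constructing the bounding box

In your question, you form a bounding box like so: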

    nearest_index = list(sindex.intersection((pt_center.LATITUDE-r, pt_center.LONGITUDE-r, pt_center.LATITUDE+r, pt_center.LONGITUDE+r)))
    

As it turns out, bounding boxes have the form:

    (West, South, East, North)
    

At least, they do for X-Y styled points, e.g. shapely.geometry.Point(Lon,Lat)

In my code, I use the following:

    bbox = (cpt.x-radius, cpt.y-radius, cpt.x+radius, cpt.y+radius)
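To see why the ordering matters, here is a small sketch in plain Python. `bbox_around` and `in_bbox` are hypothetical helpers that stand in for what the spatial index does with a box; they are not geopandas functions. The sketch assumes points are stored as (lon, lat), i.e. x-y.

```python
def bbox_around(lon, lat, r):
    """Bounding box in (West, South, East, North) = (minx, miny, maxx, maxy) order."""
    return (lon - r, lat - r, lon + r, lat + r)

def in_bbox(lon, lat, bbox):
    """True if the (lon, lat) point falls inside the bounding box."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north

# A point in Oregon and a neighbour 0.1 degrees to its east
centre = (-120.0, 44.0)      # (lon, lat)
neighbour = (-119.9, 44.0)

good_bbox = bbox_around(centre[0], centre[1], 0.5)
# The ordering used in the question: latitude first, then longitude
swapped_bbox = (centre[1] - 0.5, centre[0] - 0.5, centre[1] + 0.5, centre[0] + 0.5)

print(in_bbox(*neighbour, good_bbox))     # True
print(in_bbox(*neighbour, swapped_bbox))  # False
```

With the latitude-first ordering, the box describes a region nowhere near the data, so a query against x-y points finds no real neighbours.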
    

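Working example

Putting the above together leads me to this working example. Note that I also demonstrate how to sort the points by distance, which answers your second question.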

    #!/usr/bin/env python3
    
    import numpy as np
    import numpy.random
    import geopandas as gpd
    import shapely.geometry
    import operator
    
    oregon_xmin = -124.5664
    oregon_xmax = -116.4633
    oregon_ymin = 41.9920
    oregon_ymax = 46.2938
    
    def radius(gpd_df, cpt, radius):
      """
      :param gpd_df: Geopandas dataframe in which to search for points
      :param cpt:    Point about which to search for neighbouring points
      :param radius: Radius about which to search for neighbours
      :return:       List of point indices around the central point, sorted by
                     distance in ascending order
      """
      #Spatial index
      sindex = gpd_df.sindex
      #Bounding box of rtree search (West, South, East, North)
      bbox = (cpt.x-radius, cpt.y-radius, cpt.x+radius, cpt.y+radius)
      #Potential neighbours
      good = []
      for n in sindex.intersection(bbox):
        #sindex.intersection yields positional indices, so look them up with .iloc
        dist = cpt.distance(gpd_df['geometry'].iloc[n])
        if dist<radius:
          good.append((dist,n))
      #Sort list in ascending order by `dist`, then `n`
      good.sort() 
      #Return only the neighbour indices, sorted by distance in ascending order
      return [x[1] for x in good]
    
    #Generate random points throughout Oregon
    x = np.random.uniform(low=oregon_xmin, high=oregon_xmax, size=10000)
    y = np.random.uniform(low=oregon_ymin, high=oregon_ymax, size=10000)
    
    #Turn the lat-long points into a geodataframe
    gpd_df = gpd.GeoDataFrame(data={'x':x, 'y':y})
    #Set up point geometries so that we can index the data frame
    gpd_df['geometry'] = gpd_df.apply(lambda row: shapely.geometry.Point((row['x'], row['y'])), axis=1)
    
    #The 'x' and 'y' columns are now stored as part of the geometry, so we remove
    #their columns in order to save space
    del gpd_df['x']
    del gpd_df['y']
    
    for i, row in gpd_df.iterrows():
      neighbours = radius(gpd_df,row['geometry'],0.5)
      print(neighbours)
      #Use len(neighbours) here to construct a new row for the data frame
    

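(What I had been requesting in the comments was code that looks like the above, but which demonstrates your problem. Note the use of random to succinctly generate a dataset for experimentation.)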
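For small datasets, the per-point counts from an indexed search can be cross-checked against a brute-force computation. The sketch below uses only NumPy, not geopandas or an r-tree, and `count_within_radius` is a hypothetical helper name. It builds the full distance matrix in memory, so it is only suitable for modest input sizes.

```python
import numpy as np

def count_within_radius(centres, neighbours, r):
    """For each row of `centres`, count rows of `neighbours` closer than `r`.

    Both inputs are (n, 2) arrays of planar x-y coordinates; distances are
    plain Euclidean, matching the degree-based distances in the example above.
    """
    # Pairwise differences via broadcasting: shape (n_centres, n_neighbours, 2)
    diff = centres[:, None, :] - neighbours[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    return (dist < r).sum(axis=1)

centres = np.array([[0.0, 0.0], [10.0, 10.0]])
neighbours = np.array([[0.1, 0.0], [0.0, 0.3], [0.4, 0.4], [10.0, 10.2]])

counts = count_within_radius(centres, neighbours, 0.5)
print(counts)
```

Note that the point (0.4, 0.4) falls inside the 0.5 bounding box around the origin but outside the circle of radius 0.5, so it is not counted; this is the same filtering that the `dist<radius` check performs in the working example.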