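PostGIS nearest-neighbor search results out of order?

Time: 2014-05-29 19:28:15

Tags: django postgresql postgis nearest-neighbor

I have a Django/PostgreSQL application that shows which users are closest to a particular user. It uses the PostGIS 2.0 KNN (K Nearest Neighbors) <-> operator in the ORDER BY clause to list the nearest users. What I found with my initial data set is that two of the search results come back out of order (all distances are measured from Los Angeles, CA):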

Member, City, State, Distance (miles)

user1, North Las Vegas, NV, 239
user2, Phoenix, AZ, 365
user3, Provo, UT, 568
user4, Twin Falls, ID, 630
user5, Albuquerque, NM, 673
user6, Portland, OR, 828
user7, Bozeman, MT, 896
user8, Seattle, WA, 962
user9, Boulder, CO, 834       <- Out of order!
user10, Laramie, WY, 862      <- Out of order!
user11, Naperville, IL, 1756

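The member name is just the username column from the User class in Django's contrib.auth.models. The UserAccount class, which holds the geometry information, is defined as follows: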

class UserAccount(models.Model):
    user = models.OneToOneField(User, primary_key=True, unique=True)
    address_line_1 = models.CharField(max_length=30)
    address_line_2 = models.CharField(max_length=30, blank=True)
    city = models.CharField(max_length=30)
    region = models.CharField(max_length=30, blank=True)
    postal_code = models.CharField(max_length=10, blank=True)
    country = models.ForeignKey('Country')
    measurement_sys = models.CharField(max_length=5)  # US or Metric

    # User's home (default) and current longitude and latitude
    home_lon = models.FloatField(default=0.0)
    home_lat = models.FloatField(default=0.0)
    current_lon = models.FloatField(default=0.0)
    current_lat = models.FloatField(default=0.0)

    # GeoDjango-specific fields 
    home_point = models.PointField(srid=4326)
    current_point = models.PointField(srid=4326)
    objects = models.GeoManager()

Here is the query in my Django view:

def members(request, template):
    """View all members of the website."""
    uid = request.session['uid']   # PK from User table

    # Get the current user's lon/lat and measurement system
    try:
        ua = UserAccount.objects.get(user_id=uid)
        lon = ua.current_lon
        lat = ua.current_lat
        measurement_sys = ua.measurement_sys
    except UserAccount.DoesNotExist as e:
        return HttpResponseRedirect(reverse('unable-to-display-members'))

    # Define the proximity query.
    if measurement_sys == 'US':
        multiplier = 0.000621371  # Convert to miles
    else:
        multiplier = 0.001  # Convert to kilometers

    query = "SELECT \
                ua.user_id, \
                au.username, \
                ua.city, \
                ua.region, \
                ST_Distance( \
                    ua.current_point::geography, \
                    ST_GeographyFromText( \
                        'SRID=4326;POINT(" \
                            + str(lon) \
                            + " " \
                            + str(lat) + \
                        ")' \
                    ) \
                )*" + str(multiplier) + " AS distance \
            FROM \
                user_account ua \
                INNER JOIN \
                auth_user au \
                ON (ua.user_id = au.id) \
            WHERE ua.user_id != %s \
            ORDER BY \
                ua.current_point::geometry \
                <-> \
                'SRID=4326;POINT(" + str(lon) + " " + str(lat) + ")'::geometry \
            LIMIT 250;"

    # Run the proximity query
    raw_queryset = UserAccount.objects.raw(query, [uid])

    # Paginate results
    user_list = [user for user in raw_queryset]
    list_size = len(list(user_list))
    paginator = Paginator(user_list, 10, 4)
    paginator._count = list_size

    page = request.GET.get('page')
    try:
        users = paginator.page(page)
    except PageNotAnInteger:
        users = paginator.page(1)
    except EmptyPage:
        users = paginator.page(paginator.num_pages)
    return render(request, template, {'users': users})

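Is there something wrong with my query? Does the KNN operator sometimes "hiccup" and return some results in the wrong order? I ask because when I took the two out-of-order records out of my table and then added extra records for users with addresses even farther away (i.e. in IL, LA, MI, NC, PA, NY, and ME), all of the results came back in the correct order.

By the way, my input data is here

Thanks!

1 answer:

Answer 0 (score: 2)

Updated answer: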

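Since September 2011, PostGIS has had two approximate solutions for kNN (nearest-neighbor) search:

  • With the <-> operator, you get the nearest neighbors using the centers of the bounding boxes to calculate the inter-object distances.
  • With the <#> operator, you get the nearest neighbors using the bounding boxes themselves to calculate the inter-object distances.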
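For point geometries, the bounding box collapses to the point itself, so <-> on SRID 4326 points effectively sorts by straight-line distance in degrees of longitude/latitude rather than by distance over the ground. The sketch below tries to show how that alone can reorder results. The coordinates are approximate city-centre values that I am assuming for illustration (they are not the asker's stored points), and the haversine formula on a sphere is only a stand-in for ST_Distance on geography.

```python
import math

# Approximate city-centre coordinates as (lon, lat). These are illustrative
# assumptions, not the values stored in the asker's table.
ORIGIN = (-118.2437, 34.0522)  # Los Angeles, CA
CITIES = {
    "Bozeman, MT": (-111.0429, 45.6770),
    "Seattle, WA": (-122.3321, 47.6062),
    "Boulder, CO": (-105.2705, 40.0150),
    "Laramie, WY": (-105.5911, 41.3114),
}


def planar_degrees(a, b):
    """Euclidean distance in raw lon/lat degrees (what a planar KNN sort sees)."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def great_circle_miles(a, b, radius_miles=3958.8):
    """Haversine distance on a sphere; a rough proxy for geography distance."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_miles * math.asin(math.sqrt(h))


by_degrees = sorted(CITIES, key=lambda c: planar_degrees(ORIGIN, CITIES[c]))
by_miles = sorted(CITIES, key=lambda c: great_circle_miles(ORIGIN, CITIES[c]))

print("sorted by planar degrees:", by_degrees)
print("sorted by great circle:  ", by_miles)
```

With these assumed coordinates, the degree-based sort puts Bozeman and Seattle ahead of Boulder and Laramie, which is the same pattern as the listing in the question, while the great-circle sort puts Boulder and Laramie first.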

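Your problem is that both operators are approximations, so neither is perfect. If you want the best 250 results, you can use either one to retrieve, say, the best 1000 candidates, then re-sort those candidates by ST_Distance and apply LIMIT 250, which gives you the best 250 out of roughly 1000.

Example: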

SELECT * FROM 
    (SELECT *,ST_DISTANCE(current_point::geography, 'SRID=4326;POINT(" + str(lon) + " " + str(lat) + ")'::geography ) AS st_dist
    FROM user_account ua
    ORDER BY current_point::geometry <-> 'SRID=4326;POINT(" + str(lon) + " " + str(lat) + ")'::geometry 
    LIMIT 1000) AS s
    ORDER BY st_dist LIMIT 250;
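
If you run this from the Django view in the question, it may be cleaner to pass the values as query parameters instead of concatenating them into the SQL string. Here is a minimal sketch of that idea. The helper name build_nearest_members_query and its candidates/limit arguments are my own invention for illustration, and the table and column names are taken from the question.

```python
def build_nearest_members_query(lon, lat, exclude_user_id, multiplier,
                                candidates=1000, limit=250):
    """Return (sql, params) for a two-stage nearest-member query.

    Hypothetical helper: the inner query over-fetches `candidates` rows
    using the fast <-> ordering, and the outer query re-sorts them by the
    geography distance and keeps the first `limit` rows.
    """
    sql = """
        SELECT s.user_id, s.username, s.city, s.region,
               s.meters * %s AS distance
        FROM (
            SELECT ua.user_id, au.username, ua.city, ua.region,
                   ST_Distance(
                       ua.current_point::geography,
                       ST_SetSRID(ST_MakePoint(%s, %s), 4326)::geography
                   ) AS meters
            FROM user_account ua
            INNER JOIN auth_user au ON ua.user_id = au.id
            WHERE ua.user_id != %s
            ORDER BY ua.current_point <-> ST_SetSRID(ST_MakePoint(%s, %s), 4326)
            LIMIT %s
        ) AS s
        ORDER BY s.meters
        LIMIT %s
    """
    # Parameter order must match the order of the placeholders above.
    params = [multiplier, lon, lat, exclude_user_id, lon, lat, candidates, limit]
    return sql, params
```

In the view this would be used roughly as `sql, params = build_nearest_members_query(lon, lat, uid, multiplier)` followed by `UserAccount.objects.raw(sql, params)`. How many candidates to over-fetch is a judgment call: a larger value makes a misordered top 250 less likely but costs more ST_Distance evaluations.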