Question

I have GPS data for 10 million users, structured as follows:

userId
startGps
endGps

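Each user has two GPS points, a start point and an end point. If two points from different users are more than 1 km apart, we define the users as potentially closely related.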

userA startGpsA endGpsA
userB startGpsB endGpsB

function relation(userGpsA A, userGpsB B)
    if(distance(A.startGps , B.startGps) > 1km || distance(A.startGps , B.endGps) > 1km || distance(A.endGps , B.endGps) > 1km)
        return A,B
    return null

How can I find these relations quickly?

Answer 1

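One possible algorithm uses spatial 'buckets' to reduce computation time. It does not do any special threading tricks, but it greatly reduces the number of users to compare (depending on the bucket size).

The idea is to put users that are already not far from each other into the same 'bucket', and to create an index on the buckets that makes it cheap to get the adjacent buckets.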

Let's assume we have

class User{
    long userId;
    GPS start;
    GPS stop;
}

class GPS{
    long x;
    long y;
}

First, we create a class for indexing the users:

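import java.util.ArrayList;

class BucketEntity {
 User origin; // the user this point belongs to
 long x;
 long y;

 BucketEntity(User origin, long x, long y) {
  this.origin = origin; this.x = x; this.y = y;
 }
}
// A List, not a Set: entities in the same cell compare equal, so a Set would keep only one of them.
class Bucket extends ArrayList<BucketEntity> {
}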

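For each user we create two BucketEntity objects, one for the 'start' and one for the 'end'. We store these BucketEntity objects in a specially indexed data structure that makes it easy to retrieve the other BucketEntity objects nearest to a given one.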

class Index extends ConcurrentHashMap<BucketEntity,Bucket> {
      // Overload the 'put' implementation to correctly manage the Bucket (null initially, etc...)
}
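The 'put' handling mentioned in the comment above could look roughly like the following. This is only a minimal sketch under assumptions that are not in the original answer: the class name `CellIndexSketch`, the string cell key, and storing bare user ids instead of BucketEntity objects are all hypothetical simplifications.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: an index that creates the bucket on first use.
final class CellIndexSketch {
    static final long SCALE = 1000; // assumed cell edge, same unit as the coordinates

    // cell key -> ids of the users that have a point in that cell
    private final ConcurrentHashMap<String, List<Long>> cells = new ConcurrentHashMap<>();

    // Build the key of the cell that contains (x, y).
    static String key(long x, long y) {
        return Math.floorDiv(x, SCALE) + ":" + Math.floorDiv(y, SCALE);
    }

    // The bucket is null initially, so create it atomically before adding to it.
    void put(long userId, long x, long y) {
        cells.computeIfAbsent(key(x, y), k -> new CopyOnWriteArrayList<>()).add(userId);
    }

    // Return the bucket for the cell containing (x, y), or an empty list if there is none.
    List<Long> get(long x, long y) {
        return cells.getOrDefault(key(x, y), Collections.<Long>emptyList());
    }
}
```

`computeIfAbsent` keeps the "create the bucket if it is missing" step atomic, which matters if several threads fill the index at once.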

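All we need is to implement the 'hash' and 'equals' methods in the BucketEntity class. The specification is that 'hash' and 'equals' treat two BucketEntity objects as the same when they are not far from each other. For a given BucketEntity, we also want to be able to compute the hash of the Buckets that are spatially adjacent to its Bucket.

To get the right behaviour for 'hash' and 'equals', a good and fast solution is to do a 'precision reduction'. In short, if you have 'x = 1248813', you replace it with 'x = 1248' (divide by 1000), which is like changing the GPS precision from metres to kilometres.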
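The precision reduction can be illustrated in isolation. The helper below is a hypothetical sketch, not part of the original answer; it uses `Math.floorDiv` so that negative coordinates also land in a consistent cell, which plain `/` would not guarantee.

```java
// Hypothetical helper illustrating the precision reduction.
final class GridCell {
    static final long SCALE = 1000; // cell edge length, in the same unit as the coordinates

    // Reduce a coordinate to the index of the cell that contains it.
    static long cellOf(long coordinate) {
        return Math.floorDiv(coordinate, SCALE);
    }

    public static void main(String[] args) {
        System.out.println(cellOf(1248813)); // 1248
        System.out.println(cellOf(1248999)); // 1248, same cell
        System.out.println(cellOf(1249000)); // 1249, next cell
    }
}
```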

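public static long scall = 1000;

@Override
public boolean equals(Object o)
{
   if (this == o) return true;
   if (!(o instanceof BucketEntity)) return false;
   BucketEntity that = (BucketEntity) o;
   // Two entities are "equal" when they fall into the same scall x scall cell.
   // Assumes non-negative coordinates, because '/' truncates toward zero.
   return this.x / scall == that.x / scall &&
          this.y / scall == that.y / scall;
}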

// Maybe an 'int' is not enough to correctly hash your data;
// if so, you have to create your own implementation of Map
// with special "long hashCode()" support.
@Override
public int hashCode()
{
     // We put the 'x' bits in the high level, and the 'y' bits in the low level,
     // so the 'x' and 'y' don't conflict.
     // Take extra care with the value of 'scall' relative to your data and the max value of 'int'. scall == 10000 should be a maximum.
     return (int) ((this.x / scall) * scall + (this.y / scall));
}

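As you can see in the hashCode() method, Buckets that are close to each other have really close hashCode() values, and if I give you a Bucket, you can also compute the hashCode() of the spatially adjacent Buckets.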
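As a small illustration of that property, the sketch below repeats the hash formula from hashCode() in a hypothetical standalone class (`NeighbourHashSketch` and its `hash` method are assumptions for the example, and coordinates are assumed non-negative). Moving one cell along x shifts the hash by 'scall'; moving one cell along y shifts it by 1.

```java
// Hypothetical sketch: how the hash changes between adjacent cells.
final class NeighbourHashSketch {
    static final long SCALL = 1000;

    // Same formula as the hashCode() above, applied to raw coordinates.
    static int hash(long x, long y) {
        return (int) ((x / SCALL) * SCALL + (y / SCALL));
    }

    public static void main(String[] args) {
        System.out.println(hash(5500, 7500)); // 5007: cell (5, 7)
        System.out.println(hash(6500, 7500)); // 6007: one cell further along x
        System.out.println(hash(5500, 8500)); // 5008: one cell further along y
    }
}
```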

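Now you can get the BucketEntity objects that are in the same Bucket as your given BucketEntity. To get the adjacent buckets, you need to create 9 virtual BucketEntity objects and 'get()' the Bucket (or null) for each cell surrounding your BucketEntity's Bucket.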

   List<BucketEntity> shortListToCheck = new ArrayList<>(); // A List not a Set !
   // Probe the 3 x 3 block of cells around (x, y). Coordinates are passed unscaled,
   // because equals() and hashCode() divide by 'scall' themselves.
   for (long dx = -scall; dx <= scall; dx += scall) {
       for (long dy = -scall; dy <= scall; dy += scall) {
           Bucket bucket = index.get(new BucketEntity(user, x + dx, y + dy));
           if (bucket != null) shortListToCheck.addAll(bucket); // an empty cell returns null
       }
   }

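get() all the Buckets that match the 9 virtual BucketEntity objects (they can be null). For each user in those 9 buckets, actually compute the distance the way you provided in your question.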
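Putting the steps together, a minimal end-to-end sketch might look like this. It rests on several assumptions that are not in the original answer: coordinates are planar and expressed in metres, the cell edge equals the 1 km threshold, the intended relation is "within 1 km" (the neighbour-bucket lookup can only find nearby points), and the names `NearbyUsersSketch`, `Point`, and `closePairs` are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch of the bucket approach, start to finish.
final class NearbyUsersSketch {
    static final long SCALE = 1000; // cell edge = distance threshold, in metres (assumed)

    // One start or end point of a user.
    static final class Point {
        final long userId, x, y;
        Point(long userId, long x, long y) { this.userId = userId; this.x = x; this.y = y; }
    }

    static String key(long cx, long cy) { return cx + ":" + cy; }

    // Returns "smallerId-largerId" for every pair of distinct users
    // that have two points at most SCALE apart.
    static Set<String> closePairs(List<Point> points) {
        // Step 1: put every point into the bucket of its cell.
        Map<String, List<Point>> index = new HashMap<>();
        for (Point p : points) {
            String k = key(Math.floorDiv(p.x, SCALE), Math.floorDiv(p.y, SCALE));
            index.computeIfAbsent(k, unused -> new ArrayList<>()).add(p);
        }
        // Step 2: for every point, check only the 3 x 3 block of cells around it.
        Set<String> pairs = new TreeSet<>();
        for (Point p : points) {
            long cx = Math.floorDiv(p.x, SCALE), cy = Math.floorDiv(p.y, SCALE);
            for (long dx = -1; dx <= 1; dx++) {
                for (long dy = -1; dy <= 1; dy++) {
                    List<Point> bucket = index.get(key(cx + dx, cy + dy));
                    if (bucket == null) continue; // empty cell
                    for (Point q : bucket) {
                        if (p.userId >= q.userId) continue; // skip same user, report each pair once
                        long ex = p.x - q.x, ey = p.y - q.y;
                        if (ex * ex + ey * ey <= SCALE * SCALE) {
                            pairs.add(p.userId + "-" + q.userId);
                        }
                    }
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<Point> points = new ArrayList<>();
        points.add(new Point(1, 100, 100));     // user 1 start
        points.add(new Point(1, 5000, 5000));   // user 1 end
        points.add(new Point(2, 1050, 100));    // user 2 start: 950 m from user 1 start, in the next cell
        points.add(new Point(2, 9000, 9000));   // user 2 end
        points.add(new Point(3, 20000, 20000)); // user 3 start
        points.add(new Point(3, 21000, 21000)); // user 3 end
        System.out.println(closePairs(points)); // [1-2]
    }
}
```

Because the cell edge equals the threshold, two points within the threshold always sit in the same cell or in adjacent cells, so checking the 9 surrounding buckets misses no pair. Real latitude/longitude data would first need a projection to planar coordinates or a great-circle distance function.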

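Then play with 'scall'. As you can see, there is no real constraint on multi-threading here. Maybe the next level of optimization for this algorithm is adaptive/recursive bucket sizes, based on an adaptive scale.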

Finding relations between GPS data
