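SQL - Need to efficiently pair two entities by minimum distance within a group

Time: 2018-07-27 16:52:15

Tags: sql join distance vertica

In this example, I have a table containing a list of people, a group category, and each person's location (longitude/latitude coordinates). A person can belong to multiple groups. Here is a sample table: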

Person  Group   Long     Lat
1       1       11       23
2       1       12       24
.       .       .        .
.       .       .        .
.       .       .        .
2       2       12       24

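I also have another table that lists businesses, their locations, and a shared group that matches the grouping in the first table. Likewise, a business can belong to multiple groups. Sample table: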

Busns   Group   Long     Lat
5       1       5        6
6       1       6        7
.       .       .        .
.       .       .        .
.       .       .        .
5       2       5        6

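For each person and each group, I want to find the business at the minimum distance. This has proven to be a very memory-intensive task. Currently I build a huge table via a RIGHT JOIN that computes the distance between every person and every business within each group. I then create a second table that finds the minimum distance for each person in each group, and finally perform an INNER JOIN to pare the first table down to the closest pairs. Example code: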

DROP TABLE IF EXISTS DistancePairs;
CREATE LOCAL TEMPORARY TABLE DistancePairs ON COMMIT PRESERVE ROWS AS (
SELECT  a.Person
        ,a.Group
        ,b.Business
        ,a.Latitude AS PersonLat
        ,a.Longitude AS PersonLong
        ,b.Latitude AS BusinessLat
        ,b.Longitude AS BusinessLong
        ,0.621371*DISTANCEV(a.Latitude,a.Longitude,b.Latitude,b.Longitude) AS AproxDistance
FROM people a
RIGHT JOIN business b
ON a.Group = b.Group
);

DROP TABLE IF EXISTS MinDist;
CREATE LOCAL TEMPORARY TABLE MinDist ON COMMIT PRESERVE ROWS AS (
SELECT
    Person
    ,Group
    ,MIN(AproxDistance) AS AproxDistance
FROM DistancePairs
GROUP BY Person, Group
);

SELECT  a.Person
        ,a.Group
        ,a.Business
        ,a.AproxDistance
FROM DistancePairs a
JOIN MinDist b
ON a.Person = b.Person
AND a.Group = b.Group
AND a.AproxDistance = b.AproxDistance
;

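Is there a better way? Given the size of the dataset I am working with, this performs very poorly and runs for hours. The original Person and Business tables were already created with WHERE clauses to limit their size.

1 Answer:

Answer 0: (score: 1)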

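Could you try doing a single join in one query and then adding a LIMIT clause, in the LIMIT ... OVER(PARTITION BY ...) form?

I only have a tiny bit of sample data, so I can't really test whether this makes sense or not. But here goes: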

WITH
-- this is your input data ...
persons        ( Person, grp,  Long,    Lat ) AS (
          SELECT 1   ,   1   ,   11  ,    23
UNION ALL SELECT 2   ,   1   ,   12  ,    24
UNION ALL SELECT 2   ,   2   ,   12  ,    24
)
,
-- and this, is also your input data ....
businesses     (Busns,  grp,  Long,    Lat) AS (
          SELECT 5   ,   1   ,   5  ,     6
UNION ALL SELECT 6   ,   1   ,   6  ,     7
UNION ALL SELECT 5   ,   2   ,   5  ,     6
)
,
-- real WITH clause would start here ....
join_and_calc AS (
SELECT
  person
, p.grp
, busns
, p.lat
, p.long
, b.lat
, b.long
, 0.621371 * DISTANCEV(p.lat,p.long,b.lat,b.long) AS app_dist
FROM persons    p
JOIN businesses b USING(grp)
)
SELECT
  *
FROM join_and_calc
LIMIT 1 OVER(PARTITION BY person,grp,busns ORDER BY app_dist)
;

The result I get is:

 person | grp | busns | lat | long | lat | long |     app_dist     
--------+-----+-------+-----+------+-----+------+------------------
      1 |   1 |     5 |  23 |   11 |   6 |    5 | 1235.42458453758
      1 |   1 |     6 |  23 |   11 |   7 |    6 | 1149.36524763703
      2 |   1 |     5 |  24 |   12 |   6 |    5 | 1322.28298287477
      2 |   1 |     6 |  24 |   12 |   7 |    6 | 1234.90557929051
      2 |   2 |     5 |  24 |   12 |   6 |    5 | 1322.28298287477
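
Note that the output above still contains every person/business pair: because busns is part of the PARTITION BY list, each partition holds exactly one row and LIMIT 1 removes nothing. If the goal is the single closest business per person and group, a variant that partitions only by person and grp may be closer to what the question asks for. The following is an untested sketch on the same inline sample data; it assumes the same column names as the query above and drops the coordinate columns from the output for brevity.

```sql
WITH
-- same inline sample data as above (stand-ins for the real tables)
persons (person, grp, long, lat) AS (
          SELECT 1, 1, 11, 23
UNION ALL SELECT 2, 1, 12, 24
UNION ALL SELECT 2, 2, 12, 24
)
,
businesses (busns, grp, long, lat) AS (
          SELECT 5, 1, 5, 6
UNION ALL SELECT 6, 1, 6, 7
UNION ALL SELECT 5, 2, 5, 6
)
,
-- one row per person/business pair that shares a group,
-- with the distance converted from kilometres to miles
join_and_calc AS (
SELECT
  p.person
, p.grp
, b.busns
, 0.621371 * DISTANCEV(p.lat, p.long, b.lat, b.long) AS app_dist
FROM persons    p
JOIN businesses b ON b.grp = p.grp
)
SELECT
  *
FROM join_and_calc
-- busns is left out of the partition on purpose:
-- keep only the nearest business for each (person, grp)
LIMIT 1 OVER(PARTITION BY person, grp ORDER BY app_dist)
;
```

Going by the distances listed above, this should keep business 6 for persons 1 and 2 in group 1, and business 5 for person 2 in group 2. If two businesses are exactly equidistant, LIMIT 1 keeps only one of them, whereas the MIN-and-join approach in the question would return both.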

Good luck - Marco