Clustering points around known cluster centers

Time: 2017-11-20 19:00:38

Tags: sql postgresql postgis

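I have a set of points (~1000) and a set of cluster centers (~100). I now want to cluster the points while taking the known cluster centers into account. Every cluster should start at a known cluster center and grow outward, collecting all points that are less than x meters from the nearest point already in the cluster.

I currently have the following, fairly standard PostGIS DBSCAN query: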

WITH clusters AS (
  SELECT
    landmark_id, coordinate,
    ST_ClusterDBSCAN(coordinate, eps := (30 / 111111.0), minpoints := 10) OVER() AS cluster_id
  FROM landmarks 
  WHERE coordinate IS NOT NULL
)
SELECT
  cluster.id, cluster.landmark_ids,
  ST_Centroid(cluster.geometry) AS coordinate,
  ST_AsGeoJSON(cluster.geometry) AS geometry
FROM (
  SELECT
    cluster_id AS id,
    array_agg(landmark_id) AS landmark_ids,
    ST_ConvexHull(ST_Collect(coordinate)) AS geometry
  FROM clusters
  WHERE cluster_id IS NOT NULL
  GROUP BY cluster_id
) AS cluster;

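Any pointers on how I could adapt the query above, or write another query, to do what I want without resorting to procedural code? (If procedural code is unavoidable, I would appreciate some pointers on that as well.)

1 Answer:

Answer 0: (score: 1)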

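By "already in the cluster", I'm not sure whether you mean only the points picked up by the first clustering pass, or also those you would pick up recursively.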

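This solution only compares against the original clusters and does not attempt any recursive cluster matching. That would require a recursive query, and I doubt it would produce a better answer.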
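If the recursive behaviour is what you are after, one possible alternative to a recursive CTE is sketched below, untested against your data. With minpoints := 1 every point counts as a core point, so ST_ClusterDBSCAN should return the connected components of the "within eps" graph; keeping only the components that contain a known center then approximates growing outward from each center. The centers table and its center_id and coordinate columns are assumptions, since the question does not show where the centers are stored. Note that two centers linked by a chain of points end up merged into one cluster here.

```sql
-- ASSUMPTION: a hypothetical table centers(center_id, coordinate) holds the
-- known cluster centers, in the same SRID as landmarks.coordinate.
WITH pts AS (
  -- stack landmarks and centers into one point set
  SELECT landmark_id, NULL AS center_id, coordinate
  FROM landmarks
  WHERE coordinate IS NOT NULL
  UNION ALL
  SELECT NULL, center_id, coordinate
  FROM centers
  WHERE coordinate IS NOT NULL
),
components AS (
  SELECT
    landmark_id, center_id, coordinate,
    -- minpoints := 1 makes every point a core point, so each cluster is a
    -- chain of points whose neighbours are at most eps apart
    ST_ClusterDBSCAN(coordinate, eps := (30 / 111111.0), minpoints := 1) OVER () AS component_id
  FROM pts
)
SELECT
  component_id AS id,
  array_agg(center_id) FILTER (WHERE center_id IS NOT NULL) AS center_ids,
  array_agg(landmark_id) FILTER (WHERE landmark_id IS NOT NULL) AS landmark_ids
FROM components
GROUP BY component_id
-- keep only the components that contain at least one known center
HAVING count(center_id) > 0;
```

Landmarks that cannot be reached from any center fall into components without a center and are dropped by the HAVING clause.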

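I'm also not sure why you decided to compute your centroid from the convex hull. I would assume you want the true centroid, which can be computed directly on the ST_Collect output.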

WITH cluster1 AS (
  SELECT
    landmark_id, coordinate,
    ST_ClusterDBSCAN(coordinate, eps := (30 / 111111.0), minpoints := 10) OVER() AS cluster_id
  FROM landmarks 
  WHERE coordinate IS NOT NULL
),
clustered AS ( SELECT * FROM cluster1 WHERE cluster_id IS NOT NULL ),
clusterall AS (
SELECT 
    l.landmark_id, l.coordinate, c.cluster_id
 FROM landmarks AS l
    CROSS JOIN 
    -- find closest cluster
        LATERAL (SELECT cluster_id 
                FROM clustered AS c 
            ORDER BY  c.coordinate <-> l.coordinate LIMIT 1 ) AS c
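    -- only look for landmarks not matched to a cluster
    -- (landmarks without a coordinate have no nearest cluster, so skip them)
    WHERE l.coordinate IS NOT NULL
      AND l.landmark_id NOT IN(SELECT c.landmark_id FROM clustered AS c)
UNION ALL
-- keep the landmarks DBSCAN already assigned to a cluster
SELECT c.landmark_id, c.coordinate, c.cluster_id
    FROM clustered AS c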
)
SELECT
  cluster.id, cluster.landmark_ids,
  ST_Centroid(cluster.geometry) AS coordinate,
  ST_AsGeoJSON(cluster.geometry) AS geometry
FROM (
  SELECT
    cluster_id AS id,
    array_agg(landmark_id) AS landmark_ids,
    ST_ConvexHull(ST_Collect(coordinate)) AS geometry
  FROM clusterall
  GROUP BY cluster_id
) AS cluster;
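Regarding the centroid remark above, here is a minimal sketch of what computing it against the ST_Collect output could look like. To stay self-contained it uses the plain DBSCAN CTE from the question rather than clusterall, so treat it as an illustration of the aggregate step only.

```sql
WITH clusters AS (
  SELECT
    landmark_id, coordinate,
    ST_ClusterDBSCAN(coordinate, eps := (30 / 111111.0), minpoints := 10) OVER() AS cluster_id
  FROM landmarks
  WHERE coordinate IS NOT NULL
)
SELECT
  cluster_id AS id,
  array_agg(landmark_id) AS landmark_ids,
  -- centroid of the collected points themselves, not of the hull polygon
  ST_Centroid(ST_Collect(coordinate)) AS coordinate,
  -- the convex hull is still used for the cluster outline
  ST_AsGeoJSON(ST_ConvexHull(ST_Collect(coordinate))) AS geometry
FROM clusters
WHERE cluster_id IS NOT NULL
GROUP BY cluster_id;
```

The two centroids generally differ: for a set of points ST_Centroid is their mean position, while for the hull polygon it is weighted by area.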