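Invalid reference to FROM-clause entry for table "d"

Time: 2016-03-02 10:02:28

Tags: postgresql k-means

As part of a k-means algorithm, I am trying to update the cluster each item belongs to, using the query below. The problem is that I cannot seem to reference table d inside the nested query.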

UPDATE algorithms.km_crimes d SET cluster_id = c.id 
FROM (SELECT id FROM algorithms.km_cluster_centres c 
ORDER BY |/ (POW(d.latitude-c.latitude,2)+POW(d.longitude-c.longitude,2))      
ASC LIMIT 1) AS c
WHERE d.cluster_id IS DISTINCT FROM c.id;

Can anyone suggest how to restructure the query? I have tried more variations than I can count.

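2 answers:

Answer 0: (score: 1)

Judging by the MySQL example you are converting, you should not need to change this first query at all.

The algorithm does not care how many times cluster_id is reassigned in each iteration; it only needs to stop once none of the cluster centres have moved. Fortunately, the second query is much easier to fix.

This seems to work: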

CREATE TABLE km_data (id serial, cluster_id int, lat double precision, lng double precision);
CREATE TABLE km_clusters (id serial, lat double precision, lng double precision);

CREATE OR REPLACE FUNCTION kmeans(k int) RETURNS VOID LANGUAGE plpgsql AS $$
BEGIN
  TRUNCATE km_clusters;

  INSERT INTO km_clusters (lat, lng)
  SELECT lat, lng FROM km_data
  ORDER BY random() LIMIT k;

  LOOP
    UPDATE km_data d SET cluster_id = (
      SELECT id FROM km_clusters c 
      ORDER BY |/(POW(d.lat-c.lat,2)+POW(d.lng-c.lng,2)) LIMIT 1
    );

    UPDATE km_clusters c
    SET lat=d.lat, lng=d.lng
    FROM (
      SELECT
        cluster_id, 
        AVG(lat) AS lat,
        AVG(lng) AS lng
      FROM km_data
      GROUP BY cluster_id
    ) d 
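    WHERE
      c.id=d.cluster_id AND
      -- update only centres that moved by more than the tolerance,
      -- so that no rows are updated once every centre has settled
      (ABS(c.lat-d.lat) > 0.001 OR
       ABS(c.lng-d.lng) > 0.001);

    -- stop when the last UPDATE changed no rows
    EXIT WHEN NOT FOUND;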
  END LOOP;
END $$;

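If you want higher precision, you can adjust the numbers in the final WHERE clause, although this looks like a fairly imprecise algorithm to begin with.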
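To see what those numbers control, the comparison can be run on its own. This is only a minimal sketch, separate from the function above: the temporary tables `demo_data` and `demo_clusters`, their rows, and the 0.001 tolerance are made up for illustration. It lists the centres that sit more than the tolerance away from the mean of their assigned points in either coordinate. A loop that stops when this list is empty has converged to within that tolerance.

```sql
-- Hypothetical sample data, for illustration only.
CREATE TEMP TABLE demo_data (
  id serial,
  cluster_id int,
  lat double precision,
  lng double precision
);
CREATE TEMP TABLE demo_clusters (
  id int,
  lat double precision,
  lng double precision
);

INSERT INTO demo_data (cluster_id, lat, lng) VALUES
  (1, 10.0, 20.0),
  (1, 10.2, 20.2),
  (2, 50.0, 60.0),
  (2, 50.0, 60.0);

INSERT INTO demo_clusters (id, lat, lng) VALUES
  (1, 10.0, 20.0),   -- mean of its points is about (10.1, 20.1): still moving
  (2, 50.0, 60.0);   -- already at the mean of its points: settled

-- Centres that would move by more than the tolerance in either coordinate.
SELECT c.id
FROM demo_clusters c
JOIN (
  SELECT cluster_id, AVG(lat) AS lat, AVG(lng) AS lng
  FROM demo_data
  GROUP BY cluster_id
) m ON m.cluster_id = c.id
WHERE ABS(c.lat - m.lat) > 0.001
   OR ABS(c.lng - m.lng) > 0.001;
```

With this sample data only centre 1 is listed. A smaller tolerance makes the loop run longer before it stops; a larger one stops it earlier with less settled centres.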

Answer 1: (score: 0)

Have you tried doing it without the alias?

UPDATE algorithms.km_crimes SET cluster_id = c.id 
FROM (SELECT id FROM algorithms.km_cluster_centres c 
    ORDER BY |/ (POW(algorithms.km_crimes.latitude-c.latitude,2)+POW(algorithms.km_crimes.longitude-c.longitude,2)) ASC LIMIT 1) AS c
    WHERE algorithms.km_crimes.cluster_id IS DISTINCT FROM c.id;
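
As far as I know, PostgreSQL does not let a subquery in the FROM list of an UPDATE refer to the table being updated, with or without an alias. If the aim is to keep the `IS DISTINCT FROM` filter so that only rows whose cluster actually changes are touched, one possible workaround is to join the target to a second copy of itself and find the nearest centre with a LATERAL subquery against that copy. The sketch below assumes PostgreSQL 9.3 or later and a unique `id` column on the crimes table, which the question does not show. The temporary tables and their rows are hypothetical stand-ins for `algorithms.km_crimes` and `algorithms.km_cluster_centres`.

```sql
-- Hypothetical stand-ins for the tables in the question.
CREATE TEMP TABLE km_crimes (
  id serial PRIMARY KEY,          -- assumed unique key, not shown in the question
  cluster_id int,
  latitude double precision,
  longitude double precision
);
CREATE TEMP TABLE km_cluster_centres (
  id int PRIMARY KEY,
  latitude double precision,
  longitude double precision
);

INSERT INTO km_cluster_centres (id, latitude, longitude) VALUES
  (1, 0.0, 0.0),
  (2, 10.0, 10.0);

INSERT INTO km_crimes (cluster_id, latitude, longitude) VALUES
  (NULL, 1.0, 1.0),   -- nearest centre is 1: assigned
  (2,    9.0, 9.0),   -- nearest centre is 2: already correct, left alone
  (1,    8.0, 9.5);   -- nearest centre is 2: reassigned

-- src is a second copy of the target table; the LATERAL subquery may
-- reference src, and the WHERE clause ties each src row back to d.
UPDATE km_crimes d
SET cluster_id = n.id
FROM km_crimes src
CROSS JOIN LATERAL (
  SELECT c.id
  FROM km_cluster_centres c
  ORDER BY |/ (POW(src.latitude - c.latitude, 2) + POW(src.longitude - c.longitude, 2)) ASC
  LIMIT 1
) n
WHERE d.id = src.id
  AND d.cluster_id IS DISTINCT FROM n.id;
```

Because unchanged rows are filtered out, the statement's row count reflects real reassignments. Inside a PL/pgSQL loop, `FOUND` after this statement would be true only when at least one row changed cluster.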