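As part of a k-means algorithm, I'm trying to update which cluster each item belongs to, using the query below. The problem is that I can't seem to reference table d inside the nested query.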
UPDATE algorithms.km_crimes d SET cluster_id = c.id
FROM (SELECT id FROM algorithms.km_cluster_centres c
ORDER BY |/ (POW(d.latitude-c.latitude,2)+POW(d.longitude-c.longitude,2))
ASC LIMIT 1) AS c
WHERE d.cluster_id IS DISTINCT FROM c.id;
Can anyone suggest how to restructure the query? I've tried more modifications than I can count.
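For reference, the value the nested query is meant to produce for each row d is the id of the nearest cluster centre by Euclidean distance. A minimal Python sketch of that per-row lookup, assuming centres are held in a plain dict mapping id to (latitude, longitude) (a hypothetical in-memory layout, not the asker's tables):

```python
import math


def nearest_centre(point, centres):
    """Return the id of the centre closest to `point` by Euclidean distance.

    Mirrors ORDER BY |/(POW(dlat,2)+POW(dlng,2)) LIMIT 1 evaluated for one
    row d. `centres` is a hypothetical dict {id: (lat, lng)}. On an exact
    tie, min() keeps the first id in iteration order, whereas SQL's LIMIT 1
    picks an arbitrary one.
    """
    lat, lng = point
    return min(
        centres,
        key=lambda cid: math.hypot(lat - centres[cid][0], lng - centres[cid][1]),
    )


centres = {1: (0.0, 0.0), 2: (10.0, 10.0)}
print(nearest_centre((1.0, 2.0), centres))
print(nearest_centre((9.0, 8.5), centres))
```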
Answer 0 (score: 1)
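Going by the MySQL example you're converting, you don't need the WHERE condition on this first query at all. The algorithm doesn't care how many cluster_id values get reassigned in each iteration; it only needs to stop once none of the cluster centres has moved. Fortunately, the second query is easier to fix.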
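This seems to work. Note that the nearest-centre lookup is now a correlated subquery in the SET clause, where it can reference the row d being updated: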
-- Data points and cluster centres
CREATE TABLE km_data (id serial, cluster_id int, lat double precision, lng double precision);
CREATE TABLE km_clusters (id serial, lat double precision, lng double precision);
CREATE OR REPLACE FUNCTION kmeans(k int) RETURNS VOID LANGUAGE plpgsql AS $$
BEGIN
  -- Seed k centres from randomly chosen data points
  TRUNCATE km_clusters;
  INSERT INTO km_clusters (lat, lng)
  SELECT lat, lng FROM km_data
  ORDER BY random() LIMIT k;
  LOOP
    -- Assign every point to its nearest centre (no WHERE filter needed)
    UPDATE km_data d SET cluster_id = (
      SELECT id FROM km_clusters c
      ORDER BY |/(POW(d.lat-c.lat,2)+POW(d.lng-c.lng,2)) LIMIT 1
    );
    -- Move each centre to the mean of its points, but only where the
    -- centre moved by at least the tolerance on either axis
    UPDATE km_clusters c
    SET lat=d.lat, lng=d.lng
    FROM (
      SELECT
        cluster_id,
        AVG(lat) AS lat,
        AVG(lng) AS lng
      FROM km_data
      GROUP BY cluster_id
    ) d
    WHERE
      c.id=d.cluster_id AND
      (ABS(c.lat-d.lat) >= 0.001 OR ABS(c.lng-d.lng) >= 0.001);
    -- FOUND is false when the UPDATE touched no rows, i.e. no centre moved
    EXIT WHEN NOT FOUND;
  END LOOP;
END $$;
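If you want more precision, you can tighten the tolerance in the final WHERE clause, although this looks like a fairly imprecise algorithm to begin with.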
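To make the control flow of kmeans(k) easier to follow, here is a rough in-memory Python sketch of the same loop. It is an illustration under assumptions, not a replacement for the SQL: points are a list of (lat, lng) tuples, centre ids are 0..k-1 rather than serial values, and the hypothetical parameter tol plays the role of the 0.001 in the final WHERE clause. It reassigns every point on each pass and stops once no centre has moved by at least tol on either axis, which is the stopping rule the answer describes.

```python
import math
import random


def kmeans(points, k, tol=0.001, seed=None):
    """In-memory sketch of the kmeans(k) loop.

    `points` is an assumed list of (lat, lng) tuples.
    Returns (labels, centres), where labels[i] is the centre id of points[i].
    """
    rng = random.Random(seed)
    # Seed centres with k randomly chosen points (ORDER BY random() LIMIT k).
    centres = dict(enumerate(rng.sample(points, k)))
    while True:
        # Step 1: reassign every point to its nearest centre, unfiltered.
        labels = [
            min(
                centres,
                key=lambda c: math.hypot(p[0] - centres[c][0], p[1] - centres[c][1]),
            )
            for p in points
        ]
        # Step 2: move each non-empty cluster's centre to its mean, but only
        # if it would move by at least `tol` on some axis.
        moved = False
        for cid in centres:
            members = [p for p, lab in zip(points, labels) if lab == cid]
            if not members:
                continue  # empty cluster: the GROUP BY join skips it as well
            lat = sum(p[0] for p in members) / len(members)
            lng = sum(p[1] for p in members) / len(members)
            if abs(centres[cid][0] - lat) >= tol or abs(centres[cid][1] - lng) >= tol:
                centres[cid] = (lat, lng)
                moved = True
        # Equivalent of EXIT WHEN NOT FOUND: no centre was updated.
        if not moved:
            return labels, centres


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centres = kmeans(pts, 2, seed=42)
print(labels, centres)
```

Because centres that would shift by less than tol are left in place, the result is only approximate; a smaller tol should give more precise centres, usually at the cost of extra passes.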
Answer 1 (score: 0)
Have you tried doing it without the alias?
UPDATE algorithms.km_crimes SET cluster_id = c.id
FROM (SELECT id FROM algorithms.km_cluster_centres c
ORDER BY |/ (POW(algorithms.km_crimes.latitude-c.latitude,2)+POW(algorithms.km_crimes.longitude-c.longitude,2)) ASC LIMIT 1) AS c
WHERE algorithms.km_crimes.cluster_id IS DISTINCT FROM c.id;