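I have a table in PostgreSQL that contains a list of search_terms and the number of times each one was searched:
I am trying to write a query that groups them together, i.e. I would like to see that electric scooter has been searched 27 times rather than 20, because one misspelling was searched 4 times and another 3 times. I would like to use the similarity function so that I can play with the limit.
I have been trying to group by similarity, but without success: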
SELECT
search_term,
SUM(count)
FROM
t2
GROUP BY (SELECT set_limit(0.8);
SELECT similarity(n1.search_term, n2.search_term) AS sim, n1.search_term, n2.search_term
FROM t2 n1
JOIN t2 n2 ON n1.search_term <> n2.search_term
AND n1.search_term % n2.search_term
ORDER BY sim DESC)
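Any help would be greatly appreciated!
Answer 0 (score: 0)
A limit of 0.8 will not work here, because the similarity between the terms in your example is only just above 0.6, so they fall below that limit.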
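One way to check this is to score a pair of terms directly before choosing a limit. The following is a minimal sketch, assuming the pg_trgm extension is available; the misspelling 'electric scoter' is a made-up example and not one of the terms from the question.

```sql
-- pg_trgm provides similarity(), the % operator and the threshold setting.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Score one pair directly; 'electric scoter' is a hypothetical misspelling.
SELECT similarity('electric scooter', 'electric scoter') AS sim;

-- Lower the threshold used by the % operator for the current session.
-- On PostgreSQL 9.6 and later this setting replaces the deprecated set_limit().
SET pg_trgm.similarity_threshold = 0.6;
SHOW pg_trgm.similarity_threshold;
```

Running the first SELECT against the real misspellings shows how low the limit has to go before they match.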
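Try this query:
SELECT sim, ss, sum(countt)
FROM (
    -- Concatenate the terms that share a similarity score into one '|'-delimited string
    SELECT sim, '|' || string_agg(s1, '|') || '|' AS ss
    FROM (
        -- Every pair of distinct terms that the pg_trgm % operator treats as similar
        SELECT similarity(n1.search_term, n2.search_term) AS sim,
               n1.search_term AS s1, n2.search_term AS s2
        FROM t1 n1
        JOIN t1 n2 ON n1.search_term <> n2.search_term
                  AND n1.search_term % n2.search_term
    ) t2
    WHERE sim > 0.6
    GROUP BY sim
) t3
-- Join back to the table to pick up the count of every term in the group
LEFT JOIN t1 n3 ON ss LIKE '%|' || n3.search_term || '|%'
GROUP BY ss, sim
ORDER BY sim DESC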
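An alternative that may be easier to tune is to map every term to the most-searched term it is similar to and then sum the counts per mapped term. This is only a sketch under stated assumptions: it uses the table and column names from the question (t2, search_term, count), the 0.6 cut-off is taken from the explanation above, and canonical_term and total_count are names made up for illustration.

```sql
-- Assumes the pg_trgm extension is installed (it provides similarity()).
SELECT COALESCE(canon.search_term, t.search_term) AS canonical_term,
       SUM(t.count)                               AS total_count
FROM t2 AS t
LEFT JOIN LATERAL (
    -- Among all terms similar enough to t.search_term (including itself),
    -- pick the one with the highest count; ties are broken alphabetically.
    SELECT c.search_term
    FROM t2 AS c
    WHERE similarity(c.search_term, t.search_term) >= 0.6
    ORDER BY c.count DESC, c.search_term
    LIMIT 1
) AS canon ON true
GROUP BY 1
ORDER BY total_count DESC;
```

The mapping is a single step and not transitive: if A is similar to B and B to C, but A is not similar to C, the terms can still end up in different groups. The self-join also compares every pair of rows, so on a large table it would probably need a trigram index or a pre-filter.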