Counting similar strings in PostgreSQL

Asked: 2018-11-22 18:09:54

Tags: postgresql grouping similarity

I have a table in PostgreSQL that contains a list of search_terms along with the number of times each one was searched:

Search Term Table

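I am trying to write a query that groups them together, i.e. I would like to see that electric scooter has been searched 27 times rather than 20, with 4 of those searches under one misspelling and 3 under another. I want to use the similarity function so that I can play around with the limit.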

I have been trying to group by similarity, but without success:

SELECT 
search_term,
SUM(count)

FROM 
t2

GROUP BY (SELECT set_limit(0.8);

SELECT similarity(n1.search_term, n2.search_term) AS sim, n1.search_term, n2.search_term
FROM   t2 n1
JOIN   t2 n2 ON n1.search_term <> n2.search_term
               AND n1.search_term % n2.search_term
ORDER  BY sim DESC)

Any help would be greatly appreciated!

1 answer:

Answer 0 (score: 0)

A limit of 0.8 is too strict here, because the similarity values in your example are only a little above 0.6.
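To get a feel for why a limit of 0.8 can reject obvious misspellings, here is a minimal Python sketch that approximates how pg_trgm scores two strings: lowercase the text, split it into words, pad each word with two leading spaces and one trailing space, and divide the number of shared trigrams by the number of distinct trigrams overall. It is only a stand-in for the real extension (ASCII letters and digits only), and the two misspellings are hypothetical, since the actual table contents are not shown in the question.

```python
import re


def trigrams(text):
    """Approximate pg_trgm's trigram set for a string (ASCII only)."""
    grams = set()
    # Words are runs of letters/digits; everything else is a separator.
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # pg_trgm pads each word with two spaces in front and one behind.
        padded = "  " + word + " "
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both strings."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical misspellings, chosen only for illustration.
for typo in ("elctric scooter", "electirc scooter"):
    print(typo, round(similarity("electric scooter", typo), 3))
```

Under this approximation both hypothetical misspellings score between 0.6 and 0.8, so a limit of 0.8 would leave them ungrouped while a limit of 0.6 would catch them.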

Try this query:

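-- Requires the pg_trgm extension (similarity() and the % operator).
-- Outer query: sum the counts of every term listed in each group.
SELECT sim, ss, sum(countt)
  FROM (
    -- Build a '|'-delimited list of the terms that share a similarity score.
    SELECT sim, '|'||string_agg(s1,  '|')||'|' ss
      FROM (
        -- Every pair of distinct terms that passes the % operator.
        SELECT similarity(n1.search_term, n2.search_term) AS sim, 
               n1.search_term s1, n2.search_term s2
          FROM t1 n1
          JOIN t1 n2 ON n1.search_term <> n2.search_term
           AND n1.search_term % n2.search_term
           ) t2    
     WHERE sim > 0.6
     GROUP BY sim 
       ) t3
  -- Join back to the table to pick up the count of each listed term.
  LEFT JOIN t1 n3 ON ss like '%|'||n3.search_term||'|%' 
 GROUP BY ss, sim
 ORDER BY sim DESC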
Here is my sample: http://sqlfiddle.com/#!17/1d705/35
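For experimenting with the limit outside the database, the idea of folding misspellings into one total can be sketched in Python. This is a different and simpler technique than the query above: it greedily attaches each term to the first group whose most-searched term is similar enough. The `similarity` helper only approximates pg_trgm, and the terms and counts are hypothetical values chosen to match the 20 + 4 + 3 = 27 example from the question.

```python
import re


def similarity(a, b):
    """Rough stand-in for pg_trgm's similarity(); not the real extension."""
    def grams(text):
        out = set()
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            padded = "  " + word + " "  # two leading spaces, one trailing
            out.update(padded[i:i + 3] for i in range(len(padded) - 2))
        return out

    ta, tb = grams(a), grams(b)
    return len(ta & tb) / len(ta | tb) if ta and tb else 0.0


def group_counts(counts, threshold):
    """Greedy grouping: most-searched terms become group representatives."""
    totals = {}
    # Visit terms from most to least searched.
    for term, n in sorted(counts.items(), key=lambda kv: -kv[1]):
        for rep in totals:
            if similarity(rep, term) >= threshold:
                totals[rep] += n  # fold this term into an existing group
                break
        else:
            totals[term] = n  # no group is close enough: start a new one
    return totals


# Hypothetical data; the real table is not shown in the question.
counts = {
    "electric scooter": 20,
    "elctric scooter": 4,
    "electirc scooter": 3,
    "bike helmet": 5,
}
print(group_counts(counts, 0.6))
print(group_counts(counts, 0.8))
```

With these hypothetical terms, a threshold of 0.6 folds both misspellings into electric scooter, while 0.8 keeps every term separate. Because the grouping is greedy, the result depends on which term is chosen as the representative, so treat it as a way to explore thresholds rather than as a replacement for the query.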