Join each row of one table to the record with the most similar name in each of several tables

Time: 2017-07-05 01:45:43

Tags: postgresql join sql-order-by limit

Platform: PostgreSQL

Tables

shortlist:  name (text), city (text)...
data1:      name (text), ranking (integer), score1 (double)...
data2:      name (text), ranking (integer), score1 (double)...
data3:      name (text), ranking (integer), score1 (double)...
data4:      name (text), ranking (integer), score1 (double)...

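There is a limited number of these similarly structured data tables.

I want to join each row of shortlist to the row in each data table whose name is most similar, as determined by similarity(shortlist.name, data#.name).

Pseudocode for the same idea: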

for each s_row in shortlist:
    select shortlist.*
    join (SELECT data1.*, similarity(s_row.name, data1.name) AS sim FROM data1 ORDER BY sim DESC LIMIT 1)
    join (SELECT data2.*, similarity(s_row.name, data2.name) AS sim FROM data2 ORDER BY sim DESC LIMIT 1)
    join (SELECT data3.*, similarity(s_row.name, data3.name) AS sim FROM data3 ORDER BY sim DESC LIMIT 1)
    join (SELECT data4.*, similarity(s_row.name, data4.name) AS sim FROM data4 ORDER BY sim DESC LIMIT 1)

Is there a way to do this in SQL?

1 Answer:

Answer 0: (score: 1)

I'm not entirely sure, but do you mean something like this:

select s.name, 
       d1.name as d1_name, 
       d2.name as d2_name
from shortlist s 
  left join lateral (
    SELECT data1.*, similarity(s.name, data1.name) AS sim 
    FROM data1 
    ORDER BY sim DESC
    LIMIT 1
  ) d1 on true
  left join lateral (
    SELECT data2.*, similarity(s.name, data2.name) AS sim 
    FROM data2 
    ORDER BY sim DESC 
    LIMIT 1
  ) d2 on true

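You want an outer join (left join) to each table; otherwise a shortlist row drops out of the result entirely as soon as at least one of the tables returns no match for it.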
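
For completeness, the same pattern can be extended to all four data tables. The following is only a sketch under a few assumptions: that similarity() is the function provided by the pg_trgm extension, that the tables have the columns listed in the question, and that exposing each match's similarity next to its name is useful. The aliases (d1_name, d1_sim, and so on) and the tie-breaking sort on name are illustrative additions, not part of the original query.

```sql
-- Assumption: similarity() comes from the pg_trgm extension.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

SELECT s.name,
       d1.name AS d1_name, d1.sim AS d1_sim,
       d2.name AS d2_name, d2.sim AS d2_sim,
       d3.name AS d3_name, d3.sim AS d3_sim,
       d4.name AS d4_name, d4.sim AS d4_sim
FROM shortlist s
  -- Each lateral subquery is evaluated once per shortlist row,
  -- so it can reference s.name and keep only the best match.
  LEFT JOIN LATERAL (
    SELECT d.name, similarity(s.name, d.name) AS sim
    FROM data1 d
    ORDER BY sim DESC, d.name  -- d.name breaks ties deterministically
    LIMIT 1
  ) d1 ON true
  LEFT JOIN LATERAL (
    SELECT d.name, similarity(s.name, d.name) AS sim
    FROM data2 d
    ORDER BY sim DESC, d.name
    LIMIT 1
  ) d2 ON true
  LEFT JOIN LATERAL (
    SELECT d.name, similarity(s.name, d.name) AS sim
    FROM data3 d
    ORDER BY sim DESC, d.name
    LIMIT 1
  ) d3 ON true
  LEFT JOIN LATERAL (
    SELECT d.name, similarity(s.name, d.name) AS sim
    FROM data4 d
    ORDER BY sim DESC, d.name
    LIMIT 1
  ) d4 ON true;
```

Because of LIMIT 1, each subquery returns at most one row, so the result has exactly one row per shortlist row. Note that the best match is returned even when its similarity is very low; if that is not wanted, a WHERE condition on the similarity inside each subquery (with a threshold of your choosing) would leave the corresponding columns NULL instead.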