Question

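I need help crafting an advanced Postgres query. I'm trying to find sentences in which two words appear next to each other, using Postgres directly rather than some command-language extension. My tables are: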

TABLE word (spelling text, wordid serial)
TABLE sentence (sentenceid serial)
TABLE item (sentenceid integer, position smallint, wordid integer)

I have a simple query that finds the sentences containing a single word:

SELECT DISTINCT sentence.sentenceid 
FROM item,word,sentence 
WHERE word.spelling = 'word1' 
  AND item.wordid = word.wordid 
  AND sentence.sentenceid = item.sentenceid

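I want to filter that query's results in turn by another word (word2): its corresponding item must have an item.sentenceid equal to the current result's (item's or sentence's) sentenceid, and an item.position equal to the current result's item.position + 1. How can I refine my query to achieve this efficiently?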

Answer 1

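I think this does what you want. Sorry, but right now I can't remember how to write it without join clauses. Basically, I include a self-join on the item and word tables to get, for each item, the next item in its sentence. If the query planner doesn't like my nested select, you could also try joining the word table directly instead.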

SELECT distinct sentence.sentenceid
FROM item inner join word
        on item.wordid = word.wordid
    inner join sentence
        on sentence.sentenceid = item.sentenceid
    -- subquery: every item with its spelling, aliased so it can be self-joined
    left join (select subsequent_item.sentenceid,
                      subsequent_item.position,
                      subsequent_word.spelling
                 from item as subsequent_item
                    inner join word as subsequent_word
                        on subsequent_item.wordid = subsequent_word.wordid) subsequent
        on subsequent.sentenceid = item.sentenceid
            and subsequent.position = item.position + 1
where   word.spelling = 'word1' and subsequent.spelling = 'word2';

Answer 2

A simpler solution, but it only gives results when there are no gaps in the item.position values:

SELECT DISTINCT sentence.sentenceid 
  FROM sentence 
  JOIN item ON sentence.sentenceid = item.sentenceid
  JOIN word ON item.wordid = word.wordid
  JOIN item AS next_item ON sentence.sentenceid = next_item.sentenceid
                        AND next_item.position = item.position + 1
  JOIN word AS next_word ON next_item.wordid = next_word.wordid
 WHERE word.spelling = 'word1'
   AND next_word.spelling = 'word2'
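
Since the question asks about efficiency: the self-join above looks up next_item by (sentenceid, position) and looks up words by spelling, so indexes on those columns are a natural thing to try. What follows is only a sketch. The index names are made up, the tables are recreated from the question's schema so the block can run on its own, and whether the planner actually uses the indexes depends on the data, so it is worth checking with EXPLAIN.

```sql
-- Tables as described in the question (IF NOT EXISTS leaves an existing schema alone).
CREATE TABLE IF NOT EXISTS word (spelling text, wordid serial);
CREATE TABLE IF NOT EXISTS sentence (sentenceid serial);
CREATE TABLE IF NOT EXISTS item (sentenceid integer, position smallint, wordid integer);

-- Hypothetical index names; none of these are part of the original schema.
CREATE INDEX IF NOT EXISTS word_spelling_idx ON word (spelling);
CREATE INDEX IF NOT EXISTS item_wordid_idx ON item (wordid);
CREATE INDEX IF NOT EXISTS item_sentence_position_idx ON item (sentenceid, position);

-- Show the plan chosen for the adjacent-word self-join.
EXPLAIN
SELECT DISTINCT item.sentenceid
  FROM item
  JOIN word ON item.wordid = word.wordid
  JOIN item AS next_item ON next_item.sentenceid = item.sentenceid
                        AND next_item.position = item.position + 1
  JOIN word AS next_word ON next_item.wordid = next_word.wordid
 WHERE word.spelling = 'word1'
   AND next_word.spelling = 'word2';
```

The sentence table is left out of the EXPLAIN query because item already carries sentenceid; that is a simplification on my part, not something the query above requires.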

A more general solution, using window functions:

SELECT DISTINCT sentenceid
FROM (SELECT sentence.sentenceid,
             word.spelling,
             lead(word.spelling) OVER (PARTITION BY sentence.sentenceid
                                           ORDER BY item.position)
        FROM sentence 
        JOIN item ON sentence.sentenceid = item.sentenceid
        JOIN word ON item.wordid = word.wordid) AS pairs
 WHERE spelling = 'word1'
   AND lead = 'word2'
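
To see what lead() contributes, here is a small self-contained sketch of the same idea. The sample rows and the next_spelling alias are my own assumptions for illustration; the CTEs named word and item stand in for the real tables. Sentence 1 reads "word1 word2 other" with gaps in its positions (10, 20, 30), and sentence 2 reads "word1 other word2".

```sql
WITH word(wordid, spelling) AS (
    VALUES (1, 'word1'), (2, 'word2'), (3, 'other')
),
item(sentenceid, position, wordid) AS (
    VALUES (1, 10, 1), (1, 20, 2), (1, 30, 3),   -- sentence 1: word1 word2 other
           (2, 1, 1),  (2, 2, 3),  (2, 3, 2)     -- sentence 2: word1 other word2
),
pairs AS (
    -- For each item, fetch the spelling of the next item in the same sentence,
    -- ordered by position; gaps in position do not matter here.
    SELECT item.sentenceid,
           word.spelling,
           lead(word.spelling) OVER (PARTITION BY item.sentenceid
                                         ORDER BY item.position) AS next_spelling
      FROM item
      JOIN word ON item.wordid = word.wordid
)
SELECT DISTINCT sentenceid
  FROM pairs
 WHERE spelling = 'word1'
   AND next_spelling = 'word2';
```

With these rows only sentence 1 is returned: in sentence 2 the word following word1 is "other", so the pair does not match.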

Edit: also a general solution (gaps allowed), but using joins only:

SELECT DISTINCT sentence.sentenceid
  FROM sentence 
  JOIN item ON sentence.sentenceid = item.sentenceid
  JOIN word ON item.wordid = word.wordid
  JOIN item AS next_item ON sentence.sentenceid = next_item.sentenceid
                        AND next_item.position > item.position
  JOIN word AS next_word ON next_item.wordid = next_word.wordid
  LEFT JOIN item AS mediate_word ON sentence.sentenceid = mediate_word.sentenceid
                                AND mediate_word.position > item.position
                                AND mediate_word.position < next_item.position
 WHERE mediate_word.wordid IS NULL
   AND word.spelling = 'word1'
   AND next_word.spelling = 'word2'
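
The LEFT JOIN ... IS NULL pattern above is an anti-join: it keeps a pair only when no item sits between the two positions. The same condition can be spelled with NOT EXISTS, which some people find easier to read. This is a sketch of that alternative spelling rather than the query from the answer; the sample rows and the between_item alias are assumptions for illustration.

```sql
WITH word(wordid, spelling) AS (
    VALUES (1, 'word1'), (2, 'word2'), (3, 'other')
),
item(sentenceid, position, wordid) AS (
    VALUES (1, 10, 1), (1, 20, 2),               -- sentence 1: word1 word2 (gap in positions)
           (2, 10, 1), (2, 15, 3), (2, 20, 2)    -- sentence 2: word1 other word2
)
SELECT DISTINCT item.sentenceid
  FROM item
  JOIN word ON item.wordid = word.wordid
  JOIN item AS next_item ON next_item.sentenceid = item.sentenceid
                        AND next_item.position > item.position
  JOIN word AS next_word ON next_item.wordid = next_word.wordid
 WHERE word.spelling = 'word1'
   AND next_word.spelling = 'word2'
   -- Reject the pair if any item lies strictly between the two positions.
   AND NOT EXISTS (SELECT 1
                     FROM item AS between_item
                    WHERE between_item.sentenceid = item.sentenceid
                      AND between_item.position > item.position
                      AND between_item.position < next_item.position);
```

With these rows only sentence 1 is returned, because sentence 2 has "other" at position 15 between word1 and word2.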

Answer 3

select
  *
from mytable
where
  round( 0.1 / ts_rank_cd( to_tsvector(mycolumn), to_tsquery('word1 & word2') ) ) <= 1

This will actually work, assuming you aren't using the A-D weight labels; otherwise you'll need to change 0.1 to something else.

You'll also want to add a tsvector @@ tsquery WHERE clause.
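
Put together, the two pieces might look like the sketch below. mytable and mycolumn are the placeholder names from the query above, filled here with a few made-up rows so the block runs on its own. The NULLIF wrapper is my addition: SQL does not promise the order in which WHERE conditions are evaluated, so it guards the division against a zero rank instead of relying on the @@ test running first.

```sql
WITH mytable(mycolumn) AS (
    -- Hypothetical sample rows, for illustration only.
    VALUES ('word1 word2 something else'),
           ('word1 something else word2'),
           ('nothing relevant here')
)
SELECT *
  FROM mytable
 WHERE to_tsvector(mycolumn) @@ to_tsquery('word1 & word2')
   -- A zero rank becomes NULL, so the comparison yields NULL and the row is dropped.
   AND round(0.1 / NULLIF(ts_rank_cd(to_tsvector(mycolumn),
                                     to_tsquery('word1 & word2')), 0)) <= 1;
```

On PostgreSQL 9.6 or later, a phrase query such as to_tsquery('word1 <-> word2') may be a more direct way to say "word1 immediately followed by word2", without the rank arithmetic.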

Find sentences with two words next to each other in Pg
