Find sentences with two words adjacent to each other in Pg

Time: 2014-05-14 03:02:10

Tags: sql postgresql full-text-search

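I need help crafting an advanced Postgres query. I am trying to find sentences in which two words are directly adjacent to each other, using Postgres directly rather than some command language extension. My tables are: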

TABLE word (spelling text, wordid serial)
TABLE sentence (sentenceid serial)
TABLE item (sentenceid integer, position smallint, wordid integer)

I have a simple query to find sentences containing a single word:

SELECT DISTINCT sentence.sentenceid 
FROM item,word,sentence 
WHERE word.spelling = 'word1' 
  AND item.wordid = word.wordid 
  AND sentence.sentenceid = item.sentenceid 

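I would like to filter the results of that query by another word (word2) in turn: its corresponding item must have an item.sentenceid equal to the sentenceid of the current query result's item, and an item.position equal to the current query result's item.position + 1. How can I refine my query to achieve this in a performant way?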

3 Answers:

Answer 0 (score: 1)

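I think this does what you asked for; sorry, but right now I can't remember how to write it without join clauses. Basically, I included a self-join on the item and word tables to get the next item in the sentence for each item. If the query planner doesn't like my nested select, you can also try joining the word table instead.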

SELECT distinct sentence.sentenceid 
FROM item inner join word 
        on item.wordid = word.wordid
    inner join sentence
        on sentence.sentenceid = item.sentenceid 
    -- self-join: the subquery exposes every item together with its word's spelling
    left join (select subsequent_item.sentenceid,
                      subsequent_item.position,
                      subsequent_word.spelling
                    from item subsequent_item
                    inner join word subsequent_word 
                        on subsequent_item.wordid = subsequent_word.wordid) subsequent
        on subsequent.sentenceid = item.sentenceid
            and subsequent.position = item.position + 1
where   word.spelling = 'word1' and subsequent.spelling = 'word2';

Answer 1 (score: 1)

A simpler solution, but it only gives results when there are no gaps in the item.position values:

SELECT DISTINCT sentence.sentenceid 
  FROM sentence 
  JOIN item ON sentence.sentenceid = item.sentenceid
  JOIN word ON item.wordid = word.wordid
  JOIN item AS next_item ON sentence.sentenceid = next_item.sentenceid
                        AND next_item.position = item.position + 1
  JOIN word AS next_word ON next_item.wordid = next_word.wordid
 WHERE word.spelling = 'word1'
   AND next_word.spelling = 'word2'
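
To see how the position + 1 join behaves when positions have gaps, here is a minimal sketch that runs the same query against an in-memory SQLite database from Python. SQLite only stands in for Postgres so the example is self-contained; the sample rows, the parameter placeholders, the added ORDER BY, and the helper name adjacent_sentences are assumptions made for illustration, not part of the original answer.

```python
import sqlite3

# Hypothetical sample data; SQLite is only a stand-in for Postgres here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE word (wordid INTEGER PRIMARY KEY, spelling TEXT);
CREATE TABLE sentence (sentenceid INTEGER PRIMARY KEY);
CREATE TABLE item (sentenceid INTEGER, position INTEGER, wordid INTEGER);
INSERT INTO word VALUES (1, 'word1'), (2, 'word2'), (3, 'other');
INSERT INTO sentence VALUES (1), (2), (3);
-- sentence 1: word1 word2          (positions 1, 2)
-- sentence 2: word1 other word2    (positions 1, 2, 3)
-- sentence 3: word1 word2, stored with a gap (positions 1, 3)
INSERT INTO item VALUES (1, 1, 1), (1, 2, 2),
                        (2, 1, 1), (2, 2, 3), (2, 3, 2),
                        (3, 1, 1), (3, 3, 2);
""")

# Same join as the answer's query, with the two words as parameters.
ADJACENT_BY_POSITION = """
SELECT DISTINCT sentence.sentenceid
  FROM sentence
  JOIN item ON sentence.sentenceid = item.sentenceid
  JOIN word ON item.wordid = word.wordid
  JOIN item AS next_item ON sentence.sentenceid = next_item.sentenceid
                        AND next_item.position = item.position + 1
  JOIN word AS next_word ON next_item.wordid = next_word.wordid
 WHERE word.spelling = ?
   AND next_word.spelling = ?
 ORDER BY sentence.sentenceid
"""

def adjacent_sentences(first, second):
    """Return ids of sentences where `second` sits at position + 1 after `first`."""
    rows = conn.execute(ADJACENT_BY_POSITION, (first, second)).fetchall()
    return [row[0] for row in rows]

print(adjacent_sentences("word1", "word2"))  # prints [1]
```

Sentence 3 is not returned even though its two words are neighbours, because its positions jump from 1 to 3; that is the gap limitation mentioned above.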

A more general solution, using window functions:

SELECT DISTINCT sentenceid
FROM (SELECT sentence.sentenceid,
             word.spelling,
             lead(word.spelling) OVER (PARTITION BY sentence.sentenceid
                                           ORDER BY item.position)
        FROM sentence 
        JOIN item ON sentence.sentenceid = item.sentenceid
        JOIN word ON item.wordid = word.wordid) AS pairs
 WHERE spelling = 'word1'
   AND lead = 'word2'
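
A minimal sketch of the window-function approach on sample data with a gap, again using an in-memory SQLite database from Python as a stand-in for Postgres. It assumes an SQLite build with window function support (3.25 or later). The sample rows, the explicit next_spelling alias, the added ORDER BY, and the helper name adjacent_sentences_lead are illustrative assumptions; the alias is there because SQLite does not name the unaliased column lead the way Postgres does.

```python
import sqlite3

# Hypothetical sample data; SQLite (3.25+) is only a stand-in for Postgres.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE word (wordid INTEGER PRIMARY KEY, spelling TEXT);
CREATE TABLE sentence (sentenceid INTEGER PRIMARY KEY);
CREATE TABLE item (sentenceid INTEGER, position INTEGER, wordid INTEGER);
INSERT INTO word VALUES (1, 'word1'), (2, 'word2'), (3, 'other');
INSERT INTO sentence VALUES (1), (2), (3);
-- sentence 1: word1 word2          (positions 1, 2)
-- sentence 2: word1 other word2    (positions 1, 2, 3)
-- sentence 3: word1 word2, stored with a gap (positions 1, 3)
INSERT INTO item VALUES (1, 1, 1), (1, 2, 2),
                        (2, 1, 1), (2, 2, 3), (2, 3, 2),
                        (3, 1, 1), (3, 3, 2);
""")

# lead() pairs each word with the next word of the same sentence,
# ordered by position, regardless of how large the position step is.
ADJACENT_BY_LEAD = """
SELECT DISTINCT sentenceid
  FROM (SELECT sentence.sentenceid,
               word.spelling,
               lead(word.spelling) OVER (PARTITION BY sentence.sentenceid
                                             ORDER BY item.position) AS next_spelling
          FROM sentence
          JOIN item ON sentence.sentenceid = item.sentenceid
          JOIN word ON item.wordid = word.wordid) AS pairs
 WHERE spelling = ?
   AND next_spelling = ?
 ORDER BY sentenceid
"""

def adjacent_sentences_lead(first, second):
    """Return ids of sentences where `second` is the next stored word after `first`."""
    rows = conn.execute(ADJACENT_BY_LEAD, (first, second)).fetchall()
    return [row[0] for row in rows]

print(adjacent_sentences_lead("word1", "word2"))  # prints [1, 3]
```

Here sentence 3 is found despite the gap, while sentence 2 is still excluded because another word sits between the two.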

Edit: also a general solution (gaps allowed), but using joins only:

SELECT DISTINCT sentence.sentenceid
  FROM sentence 
  JOIN item ON sentence.sentenceid = item.sentenceid
  JOIN word ON item.wordid = word.wordid
  JOIN item AS next_item ON sentence.sentenceid = next_item.sentenceid
                        AND next_item.position > item.position
  JOIN word AS next_word ON next_item.wordid = next_word.wordid
  LEFT JOIN item AS mediate_word ON sentence.sentenceid = mediate_word.sentenceid
                                AND mediate_word.position > item.position
                                AND mediate_word.position < next_item.position
 WHERE mediate_word.wordid IS NULL
   AND word.spelling = 'word1'
   AND next_word.spelling = 'word2'
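
The join-only variant works as an anti-join: it pairs word1 with any later word2 in the same sentence, then keeps the pair only if no item lies strictly between them. A minimal sketch on sample data, with an in-memory SQLite database from Python standing in for Postgres; the sample rows, the added ORDER BY, and the helper name adjacent_sentences_antijoin are assumptions for illustration.

```python
import sqlite3

# Hypothetical sample data; SQLite is only a stand-in for Postgres here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE word (wordid INTEGER PRIMARY KEY, spelling TEXT);
CREATE TABLE sentence (sentenceid INTEGER PRIMARY KEY);
CREATE TABLE item (sentenceid INTEGER, position INTEGER, wordid INTEGER);
INSERT INTO word VALUES (1, 'word1'), (2, 'word2'), (3, 'other');
INSERT INTO sentence VALUES (1), (2), (3);
-- sentence 1: word1 word2          (positions 1, 2)
-- sentence 2: word1 other word2    (positions 1, 2, 3)
-- sentence 3: word1 word2, stored with a gap (positions 1, 3)
INSERT INTO item VALUES (1, 1, 1), (1, 2, 2),
                        (2, 1, 1), (2, 2, 3), (2, 3, 2),
                        (3, 1, 1), (3, 3, 2);
""")

# The LEFT JOIN looks for an item between the two candidates;
# "mediate_word.wordid IS NULL" keeps only pairs with nothing in between.
ADJACENT_BY_ANTIJOIN = """
SELECT DISTINCT sentence.sentenceid
  FROM sentence
  JOIN item ON sentence.sentenceid = item.sentenceid
  JOIN word ON item.wordid = word.wordid
  JOIN item AS next_item ON sentence.sentenceid = next_item.sentenceid
                        AND next_item.position > item.position
  JOIN word AS next_word ON next_item.wordid = next_word.wordid
  LEFT JOIN item AS mediate_word ON sentence.sentenceid = mediate_word.sentenceid
                                AND mediate_word.position > item.position
                                AND mediate_word.position < next_item.position
 WHERE mediate_word.wordid IS NULL
   AND word.spelling = ?
   AND next_word.spelling = ?
 ORDER BY sentence.sentenceid
"""

def adjacent_sentences_antijoin(first, second):
    """Return ids of sentences where no item lies between `first` and `second`."""
    rows = conn.execute(ADJACENT_BY_ANTIJOIN, (first, second)).fetchall()
    return [row[0] for row in rows]

print(adjacent_sentences_antijoin("word1", "word2"))  # prints [1, 3]
```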

Answer 2 (score: 1)

select
  *
from mytable
where
  round( 0.1 / ts_rank_cd( to_tsvector(mycolumn), to_tsquery('word1 & word2') ) ) <= 1

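This will actually work, assuming you are not using the A-D weight labels; otherwise you will need to change 0.1 to something else.

You will also want to add a tsvector @@ tsquery WHERE clause.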