Question

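I have these three tables:

create table words (id integer, word text, freq integer);
create table sentences (id integer, sentence text);
create table index (wordId integer, sentenceId integer, position integer);

index is an inverted index that records which word occurs in which sentence. In addition, I have an index on the id column of the tables words and sentences.

This query finds the sentences in which a given word occurs and returns the first match: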

select S.sentence from sentences S, words W, index I
where W.word = '#erhoehungen' and W.id = I.wordId and S.id = I.sentenceId
limit 1;

But when I want to retrieve a sentence in which two words occur together, like this:

select S.sentence from sentences S, words W, index I
where W.word = '#dreikampf' and I.wordId = W.id and S.id = I.sentenceId and
S.id in (
    select S.id from sentences S, words W, index I
    where W.word = 'bruederle' and W.id = I.wordId and S.id = I.sentenceId
)
limit 1;

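This query is much slower. Is there any trick to speed it up? Here is what I have done so far:

Increased shared_buffers to 32MB
Increased work_mem to 15MB
Ran ANALYZE on all tables
Created an index on the id column of words and of sentences, as mentioned above

Regards.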

Edit:

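Here is the output of explain analyze for the query: http://pastebin.com/t2M5w4na

The three create statements above are in fact my original create statements. Should I add primary keys to the tables sentences and words and reference them as foreign keys in index? But what primary key should I use for the index table? SentId and wordId together are not unique, and even if I add pos, which denotes the position of the word in the sentence, it is still not unique.

Updated to: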

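create table words (id integer, word text, freq integer, primary key (id));
create table sentences (id integer, sentence text, primary key (id));
create table index (wordId integer, sentenceId integer, position integer, foreign key (wordId) references words (id), foreign key (sentenceId) references sentences (id));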

Answer 1

I think this should be more efficient:

SELECT s.id, s.sentence FROM words w
JOIN INDEX i ON w.id = i.wordId
JOIN sentences s ON i.sentenceId = s.id
WHERE w.word IN ('#dreikampf', 'bruederle')
GROUP BY s.id, s.sentence
HAVING COUNT(*) >= 2

Make sure the number of items in the IN clause matches the number used in the HAVING clause.

Fiddle here.
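One caveat, sketched under assumptions: the question notes that wordId and sentenceId together are not unique, so a word that occurs twice in one sentence produces two rows in the index table. With COUNT(*) >= 2 such a sentence can pass the filter even when the second word is missing, whereas counting distinct word ids avoids that. The snippet below is only an illustration, not a benchmark. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for PostgreSQL, the rows are made up, and the index table is renamed word_index (a hypothetical name) to sidestep the keyword issue.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT, freq INTEGER);
CREATE TABLE sentences (id INTEGER PRIMARY KEY, sentence TEXT);
-- word_index is a hypothetical name for the question's "index" table.
CREATE TABLE word_index (wordId INTEGER, sentenceId INTEGER, position INTEGER);

INSERT INTO words VALUES (1, '#dreikampf', 3), (2, 'bruederle', 1), (3, 'und', 1);
INSERT INTO sentences VALUES
    (1, '#dreikampf und #dreikampf'),   -- first word twice, second word absent
    (2, 'bruederle #dreikampf');        -- both words present
INSERT INTO word_index VALUES
    (1, 1, 0), (3, 1, 1), (1, 1, 2),
    (2, 2, 0), (1, 2, 1);
""")

# Same shape as the query above; only the HAVING condition varies.
QUERY = """
SELECT s.id FROM words w
JOIN word_index i ON w.id = i.wordId
JOIN sentences s ON i.sentenceId = s.id
WHERE w.word IN ('#dreikampf', 'bruederle')
GROUP BY s.id, s.sentence
HAVING {condition}
ORDER BY s.id
"""

def matching_ids(condition):
    """Return the ids of the sentences that satisfy the HAVING condition."""
    return [row[0] for row in conn.execute(QUERY.format(condition=condition))]

with_count_star = matching_ids("COUNT(*) >= 2")
with_count_distinct = matching_ids("COUNT(DISTINCT w.id) = 2")

print(with_count_star)      # [1, 2]: sentence 1 is a false positive
print(with_count_distinct)  # [2]
```

If repeated words are possible in your data, HAVING COUNT(DISTINCT w.id) = 2 is the safer condition.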

Answer 2

You do not seem to have indexes on the columns wordId and sentenceId. Please create them, and the query will run faster.

CREATE INDEX idx_index_wordId ON index USING btree (wordId);
CREATE INDEX idx_index_sentenceId ON index USING btree (sentenceId);

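Using the keyword index as a table name is not a good idea: in some cases you may need to quote it. You should probably also add an id column to the index table and make it the primary key.

Please use Mosty Mostacho's query and show the explain analyze output after creating the indexes. It may well run faster.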
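To check that a new index is actually picked up, one can compare the query plan before and after creating it. A minimal sketch, assuming Python's built-in sqlite3 module as a stand-in: PostgreSQL's explain analyze prints a different format and its planner may decide differently. The table name word_index and the generated rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for PostgreSQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE word_index (wordId INTEGER, sentenceId INTEGER, position INTEGER)"
)
# Made-up rows: 50 words, each occurring in 20 sentences.
conn.executemany(
    "INSERT INTO word_index VALUES (?, ?, ?)",
    [(w, s, 0) for w in range(50) for s in range(20)],
)

def plan(sql):
    """Return SQLite's query plan for sql as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The fourth column of each plan row holds the human-readable detail.
    return " | ".join(row[3] for row in rows)

lookup = "SELECT sentenceId FROM word_index WHERE wordId = 7"

plan_before = plan(lookup)  # no index yet, so a full table scan
conn.execute("CREATE INDEX idx_index_wordId ON word_index (wordId)")
conn.execute("CREATE INDEX idx_index_sentenceId ON word_index (sentenceId)")
plan_after = plan(lookup)   # the lookup can now use idx_index_wordId

print(plan_before)
print(plan_after)
```

On PostgreSQL the equivalent check is to rerun explain analyze on the real query once the indexes exist.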

Update

Please try this new query:

select S.sentence from sentences S where S.id in
(select sentenceId from index I where 
I.wordId in (select id from words where word IN ('#dreikampf', 'bruederle'))
group by I.sentenceId
having count(distinct I.wordId) = 2
limit 1)

PostgreSQL nested query runs slowly
