Question

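Suppose we have a book site with more than 100,000 books and 1,000,000 rows of tags.

Users will frequently search for books that have the tags they like and, at the same time, none of the tags they dislike.

What is the best way to handle these frequent search requests?

Suppose a user wants books with tags 15 and 25 (a book must have both tags, not just either one), does not want books with tag 50 or 99, and wants the results ordered by rating. As usual, we limit the result to 5 rows and use OFFSET to fetch more.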

books:

id | rating
1  | 5
2  | 5
3  | 1

tags_books:

book_id | tag_id 
    1   | 15
    1   | 25
    1   | 50
    2   | 15
    2   | 25

P.S. One solution is a query that uses sums, but as far as I understand it will be slow on large tables with frequent requests:

select b.id from books b 
left join tags_books tb on tb.book_id = b.id 
group by b.id 
having sum(case when tb.tag_id in (1,2,3) then 1 else 0 end) >= 2 
and sum(case when tb.tag_id in (11,12,13) then 1 else 0 end) = 0
ORDER BY b.rating LIMIT 5 OFFSET 0

Answer 1

For this, I would recommend exists and not exists:

select b.*
from books b
where exists (select 1 from tags_books tb where tb.book_id = b.id and tb.tag_id = 15
             ) and
      exists (select 1 from tags_books tb where tb.book_id = b.id and tb.tag_id = 25
             ) and
      not exists (select 1 from tags_books tb where tb.book_id = b.id and tb.tag_id in (50, 99)
             ) ;
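
The question also asks for ordering by rating and for paging. A minimal sketch of the same query with both added, assuming that the best-rated books should come first (the question's own query sorts ascending) and using b.id as a tie-breaker so that pages stay stable between requests:

```sql
select b.*
from books b
where exists (select 1 from tags_books tb where tb.book_id = b.id and tb.tag_id = 15)
  and exists (select 1 from tags_books tb where tb.book_id = b.id and tb.tag_id = 25)
  and not exists (select 1 from tags_books tb where tb.book_id = b.id and tb.tag_id in (50, 99))
order by b.rating desc, b.id   -- assumption: best-rated first; id breaks ties
limit 5 offset 0;              -- next pages: offset 5, then 10, ...
```

With the sample rows from the question, only book 2 qualifies: book 1 carries tag 50 and book 3 has no tags at all.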

For performance, you want an index on tags_books(book_id, tag_id).
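
A sketch of that index in PostgreSQL syntax. The index names are made up for illustration. If tags_books already has a primary key or unique constraint on (book_id, tag_id), the index behind that constraint serves the same lookups and the first statement is not needed. The second index is an extra assumption rather than part of the recommendation above: it may let the planner walk books in rating order and stop early when a LIMIT is present.

```sql
-- Supports the correlated exists / not exists lookups on (book_id, tag_id).
create index tags_books_book_id_tag_id_idx on tags_books (book_id, tag_id);

-- Optional and hypothetical: may help "order by rating desc, id limit 5".
create index books_rating_idx on books (rating desc, id);
```

Whether either index is actually used depends on the data, so it is worth checking the plan with explain (analyze) on a realistic copy of the tables.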

If you phrase this as an aggregation, I would recommend:

select bt.book_id
from tags_books bt
where bt.tag_id in (15, 25, 50, 99)
group by bt.book_id
having count(*) filter (where bt.tag_id in (15, 25)) = 2 and
       count(*) filter (where bt.tag_id in (50, 99)) = 0;
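
The aggregation returns only book ids, so sorting by rating needs a join back to books. A minimal sketch, assuming the table is named tags_books as in the question, that each (book_id, tag_id) pair appears at most once (otherwise a duplicated tag could push count(*) to 2 on its own), and that the best-rated books should come first. The filter clause requires PostgreSQL 9.4 or later.

```sql
select b.*
from books b
join (
    -- Books that have both wanted tags and none of the unwanted ones.
    select tb.book_id
    from tags_books tb
    where tb.tag_id in (15, 25, 50, 99)
    group by tb.book_id
    having count(*) filter (where tb.tag_id in (15, 25)) = 2
       and count(*) filter (where tb.tag_id in (50, 99)) = 0
) m on m.book_id = b.id
order by b.rating desc, b.id   -- assumption: best-rated first; id breaks ties
limit 5 offset 0;
```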

Best approach for searching a many-to-many table in PostgreSQL: excluding and including tags
