Efficient search in many-to-many related tables

Posted: 2012-02-11 17:25:53

Tags: mysql many-to-many innodb sql-optimization

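I have two tables, products and categories, that are related many-to-many through a third (join) table. Each product can belong to several categories. This is a typical many-to-many relationship: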

products
-------------
id
product_name


categories
-------------
id
category_name


products_to_categories
-------------
product_id
category_id

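I want to let users search for products that are in some selected categories but not in other selected categories.

Example: find all products that are in both the "Computers" and "Software" categories, but not in the "Games", "Programming" or "Education" categories.

Here is the query I came up with: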

SELECT product_name
FROM products
WHERE
    EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 1 AND product_id = products.id) 
    AND EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 2 AND product_id = products.id) 
    AND NOT EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 3 AND product_id = products.id)
    AND NOT EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 4 AND product_id = products.id) 
    AND NOT EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 5 AND product_id = products.id)
ORDER BY id
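To make the intended semantics concrete, here is a minimal sketch that runs the query above, as posted, against a small in-memory database using Python's built-in sqlite3 module. SQLite stands in for MySQL/InnoDB, and the product names and the assumed mapping of category IDs 1 to 5 onto Computers, Software, Games, Programming and Education are made-up sample data, not taken from the question.

```python
import sqlite3

# Hypothetical sample data; SQLite stands in for MySQL/InnoDB here.
# Assumed category IDs: 1 Computers, 2 Software, 3 Games, 4 Programming, 5 Education
SCHEMA = """
CREATE TABLE products (id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE categories (id INTEGER PRIMARY KEY, category_name TEXT);
CREATE TABLE products_to_categories (
    product_id INTEGER NOT NULL,
    category_id INTEGER NOT NULL,
    PRIMARY KEY (product_id, category_id)
);
"""
PRODUCTS = ["Laptop Suite", "Game PC", "Desktop",
            "Office Pack", "Tablet Bundle", "Editor"]
CATEGORIES = ["Computers", "Software", "Games", "Programming", "Education"]
# product id -> category ids
LINKS = {1: [1, 2], 2: [1, 2, 3], 3: [1], 4: [1, 2, 5], 5: [1, 2], 6: [2]}

# The question's query, as posted
QUERY = """
SELECT product_name
FROM products
WHERE
    EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 1 AND product_id = products.id)
    AND EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 2 AND product_id = products.id)
    AND NOT EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 3 AND product_id = products.id)
    AND NOT EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 4 AND product_id = products.id)
    AND NOT EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 5 AND product_id = products.id)
ORDER BY id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO products VALUES (?, ?)",
                 list(enumerate(PRODUCTS, start=1)))
conn.executemany("INSERT INTO categories VALUES (?, ?)",
                 list(enumerate(CATEGORIES, start=1)))
conn.executemany("INSERT INTO products_to_categories VALUES (?, ?)",
                 [(p, c) for p, cats in LINKS.items() for c in cats])

result = [row[0] for row in conn.execute(QUERY)]
print(result)  # ['Laptop Suite', 'Tablet Bundle']
```

In this sample, products 2 and 4 are dropped by the NOT EXISTS checks (categories 3 and 5), and products 3 and 6 by the EXISTS checks.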

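It works, but it is so slow that I can't use it in production. All the indexes are in place, but this query produces 5 dependent subqueries, and the tables are large.

Is there a way to solve the same task without dependent subqueries, or to optimize this query in some other way?

UPDATE

The indexes are: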

products: PRIMARY KEY (id)
categories: PRIMARY KEY (id)
products_to_categories: PRIMARY KEY (product_id, category_id)

All tables are InnoDB.

4 answers:

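Answer 0 (score: 2)

Please post the table definitions (so that the engine used and the defined indexes are visible).

You could also post the query's execution plan (using the EXPLAIN statement).

You could also try rewriting the query in different ways. Here is one: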

SELECT p.product_name
FROM products  AS p
  JOIN products_to_categories  AS pc1
    ON pc1.category_id = 1 
    AND pc1.product_id = p.id
  JOIN products_to_categories  AS pc2
    ON  pc2.category_id = 2 
    AND pc2.product_id = p.id
WHERE
    NOT EXISTS 
    ( SELECT * 
      FROM products_to_categories  AS pc 
      WHERE pc.category_id IN (3, 4, 5)
        AND pc.product_id = p.id
    )
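A minimal sketch that runs this rewrite against made-up sample rows, using SQLite through Python's sqlite3 as a stand-in for MySQL. ORDER BY p.id is added here only so the output order is fixed. Because the primary key (product_id, category_id) is unique, each of the two joins matches at most one row per product, so the joins do not duplicate results.

```python
import sqlite3


def build_db():
    # Hypothetical sample data; SQLite stands in for MySQL/InnoDB.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE products_to_categories (
            product_id INTEGER NOT NULL,
            category_id INTEGER NOT NULL,
            PRIMARY KEY (product_id, category_id));
    """)
    names = ["Laptop Suite", "Game PC", "Desktop",
             "Office Pack", "Tablet Bundle", "Editor"]
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     list(enumerate(names, start=1)))
    # Assumed IDs: 1 Computers, 2 Software, 3 Games, 4 Programming, 5 Education
    links = {1: [1, 2], 2: [1, 2, 3], 3: [1], 4: [1, 2, 5], 5: [1, 2], 6: [2]}
    conn.executemany("INSERT INTO products_to_categories VALUES (?, ?)",
                     [(p, c) for p, cats in links.items() for c in cats])
    return conn


# Answer 0's rewrite: two joins for the required categories,
# one NOT EXISTS with IN (...) for the excluded ones
JOIN_QUERY = """
SELECT p.product_name
FROM products AS p
  JOIN products_to_categories AS pc1
    ON pc1.category_id = 1 AND pc1.product_id = p.id
  JOIN products_to_categories AS pc2
    ON pc2.category_id = 2 AND pc2.product_id = p.id
WHERE NOT EXISTS (
    SELECT * FROM products_to_categories AS pc
    WHERE pc.category_id IN (3, 4, 5) AND pc.product_id = p.id)
ORDER BY p.id
"""

conn = build_db()
join_result = [row[0] for row in conn.execute(JOIN_QUERY)]
print(join_result)  # ['Laptop Suite', 'Tablet Bundle']
```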

Update: you don't have an index on (category_id, product_id). Try adding one.
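A minimal sketch of that suggestion, using SQLite through Python's sqlite3 as a stand-in for MySQL. The index name idx_category_product is hypothetical; in MySQL the corresponding statement would be CREATE INDEX idx_category_product ON products_to_categories (category_id, product_id). SQLite's EXPLAIN QUERY PLAN output is printed for inspection only: its format differs from MySQL's EXPLAIN, and which index a planner actually picks depends on the engine and the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE products_to_categories (
        product_id INTEGER NOT NULL,
        category_id INTEGER NOT NULL,
        PRIMARY KEY (product_id, category_id));
    -- Secondary index with category_id as the leading column
    CREATE INDEX idx_category_product
        ON products_to_categories (category_id, product_id);
""")

# PRAGMA index_list returns one row per index; column 1 is the index name
index_names = [row[1] for row in
               conn.execute("PRAGMA index_list(products_to_categories)")]
print(index_names)

# SQLite-specific plan; the last column of each row is the readable detail
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT p.product_name FROM products AS p
    WHERE NOT EXISTS (
        SELECT * FROM products_to_categories AS pc
        WHERE pc.category_id IN (3, 4, 5) AND pc.product_id = p.id)
""").fetchall()
for row in plan:
    print(row[-1])
```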

Answer 1 (score: 0)

SELECT product_name
FROM products
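-- we can use an inner join as an optimization, as some categories MUST exist
INNER JOIN products_to_categories ON products.id = products_to_categories.product_id
WHERE 
  -- note: this NOT IN drops individual join rows, not whole products, so a product
  -- that is also in category 3, 4 or 5 can still match through its other rows,
  -- and each match is returned once per remaining row
  products_to_categories.category_id NOT IN (3,4,5) -- substitute unwanted category IDs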
  AND EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 1 AND product_id = products.id) 
  AND EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 2 AND product_id = products.id) 
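One way to check how the row-level NOT IN filter behaves is to run this query on a few made-up rows. The sketch below uses SQLite through Python's sqlite3 (not MySQL) and writes the join column as products.id, since products has no product_id column in the question's schema. In this sample, the hypothetical product "Game PC" is in categories 1, 2 and 3 and "Office Pack" is in 1, 2 and 5; both still come back, and every matching product appears once per surviving join row.

```python
import sqlite3
from collections import Counter


def build_db():
    # Hypothetical sample data; SQLite stands in for MySQL/InnoDB.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE products_to_categories (
            product_id INTEGER NOT NULL,
            category_id INTEGER NOT NULL,
            PRIMARY KEY (product_id, category_id));
    """)
    names = ["Laptop Suite", "Game PC", "Desktop",
             "Office Pack", "Tablet Bundle", "Editor"]
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     list(enumerate(names, start=1)))
    links = {1: [1, 2], 2: [1, 2, 3], 3: [1], 4: [1, 2, 5], 5: [1, 2], 6: [2]}
    conn.executemany("INSERT INTO products_to_categories VALUES (?, ?)",
                     [(p, c) for p, cats in links.items() for c in cats])
    return conn


# Answer 1's query, with the join condition written against products.id
ROW_FILTER_QUERY = """
SELECT product_name
FROM products
INNER JOIN products_to_categories ON products.id = products_to_categories.product_id
WHERE
  products_to_categories.category_id NOT IN (3,4,5)
  AND EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 1 AND product_id = products.id)
  AND EXISTS (SELECT product_id FROM products_to_categories WHERE category_id = 2 AND product_id = products.id)
"""

conn = build_db()
# Count how many rows each product name produces
counts = Counter(row[0] for row in conn.execute(ROW_FILTER_QUERY))
print(counts)
```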

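Answer 2 (score: 0)

I deleted my answer because the other answers are more complete. Just a general tip: to reduce the number of ANDs in the statement, you can use the IN operator to check several categories at once: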

where category_id IN(1,2)

where category_id NOT IN(1,2)
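Worth keeping in mind when applying this tip: category_id IN (1,2) is true for a row whose category is 1 or 2, so on its own it finds products in either category, not in both, and category_id NOT IN (...) likewise only drops individual rows. The sketch below (SQLite through Python's sqlite3, made-up sample rows) shows the difference, plus one common way to keep the IN list while requiring every listed category: group by product and compare the number of matched rows with the number of categories, with exclusions moved into a product_id NOT IN subquery. This GROUP BY/HAVING form is a standard technique, not something this answer proposed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE products_to_categories (
    product_id INTEGER NOT NULL,
    category_id INTEGER NOT NULL,
    PRIMARY KEY (product_id, category_id))""")
# Hypothetical links: product id -> category ids
links = {1: [1, 2], 2: [1, 2, 3], 3: [1], 4: [1, 2, 5], 5: [1, 2], 6: [2]}
conn.executemany("INSERT INTO products_to_categories VALUES (?, ?)",
                 [(p, c) for p, cats in links.items() for c in cats])

# IN (1, 2) matches rows in category 1 OR 2
any_of = [r[0] for r in conn.execute("""
    SELECT DISTINCT product_id FROM products_to_categories
    WHERE category_id IN (1, 2)
    ORDER BY product_id""")]

# Requiring BOTH: one row per matched category, so count them per product
# (relies on (product_id, category_id) being unique)
all_of = [r[0] for r in conn.execute("""
    SELECT product_id FROM products_to_categories
    WHERE category_id IN (1, 2)
    GROUP BY product_id
    HAVING COUNT(*) = 2
    ORDER BY product_id""")]

# Same, excluding whole products that are in category 3, 4 or 5
filtered = [r[0] for r in conn.execute("""
    SELECT product_id FROM products_to_categories
    WHERE category_id IN (1, 2)
      AND product_id NOT IN (
          SELECT product_id FROM products_to_categories
          WHERE category_id IN (3, 4, 5))
    GROUP BY product_id
    HAVING COUNT(*) = 2
    ORDER BY product_id""")]

print(any_of)    # [1, 2, 3, 4, 5, 6]
print(all_of)    # [1, 2, 4, 5]
print(filtered)  # [1, 5]
```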

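Answer 3 (score: 0)

I think you want to avoid IN clauses, because SQL Server will either run several queries or evaluate them as an "OR", which will be less efficient than what I've pasted below, since it may not be able to use the indexes.

You could also drop the #product_categories_filtered temp table and do everything in one big query, using aliased subqueries where needed. You may want to try different configurations and see which works best, but temp tables have never been a performance problem in my applications unless someone tries to query something with billions of records. I used #product_categories_filtered because in some cases SQL Server queries run better when you break a query up so that each part uses fewer joins, especially on larger tables such as product.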

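-- Temp tables holding the category IDs to include and to exclude (T-SQL)
create table #includes (category_id int not null primary key)
create table #excludes (category_id int not null primary key)

insert #includes (category_id) 
    select 1
    union all select 2
insert #excludes (category_id) 
    select 3
    union all select 4
    union all select 5

-- One row per product that is in every included category
-- and in none of the excluded ones
select 
  pc.product_id
into #product_categories_filtered
from 
  products_to_categories pc
  join #includes i 
    on pc.category_id = i.category_id
  left join (
    -- products that are in at least one excluded category
    select distinct x.product_id
    from products_to_categories x
      join #excludes e
        on x.category_id = e.category_id
  ) ex
    on ex.product_id = pc.product_id
where 
  ex.product_id is null
group by 
  pc.product_id
having 
  count(*) = (select count(*) from #includes)

-- The filtered table already has one row per product, so no DISTINCT is needed
select 
  p.product_name
from 
  #product_categories_filtered pc
  join products p
    on pc.product_id = p.id
order by 
  p.id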
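The include/exclude-table idea can also be driven directly from application code, which fits the goal of letting users pick categories. Below is a minimal sketch in SQLite through Python's sqlite3, with made-up sample data; the function name search and the temp table names includes and excludes are hypothetical. It requires a product to be in every included category and in none of the excluded ones.

```python
import sqlite3


def build_db():
    # Hypothetical sample data; SQLite stands in for MySQL/InnoDB.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER PRIMARY KEY, product_name TEXT);
        CREATE TABLE products_to_categories (
            product_id INTEGER NOT NULL,
            category_id INTEGER NOT NULL,
            PRIMARY KEY (product_id, category_id));
    """)
    names = ["Laptop Suite", "Game PC", "Desktop",
             "Office Pack", "Tablet Bundle", "Editor"]
    conn.executemany("INSERT INTO products VALUES (?, ?)",
                     list(enumerate(names, start=1)))
    links = {1: [1, 2], 2: [1, 2, 3], 3: [1], 4: [1, 2, 5], 5: [1, 2], 6: [2]}
    conn.executemany("INSERT INTO products_to_categories VALUES (?, ?)",
                     [(p, c) for p, cats in links.items() for c in cats])
    return conn


def search(conn, include_ids, exclude_ids):
    """Names of products in every category of include_ids and in none of exclude_ids."""
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS includes (category_id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS excludes (category_id INTEGER PRIMARY KEY)")
    conn.execute("DELETE FROM includes")
    conn.execute("DELETE FROM excludes")
    # set() avoids primary-key violations on duplicate IDs
    conn.executemany("INSERT INTO includes VALUES (?)", [(c,) for c in set(include_ids)])
    conn.executemany("INSERT INTO excludes VALUES (?)", [(c,) for c in set(exclude_ids)])
    rows = conn.execute("""
        SELECT p.product_name
        FROM products AS p
          JOIN products_to_categories AS pc ON pc.product_id = p.id
          JOIN includes AS i ON i.category_id = pc.category_id
        WHERE NOT EXISTS (
            SELECT 1 FROM products_to_categories AS x
              JOIN excludes AS e ON e.category_id = x.category_id
            WHERE x.product_id = p.id)
        GROUP BY p.id, p.product_name
        HAVING COUNT(*) = (SELECT COUNT(*) FROM includes)
        ORDER BY p.id""")
    return [r[0] for r in rows]


conn = build_db()
print(search(conn, [1, 2], [3, 4, 5]))  # ['Laptop Suite', 'Tablet Bundle']
```

Because the category IDs are passed as bound parameters rather than spliced into the SQL text, user-selected values never become part of the statement itself.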