I have an application which contains a many-to-many relationship. I need to select all rows from one table that are associated with all rows from a variable set in the other table.

For example, I would need to select all foo entities that are associated with bar entities A, B, C and E. The user could select 1, 5, 12 or 50 bar entities to filter the foo entities by.

The relevant fields from the tables (ids are UUIDs):
/* ~20k rows */
CREATE TABLE `foo` (
`id` char(36) COLLATE utf8_unicode_ci NOT NULL,
`title` text COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
/* ~30k rows */
CREATE TABLE `bar` (
`id` char(36) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
/* ~150k rows */
CREATE TABLE `foo_bar` (
`id` char(36) COLLATE utf8_unicode_ci NOT NULL,
`foo_id` char(36) COLLATE utf8_unicode_ci DEFAULT NULL,
`bar_id` char(36) COLLATE utf8_unicode_ci DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `foo_id_foreign` (`foo_id`),
KEY `bar_id_foreign` (`bar_id`),
CONSTRAINT `bar_id_foreign` FOREIGN KEY (`bar_id`)
REFERENCES `bar` (`id`) ON DELETE CASCADE,
CONSTRAINT `foo_id_foreign` FOREIGN KEY (`foo_id`)
REFERENCES `foo` (`id`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
I have tried two different approaches that I have seen in different SO answers: multiple joins and a subquery. Multiple joins seem to work reasonably well but don't look like they would scale; a subquery seems like it should scale better but runs for hours.

Multi-join. It works, but every additional join increases the elapsed time exponentially, as expected. Filtering by three bars takes around 800ms, which is definitely high. The explain looks reasonable.
select `foo`.*
from `foo`
inner join foo_bar `fb1` on `fb1`.`foo_id` = `foo`.`id`
inner join bar `b1` on `b1`.`id` = `fb1`.`bar_id` AND `b1`.`id` = :some_uuid1
inner join foo_bar `fb2` on `fb2`.`foo_id` = `foo`.`id`
inner join bar `b2` on `b2`.`id` = `fb2`.`bar_id` AND `b2`.`id` = :some_uuid2
inner join foo_bar `fb3` on `fb3`.`foo_id` = `foo`.`id`
inner join bar `b3` on `b3`.`id` = `fb3`.`bar_id` AND `b3`.`id` = :some_uuid3
group by `foo`.`id`
order by `foo`.`title` asc
limit 25 offset 0
Subquery. Runs indefinitely. Using WHERE IN (subquery) has the same effect as joining on the subquery, though the explain ends up looking a little different.
select `foo`.*
from `foo`
inner join (
select `foo_id`
from `foo_bar`
inner join `bar`
on `bar`.`id` = `foo_bar`.`bar_id`
where `bar`.`id` in (:some_uuid1, :some_uuid2, :some_uuid3)
group by `foo_id`
having COUNT(*) = 3
) as `subset` on `foo`.`id` = `subset`.`foo_id`
order by `foo`.`title` asc
limit 25 offset 0
explain:
id  select_type  table    type    key             key_len  rows   extra
1   PRIMARY      derived  ALL     NULL            NULL     6618   Using temporary; Using filesort
1   PRIMARY      foo      eq_ref  PRIMARY         108      1
2   DERIVED      bar      const   PRIMARY         108      1      Using index; Using temporary; Using filesort
2   DERIVED      foo_bar  ref     bar_id_foreign  109      16094  Using where
My question is: are there any optimizations I could apply to make this usable and scalable?
Answer 0 (score: 1)
Your normalization is fine. It's good that you have a joining table foo_bar to handle the many-to-many relationship.

As far as optimizing your JOIN, you don't need to add a new join each time you want to check for a new id; you can use the IN operator:
INNER JOIN foo_bar fb1 ON fb1.foo_id = foo.id
    AND fb1.bar_id IN (some_uuid1, some_uuid2, some_uuid3)
Then, if you want only the rows that match all three of those, the whole query would look something like this:
SELECT foo.id, foo.title
FROM foo
INNER JOIN foo_bar fb
    ON fb.foo_id = foo.id
    AND fb.bar_id IN (some_uuid1, some_uuid2, some_uuid3)
GROUP BY foo.id
HAVING COUNT(*) = 3  /* must equal the number of ids in the IN list */
ORDER BY foo.title
LIMIT 25;
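One further suggestion, as an assumption about your workload rather than something visible in your explain output: a composite index on the junction table lets MySQL resolve both the bar_id filter and the foo_id grouping from the index alone, instead of going through the single-column key and then reading the row. A minimal sketch, using a made-up index name:

/* hypothetical covering index on the junction table */
ALTER TABLE `foo_bar` ADD KEY `bar_id_foo_id` (`bar_id`, `foo_id`);

With that in place, the existing bar_id_foreign key is redundant for this query, since the new index has the same leading column. Also remember to keep the HAVING COUNT(*) value equal to the number of uuids the user selected.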