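I am working on a query that lets me order recipes by score.
The structure is as follows: a flyer contains one or more `flyer_items`, each of which can have one or more `ingredient_to_flyer_item` rows (this table links ingredients to flyer items). Another table, `ingredient_to_recipe`, links the same ingredients to one or more recipes. A link to a .sql file is included at the end.
I want to get the `recipe_id` and the SUM of the MAX price weight of each ingredient that is part of the recipe (linked through `ingredient_to_recipe`), but if a recipe has several ingredients that belong to the same flyer item, that flyer item should be counted only once.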
SELECT itr.recipe_id,
SUM(itr.weight),
SUM(max_price_weight),
SUM(itr.weight + max_price_weight) AS score
FROM
( SELECT MAX(itf.max_price_weight) AS max_price_weight,
itf.flyer_item_id,
itf.ingredient_id
FROM
(SELECT ifi.ingredient_id,
MAX(i.price_weight) AS max_price_weight,
ifi.flyer_item_id
FROM flyer_items i
JOIN ingredient_to_flyer_item ifi ON i.id = ifi.flyer_item_id
WHERE i.flyer_id IN (1,
2)
GROUP BY ifi.ingredient_id ) itf
GROUP BY itf.flyer_item_id) itf2
JOIN `ingredient_to_recipe` AS itr ON itf2.`ingredient_id` = itr.`ingredient_id`
WHERE recipe_id = 5730
GROUP BY itr.`recipe_id`
ORDER BY score DESC
LIMIT 0,10
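The query almost works: most of the results are correct, but for some rows certain ingredients are ignored and are not counted in the score as they should be.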
| recipe_id | 'score' with current query | what 'score' should be | explanation |
|-----------|----------------------------|------------------------|-----------------------------------------------------------------------------|
| 8376 | 51 | 51 | Good result |
| 3152 | 1 | 18 | Only 1 ingredient having a score of one is counted, should be 4 ingredients |
| 4771 | 41 | 45 | One ingredient worth score 4 is ignored |
| 10230 | 40 | 40 | Good result |
| 8958 | 39 | 39 | Good result |
| 4656 | 28 | 34 | One ingredient worth 6 is ignored |
| 11338 | 1 | 10 | 2 ingredients, worth 4 and 5 are ignored |
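I am having a hard time finding a simple way to explain this. If anything else would help, please let me know.
Here is a link to a demo database for running the query and checking the examples and test cases: https://nofile.io/f/F4YSEu8DWmT/meta.zip
Thank you very much.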
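This is as far as I have been able to get. The subquery always returns correct results, but I have dropped the grouping by `flyer_item_id` entirely. With this query I get correct scores, except that when several ingredients of a recipe belong to the same flyer item they are counted multiple times (for recipe_id = 10557 the score comes out as 59 instead of the correct 56, because 2 ingredients worth 3 are in the same flyer item). The only thing still missing is to count one MAX(price_weight) per `flyer_item_id` per recipe (which I originally tried to do by grouping by `flyer_item_id` on top of the first GROUP BY `ingredient_id`).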
SELECT itr.recipe_id,
SUM(itr.weight) as total_ingredient_weight,
SUM(itf.price_weight) as total_price_weight,
SUM(itr.weight+itf.price_weight) as score
FROM
(SELECT fi1.id, MAX(fi1.price_weight) as price_weight, ingredient_to_flyer_item.ingredient_id as ingredient_id, recipe_id
FROM flyer_items fi1
INNER JOIN (
SELECT flyer_items.id as id, MAX(price_weight) as price_weight, ingredient_to_flyer_item.ingredient_id as ingredient_id
FROM flyer_items
JOIN ingredient_to_flyer_item ON flyer_items.id = ingredient_to_flyer_item.flyer_item_id
GROUP BY id
) fi2 ON fi1.id = fi2.id AND fi1.price_weight = fi2.price_weight
JOIN ingredient_to_flyer_item ON fi1.id = ingredient_to_flyer_item.flyer_item_id
JOIN ingredient_to_recipe ON ingredient_to_flyer_item.ingredient_id = ingredient_to_recipe.ingredient_id
GROUP BY ingredient_to_flyer_item.ingredient_id) AS itf
INNER JOIN `ingredient_to_recipe` AS `itr` ON `itf`.`ingredient_id` = `itr`.`ingredient_id`
GROUP BY `itr`.`recipe_id`
ORDER BY `score` DESC
LIMIT 10
Here is the EXPLAIN output, though I am not sure it is useful, since the last working piece is still missing:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | |
|----|-------------|--------------------------|------------|--------|-------------------------------|---------------|---------|-------------------------------------------------------|--------|----------|---------------------------------|---|
| 1 | PRIMARY | itr | NULL | ALL | recipe_id,ingredient_id | NULL | NULL | NULL | 151800 | 100.00 | Using temporary; Using filesort | |
| 1 | PRIMARY | <derived2> | NULL | ref | <auto_key0> | <auto_key0> | 4 | metadata3.itr.ingredient_id | 10 | 100.00 | NULL | |
| 2 | DERIVED | ingredient_to_flyer_item | NULL | ALL | NULL | NULL | NULL | NULL | 249 | 100.00 | Using temporary; Using filesort | |
| 2 | DERIVED | fi1 | NULL | eq_ref | id_2,id,price_weight | id_2 | 4 | metadata3.ingredient_to_flyer_item.flyer_item_id | 1 | 100.00 | NULL | |
| 2 | DERIVED | <derived3> | NULL | ref | <auto_key0> | <auto_key0> | 9 | metadata3.ingredient_to_flyer_item.flyer_item_id,m... | 10 | 100.00 | NULL | |
| 2 | DERIVED | ingredient_to_recipe | NULL | ref | ingredient_id | ingredient_id | 4 | metadata3.ingredient_to_flyer_item.ingredient_id | 40 | 100.00 | NULL | |
| 3 | DERIVED | ingredient_to_flyer_item | NULL | ALL | NULL | NULL | NULL | NULL | 249 | 100.00 | Using temporary; Using filesort | |
| 3 | DERIVED | flyer_items | NULL | eq_ref | id_2,id,flyer_id,price_weight | id_2 | 4 | metadata3.ingredient_to_flyer_item.flyer_item_id | 1 | 100.00 | NULL | |
I managed to find a query that works, but now I have to speed it up: it takes more than 500 ms to run.
SELECT sum(ff.price_weight) as price_weight, sum(ff.weight) as weight, sum(ff.price_weight+ff.weight) as score, ff.recipe_id FROM
(
SELECT DISTINCT
itf.flyer_item_id as flyer_item_id,
itf.recipe_id,
itf.weight,
aprice_weight AS price_weight
FROM
(SELECT itfin.flyer_item_id AS flyer_item_id,
itfin.price_weight AS aprice_weight,
itfin.ingredient_id,
itr.recipe_id,
itr.weight
FROM
(SELECT ifi2.flyer_item_id, ifi2.ingredient_id as ingredient_id, MAX(ifi2.price_weight) as price_weight
FROM
ingredient_to_flyer_item ifi1
INNER JOIN (
SELECT id, MAX(price_weight) as price_weight, ingredient_to_flyer_item.ingredient_id as ingredient_id, ingredient_to_flyer_item.flyer_item_id
FROM ingredient_to_flyer_item
GROUP BY ingredient_id
) ifi2 ON ifi1.price_weight = ifi2.price_weight AND ifi1.ingredient_id = ifi2.ingredient_id
WHERE flyer_id IN (1,2)
GROUP BY ifi1.ingredient_id) AS itfin
INNER JOIN `ingredient_to_recipe` AS `itr` ON `itfin`.`ingredient_id` = `itr`.`ingredient_id`
) AS itf
) ff
GROUP BY recipe_id
ORDER BY `score` DESC
LIMIT 20
Here is the EXPLAIN:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | |
|----|-------------|--------------------------|------------|-------|----------------------------------------------|---------------|---------|---------------------|------|----------|---------------------------------|---|
| 1 | PRIMARY | <derived2> | NULL | ALL | NULL | NULL | NULL | NULL | 1318 | 100.00 | Using temporary; Using filesort | |
| 2 | DERIVED | <derived4> | NULL | ALL | NULL | NULL | NULL | NULL | 37 | 100.00 | Using temporary | |
| 2 | DERIVED | itr | NULL | ref | ingredient_id | ingredient_id | 4 | itfin.ingredient_id | 35 | 100.00 | NULL | |
| 4 | DERIVED | <derived5> | NULL | ALL | NULL | NULL | NULL | NULL | 249 | 100.00 | Using temporary; Using filesort | |
| 4 | DERIVED | ifi1 | NULL | ref | ingredient_id,itx_full,price_weight,flyer_id | ingredient_id | 4 | ifi2.ingredient_id | 1 | 12.50 | Using where | |
| 5 | DERIVED | ingredient_to_flyer_item | NULL | index | ingredient_id,itx_full | ingredient_id | 4 | NULL | 249 | 100.00 | NULL | |
Answer 0 (score: 1)
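This sounds like "explode-implode". It happens when a query has both a `JOIN` and a `GROUP BY`.
The `JOIN` gathers the matching combinations of rows from the joined tables; then the `GROUP BY` computes `COUNTs`, `SUMs`, etc. over those combinations, giving you inflated values for the aggregates. There are two common fixes, and both involve doing the aggregation separately from the `JOIN`.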
Case 1:
SELECT ...
( SELECT SUM(x) FROM t2 WHERE id = ... ) AS sum_x,
...
FROM t1 ...
This case gets clumsy if you need several aggregates from t2, because each subquery can return only one aggregate.
Case 2:
SELECT ...
FROM ( SELECT grp,
SUM(x) AS sum_x,
COUNT(*) AS ct
FROM t2
GROUP BY grp ) AS s
JOIN t1 ON t1.grp = s.grp
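To make the two cases concrete, here is a minimal sketch that reproduces the inflated aggregates and then applies both fixes. It is an illustration under assumptions, not the asker's schema: the tables `t1`, `t2`, `t3` and their rows are hypothetical, and SQLite (through Python's standard `sqlite3` module) stands in for MySQL.

```python
import sqlite3

# Hypothetical toy data: one group in t1, two detail rows in t2,
# three detail rows in t3. SQLite is used here as a stand-in for MySQL.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE t1 (grp INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE t2 (grp INTEGER, x INTEGER);
    CREATE TABLE t3 (grp INTEGER, y INTEGER);
    INSERT INTO t1 VALUES (1, 'a');
    INSERT INTO t2 VALUES (1, 10), (1, 20);
    INSERT INTO t3 VALUES (1, 1), (1, 2), (1, 3);
""")

# Explode-implode: the JOINs produce 2 * 3 = 6 rows for the group,
# so every t2 row is summed 3 times and every t3 row 2 times.
inflated = conn.execute("""
    SELECT t1.grp, SUM(t2.x), SUM(t3.y)
    FROM t1
    JOIN t2 ON t2.grp = t1.grp
    JOIN t3 ON t3.grp = t1.grp
    GROUP BY t1.grp
""").fetchone()

# Case 1: one correlated subquery per aggregate, no JOIN at all.
case1 = conn.execute("""
    SELECT t1.grp,
           (SELECT SUM(x) FROM t2 WHERE t2.grp = t1.grp) AS sum_x,
           (SELECT SUM(y) FROM t3 WHERE t3.grp = t1.grp) AS sum_y
    FROM t1
""").fetchone()

# Case 2: aggregate in a derived table first, then JOIN the result.
case2 = conn.execute("""
    SELECT t1.grp, s.sum_x, s.ct
    FROM ( SELECT grp, SUM(x) AS sum_x, COUNT(*) AS ct
           FROM t2
           GROUP BY grp ) AS s
    JOIN t1 ON t1.grp = s.grp
""").fetchone()

print(inflated)  # (1, 90, 12): true sums are 30 and 6
print(case1)     # (1, 30, 6)
print(case2)     # (1, 30, 2)
```

The derived table in Case 2 returns several aggregates at once, which is why it tends to scale better than Case 1 when more than one aggregate is needed from the same table.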
You have 2 `JOINs` and 3 `GROUP BYs`, so I recommend that you debug (and rewrite) your query from the inside out.
SELECT ifi.ingredient_id,
MAX(price_weight) as max_price_weight,
flyer_item_id
from flyer_items i
join ingredient_to_flyer_item ifi ON i.id = ifi.flyer_item_id
where flyer_id in (1, 2)
group by ifi.ingredient_id
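But I cannot help you further, because you have not qualified `price_weight` with the table (or alias) it belongs to. (The same goes for some other columns.)
(Actually, `MAX` and `MIN` will not get inflated values; `AVG` will get slightly wrong values; `COUNT` and `SUM` will get "wrong" values.)
Hence, I will leave the rest as an "exercise" for the reader.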
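The note about which aggregates are affected by row duplication can be checked with a small sketch. The `parent` and `child` tables below are hypothetical and exist only for this illustration; SQLite again stands in for MySQL.

```python
import sqlite3

# Hypothetical data: parent 1 has one child row, parent 2 has three,
# so the JOIN repeats parent 2's value three times.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent (id INTEGER PRIMARY KEY, x INTEGER);
    CREATE TABLE child (parent_id INTEGER);
    INSERT INTO parent VALUES (1, 10), (2, 20);
    INSERT INTO child VALUES (1), (2), (2), (2);
""")

aggregates = "MIN(p.x), MAX(p.x), AVG(p.x), COUNT(*), SUM(p.x)"

# Aggregates over the parent table alone (the intended values).
plain = conn.execute(f"SELECT {aggregates} FROM parent p").fetchone()

# The same aggregates after the JOIN has duplicated some parent rows.
joined = conn.execute(
    f"SELECT {aggregates} FROM parent p JOIN child c ON c.parent_id = p.id"
).fetchone()

print(plain)   # (10, 20, 15.0, 2, 30)
print(joined)  # (10, 20, 17.5, 4, 70)
```

`MIN` and `MAX` survive the duplication, `AVG` drifts because the duplicates are unevenly distributed, and `COUNT` and `SUM` are inflated outright.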
**Indexes**
itr: (ingredient_id, recipe_id) -- for the JOIN and WHERE and GROUP BY
itr: (recipe_id, ingredient_id, weight) -- for 1st Update
(There is no optimization available for the ORDER BY and LIMIT)
flyer_items: (flyer_id, price_weight) -- unless flyer_id is the PRIMARY KEY
ifi: (flyer_item_id, ingredient_id)
ifi: (ingredient_id, flyer_item_id) -- for 1st Update
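Please provide `SHOW CREATE TABLE` for the relevant tables.
Please provide `EXPLAIN SELECT ...`.
If `ingredient_to_flyer_item` is a many:many mapping table, please follow the tips here. Ditto for `ingredient_to_recipe`?
`GROUP BY itf.flyer_item_id` is probably invalid, since it does not include the non-aggregated `ifi.ingredient_id`. See "only_full_group_by".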
**Reformulate**
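After you have finished assessing the `INDEXes`, try the following. Caution: I do not know whether it will work correctly. Change
JOIN `ingredient_to_recipe` AS itr ON itf2.`ingredient_id` = itr.`ingredient_id`
to
JOIN ( SELECT recipe_id,
ingredient_id,
SUM(weight) AS sum_weight
FROM ingredient_to_recipe
GROUP BY recipe_id, ingredient_id ) AS itr ON itf2.`ingredient_id` = itr.`ingredient_id`
and change the outer `SELECT` to use those computed sums in place of the `SUMs`. (I suspect I did not handle `ingredient_id` correctly.)
What version of MySQL / MariaDB are you running?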
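Answer 1 (score: 1)
I have been meaning to look at this, but unfortunately did not have time until now. I think this query will give you the results you want.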
SELECT recipe_id, SUM(weight) AS weight, SUM(max_price_weight) AS price_weight, SUM(weight + max_price_weight) AS score
FROM (SELECT recipe_id, ingredient_id, MAX(weight) AS weight, MAX(price_weight) AS max_price_weight
FROM (SELECT itr.recipe_id, MIN(itr.ingredient_id) AS ingredient_id, MAX(itr.weight) AS weight, fi.id, MAX(fi.price_weight) AS price_weight
FROM ingredient_to_recipe itr
JOIN ingredient_to_flyer_item itfi ON itfi.ingredient_id = itr.ingredient_id
JOIN flyer_items fi ON fi.id = itfi.flyer_item_id
GROUP BY itr.recipe_id, fi.id) ri
GROUP BY recipe_id, ingredient_id) r
GROUP BY recipe_id
ORDER BY score DESC
LIMIT 10
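The query first groups by `flyer_item_id` and then by `MIN(ingredient_id)`, so that ingredients within a recipe that share the same `flyer_item_id` are counted once. It then sums the results to get the score you want. If I run the query with a
HAVING recipe_id IN (8376, 3152, 4771, 10230, 8958, 4656, 11338)
clause, it gives the following results, which agree with the "what 'score' should be" column above: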
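| recipe_id | weight | price_weight | score |
|-----------|--------|--------------|-------|
| 8376 | 10 | 41 | 51 |
| 4771 | 5 | 40 | 45 |
| 10230 | 10 | 30 | 40 |
| 8958 | 15 | 24 | 39 |
| 4656 | 15 | 19 | 34 |
| 3152 | 0 | 18 | 18 |
| 11338 | 0 | 10 | 10 |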
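I am not sure how fast this query will run on your system; on my laptop it is comparable to your query (I expected it to be much slower). I am fairly sure some optimisations are possible, but I have not had time to look into them in depth.
I hope this gives you some more help towards finding a workable solution.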
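The two-step grouping can be traced on a tiny data set. The sketch below is only an illustration: the rows are invented, the column layout is assumed from the queries in the question, and SQLite stands in for MySQL. Ingredients 10 and 11 share flyer item 100, and ingredient 12 appears in flyer items 101 and 102. For contrast, it also runs a per-ingredient grouping of the kind the question describes as counting shared flyer items more than once.

```python
import sqlite3

# Invented rows; table and column names are assumed from the question's queries.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE flyer_items (id INTEGER PRIMARY KEY, flyer_id INTEGER, price_weight INTEGER);
    CREATE TABLE ingredient_to_flyer_item (flyer_item_id INTEGER, ingredient_id INTEGER);
    CREATE TABLE ingredient_to_recipe (recipe_id INTEGER, ingredient_id INTEGER, weight INTEGER);
    INSERT INTO flyer_items VALUES (100, 1, 3), (101, 1, 5), (102, 2, 2);
    INSERT INTO ingredient_to_flyer_item VALUES (100, 10), (100, 11), (101, 12), (102, 12);
    INSERT INTO ingredient_to_recipe VALUES (1, 10, 1), (1, 11, 1), (1, 12, 2);
""")

# The query from this answer: group per (recipe, flyer item), then per
# (recipe, representative ingredient), then sum per recipe.
once_per_flyer_item = conn.execute("""
    SELECT recipe_id, SUM(weight) AS weight, SUM(max_price_weight) AS price_weight,
           SUM(weight + max_price_weight) AS score
    FROM (SELECT recipe_id, ingredient_id, MAX(weight) AS weight,
                 MAX(price_weight) AS max_price_weight
          FROM (SELECT itr.recipe_id, MIN(itr.ingredient_id) AS ingredient_id,
                       MAX(itr.weight) AS weight, fi.id,
                       MAX(fi.price_weight) AS price_weight
                FROM ingredient_to_recipe itr
                JOIN ingredient_to_flyer_item itfi ON itfi.ingredient_id = itr.ingredient_id
                JOIN flyer_items fi ON fi.id = itfi.flyer_item_id
                GROUP BY itr.recipe_id, fi.id) ri
          GROUP BY recipe_id, ingredient_id) r
    GROUP BY recipe_id
    ORDER BY score DESC
    LIMIT 10
""").fetchall()

# Contrast: grouping per ingredient only, so flyer item 100 is counted
# once for ingredient 10 and again for ingredient 11.
per_ingredient = conn.execute("""
    SELECT itr.recipe_id, SUM(itr.weight), SUM(itf.max_price_weight),
           SUM(itr.weight + itf.max_price_weight) AS score
    FROM (SELECT itfi.ingredient_id, MAX(fi.price_weight) AS max_price_weight
          FROM flyer_items fi
          JOIN ingredient_to_flyer_item itfi ON fi.id = itfi.flyer_item_id
          GROUP BY itfi.ingredient_id) itf
    JOIN ingredient_to_recipe itr ON itf.ingredient_id = itr.ingredient_id
    GROUP BY itr.recipe_id
""").fetchall()

print(once_per_flyer_item)  # [(1, 3, 8, 11)]
print(per_ingredient)       # [(1, 4, 11, 15)]
```

In the first result flyer item 100 contributes its price weight of 3 once, and ingredient 12 contributes only its best flyer item (price weight 5); in the second, flyer item 100 is counted twice.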
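Answer 2 (score: 0)
I am not sure I fully understand the problem. It seems to me that you are grouping by the wrong column, `flyer_items.id`. You should be grouping by the `ingredient_id` column instead. If you do that, it makes more sense to me. Here is how I see it: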
select
itr.recipe_id,
sum(itr.weight),
sum(max_price_weight),
sum(itr.weight + max_price_weight) as score
from (
select
ifi.ingredient_id,
max(price_weight) as max_price_weight
from flyer_items i
join ingredient_to_flyer_item ifi on i.id = ifi.flyer_item_id
where flyer_id in (1, 2)
group by ifi.ingredient_id
) itf
join `ingredient_to_recipe` as itr on itf.`ingredient_id` = itr.`ingredient_id`
group by itr.`recipe_id`
order by score desc
limit 0,10;
I hope it helps.