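My query is taking a very long time, and I wanted to raise it here in the hope that I have missed something. Here is the query (it is basically saying "give me all funds that have at least one position"):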
SELECT org_name.legacy_id,
org_name.`name`,
org_desc.description,
org_name.instrument_style_code,
org_name.investment_orientation,
org_name.is_active,
org_name.organization_id,
mgr_org.eng_name as manager_name,
mgrs.manager_org_id as manager_organization_id,
mgrs.manager_legacy_id as manager_legacy_id
FROM ownership_organization_names org_name
INNER JOIN (SELECT fund.legacy_id
FROM ownership_organization_names fund
INNER JOIN ownership_ownerships own
ON fund.legacy_id = own.legacy_id
LEFT JOIN ownership_unconsolidated_holding_positions pos
ON own.ownership_id = pos.ownership_id
GROUP BY fund.legacy_id
HAVING COUNT(pos.holding_position_id) > 0) funds_with_positions
ON funds_with_positions.legacy_id = org_name.legacy_id
LEFT JOIN ownership_organization_descriptions org_desc
on org_name.legacy_id = org_desc.legacy_id
LEFT JOIN ownership_fund_mgrs mgrs
on org_name.legacy_id = mgrs.fund_legacy_id
LEFT JOIN organization mgr_org
on mgr_org.id = mgrs.manager_org_id
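The inner query has a duration of 42 seconds and a fetch time of 320 seconds (that doesn't sound right!) and returns 135,683 rows.
The whole query has a duration of 372 seconds and a fetch time of 2 seconds (that definitely doesn't sound right).
Here is the EXPLAIN output for the query (350 second duration), with apologies for the formatting (or lack thereof):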
id  select_type  table       type    possible_keys     key               key_len  ref                                   rows    Extra
1   PRIMARY      <derived2>  ALL     NULL              NULL              NULL     NULL                                  135683
1   PRIMARY      org_name    ref     PRIMARY           PRIMARY           8        funds_with_positions.legacy_id        22303
1   PRIMARY      org_desc    eq_ref  PRIMARY           PRIMARY           8        funds_with_positions.legacy_id        1
1   PRIMARY      mgrs        ref     PRIMARY           PRIMARY           8        people_directory.org_name.legacy_id   665
1   PRIMARY      mgr_org     eq_ref  PRIMARY           PRIMARY           8        people_directory.mgrs.manager_org_id  1
2   DERIVED      fund        index   PRIMARY           PRIMARY           16       NULL                                  46728   Using index
2   DERIVED      own         ref     legacy_id_idx     legacy_id_idx     9        people_directory.fund.legacy_id       15      Using where
2   DERIVED      pos         ref     ownership_id_idx  ownership_id_idx  9        people_directory.own.ownership_id     3
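I have indexed every join column, and I got a huge performance gain by moving the subquery into an INNER JOIN rather than the WHERE clause.
I also tried creating an indexed temporary table and joining to it, but populating it took 360 seconds. The outer joins on top of it then became trivial (about 1 second). That tells me the inner query is horribly unoptimized, but I don't know what else I can do to optimize it.
I also come from a Microsoft SQL Server background and am assuming all other principles are the same. I have seen various threads discussing changing the database storage engine and tuning buffer sizes, but I wanted to see whether I had exhausted every possibility for optimizing the query itself before resorting to those measures.
UPDATE: In the end, the biggest performance gain came from noticing that I had an unnecessary join in the inner query; removing it took the query from around 360 seconds down to around 70 seconds. However, trying some other logically equivalent optimization techniques turned up some interesting quirks.
As suggested, I tried: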
SELECT
org_name.legacy_id,
org_name.`name`,
org_desc.description,
org_name.instrument_style_code,
org_name.investment_orientation,
org_name.is_active,
org_name.organization_id,
mgr_org.eng_name as manager_name,
mgrs.manager_org_id as manager_organization_id,
mgrs.manager_legacy_id as manager_legacy_id
FROM ownership_organization_names org_name
INNER JOIN (SELECT own.legacy_id
FROM ownership_ownerships own
WHERE EXISTS (SELECT 1
FROM ownership_unconsolidated_holding_positions pos
WHERE own.ownership_id = pos.ownership_id)
) funds_with_positions ON funds_with_positions.legacy_id = org_name.legacy_id
LEFT JOIN ownership_organization_descriptions org_desc on org_name.legacy_id = org_desc.legacy_id
LEFT JOIN ownership_fund_mgrs mgrs on org_name.legacy_id = mgrs.fund_legacy_id
LEFT JOIN organization mgr_org on mgr_org.id = mgrs.manager_org_id
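MySQL Workbench reported a query duration of 242.422 seconds, the fetch portion timed out, and the client returned the error "Error Code: 2008 MySQL client ran out of memory".
Moving the WHERE EXISTS style subquery into the WHERE clause does eventually return, but with a duration / fetch of 0.234 sec / 157.781 sec. I suspect that is not accurate at all.
I am curious about the thinking behind this optimization of moving the derived table into the WHERE clause as a subquery. Wouldn't joining to the derived table early reduce the result set sooner in the query, rather than later in the WHERE clause?
Granted, I am not familiar with the WHERE EXISTS operator, or at least I have never thought to use it often. How does it compare, in terms of performance and memory use, with the subquery/derived table approach I originally had?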
Answer 0: (score: 2)
Focus on the subquery:
(SELECT fund.legacy_id
FROM ownership_organization_names fund INNER JOIN
ownership_ownerships own
ON fund.legacy_id = own.legacy_id LEFT JOIN
ownership_unconsolidated_holding_positions pos
ON own.ownership_id = pos.ownership_id
GROUP BY fund.legacy_id
HAVING COUNT(pos.holding_position_id) > 0
) funds_with_positions
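I observe that fund is not needed. You can use own.legacy_id. Also, the left outer join is unnecessary, because you are only looking for matches. This simplifies the query to: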
(SELECT own.legacy_id
FROM ownership_ownerships own JOIN
ownership_unconsolidated_holding_positions pos
ON own.ownership_id = pos.ownership_id
GROUP BY own.legacy_id
HAVING COUNT(*) > 0
) funds_with_positions
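As a sanity check that the simplification keeps the same result, here is a minimal sketch using Python's built-in sqlite3 module rather than MySQL. The table and column names are borrowed from the question, but the rows are invented sample data, and SQLite will not reproduce MySQL's timings; it only illustrates that the two forms pick out the same funds.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL schema; all rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ownership_organization_names (legacy_id INTEGER, name TEXT);
CREATE TABLE ownership_ownerships (ownership_id INTEGER, legacy_id INTEGER);
CREATE TABLE ownership_unconsolidated_holding_positions (
    holding_position_id INTEGER PRIMARY KEY, ownership_id INTEGER);
INSERT INTO ownership_organization_names VALUES (1, 'Fund A'), (2, 'Fund B'), (3, 'Fund C');
INSERT INTO ownership_ownerships VALUES (10, 1), (11, 1), (20, 2), (30, 3);
-- Fund 1 has positions under two ownerships, fund 2 has none, fund 3 has one.
INSERT INTO ownership_unconsolidated_holding_positions
    VALUES (100, 10), (101, 10), (102, 11), (103, 30);
""")

# The subquery as written in the question: extra self-join plus LEFT JOIN.
original = """
SELECT fund.legacy_id
FROM ownership_organization_names fund
INNER JOIN ownership_ownerships own ON fund.legacy_id = own.legacy_id
LEFT JOIN ownership_unconsolidated_holding_positions pos
       ON own.ownership_id = pos.ownership_id
GROUP BY fund.legacy_id
HAVING COUNT(pos.holding_position_id) > 0
"""

# The simplified form: no fund table, plain inner join.
simplified = """
SELECT own.legacy_id
FROM ownership_ownerships own
JOIN ownership_unconsolidated_holding_positions pos
  ON own.ownership_id = pos.ownership_id
GROUP BY own.legacy_id
HAVING COUNT(*) > 0
"""

original_ids = sorted(row[0] for row in conn.execute(original))
simplified_ids = sorted(row[0] for row in conn.execute(simplified))
print(original_ids, simplified_ids)
```

Dropping fund is safe here because the outer query joins the result back to ownership_organization_names anyway, so a legacy_id with no matching organization row is filtered out at that point.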
This query requires an explicit aggregation, which can be expensive. I would be inclined to try the following for performance:
(SELECT own.legacy_id
FROM ownership_ownerships own
WHERE EXISTS (SELECT 1
FROM ownership_unconsolidated_holding_positions pos
WHERE own.ownership_id = pos.ownership_id
)
) funds_with_positions
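One thing worth checking before joining to this version: unlike the GROUP BY form, it returns one row per matching ownership rather than one row per legacy_id, so a fund with several ownerships that hold positions shows up more than once and multiplies the rows of the outer join. The sketch below illustrates this with Python's sqlite3 module and invented sample rows (not MySQL, and not the asker's data). Adding DISTINCT is one possible way to get back to one row per fund; it is my assumption, not part of the query above.

```python
import sqlite3

# In-memory SQLite stand-in; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ownership_ownerships (ownership_id INTEGER, legacy_id INTEGER);
CREATE TABLE ownership_unconsolidated_holding_positions (
    holding_position_id INTEGER PRIMARY KEY, ownership_id INTEGER);
-- Fund 1 owns through two ownerships (10 and 11); both hold positions.
INSERT INTO ownership_ownerships VALUES (10, 1), (11, 1), (20, 2), (30, 3);
INSERT INTO ownership_unconsolidated_holding_positions
    VALUES (100, 10), (101, 10), (102, 11), (103, 30);
""")

exists_form = """
SELECT {distinct} own.legacy_id
FROM ownership_ownerships own
WHERE EXISTS (SELECT 1
              FROM ownership_unconsolidated_holding_positions pos
              WHERE own.ownership_id = pos.ownership_id)
"""

# Without DISTINCT: one row per ownership that has at least one position.
exists_ids = sorted(row[0] for row in conn.execute(exists_form.format(distinct="")))
# With DISTINCT: one row per fund.
distinct_ids = sorted(row[0] for row in conn.execute(exists_form.format(distinct="DISTINCT")))

print(exists_ids)
print(distinct_ids)
```

In this sample, fund 1 appears twice in the first result because two of its ownerships have positions, which is the kind of duplication that would inflate the joined result set.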
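This whole subquery is only being used as a filter. So my final suggestion is to remove the subquery entirely and include the following where clause: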
WHERE EXISTS (SELECT 1
FROM ownership_ownerships own
WHERE own.legacy_id = org_name.legacy_id AND
EXISTS (SELECT 1
FROM ownership_unconsolidated_holding_positions pos
WHERE own.ownership_id = pos.ownership_id
)
)
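To show what this filter does, here is a small sketch with Python's sqlite3 module and invented sample rows (again SQLite, not MySQL, so it says nothing about timings). The correlated EXISTS only asks whether a match exists, so each organization row comes back at most once no matter how many ownerships or positions it has, and no GROUP BY or DISTINCT is needed.

```python
import sqlite3

# In-memory SQLite stand-in; the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ownership_organization_names (legacy_id INTEGER, name TEXT);
CREATE TABLE ownership_ownerships (ownership_id INTEGER, legacy_id INTEGER);
CREATE TABLE ownership_unconsolidated_holding_positions (
    holding_position_id INTEGER PRIMARY KEY, ownership_id INTEGER);
INSERT INTO ownership_organization_names VALUES (1, 'Fund A'), (2, 'Fund B'), (3, 'Fund C');
INSERT INTO ownership_ownerships VALUES (10, 1), (11, 1), (20, 2), (30, 3);
-- Fund 1 has three positions over two ownerships; fund 2 has none.
INSERT INTO ownership_unconsolidated_holding_positions
    VALUES (100, 10), (101, 10), (102, 11), (103, 30);
""")

rows = conn.execute("""
SELECT org_name.legacy_id, org_name.name
FROM ownership_organization_names org_name
WHERE EXISTS (SELECT 1
              FROM ownership_ownerships own
              WHERE own.legacy_id = org_name.legacy_id AND
                    EXISTS (SELECT 1
                            FROM ownership_unconsolidated_holding_positions pos
                            WHERE own.ownership_id = pos.ownership_id))
ORDER BY org_name.legacy_id
""").fetchall()

print(rows)
```

Fund A is listed once even though it has three positions, and Fund B is dropped because none of its ownerships has a position.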
I am assuming these tables all have the right indexes. For this query, you want indexes on ownership_unconsolidated_holding_positions(ownership_id) and ownership_ownerships(legacy_id, ownership_id).
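As a rough sketch of those two indexes, the snippet below creates them in an in-memory SQLite database via Python's sqlite3 module and checks that the lookups the EXISTS subqueries perform can be served from them. The index names are hypothetical, and the plan text is SQLite's; MySQL's EXPLAIN output looks different, so treat this only as an illustration of the column order. The CREATE INDEX statements use the basic form that MySQL also accepts.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ownership_ownerships (ownership_id INTEGER, legacy_id INTEGER);
CREATE TABLE ownership_unconsolidated_holding_positions (
    holding_position_id INTEGER PRIMARY KEY, ownership_id INTEGER);

-- Index names are hypothetical; the column lists are the ones suggested above.
CREATE INDEX idx_pos_ownership_id
    ON ownership_unconsolidated_holding_positions (ownership_id);
CREATE INDEX idx_own_legacy_ownership
    ON ownership_ownerships (legacy_id, ownership_id);
""")

def plan(sql):
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable plan step.
    return " | ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

# Lookup done by the outer EXISTS: find ownerships for one legacy_id.
own_plan = plan("SELECT ownership_id FROM ownership_ownerships WHERE legacy_id = 1")
# Lookup done by the inner EXISTS: is there any position for one ownership_id?
pos_plan = plan(
    "SELECT 1 FROM ownership_unconsolidated_holding_positions WHERE ownership_id = 10"
)

print(own_plan)
print(pos_plan)
```

Putting legacy_id first in the composite index matters because that is the column the outer EXISTS filters on; ownership_id is included so the inner lookup value can be read from the index without touching the table.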
Answer 1: (score: 1)
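Assuming pos.holding_position_id is not NULLable, COUNT(pos.holding_position_id) > 0 will only be true when there is a matching record in ownership_unconsolidated_holding_positions, so you shouldn't really use LEFT OUTER JOIN but rather rely explicitly on JOIN, since it filters things out early in the game.
As your description of the problem already says, the subquery is only used to find out whether there are any funds available for the given organization. It sounds like you would be better off using the more readable WHERE EXISTS().
An added benefit is that you no longer need the aggregation to avoid duplicates.
Also, the aliases fund and org_name both point to the same table. Is this on purpose because multiple records can have the same legacy_id? (Quite possible!) Or will both always refer to the same record? If the latter is true, you could optimize the query further.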
SELECT org_name.legacy_id,
org_name.`name`,
org_desc.description,
org_name.instrument_style_code,
org_name.investment_orientation,
org_name.is_active,
org_name.organization_id,
mgr_org.eng_name as manager_name,
mgrs.manager_org_id as manager_organization_id,
mgrs.manager_legacy_id as manager_legacy_id
FROM ownership_organization_names org_name
LEFT JOIN ownership_organization_descriptions org_desc
on org_name.legacy_id = org_desc.legacy_id
LEFT JOIN ownership_fund_mgrs mgrs
on org_name.legacy_id = mgrs.fund_legacy_id
LEFT JOIN organization mgr_org
on mgr_org.id = mgrs.manager_org_id
WHERE EXISTS ( SELECT *
FROM ownership_organization_names fund
JOIN ownership_ownerships own
ON fund.legacy_id = own.legacy_id
JOIN ownership_unconsolidated_holding_positions pos
ON own.ownership_id = pos.ownership_id
WHERE fund.legacy_id = org_name.legacy_id )