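Query optimization with an inner query

Date: 2014-03-25 21:53:26

Tags: mysql sql query-optimization

My query takes a very long time, and I wanted to post it here in the hope that I've missed something. Here is the query (it is basically saying "give me all funds that have at least one position"):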

SELECT org_name.legacy_id,
       org_name.`name`,
       org_desc.description,
       org_name.instrument_style_code,
       org_name.investment_orientation,
       org_name.is_active,
       org_name.organization_id,
       mgr_org.eng_name                as manager_name,
       mgrs.manager_org_id             as manager_organization_id,
       mgrs.manager_legacy_id          as manager_legacy_id
  FROM ownership_organization_names org_name
 INNER JOIN (SELECT fund.legacy_id
               FROM ownership_organization_names fund
              INNER JOIN ownership_ownerships own
                 ON fund.legacy_id = own.legacy_id
               LEFT JOIN ownership_unconsolidated_holding_positions pos
                 ON own.ownership_id = pos.ownership_id
              GROUP BY fund.legacy_id
             HAVING COUNT(pos.holding_position_id) > 0) funds_with_positions
    ON funds_with_positions.legacy_id = org_name.legacy_id
  LEFT JOIN ownership_organization_descriptions org_desc
    on org_name.legacy_id = org_desc.legacy_id
  LEFT JOIN ownership_fund_mgrs mgrs
    on org_name.legacy_id = mgrs.fund_legacy_id
  LEFT JOIN organization mgr_org
    on mgr_org.id = mgrs.manager_org_id

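The inner query has a duration of 42 seconds and a fetch time of 320 seconds (that doesn't sound right!) and returns 135,683 rows.

The whole query has a duration of 372 seconds and a fetch time of 2 seconds (that definitely doesn't sound right).

Here is the EXPLAIN output for the query (350 seconds duration); apologies for the formatting (or lack thereof):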

id  select_type  table       type    possible_keys     key               key_len  ref                                   rows    Extra
1   PRIMARY      <derived2>  ALL                                                                                        135683
1   PRIMARY      org_name    ref     PRIMARY           PRIMARY           8        funds_with_positions.legacy_id        22303
1   PRIMARY      org_desc    eq_ref  PRIMARY           PRIMARY           8        funds_with_positions.legacy_id        1
1   PRIMARY      mgrs        ref     PRIMARY           PRIMARY           8        people_directory.org_name.legacy_id   665
1   PRIMARY      mgr_org     eq_ref  PRIMARY           PRIMARY           8        people_directory.mgrs.manager_org_id  1
2   DERIVED      fund        index   PRIMARY           PRIMARY           16                                             46728   Using index
2   DERIVED      own         ref     legacy_id_idx     legacy_id_idx     9        people_directory.fund.legacy_id       15      Using where
2   DERIVED      pos         ref     ownership_id_idx  ownership_id_idx  9        people_directory.own.ownership_id     3

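I have indexed every join column, and I got a huge performance gain by moving the subquery into an INNER JOIN instead of the WHERE clause.

I also tried creating an indexed temporary table and joining to it, but found that populating it takes 360 seconds. The outer joins on top of it then become trivial (about 1 second), which tells me the inner query is horribly unoptimized, but I don't know what else I can do to optimize it.

I also come from a Microsoft SQL background, but I'm assuming all the other principles are the same. I have seen various threads that discuss changing the database storage engine and tuning buffer sizes, but I'd like to see whether I have exhausted every possibility for optimizing the query itself before resorting to those measures.

Update: In the end, the biggest performance gain came from noticing that I had an unnecessary join in the inner query, which took it from about 360 seconds down to about 70 seconds. However, trying some other logically equivalent optimization techniques produced some interesting quirks:

As suggested, I tried: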

SELECT 
    org_name.legacy_id,
    org_name.`name`,
    org_desc.description,
    org_name.instrument_style_code,
    org_name.investment_orientation,
    org_name.is_active,
    org_name.organization_id,
    mgr_org.eng_name as manager_name,
    mgrs.manager_org_id as manager_organization_id,
    mgrs.manager_legacy_id as manager_legacy_id
FROM ownership_organization_names org_name
INNER JOIN (SELECT own.legacy_id
  FROM ownership_ownerships own 
  WHERE EXISTS (SELECT 1
                FROM ownership_unconsolidated_holding_positions pos
                WHERE own.ownership_id = pos.ownership_id)
 ) funds_with_positions ON funds_with_positions.legacy_id = org_name.legacy_id
LEFT JOIN ownership_organization_descriptions org_desc on org_name.legacy_id = org_desc.legacy_id
LEFT JOIN ownership_fund_mgrs mgrs on org_name.legacy_id = mgrs.fund_legacy_id
LEFT JOIN organization mgr_org on mgr_org.id = mgrs.manager_org_id

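MySQL Workbench reports a query duration of 242.422 seconds; the fetch portion timed out and the client returned the error "Error Code: 2008 MySQL client ran out of memory".

Moving the WHERE EXISTS-style subquery into the WHERE clause does eventually return, but with a duration / fetch of 0.234 sec / 157.781 sec. I suspect that isn't accurate at all.

I'm curious about the thinking behind this optimization of moving the derived table into the WHERE clause as a subquery. Wouldn't joining to it earlier as a derived table reduce the result set earlier in the query, rather than later in the WHERE clause?

Granted, I'm not familiar with the WHERE EXISTS operator, or at least I've never thought to use it often. How does it differ, in terms of performance and memory use, from the subquery/derived-table approach I originally had?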
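One difference between the two forms can be shown on toy data. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL and using shortened, hypothetical table names (org_names, ownerships, positions) in place of the real ones. A derived table with no GROUP BY or DISTINCT returns one row per qualifying ownership, so joining to it repeats each fund once per ownership; an EXISTS test in the WHERE clause only keeps or drops each fund row. That fan-out may be part of why the derived-table variant above produced such a large result set, though the sketch cannot confirm that for the real data.

```python
import sqlite3

# Hypothetical, shortened stand-ins for the tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE org_names (legacy_id INTEGER, name TEXT);
    CREATE TABLE ownerships (ownership_id INTEGER, legacy_id INTEGER);
    CREATE TABLE positions (holding_position_id INTEGER, ownership_id INTEGER);

    INSERT INTO org_names VALUES (1, 'Fund A'), (2, 'Fund B');
    -- Fund A has two ownerships with positions; Fund B has one without any.
    INSERT INTO ownerships VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO positions VALUES (100, 10), (101, 10), (102, 11);
""")

# Derived table without GROUP BY/DISTINCT: one row per qualifying ownership,
# so the join repeats Fund A once for each of its ownerships.
joined = conn.execute("""
    SELECT o.name
      FROM org_names o
      JOIN (SELECT own.legacy_id
              FROM ownerships own
             WHERE EXISTS (SELECT 1 FROM positions pos
                            WHERE pos.ownership_id = own.ownership_id)) f
        ON f.legacy_id = o.legacy_id
""").fetchall()

# EXISTS in the WHERE clause: each fund row is kept or dropped, never multiplied.
filtered = conn.execute("""
    SELECT o.name
      FROM org_names o
     WHERE EXISTS (SELECT 1
                     FROM ownerships own
                    WHERE own.legacy_id = o.legacy_id
                      AND EXISTS (SELECT 1 FROM positions pos
                                   WHERE pos.ownership_id = own.ownership_id))
""").fetchall()

print(len(joined), len(filtered))  # 2 1
```

With roughly 15 ownerships per fund, as the EXPLAIN output estimates, the joined form would return about 15 copies of every fund row unless the derived table is deduplicated.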

2 answers:

Answer 0: (score: 2)

Focusing on the subquery:

     (SELECT fund.legacy_id
      FROM ownership_organization_names fund INNER JOIN
           ownership_ownerships own
           ON fund.legacy_id = own.legacy_id LEFT JOIN
           ownership_unconsolidated_holding_positions pos
           ON own.ownership_id = pos.ownership_id
      GROUP BY fund.legacy_id
      HAVING COUNT(pos.holding_position_id) > 0
     ) funds_with_positions

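I observe that fund is not needed. You can use own.legacy_id instead. And the left outer join is unnecessary, because you are only looking for matches. This simplifies the query: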

     (SELECT own.legacy_id
      FROM ownership_ownerships own JOIN
           ownership_unconsolidated_holding_positions pos
           ON own.ownership_id = pos.ownership_id
      GROUP BY own.legacy_id
      HAVING COUNT(*) > 0
     ) funds_with_positions

This query requires an explicit aggregation, which can be expensive. I would be inclined to try the following for performance:

     (SELECT own.legacy_id
      FROM ownership_ownerships own 
      WHERE EXISTS (SELECT 1
                    FROM ownership_unconsolidated_holding_positions pos
                    WHERE own.ownership_id = pos.ownership_id
                   )
     ) funds_with_positions
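As a quick sanity check that these rewrites select the same funds, here is a minimal sketch on toy data. It assumes SQLite (through Python's sqlite3 module) as a stand-in for MySQL, with shortened, hypothetical table names (ownerships, positions). It also shows that HAVING COUNT(*) > 0 adds nothing after an inner join, since every group produced by the join already has at least one row.

```python
import sqlite3

# Hypothetical, shortened stand-ins for the tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ownerships (ownership_id INTEGER, legacy_id INTEGER);
    CREATE TABLE positions (holding_position_id INTEGER, ownership_id INTEGER);
    INSERT INTO ownerships VALUES (10, 1), (11, 1), (20, 2), (30, 3);
    INSERT INTO positions VALUES (100, 10), (101, 10), (102, 11), (103, 30);
""")

def funds(sql):
    """Return the distinct legacy_ids a query selects, sorted."""
    return sorted({row[0] for row in conn.execute(sql)})

with_having = funds("""
    SELECT own.legacy_id
      FROM ownerships own
      JOIN positions pos ON own.ownership_id = pos.ownership_id
     GROUP BY own.legacy_id
    HAVING COUNT(*) > 0
""")

without_having = funds("""
    SELECT own.legacy_id
      FROM ownerships own
      JOIN positions pos ON own.ownership_id = pos.ownership_id
     GROUP BY own.legacy_id
""")

# The EXISTS form can list a fund more than once (one row per ownership),
# which is why funds() compares distinct values only.
via_exists = funds("""
    SELECT own.legacy_id
      FROM ownerships own
     WHERE EXISTS (SELECT 1 FROM positions pos
                    WHERE pos.ownership_id = own.ownership_id)
""")

print(with_having, without_having, via_exists)  # [1, 3] [1, 3] [1, 3]
```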

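This whole subquery is only being used as a filter. So my final suggestion is to remove the subquery entirely and include the following where clause instead: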

WHERE EXISTS (SELECT 1
              FROM ownership_ownerships own 
              WHERE own.legacy_id = org_name.legacy_id AND
                    EXISTS (SELECT 1
                            FROM ownership_unconsolidated_holding_positions pos
                            WHERE own.ownership_id = pos.ownership_id
                           )
             ) 

I am assuming that these tables all have the right indexes for this. For this query, you want indexes on ownership_unconsolidated_holding_positions(ownership_id) and ownership_ownerships(legacy_id, ownership_id).
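The two indexes could be created along these lines. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL; the table names are shortened and the index names (idx_pos_ownership, idx_own_legacy_ownership) are hypothetical. The CREATE INDEX statements have the same shape in MySQL, but the plan printed here comes from SQLite's EXPLAIN QUERY PLAN and will not look like MySQL's EXPLAIN output.

```python
import sqlite3

# Hypothetical, shortened stand-ins for the tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ownerships (ownership_id INTEGER, legacy_id INTEGER);
    CREATE TABLE positions (holding_position_id INTEGER, ownership_id INTEGER);

    -- Index names are made up for this sketch.
    CREATE INDEX idx_pos_ownership ON positions (ownership_id);
    CREATE INDEX idx_own_legacy_ownership ON ownerships (legacy_id, ownership_id);

    INSERT INTO ownerships VALUES (10, 1), (20, 2);
    INSERT INTO positions VALUES (100, 10);
""")

# The correlated lookup from the WHERE EXISTS suggestion, for a single fund.
query = """
    SELECT own.legacy_id
      FROM ownerships own
     WHERE own.legacy_id = ?
       AND EXISTS (SELECT 1 FROM positions pos
                    WHERE pos.ownership_id = own.ownership_id)
"""

matches = conn.execute(query, (1,)).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is the human-readable step.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (1,))]
for step in plan:
    print(step)

index_names = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
```

The composite index lists legacy_id first because that is the column the outer query correlates on; ownership_id follows so that the lookup into the positions table can be driven from the index alone.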

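Answer 1: (score: 1)

Assuming pos.holding_position_id is not NULLable, COUNT(pos.holding_position_id) > 0 is true as soon as there is a matching record in ownership_unconsolidated_holding_positions, so you shouldn't really use LEFT OUTER JOIN but rely explicitly on JOIN, since it filters things out early in the game. As your description of the problem already states, the subquery is only used to find out whether there are any funds available for a given organization. By the sound of it, you would be better off using the more readable WHERE EXISTS(). An added benefit is that you no longer need the aggregation to avoid duplicates.

Also, the aliases fund and org_name both point to the same table. Is this on purpose, because multiple records can have the same legacy_id? (It might well be!) Or would both refer to the same record? If the latter is true, you can optimize the query further.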

SELECT org_name.legacy_id,
       org_name.`name`,
       org_desc.description,
       org_name.instrument_style_code,
       org_name.investment_orientation,
       org_name.is_active,
       org_name.organization_id,
       mgr_org.eng_name                as manager_name,
       mgrs.manager_org_id             as manager_organization_id,
       mgrs.manager_legacy_id          as manager_legacy_id
  FROM ownership_organization_names org_name
  LEFT JOIN ownership_organization_descriptions org_desc
    on org_name.legacy_id = org_desc.legacy_id
  LEFT JOIN ownership_fund_mgrs mgrs
    on org_name.legacy_id = mgrs.fund_legacy_id
  LEFT JOIN organization mgr_org
    on mgr_org.id = mgrs.manager_org_id
 WHERE EXISTS ( SELECT *
                  FROM ownership_organization_names fund
                  JOIN ownership_ownerships own
                    ON fund.legacy_id = own.legacy_id
                  JOIN ownership_unconsolidated_holding_positions pos
                    ON own.ownership_id = pos.ownership_id
                 WHERE fund.legacy_id = org_name.legacy_id )
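The point about COUNT(pos.holding_position_id) can be checked on toy data. This is a minimal sketch, assuming SQLite (through Python's sqlite3 module) as a stand-in for MySQL and shortened, hypothetical table names (ownerships, positions). COUNT over a column skips NULLs, so a fund whose LEFT JOIN finds no position gets a count of 0 and is removed by the HAVING clause, which leaves the same funds a plain JOIN would.

```python
import sqlite3

# Hypothetical, shortened stand-ins for the tables in the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ownerships (ownership_id INTEGER, legacy_id INTEGER);
    CREATE TABLE positions (holding_position_id INTEGER, ownership_id INTEGER);
    INSERT INTO ownerships VALUES (10, 1), (20, 2);
    -- Only fund 1 has positions.
    INSERT INTO positions VALUES (100, 10), (101, 10);
""")

# COUNT(column) ignores the NULLs that LEFT JOIN produces for unmatched rows.
left_counts = conn.execute("""
    SELECT own.legacy_id, COUNT(pos.holding_position_id)
      FROM ownerships own
      LEFT JOIN positions pos ON own.ownership_id = pos.ownership_id
     GROUP BY own.legacy_id
     ORDER BY own.legacy_id
""").fetchall()

left_filtered = conn.execute("""
    SELECT own.legacy_id
      FROM ownerships own
      LEFT JOIN positions pos ON own.ownership_id = pos.ownership_id
     GROUP BY own.legacy_id
    HAVING COUNT(pos.holding_position_id) > 0
""").fetchall()

inner_joined = conn.execute("""
    SELECT own.legacy_id
      FROM ownerships own
      JOIN positions pos ON own.ownership_id = pos.ownership_id
     GROUP BY own.legacy_id
""").fetchall()

print(left_counts)                  # [(1, 2), (2, 0)]
print(left_filtered, inner_joined)  # [(1,)] [(1,)]
```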