Super slow MySQL query - need help!

Time: 2010-09-19 12:06:37

Tags: sql database mysql

I have a super slow query, which I've posted here: http://pastebin.com/E5sdRi7e. When I run EXPLAIN on it, I get the following:

id  select_type         table       type    possible_keys  key           key_len  ref                                 rows  Extra
1   PRIMARY             <derived2>  ALL     NULL           NULL          NULL     NULL                                5     Using filesort
2   DERIVED             Workflow    ALL     PRIMARY        NULL          NULL     NULL                                9     Using temporary; Using filesort
2   DERIVED             <derived3>  ALL     NULL           NULL          NULL     NULL                                141   Using where; Using join buffer
2   DERIVED             DataSource  ALL     PRIMARY        NULL          NULL     NULL                                1310  Using where; Using join buffer
2   DERIVED             <derived4>  ALL     NULL           NULL          NULL     NULL                                1310  Using where; Using join buffer
2   DERIVED             User        eq_ref  PRIMARY        PRIMARY       4        LatestDataSourceActivityLog.UserId  1
4   DERIVED             t1          ALL     NULL           NULL          NULL     NULL                                5400  Using where; Using temporary; Using filesort
5   DEPENDENT SUBQUERY  t2          ref     DataSourceId   DataSourceId  4        companyname_db.t1.DataSourceId      4
3   DERIVED             DataSource  range   PRIMARY        PRIMARY       4        NULL                                142   Using where

What does the table above tell me? Does it help me work out which fields should be indexed?

Any help is greatly appreciated.

The query:

SELECT WrappedData.*
FROM   (SELECT ParentLeafNodeDataSource.Id,
               LatestDataSourceActivityLog.UserId,
               DataSource.Status AS StatusCode,
               ( CASE
                   WHEN User.Name IS NULL THEN 'CompanyName'
                   ELSE User.Name
                 END )           AS `Username`,
               Workflow.Name     AS WorkflowName,
               LatestDataSourceActivityLog.Timestamp
        FROM   DataSource,
               Workflow,
               (SELECT *
                FROM   DataSource
                WHERE  DataSource.Id IN ( 0, 1, 2, 3,
                                          4, 5, 6, 7,
                                          8, 9, 10, 11,
                                          12, 13, 16, 21,
                                          22, 23, 24, 25,
                                          26, 27, 28, 29,
                                          30, 31, 32, 33,
                                          34, 35, 36, 37,
                                          38, 39, 40, 41,
                                          42, 43, 44, 45,
                                          46, 47, 48, 49,
                                          50, 51, 52, 53,
                                          54, 55, 56, 57,
                                          58, 59, 60, 61,
                                          62, 63, 64, 65,
                                          66, 67, 68, 69,
                                          70, 71, 72, 73,
                                          74, 75, 76, 77,
                                          78, 79, 80, 81,
                                          83, 84, 85, 86,
                                          87, 88, 89, 90,
                                          91, 92, 93, 94,
                                          95, 96, 97, 98,
                                          99, 100, 101, 102,
                                          103, 104, 105, 106,
                                          107, 108, 109, 110,
                                          111, 112, 113, 114,
                                          115, 116, 117, 118,
                                          119, 120, 142, 1293,
                                          1294, 1295, 1296, 1297,
                                          1298, 1299, 143, 1300,
                                          1301, 1302, 1303, 1304,
                                          1305, 1306, 144, 146,
                                          145, 1307, 1308, 1309,
                                          1310, 147, 149, 148,
                                          150, 151 )) AS ParentLeafNodeDataSource,
               (SELECT t1.*
                FROM   DataSourceActivityLog AS t1
                WHERE  Timestamp = (SELECT Max(t2.Timestamp)
                                    FROM   DataSourceActivityLog AS t2
                                    WHERE  t1.DataSourceId = t2.DataSourceId)
                GROUP  BY t1.DataSourceId) AS LatestDataSourceActivityLog
               LEFT JOIN User
                 ON User.Id = LatestDataSourceActivityLog.UserId
        WHERE  ParentLeafNodeDataSource.Status = '203'
                OR ParentLeafNodeDataSource.Status = '204'
                   AND Workflow.Id = ParentLeafNodeDataSource.WorkflowId
                   AND LatestDataSourceActivityLog.DataSourceId = ParentLeafNodeDataSource.Id
                   AND DataSource.Id = LatestDataSourceActivityLog.DataSourceId
                   AND LatestDataSourceActivityLog.UserId = 1
        GROUP  BY ParentLeafNodeDataSource.Id) AS WrappedData
ORDER  BY WrappedData.`Timestamp` DESC

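3 answers:

Answer 0 (score: 2)

It's hard to say exactly, but here are a few things to refactor.

In terms of performance, the first thing to look at is the aggregate (GROUP BY) functions.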

           (SELECT t1.*
            FROM   DataSourceActivityLog AS t1
            WHERE  Timestamp = (SELECT Max(t2.Timestamp)
                                FROM   DataSourceActivityLog AS t2
                                WHERE  t1.DataSourceId = t2.DataSourceId)
            GROUP  BY t1.DataSourceId) AS LatestDataSourceActivityLog

This can be rewritten to eliminate the use of MAX altogether:

           (SELECT t1.*
            FROM   DataSourceActivityLog AS t1
            WHERE  Timestamp = (SELECT t2.Timestamp
                                FROM   DataSourceActivityLog AS t2
                                WHERE  t1.DataSourceId = t2.DataSourceId
                                ORDER BY t2.Timestamp DESC
                                LIMIT 1)
            GROUP  BY t1.DataSourceId) AS LatestDataSourceActivityLog

It's probably not a big performance issue, but here you can use IFNULL or COALESCE instead of CASE.

( CASE
    WHEN User.Name IS NULL THEN 'CompanyName'
    ELSE User.Name
END )

becomes

( IFNULL(User.Name, 'CompanyName') )

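As far as indexes go, they improve SELECT performance by making lookups cheaper, but they slow down write operations because each index has to be updated as well. If your application is not write-heavy, you should index the columns that are used often, especially in large tables.

In this query, it looks like you could benefit from adding an index on DataSourceId, but I can't test whether there is any gain. Primary keys are already indexed.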
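
The EXPLAIN output in the question suggests that a key named DataSourceId is already used for the t2 lookup, so one thing worth trying is a composite index that also covers Timestamp. This is only a sketch: the index name idx_datasource_timestamp is hypothetical, the column order is an assumption, and whether it helps has to be checked with EXPLAIN on the real data.

```sql
-- Hypothetical index name; column order (DataSourceId, Timestamp) is an assumption
ALTER TABLE DataSourceActivityLog
  ADD INDEX idx_datasource_timestamp (DataSourceId, Timestamp);

-- Re-run EXPLAIN on the "latest row per DataSourceId" lookup
-- and compare the key and rows columns with the earlier output
EXPLAIN
SELECT t1.*
FROM   DataSourceActivityLog AS t1
WHERE  t1.Timestamp = (SELECT MAX(t2.Timestamp)
                       FROM   DataSourceActivityLog AS t2
                       WHERE  t2.DataSourceId = t1.DataSourceId);
```

If the plan does not change, the index can be dropped again so that it does not add write overhead for nothing.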

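Answer 1 (score: 1)

I would try the following:

  • The outer wrapper is completely useless; putting the ORDER BY in the inner query should give the same result
  • Try to rewrite the subqueries as JOINs
  • Then move the WHERE conditions into the relevant JOINs, so that the intermediate result sets stay smaller
  • Look at the WHERE and JOIN conditions to see which indexes should be created.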
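
As an illustration of the second point, the "latest log row per DataSourceId" lookup can be sketched as a join against a grouped derived table instead of a correlated subquery. The alias names latest and LatestTimestamp are made up for this example, and I have not checked that it is faster on the real data.

```sql
-- Sketch: latest log row per DataSourceId via a join on a grouped derived table
SELECT dsl.*
FROM   DataSourceActivityLog AS dsl
       INNER JOIN (SELECT DataSourceId,
                          MAX(Timestamp) AS LatestTimestamp  -- hypothetical alias
                   FROM   DataSourceActivityLog
                   GROUP  BY DataSourceId) AS latest
         ON  latest.DataSourceId    = dsl.DataSourceId
         AND latest.LatestTimestamp = dsl.Timestamp;
```

Note that if two log rows for the same DataSourceId share the same maximum Timestamp, both are returned, so a GROUP BY or a tie-breaker may still be needed.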

A quick attempt (I'm not sure the result is the same):

SELECT
    dsa.Status AS StatusCode,
    dsb.Id,
    dsl.UserId,
    dsl.Timestamp,
    wf.Name AS WorkflowName,
    COALESCE(u.Name, 'CompanyName') AS `Username`
FROM
    DataSource dsa
    INNER JOIN DataSource dsb
        -- "etc" stands for the rest of the Id list from the question
        ON  dsb.Id IN ( 0, 1, 2, 3, 4, 5, 6, 7, etc )
        -- parentheses matter here: AND binds tighter than OR
        AND (dsb.Status = '203' OR dsb.Status = '204')
    INNER JOIN DataSourceActivityLog dsl
        ON  dsl.DataSourceId=dsa.Id
        AND dsl.DataSourceId=dsb.Id
        AND dsl.UserId = 1
        AND dsl.Timestamp=(
            SELECT MAX(dslt.Timestamp)
            FROM   DataSourceActivityLog AS dslt
            WHERE  dslt.DataSourceId = dsl.DataSourceId
        )
    INNER JOIN Workflow wf
        ON  wf.Id = dsb.WorkflowId
    LEFT JOIN User u
        ON u.Id = dsl.UserId
GROUP  BY
    dsl.Id
ORDER  BY
    dsl.Timestamp DESC

Maybe also use Zurahn's refactoring to get rid of the GROUP BY in the subquery.

The indexes would be:

  • DataSource.WorkflowId, DataSource.Status
  • DataSourceActivityLog.Timestamp, DataSourceActivityLog.UserId, DataSourceActivityLog.DataSourceId

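Actually, I came to the conclusion that dsb (originally ParentLeafNodeDataSource) is the real source of the data, and it is the table the WHERE clause filters on. Personally, I try to start from the source of the data and then join the rest onto it. That usually produces a query in which it is easy to see what is actually being selected, rather than one where a final JOIN suddenly shrinks the result set. Reordering the JOINs accordingly gives something like: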

SELECT
    dsa.Status AS StatusCode,
    dsb.Id,
    dsl.UserId,
    dsl.Timestamp,
    wf.Name AS WorkflowName,
    COALESCE(u.Name, 'CompanyName') AS `Username`
FROM
    DataSource dsb
    INNER JOIN Workflow wf
        ON  dsb.WorkflowId=wf.Id
    INNER JOIN DataSourceActivityLog dsl
        ON  dsl.DataSourceId=dsb.Id
        AND dsl.UserId=1
        AND dsl.Timestamp=(
            SELECT MAX(dslt.Timestamp)
            FROM   DataSourceActivityLog AS dslt
            WHERE  dslt.DataSourceId = dsl.DataSourceId
        )
    INNER JOIN DataSource dsa
        ON  dsl.DataSourceId=dsa.Id
    LEFT JOIN User u
        ON dsl.UserId=u.Id
WHERE
    -- "etc" stands for the rest of the Id list from the question
    dsb.Id IN ( 0, 1, 2, 3, 4, 5, 6, 7, etc )
    -- parentheses matter here: AND binds tighter than OR
    AND (dsb.Status = '203' OR dsb.Status = '204')
GROUP  BY
    dsl.Id
ORDER  BY
    dsl.Timestamp DESC
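
One detail of the Status filter is worth checking. In MySQL, AND has higher precedence than OR, so `a OR b AND c` is parsed as `a OR (b AND c)`. A minimal sketch with constant values (no tables involved; the column aliases are made up) shows the difference:

```sql
SELECT 1 OR 0 AND 0   AS without_parens,  -- parsed as 1 OR (0 AND 0)
       (1 OR 0) AND 0 AS with_parens;     -- the OR is evaluated first
```

The first column comes back as 1 and the second as 0. Applied to the query in the question, a filter written as `Status = '203' OR Status = '204' AND ...` lets every row with Status '203' through regardless of the join conditions that follow, so wrapping the two Status tests in parentheses (or writing `Status IN ('203', '204')`) is probably what was intended.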

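Answer 2 (score: 0)

Have you considered the MySQL Query Profiler?

It is the way to understand where your performance problem comes from.

Without this step, most people will unfortunately prefer to write jokes about your query rather than try to help you.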
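
A minimal profiling session might look like the sketch below. It assumes a MySQL version that still supports the session-level `SHOW PROFILE` statements (later versions deprecate them in favor of the Performance Schema). The COUNT(*) statement is only a stand-in for the slow query, and the Query_ID of 1 is an assumption to be replaced with the value that `SHOW PROFILES` reports.

```sql
-- Enable profiling for the current session
SET profiling = 1;

-- Run the statement to be measured (stand-in for the slow query)
SELECT COUNT(*) FROM DataSourceActivityLog;

-- List the profiled statements with their Query_ID and total duration
SHOW PROFILES;

-- Per-stage timings for one statement; 1 is an assumed Query_ID
SHOW PROFILE FOR QUERY 1;

-- Switch profiling off again
SET profiling = 0;
```

The per-stage breakdown shows where the time goes, for example in creating temporary tables or sorting, which matches the "Using temporary; Using filesort" notes in the EXPLAIN output.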