Calculating total downtime from a server log table

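Time: 2016-05-11 14:41:10

Tags: sql sql-server-2008-r2 query-performance

There is a table (SQL Server 2008 R2) that stores up/down logs for multiple servers. The servers are pinged periodically and their status (up or down) is written to this table. It has the following structure: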

CREATE TABLE StatusLog
(
  LogID INT PRIMARY KEY,
  ServerID INT,
  QueryDate DATETIME,
  ServerStatus VARCHAR(50)
)

Sample data:

INSERT INTO StatusLog (LogID, ServerID, QueryDate, ServerStatus)
VALUES
(1, 1724, '2016-04-16 09:28:00.000', 'up'),
(2, 1724, '2016-04-16 09:29:00.000', 'up'),
(3, 1724, '2016-04-16 09:30:00.000', 'down'),
(6, 1724, '2016-04-16 09:31:00.000', 'down'),
(8, 1724, '2016-04-16 09:32:00.000', 'down'),
(9, 1724, '2016-04-16 09:33:00.000', 'down'),
(17, 1724, '2016-04-16 09:33:40.000', 'up'),
(18, 1724, '2016-04-16 09:34:00.000', 'up')

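I am trying to find the total downtime of a specific server within a given period. In the data extract above, the server with ID 1724 changes to "down" at 09:30:00 and goes back to "up" at 09:33:40, which is 220 seconds of total downtime.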
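The 220-second figure can be checked directly from the two timestamps in the sample data. This is only a sanity check of the arithmetic; the ISO 8601 literals with a `T` separator are my choice so that the conversion does not depend on the session's date format.

```sql
-- Downtime between the first 'down' row and the next 'up' row in the sample data.
-- 09:30:00 -> 09:33:40 is 3 minutes 40 seconds.
SELECT DATEDIFF(SECOND,
                CAST('2016-04-16T09:30:00' AS DATETIME),
                CAST('2016-04-16T09:33:40' AS DATETIME)) AS DownSeconds;  -- 220
```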

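My approach is:

  1. For each "down block", find the "down" record and set its QueryDate as the downtime start in a new column. This is fast.
  2. In another new column, find the first "up" record after that start time and set its QueryDate as the downtime end. This is fairly fast.
  3. However, do this only for the first down record in a down block, not for the other down records in the same block; otherwise the same downtime is counted multiple times. To do this I need to look at row numbers, and this is where things get messy and slow.
  4. Finally, subtract one from the other to get the downtime for that block.
  5. Sum all the downtimes to find the total downtime.

I wrote the following script, but it is very slow (each server has hundreds of thousands of log records):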

    DECLARE @StartDate DATE = '2016-04-01'
    DECLARE @EndDate DATE = '2016-04-30'
    DECLARE @ServerID INT = 1724
    
    ;WITH CTE_StatusLog AS 
    (
    SELECT LogID, QueryDate, ServerStatus, 
        ROW_NUMBER() OVER (ORDER BY QueryDate) AS RN 
    FROM StatusLog 
    WHERE ServerID = @ServerID
        AND QueryDate BETWEEN @StartDate AND @EndDate
    )
    
    SELECT LogID, 
           QueryDate,
           ServerStatus,
           RN,
           DownStarted = CASE WHEN s1.ServerStatus = 'down' 
                              THEN s1.QueryDate END,
           DownEnded = (SELECT TOP 1 QueryDate 
                        FROM CTE_StatusLog AS s2 
                        WHERE s2.QueryDate > s1.QueryDate
                        AND s1.ServerStatus = 'down'
                        AND s2.ServerStatus = 'up'
                        AND (SELECT s3.ServerStatus 
                        FROM CTE_StatusLog AS s3 
                        WHERE s3.RN = s1.RN-1) <> 'down'
                    ORDER BY s2.QueryDate),
           DownDuration = DATEDIFF(SECOND, 
                    CASE WHEN s1.ServerStatus = 'down' 
                        THEN s1.QueryDate END, 
                    (SELECT TOP 1 QueryDate 
                    FROM CTE_StatusLog AS s2 
                    WHERE s2.QueryDate > s1.QueryDate
                    AND s1.ServerStatus = 'down'
                    AND s2.ServerStatus = 'up'
                    AND (SELECT s3.ServerStatus 
                        FROM CTE_StatusLog AS s3 
                        WHERE s3.RN = s1.RN-1) <> 'down'
                    ORDER BY s2.QueryDate))
    FROM CTE_StatusLog AS s1
    WHERE QueryDate BETWEEN @StartDate AND @EndDate
    ORDER BY s1.RN
    

Output:

    LogID   QueryDate               ServerStatus   RN   DownStarted             DownEnded               DownDuration
    1       2016-04-16 09:28:00.000 up             1    NULL                    NULL                    NULL
    2       2016-04-16 09:29:00.000 up             2    NULL                    NULL                    NULL
    3       2016-04-16 09:30:00.000 down           3    2016-04-16 09:30:00.000 2016-04-16 09:33:40.000 220
    6       2016-04-16 09:31:00.000 down           4    2016-04-16 09:31:00.000 NULL                    NULL
    8       2016-04-16 09:32:00.000 down           5    2016-04-16 09:32:00.000 NULL                    NULL
    9       2016-04-16 09:33:00.000 down           6    2016-04-16 09:33:00.000 NULL                    NULL
    17      2016-04-16 09:33:40.000 up             7    NULL                    NULL                    NULL
    18      2016-04-16 09:34:00.000 up             8    NULL                    NULL                    NULL
    

How can I improve this script, or is there a better way to calculate downtime for this table structure?

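2 Answers:

Answer 0: (score: 1)

If you only need the total downtime, you can work out what each row represents: assume each down row represents the number of seconds of downtime since that server was last checked. Then SUM those rows: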

DECLARE @StartDate DATE = '2016-04-01'
DECLARE @EndDate DATE = '2016-04-30'
DECLARE @ServerID INT = 1724

-- Total seconds per status: each row is credited with the time elapsed
-- since the previous check of the same server.
SELECT
    individualRows.ServerID,
    individualRows.ServerStatus,
    SUM(individualRows.secondsInState) AS TotalTime
FROM
(
    SELECT
        cur.ServerID,
        cur.QueryDate,
        cur.ServerStatus,
        DATEDIFF(SECOND, prev.QueryDate, cur.QueryDate) AS secondsInState
    FROM StatusLog AS cur
    LEFT OUTER JOIN StatusLog AS prev
        ON prev.ServerID = cur.ServerID
        -- prev is the most recent earlier check of the same server
        AND prev.QueryDate = (SELECT MAX(sl2.QueryDate)
                              FROM StatusLog AS sl2
                              WHERE sl2.ServerID = cur.ServerID
                                AND sl2.QueryDate < cur.QueryDate)
    WHERE cur.QueryDate > @StartDate
      AND cur.QueryDate < @EndDate
      AND cur.ServerID = @ServerID
) AS individualRows
GROUP BY
    individualRows.ServerID,
    individualRows.ServerStatus
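Note that the query above credits each interval to the status of the row that ends it, so on the sample data it reports 240 seconds for 'down' (four 60-second intervals) rather than the 220 seconds the question expects. A variant, sketched here as my own assumption rather than as part of the query above, credits each interval to the row that starts it by looking up the next check instead of the previous one. The half-open date range is also an assumption; adjust it to your definition of the period.

```sql
DECLARE @StartDate DATE = '2016-04-01';
DECLARE @EndDate   DATE = '2016-04-30';
DECLARE @ServerID  INT  = 1724;

-- For every 'down' row, measure the time until the next check of the same server
-- (whatever its status) and add those intervals up.
SELECT SUM(DATEDIFF(SECOND, cur.QueryDate, nxt.QueryDate)) AS DownSeconds
FROM StatusLog AS cur
CROSS APPLY
(
    -- The next log row of the same server, if any. A trailing 'down' row with no
    -- later check is dropped by CROSS APPLY, so an ongoing outage is not counted.
    SELECT TOP (1) n.QueryDate
    FROM StatusLog AS n
    WHERE n.ServerID = cur.ServerID
      AND n.QueryDate > cur.QueryDate
    ORDER BY n.QueryDate
) AS nxt
WHERE cur.ServerID = @ServerID
  AND cur.ServerStatus = 'down'
  AND cur.QueryDate >= @StartDate
  AND cur.QueryDate <  @EndDate;
```

On the sample rows this adds 60 + 60 + 60 + 40 seconds, which gives 220.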

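If you do need the duration of each outage, I would probably try a temp table in which each row is joined to the previous row and to the previous row with the opposite status, similar to your result. Then I would filter and aggregate that temp table.

In my experience, once the tables hold many rows, temp tables are much faster than CTEs.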
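A minimal sketch of the temp-table idea, under some assumptions: the table name #Log and its columns are hypothetical, the date range is half-open, and instead of joining to the previous row of the opposite status it looks forward from the first row of each down block to the next up row. Treat it as a starting point, not a tuned solution.

```sql
DECLARE @StartDate DATE = '2016-04-01';
DECLARE @EndDate   DATE = '2016-04-30';
DECLARE @ServerID  INT  = 1724;

-- Hypothetical temp table: one server's rows in the period, numbered by time.
-- The primary key on RN gives the later lookups an index to seek on.
CREATE TABLE #Log
(
    RN           INT PRIMARY KEY,
    QueryDate    DATETIME,
    ServerStatus VARCHAR(50)
);

INSERT INTO #Log (RN, QueryDate, ServerStatus)
SELECT ROW_NUMBER() OVER (ORDER BY QueryDate, LogID), QueryDate, ServerStatus
FROM StatusLog
WHERE ServerID = @ServerID
  AND QueryDate >= @StartDate
  AND QueryDate <  @EndDate;

-- One result row per outage: a 'down' row starts an outage when the row
-- before it is missing or not 'down'; the outage ends at the next 'up' row.
SELECT d.QueryDate AS DownStarted,
       u.QueryDate AS DownEnded,   -- NULL if the server is still down at the end
       DATEDIFF(SECOND, d.QueryDate, u.QueryDate) AS DownDuration
FROM #Log AS d
LEFT JOIN #Log AS p
    ON p.RN = d.RN - 1
OUTER APPLY
(
    SELECT TOP (1) n.QueryDate
    FROM #Log AS n
    WHERE n.RN > d.RN
      AND n.ServerStatus = 'up'
    ORDER BY n.RN
) AS u
WHERE d.ServerStatus = 'down'
  AND (p.RN IS NULL OR p.ServerStatus <> 'down');

DROP TABLE #Log;
```

On the sample data this should return a single outage from 09:30:00 to 09:33:40. Wrapping the final SELECT in a SUM over DownDuration would give the total.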

Answer 1: (score: 1)

I would approach this by getting the next up time for each down record. In SQL Server 2008, this uses outer apply:

select sl.*, slup.querydate as next_update,
       datediff(second, sl.querydate, slup.querydate) as down_in_seconds
from statuslog sl outer apply
     (select top 1 sl2.*
      from statuslog sl2
      where sl2.serverid = sl.serverid and
            sl2.querydate >= sl.querydate and
            sl2.serverstatus = 'up'
      order by sl2.querydate asc
     ) slup
where sl.serverstatus = 'down';
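The apply subquery runs once per down row, so its cost depends on how quickly the next up row can be located. The question does not say which indexes exist, so the following is only a suggestion under the assumption that there is no index on these columns yet; the index name is made up.

```sql
-- Hypothetical supporting index: lets each lookup seek on (ServerID, QueryDate)
-- and read ServerStatus from the index without touching the base table.
CREATE NONCLUSTERED INDEX IX_StatusLog_ServerID_QueryDate
    ON StatusLog (ServerID, QueryDate)
    INCLUDE (ServerStatus);
```

Whether this helps enough is worth checking against the actual execution plan.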

If you want a summary per downtime period, I would use aggregation:

select serverid, min(querydate) as down_date, next_update,
       max(down_in_seconds) as down_in_seconds
from (select sl.*, slup.querydate as next_update,
             datediff(second, sl.querydate, slup.querydate) as down_in_seconds
      from statuslog sl outer apply
           (select top 1 sl2.*
            from statuslog sl2
            where sl2.serverid = sl.serverid and
                  sl2.querydate >= sl.querydate and
                  sl2.serverstatus = 'up'
            order by sl2.querydate asc
           ) slup
      where sl.serverstatus = 'down'
     ) slud
group by serverid, next_update;
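To get the single total the question asks for, the per-outage rows can be summed once more. This is a sketch that adds the server and date filters from the question as variables; the half-open date range and the alias names outage and total_down_seconds are my assumptions.

```sql
declare @StartDate date = '2016-04-01';
declare @EndDate date = '2016-04-30';
declare @ServerID int = 1724;

-- Inner level: next 'up' time for every 'down' row.
-- Middle level: one row per outage, keeping its longest span (the first 'down' row).
-- Outer level: add the outages together.
select sum(outage.down_in_seconds) as total_down_seconds
from (select slud.next_update,
             max(slud.down_in_seconds) as down_in_seconds
      from (select slup.querydate as next_update,
                   datediff(second, sl.querydate, slup.querydate) as down_in_seconds
            from statuslog sl outer apply
                 (select top 1 sl2.querydate
                  from statuslog sl2
                  where sl2.serverid = sl.serverid and
                        sl2.querydate >= sl.querydate and
                        sl2.serverstatus = 'up'
                  order by sl2.querydate asc
                 ) slup
            where sl.serverstatus = 'down' and
                  sl.serverid = @ServerID and
                  sl.querydate >= @StartDate and
                  sl.querydate < @EndDate
           ) slud
      group by slud.next_update
     ) outage;
```

On the sample data the four down rows share the same next_update, their spans are 220, 160, 100 and 40 seconds, and the total is 220. An outage that has no later up row yields NULL and is ignored by sum.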