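I have a table (SQL Server 2008 R2) that stores up/down logs for multiple servers. The servers are pinged periodically and their status (up or down) is written to this table. It has the following structure: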
CREATE TABLE StatusLog
(
    LogID INT PRIMARY KEY,
    ServerID INT,
    QueryDate DATETIME,
    ServerStatus VARCHAR(50)
)
Sample data:
INSERT INTO StatusLog
VALUES
(1, '1724', '2016-04-16 09:28:00.000', 'up'),
(2, '1724', '2016-04-16 09:29:00.000', 'up'),
(3, '1724', '2016-04-16 09:30:00.000', 'down'),
(6, '1724', '2016-04-16 09:31:00.000', 'down'),
(8, '1724', '2016-04-16 09:32:00.000', 'down'),
(9, '1724', '2016-04-16 09:33:00.000', 'down'),
(17, '1724', '2016-04-16 09:33:40.000', 'up'),
(18, '1724', '2016-04-16 09:34:00.000', 'up')
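I am trying to find the total downtime of a particular server within a given time period. In the data extract above, the server with ID 1724 changed to "down" at 09:30:00 and went back to "up" at 09:33:40, which is 220 seconds of total downtime.
My approach:
I wrote the following script, but it is very slow (each server has hundreds of thousands of log records):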
DECLARE @StartDate DATE = '2016-04-01'
DECLARE @EndDate DATE = '2016-04-30'
DECLARE @ServerID INT = '1724'
;WITH CTE_StatusLog AS
(
    SELECT LogID, QueryDate, ServerStatus,
           ROW_NUMBER() OVER (ORDER BY QueryDate) AS RN
    FROM StatusLog
    WHERE ServerID = @ServerID
      AND QueryDate BETWEEN @StartDate AND @EndDate
)
SELECT LogID,
       QueryDate,
       ServerStatus,
       RN,
       DownStarted = CASE WHEN s1.ServerStatus = 'down'
                          THEN s1.QueryDate END,
       DownEnded = (SELECT TOP 1 QueryDate
                    FROM CTE_StatusLog AS s2
                    WHERE s2.QueryDate > s1.QueryDate
                      AND s1.ServerStatus = 'down'
                      AND s2.ServerStatus = 'up'
                      AND (SELECT s3.ServerStatus
                           FROM CTE_StatusLog AS s3
                           WHERE s3.RN = s1.RN-1) <> 'down'
                    ORDER BY s2.QueryDate),
       DownDuration = DATEDIFF(SECOND,
                      CASE WHEN s1.ServerStatus = 'down'
                           THEN s1.QueryDate END,
                      (SELECT TOP 1 QueryDate
                       FROM CTE_StatusLog AS s2
                       WHERE s2.QueryDate > s1.QueryDate
                         AND s1.ServerStatus = 'down'
                         AND s2.ServerStatus = 'up'
                         AND (SELECT s3.ServerStatus
                              FROM CTE_StatusLog AS s3
                              WHERE s3.RN = s1.RN-1) <> 'down'
                       ORDER BY s2.QueryDate))
FROM CTE_StatusLog AS s1
WHERE QueryDate BETWEEN @StartDate AND @EndDate
ORDER BY s1.RN
Output:
LogID QueryDate ServerStatus RN DownStarted DownEnded DownDuration
1 2016-04-16 09:28:00.000 up 1 NULL NULL NULL
2 2016-04-16 09:29:00.000 up 2 NULL NULL NULL
3 2016-04-16 09:30:00.000 down 3 2016-04-16 09:30:00.000 2016-04-16 09:33:40.000 220
6 2016-04-16 09:31:00.000 down 4 2016-04-16 09:31:00.000 NULL NULL
8 2016-04-16 09:32:00.000 down 5 2016-04-16 09:32:00.000 NULL NULL
9 2016-04-16 09:33:00.000 down 6 2016-04-16 09:33:00.000 NULL NULL
17 2016-04-16 09:33:40.000 up 7 NULL NULL NULL
18 2016-04-16 09:34:00.000 up 8 NULL NULL NULL
How can I improve this script, or is there a better way to calculate downtime with this table structure?
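Answer 0: (score: 1)
If you only need the total downtime, you can work out what each row represents: assume that each down row stands for the number of seconds of downtime since that server was last checked. Then SUM those rows: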
DECLARE @StartDate DATE = '2016-04-01'
DECLARE @EndDate DATE = '2016-04-30'
DECLARE @ServerID INT = '1724'
SELECT
    individualRows.ServerId,
    individualRows.ServerStatus,
    SUM(secondsInState) AS TotalTime
FROM
    (SELECT
         statusLog.ServerId,
         statusLog.QueryDate,
         statusLog.ServerStatus,
         DateDiff(second, PreviousStatus.QueryDate, statusLog.QueryDate) AS secondsInState
     FROM
         StatusLog
         LEFT OUTER JOIN
         StatusLog AS PreviousStatus
             ON StatusLog.ServerId = PreviousStatus.ServerId
            AND PreviousStatus.QueryDate < StatusLog.QueryDate
            AND PreviousStatus.QueryDate = (SELECT Max(QueryDate)
                                            FROM statusLog sl2
                                            WHERE sl2.ServerId = StatusLog.ServerId
                                              AND sl2.QueryDate < StatusLog.QueryDate)
     WHERE StatusLog.QueryDate > @StartDate
       AND StatusLog.QueryDate < @EndDate
       AND StatusLog.ServerId = @ServerID) AS individualRows
GROUP BY
    individualRows.ServerId,
    individualRows.ServerStatus
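If you do need the duration of each individual outage, I would probably try a temp table in which each row is joined to the previous row and to the previous row with the opposite status, similar to your result. Then I would filter and aggregate that temp table.
In my experience, once a table holds a lot of rows, temp tables are much faster than CTEs.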
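The temp-table idea above could be sketched roughly as follows. This is only one possible reading of it, not code from the answer: the temp table `#Log`, the index name `IX_Log_RN`, and the half-open date range are assumptions made for illustration. It finds the first row of each outage (a down row whose predecessor is not down) and pairs it with the next up row, avoiding LAG/LEAD, which are not available in SQL Server 2008 R2.

```sql
-- Sketch only: #Log, IX_Log_RN and the date handling are hypothetical choices.
DECLARE @StartDate DATETIME = '2016-04-01';
DECLARE @EndDate   DATETIME = '2016-05-01';  -- assumption: half-open range covering all of April
DECLARE @ServerID  INT      = 1724;

-- 1. Materialize the rows for one server and period once, numbered in time order.
SELECT ROW_NUMBER() OVER (ORDER BY QueryDate, LogID) AS RN,
       QueryDate,
       ServerStatus
INTO   #Log
FROM   StatusLog
WHERE  ServerID = @ServerID
  AND  QueryDate >= @StartDate
  AND  QueryDate <  @EndDate;

-- Index the row number so the self-joins below can seek instead of scan.
CREATE UNIQUE CLUSTERED INDEX IX_Log_RN ON #Log (RN);

-- 2. Keep only the first down row of each outage and look up the next up row.
SELECT d.QueryDate AS DownStarted,
       u.QueryDate AS DownEnded,   -- NULL if the server was still down at the end of the period
       DATEDIFF(SECOND, d.QueryDate, u.QueryDate) AS DownDuration
FROM   #Log AS d
LEFT JOIN #Log AS p
       ON p.RN = d.RN - 1          -- the row immediately before d
OUTER APPLY (SELECT TOP (1) n.QueryDate
             FROM   #Log AS n
             WHERE  n.RN > d.RN
               AND  n.ServerStatus = 'up'
             ORDER BY n.RN) AS u
WHERE  d.ServerStatus = 'down'
  AND  (p.RN IS NULL OR p.ServerStatus <> 'down');

DROP TABLE #Log;
```

With the sample data from the question this should return a single outage starting at 09:30:00 and ending at 09:33:40. Wrapping the final SELECT in a SUM over DownDuration would give the total for the period.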
Answer 1: (score: 1)
I would do this by getting the next up time for each down record. In SQL Server 2008, that can be done with outer apply:
select sl.*, slup.querydate as next_update,
       datediff(second, sl.querydate, slup.querydate) as down_in_seconds
from statuslog sl outer apply
     (select top 1 sl2.*
      from statuslog sl2
      where sl2.serverid = sl.serverid and
            sl2.querydate >= sl.querydate and
            sl2.serverstatus = 'up'
      order by sl2.querydate asc
     ) slup
where sl.serverstatus = 'down';
If you want a summary per downtime period, then I would use aggregation:
select serverid, min(querydate) as down_date, next_update,
       max(down_in_seconds) as down_in_seconds
from (select sl.*, slup.querydate as next_update,
             datediff(second, sl.querydate, slup.querydate) as down_in_seconds
      from statuslog sl outer apply
           (select top 1 sl2.*
            from statuslog sl2
            where sl2.serverid = sl.serverid and
                  sl2.querydate >= sl.querydate and
                  sl2.serverstatus = 'up'
            order by sl2.querydate asc
           ) slup
      where sl.serverstatus = 'down'
     ) slud
group by serverid, next_update;
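To get from the per-outage summary to the single total the question asks for, one option is to sum the longest duration of each outage for one server and period. The following is a minimal sketch under stated assumptions: the variables reuse the names from the question, the half-open date range and the `total_down_seconds` alias are illustrative choices, and none of it comes from the original answer.

```sql
-- Sketch only: filters, date range and output alias are assumptions.
DECLARE @StartDate DATETIME = '2016-04-01';
DECLARE @EndDate   DATETIME = '2016-05-01';  -- assumption: half-open range covering all of April
DECLARE @ServerID  INT      = 1724;

SELECT SUM(outage.down_in_seconds) AS total_down_seconds
FROM (SELECT slup.querydate AS next_update,
             -- the earliest down row of an outage has the largest gap to the next up row
             MAX(DATEDIFF(second, sl.querydate, slup.querydate)) AS down_in_seconds
      FROM statuslog sl
      OUTER APPLY (SELECT TOP (1) sl2.querydate
                   FROM statuslog sl2
                   WHERE sl2.serverid = sl.serverid
                     AND sl2.querydate >= sl.querydate
                     AND sl2.serverstatus = 'up'
                   ORDER BY sl2.querydate ASC) slup
      WHERE sl.serverstatus = 'down'
        AND sl.serverid = @ServerID
        AND sl.querydate >= @StartDate
        AND sl.querydate <  @EndDate
      GROUP BY slup.querydate) AS outage;
```

For the sample data all four down rows share the same next up row (09:33:40), so they collapse into one outage of 220 seconds. An outage with no later up row gets a NULL duration and is ignored by SUM. Since the lookup filters on serverid and orders by querydate, an index on (ServerID, QueryDate) that includes ServerStatus would likely help on a table with hundreds of thousands of rows per server, although that is an assumption about the workload rather than something measured here.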