I need to build a KPI report based on a message log, to check whether the message flow is sent correctly and on time.
Sample data:
CREATE TABLE PackageFlow (
[Package] NVARCHAR(6),
[message] NVARCHAR(3),
[Date_time] DATETIME
);
SET DATEFORMAT dmy; -- the literals below are written day-month-year
INSERT INTO PackageFlow ([Package], [message], [Date_time]) VALUES
(N'10',N'112','1-1-2019 01:00'),
(N'10',N'115','2-1-2019 01:00'),
(N'10',N'117','3-1-2019 01:00'),
(N'10',N'25','4-1-2019 01:00'),
(N'10',N'26','5-1-2019 01:00'),
(N'10',N'27','6-1-2019 01:00'),
(N'10',N'44','7-1-2019 01:00'),
(N'10',N'112','8-1-2019 01:00'),
(N'10',N'117','10-1-2019 01:00'),
(N'10',N'25','11-1-2019 01:00'),
(N'10',N'26','12-1-2019 01:00'),
(N'10',N'27','13-1-2019 01:00'),
(N'10',N'44','14-1-2019 01:00'),
(N'10',N'112','15-1-2019 01:00'),
(N'10',N'115','16-1-2019 01:00'),
(N'10',N'117','17-1-2019 01:00'),
(N'10',N'25','18-1-2019 01:00'),
(N'10',N'26','19-1-2019 01:00'),
(N'10',N'27','20-1-2019 01:00'),
(N'10',N'44','21-1-2019 01:00');
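Because some messages are missing, my KPI comes out wrong.
So how do I handle the missing values? There are more than 50,000 messages per day, and the KPI is calculated per month.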
Answer 0 (score: 1)
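Your question is not 100% clear, and the screenshot doesn't help. That said, it looks like you want a way to handle missing dates in your PackageFlow table. Your sample data is missing January 9th. This is where a tally table can be used to fill in missing numbers, dates, and so on. Note my comments.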
WITH
dt(Mn,Mx,Df) AS -- 1. get the oldest and newest date, and the number of days between them
(
SELECT MIN(pf.date_time), MAX(pf.date_time), DATEDIFF(DAY,MIN(pf.date_time), MAX(pf.date_time))
FROM dbo.PackageFlow AS pf
),
iTally(N) AS -- 2. Build a "tally table" (aka "numbers table") of 1,000 rows: 10 x 10 x 10
(
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1))
FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS a(x)
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS b(x)
CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS c(x)
),
cal AS -- 3. use your tally table to build a calendar table
(
SELECT i.N, dt = CAST(DATEADD(DAY,i.N-1,dt.Mn) AS DATE)
FROM iTally AS i
CROSS JOIN dt
WHERE i.N <= dt.Df+1
) -- 4. Left join your calendar table to dbo.packageFlow
SELECT date_time = ISNULL(pf.date_time,cal.dt), pf.package, pf.[message]
FROM cal
LEFT JOIN dbo.packageFlow AS pf ON cal.dt = CAST(pf.date_time AS DATE);
Returns (output truncated after January 14):
date_time package message
----------------------- ------- -------
2019-01-01 01:00:00.000 10 112
2019-01-02 01:00:00.000 10 115
2019-01-03 01:00:00.000 10 117
2019-01-04 01:00:00.000 10 25
2019-01-05 01:00:00.000 10 26
2019-01-06 01:00:00.000 10 27
2019-01-07 01:00:00.000 10 44
2019-01-08 01:00:00.000 10 112
2019-01-09 00:00:00.000 NULL NULL <<-- MISSING VALUE filled in
2019-01-10 01:00:00.000 10 117
2019-01-11 01:00:00.000 10 25
2019-01-12 01:00:00.000 10 26
2019-01-13 01:00:00.000 10 27
2019-01-14 01:00:00.000 10 44
Note how we now return a row for the missing date. You can use ISNULL to handle the NULL values returned for that date.
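As a rough sketch of that ISNULL step, the query below replaces the NULLs with a placeholder and adds a flag column that a KPI could count. The three-day calendar, the N'n/a' placeholder and the is_missing column are assumptions for illustration only; in practice the cal CTE from the query above would take the place of the hard-coded dates.

```sql
-- Sketch only: a tiny hard-coded calendar stands in for the tally-based cal CTE.
WITH cal(dt) AS
(
  SELECT CAST(v.d AS DATE)
  FROM (VALUES ('20190108'),('20190109'),('20190110')) AS v(d)
)
SELECT flow_date  = cal.dt,
       package    = ISNULL(pf.package, N'n/a'),    -- hypothetical placeholder value
       [message]  = ISNULL(pf.[message], N'n/a'),  -- must fit in NVARCHAR(3)
       is_missing = CASE WHEN pf.date_time IS NULL THEN 1 ELSE 0 END
FROM cal
LEFT JOIN dbo.PackageFlow AS pf ON cal.dt = CAST(pf.date_time AS DATE)
ORDER BY cal.dt;
```

With the sample data from the question, the January 9 row should come back with the placeholder values and is_missing = 1. Note that ISNULL returns the data type of its first argument, so a longer placeholder for [message] would be silently truncated to three characters.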
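For better performance, you will want an index on the date_time column. With that index in place, the query above produces an excellent plan... except for the hash match needed to join the date and datetime values. For that reason, unless you need that level of granularity, I would consider changing your column to a DATE column (instead of DATETIME).
Suggested index: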
CREATE CLUSTERED INDEX cl_packageflow ON dbo.packageFlow(date_time);
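If the time of day has to be kept, one possible alternative to changing the column type is a persisted computed column that stores just the date, with its own index. This is a sketch rather than a tested recommendation: flow_date and ix_packageflow_flow_date are made-up names, and whether the optimizer actually drops the hash match should be checked against the real execution plan.

```sql
-- Sketch only: flow_date and the index name are hypothetical.
-- Keep the DATETIME column and persist a DATE-only copy next to it.
ALTER TABLE dbo.PackageFlow
  ADD flow_date AS CAST([Date_time] AS DATE) PERSISTED;
GO

-- Index the DATE-only column; INCLUDE covers the columns the report selects.
CREATE NONCLUSTERED INDEX ix_packageflow_flow_date
  ON dbo.PackageFlow (flow_date)
  INCLUDE ([Package], [message], [Date_time]);
GO
```

The join in the calendar query would then compare DATE to DATE, as in ON cal.dt = pf.flow_date, instead of casting date_time for every row.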