Finding missing dates in data

Time: 2016-09-22 16:05:20

Tags: sql sql-server missing-data gaps-and-islands sql-server-2016

I receive daily .txt data files from multiple antennas. The file naming convention is:

unique antenna ID + year + month + day + random 3-digit number

I parsed the file names and created a table like this:

AntennaID    fileyear    filemonth    fileday    filenumber     filename
0000         2016        09           22         459            000020160922459.txt
0000         2016        09           21         981            000020160921981.txt
0000         2016        09           20         762            000020160920762.txt
0001         2016        09           22         635            000120160922635.txt
.
.
.
etc. (200k rows)

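Sometimes an antenna sends more than one file in a day, or no file at all. When it sends more than one, the unique 3-digit file number tells the files apart, but what I am trying to find are the days on which no file was sent.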

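I tried a few GROUP BY statements that compare the number of data files in a given month with the number of days in that month. The problem is that an antenna sometimes sends more than one file per day, and if we only compare counts, those extra files can artificially make up for the "missing" ones.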
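A minimal sketch of how the count comparison could be made immune to duplicate files: count distinct days per antenna and month instead of counting files, and compare that with the length of the month from `EOMONTH`. It assumes `fileyear`, `filemonth` and `fileday` are numeric columns; `#AntennaData` and its three rows are hypothetical sample data. This only flags which months have gaps, not which days are missing.

```sql
-- Hypothetical sample data shaped like the parsed file-name table
DROP TABLE IF EXISTS #AntennaData;
CREATE TABLE #AntennaData (
    AntennaID  char(4),
    fileyear   int,
    filemonth  int,
    fileday    int,
    filenumber int
);
INSERT INTO #AntennaData VALUES
    ('0000', 2016, 9, 20, 762),
    ('0000', 2016, 9, 20, 763),   -- second file on the same day
    ('0000', 2016, 9, 22, 459);

-- Count distinct days (not files) per antenna and month, and keep the
-- months where that is less than the number of days in the month
SELECT AntennaID,
       fileyear,
       filemonth,
       COUNT(DISTINCT fileday) AS days_with_files,
       DAY(EOMONTH(DATEFROMPARTS(fileyear, filemonth, 1))) AS days_in_month
FROM #AntennaData
GROUP BY AntennaID, fileyear, filemonth
HAVING COUNT(DISTINCT fileday) < DAY(EOMONTH(DATEFROMPARTS(fileyear, filemonth, 1)));
```

With the sample rows this reports antenna `0000` in September 2016 with 2 days that have files out of 30, even though 3 files were received.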

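I am looking for a more robust way to find the dates, or date ranges, with missing files. I have looked at PARTITION BY and OVER and suspect the answer lies there, but I am not sure how to use them because I am very new to SQL.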

I am using Microsoft SQL Server 2016.

2 answers:

Answer 0 (score: 6)

You can use a common table expression (CTE) to build a table of dates. Then left join from that table to the antenna data and keep the dates for which the join returns NULL:

declare @MinDate date = getdate()-50
declare @MaxDate date = getdate()

;with Dates as
(
select @MinDate as DateValue

union all

select dateadd(d,1,DateValue)
from Dates
where DateValue < @MaxDate
)
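select d.DateValue
from Dates d
    left join AntennaData a
        -- datefromparts works whether the parts are stored as ints or as numeric strings such as '09'
        on d.DateValue = datefromparts(a.fileyear, a.filemonth, a.fileday)
-- keep only the dates that have no matching file
where a.AntennaID is null
option (maxrecursion 0)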

Edit for anyone using this answer:

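Although a recursive CTE will generate the list of dates, it is not the most efficient way to do it. If speed matters to you, use a set-based tally table instead: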

declare @MinDate date = getdate()-50;
declare @MaxDate date = getdate();

              -- Generate table with 10 rows
with t(t) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
              -- Add (row_number() - 1) days to @MinDate so the first row is @MinDate itself, taking datediff + 1 rows so both @MinDate and @MaxDate are included
    ,d(d) as (select top(datediff(day, @MinDate, @MaxDate)+1) dateadd(day,row_number() over (order by (select null))-1,@MinDate)
              from t t1,t t2,t t3,t t4,t t5,t t6    -- Cross join creates 10^6 or 10*10*10*10*10*10 = 1,000,000 row table
              )
select *
from d;
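The date lists in this answer are checked against files from all antennas combined, so a day only shows up if no antenna sent anything. A per-antenna variant could look like the sketch below: cross join the date list with the distinct antenna IDs and keep the (antenna, day) pairs that have no file. It assumes numeric date-part columns, and `#AntennaData` with its sample rows is hypothetical. Taking the antenna list from the data itself is also an assumption; an antenna that never sent any file would need a separate master list.

```sql
-- Hypothetical sample data; replace #AntennaData with the real table
DROP TABLE IF EXISTS #AntennaData;
CREATE TABLE #AntennaData (
    AntennaID  char(4),
    fileyear   int,
    filemonth  int,
    fileday    int,
    filenumber int
);
INSERT INTO #AntennaData VALUES
    ('0000', 2016, 9, 20, 762),
    ('0000', 2016, 9, 22, 459),
    ('0001', 2016, 9, 20, 111),
    ('0001', 2016, 9, 21, 222),
    ('0001', 2016, 9, 22, 635);

DECLARE @MinDate date = '20160920';
DECLARE @MaxDate date = '20160922';

WITH Dates AS
(
    -- one row per day from @MinDate to @MaxDate
    SELECT @MinDate AS DateValue
    UNION ALL
    SELECT DATEADD(DAY, 1, DateValue) FROM Dates WHERE DateValue < @MaxDate
),
Antennas AS
(
    -- assumption: every antenna of interest appears at least once in the data
    SELECT DISTINCT AntennaID FROM #AntennaData
)
-- every (antenna, day) pair for which no file exists
SELECT an.AntennaID, d.DateValue AS MissingDate
FROM Antennas an
CROSS JOIN Dates d
WHERE NOT EXISTS (
    SELECT 1
    FROM #AntennaData a
    WHERE a.AntennaID = an.AntennaID
      AND DATEFROMPARTS(a.fileyear, a.filemonth, a.fileday) = d.DateValue
)
ORDER BY an.AntennaID, d.DateValue
OPTION (MAXRECURSION 0);
```

With the sample rows the only result is antenna `0000` on 2016-09-21. The tally-table CTE from the edit above can be swapped in for the recursive `Dates` CTE if the date range is large.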

Answer 1 (score: 1)

You can use NOT EXISTS:

DECLARE @BeginDate DATE, @EndDate DATE;
SET @BeginDate = '20160101';
SET @EndDate = '20160922';

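WITH Dates AS
(
    -- master.dbo.spt_values with type 'P' holds the numbers 0-2047,
    -- so this covers at most 2048 days starting at @BeginDate
    SELECT DATEADD(DAY,number,@BeginDate) [Date]
    FROM master.dbo.spt_values
    WHERE type = 'P'
    AND DATEADD(DAY,number,@BeginDate) <= @EndDate
)
SELECT *
FROM Dates A
-- characters 5-12 of the file name are the yyyymmdd date,
-- which converts implicitly when compared with a date
WHERE NOT EXISTS(SELECT 1 FROM dbo.Antenna
                 WHERE SUBSTRING([filename],5,8) = A.[Date]);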
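Both answers return individual dates, while the question also asks for date ranges. A common gaps-and-islands sketch collapses consecutive missing dates into ranges: within each antenna, subtracting the ROW_NUMBER() value (in days) from each date gives the same anchor value for every day in an unbroken run, so grouping on that value yields one row per range. The `#MissingDates` table is hypothetical sample input, for example the output of one of the queries above, and the sketch assumes one row per antenna and date (DENSE_RANK() instead of ROW_NUMBER() would tolerate duplicates).

```sql
-- Hypothetical input: one row per antenna and missing date
DROP TABLE IF EXISTS #MissingDates;
CREATE TABLE #MissingDates (AntennaID char(4), MissingDate date);
INSERT INTO #MissingDates VALUES
    ('0000', '20160903'), ('0000', '20160904'), ('0000', '20160905'),
    ('0000', '20160910'),
    ('0001', '20160915'), ('0001', '20160916');

WITH Grouped AS
(
    -- consecutive dates minus their row number share the same anchor date
    SELECT AntennaID,
           MissingDate,
           DATEADD(DAY,
                   -1 * ROW_NUMBER() OVER (PARTITION BY AntennaID ORDER BY MissingDate),
                   MissingDate) AS grp
    FROM #MissingDates
)
-- one row per run of consecutive missing days
SELECT AntennaID,
       MIN(MissingDate) AS GapStart,
       MAX(MissingDate) AS GapEnd,
       COUNT(*)         AS DaysMissing
FROM Grouped
GROUP BY AntennaID, grp
ORDER BY AntennaID, GapStart;
```

With the sample rows this returns three ranges: `0000` from 2016-09-03 to 2016-09-05, `0000` on 2016-09-10 alone, and `0001` from 2016-09-15 to 2016-09-16.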