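I receive daily txt data files from multiple antennas. The file naming convention is:
unique antenna ID + year + month + day + random 3-digit number
I parsed the file names and created a table like this: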
AntennaID fileyear filemonth fileday filenumber filename
0000 2016 09 22 459 000020160922459.txt
0000 2016 09 21 981 000020160921981.txt
0000 2016 09 20 762 000020160920762.txt
0001 2016 09 22 635 000120160922635.txt
.
.
.
etc. (200k rows)
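Sometimes an antenna sends more than one file, or no file at all. When more than one file is sent, the unique 3-digit file number distinguishes them, but what I am trying to find is the dates on which no file was sent.
I tried a few GROUP BY statements to compare the number of data files in a given month against the number of days in that month. The problem is that an antenna sometimes sends more than one file per day, which can artificially make up for "missing" files if we only compare counts.
I am looking for a more robust way to find the dates or date ranges of missing files. I have looked at the PARTITION and OVER functions and have a feeling they could help, but I am not sure how to use them since I am very new to SQL.
I am using Microsoft SQL Server 2016.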
Answer 0 (score: 6)
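You can use a common table expression (cte for short) to create a table of dates. You can then join from this table to the antenna data and look for the dates that return null values: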
declare @MinDate date = getdate()-50
declare @MaxDate date = getdate()
;with Dates as
(
    -- anchor row: first date in the range
    select @MinDate as DateValue
    union all
    -- recursive step: add one day until @MaxDate is reached
    select dateadd(d,1,DateValue)
    from Dates
    where DateValue < @MaxDate
)
select d.DateValue
from Dates d
left join AntennaData a
on(d.DateValue = cast(cast(a.fileyear as nvarchar(4)) + cast(a.filemonth as nvarchar(4)) + cast(a.fileday as nvarchar(4)) as date))
where a.filename is null -- keep only the dates with no matching file
option (maxrecursion 0)
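The query above reports a date only when no antenna at all sent a file that day. Since the question asks about each antenna separately, here is a minimal sketch of a per-antenna variant. It assumes the table is called AntennaData (the name used in this answer, not confirmed by the question), that the list of antennas can be taken from the data itself, and that fileyear, filemonth and fileday hold values that convert cleanly to integers for DATEFROMPARTS.

```sql
declare @MinDate date = dateadd(day, -50, cast(getdate() as date));
declare @MaxDate date = cast(getdate() as date);

with Dates as
(
    select @MinDate as DateValue
    union all
    select dateadd(day, 1, DateValue)
    from Dates
    where DateValue < @MaxDate
),
Antennas as
(
    -- assumption: every antenna of interest has sent at least one file
    select distinct AntennaID
    from AntennaData
)
select n.AntennaID, d.DateValue as MissingDate
from Antennas n
cross join Dates d                 -- every antenna paired with every date
where not exists
(
    select 1
    from AntennaData a
    where a.AntennaID = n.AntennaID
      and datefromparts(a.fileyear, a.filemonth, a.fileday) = d.DateValue
)
order by n.AntennaID, d.DateValue
option (maxrecursion 0);
```

Because the check is NOT EXISTS rather than a count, several files on one day cannot hide a missing day elsewhere. An antenna that has never sent any file will not appear in the result; a separate antenna list would be needed for that case.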
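While a recursive CTE will generate the list of dates, it is not the most efficient way to do so. If speed matters to you, use a set-based tally table instead: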
declare @MinDate date = getdate()-50;
declare @MaxDate date = getdate();
-- Generate table with 10 rows
with t(t) as (select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1 union all select 1)
-- Add row numbers (-1 to start at adding 0 to retain @MinDate value) based on tally table to @MinDate for the number of days +1 (to ensure Min and Max date are included) between the two dates
,d(d) as (select top(datediff(day, @MinDate, @MaxDate)+1) dateadd(day,row_number() over (order by (select null))-1,@MinDate)
from t t1,t t2,t t3,t t4,t t5,t t6 -- Cross join creates 10^6 or 10*10*10*10*10*10 = 1,000,000 row table
)
select *
from d;
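The tally query above only produces the dates. As a sketch of how it might be used to answer the "date ranges" part of the question, the version below finds missing dates per antenna and then collapses consecutive missing days into ranges with ROW_NUMBER() OVER (PARTITION BY ...), the usual "gaps and islands" trick. The table name AntennaData and the use of DATEFROMPARTS on the parsed columns are assumptions, as are the output column names.

```sql
declare @MinDate date = dateadd(day, -50, cast(getdate() as date));
declare @MaxDate date = cast(getdate() as date);

with t(t) as (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) v(n))
,d(d) as (
    -- 10^4 candidate rows is plenty for a range of a few years
    select top (datediff(day, @MinDate, @MaxDate) + 1)
           dateadd(day, row_number() over (order by (select null)) - 1, @MinDate)
    from t t1, t t2, t t3, t t4
)
,missing as (
    select n.AntennaID, d.d as MissingDate
    from (select distinct AntennaID from AntennaData) n
    cross join d
    where not exists (select 1
                      from AntennaData a
                      where a.AntennaID = n.AntennaID
                        and datefromparts(a.fileyear, a.filemonth, a.fileday) = d.d)
)
,grouped as (
    -- consecutive dates minus their row number give the same anchor date,
    -- so each run of consecutive missing days shares one grp value
    select AntennaID, MissingDate,
           dateadd(day,
                   -cast(row_number() over (partition by AntennaID order by MissingDate) as int),
                   MissingDate) as grp
    from missing
)
select AntennaID,
       min(MissingDate) as MissingFrom,
       max(MissingDate) as MissingTo,
       count(*)         as DaysMissing
from grouped
group by AntennaID, grp
order by AntennaID, MissingFrom;
```

Each output row is one unbroken run of missing days for one antenna, so a single missing day shows up with MissingFrom equal to MissingTo.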
Answer 1 (score: 1)
You can use NOT EXISTS:
DECLARE @BeginDate DATE, @EndDate DATE;
SET @BeginDate = '20160101';
SET @EndDate = '20160922';
WITH Dates AS
(
SELECT DATEADD(DAY,number,@BeginDate) [Date]
FROM master.dbo.spt_values
WHERE type = 'P'
AND DATEADD(DAY,number,@BeginDate) <= @EndDate
)
SELECT *
FROM Dates A
WHERE NOT EXISTS(SELECT 1 FROM dbo.Antenna
WHERE SUBSTRING([filename],5,8) = A.[Date]);
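Since the question mentions PARTITION and OVER, here is a rough sketch of a window-function alternative that needs no calendar table: for each antenna, compare every file date with the previous one using LAG and report any jump of more than one day. It assumes the dbo.Antenna table from this answer has the parsed columns shown in the question, and that they convert cleanly to integers for DATEFROMPARTS.

```sql
with FileDates as
(
    -- one row per antenna per day, however many files arrived that day
    select distinct AntennaID,
           datefromparts(fileyear, filemonth, fileday) as FileDate
    from dbo.Antenna
),
WithPrevious as
(
    select AntennaID, FileDate,
           lag(FileDate) over (partition by AntennaID order by FileDate) as PreviousFileDate
    from FileDates
)
select AntennaID,
       dateadd(day, 1, PreviousFileDate) as MissingFrom,
       dateadd(day, -1, FileDate)        as MissingTo
from WithPrevious
where datediff(day, PreviousFileDate, FileDate) > 1
order by AntennaID, MissingFrom;
```

This only finds gaps between two received files. Days missing before an antenna's first file or after its last one are not reported, so the calendar-based queries are the safer choice when a fixed reporting window matters.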