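I have daily snapshots of my data. Now I want to use SQL to derive time-series data from them. I tried a few approaches, but each has limitations.
Sample data:
Expected result:
I tried the following SQL. Its limitation is that value 0 should logically produce two partitions, but only one partition is created, so the query returns the wrong result.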
SELECT [name], [value],
[date] as [start],
DATEADD(DAY, -1, LEAD([date], 1) OVER(PARTITION BY [name] ORDER BY [date])) AS [end]
FROM (
SELECT *,
RANK() OVER(Partition by [name], [rnk] ORDER BY [date]) as row_num
FROM(
SELECT [name], [value], [date],
DENSE_RANK() OVER(Partition by [name] ORDER BY [value]) AS rnk
FROM sample_data
) AS T
) AS TT
WHERE row_num = 1
Result of the above SQL:
Any help is much appreciated!
Answer 0: (score: 2)
This is a gaps-and-islands problem. You can try this.
SELECT Name, Value, MIN([Date]) Start, MAX([Date]) [End] FROM (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY Name ORDER BY [Date])
- ROW_NUMBER() OVER(PARTITION BY Name, Value ORDER BY [Date]) AS GRP
FROM sample_data
) T
GROUP BY Name, Value, GRP
ORDER BY Name, Start
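To see why the difference of the two row numbers marks each run, here is a minimal Python sketch (not T-SQL) that emulates the query above on the sample rows posted in this thread. The function name `islands` and the tuple layout are assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import date

# Sample rows (name, value, date), copied from the sample_data inserts in this thread.
rows = [
    ("A", 0, date(2019, 10, 24)),
    ("A", 0, date(2019, 10, 25)),
    ("A", 0, date(2019, 10, 26)),
    ("A", 1, date(2019, 10, 27)),
    ("A", 1, date(2019, 10, 28)),
    ("A", 1, date(2019, 10, 29)),
    ("A", 0, date(2019, 10, 30)),
    ("A", 0, date(2019, 10, 31)),
]


def islands(rows):
    """Emulate the two ROW_NUMBER() calls and group by their difference."""
    ordered = sorted(rows, key=lambda r: (r[0], r[2]))  # by name, then date
    rn_name = defaultdict(int)        # ROW_NUMBER() OVER (PARTITION BY Name ORDER BY Date)
    rn_name_value = defaultdict(int)  # ROW_NUMBER() OVER (PARTITION BY Name, Value ORDER BY Date)
    spans = {}
    for name, value, day in ordered:
        rn_name[name] += 1
        rn_name_value[(name, value)] += 1
        grp = rn_name[name] - rn_name_value[(name, value)]
        key = (name, value, grp)      # GROUP BY Name, Value, GRP
        start, end = spans.get(key, (day, day))
        spans[key] = (min(start, day), max(end, day))
    result = [(n, v, s, e) for (n, v, _), (s, e) in spans.items()]
    return sorted(result, key=lambda r: (r[0], r[2]))  # ORDER BY Name, Start


for name, value, start, end in islands(rows):
    print(name, value, start, end)
```

On this data the difference is 0 for the first run of value 0 and 3 for both later runs, so the second run of 0 and the run of 1 share the same difference. That is why Value has to stay in the GROUP BY alongside the difference.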
Answer 1: (score: 1)
MS SQL Server 2017 Schema Setup:
create table sample_data(Name varchar(max), Value int , Date date)
insert into sample_data(Name,Value,Date)values('A',0,'2019-10-24')
insert into sample_data(Name,Value,Date)values('A',0,'2019-10-25')
insert into sample_data(Name,Value,Date)values('A',0,'2019-10-26')
insert into sample_data(Name,Value,Date)values('A',1,'2019-10-27')
insert into sample_data(Name,Value,Date)values('A',1,'2019-10-28')
insert into sample_data(Name,Value,Date)values('A',1,'2019-10-29')
insert into sample_data(Name,Value,Date)values('A',0,'2019-10-30')
insert into sample_data(Name,Value,Date)values('A',0,'2019-10-31')
Query 1:
WITH CTE AS (
SELECT *,
ROW_NUMBER() OVER(PARTITION BY Name ORDER BY Date )
- ROW_NUMBER() OVER(PARTITION BY Name,Value ORDER BY Date ) AS Interval
FROM sample_data
)
SELECT Name, Value, MIN(Date) Starting_Date, MAX(Date) Ending_Date FROM CTE
GROUP BY Name, Value, Interval
Order BY Name,Starting_Date
Results:
| Name | Value | Starting_Date | Ending_Date |
|------|-------|---------------|-------------|
| A | 0 | 2019-10-24 | 2019-10-26 |
| A | 1 | 2019-10-27 | 2019-10-29 |
| A | 0 | 2019-10-30 | 2019-10-31 |
Answer 2: (score: 1)
Here is a solution based on the technique known as gaps and islands.
;WITH [Islands] AS
(
SELECT 'A' AS [Name], 0 AS [Value], CAST('2019-10-24' AS DATE) AS [Date] UNION
SELECT 'A' AS [Name], 0 AS [Value], CAST('2019-10-25' AS DATE) AS [Date] UNION
SELECT 'A' AS [Name], 0 AS [Value], CAST('2019-10-26' AS DATE) AS [Date] UNION
SELECT 'A' AS [Name], 1 AS [Value], CAST('2019-10-27' AS DATE) AS [Date] UNION
SELECT 'A' AS [Name], 1 AS [Value], CAST('2019-10-28' AS DATE) AS [Date] UNION
SELECT 'A' AS [Name], 1 AS [Value], CAST('2019-10-29' AS DATE) AS [Date] UNION
SELECT 'A' AS [Name], 0 AS [Value], CAST('2019-10-30' AS DATE) AS [Date] UNION
SELECT 'A' AS [Name], 0 AS [Value], CAST('2019-10-31' AS DATE) AS [Date]
)
, [IslandGroups] AS
(
SELECT
*
,DATEDIFF(DAY, '1900-01-01', [Date]) AS [DifferenceInDays]
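-- [Date] is included in the ORDER BY so the numbering is deterministic within each Name/Value
,ROW_NUMBER() OVER (ORDER BY [Name], [Value], [Date]) AS [RowNumber]
,DATEDIFF(DAY, '1900-01-01', [Date]) - ROW_NUMBER() OVER (ORDER BY [Name], [Value], [Date]) AS [IslandGroup]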
FROM
[Islands]
)
SELECT
[Name]
,[Value]
,MIN([Date]) AS [starting_date]
,MAX([Date]) AS [ending_date]
FROM
[IslandGroups]
GROUP BY
[Name]
,[Value]
,[IslandGroup]
ORDER BY
[Name]
,MIN([Date])
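Here is how it works. The algorithm subtracts a ranking function (ROW_NUMBER() in this case) from the number of days between a fixed reference date and each row's date. If you run the query below, you will see that within a run of consecutive dates the RowNumber column increases in step with DifferenceInDays, so their difference stays constant for that run.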
... removed for brevity
, [IslandGroups] AS
(
SELECT
*
,DATEDIFF(DAY, '1900-01-01', [Date]) AS [DifferenceInDays]
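-- [Date] is included in the ORDER BY so the numbering is deterministic within each Name/Value
,ROW_NUMBER() OVER (ORDER BY [Name], [Value], [Date]) AS [RowNumber]
,DATEDIFF(DAY, '1900-01-01', [Date]) - ROW_NUMBER() OVER (ORDER BY [Name], [Value], [Date]) AS [IslandGroup]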
FROM
[Islands]
)
SELECT
*
FROM
[IslandGroups]
Results:
Name Value Date       DifferenceInDays RowNumber IslandGroup
A    0     2019-10-24 43760            1         43759 <- First in the series
A    0     2019-10-25 43761            2         43759
A    0     2019-10-26 43762            3         43759
A    0     2019-10-30 43766            4         43762 <- Next set
A    0     2019-10-31 43767            5         43762
A    1     2019-10-27 43763            6         43757 <- Next set
A    1     2019-10-28 43764            7         43757
A    1     2019-10-29 43765            8         43757
You can then GROUP BY the shared island group and take the MIN() and MAX() of [Date] within each group.
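The same arithmetic can be checked outside the database. The following Python sketch (not T-SQL) reproduces the DifferenceInDays, RowNumber and IslandGroup columns for the sample rows, then collapses each group to its first and last date. The names `EPOCH`, `island_groups` and `summarize` are hypothetical helpers introduced here for illustration.

```python
from datetime import date

EPOCH = date(1900, 1, 1)  # same reference date as the DATEDIFF call in the query

# Sample rows (name, value, date), copied from the thread's sample data.
rows = [
    ("A", 0, date(2019, 10, 24)),
    ("A", 0, date(2019, 10, 25)),
    ("A", 0, date(2019, 10, 26)),
    ("A", 1, date(2019, 10, 27)),
    ("A", 1, date(2019, 10, 28)),
    ("A", 1, date(2019, 10, 29)),
    ("A", 0, date(2019, 10, 30)),
    ("A", 0, date(2019, 10, 31)),
]


def island_groups(rows):
    """Return (name, value, date, difference_in_days, row_number, island_group)."""
    ordered = sorted(rows)  # ordered by name, value, date
    out = []
    for row_number, (name, value, day) in enumerate(ordered, start=1):
        diff = (day - EPOCH).days  # days elapsed since the reference date
        out.append((name, value, day, diff, row_number, diff - row_number))
    return out


def summarize(rows):
    """Group by (name, value, island_group) and keep the min and max date."""
    spans = {}
    for name, value, day, _diff, _rn, grp in island_groups(rows):
        key = (name, value, grp)
        start, end = spans.get(key, (day, day))
        spans[key] = (min(start, day), max(end, day))
    result = [(n, v, s, e) for (n, v, _), (s, e) in spans.items()]
    return sorted(result, key=lambda r: (r[0], r[2]))


for row in island_groups(rows):
    print(*row)
```

Note that this date-based variant treats any missing day as the start of a new island, even when the value has not changed. With strictly daily snapshots that makes no difference, but it is worth keeping in mind if snapshots can be skipped.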