Backfilling data using a date range

Time: 2018-03-09 10:41:05

Tags: sql sql-server logging

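I have log data, and I am trying to backfill as much of it as possible to help improve analysis.

The log data contains a SessionId, which is the session id created by the browser, the Name of the logged-in user (if they are logged in), and a LogTime.

I am trying to take all related rows, meaning rows with the same SessionId that are within 24 hours of each other, and get the first date of that set, the last date of that set, and replace every NULL or empty Name in the set with the first Name that is neither NULL nor empty.

For example, if I have the following data: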

--Id    SessionId   Name        LogTime
--1     1                       2018-01-01 00:00
--2     1           LargeOne    2018-01-01 12:00
--3     2           Two         2018-01-01 13:00
--4     3           NULL        2018-01-02 00:00
--5     3                       2018-01-03 00:00
--6     1           One         2018-01-03 00:00
--7     2                       2018-01-03 00:00
--8     2           LargeTwo    2018-01-04 00:00
--9     1                       2018-01-04 00:00

I would like to process the data as follows:

--Id    SessionId   Name        LogTime             StartTime           EndTime
--1     1           LargeOne    2018-01-01 00:00    2018-01-01 00:00    2018-01-01 12:00
--2     1           LargeOne    2018-01-01 12:00    2018-01-01 00:00    2018-01-01 12:00

--3     2           Two         2018-01-01 13:00    2018-01-01 13:00    2018-01-01 13:00

--4     3           NULL        2018-01-02 00:00    2018-01-02 00:00    2018-01-03 00:00
--5     3           NULL        2018-01-03 00:00    2018-01-02 00:00    2018-01-03 00:00

--6     1           One         2018-01-03 00:00    2018-01-03 00:00    2018-01-04 00:00

--7     2           LargeTwo    2018-01-03 00:00    2018-01-03 00:00    2018-01-04 00:00
--8     2           LargeTwo    2018-01-04 00:00    2018-01-03 00:00    2018-01-04 00:00

--9     1           One         2018-01-04 00:00    2018-01-03 00:00    2018-01-04 00:00

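Ids 1 and 2 are in the same session and within range of each other (24 hours), so they form a set. Note that Id 1 has no Name but Id 2 does, and because they belong to the same set the name is backfilled. Ids 6 and 9 are also in session 1 but are not within 24 hours of the first set, so they form a new set. Even though rows from other sessions appear between Ids 6 and 9, they share a session and are within range of each other, so they belong to the same new set.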
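To make the 24-hour rule concrete, the gap between each row and the previous row of the same session can be listed with LAG. This is only a sketch: it assumes the #RawData temp table from the Edit section further down, and the column aliases PrevLogTime and HoursSincePrev are illustrative names, not part of the original data.

```sql
-- Sketch: show each row's gap to the previous row in the same session.
-- Assumes #RawData (created in the Edit section) already exists.
SELECT Id, SessionId, LogTime,
       PrevLogTime    = LAG(LogTime) OVER (PARTITION BY SessionId ORDER BY LogTime, Id),
       -- DATEDIFF counts hour boundaries crossed, so treat this as a rough
       -- indicator; an exact test would compare against DATEADD(DAY, -1, LogTime).
       HoursSincePrev = DATEDIFF(HOUR,
                                 LAG(LogTime) OVER (PARTITION BY SessionId ORDER BY LogTime, Id),
                                 LogTime)
FROM #RawData
ORDER BY SessionId, LogTime, Id;
```

A row whose gap is NULL (first row of a session) or larger than 24 hours would start a new set under the rule described above.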

I think that covers the problem; now for what I have tried. To find and backfill the name, I tried:

SELECT  Id,SessionId,
        FIRST_VALUE(Name) OVER (PARTITION BY SessionId ORDER BY CASE WHEN Name IS NULL or Name='' then 0 ELSE 1 END DESC,Id) Name,
        LogTime
FROM #RawData
ORDER BY Id

This produces:

--Id    SessionId   Name        LogTime
--1     1           LargeOne    2018-01-01 00:00
--2     1           LargeOne    2018-01-01 12:00
--3     2           Two         2018-01-01 13:00
--4     3           NULL        2018-01-02 00:00
--5     3           NULL        2018-01-03 00:00
--6     1           LargeOne    2018-01-03 00:00
--7     2           Two         2018-01-03 00:00
--8     2           Two         2018-01-04 00:00
--9     1           LargeOne    2018-01-04 00:00

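This almost works, but it does not take the date range into account.

So I did a lot of reading on how to get groups based on SessionId and a date range, and I came up with this: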

;WITH ProcessTable1 AS
(
  SELECT Id,SessionId,Name,LogTime,
    PreviousLogTimeInRange = CASE WHEN LAG(LogTime, 1) OVER (partition by SessionId ORDER BY LogTime) between  DATEADD(day, -1, LogTime) and LogTime
        THEN 0 ELSE 1 END,
    NextLogTimeInRange = CASE WHEN Lead(LogTime,1) OVER (partition by SessionId ORDER BY LogTime) between  LogTime and DATEADD(day, 1, LogTime)
        THEN 0 ELSE 1 END
  FROM #RawData
),
ProcessTable2 AS 
(
  SELECT Id, Name, SessionId, LogTime, PreviousLogTimeInRange, 
  NextLogTime = case when NextLogTimeInRange = 0 then LEAD(LogTime, 1) OVER (partition by SessionId ORDER BY LogTime) else LogTime end
  FROM ProcessTable1 WHERE 1 IN (PreviousLogTimeInRange, NextLogTimeInRange)
)
SELECT Id,SessionId,
FIRST_VALUE(Name) OVER (PARTITION BY SessionId ORDER BY CASE WHEN Name IS NULL or Name = '' then 0 ELSE 1 END DESC, Id) Name,
LogTime, NextLogTime

FROM ProcessTable2 
--WHERE PreviousLogTimeInRange = 1
ORDER BY id;

This produces:

--Id    SessionId   Name        LogTime             NextLogTime
--1     1           LargeOne    2018-01-01 00:00    2018-01-01 12:00
--2     1           LargeOne    2018-01-01 12:00    2018-01-01 12:00
--3     2           Two         2018-01-01 13:00    2018-01-01 13:00
--4     3           NULL        2018-01-02 00:00    2018-01-03 00:00
--5     3           NULL        2018-01-03 00:00    2018-01-03 00:00
--6     1           LargeOne    2018-01-03 00:00    2018-01-04 00:00
--7     2           Two         2018-01-03 00:00    2018-01-04 00:00
--8     2           Two         2018-01-04 00:00    2018-01-04 00:00
--9     1           LargeOne    2018-01-04 00:00    2018-01-04 00:00

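So close, but I still need StartTime and, to be honest, I am not 100% sure this will always do what I want.

The last query was based in part on the answers to "SQL Query to group items by time, but only if near each other?"

If anyone is willing to lend a hand here, I would be eternally grateful!

- Edit -

If anyone wants to have a go at it, I have created some sample data.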

IF OBJECT_ID('tempdb..#RawData') IS NOT NULL DROP TABLE #RawData
GO

Create Table #RawData
(
Id INT IDENTITY,
SessionId INT NOT NULL,
Name NVARCHAR(50) NULL,
LogTime DATETIME NOT NULL
)

INSERT INTO #RawData(SessionId,Name,LogTime)
VALUES
(1, '',         '2018-01-01 00:00'),
(1, 'LargeOne', '2018-01-01 12:00'),

(2, 'Two',      '2018-01-01 13:00'),

(3, NULL,       '2018-01-02 00:00'),
(3, '',         '2018-01-03 00:00'),

(1, 'One',      '2018-01-03 00:00'),

(2, '',         '2018-01-03 00:00'),
(2, 'LargeTwo', '2018-01-04 00:00'),

(1, '',         '2018-01-04 00:00')

SELECT * FROM #RawData

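2 Answers:

Answer 0: (score: 0)

You basically want LAG(. . . IGNORE NULLS), but SQL Server does not support that (IGNORE NULLS only arrived in SQL Server 2022).

Instead, you can use a cumulative max/min on id. Here is the idea: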

-- grp is the id of the latest row, at or before this one, that has a real name
select t.*,
       max(name) over (partition by sessionid, grp) as filled_name
from (select t.*,
             max(case when name <> '' then id end) over (partition by sessionid order by id) as grp
      from t
     ) t;

This fills the values "forward", but not backward. Filling backward as well raises some issues of its own, but you can do it with similar logic:

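-- grpafter:  id of the nearest named row at or after this one
-- grpbefore: id of the nearest named row at or before this one
-- Prefer the name that follows; fall back to the name that precedes.
select t.*,
       (case when max(name) over (partition by sessionid, grpafter) <> ''
             then max(name) over (partition by sessionid, grpafter)
             -- max, not min: min would return '' whenever a blank row is in the group
             else max(name) over (partition by sessionid, grpbefore)
        end) as filled_name
from (select t.*,
             min(case when name <> '' then id end) over (partition by sessionid order by id desc) as grpafter,
             max(case when name <> '' then id end) over (partition by sessionid order by id asc) as grpbefore
      from t
     ) t;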
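Neither query above limits the fill to rows that are within 24 hours of each other. One possible way to add that, offered as a sketch and not run against real data, is the usual gaps-and-islands pattern: flag a row as the start of a new set when the previous row of the session is more than a day older, turn the flags into a set number with a running sum, then fill and aggregate inside each set. It assumes the #RawData table from the question and reads "within 24 hours" as within 24 hours of the previous row, so a set can chain past 24 hours in total. IsNewGroup and GroupNo are illustrative names.

```sql
-- Sketch under the assumptions stated above; requires SQL Server 2012 or later.
WITH Flagged AS (
    SELECT Id, SessionId, Name, LogTime,
           -- 0 when the previous row of this session is at most 24 hours older,
           -- 1 otherwise (including the first row, where LAG returns NULL).
           IsNewGroup = CASE
                            WHEN LAG(LogTime) OVER (PARTITION BY SessionId ORDER BY LogTime, Id)
                                 >= DATEADD(DAY, -1, LogTime)
                            THEN 0 ELSE 1
                        END
    FROM #RawData
),
Grouped AS (
    SELECT Id, SessionId, Name, LogTime,
           -- Running count of group starts = group number within the session.
           GroupNo = SUM(IsNewGroup) OVER (PARTITION BY SessionId
                                           ORDER BY LogTime, Id
                                           ROWS UNBOUNDED PRECEDING)
    FROM Flagged
)
SELECT Id, SessionId,
       -- First non-empty name in the group; NULL if the group has none.
       Name      = FIRST_VALUE(NULLIF(Name, ''))
                       OVER (PARTITION BY SessionId, GroupNo
                             ORDER BY CASE WHEN NULLIF(Name, '') IS NULL THEN 1 ELSE 0 END,
                                      LogTime, Id),
       LogTime,
       StartTime = MIN(LogTime) OVER (PARTITION BY SessionId, GroupNo),
       EndTime   = MAX(LogTime) OVER (PARTITION BY SessionId, GroupNo)
FROM Grouped
ORDER BY Id;
```

If "within 24 hours" should instead mean within 24 hours of the first row of the set, the flag cannot be computed from the previous row alone and a different approach (for example a recursive CTE) would be needed.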

Answer 1: (score: 0)

IF OBJECT_ID('tempdb..#RawData') IS NOT NULL DROP TABLE #RawData
GO

Create Table #RawData
(
Id INT IDENTITY,
SessionId INT NOT NULL,
Name NVARCHAR(50) NULL,
LogTime DATETIME NOT NULL
)

INSERT INTO #RawData(SessionId,Name,LogTime)
VALUES
(1, '',         '2018-01-01 00:00'),
(1, 'LargeOne', '2018-01-01 12:00'),


(2, 'Two',      '2018-01-01 13:00'),

(3, NULL,       '2018-01-02 00:00'),
(3, '',         '2018-01-03 00:00'),

(1, 'One',      '2018-01-03 00:00'),

(2, '',         '2018-01-03 00:00'),
(2, 'LargeTwo', '2018-01-04 00:00'),

(1, '',         '2018-01-04 00:00')

go


with my_sql as (    
    SELECT t1.SessionId, 
           t1.Name, 
           t1.LogTime , 
           (
             SELECT min( t2.LogTime )
               from #RawData t2
              where t1.SessionId = t2.SessionId
                and cast( t1.LogTime as date ) >= cast( t2.LogTime as date )
                and cast( t1.LogTime as date ) <= dateadd(day, 1, t2.LogTime)
           ) as StartTime
      FROM #RawData t1 
  )

  --select * from my_sql

  SELECT ms.SessionId, 
         ( select top 1 t.name
              from my_sql t
              where ms.SessionId = t.SessionId 
                and cast(ms.StartTime as date ) = cast(t.StartTime as date) 
                and t.name <> ''
              order by t.LogTime  -- make TOP 1 deterministic: earliest non-empty name
         ) as name,           
         ms.LogTime,
         ms.StartTime,
         ( select max(t.LogTime) 
            from my_sql t
           where ms.SessionId = t.SessionId 
             and cast(ms.StartTime as date ) = cast(t.StartTime as date) 
         ) as endTime    
    FROM my_sql ms