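How do I get aggregated values for all dates, even when data is missing for some days?

Time: 2015-08-26 19:10:47

Tags: sql sql-server sql-server-2008

I have data on how long users were tracked. The data is segmented, and each row represents one segment. Here is the sample data: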

http://sqlfiddle.com/#!6/2fa61

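How can I get the data per day? That is, given that a whole day is 1440 minutes, I want to know how many minutes the user was tracked on each day. I also want to show 0 for days that have no data.

I expect the following output: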

desired output

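5 answers:

Answer 0 (score: 1)

I made some guesses about the date range, but this should be very close.

On my system I keep a view named cteTally, which is my version of a tally table. Here is the code that creates it.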

create View [dbo].[cteTally] as

WITH
    E1(N) AS (select 1 from (values (1),(1),(1),(1),(1),(1),(1),(1),(1),(1))dt(n)),
    E2(N) AS (SELECT 1 FROM E1 a, E1 b), -- 10^2 = 100 rows
    E4(N) AS (SELECT 1 FROM E2 a, E2 b), -- 10^4 = 10,000 rows max
    cteTally(N) AS 
    (
        SELECT  ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) FROM E4
    )
select N from cteTally

Now we can use it to build your results. We just need a couple of other CTEs to establish the date range.

with DateRange as
(
    select MIN(FirstDate) as StartDate
        , MAX(LastUpdate) as EndDate 
    from track
)
, AllDates as
(
    select DateAdd(DAY, t.N - 1, StartDate) BaseDate
    from DateRange dr
    cross join cteTally t
    where t.N <= DATEDIFF(day, StartDate, EndDate) + 1
)

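-- Each segment is attributed to the day on which it starts.
-- Days without data come back with a NULL Email and 0 minutes.
select t.Email
    , ad.BaseDate as xDate
    , isnull(sum(t.DurationInSeconds), 0) / 60 as TrackMinutes
from AllDates ad
left join track t on cast(t.StartTime as date) = ad.BaseDate
group by t.Email, ad.BaseDate
order by ad.BaseDate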

Answer 1 (score: 1)

You should group by the day value. You can use the DATEPART function to get the day: DATEPART(d,[StartTime])

SELECT cast([StartTime] as date) as date ,sum(datediff(n,[StartTime],[EndTime])) as "min" 
FROM [test].[dbo].[track] 
group by DATEPART(d,[StartTime]),cast([StartTime]as date)
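
One detail may be worth illustrating here: DATEPART(d, ...) returns only the day of the month, so on its own it cannot tell 21 February from 21 March. The query above still groups correctly because it also groups by the value cast to date. A minimal sketch with made-up literal timestamps (they are illustrative assumptions, not rows from the question):

```sql
-- The two timestamps share a day-of-month but fall on different calendar dates.
SELECT
    DATEPART(d, CAST('2015-02-21T09:49:43' AS datetime)) AS DayOfMonthFeb
    ,DATEPART(d, CAST('2015-03-21T09:49:43' AS datetime)) AS DayOfMonthMar
    ,CAST(CAST('2015-02-21T09:49:43' AS datetime) AS date) AS CalendarDateFeb
    ,CAST(CAST('2015-03-21T09:49:43' AS datetime) AS date) AS CalendarDateMar;
```

Both DATEPART columns come back as 21, while the two date columns differ, which is why the cast to date is the part of the GROUP BY that actually separates the days.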

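Answer 2 (score: 1)

  1. Create a table variable for the dates
  2. Populate the table in a WHILE loop
  3. Use the dates table variable
  4. Cross join it to the tracker data
  5. Convert the values in the [DurationInSeconds] column to minutes
  6. Replace NULLs with zero
  7. Code: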

    DECLARE @dates TABLE ( ReportDates DATE )  
    DECLARE @BeginDate AS DATE
      , @EndDate AS DATE
      , @RunDate AS DATE
    
    SELECT @BeginDate = MIN(starttime) FROM dbo.track
    SELECT @EndDate = MAX(starttime) FROM dbo.track
    
    SET @RunDate = @BeginDate
    WHILE @RunDate <= @EndDate
        BEGIN
            -- Insert before incrementing so the range runs from @BeginDate through @EndDate
            INSERT  INTO @dates
            VALUES  ( @RunDate )
            SET @RunDate = DATEADD(DAY, 1, @RunDate)
        END;
    
    SELECT e.Email 
         , e.ReportDates
         , ISNULL(SUM(DurationInSeconds / 60), 0) AS TotDurationInMinutes
    FROM (  SELECT  d.ReportDates
                   ,t.email
            FROM    @dates AS d
            cross JOIN track AS t  
            GROUP BY d.ReportDates, t.Email ) AS e
    LEFT JOIN track AS t ON e.ReportDates = CAST(t.StartTime AS DATE)
    GROUP BY e.ReportDates, e.Email
    ORDER BY e.ReportDates
    

    Result:

    Email ReportDates TotDurationInMinutes
    ----- ----------- ----------------------
    ABC   2015-02-20  1330
    ABC   2015-02-21  1439
    ABC   2015-02-22  1357
    ABC   2015-02-23  1969
    ABC   2015-02-24  0
    ABC   2015-02-25  0
    ABC   2015-02-26  0
    ABC   2015-02-27  0
    ABC   2015-02-28  360
    

Answer 3 (score: 1)

Use a table of numbers. I personally have a permanent table Numbers with 100K numbers in it.
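
The answer does not show how its Numbers table was built, so the following is only one possible way to create such a table. The schema name dbo, the primary key, and the stacked-CTE row generator are assumptions; the only detail taken from the answer is that the table is called Numbers and has a column called Number holding 1 through 100,000.

```sql
-- Hypothetical setup for a permanent numbers table (run once).
CREATE TABLE dbo.Numbers (Number int NOT NULL PRIMARY KEY);

;WITH
E1(N) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS v(n)), -- 10 rows
E2(N) AS (SELECT 1 FROM E1 a CROSS JOIN E1 b),                                     -- 100 rows
E5(N) AS (SELECT 1 FROM E2 a CROSS JOIN E2 b CROSS JOIN E1 c)                      -- 100,000 rows
INSERT INTO dbo.Numbers (Number)
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM E5;
```

A permanent, indexed table like this is typically cheap to join against, which is the main reason to keep one around instead of regenerating the numbers in every query.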

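Once you have a set of numbers, you can generate a set of dates for the range you need. In this query I take the MIN and MAX dates from your data, but since you may not have data for some dates, it is better to define the range with explicit parameters.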
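
As a rough sketch of the explicit-parameter variant, the date list could be generated like this. The variable names @StartDate and @EndDate and their values are hypothetical, and a small inline CTE stands in for the permanent Numbers table so that the snippet runs on its own.

```sql
DECLARE @StartDate date, @EndDate date;  -- hypothetical report range
SET @StartDate = '20150219';
SET @EndDate   = '20150228';

;WITH Numbers(Number) AS
(
    -- Stand-in for the permanent Numbers table: 100 sequential numbers
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
    FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS a(n)
    CROSS JOIN (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) AS b(n)
)
SELECT DATEADD(day, Numbers.Number - 1, @StartDate) AS DayStart
FROM Numbers
WHERE Numbers.Number <= DATEDIFF(day, @StartDate, @EndDate) + 1
ORDER BY DayStart;
```

With these values the query returns one row per day from 2015-02-19 through 2015-02-28, whether or not any tracking rows exist for those days.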

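For each date we have the start and the end of the day, which is our grouping interval.

For each date we search the track rows for those that intersect with this interval. Two intervals (DayStart, DayEnd) and (StartTime, EndTime) intersect if StartTime < DayEnd and EndTime > DayStart. This goes into the WHERE clause.

For each intersecting interval we calculate the range that belongs to both intervals: from MAX(DayStart, StartTime) to MIN(DayEnd, EndTime).

Finally, we group by day and sum the durations of all ranges.

I added one row to your sample data to test the case where an interval covers a whole day: from 2015-02-14 20:50:43 to 2015-02-16 19:49:59. I chose this interval to lie before the intervals in the sample so that the results for the dates in the example are not affected. Here is the SQL Fiddle.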
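
To make the intersection test and the clipping concrete, here is a small standalone sketch for a single day. The variables @DayStart and @DayEnd are illustrative assumptions; the three intervals are copied from the sample data.

```sql
DECLARE @DayStart datetime, @DayEnd datetime;  -- hypothetical: one day to test against
SET @DayStart = '2015-02-21T00:00:00';
SET @DayEnd   = '2015-02-22T00:00:00';

SELECT
    T.StartTime
    ,T.EndTime
    ,R.RangeStart
    ,R.RangeEnd
    ,DATEDIFF(second, R.RangeStart, R.RangeEnd) AS OverlapSeconds
FROM (VALUES
     (CAST('2015-02-20T22:12:07' AS datetime), CAST('2015-02-21T07:00:59' AS datetime))
    ,(CAST('2015-02-21T09:49:43' AS datetime), CAST('2015-02-21T16:30:10' AS datetime))
    ,(CAST('2015-02-22T09:55:43' AS datetime), CAST('2015-02-22T11:49:59' AS datetime))
) AS T(StartTime, EndTime)
CROSS APPLY
(
    SELECT
        -- MAX(DayStart, StartTime)
        CASE WHEN @DayStart > T.StartTime THEN @DayStart ELSE T.StartTime END AS RangeStart
        -- MIN(DayEnd, EndTime)
        ,CASE WHEN @DayEnd < T.EndTime THEN @DayEnd ELSE T.EndTime END AS RangeEnd
) AS R
WHERE T.StartTime < @DayEnd     -- the two conditions together mean
  AND T.EndTime > @DayStart;    -- "the interval intersects this day"
```

The first interval starts on the previous evening, so it is clipped to begin at midnight and contributes 25259 seconds. The second lies entirely inside the day and contributes its full 24027 seconds. The third starts after the day has ended and is filtered out by the WHERE clause.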

DECLARE @track table
(
Email varchar(20),
StartTime datetime,
EndTime datetime,
DurationInSeconds int,
FirstDate datetime,
LastUpdate datetime
);

Insert into @track  values ( 'ABC', '2015-02-20 08:49:43.000', '2015-02-20 14:49:59.000', 21616, '2015-02-19 00:00:00.000', '2015-02-28 11:45:27.000')
Insert into @track  values ( 'ABC', '2015-02-20 14:49:59.000', '2015-02-20 22:12:07.000', 26528, '2015-02-19 00:00:00.000', '2015-02-28 11:45:27.000')
Insert into @track  values ( 'ABC', '2015-02-20 22:12:07.000', '2015-02-21 07:00:59.000', 31732, '2015-02-19 00:00:00.000', '2015-02-28 11:45:27.000')
Insert into @track  values ( 'ABC', '2015-02-21 09:49:43.000', '2015-02-21 16:30:10.000', 24027, '2015-02-19 00:00:00.000', '2015-02-28 11:45:27.000')
Insert into @track  values ( 'ABC', '2015-02-21 16:30:10.000', '2015-02-22 09:49:30.000', 62360, '2015-02-19 00:00:00.000', '2015-02-28 11:45:27.000')
Insert into @track  values ( 'ABC', '2015-02-22 09:55:43.000', '2015-02-22 11:49:59.000', 5856, '2015-02-19 00:00:00.000', '2015-02-28 11:45:27.000')
Insert into @track  values ( 'ABC', '2015-02-22 11:49:10.000', '2015-02-23 08:49:59.000', 75649, '2015-02-19 00:00:00.000', '2015-02-28 11:45:27.000')
Insert into @track  values ( 'ABC', '2015-02-23 10:59:43.000', '2015-02-23 12:49:59.000', 6616, '2015-02-19 00:00:00.000', '2015-02-28 11:45:27.000')
Insert into @track  values ( 'ABC', '2015-02-23 12:50:43.000', '2015-02-24 19:49:59.000', 111556, '2015-02-19 00:00:00.000', '2015-02-28 11:45:27.000')
Insert into @track  values ( 'ABC', '2015-02-28 08:49:43.000', '2015-02-28 14:49:59.000', 21616, '2015-02-19 00:00:00.000', '2015-02-28 11:45:27.000')

Insert into @track  values ( 'ABC', '2015-02-14 20:50:43.000', '2015-02-16 19:49:59.000', 0, '2015-02-19 00:00:00.000', '2015-02-28 11:45:27.000')

;WITH
CTE_Dates
AS
(
    SELECT
        Email
        ,CAST(MIN(StartTime) AS date) AS StartDate
        ,CAST(MAX(EndTime) AS date) AS EndDate
    FROM @track
    GROUP BY Email
)
SELECT
    CTE_Dates.Email
    ,DayStart AS xDate
    ,ISNULL(SUM(DATEDIFF(second, RangeStart, RangeEnd)) / 60, 0) AS TrackMinutes
FROM
    Numbers
    CROSS JOIN CTE_Dates -- this generates list of dates without gaps
    CROSS APPLY
    (
        SELECT
            DATEADD(day, Numbers.Number-1, CTE_Dates.StartDate) AS DayStart
            ,DATEADD(day, Numbers.Number, CTE_Dates.StartDate) AS DayEnd
    ) AS A_Date -- this is midnight of each current and next day
    OUTER APPLY
    (
        SELECT
          -- MAX(DayStart, StartTime)
          CASE WHEN DayStart > StartTime THEN DayStart ELSE StartTime END AS RangeStart

          -- MIN(DayEnd, EndTime)
          ,CASE WHEN DayEnd < EndTime THEN DayEnd ELSE EndTime END AS RangeEnd
        FROM @track AS T
        WHERE
            T.Email = CTE_Dates.Email
            AND T.StartTime < DayEnd
            AND T.EndTime > DayStart
    ) AS A_Track -- this is all tracks that intersect with the current day
WHERE
    Numbers.Number <= DATEDIFF(day, CTE_Dates.StartDate, CTE_Dates.EndDate)+1
GROUP BY DayStart, CTE_Dates.Email
ORDER BY DayStart;

Result

Email    xDate         TrackMinutes
ABC      2015-02-14    189
ABC      2015-02-15    1440
ABC      2015-02-16    1189
ABC      2015-02-17    0
ABC      2015-02-18    0
ABC      2015-02-19    0
ABC      2015-02-20    910
ABC      2015-02-21    1271
ABC      2015-02-22    1434
ABC      2015-02-23    1309
ABC      2015-02-24    1189
ABC      2015-02-25    0
ABC      2015-02-26    0
ABC      2015-02-27    0
ABC      2015-02-28    360

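If two or more intervals in your data overlap, you can still get TrackMinutes greater than 1440.

Update

You said in the comments that your data has a few rows where the intervals overlap, so the result contains values greater than 1440. You can wrap the SUM in a CASE to hide these errors in the data, but ultimately it is better to find the offending rows and fix the data. You see only a few rows with values above 1440, but there could be many other rows with the same problem that are not visible. So it would be better to write a query that finds the overlapping rows, check how many there are, and then decide what to do with them. The danger is that right now you think there are only a few, but there could be a lot. This is beyond the scope of this question.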
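
For readers who want a starting point, a query that finds overlapping rows could look roughly like the sketch below. This is not part of the original answer: the trimmed-down table variable and the column aliases are assumptions, and the three rows are copied from the sample data, where the second and third intervals overlap by a few seconds.

```sql
DECLARE @track table (Email varchar(20), StartTime datetime, EndTime datetime);

INSERT INTO @track VALUES ('ABC', '2015-02-21T16:30:10', '2015-02-22T09:49:30');
INSERT INTO @track VALUES ('ABC', '2015-02-22T09:55:43', '2015-02-22T11:49:59');
INSERT INTO @track VALUES ('ABC', '2015-02-22T11:49:10', '2015-02-23T08:49:59');

SELECT
    A.Email
    ,A.StartTime AS FirstStart
    ,A.EndTime AS FirstEnd
    ,B.StartTime AS SecondStart
    ,B.EndTime AS SecondEnd
FROM @track AS A
INNER JOIN @track AS B
    ON  B.Email = A.Email
    AND B.StartTime < A.EndTime     -- same intersection test as in the main query
    AND B.EndTime > A.StartTime
    -- Report each pair once and never match a row with itself.
    -- Exact duplicate rows are not reported by this condition.
    AND (A.StartTime < B.StartTime
         OR (A.StartTime = B.StartTime AND A.EndTime < B.EndTime));
```

On these three rows the query returns a single pair: the interval ending at 11:49:59 and the interval starting at 11:49:10 on 2015-02-22. Counting the pairs it returns on the real table gives a sense of how widespread the problem is before deciding how to handle it.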

To hide the problem, replace this line in the query above:

,ISNULL(SUM(DATEDIFF(second, RangeStart, RangeEnd)) / 60, 0) AS TrackMinutes

with this:

,CASE 
WHEN ISNULL(SUM(DATEDIFF(second, RangeStart, RangeEnd)) / 60, 0) > 1440
THEN 1440
ELSE ISNULL(SUM(DATEDIFF(second, RangeStart, RangeEnd)) / 60, 0) 
END AS TrackMinutes

Answer 4 (score: 0)

Hope this helps.

SET NOCOUNT ON;

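-- Drop the temp table only if a previous run left it behind
IF OBJECT_ID('tempdb..#temp_table') IS NOT NULL
    DROP TABLE #temp_table

CREATE TABLE #temp_table (
    Email VARCHAR(20)
    ,StartTime DATETIME
    ,DurationInSeconds INT
    )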

DECLARE @Nextday DATETIME
    ,@Email VARCHAR(20)
    ,@StartTime DATETIME
    ,@DurationInSeconds INT
    ,@lastduration INT
    ,@currentduration INT
    ,@FirstDate DATETIME

SET @FirstDate = (
        SELECT TOP 1 LEFT(StartTime, 11)
        FROM track
        )

DECLARE vendor_cursor CURSOR
FOR
SELECT Email
    ,StartTime
    ,DurationInSeconds
FROM track

OPEN vendor_cursor

FETCH NEXT
FROM vendor_cursor
INTO @Email
    ,@StartTime
    ,@DurationInSeconds

WHILE @@FETCH_STATUS = 0
BEGIN
    IF EXISTS (
            SELECT 1
            FROM #temp_table
            WHERE LEFT(StartTime, 11) = LEFT(@StartTime, 11)
            )
    BEGIN
        SELECT @lastduration = DurationInSeconds
        FROM #temp_table
        WHERE LEFT(StartTime, 11) = LEFT(@StartTime, 11)

        SET @currentduration = @lastduration + @DurationInSeconds

        UPDATE #temp_table
        SET DurationInSeconds = @currentduration
        WHERE LEFT(StartTime, 11) = LEFT(@StartTime, 11)
    END
    ELSE
    BEGIN
        INSERT INTO #temp_table
        SELECT @Email
            ,@StartTime
            ,@DurationInSeconds

        SET @FirstDate = DATEADD(day, 1, @FirstDate)
    END

    IF NOT EXISTS (
            SELECT 1
            FROM track
            WHERE LEFT(StartTime, 11) = @FirstDate
            )
    BEGIN
        INSERT INTO #temp_table
        SELECT @Email
            ,@FirstDate
            ,0

        SET @FirstDate = DATEADD(day, 1, @FirstDate)
    END

    -- Fetch the next track row.
    FETCH NEXT
    FROM vendor_cursor
    INTO @Email
        ,@StartTime
        ,@DurationInSeconds
END

CLOSE vendor_cursor;

DEALLOCATE vendor_cursor;

SELECT *
FROM #temp_table
ORDER BY StartTime