Latest value for each time slot, person, and category

Time: 2015-10-19 09:24:34

Tags: sql sql-server greatest-n-per-group

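I have browsed the questions and answers on this forum, but none of them fit my situation.

I have a number of persons who report their status for two categories, like this: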

TimeStamp           | PersonID | Category | Value
2015-07-02 01:25:00 | 2303     | CatA     | 8.2
2015-07-02 01:25:00 | 2303     | CatB     | 10.1
2015-07-02 03:35:00 | 2303     | CatA     | 8.0
2015-07-02 03:35:00 | 2303     | CatB     | 9.9
2015-07-02 02:30:00 | 4307     | CatA     | 8.7
2015-07-02 02:30:00 | 4307     | CatB     | 12.7
.
.
.
2015-07-31 22:15:00 | 9011     | CatA     | 7.9
2015-07-31 22:15:00 | 9011     | CatB     | 8.9

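Some persons report their status several times an hour, while others report only a few times a day.

For each person and each category, I need to produce an output that shows the latest known status for every hour of every day. It should look like this: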

Date        |Hour| Person | Category | Value
2015-07-02  | 1  | 2303   | CatA     | Null
2015-07-02  | 1  | 2303   | CatB     | Null
2015-07-02  | 2  | 2303   | CatA     | 8.2
2015-07-02  | 2  | 2303   | CatB     | 10.1
2015-07-02  | 3  | 2303   | CatA     | 8.2
2015-07-02  | 3  | 2303   | CatB     | 10.1
2015-07-02  | 4  | 2303   | CatA     | 8.0
2015-07-02  | 4  | 2303   | CatB     | 9.9
.
.
.
2015-07-31  | 23 | 9011   | CatA     | 7.9
2015-07-31  | 23 | 9011   | CatB     | 8.9

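The first rows for each person and category may be null, because no value is known yet: this is "the beginning of time".

I have tried using a subquery like this: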

SELECT Date
      ,hour
      ,Person
      ,Category
      ,(SELECT TOP 1 status FROM readings WHERE (readings.Date<=structure.Date) AND readings.Hour<=structure.hour)....and so forth.... order by TimeStamp DESC
FROM structure

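This works, except in terms of performance. I need to do this for one month, for 2,000 persons in 2 categories, which means the subquery has to run 30 * 24 * 2000 * 2 = 2,880,000 times. Since the table containing the readings also holds several hundred thousand rows, this is not workable.

I have also tried using row_number(), but without success.

Any suggestions?

Edit (2015-10-19 15:34): In the query example above, I refer to a "structure" table. For now, this is actually just a view, defined with the following SQL: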

SELECT Calendar.CalendarDay, Hours.Hour, Persons.Person, Categories.Category
FROM Calendar CROSS JOIN Hours CROSS JOIN Persons CROSS JOIN Categories

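This produces a table with one row per day, per hour, per person, and per category. That table therefore contains 30 * 24 * 2000 * 2 = 2,880,000 rows.

For each of these rows, I need to find the latest status in the readings table. In other words, for every day, every hour, every person, and every category, I need to read the latest available status from the readings table.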

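3 answers:

Answer 0: (score: 1)

Let me take a guess.

Based on the task ("generate an output that shows the latest known status for every hour of every day, for each person and category"), you need three steps:

(1) Find the latest record for each hour;

(2) Build a table of all the dates and hours to display;

(3) Multiply the date-hour table by persons and categories, and join the result with the latest-per-hour records.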

-- Test data 

declare @t table ([Timestamp] datetime2(0), PersonId int, Category varchar(4), Value decimal(3,1));

insert into @t values
('2015-07-02 01:25:00', 2303, 'CatA', 8.2    ),
('2015-07-02 01:45:00', 2303, 'CatA', 9.9    ),
('2015-07-02 01:25:00', 2303, 'CatB', 10.1   ),
('2015-07-02 03:35:00', 2303, 'CatA', 8.0    ),
('2015-07-02 03:35:00', 2303, 'CatB', 9.9    ),
('2015-07-02 02:30:00', 4307, 'CatA', 8.7    ),
('2015-07-02 02:30:00', 4307, 'CatB', 12.7   );


-- Latest records for every hour 

declare @Latest table (
    [Date] date, 
    [Hour] tinyint, 
    PersonId int, 
    Category varchar(4), 
    Value decimal(3,1),

    primary key ([Date], [Hour], PersonId, Category)
);

insert into @Latest 
select top 1 with ties
    [Date]    = cast([Timestamp] as date),
    [Hour]    = datepart(hour, [Timestamp]),
    PersonId  ,
    Category  ,
    Value        
from
    @t
order by
    row_number() over(partition by cast([Timestamp] as date), datepart(hour, [Timestamp]), PersonId, Category order by [Timestamp] desc);


-- Date-hours table 

declare @FromDateTime datetime2(0);
declare @ToDateTime datetime2(0);

select @FromDateTime = min([Timestamp]), @ToDateTime = max([Timestamp]) from @t;

declare @DateDiff int = datediff(day, @FromDateTime, @ToDateTime);
declare @FromDate date = cast(@FromDateTime as date);
declare @FromHour int = datepart(hour, @FromDateTime);
declare @ToHour int = datepart(hour, @ToDateTime);

declare @DayHours table ([Date] date, [Hour] tinyint, primary key clustered ([Date], [Hour]) );


with N as 
(
    select n from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) t(n)    
),
D as (
    select
        row_number() over(order by (select 1))-1 as d
    from 
        N n1, N n2, N n3
),
H as (
    select top 24
        row_number() over(order by (select 1)) - 1 as h
    from 
        N n1, N n2
)
insert into @DayHours
select dateadd(day, d, @FromDate), h
from 
    D cross join H
where
    @FromHour <= (d * 100 + h) 
    and (d * 100 + h) <= (@DateDiff * 100 + @ToHour);


-- @PersonsIds & @Categories tables (just an imitation of the real tables)

declare @PersonsIds table (Id int primary key);
declare @Categories table (Category varchar(4) primary key);

insert into @PersonsIds select distinct PersonId from @t;
insert into @Categories select distinct Category from @t;


-- The result 

select
    dh.[Date],
    dh.[Hour],
    PersonId    =    p.Id,
    c.Category,
    l.Value
from
    @PersonsIds p cross join @Categories c cross join @DayHours dh
    left join @Latest l on l.[Date] = dh.[Date] and l.[Hour] = dh.[Hour] and l.PersonId = p.Id and l.Category = c.Category
order by
    [Date], [Hour], PersonId, Category;

Edit (1):

OK. To carry previous values forward into the gaps, let's replace the final select statement with this one:

select top 1 with ties
    dh.[Date],
    dh.[Hour],
    PersonId    =    p.Id,
    c.Category,
    l.Value
from
    @PersonsIds p cross join @Categories c cross join @DayHours dh
    left join @Latest l 
        on (l.[Date] = dh.[Date] and l.[Hour] <= dh.[Hour] or l.[Date] < dh.[Date])  
        and l.PersonId = p.Id and l.Category = c.Category
order by
    row_number() 
        over (partition by dh.[Date], dh.[Hour], p.Id, c.Category 
              order by l.[Date] desc, l.[Hour] desc);

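Edit (2): Let's try collecting the Cartesian product in a temporary table with a clustered index on PersonId, Category, [Date], [Hour].

Then update the table, dragging the last known value forward into the empty rows: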

declare @Result table (
    [Date]      date, 
    [Hour]      tinyint, 
    PersonId    int, 
    Category    varchar(4), 
    Value       decimal(3,1),

    primary key (PersonId, Category, [Date], [Hour]) -- Important !!!
);


insert into @Result
select
    dh.[Date],
    dh.[Hour],
    PersonId    =    p.Id,
    c.Category,
    l.Value
from
    @PersonsIds p cross join @Categories c cross join @DayHours dh
    left join @Latest l on l.[Date] = dh.[Date] and l.[Hour] = dh.[Hour] and l.PersonId = p.Id and l.Category = c.Category
order by
    [Date], [Hour], PersonId, Category;



declare @PersonId   int;
declare @Category   varchar(4);
declare @Value decimal(3,1);

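-- Running-variable update: carries the last non-null Value forward within each
-- (PersonId, Category) group. It assumes rows are visited in primary key order,
-- which SQL Server does not formally guarantee.
update @Result set  
    @Value = Value = isnull(Value, case when @PersonId = PersonId and @Category = Category then @Value end),
    @PersonId = PersonId,
    @Category = Category;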

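For better performance, consider replacing the table variables with temporary tables and adding the indexes that the query plan suggests.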
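
As an illustration only, here is what the carry-forward update in edit (2) is meant to compute, written as a minimal Python sketch. The function name `carry_forward` and the dict keys are hypothetical and not part of the answer above; the explicit sort stands in for the primary key order that the UPDATE relies on.

```python
def carry_forward(rows):
    """Fill missing values with the last known value of the same (person, category).

    rows: list of dicts with keys person, category, date, hour, value
          (value is None where no reading exists for that hour).
    Returns a new list, sorted by (person, category, date, hour).
    """
    # Assumption: dates are ISO strings or date objects, so they sort chronologically.
    ordered = sorted(
        rows, key=lambda r: (r["person"], r["category"], r["date"], r["hour"])
    )
    filled = []
    prev_key = None
    last_value = None
    for row in ordered:
        key = (row["person"], row["category"])
        if key != prev_key:
            # Group boundary: never carry a value over to another person/category.
            last_value = None
            prev_key = key
        if row["value"] is not None:
            last_value = row["value"]
        filled.append({**row, "value": last_value})
    return filled
```

Rows before the first reading of a group stay None, which corresponds to the "beginning of time" nulls in the question.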

Answer 1: (score: 0)

If I'm not mistaken, this should give you the desired result.

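select st.Date,
       case when st.hour = 1 then NULL
            else st.hour
       end as hour,
       st.Person,
       st.Category,
       -- QUALIFY is not available in SQL Server; a correlated TOP 1
       -- expresses the same "row_number() = 1" filter per person.
       (select top 1 r.status
        from readings r
        where r.personid = st.Person
        order by r.status desc) as status
from structure st;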

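Answer 2: (score: 0)

You can do this in SQL, but it will be slow, because for each person, category, day, and hour you have to look up the latest entry for that person and category up to that point. Think about the process: pick one record in your big table, find all matching statuses, order them, take the latest, and select its value. This is done for every single record in your big table.

You may be better off simply retrieving all the data with a program written in a programming language and collecting the values with a control-break algorithm.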
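
A minimal sketch of such a control-break pass in Python, under stated assumptions: readings arrive as (timestamp, person, category, value) tuples, the hour grid is given by `start` and `end`, and an hour slot shows the latest reading taken up to the end of that hour. The function name `hourly_latest` is made up for illustration.

```python
from datetime import datetime, timedelta
from itertools import groupby


def hourly_latest(readings, start, end):
    """Return {(person, category): [(hour_start, value_or_None), ...]}.

    readings: iterable of (timestamp, person, category, value) tuples
    start, end: datetime bounds of the hour grid; start is on the hour,
                end is exclusive
    """
    # Sort once so each (person, category) group is contiguous and in time order.
    ordered = sorted(readings, key=lambda r: (r[1], r[2], r[0]))
    result = {}
    for key, group in groupby(ordered, key=lambda r: (r[1], r[2])):
        rows = list(group)
        i = 0          # index of the next unread reading in this group
        last = None    # latest value seen so far; None before the first reading
        slots = []
        slot = start
        while slot < end:
            slot_end = slot + timedelta(hours=1)
            # Consume every reading taken before the end of this hour.
            while i < len(rows) and rows[i][0] < slot_end:
                last = rows[i][3]
                i += 1
            slots.append((slot, last))
            slot = slot_end
        result[key] = slots
    return result
```

Each reading is visited once, so the work is one sort plus a single linear pass per group, instead of one lookup per output row. Persons or categories with no readings at all do not appear in the result and would have to be added from the persons and categories tables.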

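But let's see how it can be done in SQL.

One problem is SQL Server's poor date/time functionality. We want to compare date plus hour, which is easiest with strings in 'yyyymmddhh' format, e.g. '2015101923' < '2015102001'. In your big table you have a date and an hour; in the status table you have a datetime. Let's see how to get the desired strings: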

convert(varchar(8), bigtable.calendarday, 112) +
right('0' + convert(varchar(2), bigtable.hour), 2)

convert(varchar(8), status.timestamp, 112) + 
right('0' + convert(varchar(2), datepart(hour, status.timestamp)), 2)
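
To see why the string comparison works, the same key can be mirrored in Python. This is only a sketch, and the helper names are hypothetical. Because every field is zero-padded and ordered from most to least significant, lexicographic order matches chronological order.

```python
from datetime import date, datetime


def dayhour_key_from_parts(day: date, hour: int) -> str:
    """Key for the big table: a calendar day plus a separate hour column."""
    return day.strftime("%Y%m%d") + f"{hour:02d}"


def dayhour_key_from_timestamp(ts: datetime) -> str:
    """Key for the status table: derived from the full timestamp."""
    return ts.strftime("%Y%m%d%H")


# A reading at 2015-10-19 23:40 sorts before the 2015-10-20 01:00 slot.
reading_key = dayhour_key_from_timestamp(datetime(2015, 10, 19, 23, 40))
slot_key = dayhour_key_from_parts(date(2015, 10, 20), 1)
is_visible = reading_key <= slot_key
```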

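Since this key, together with person and category, is the main criterion for looking up records, you may want to make it a computed column and add an index on (person + category + dayhourkey) in both tables.

You can select from the big table and fetch the status value in a subquery. To get the latest matching record, order by timestamp and limit the result to one record.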

select 
  personid,
  calendarday,
  hour,
  category,
  (
    select top 1 value
    from status s
    where s.personid = b.personid
      and s.category = b.category
      and convert(varchar(8), s.timestamp, 112) + right('0' + convert(varchar(2), datepart(hour, s.timestamp)), 2) <=
          convert(varchar(8), b.calendarday, 112) + right('0' + convert(varchar(2), b.hour), 2)
    order by s.timestamp desc
  ) as value
from bigtable b;