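I've tried browsing the questions and answers on this forum, but none of them fit my situation.
I have a number of people reporting their status in two categories, like this: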
TimeStamp | PersonID | Category | Value
2015-07-02 01:25:00 | 2303 | CatA | 8.2
2015-07-02 01:25:00 | 2303 | CatB | 10.1
2015-07-02 03:35:00 | 2303 | CatA | 8.0
2015-07-02 03:35:00 | 2303 | CatB | 9.9
2015-07-02 02:30:00 | 4307 | CatA | 8.7
2015-07-02 02:30:00 | 4307 | CatB | 12.7
.
.
.
2015-07-31 22:15:00 | 9011 | CatA | 7.9
2015-07-31 22:15:00 | 9011 | CatB | 8.9
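Some people report their status several times an hour, while others report only a few times a day.
For every person and every category, I need to produce output showing the latest known status for every hour of every day. It should look like this: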
Date | Hour | Person | Category | Value
2015-07-02 | 1 | 2303 | CatA | Null
2015-07-02 | 1 | 2303 | CatB | Null
2015-07-02 | 2 | 2303 | CatA | 8.2
2015-07-02 | 2 | 2303 | CatB | 10.1
2015-07-02 | 3 | 2303 | CatA | 8.2
2015-07-02 | 3 | 2303 | CatB | 10.1
2015-07-02 | 4 | 2303 | CatA | 8.0
2015-07-02 | 4 | 2303 | CatB | 9.9
.
.
.
2015-07-31 | 23 | 9011 | CatA | 7.9
2015-07-31 | 23 | 9011 | CatB | 8.9
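The first row for each person and category may be null, since there is no known value yet; that is the "beginning of time".
I've tried using a subquery like this: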
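To pin down the intended semantics, here is a minimal Python sketch. It assumes "hour h" means the state as of hh:00, which matches the Null at hour 1 and the 8.2 at hour 2 in the expected output. The function name `latest_known` is hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Sample readings from the question: (timestamp, person, category, value)
READINGS = [
    ("2015-07-02 01:25:00", 2303, "CatA", 8.2),
    ("2015-07-02 01:25:00", 2303, "CatB", 10.1),
    ("2015-07-02 03:35:00", 2303, "CatA", 8.0),
    ("2015-07-02 03:35:00", 2303, "CatB", 9.9),
]


def latest_known(readings, day, hour, person, category):
    """Return the value of the latest reading at or before day + hour:00, or None."""
    cutoff = datetime.strptime(day, "%Y-%m-%d") + timedelta(hours=hour)
    series = sorted(
        (datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"), v)
        for ts, p, c, v in readings
        if p == person and c == category
    )
    times = [t for t, _ in series]
    i = bisect_right(times, cutoff)  # number of readings with time <= cutoff
    return series[i - 1][1] if i else None


for h in range(1, 5):
    print(h, latest_known(READINGS, "2015-07-02", h, 2303, "CatA"))
# prints: 1 None / 2 8.2 / 3 8.2 / 4 8.0 (one per line)
```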
SELECT Date
,hour
,Person
,Category
,(SELECT TOP 1 status FROM readings WHERE (readings.Date<=structure.Date) AND readings.Hour<=structure.hour /* ...and so forth... */ order by TimeStamp DESC)
FROM structure
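This works, except for performance: I need to do this for a month, for 2,000 people in 2 categories, which means the subquery has to run 30 * 24 * 2000 * 2 = 2,880,000 times. Given that the readings table also holds hundreds of thousands of readings, this isn't workable.
I also tried using row_number(), but without success.
Any suggestions?
EDIT (2015-10-19 15:34): In the query example above I refer to the "structure" table. For now this is actually just a view, defined by the following SQL: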
SELECT Calendar.CalendarDay, Hours.Hour, Persons.Person, Categories.Category
FROM Calendar CROSS JOIN Hours CROSS JOIN Persons CROSS JOIN Categories
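This generates a table with one row for every day, every hour, every person and every category. That table then contains 30 * 24 * 2000 * 2 = 2,880,000 rows.
For each of these rows I need to find the latest status in the readings table. So for every day, every hour, every person and every category, I need to read the latest available status from the readings table.
Answer 0 (score: 1)
Let me take a guess.
Based on the task "produce output showing the latest known status for every hour of every day, for every person and category", you need three steps:
(1) Find the latest record for every hour;
(2) Build a table of all the dates and hours to display;
(3) Cross join the date-hour table with persons and categories, and left join the result with the latest record for each hour.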
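The row count of such a CROSS JOIN is simply the product of the four table sizes. A sketch that checks this on a scaled-down copy of the view, using Python's built-in `sqlite3` module (so SQLite, not SQL Server) and 10 hypothetical persons instead of 2,000:

```python
import sqlite3

# Hypothetical miniature versions of the Calendar, Hours, Persons and Categories tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Calendar (CalendarDay TEXT);
    CREATE TABLE Hours (Hour INTEGER);
    CREATE TABLE Persons (Person INTEGER);
    CREATE TABLE Categories (Category TEXT);
""")
conn.executemany("INSERT INTO Calendar VALUES (?)", [(f"2015-07-{d:02d}",) for d in range(1, 31)])
conn.executemany("INSERT INTO Hours VALUES (?)", [(h,) for h in range(24)])
conn.executemany("INSERT INTO Persons VALUES (?)", [(p,) for p in range(1, 11)])
conn.executemany("INSERT INTO Categories VALUES (?)", [("CatA",), ("CatB",)])

# Same shape as the "structure" view: one row per day x hour x person x category.
conn.execute("""
    CREATE VIEW structure AS
    SELECT Calendar.CalendarDay, Hours.Hour, Persons.Person, Categories.Category
    FROM Calendar CROSS JOIN Hours CROSS JOIN Persons CROSS JOIN Categories
""")
rows = conn.execute("SELECT COUNT(*) FROM structure").fetchone()[0]
print(rows)  # 30 * 24 * 10 * 2 = 14400
```

Scaling the 10 persons up to 2,000 multiplies the count by 200, giving the 2,880,000 rows mentioned above.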
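A minimal Python sketch of these three steps on the answer's test data, with plain dicts and lists standing in for the table variables. The names `latest`, `day_hours` and `result` are hypothetical; like the final select below, it does not carry values forward into empty hours.

```python
from datetime import datetime, timedelta

# The answer's test data: (timestamp, person_id, category, value)
readings = [
    (datetime(2015, 7, 2, 1, 25), 2303, "CatA", 8.2),
    (datetime(2015, 7, 2, 1, 45), 2303, "CatA", 9.9),
    (datetime(2015, 7, 2, 1, 25), 2303, "CatB", 10.1),
    (datetime(2015, 7, 2, 3, 35), 2303, "CatA", 8.0),
    (datetime(2015, 7, 2, 3, 35), 2303, "CatB", 9.9),
    (datetime(2015, 7, 2, 2, 30), 4307, "CatA", 8.7),
    (datetime(2015, 7, 2, 2, 30), 4307, "CatB", 12.7),
]

# (1) Latest record per (date, hour, person, category): keep the newest timestamp.
latest = {}
for ts, person, category, value in readings:
    key = (ts.date(), ts.hour, person, category)
    if key not in latest or ts > latest[key][0]:
        latest[key] = (ts, value)

# (2) Every date-hour from the hour of the first reading to the hour of the last one.
first = min(r[0] for r in readings).replace(minute=0, second=0)
last = max(r[0] for r in readings).replace(minute=0, second=0)
day_hours = []
t = first
while t <= last:
    day_hours.append((t.date(), t.hour))
    t += timedelta(hours=1)

# (3) Cross join date-hours x persons x categories and "left join" the latest records:
#     a missing key yields None, like NULL from a LEFT JOIN.
persons = sorted({r[1] for r in readings})
categories = sorted({r[2] for r in readings})
result = []
for day, hour in day_hours:
    for person in persons:
        for category in categories:
            match = latest.get((day, hour, person, category))
            result.append((day, hour, person, category, match[1] if match else None))
```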
-- Test data
declare @t table ([Timestamp] datetime2(0), PersonId int, Category varchar(4), Value decimal(3,1));
insert into @t values
('2015-07-02 01:25:00', 2303, 'CatA', 8.2 ),
('2015-07-02 01:45:00', 2303, 'CatA', 9.9 ),
('2015-07-02 01:25:00', 2303, 'CatB', 10.1 ),
('2015-07-02 03:35:00', 2303, 'CatA', 8.0 ),
('2015-07-02 03:35:00', 2303, 'CatB', 9.9 ),
('2015-07-02 02:30:00', 4307, 'CatA', 8.7 ),
('2015-07-02 02:30:00', 4307, 'CatB', 12.7 );
-- Latest records for every hour
declare @Latest table (
[Date] date,
[Hour] tinyint,
PersonId int,
Category varchar(4),
Value decimal(3,1),
primary key ([Date], [Hour], PersonId, Category)
);
insert into @Latest
select top 1 with ties
[Date] = cast([Timestamp] as date),
[Hour] = datepart(hour, [Timestamp]),
PersonId ,
Category ,
Value
from
@t
order by
row_number() over(partition by cast([Timestamp] as date), datepart(hour, [Timestamp]), PersonId, Category order by [Timestamp] desc);
-- Date-hours table
declare @FromDateTime datetime2(0);
declare @ToDateTime datetime2(0);
select @FromDateTime = min([Timestamp]), @ToDateTime = max([Timestamp]) from @t;
declare @DateDiff int = datediff(day, @FromDateTime, @ToDateTime);
declare @FromDate date = cast(@FromDateTime as date);
declare @FromHour int = datepart(hour, @FromDateTime);
declare @ToHour int = datepart(hour, @ToDateTime);
declare @DayHours table ([Date] date, [Hour] tinyint, primary key clustered ([Date], [Hour]) );
with N as
(
select n from (values (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)) t(n)
),
D as (
select
row_number() over(order by (select 1))-1 as d
from
N n1, N n2, N n3
),
H as (
select top 24
row_number() over(order by (select 1)) - 1 as h
from
N n1, N n2
order by h -- without ORDER BY, TOP 24 may return any 24 of the 100 rows
)
insert into @DayHours
select dateadd(day, d, @FromDate), h
from
D, H
where
@FromHour <= (d * 100 + h)
and (d * 100 + h) <= (@DateDiff * 100 + @ToHour);
-- @PersonsIds & @Categories tables (just an imitation of the real tables)
declare @PersonsIds table (Id int primary key);
declare @Categories table (Category varchar(4) primary key);
insert into @PersonsIds select distinct PersonId from @t;
insert into @Categories select distinct Category from @t;
-- The result
select
dh.[Date],
dh.[Hour],
PersonId = p.Id,
c.Category,
l.Value
from
@PersonsIds p cross join @Categories c cross join @DayHours dh
left join @Latest l on l.[Date] = dh.[Date] and l.[Hour] = dh.[Hour] and l.PersonId = p.Id and l.Category = c.Category
order by
[Date], [Hour], PersonId, Category;
Edit (1):
OK. To carry the previous values forward into the gaps, replace the final select statement with this one:
select top 1 with ties
dh.[Date],
dh.[Hour],
PersonId = p.Id,
c.Category,
l.Value
from
@PersonsIds p cross join @Categories c cross join @DayHours dh
left join @Latest l
on (l.[Date] = dh.[Date] and l.[Hour] <= dh.[Hour] or l.[Date] < dh.[Date])
and l.PersonId = p.Id and l.Category = c.Category
order by
row_number()
over (partition by dh.[Date], dh.[Hour], p.Id, c.Category
order by l.[Date] desc, l.[Hour] desc);
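SQLite has no `TOP 1 WITH TIES`, but the same idea can be tried there by filtering on `row_number() = 1` in an outer query. A sketch using Python's `sqlite3`, assuming SQLite 3.25 or later (needed for window functions); the table contents are a tiny hypothetical sample with a gap at hour 2.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Latest (Date TEXT, Hour INTEGER, PersonId INTEGER, Category TEXT, Value REAL);
    CREATE TABLE DayHours (Date TEXT, Hour INTEGER);
    CREATE TABLE PersonsIds (Id INTEGER);
    CREATE TABLE Categories (Category TEXT);
    INSERT INTO Latest VALUES
        ('2015-07-02', 1, 2303, 'CatA', 9.9),
        ('2015-07-02', 3, 2303, 'CatA', 8.0);
    INSERT INTO DayHours VALUES ('2015-07-02', 1), ('2015-07-02', 2), ('2015-07-02', 3);
    INSERT INTO PersonsIds VALUES (2303);
    INSERT INTO Categories VALUES ('CatA');
""")

# Join every date-hour to all earlier-or-equal latest records, then keep the newest one.
rows = conn.execute("""
    SELECT Date, Hour, PersonId, Category, Value FROM (
        SELECT dh.Date, dh.Hour, p.Id AS PersonId, c.Category, l.Value,
               row_number() OVER (
                   PARTITION BY dh.Date, dh.Hour, p.Id, c.Category
                   ORDER BY l.Date DESC, l.Hour DESC) AS rn
        FROM PersonsIds p CROSS JOIN Categories c CROSS JOIN DayHours dh
        LEFT JOIN Latest l
          ON (l.Date = dh.Date AND l.Hour <= dh.Hour OR l.Date < dh.Date)
         AND l.PersonId = p.Id AND l.Category = c.Category
    )
    WHERE rn = 1
    ORDER BY Date, Hour, PersonId, Category
""").fetchall()
```

Hour 2 has no record of its own, so it picks up the 9.9 from hour 1. Note that the join fans out to every earlier record before ranking, which is why this form gets expensive on a month of data.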
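Edit (2): Let's try collecting the Cartesian product in a table with a clustered index on PersonId, Category, [Date], [Hour].
Then update the table, dragging the previous value down into the rows that have no reading of their own: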
declare @Result table (
[Date] date,
[Hour] tinyint,
PersonId int,
Category varchar(4),
Value decimal(3,1),
primary key (PersonId, Category, [Date], [Hour]) -- Important !!!
)
insert into @Result
select
dh.[Date],
dh.[Hour],
PersonId = p.Id,
c.Category,
l.Value
from
@PersonsIds p cross join @Categories c cross join @DayHours dh
left join @Latest l on l.[Date] = dh.[Date] and l.[Hour] = dh.[Hour] and l.PersonId = p.Id and l.Category = c.Category
order by
[Date], [Hour], PersonId, Category;
declare @PersonId int;
declare @Category varchar(4);
declare @Value decimal(3,1);
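-- "Quirky update": carries @Value down the rows in clustered-key order.
-- SQL Server does not guarantee this row-processing order.
update @Result set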
@Value = Value = isnull(Value, case when @PersonId = PersonId and @Category = Category then @Value end),
@PersonId = PersonId,
@Category = Category;
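For better performance, consider replacing the table variables with temporary tables and adding indexes based on the query plan's suggestions.
Answer 1 (score: 0)
If I've understood correctly, this should give you the result you want.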
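The variable-carrying UPDATE in Edit (2) only works if rows are visited in (PersonId, Category, [Date], [Hour]) order. A Python sketch of the intended logic makes that ordering dependency explicit; `fill_forward` and the sample rows are hypothetical.

```python
def fill_forward(rows):
    """rows: (person_id, category, date, hour, value) tuples; value may be None.

    Walk the rows in clustered-key order and replace a None with the previous
    value, but only while still inside the same person and category, mirroring
    isnull(Value, case when same key then @Value end).
    """
    out = []
    prev_key, prev_value = None, None
    for person_id, category, day, hour, value in sorted(rows, key=lambda r: r[:4]):
        if value is None and (person_id, category) == prev_key:
            value = prev_value
        out.append((person_id, category, day, hour, value))
        prev_key, prev_value = (person_id, category), value
    return out


sample = [
    (2303, "CatA", "2015-07-02", 1, 9.9),
    (2303, "CatA", "2015-07-02", 2, None),
    (2303, "CatA", "2015-07-02", 3, 8.0),
    (2303, "CatB", "2015-07-02", 1, None),
    (4307, "CatA", "2015-07-02", 1, None),
    (2303, "CatA", "2015-07-02", 4, None),
]
filled = fill_forward(sample)
```

The explicit `sorted` call plays the role the clustered index is expected to play in SQL Server; without it, the last row (hour 4) would be processed out of sequence.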
select st.Date,
case when hour =1 then NULL
else hour
end as hour
,st.Person,st.Category,
(select status from reading qualify row_number() over (partition by personid
order by status desc)=1)
from structure;
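Answer 2 (score: 0)
You can do this in SQL, but it will be slow, because for every person, category, day and hour you have to look up the latest entry for that person and category up to that point. Think of the process: take a record in your big table, find all the matching statuses, sort them, find the latest one and pick its value. That has to be done for every single record in your big table.
A better approach may be to write a program in a general-purpose programming language that simply retrieves all the data and collects the results with a control-break algorithm.
However, let's see how it can be done in SQL.
One problem is SQL Server's poor date/time support. We want to compare date plus hour, which is easiest with strings in 'yyyymmddhh' format, e.g. '2015101923' < '2015102001'. In your big table you have a date and an hour; in the status table you have a datetime. Here is how to get the required strings: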
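A control-break algorithm processes records sorted by a key and does its per-group work whenever the key changes. A minimal Python sketch of that idea for this problem, assuming the readings arrive as (timestamp, person, category, value) tuples; in practice the sort would be an ORDER BY in the query that retrieves the data. `hourly_status` is a hypothetical name, and it only emits groups that have at least one reading.

```python
from datetime import datetime, timedelta


def hourly_status(readings, start, hours):
    """Yield (hour_start, person, category, latest value at or before hour_start)."""

    def emit(person, category, series):
        # Walk the output hours and the group's sorted readings in step.
        i, latest = 0, None
        for n in range(hours):
            hour_start = start + timedelta(hours=n)
            while i < len(series) and series[i][0] <= hour_start:
                latest = series[i][1]
                i += 1
            yield hour_start, person, category, latest

    current_key, series = None, []
    for ts, person, category, value in sorted(readings, key=lambda r: (r[1], r[2], r[0])):
        if (person, category) != current_key:  # control break: the group changed
            if current_key is not None:
                yield from emit(*current_key, series)
            current_key, series = (person, category), []
        series.append((ts, value))
    if current_key is not None:  # flush the final group
        yield from emit(*current_key, series)


readings = [
    (datetime(2015, 7, 2, 1, 25), 2303, "CatA", 8.2),
    (datetime(2015, 7, 2, 3, 35), 2303, "CatA", 8.0),
    (datetime(2015, 7, 2, 2, 30), 4307, "CatA", 8.7),
]
out = list(hourly_status(readings, datetime(2015, 7, 2, 1), 4))
```

Each reading is touched once after the sort, so the cost grows with the number of readings plus output rows rather than with their product.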
convert(varchar(8), bigtable.calendarday, 112) +
right('0' + convert(varchar(2), bigtable.hour), 2)
and
convert(varchar(8), status.timestamp, 112) +
right('0' + convert(varchar(2), datepart(hour, status.timestamp)), 2)
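The same 'yyyymmddhh' key can be sketched in Python to show why it works: both parts are fixed-width and zero-padded, so string order equals chronological order. The function names are hypothetical; style 112 in `convert` is SQL Server's ISO `yyyymmdd` format.

```python
from datetime import date, datetime


def key_from_day_and_hour(day: date, hour: int) -> str:
    # Like convert(varchar(8), calendarday, 112) + right('0' + convert(varchar(2), hour), 2)
    return day.strftime("%Y%m%d") + f"{hour:02d}"


def key_from_timestamp(ts: datetime) -> str:
    # Like convert(varchar(8), timestamp, 112) + the zero-padded datepart(hour, timestamp)
    return ts.strftime("%Y%m%d%H")


print(key_from_day_and_hour(date(2015, 10, 19), 23))         # 2015101923
print(key_from_timestamp(datetime(2015, 10, 20, 1, 5)))      # 2015102001
```

The key drops minutes, so a reading at 03:35 and the hour 3 slot share the same key.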
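Since this key, together with person and category, is the main criterion for looking up records, you may want to make it a computed column and add an index on (person + category + dayhourkey) in both tables.
You can select from the big table and get the status value in a subquery. To get the latest matching record, order by timestamp and limit the result to 1 record.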
select
personid,
calendarday,
hour,
category,
(
select top 1 value
from status s
where s.personid = b.personid
and s.category = b.category
and convert(varchar(8), s.timestamp, 112) + right('0' + convert(varchar(2), datepart(hour, s.timestamp)), 2) <=
convert(varchar(8), b.calendarday, 112) + right('0' + convert(varchar(2), b.hour), 2)
order by s.timestamp desc
) as value
from bigtable b;
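SQLite, unlike SQL Server, accepts `ORDER BY ... LIMIT 1` in a subquery, so the correlated-subquery approach can be tried there directly. A sketch using Python's `sqlite3`, with hypothetical `status` and `bigtable` contents and `strftime`/`printf` building the same 'yyyymmddhh' key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status (personid INTEGER, category TEXT, timestamp TEXT, value REAL);
    CREATE TABLE bigtable (personid INTEGER, calendarday TEXT, hour INTEGER, category TEXT);
    INSERT INTO status VALUES
        (2303, 'CatA', '2015-07-02 01:25:00', 8.2),
        (2303, 'CatA', '2015-07-02 03:35:00', 8.0);
    INSERT INTO bigtable VALUES
        (2303, '2015-07-02', 1, 'CatA'),
        (2303, '2015-07-02', 2, 'CatA'),
        (2303, '2015-07-02', 4, 'CatA');
""")

# For each big-table row, fetch the newest status whose hour key is <= the row's key.
rows = conn.execute("""
    SELECT b.personid, b.calendarday, b.hour, b.category,
           (SELECT s.value
              FROM status s
             WHERE s.personid = b.personid
               AND s.category = b.category
               AND strftime('%Y%m%d%H', s.timestamp)
                   <= replace(b.calendarday, '-', '') || printf('%02d', b.hour)
             ORDER BY s.timestamp DESC
             LIMIT 1) AS value
      FROM bigtable b
     ORDER BY b.calendarday, b.hour
""").fetchall()
```

With `<=` on the hour key, the 01:25 reading already counts for hour 1, whereas the question expects Null there; comparing with `<` instead gives the state as of hh:00.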