I have three tables that receive new information throughout the day from three data sources.
Table A    Table B    Table C
5, 8:00    J, 8:00    3, 8:00
6, 8:01    K, 8:02    8, 8:04
4, 8:03
9, 8:06
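At the end of the day I want to process the data in chronological order, and I need the latest three pieces of information each time any one of the three timestamps changes. The result I want is: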
Table A    Table B    Table C    *Data As of*
5, 8:00    J, 8:00    3, 8:00    *8:00*
6, 8:01    J, 8:00    3, 8:00    *8:01*
6, 8:01    K, 8:02    3, 8:00    *8:02*
4, 8:03    K, 8:02    3, 8:00    *8:03*
4, 8:03    K, 8:02    8, 8:04    *8:04*
9, 8:06    K, 8:02    8, 8:04    *8:06*
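I am currently dumping 3 queries into 3 DataTables. I then iterate over all three at once, always taking the earliest of the three timestamps. This works, but it is a bit clumsy. One table has about 3 million records per day, one has about 200, and one has a handful. Sometimes I process 20 days of data at a time. Any thoughts on the best approach?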
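The iteration described above is essentially a k-way merge of sorted streams. As a minimal sketch only (Python is used here purely for illustration; the function name `merge_as_of` and the `(value, timestamp)` row shape are assumptions, not part of the original code), it could look like this:

```python
import heapq


def merge_as_of(*tables):
    """Yield (latest row of each table..., as_of) whenever any timestamp changes.

    Assumptions: each table is a list of (value, timestamp) tuples already
    sorted by timestamp, and timestamps compare correctly (zero-padded
    strings here; datetime objects would work the same way).
    """
    # Tag every row with its table index so we know which slot to update.
    tagged = (
        [(ts, idx, value) for value, ts in rows]
        for idx, rows in enumerate(tables)
    )
    latest = [None] * len(tables)  # None until a table has produced a row
    pending_ts = None
    # heapq.merge walks the sorted inputs lazily, always taking the earliest row.
    for ts, idx, value in heapq.merge(*tagged):
        # Emit one snapshot once all rows sharing the previous timestamp are applied.
        if pending_ts is not None and ts != pending_ts:
            yield (*latest, pending_ts)
        latest[idx] = (value, ts)
        pending_ts = ts
    if pending_ts is not None:
        yield (*latest, pending_ts)


# Sample data from the question, with zero-padded times.
table_a = [(5, "08:00"), (6, "08:01"), (4, "08:03"), (9, "08:06")]
table_b = [("J", "08:00"), ("K", "08:02")]
table_c = [(3, "08:00"), (8, "08:04")]

for snapshot in merge_as_of(table_a, table_b, table_c):
    print(snapshot)
```

Because the merge is lazy, only one pending row per table is held at a time, which matters when one of the inputs has millions of rows.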
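Answer 0 (score: 2)
Here is at least one way to do it. Performance may need some analysis, and this assumes you create a table that holds the distinct times. If minute-level granularity (or whatever granularity your tables have) is not enough for that, you could run "insert into #time select distinct time ..." against each of the tables before running this, but that could be fairly heavy as well.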
select distinct
    a.id as a_id,
    a.time as a_time,
    b.id as b_id,
    b.time as b_time,
    c.id as c_id,
    c.time as c_time
from
    time t  -- one row per distinct timestamp
    -- for each timestamp, pick the latest row of each table at or before it
    outer apply (
        select top 1 id, time
        from tablea a
        where a.time <= t.time
        order by a.time desc
    ) a
    outer apply (
        select top 1 id, time
        from tableb b
        where b.time <= t.time
        order by b.time desc
    ) b
    outer apply (
        select top 1 id, time
        from tablec c
        where c.time <= t.time
        order by c.time desc
    ) c
order by
    a_time,
    b_time,
    c_time
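To make the semantics of the query above easy to check by hand, here is a rough plain-Python sketch of the same "latest row at or before each distinct time" lookup. It is only an illustration: the name `snapshots_by_lookup` and the `(value, timestamp)` row shape are assumptions, and a binary search stands in for `top 1 ... order by time desc`.

```python
from bisect import bisect_right


def snapshots_by_lookup(tables):
    """For every distinct timestamp, look up each table's latest row at or before it.

    Assumes each table is a list of (value, timestamp) tuples sorted by timestamp.
    """
    time_lists = [[ts for _, ts in rows] for rows in tables]
    # Equivalent of the table of distinct times.
    all_times = sorted({ts for times in time_lists for ts in times})
    result = []
    for ts in all_times:
        row = []
        for rows, times in zip(tables, time_lists):
            # Number of rows with timestamp <= ts; the last of them is the latest.
            pos = bisect_right(times, ts)
            # Like OUTER APPLY, keep the timestamp even if this table has no row yet.
            row.append(rows[pos - 1] if pos else None)
        result.append((*row, ts))
    return result
```

A table with no row at or before a given time contributes `None`, which mirrors the NULL columns that `outer apply` produces in that case.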
Answer 1 (score: 0)
Try the following script (SQL Server 2012+):
-- Step #1: create a table to store all distinct TS values
CREATE TABLE #AllTS (TS DATETIME NOT NULL PRIMARY KEY) -- Change the type of the TS column to the proper data type
-- Step #2: insert the distinct (UNION) TS values
INSERT #AllTS
SELECT TS
FROM (
    SELECT TS FROM dbo.A
    UNION SELECT TS FROM dbo.B
    UNION SELECT TS FROM dbo.C
) x(TS)
-- Step #3: for every source table, use the query below to generate the requested result set
SELECT MAX(y.Col1) OVER (PARTITION BY GroupID) AS Col1,
       MAX(y.TS) OVER (PARTITION BY GroupID) AS TS
FROM (
    -- GroupID is a running count of matched rows, so it increases each time the source table has a new row
    SELECT a.Col1, a.TS, SUM(CASE WHEN a.TS IS NOT NULL THEN 1 ELSE 0 END) OVER (ORDER BY x.TS) AS GroupID
    FROM #AllTS x LEFT JOIN dbo.A a ON x.TS = a.TS
) y
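The grouping trick in Step #3 can be hard to see at first glance, so here is a small plain-Python sketch of the same idea. The name `forward_fill` is hypothetical, and the sketch assumes at most one row per timestamp in the source table: a running count of matched rows acts as the group id, and every timestamp in a group takes the single matched row of that group.

```python
from itertools import accumulate


def forward_fill(all_ts, rows):
    """Carry each source row forward over the following unmatched timestamps.

    all_ts: sorted list of every distinct timestamp (the #AllTS table).
    rows:   list of (value, timestamp) tuples for one source table.
    """
    by_ts = {ts: (value, ts) for value, ts in rows}
    # LEFT JOIN: None where the source table has no row at that timestamp.
    joined = [by_ts.get(ts) for ts in all_ts]
    # Running count of matched rows, like SUM(CASE ...) OVER (ORDER BY x.TS).
    group_ids = list(accumulate(1 if row is not None else 0 for row in joined))
    # Each group contains exactly one matched row, which MAX(...) picks out in SQL.
    carrier = {gid: row for gid, row in zip(group_ids, joined) if row is not None}
    # Group 0 (timestamps before the first source row) has no carrier, so None.
    return [carrier.get(gid) for gid in group_ids]
```

Calling it once per source table and zipping the results with `all_ts` gives the combined "data as of" rows.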
Note 1: You should try to speed up the query above with an index on the TS column of each source table. For example:
CREATE INDEX IX_A_TS_#_Col1 ON dbo.A(TS) INCLUDE (Col1)
Note 2: Also, to improve the performance of the last query, you could test different join hints:
#AllTS x LEFT HASH JOIN dbo.A -- Could be useful when source tables are "big"
or
#AllTS x LEFT MERGE JOIN dbo.A