Question

I have three tables that receive new information from three data sources throughout the day.

Table A     Table B     Table C
5, 8:00     J, 8:00     3, 8:00
6, 8:01     K, 8:02     8, 8:04
4, 8:03
9, 8:06

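At the end of the day, I want to process the data in chronological order, and I need the latest row from each of the three tables every time any of the three timestamps changes. The result I want is: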

Table A     Table B     Table C     *Data As of*
5, 8:00     J, 8:00     3, 8:00     *8:00*
6, 8:01     J, 8:00     3, 8:00     *8:01*
6, 8:01     K, 8:02     3, 8:00     *8:02*
4, 8:03     K, 8:02     3, 8:00     *8:03*
4, 8:03     K, 8:02     8, 8:04     *8:04*
9, 8:06     K, 8:02     8, 8:04     *8:06*

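I'm currently dumping the 3 queries into 3 DataTables. Then I iterate over all three at once, always advancing the one with the earliest of the three current timestamps. This works, but it's a bit clunky. One table gets about 3 million records a day, one about 200, and one only a handful. Sometimes I process 20 days of data at once. Any ideas on the best approach?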
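For reference, the approach described above (walk all three result sets at once, always advancing the one with the earliest timestamp) is a k-way merge. A minimal Python sketch, assuming each table arrives already sorted by time as hypothetical (value, timestamp) pairs, with zero-padded "HH:MM" strings standing in for real datetimes; the function names are illustrative only. Because heapq.merge and groupby are lazy, the inputs could be streamed from cursors instead of being loaded into DataTables first.

```python
import heapq
from itertools import groupby
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical row shape: (value, timestamp), timestamps as zero-padded "HH:MM" strings.
Row = Tuple[str, str]


def _tag(index: int, rows: Iterable[Row]) -> Iterator[Tuple[str, int, str]]:
    """Turn (value, ts) rows into (ts, table_index, value) events."""
    for value, ts in rows:
        yield ts, index, value


def as_of_snapshots(*tables: Iterable[Row]) -> Iterator[Tuple[str, List[Optional[Row]]]]:
    """Yield (timestamp, latest row of each table) once per distinct timestamp.

    Every input must already be sorted by timestamp (e.g. ORDER BY time).
    """
    # k-way merge of the tagged streams, ordered by timestamp only.
    events = heapq.merge(*(_tag(i, t) for i, t in enumerate(tables)), key=lambda e: e[0])
    latest: List[Optional[Row]] = [None] * len(tables)  # None until a table has produced a row
    # Apply every update that shares a timestamp before emitting a single snapshot.
    for ts, group in groupby(events, key=lambda e: e[0]):
        for _, index, value in group:
            latest[index] = (value, ts)
        yield ts, list(latest)


if __name__ == "__main__":
    a = [("5", "08:00"), ("6", "08:01"), ("4", "08:03"), ("9", "08:06")]
    b = [("J", "08:00"), ("K", "08:02")]
    c = [("3", "08:00"), ("8", "08:04")]
    for ts, snapshot in as_of_snapshots(a, b, c):
        print(ts, snapshot)
```

With the sample data from the question this yields six snapshots, one per distinct timestamp, matching the desired result table.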

Answer 1

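There's at least one way to do this. It might need some performance analysis, and it assumes you have created a table of distinct times. If a time table at minute resolution (or whatever resolution your tables use) isn't enough, you could run "insert into #time select distinct time ..." against each of the tables before running the query, but that could also be quite heavy.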

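-- assumes a table "time" holding one row per distinct timestamp (see above)
select distinct
  a.id as a_id,
  a.time as a_time,
  b.id as b_id,
  b.time as b_time,
  c.id as c_id,
  c.time as c_time
from
  time t

  -- latest row of tablea at or before t.time (NULLs if there is none yet)
  outer apply (
    select top 1 id, time
    from tablea a
    where a.time <= t.time
    order by a.time desc
  ) a

  -- same lookup against tableb
  outer apply (
    select top 1 id, time
    from tableb b
    where b.time <= t.time
    order by b.time desc
  ) b

  -- same lookup against tablec
  outer apply (
    select top 1 id, time
    from tablec c
    where c.time <= t.time
    order by c.time desc
  ) c

-- DISTINCT collapses times at which none of the three latest rows changed
order by
  a_time,
  b_time,
  c_time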

SQL Fiddle: http://sqlfiddle.com/#!3/de7ae/6
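The OUTER APPLY lookups above amount to: for every distinct time, take the last row of each table whose time is <= that time, or nothing if no such row exists yet. As a rough illustration of that logic outside the database (not the answerer's code), here is a Python sketch assuming the same hypothetical time-sorted (value, timestamp) lists; the binary search over each table's sorted time column plays the role an index on the time column would play for TOP 1 ... ORDER BY time DESC.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical row shape: (value, timestamp), each table sorted by timestamp.
Row = Tuple[str, str]


def as_of_lookup(*tables: List[Row]) -> List[Tuple[str, List[Optional[Row]]]]:
    """For every distinct time, find each table's latest row at or before it."""
    # The distinct times, i.e. what the "#time" table would hold.
    all_times = sorted({ts for table in tables for _, ts in table})
    # Sorted time column of each table; stands in for an index on "time".
    time_columns = [[ts for _, ts in table] for table in tables]
    result = []
    for t in all_times:
        snapshot: List[Optional[Row]] = []
        for table, times in zip(tables, time_columns):
            pos = bisect_right(times, t)  # number of rows with time <= t
            # Like TOP 1 ... ORDER BY time DESC; None plays the part of OUTER APPLY's NULLs.
            snapshot.append(table[pos - 1] if pos else None)
        result.append((t, snapshot))
    return result
```

Because the times are taken from the tables themselves rather than from a per-minute calendar, every time in the list changes at least one of the three rows, which is what the DISTINCT in the query achieves when a coarser time table is used.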

Answer 2

Try the following script (SQL Server 2012+):

-- Step #1: create a table to store every distinct timestamp (TS)
CREATE TABLE #AllTS (TS DATETIME NOT NULL PRIMARY KEY) -- change the TS column's data type to match your source tables

-- Step #2: insert the distinct TS values (UNION removes duplicates)
INSERT  #AllTS
SELECT  TS
FROM (
    SELECT TS FROM dbo.A
    UNION SELECT TS FROM dbo.B
    UNION SELECT TS FROM dbo.C
) x(TS)

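-- Step #3: for every source table, use the query below to generate the requested result set
SELECT  y.AsOfTS,
        MAX(y.Col1) OVER(PARTITION BY y.GroupID) AS Col1, -- carries the latest Col1 forward
        MAX(y.TS)   OVER(PARTITION BY y.GroupID) AS TS
FROM (
    -- GroupID increases by 1 at every timestamp where dbo.A has a row, so each group
    -- holds one A row followed by the timestamps at which A has no row
    SELECT  x.TS AS AsOfTS, a.Col1, a.TS,
            SUM(CASE WHEN a.TS IS NOT NULL THEN 1 ELSE 0 END) OVER(ORDER BY x.TS) AS GroupID
    FROM    #AllTS x LEFT JOIN dbo.A a ON x.TS = a.TS
) y
ORDER BY y.AsOfTS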

Note 1: To speed up the query above, try creating an index on the TS column of each source table. For example:

CREATE INDEX IX_A_TS_#_Col1 ON dbo.A(TS) INCLUDE (Col1)

Note 2: Also, to improve the performance of the previous query, you can test different join hints:

#AllTS x LEFT HASH JOIN dbo.A a ON x.TS = a.TS -- could be useful when the source tables are "big"

or

#AllTS x LEFT MERGE JOIN dbo.A a ON x.TS = a.TS -- a join hint must follow an explicit join type such as LEFT

Demo
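Step #3 is a gaps-and-islands fill-forward: the running SUM produces a GroupID that increases at every timestamp where the source table has a row, so each group contains exactly one real row followed by the timestamps that have none, and MAX over the group copies that row forward. A Python sketch of the same idea for a single table, assuming (hypothetically) that the table is a dict from unique timestamp to value and that all_ts is the sorted list of distinct timestamps from Step #2:

```python
from itertools import accumulate
from typing import Dict, List, Optional, Tuple


def fill_forward(
    all_ts: List[str], table: Dict[str, str]
) -> List[Tuple[str, Optional[str], Optional[str]]]:
    """Return (as-of TS, Col1, TS) per timestamp, carrying the last matched row forward."""
    # #AllTS x LEFT JOIN dbo.A a ON x.TS = a.TS  (None where A has no row)
    joined = [(ts, table.get(ts)) for ts in all_ts]
    # SUM(CASE WHEN a.TS IS NOT NULL THEN 1 ELSE 0 END) OVER(ORDER BY x.TS)
    group_ids = list(accumulate(1 if value is not None else 0 for _, value in joined))
    # MAX(...) OVER(PARTITION BY GroupID): each group has at most one non-null row
    group_row: Dict[int, Tuple[str, str]] = {}
    for (ts, value), gid in zip(joined, group_ids):
        if value is not None:
            group_row[gid] = (value, ts)
    result = []
    for (ts, _), gid in zip(joined, group_ids):
        row = group_row.get(gid)  # group 0 (before the first matched row) stays None
        result.append((ts, row[0] if row else None, row[1] if row else None))
    return result
```

Running it once per source table and joining the three outputs on the as-of timestamp gives the combined result from the question.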

What is the best way to match three DB tables on unique timestamps?
