What is the most efficient way to group by day over overlapping date ranges (SQL Server 2008)?

Time: 2015-02-06 04:06:12

Tags: sql sql-server sql-server-2008 tsql query-optimization

Consider the following simplified table T1:

CREATE TABLE dbo.T1 (
    id        INTEGER       NOT NULL
    ,measure  NUMERIC(15,2) NOT NULL
    ,begin_dt DATE          NOT NULL
    ,end_dt   DATE          NOT NULL
);

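Assume that constraints/business logic ensure that, although each id can have multiple records, no single id has overlapping date ranges and no single id has gaps between its date ranges. For example: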

id   | measure |  begin_dt  |   end_dt
-----------------------------------------
1    |  100.00 | 2012-05-07 | 2012-05-30
1    |  200.00 | 2012-05-31 | 2013-10-11
1    |   50.00 | 2013-10-12 | 2013-10-13
1    |    0.00 | 2013-10-14 | 9999-12-31
2    | 1234.56 | 2002-02-25 | 9999-12-31
3    |    9.87 | 2014-01-31 | 2014-02-15
3    |   50.00 | 2014-02-16 | 2015-01-04
3    |    0.00 | 2015-01-05 | 9999-12-31
...
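Both the query below and the expected output rely on this invariant: with no overlaps, each id contributes at most one row on any given date, so COUNT(id) counts ids rather than rows. If the rule is enforced only by business logic and not by constraints, a quick check along these lines may be worth running first. This is a sketch that assumes begin_dt <= end_dt on every row; each query should return no rows when the invariant holds.

```sql
-- Sketch: look for violations of the "no overlaps, no gaps per id" assumption.
-- Each query should return zero rows if the invariant holds.

-- 1) Overlaps: a later row of the same id that starts before the earlier one ends.
SELECT a.id, a.begin_dt, a.end_dt, b.begin_dt AS b_begin_dt
FROM dbo.T1 AS a
JOIN dbo.T1 AS b
  ON  b.id = a.id
  AND b.begin_dt > a.begin_dt     -- b starts after a starts ...
  AND b.begin_dt <= a.end_dt;     -- ... but on or before the day a ends

-- 1b) Two rows of the same id with the same begin_dt (also an overlap).
SELECT id, begin_dt, COUNT(*) AS n
FROM dbo.T1
GROUP BY id, begin_dt
HAVING COUNT(*) > 1;

-- 2) Gaps: a row that has a later row for the same id,
--    but no row starting exactly one day after it ends.
SELECT a.id, a.end_dt
FROM dbo.T1 AS a
WHERE EXISTS (SELECT 1 FROM dbo.T1 AS b
              WHERE b.id = a.id AND b.begin_dt > a.end_dt)
  AND NOT EXISTS (SELECT 1 FROM dbo.T1 AS b
                  WHERE b.id = a.id
                    AND DATEDIFF(DAY, a.end_dt, b.begin_dt) = 1);
```

DATEDIFF is used instead of DATEADD(DAY, 1, end_dt) so that the 9999-12-31 sentinel never triggers a date overflow.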

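Now, my goal is to produce a result set with one record for each unique begin_dt in T1, along with the count of ids with a positive measure and the sum of their measure values, considering only ids whose begin_dt to end_dt range includes that date. So, something like this: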

    dt     | count_of_ids | sum_of_measure 
-------------------------------------------
2002-02-25 |      1       |   1234.56 
2012-05-07 |      2       |   1334.56 
2012-05-31 |      2       |   1434.56 
2013-10-12 |      2       |   1284.56 
2013-10-14 |      1       |   1234.56 
2014-01-31 |      2       |   1244.43 
2014-02-16 |      2       |   1284.56
2015-01-05 |      1       |   1234.56
... 
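To reproduce the example, the sample rows can be loaded into dbo.T1 with a script like this. It is a sketch that covers only the eight rows shown, not the elided remainder. Run against just these rows, the query below should return the same eight dates and values as the listing above, in descending date order because of its ORDER BY.

```sql
-- Load the eight sample rows from the question into dbo.T1
-- (assumes the CREATE TABLE above has already been run).
-- Multi-row VALUES requires SQL Server 2008 or later.
INSERT INTO dbo.T1 (id, measure, begin_dt, end_dt)
VALUES (1,  100.00, '2012-05-07', '2012-05-30'),
       (1,  200.00, '2012-05-31', '2013-10-11'),
       (1,   50.00, '2013-10-12', '2013-10-13'),
       (1,    0.00, '2013-10-14', '9999-12-31'),
       (2, 1234.56, '2002-02-25', '9999-12-31'),
       (3,    9.87, '2014-01-31', '2014-02-15'),
       (3,   50.00, '2014-02-16', '2015-01-04'),
       (3,    0.00, '2015-01-05', '9999-12-31');
```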

My current solution is basically this:

SELECT *
FROM (
    SELECT DISTINCT t1.begin_dt AS dt
    FROM dbo.T1 AS t1
) AS dt_s
CROSS APPLY (
    SELECT COUNT(t1.id)     AS count_of_ids
           ,SUM(t1.measure) AS sum_of_measure
    FROM dbo.T1 AS t1
    WHERE t1.measure > 0
          AND dt_s.dt BETWEEN t1.begin_dt AND t1.end_dt
) AS t1_x
ORDER BY dt_s.dt DESC;

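It takes about 3.5 minutes to run (on the real dataset, with ~10MM records, ~2,500 unique dates, and more fields, measures, and aggregations to process). I'm hoping there's a way to get that under 10 seconds or so.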
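One alternative that may be worth benchmarking is a "started minus ended" technique. It is not part of the original thread and has not been tested at the 10MM-row scale. Because there are only ~2,500 distinct dates, the idea is to collapse the big table into small per-date totals first, then evaluate each output date against those small sets. Since begin_dt <= end_dt on every row, the rows covering date d are exactly the rows with begin_dt <= d minus the rows with end_dt < d, and the same subtraction applies to the summed measure. SQL Server 2008 has no SUM() OVER (ORDER BY ...) running total (that arrived in SQL Server 2012), so the running totals here use CROSS APPLY over the small aggregated sets. The temp table names #started and #ended are illustrative. They are used instead of CTEs because SQL Server inlines CTEs rather than materializing them, so a correlated reference could re-aggregate T1 for every date.

```sql
-- Sketch: "started minus ended" per date, SQL Server 2008 compatible.
-- Step 1: collapse the big table into small per-date totals (one pass each).
SELECT begin_dt AS dt, COUNT(*) AS n, SUM(measure) AS m
INTO #started                        -- positive-measure rows grouped by start date
FROM dbo.T1
WHERE measure > 0
GROUP BY begin_dt;

SELECT end_dt AS dt, COUNT(*) AS n, SUM(measure) AS m
INTO #ended                          -- the same rows grouped by end date
FROM dbo.T1
WHERE measure > 0
GROUP BY end_dt;

-- Step 2: rows covering a date = rows started on/before it - rows ended before it.
SELECT dt_s.dt,
       ISNULL(s.n, 0) - ISNULL(e.n, 0) AS count_of_ids,
       ISNULL(s.m, 0) - ISNULL(e.m, 0) AS sum_of_measure
FROM (SELECT DISTINCT begin_dt AS dt FROM dbo.T1) AS dt_s   -- one row per begin_dt
CROSS APPLY (SELECT SUM(st.n) AS n, SUM(st.m) AS m
             FROM #started AS st
             WHERE st.dt <= dt_s.dt) AS s
CROSS APPLY (SELECT SUM(en.n) AS n, SUM(en.m) AS m
             FROM #ended AS en
             WHERE en.dt < dt_s.dt) AS e
ORDER BY dt_s.dt DESC;

DROP TABLE #started;
DROP TABLE #ended;
```

The expensive work is three passes over T1. Step 2 then compares at most a few thousand dates against a few thousand per-date totals instead of probing the 10MM-row table once per date. One behavioral difference: a date with no active positive-measure rows shows 0.00 for sum_of_measure here, where the original CROSS APPLY returns NULL.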

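I've tried other approaches (using UDFs, CTEs, etc.), but they all seem to end up with the same execution plan. I don't have much experience with optimization yet, so I'd really like to hear what others consider the best general approach here. Thanks in advance!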

1 answer:

Answer 0 (score: 0)

Try the following code:

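SELECT d.dt, COUNT(t2.id) AS count_of_ids, SUM(t2.measure) AS sum_of_measure
    FROM (SELECT DISTINCT begin_dt AS dt FROM dbo.T1) AS d  -- one row per date, avoids duplicate joins
    JOIN dbo.T1 AS t2 ON d.dt BETWEEN t2.begin_dt AND t2.end_dt
    WHERE t2.measure > 0                                    -- count and sum positive measures only
    GROUP BY d.dt;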

Performance can be improved noticeably with an index on begin_dt, end_dt that covers the id and measure columns. Hope this helps!
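For reference, an index of the kind this answer describes might look like the following. It is a sketch: the index name is made up, and whether it helps should be confirmed against the actual execution plan. In SQL Server, "covering" id and measure is normally done with INCLUDE, which stores those columns at the leaf level without making them part of the key.

```sql
-- Hypothetical index name. Key on the range columns; carry id and measure as
-- included (non-key) columns so queries touching only these four columns
-- can be answered from the index without lookups into the base table.
CREATE NONCLUSTERED INDEX IX_T1_begin_end
    ON dbo.T1 (begin_dt, end_dt)
    INCLUDE (id, measure);
```

Since the aggregates only consider rows with measure > 0, a filtered index (available from SQL Server 2008) with WHERE measure > 0 is another variant that may be worth trying.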