Increment a counter based on a partition

Time: 2014-12-18 09:30:24

Tags: sql sql-server sql-server-2008 tsql partitioning

Suppose I have a table in SQL Server 2008 that looks like this:

UID    Std_RecordID
-------------------
1      10
2      10
3      12
4      10
5      10
6      10
7      12

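Essentially there will be a run of 10s followed by a 12. The 10 could also be any value in the list 14, 50, 21, 24, 31, in which case the 12 would be the corresponding value in the list 16, 52, 23, 26, 33. A value that is 2 higher than the ones before it pretty much marks the end of a set.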

I need to increment a counter each time a new set starts.

I know I can get a counter that increments and resets like this:

SELECT  ROW_NUMBER() OVER (PARTITION BY Std_RecordId ORDER BY UID) AS Ind,
        UID
FROM    @inputTable;

But that is not what I want, because it produces the following result:

UID    Std_RecordId    Ind
---------------------------
1      10              1
2      10              2
3      12              1
4      10              1
5      10              2
6      10              3
7      12              1

I need it to produce something like this instead:

UID    Std_RecordId    Ind
------------------------------
1      10              1
2      10              1
3      12              1
4      10              2
5      10              2
6      10              2
7      12              2

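How can I accomplish this without resorting to iteration? I am trying to remove the iteration from the procedure I am working on, because it is the slowest part of the process (which does a lot of other things besides this).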

1 answer:

Answer 0 (score: 2)

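You can reduce the problem to "how many times has a record with a value of 12 appeared before the current record?", and then add one to the result. To get that count, you can use OUTER APPLY: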

DECLARE @InputTable TABLE (UID INT, Std_RecordId INT);
INSERT @InputTable (UID, Std_RecordId)
VALUES (1, 10), (2, 10), (3, 12), (4, 10), (5, 10), (6, 10), (7, 12);

SELECT  i.UID,
        i.Std_RecordId,
        t.Ind
FROM    @InputTable AS i
        OUTER APPLY
        (   SELECT  Ind = COUNT(*) + 1
            FROM    @InputTable AS t
            WHERE   t.Std_RecordId = 12
            AND     t.UID < i.UID
        ) AS t;
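The question notes that the end-of-set marker is not always 12. The same counting idea can be extended by matching any marker value with IN. The following is only a minimal sketch: it assumes the marker values are exactly the ones listed in the question (12, 16, 52, 23, 26, 33), and the sample rows are made up for illustration.

```sql
-- Hypothetical sample data mixing several set types (10/12, 14/16, 50/52).
DECLARE @InputTable TABLE (UID INT PRIMARY KEY, Std_RecordId INT);
INSERT @InputTable (UID, Std_RecordId)
VALUES (1, 10), (2, 10), (3, 12), (4, 14), (5, 14), (6, 16), (7, 50), (8, 52);

SELECT  i.UID,
        i.Std_RecordId,
        t.Ind
FROM    @InputTable AS i
        OUTER APPLY
        (   -- Count every end-of-set marker that precedes the current row.
            SELECT  Ind = COUNT(*) + 1
            FROM    @InputTable AS t
            WHERE   t.Std_RecordId IN (12, 16, 52, 23, 26, 33) -- assumed marker list
            AND     t.UID < i.UID
        ) AS t
ORDER BY i.UID;
-- Ind for UIDs 1 to 8: 1, 1, 1, 2, 2, 2, 3, 3
```

If the marker list changes often, it could be kept in a small lookup table and joined instead of being hard-coded.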

Edit

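To elaborate on what I said in the comments about using a temporary table rather than a table variable: in my tests, the exact same query on the exact same data consistently ran faster against the temporary table.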

The script I ran was:

DECLARE @InputTable TABLE (UID INT, Std_RecordId INT);

INSERT @InputTable (UID, Std_RecordId)
SELECT  TOP 50000
        ROW_NUMBER() OVER(ORDER BY a.object_id),
        CEILING(RAND(CHECKSUM(NEWID())) * 20)
FROM    sys.all_objects a
        CROSS JOIN sys.all_objects b;

CREATE TABLE #InputTable (UID INT, Std_RecordId INT);
INSERT #InputTable (UID, Std_RecordId)
SELECT  UID, Std_RecordId
FROM    @InputTable;

SET STATISTICS TIME ON;

SELECT  i.UID,
        i.Std_RecordId,
        t.Ind
FROM    @InputTable AS i
        OUTER APPLY
        (   SELECT  Ind = COUNT(*) + 1
            FROM    @InputTable AS t
            WHERE   t.Std_RecordId = 12
            AND     t.UID < i.UID
        ) AS t;

SELECT  i.UID,
        i.Std_RecordId,
        t.Ind
FROM    #InputTable AS i
        OUTER APPLY
        (   SELECT  Ind = COUNT(*) + 1
            FROM    #InputTable AS t
            WHERE   t.Std_RecordId = 12
            AND     t.UID < i.UID
        ) AS t;

SET STATISTICS TIME OFF;

DROP TABLE #InputTable;

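The gap grew as I increased the sample size. For 10,000 rows (I got bored of waiting beyond that), the table variable consistently took about 7.9 seconds, whereas the temporary table averaged 0.4 seconds. I ran it once with 50,000 rows: the table variable took 190 seconds and the temporary table took 4.6 seconds, so the difference is substantial.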

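Another advantage is that a temporary table can be indexed. However, the best performance I found came from creating a second table that records the positions of your markers, then using it to assign a rank to the rows of the original table: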

DECLARE @InputTable TABLE (UID INT, Std_RecordId INT);

INSERT @InputTable (UID, Std_RecordId)
SELECT  TOP 1000000
        ROW_NUMBER() OVER(ORDER BY a.object_id),
        CEILING(RAND(CHECKSUM(NEWID())) * 20)
FROM    sys.all_objects a
        CROSS JOIN sys.all_objects b;

DECLARE @Counter TABLE (UID INT PRIMARY KEY, Ind INT NOT NULL);
INSERT @Counter
SELECT  UID, ROW_NUMBER() OVER(ORDER BY UID) + 1
FROM    @InputTable
WHERE   Std_RecordId = 12;

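SELECT  i.UID,
        i.Std_RecordId,
        Ind = ISNULL(t.Ind, 1)
FROM    @InputTable AS i
        OUTER APPLY
        (   SELECT  TOP 1 Ind
            FROM    @Counter AS t
            WHERE   t.UID < i.UID
            ORDER BY t.UID DESC -- pick the closest preceding marker; TOP without ORDER BY is not deterministic
        ) AS t
ORDER BY i.UID;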

For 50,000 rows this consistently ran in under a second, and even with 1 million rows it ran in 15-20 seconds.
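Since temporary tables can be indexed, the marker-lookup approach can also be written with temporary tables and primary keys instead of table variables. This is only a sketch on the 7-row sample from the question; the name #Counter and the choice of a clustered primary key on UID are assumptions, and no timings are claimed for it.

```sql
-- Temporary-table variant; the primary keys create clustered indexes on UID.
CREATE TABLE #InputTable (UID INT NOT NULL PRIMARY KEY, Std_RecordId INT NOT NULL);
INSERT #InputTable (UID, Std_RecordId)
VALUES (1, 10), (2, 10), (3, 12), (4, 10), (5, 10), (6, 10), (7, 12);

-- One row per marker; Ind is the counter value for rows that follow that marker.
CREATE TABLE #Counter (UID INT NOT NULL PRIMARY KEY, Ind INT NOT NULL);
INSERT #Counter (UID, Ind)
SELECT  UID, ROW_NUMBER() OVER (ORDER BY UID) + 1
FROM    #InputTable
WHERE   Std_RecordId = 12;

SELECT  i.UID,
        i.Std_RecordId,
        Ind = ISNULL(t.Ind, 1)       -- rows before the first marker belong to set 1
FROM    #InputTable AS i
        OUTER APPLY
        (   SELECT  TOP 1 c.Ind
            FROM    #Counter AS c
            WHERE   c.UID < i.UID
            ORDER BY c.UID DESC      -- closest preceding marker
        ) AS t
ORDER BY i.UID;
-- Ind for UIDs 1 to 7: 1, 1, 1, 2, 2, 2, 2

DROP TABLE #Counter;
DROP TABLE #InputTable;
```

With the index on #Counter.UID, each lookup can seek to the nearest preceding marker rather than scanning every marker row.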