Get the label associated with the maximum value over a partition (SQL)

Date: 2012-04-27 18:00:51

Tags: tsql

I know there has to be a better way to do this, but my brain is dead today.

I have two tables:

Reference
Id         Label
1          Apple
2          Banana
3          Cherry  

Elements
Id    ReferenceId    P1   P2   Qty
1     1              1    2    8
2     2              2    3    14
3     1              3    2    1
4     3              2    1    6
5     3              1    2    3

I want to group them primarily by (P1, P2), but without depending on the order of P1 and P2, so that (1,2) and (2,1) map to the same group. That part is fine.

The other part is that I want the label with the largest SUM(Qty) for a given (P1, P2) pair. In other words, I want the result set to be:

 P1   P2  TotalQty  MostRepresentativeLabel
 1    2   17        Cherry
 2    3   15        Banana

All I could come up with is this horrible mess:

select a.endpoint1, a.endpoint2, b.totalTotal, a.mostRepresentativeLabelByQty
from
(
    -- total quantity per (unordered pair, label)
    select SUM(qty) as total
        ,case when (p1 < p2) then p1 else p2 end as endpoint1
        ,case when (p1 < p2) then p2 else p1 end as endpoint2
        ,reference.label as mostRepresentativeLabelByQty
    from elements inner join reference on elements.ReferenceId = reference.id
    group by case when (p1 < p2) then p1 else p2 end
        ,case when (p1 < p2) then p2 else p1 end
        ,label
) a inner join
(
    -- highest per-label total and overall total for each unordered pair
    select endpoint1, endpoint2, MAX(total) as highestTotal, SUM(total) as totalTotal
    from
    (
        select SUM(qty) as total
            ,case when (p1 < p2) then p1 else p2 end as endpoint1
            ,case when (p1 < p2) then p2 else p1 end as endpoint2
            ,reference.label as mostRepresentativeLabelByQty
        from elements inner join reference on elements.ReferenceId = reference.id
        group by case when (p1 < p2) then p1 else p2 end
            ,case when (p1 < p2) then p2 else p1 end
            ,label
    ) byLabel
    group by endpoint1, endpoint2
) b
on a.endpoint1 = b.endpoint1
and a.endpoint2 = b.endpoint2
and a.total = b.highestTotal

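Which... works... but I'm not convinced. This will eventually run against a much larger data set (200,000 rows or so), so I don't like this approach. Is there a simpler way to express "use the value from this column where some other column is maximized" that I'm completely blanking on?

(SQL Server 2008 R2, by the way.)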

1 answer:

Answer 0: (score: 1)

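I use the sum of the BINARY_CHECKSUM values of P1 and P2 to identify each group regardless of the order of P1 and P2. That sum is exposed under the alias BC, and partitioning on it lets the query find the label with the largest total in each group. Keep in mind that a sum of checksums is a hash-style key, so it is not strictly guaranteed to be unique for every distinct pair.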

DECLARE @Reference TABLE(ID INT, Label VARCHAR(10));
DECLARE @Elements TABLE(ID INT, ReferenceID INT, P1 INT, P2 INT, Qty INT);

INSERT INTO @Reference VALUES
(1,'Apple')
, (2,'Banana')
, (3,'Cherry');

INSERT INTO @Elements VALUES
(1,1,1,2,8)
, (2,2,2,3,14)
, (3,1,3,2,1)
, (4,3,2,1,6)
, (5,3,1,2,3);

;WITH a AS (
    -- One row per element, carrying the per-label and per-group totals.
    SELECT
        P1, P2, Qty
        -- BC: order-independent key for the (P1, P2) pair
        , BC = ABS(BINARY_CHECKSUM(CAST(P1 AS VARCHAR(10)))) + ABS(BINARY_CHECKSUM(CAST(P2 AS VARCHAR(10))))
        , Label
        , LabelSum = SUM(Qty) OVER (PARTITION BY ABS(BINARY_CHECKSUM(CAST(P1 AS VARCHAR(10)))) + ABS(BINARY_CHECKSUM(CAST(P2 AS VARCHAR(10)))), Label)
        , GroupSum = SUM(Qty) OVER (PARTITION BY ABS(BINARY_CHECKSUM(CAST(P1 AS VARCHAR(10)))) + ABS(BINARY_CHECKSUM(CAST(P2 AS VARCHAR(10)))))
    FROM @Elements e
    INNER JOIN @Reference r ON r.ID = e.ReferenceID
)
, r AS (
    -- Rank the labels within each group by their total quantity.
    SELECT *, rnk = RANK() OVER (PARTITION BY BC ORDER BY LabelSum DESC)
    FROM a
)
SELECT P1 = MIN(P1)
, P2 = MAX(P2)
, TotalQty = GroupSum
, MostRepresentativeLabel = Label
FROM r
WHERE rnk = 1
-- BC is included so two groups with the same total and label stay separate.
GROUP BY BC, GroupSum, Label
ORDER BY GroupSum DESC;
GO

Result:

[image: screenshot of the query's result set]

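Edit: wrapped each BINARY_CHECKSUM in ABS so that the per-group sum of checksums is better spread out. Because BINARY_CHECKSUM returns a signed INT, this reduces the chance of a collision between two different groups in which a positive BINARY_CHECKSUM is offset by a negative one.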
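If checksum collisions are a concern, one alternative is to key each group on the ordered pair itself (smaller value first, as the question's CASE expressions already do) and pick the top label with ROW_NUMBER. The following is only a minimal sketch under a few assumptions: it reuses the sample table variables from above, the names byLabel, ranked, Endpoint1, Endpoint2, LabelQty and rn are made up for illustration, and ties between labels are broken alphabetically, which the question does not specify.

```sql
-- Sample data copied from the question.
DECLARE @Reference TABLE (ID INT, Label VARCHAR(10));
DECLARE @Elements TABLE (ID INT, ReferenceID INT, P1 INT, P2 INT, Qty INT);

INSERT INTO @Reference VALUES (1, 'Apple'), (2, 'Banana'), (3, 'Cherry');
INSERT INTO @Elements VALUES
    (1, 1, 1, 2, 8), (2, 2, 2, 3, 14), (3, 1, 3, 2, 1), (4, 3, 2, 1, 6), (5, 3, 1, 2, 3);

;WITH byLabel AS (
    -- One row per (unordered pair, label) with that label's total quantity.
    SELECT
        Endpoint1 = CASE WHEN e.P1 < e.P2 THEN e.P1 ELSE e.P2 END,
        Endpoint2 = CASE WHEN e.P1 < e.P2 THEN e.P2 ELSE e.P1 END,
        r.Label,
        LabelQty = SUM(e.Qty)
    FROM @Elements e
    INNER JOIN @Reference r ON r.ID = e.ReferenceID
    GROUP BY
        CASE WHEN e.P1 < e.P2 THEN e.P1 ELSE e.P2 END,
        CASE WHEN e.P1 < e.P2 THEN e.P2 ELSE e.P1 END,
        r.Label
),
ranked AS (
    -- Add the pair's overall total and number the labels, largest total first.
    SELECT
        Endpoint1, Endpoint2, Label, LabelQty,
        TotalQty = SUM(LabelQty) OVER (PARTITION BY Endpoint1, Endpoint2),
        rn = ROW_NUMBER() OVER (PARTITION BY Endpoint1, Endpoint2
                                ORDER BY LabelQty DESC, Label)  -- assumed tie-break
    FROM byLabel
)
SELECT P1 = Endpoint1, P2 = Endpoint2, TotalQty, MostRepresentativeLabel = Label
FROM ranked
WHERE rn = 1
ORDER BY TotalQty DESC;
```

On the sample data this should give (1, 2, 17, Cherry) and (2, 3, 15, Banana), which matches the result set asked for in the question. Swapping ROW_NUMBER for RANK (and dropping the Label tie-break) would return every tied label instead of just one.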