Using T-SQL, how can I select n rows from a non-key, non-indexed column while avoiding duplicate results?
Example table:
ID_ | state | customer | memo
------------------------------------------
1 | abc | 123 | memo text xyz
2 | abc | 123 | memo text abc
3 | abc | 456 | memo text def
4 | abc | 456 | memo text rew
5 | abc | 789 | memo text yte
6 | def | 123 | memo text hrd
7 | def | 432 | memo text dfg
I want to select 2 memos for state 'abc', but the returned memos must not belong to the same customer. Desired output:
memo
----
memo text xyz
memo text def
PS: The only available selection criterion is state (e.g. where state = 'abc').
I managed to do this, but in a very inefficient way:
SELECT TOP 2 MAX(memo)
FROM [table]
WHERE state = 'abc'
GROUP BY customer
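This works for a small sample, but the production table has more than 1 billion rows.
Answer 0 (score: 4)
You can try the following query against your real database. I'm not sure how it performs on a table with a billion rows, so you will need to test it yourself.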
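SELECT TOP 2 memo  -- TOP 2 = the n rows requested, each from a different customer
FROM (SELECT memo,
             -- ORDER BY (SELECT 0): no particular order, so any one memo per customer may be picked
             ROW_NUMBER() OVER (PARTITION BY customer ORDER BY (SELECT 0)) AS RN
      FROM table1 WHERE state = 'abc') T
WHERE RN = 1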
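You can check it on SQL FIDDLE.
Edit: adding a nonclustered index on state and customer that includes memo will greatly improve performance, because the WHERE state = 'abc' filter can then seek the index and read customer and memo from it without going back to the base table.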
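To try the ROW_NUMBER idea without a SQL Server instance, here is a minimal Python sketch that loads the example table into an in-memory SQLite database (SQLite 3.25 or later is assumed, for window-function support). SQLite is only a stand-in here: `LIMIT` replaces `TOP`, and the partition is ordered by `ID_` instead of `(SELECT 0)` so the output is reproducible. The helper names `make_db` and `first_memo_per_customer` are hypothetical.

```python
import sqlite3

# Example rows from the question: (ID_, state, customer, memo)
ROWS = [
    (1, "abc", 123, "memo text xyz"),
    (2, "abc", 123, "memo text abc"),
    (3, "abc", 456, "memo text def"),
    (4, "abc", 456, "memo text rew"),
    (5, "abc", 789, "memo text yte"),
    (6, "def", 123, "memo text hrd"),
    (7, "def", 432, "memo text dfg"),
]


def make_db() -> sqlite3.Connection:
    """Load the example table into an in-memory SQLite database (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (ID_ INTEGER PRIMARY KEY, state TEXT, customer INTEGER, memo TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)
    return conn


def first_memo_per_customer(conn: sqlite3.Connection, state: str, n: int) -> list:
    """Return up to n memos for `state`, at most one per customer."""
    sql = """
        SELECT memo
        FROM (SELECT ID_, memo,
                     -- number the rows inside each customer; RN = 1 keeps one per customer
                     ROW_NUMBER() OVER (PARTITION BY customer ORDER BY ID_) AS RN
              FROM table1
              WHERE state = ?) T
        WHERE RN = 1
        ORDER BY ID_   -- deterministic stand-in for TOP without ORDER BY
        LIMIT ?        -- SQLite's equivalent of TOP n
    """
    return [row[0] for row in conn.execute(sql, (state, n))]


print(first_memo_per_customer(make_db(), "abc", 2))
```

With the sample data this picks ID_ 1 and 3, i.e. `memo text xyz` and `memo text def`, matching the desired output in the question.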
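-- [DATA] is a filegroup in the answerer's database; drop "ON [DATA]" to use the default filegroup.
CREATE NONCLUSTERED INDEX [custom_index] ON table1
(
[state] ASC,
[customer] ASC
)
INCLUDE ( [memo]) WITH (SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF) ON [DATA]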
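A rough way to see why a covering index helps is to ask SQLite for its query plan. This sketch assumes SQLite as a stand-in for SQL Server: SQLite has no `INCLUDE` clause, so `memo` becomes a third key column, and its plan text is not the same as a SQL Server execution plan. The check below only looks for SQLite's `COVERING INDEX` wording, which means the filter on `state` is answered from the index alone.

```python
import sqlite3

# Example rows from the question: (ID_, state, customer, memo)
ROWS = [
    (1, "abc", 123, "memo text xyz"),
    (2, "abc", 123, "memo text abc"),
    (3, "abc", 456, "memo text def"),
    (4, "abc", 456, "memo text rew"),
    (5, "abc", 789, "memo text yte"),
    (6, "def", 123, "memo text hrd"),
    (7, "def", 432, "memo text dfg"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (ID_ INTEGER PRIMARY KEY, state TEXT, customer INTEGER, memo TEXT)"
)
conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)

# No INCLUDE in SQLite: memo is added as a trailing key column instead.
# ID_ is the rowid, which SQLite stores in every index automatically.
conn.execute("CREATE INDEX custom_index ON table1 (state, customer, memo)")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT customer, memo FROM table1 WHERE state = ?", ("abc",)
).fetchall()
# The human-readable step description is the last column of each plan row.
details = [row[-1] for row in plan]
print(details)
```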
Answer 1 (score: 1)
One way to get n distinct values per state/customer is to fetch one ID for each group:
SELECT MIN(ID_) ID
FROM Table1
GROUP BY State, customer
(MIN can be replaced with MAX; it is just one way to pick one of the values.)
Then JOIN that result back to the table to add the other conditions:
WITH getID AS (
SELECT MIN(ID_) ID
FROM Table1
GROUP BY State, customer
)
SELECT TOP 2
t.ID_, t.State, t.Customer, t.memo
FROM table1 t
INNER JOIN getID g ON t.ID_ = g.ID
WHERE t.state = 'abc'
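The CTE-and-JOIN approach can be checked the same way. This is a minimal sketch against an in-memory SQLite copy of the example table (SQLite standing in for SQL Server, `LIMIT` in place of `TOP`); `make_db` and `first_row_per_group` are hypothetical names, and the `ORDER BY t.ID_` is an addition so the two returned rows are reproducible.

```python
import sqlite3

# Example rows from the question: (ID_, state, customer, memo)
ROWS = [
    (1, "abc", 123, "memo text xyz"),
    (2, "abc", 123, "memo text abc"),
    (3, "abc", 456, "memo text def"),
    (4, "abc", 456, "memo text rew"),
    (5, "abc", 789, "memo text yte"),
    (6, "def", 123, "memo text hrd"),
    (7, "def", 432, "memo text dfg"),
]


def make_db() -> sqlite3.Connection:
    """Load the example table into an in-memory SQLite database (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (ID_ INTEGER PRIMARY KEY, state TEXT, customer INTEGER, memo TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)
    return conn


def first_row_per_group(conn: sqlite3.Connection, state: str, n: int) -> list:
    """One row per (state, customer) group, chosen by MIN(ID_), limited to n rows."""
    sql = """
        WITH getID AS (
            SELECT MIN(ID_) AS ID      -- one representative ID per group
            FROM table1
            GROUP BY state, customer
        )
        SELECT t.ID_, t.state, t.customer, t.memo
        FROM table1 t
        INNER JOIN getID g ON t.ID_ = g.ID
        WHERE t.state = ?
        ORDER BY t.ID_
        LIMIT ?
    """
    return conn.execute(sql, (state, n)).fetchall()


print(first_row_per_group(make_db(), "abc", 2))
```

Note that the CTE groups the whole table before the `state` filter is applied, which is worth keeping in mind on a billion-row table.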
If your version of SQL Server does not support WITH (CTEs require SQL Server 2005 or later), the CTE can be turned into a subquery:
SELECT TOP 2
t.ID_, t.State, t.Customer, t.memo
FROM table1 t
INNER JOIN (SELECT MIN(ID_) ID
FROM Table1
GROUP BY State, customer
) g ON t.ID_ = g.ID
WHERE t.state = 'abc'
Another approach is to use CROSS APPLY to get the distinct IDs:
SELECT TOP 2
t.ID_, t.State, t.Customer, t.memo
FROM table1 t
CROSS APPLY (SELECT TOP 1
ID_
FROM table1 t1
WHERE t1.State = t.State AND t1.Customer = t.Customer
ORDER BY t1.ID_) c  -- ORDER BY makes every row of a group agree on the same ID_
WHERE t.state = 'abc'
AND c.ID_ = t.ID_;
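SQLite has no CROSS APPLY, so this sketch swaps in a correlated scalar subquery that plays the same role: for each outer row it looks up the first `ID_` of that row's state/customer group and keeps the row only if it is that ID. As before, SQLite is an assumed stand-in for SQL Server, and `make_db` and `first_row_per_group_correlated` are hypothetical names.

```python
import sqlite3

# Example rows from the question: (ID_, state, customer, memo)
ROWS = [
    (1, "abc", 123, "memo text xyz"),
    (2, "abc", 123, "memo text abc"),
    (3, "abc", 456, "memo text def"),
    (4, "abc", 456, "memo text rew"),
    (5, "abc", 789, "memo text yte"),
    (6, "def", 123, "memo text hrd"),
    (7, "def", 432, "memo text dfg"),
]


def make_db() -> sqlite3.Connection:
    """Load the example table into an in-memory SQLite database (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE table1 (ID_ INTEGER PRIMARY KEY, state TEXT, customer INTEGER, memo TEXT)"
    )
    conn.executemany("INSERT INTO table1 VALUES (?, ?, ?, ?)", ROWS)
    return conn


def first_row_per_group_correlated(conn: sqlite3.Connection, state: str, n: int) -> list:
    """Keep a row only if it is the first ID_ of its (state, customer) group."""
    sql = """
        SELECT t.ID_, t.state, t.customer, t.memo
        FROM table1 t
        WHERE t.state = ?
          AND t.ID_ = (SELECT t1.ID_              -- plays the role of CROSS APPLY (TOP 1)
                       FROM table1 t1
                       WHERE t1.state = t.state AND t1.customer = t.customer
                       ORDER BY t1.ID_
                       LIMIT 1)
        ORDER BY t.ID_
        LIMIT ?
    """
    return conn.execute(sql, (state, n)).fetchall()


print(first_row_per_group_correlated(make_db(), "abc", 2))
```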