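I have written a stored procedure whose results contain duplicates. I tried ROW_NUMBER, but it didn't work. DISTINCT works, but it doesn't retrieve the large number of records I need (about 700,000). Is there another way to remove the duplicates, for example with RANK or GROUP BY?
I have already used DISTINCT, and it doesn't retrieve enough records. I haven't managed to get GROUP BY to work.
I also tried ROW_NUMBER, but that didn't work either (you can see it commented out in the code).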
CREATE PROCEDURE [report].[get_foodDetails]
    @foodgroup_id INT,
    @shop_id INT = 0,
    @product_id INT = 0,
    @maxrows INT = 600,
    @expiry INT = 1,
    @productactive INT = 1,
    @expiryPeriod DATETIME = '9999-12-31 23:59:59'
AS
BEGIN
    IF (@expiryPeriod >= '9999-12-31')
    BEGIN
        SET @expiryPeriod = GETDATE()
    END

    SELECT
        -- dp.RowNumber
        ISNULL([FoodType], '') AS [Foodtype],
        ISNULL([FoodColour], '') AS [FoodColour],
        ISNULL([FoodBarcode], '') AS [FoodBarcode],
        ISNULL([FoodArticleNum], 0) AS [FoodArticleNum],
        ISNULL([FoodShelfLife], '9999-12-31') AS [FoodShelfLIFe]
    INTO
        #devfood
    FROM
        report.[GetOrderList] (@foodgroup_id, @product_id, @productactive, @expiry, @expiryPeriod, @shop_id, @maxrows ) dp
    INNER JOIN
        -- Note: the alias "it" is not defined anywhere in this query
        food_group fg ON fg.food_group_id = it.item_FK_item_group_id

    SELECT TOP(@maxrows) *
    FROM #devfood
    -- Note: [device_packet_created_date] is not one of the columns selected into #devfood
    ORDER BY [device_packet_created_date]
END
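This retrieves about 700,000 records. That is what I have working now, even though it includes duplicates. With DISTINCT, only about 20,000 records are retrieved (but with no duplicates).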
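For the same column list, the 20,000 rows returned by DISTINCT are the unique combinations of the selected columns. GROUP BY on those columns, or ROW_NUMBER() partitioned by them and filtered to 1, collapses to exactly the same set, so none of these can return more rows than DISTINCT does. The sketch below checks that with Python's built-in sqlite3 module. The in-memory SQLite database, the table `food`, and its rows are hypothetical stand-ins for the SQL Server data, and ROW_NUMBER() assumes SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical sample data: 6 rows, but only 3 distinct (type, colour) combinations.
ROWS = [
    ("apple", "red"), ("apple", "red"), ("apple", "green"),
    ("pear", "green"), ("pear", "green"), ("pear", "green"),
]


def count_unique(rows):
    """Return (total rows, DISTINCT count, GROUP BY count, ROW_NUMBER() = 1 count)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE food (food_type TEXT, food_colour TEXT)")
    conn.executemany("INSERT INTO food VALUES (?, ?)", rows)

    total = conn.execute("SELECT COUNT(*) FROM food").fetchone()[0]
    distinct = conn.execute(
        "SELECT COUNT(*) FROM (SELECT DISTINCT food_type, food_colour FROM food)"
    ).fetchone()[0]
    grouped = conn.execute(
        "SELECT COUNT(*) FROM (SELECT food_type, food_colour FROM food"
        " GROUP BY food_type, food_colour)"
    ).fetchone()[0]
    # Number the rows within each (type, colour) group and keep the first of each.
    numbered = conn.execute(
        """SELECT COUNT(*) FROM (
               SELECT ROW_NUMBER() OVER (
                          PARTITION BY food_type, food_colour
                          ORDER BY food_type) AS rn
               FROM food)
           WHERE rn = 1"""
    ).fetchone()[0]
    conn.close()
    return total, distinct, grouped, numbered


print(count_unique(ROWS))  # (6, 3, 3, 3)
```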
Answer 0 (score: 0)
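The sample code below comes from a presentation I use to demonstrate CTEs. It is a common mechanism for removing duplicates, and it is very fast. In this case the duplicates are deleted directly from the table. If that is not your goal, you can use a temp table or an earlier chained CTE instead. Note that what matters is which columns you partition by. In this example, if you partitioned only by [name], you would not see both the red rose and the white rose.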
-------------------------------------------------
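-- create the [flower] schema if it does not exist yet
-- (CREATE SCHEMA must be alone in its batch, hence exec)
if schema_id(N'flower') is null
exec(N'create schema [flower]');
go
if object_id(N'[flower].[order]', N'U') is not null
drop table [flower].[order];
go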
create table [flower].[order]
(
[id] int identity(1, 1) not null constraint [flower.order.id.clustered_primary_key] primary key clustered
, [flower] nvarchar(128)
, [color] nvarchar(128)
, [count] int
);
go
insert into [flower].[order]
([flower]
, [color]
, [count])
values (N'rose',N'red',5),
(N'rose',N'red',3),
(N'rose',N'white',2),
(N'rose',N'red',1),
(N'rose',N'red',9),
(N'marigold',N'yellow',2),
(N'marigold',N'yellow',9),
(N'marigold',N'yellow',4),
(N'chamomile',N'amber',9),
(N'chamomile',N'amber',4),
(N'lily',N'white',12);
go
select [flower]
, [color]
from [flower].[order];
go
--
-------------------------------------------------
with [duplicate_finder]([name], [color], [sequence])
as (select [flower]
, [color]
, row_number()
over (
partition by [flower], [color]
order by [flower] desc) as [sequence]
from [flower].[order])
delete from [duplicate_finder]
where [sequence] > 1;
--
-- no duplicates
-------------------------------------------------
select [flower]
, [color]
from [flower].[order];
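SQLite cannot DELETE through a CTE the way T-SQL does, so the sketch below swaps in a common portable alternative: number the rows with ROW_NUMBER() in a subquery and delete by primary key. It uses Python's sqlite3 module with the same flower data (SQLite 3.25+ assumed for window functions) and runs the delete twice to illustrate the point about partition columns: partitioning by flower and color keeps the white rose, while partitioning by flower alone drops it. Ordering by `id` inside the window is a choice made here so that the earliest-inserted row of each group survives; the function name `dedupe` is hypothetical.

```python
import sqlite3

# The flower orders from the example above: (flower, color, count).
ORDERS = [
    ("rose", "red", 5), ("rose", "red", 3), ("rose", "white", 2),
    ("rose", "red", 1), ("rose", "red", 9),
    ("marigold", "yellow", 2), ("marigold", "yellow", 9), ("marigold", "yellow", 4),
    ("chamomile", "amber", 9), ("chamomile", "amber", 4),
    ("lily", "white", 12),
]


def dedupe(partition_cols):
    """Delete all but one row per partition and return the surviving (flower, color) pairs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE "order" (id INTEGER PRIMARY KEY, flower TEXT, color TEXT, "count" INTEGER)'
    )
    conn.executemany('INSERT INTO "order" (flower, color, "count") VALUES (?, ?, ?)', ORDERS)
    # partition_cols is interpolated directly into the SQL: pass trusted column names only.
    # Rows numbered > 1 within their partition are deleted by primary key.
    conn.execute(
        f"""DELETE FROM "order" WHERE id IN (
                SELECT id FROM (
                    SELECT id, ROW_NUMBER() OVER (
                               PARTITION BY {partition_cols} ORDER BY id) AS seq
                    FROM "order")
                WHERE seq > 1)"""
    )
    survivors = conn.execute('SELECT flower, color FROM "order" ORDER BY id').fetchall()
    conn.close()
    return survivors


print(dedupe("flower, color"))  # includes ('rose', 'white')
print(dedupe("flower"))         # the white rose is gone
```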
Answer 1 (score: 0)
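I know you said you tried ROW_NUMBER, but have you tried it in either of the following two ways?
First, a CTE. The CTE here is just your existing query with the ROW_NUMBER window function added. For each repeated occurrence of a record, RowNumber increases by one; for the next unique group of records, RowNumber resets to 1.
Once the data is pulled, keep only the records with RowNumber = 1. I use this approach all the time to remove duplicates from an underlying record set, but it works just as well for identifying them.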
WITH NoDupes AS
(
    SELECT
        ROW_NUMBER() OVER (PARTITION BY
                ISNULL(FoodType, '')
                ,ISNULL(FoodColour, '')
                ,ISNULL(FoodBarcode, '')
                ,ISNULL(FoodArticleNum, 0)
                ,ISNULL(FoodShelfLife, '9999-12-31')
            -- arbitrary order within each group; ROW_NUMBER() requires an ORDER BY
            ORDER BY (SELECT 0)
        ) AS RowNumber
        ,ISNULL(FoodType, '') AS Foodtype
        ,ISNULL(FoodColour, '') AS FoodColour
        ,ISNULL(FoodBarcode, '') AS FoodBarcode
        ,ISNULL(FoodArticleNum, 0) AS FoodArticleNum
        ,ISNULL(FoodShelfLife, '9999-12-31') AS FoodShelfLIFe
    FROM
        report.GetOrderList(@foodgroup_id, @product_id, @productactive, @expiry, @expiryPeriod, @shop_id, @maxrows) AS dp
    INNER JOIN
        food_group AS fg
        ON fg.food_group_id = it.item_FK_item_group_id
)
SELECT
    nd.Foodtype
    ,nd.FoodColour
    ,nd.FoodBarcode
    ,nd.FoodArticleNum
    ,nd.FoodShelfLIFe
INTO
    #devfood
FROM
    NoDupes AS nd
WHERE
    nd.RowNumber = 1;  -- keep only the first row of each duplicate group
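When the partition covers every selected column, keeping RowNumber = 1 yields exactly the rows that SELECT DISTINCT over those columns would return, including rows that only became identical after ISNULL mapped NULL to the same placeholder. A small sketch of the CTE pattern in SQLite through Python's sqlite3 module compares the two; IFNULL stands in for T-SQL's ISNULL, and the `food` table and its rows are hypothetical.

```python
import sqlite3

# Hypothetical rows standing in for the query output; None plays the role of NULL.
ROWS = [
    ("fruit", "red", "111", 1),
    ("fruit", "red", "111", 1),
    ("fruit", None, "222", 2),
    ("fruit", "", "222", 2),
    ("veg", "green", None, None),
]

# The CTE pattern from the answer. SQLite allows ROW_NUMBER() without ORDER BY;
# T-SQL requires one, which is why the answer uses ORDER BY (SELECT 0).
CTE_QUERY = """
WITH NoDupes AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY IFNULL(food_type, ''), IFNULL(food_colour, ''),
                            IFNULL(food_barcode, ''), IFNULL(food_article_num, 0)
           ) AS RowNumber,
           IFNULL(food_type, '') AS food_type,
           IFNULL(food_colour, '') AS food_colour,
           IFNULL(food_barcode, '') AS food_barcode,
           IFNULL(food_article_num, 0) AS food_article_num
    FROM food
)
SELECT nd.food_type, nd.food_colour, nd.food_barcode, nd.food_article_num
FROM NoDupes AS nd
WHERE nd.RowNumber = 1
ORDER BY 1, 2, 3, 4
"""

DISTINCT_QUERY = """
SELECT DISTINCT IFNULL(food_type, ''), IFNULL(food_colour, ''),
                IFNULL(food_barcode, ''), IFNULL(food_article_num, 0)
FROM food
ORDER BY 1, 2, 3, 4
"""


def compare(rows):
    """Return (rows kept by the CTE, rows returned by DISTINCT)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE food (food_type TEXT, food_colour TEXT,"
        " food_barcode TEXT, food_article_num INTEGER)"
    )
    conn.executemany("INSERT INTO food VALUES (?, ?, ?, ?)", rows)
    kept = conn.execute(CTE_QUERY).fetchall()
    distinct = conn.execute(DISTINCT_QUERY).fetchall()
    conn.close()
    return kept, distinct


kept, distinct = compare(ROWS)
print(kept == distinct, len(kept))  # True 3
```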
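You can also try SELECT TOP (1) WITH TIES, using the same ROW_NUMBER function to order the record set. The TOP (1) WITH TIES part is functionally the same as the CTE: it returns only the first record of each group of duplicates.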
SELECT
    TOP (1) WITH TIES
    ISNULL(FoodType, '') AS Foodtype
    ,ISNULL(FoodColour, '') AS FoodColour
    ,ISNULL(FoodBarcode, '') AS FoodBarcode
    ,ISNULL(FoodArticleNum, 0) AS FoodArticleNum
    ,ISNULL(FoodShelfLife, '9999-12-31') AS FoodShelfLIFe
INTO
    #devfood
FROM
    report.GetOrderList(@foodgroup_id, @product_id, @productactive, @expiry, @expiryPeriod, @shop_id, @maxrows) AS dp
INNER JOIN
    food_group AS fg
    ON fg.food_group_id = it.item_FK_item_group_id
-- every first row of a group gets 1, so TOP (1) WITH TIES keeps all of them
ORDER BY
    ROW_NUMBER() OVER (PARTITION BY
            ISNULL(FoodType, '')
            ,ISNULL(FoodColour, '')
            ,ISNULL(FoodBarcode, '')
            ,ISNULL(FoodArticleNum, 0)
            ,ISNULL(FoodShelfLife, '9999-12-31')
        ORDER BY (SELECT 0)
    );
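SQLite has no TOP ... WITH TIES, so this plain-Python sketch only models the semantics: each row gets a ROW_NUMBER()-style counter within its group of identical rows, the rows are sorted by that counter, and TOP (1) WITH TIES keeps the first row plus every row tied with it, which means every row numbered 1. The function names and sample rows are hypothetical. SQL Server guarantees no particular output order here; the sketch happens to preserve first-seen order because Python's sort is stable.

```python
# Hypothetical sample rows; identical tuples are duplicates.
ROWS = [
    ("rose", "red"), ("rose", "red"), ("rose", "white"),
    ("lily", "white"), ("lily", "white"),
]


def row_numbers(rows):
    """Pair each row with its ROW_NUMBER() within its group of identical rows."""
    seen = {}
    numbered = []
    for row in rows:
        seen[row] = seen.get(row, 0) + 1
        numbered.append((seen[row], row))
    return numbered


def top_with_ties(numbered, n):
    """Model TOP (n) WITH TIES ... ORDER BY <row number>."""
    if n <= 0 or not numbered:
        return []
    ordered = sorted(numbered, key=lambda pair: pair[0])  # stable sort by row number
    cutoff = ordered[min(n, len(ordered)) - 1][0]  # row number of the n-th row
    # The first n rows plus any later rows whose row number ties with the n-th row.
    return [row for key, row in ordered if key <= cutoff]


print(top_with_ties(row_numbers(ROWS), 1))
# [('rose', 'red'), ('rose', 'white'), ('lily', 'white')]
```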
For the next person who reads the code, the CTE is probably a little clearer, but TOP may perform slightly better.