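I am trying to condense the data in a database table that contains multiple instances of each unique record, with differing column data.
For each particular unique record, I want to select the most frequently occurring value of each column,
but my SQL is not working properly.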
[dataBase1].[dbo].[table1]
has hundreds of thousands of records with several columns (Name, Place, etc.).
[dataBase1].[dbo].[table2]
contains the list of unique names from [table1]
plus the headers of the remaining columns (Place, etc.), which are still empty.
I have tried the following code.
DECLARE @name varchar(max);
DECLARE @place varchar(max);
DECLARE db_cursor SCROLL CURSOR FOR
SELECT [Name]
FROM [dataBase1].[dbo].[table2];
OPEN HostName_cursor
FETCH NEXT FROM db_cursor INTO @name;
WHILE @@FETCH_STATUS = 0
BEGIN
SELECT DISTINCT TOP(1) @place = [Place]
FROM [dataBase1].[dbo].[table1]
WHERE [Name] = @name
AND [Place] IS NOT NULL AND [Place] <> ''
AND (EXISTS (SELECT [Place], COUNT (*) AS TOTAL
FROM [dataBase1].[dbo].[table1]
GROUP BY [Place]))
GROUP BY [Place];
UPDATE [dataBase1].[dbo].[table2]
SET [Place] = @place
WHERE [Name] = @name;
SET @place = '';
FETCH NEXT FROM db_cursor INTO @name
END
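For one particular unique [Name], the [Place] column has 53 values, and the most repeated value occurs 3 times. Essentially, I want to automate the following query for every unique [Name].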
SELECT DISTINCT TOP 1
[Place], COUNT (*) TOTAL
FROM
[dataBase1].[dbo].[table1]
WHERE
[Name] = 'xxxxxx'
AND [Place] IS NOT NULL AND [Place] <> ''
GROUP BY [Place]
ORDER BY TOTAL DESC;
Answer 0 (score: 0)
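This can be done in a number of steps, each building on the previous one. You want to process all names and places in one go.
First, you want a count for each name/place combination, so group by name and place and count the places. Your query will look like this: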
SELECT name, place, COUNT(place) as placecount
FROM table1
GROUP BY name, place
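The counting step can be tried on a few rows before running it against the real table. The sketch below is only an illustration: the table variable @table1 and its sample rows are hypothetical stand-ins for [dataBase1].[dbo].[table1], and the WHERE clause is borrowed from the question, which excludes NULL and empty places.

```sql
-- Hypothetical sample data; @table1 stands in for [dataBase1].[dbo].[table1].
DECLARE @table1 TABLE (name varchar(50), place varchar(50));

INSERT INTO @table1 (name, place) VALUES
    ('alice', 'Paris'), ('alice', 'Paris'), ('alice', 'Rome'),
    ('alice', NULL),    ('alice', ''),
    ('bob',   'Oslo'),  ('bob',   NULL);

-- COUNT(place) skips NULLs, so without the filter a NULL group would report 0.
-- The WHERE clause (taken from the question) drops NULL and empty places
-- before grouping, so they can never be picked later on.
SELECT name, place, COUNT(place) AS placecount
FROM @table1
WHERE place IS NOT NULL AND place <> ''
GROUP BY name, place
ORDER BY name, placecount DESC, place;
```

With these rows the result should contain one row per remaining name/place pair: Paris counted twice and Rome once for alice, and Oslo once for bob.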
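Now you need to find the place with the highest count and, in the case of a tie, the one that comes first alphabetically. You can do this by applying ROW_NUMBER to the result above, restarting the numbering for each name (the partition), and ordering by place count from highest to lowest, then by place to break ties. Using a CTE (you could also do this as a subquery), it looks like this:
WITH places as (
SELECT name, place, COUNT(place) as placecount
FROM table1
GROUP BY name, place
)
-- placecount DESC puts the most frequent place at RN = 1; place breaks ties alphabetically
SELECT name, place, ROW_NUMBER() OVER (PARTITION BY name ORDER BY placecount DESC, place) as RN
FROM places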
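If you look at this data, the place you want for any given name is on the row where RN is 1. That lets you get the final data you need with a query like this:
WITH places as (
SELECT name, place, COUNT(place) as placecount
FROM table1
GROUP BY name, place
), orderplaces as (
-- placecount DESC puts the most frequent place at RN = 1
SELECT name, place, ROW_NUMBER() OVER (PARTITION BY name ORDER BY placecount DESC, place) as RN
FROM places
)
SELECT name, place
FROM orderplaces
WHERE RN = 1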
Since you want to update table2 with this place data rather than just view it, join table2 in the final query and perform the update, something like this:
WITH places as (
SELECT name, place, COUNT(place) as placecount
FROM table1
GROUP BY name, place
), orderplaces as (
-- placecount DESC puts the most frequent place at RN = 1
SELECT name, place, ROW_NUMBER() OVER (PARTITION BY name ORDER BY placecount DESC, place) as RN
FROM places
)
UPDATE T2 SET place = OP.place
FROM orderplaces OP
INNER JOIN table2 T2 ON T2.name = OP.name
WHERE OP.RN = 1;
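To check the whole approach before touching the real tables, it can be run end to end on throwaway data. This is a minimal sketch under stated assumptions: @table1 and @table2 are hypothetical table variables standing in for the real tables, the sample rows are made up, and the filter on NULL and empty places comes from the question rather than from the queries above. The ranking uses the highest count first, with the alphabetical tie-break described earlier.

```sql
-- Hypothetical stand-ins for [table1] and [table2]; run as a single batch.
DECLARE @table1 TABLE (name varchar(50), place varchar(50));
DECLARE @table2 TABLE (name varchar(50), place varchar(50));

INSERT INTO @table1 (name, place) VALUES
    ('alice', 'Paris'), ('alice', 'Paris'), ('alice', 'Rome'),
    ('bob',   'Oslo'),  ('bob',   'Bergen'), ('bob',  '');
INSERT INTO @table2 (name) VALUES ('alice'), ('bob');

WITH places AS (
    -- One row per name/place pair, ignoring NULL and empty places.
    SELECT name, place, COUNT(*) AS placecount
    FROM @table1
    WHERE place IS NOT NULL AND place <> ''
    GROUP BY name, place
), orderplaces AS (
    -- RN = 1 is the most frequent place per name; ties go to the
    -- alphabetically first place.
    SELECT name, place,
           ROW_NUMBER() OVER (PARTITION BY name
                              ORDER BY placecount DESC, place) AS RN
    FROM places
)
UPDATE T2
SET place = OP.place
FROM orderplaces OP
INNER JOIN @table2 T2 ON T2.name = OP.name
WHERE OP.RN = 1;

-- Inspect the result of the update.
SELECT name, place FROM @table2 ORDER BY name;
```

With this sample data, alice should end up with Paris (two occurrences against one for Rome), and bob with Bergen, because Bergen and Oslo tie at one occurrence each and Bergen sorts first. Names in table2 with no usable place in table1 are left unchanged by the inner join.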