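The table below contains PC asset information, and I need to remove pieces of data from it based on different criteria.
I need to create a view in SQL Server 2005 that returns the result.
I tried using temp tables to reach my goal, until I realized that I cannot use temp tables in a view.
I then tried using a CTE, until I realized that deleting data from a CTE also deletes the data from the actual table.
I cannot delete data from the actual table, and I cannot create another table in the database.
The table has 160,000 records.
The table: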
CREATE TABLE dsm_hardware_basic
(
[UUID] binary(16), -- Randomly generated 16 digit key that is unique for each record, only column with no duplicate rows.
[HostUUID] binary(16), -- Randomly generated 16 digit key, column has duplicate rows.
[Name] nvarchar(255), -- Column that contains hostnames of computer assets. Example of record: PCASSET001. Column has duplicate rows.
[LastAgentExecution] datetime, -- The last time that the software agent that collects asset information ran on the PC.
[HostName] nvarchar(255) -- The fully qualified domain name of the PC. Example of record: PCASSET001.companydomain.com. Column has duplicate rows.
)
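I will explain what I am trying to accomplish:
1) Read in all of the information from the table dbo.dsm_hardware_basic. Let's call this dsm_hardware_basic_copy.
2) Query dbo.dsm_hardware_basic and delete from dsm_hardware_basic_copy the data that matches the criteria below. This essentially removes the duplicate [HostUUID] rows with the oldest [LastAgentExecution] time: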
SELECT dsm_hardware_basic.[HostUUID]
,MIN(dsm_hardware_basic.[LastAgentExecution]) AS [LastAgentExecution]
FROM dsm_hardware_basic
WHERE dsm_hardware_basic.[HostUUID] <> ''
GROUP BY dsm_hardware_basic.[HostUUID]
HAVING COUNT(*) = 2 -- The tiny amount of rows where this count is >2 will be left alone.
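3) Additionally, query dbo.dsm_hardware_basic and delete from dsm_hardware_basic_copy the data that matches the criteria below. This essentially removes the duplicate [HostName] rows with the oldest [LastAgentExecution] time: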
SELECT dsm_hardware_basic.[HostName]
,MIN(dsm_hardware_basic.[LastAgentExecution]) AS [LastAgentExecution]
FROM dsm_hardware_basic
WHERE dsm_hardware_basic.[HostName] <> ''
GROUP BY dsm_hardware_basic.[HostName]
HAVING COUNT(*) > 1
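I am not sure how to do this in the select above, but not only should the COUNT of [HostName] be > 1, [Name] should also equal everything in [HostName] that comes before the first period. Example [Name]: PCASSET001. Example [HostName]: PCASSET001.companydomain.com. I know this sounds odd given the kind of PC data these two columns hold, but it is something I really do have to deal with.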
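4) Additionally, query dbo.dsm_hardware_basic and delete from dsm_hardware_basic_copy the data that matches the criteria below.
This essentially removes the duplicate [Name] rows with the oldest [LastAgentExecution] time: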
SELECT dsm_hardware_basic.[Name]
,MIN(dsm_hardware_basic.[LastAgentExecution]) AS [LastAgentExecution]
FROM dsm_hardware_basic
WHERE dsm_hardware_basic.[Name] <> ''
GROUP BY dsm_hardware_basic.[Name]
HAVING COUNT(*) = 2 -- The tiny amount of rows where this count is >2 will be left alone.
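Answer 0 (score: 0)
You have actually asked several different questions here, and I am not sure I completely follow the logic of your queries, but building this should not be too difficult.
First, you can work with dsm_hardware_basic directly instead of a copy: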
SELECT
*
FROM dsm_hardware_basic
Now for the part that removes the duplicate [HostUUID] rows with the oldest [LastAgentExecution] time:
SELECT
    dsm_hardware_basic.*
FROM dsm_hardware_basic
INNER JOIN
(
    SELECT [UUID],
           ROW_NUMBER() OVER
               (PARTITION BY [HostUUID]
                ORDER BY [LastAgentExecution] DESC) AS host_UUID_rank
    FROM dsm_hardware_basic
    WHERE [HostUUID] <> ''
) AS duplicate_host_UUID_filtered
    ON dsm_hardware_basic.UUID = duplicate_host_UUID_filtered.UUID
   AND duplicate_host_UUID_filtered.host_UUID_rank = 1
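What we have done is partition your table by HostUUID, ordered by the most recent LastAgentExecution first, and used the JOIN to drop from the query every UUID that does not match our result (that is, every row that is not ranked first within its HostUUID).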
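Note that the INNER JOIN above also drops rows whose [HostUUID] is empty and trims groups of more than two rows down to one, whereas the question asks to leave both alone. Below is a minimal sketch of one way to honor those rules, assuming window aggregates (COUNT(*) OVER) are acceptable. The aliases b, ranked, rn, and cnt are illustrative names, not from the original post.

```sql
SELECT ranked.[UUID], ranked.[HostUUID], ranked.[Name],
       ranked.[LastAgentExecution], ranked.[HostName]
FROM
(
    SELECT b.*,
           -- rn = 1 marks the oldest row within each HostUUID group
           ROW_NUMBER() OVER (PARTITION BY b.[HostUUID]
                              ORDER BY b.[LastAgentExecution] ASC) AS rn,
           -- cnt is the number of rows sharing this HostUUID
           COUNT(*) OVER (PARTITION BY b.[HostUUID]) AS cnt
    FROM dsm_hardware_basic AS b
) AS ranked
-- Keep every row except the oldest one of a non-empty HostUUID
-- that occurs exactly twice; empty or NULL HostUUIDs are kept.
WHERE ranked.cnt <> 2
   OR ranked.rn <> 1
   OR ranked.[HostUUID] = ''
   OR ranked.[HostUUID] IS NULL
```

If both rows of a pair share the same [LastAgentExecution], ROW_NUMBER picks one of them arbitrarily, so a tie-breaker column in the ORDER BY may be worth adding.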
We can now do the same for your HostName:
SELECT
    dsm_hardware_basic.*
FROM dsm_hardware_basic
INNER JOIN
(
    SELECT [UUID],
           ROW_NUMBER() OVER
               (PARTITION BY [HostUUID]
                ORDER BY [LastAgentExecution] DESC) AS host_UUID_rank
    FROM dsm_hardware_basic
    WHERE [HostUUID] <> ''
) AS duplicate_host_UUID_filtered
    ON dsm_hardware_basic.UUID = duplicate_host_UUID_filtered.UUID
   AND duplicate_host_UUID_filtered.host_UUID_rank = 1
INNER JOIN
(
    SELECT [UUID],
           ROW_NUMBER() OVER
               (PARTITION BY [HostName]
                ORDER BY [LastAgentExecution] DESC) AS HostName_rank
    FROM dsm_hardware_basic
    WHERE [HostName] <> ''
) AS duplicate_HostName_filtered
    ON dsm_hardware_basic.UUID = duplicate_HostName_filtered.UUID
   AND duplicate_HostName_filtered.HostName_rank = 1
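The question also asks for the HostName rule to apply only when [Name] equals everything before the first period in [HostName]. Here is a sketch of just that predicate, using CHARINDEX and LEFT. Appending a '.' before searching is my own guard for host names that contain no period, and the alias b is illustrative.

```sql
SELECT b.[UUID], b.[Name], b.[HostName]
FROM dsm_hardware_basic AS b
WHERE b.[HostName] <> ''
  -- LEFT(...) keeps the text before the first '.'; the appended '.'
  -- lets CHARINDEX find a position even when HostName has no period.
  AND b.[Name] = LEFT(b.[HostName], CHARINDEX('.', b.[HostName] + '.') - 1)
```

With the example from the question, 'PCASSET001.companydomain.com' is cut down to 'PCASSET001' before being compared with [Name]. Whether the comparison is case-sensitive depends on the column collation.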
I will leave the last part as an exercise for you. Finally, once you have finished debugging, just add CREATE VIEW in front of the query.
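As a rough sketch of that last step, the HostUUID query could be wrapped as shown below. The view name v_dsm_hardware_basic_dedup and the aliases b and f are hypothetical, and the columns are listed explicitly on the assumption that the view should not silently depend on SELECT *.

```sql
-- v_dsm_hardware_basic_dedup is a hypothetical view name.
CREATE VIEW dbo.v_dsm_hardware_basic_dedup
AS
SELECT b.[UUID], b.[HostUUID], b.[Name],
       b.[LastAgentExecution], b.[HostName]
FROM dbo.dsm_hardware_basic AS b
INNER JOIN
(
    -- Rank rows per HostUUID, most recent agent execution first
    SELECT [UUID],
           ROW_NUMBER() OVER (PARTITION BY [HostUUID]
                              ORDER BY [LastAgentExecution] DESC) AS host_UUID_rank
    FROM dbo.dsm_hardware_basic
    WHERE [HostUUID] <> ''
) AS f
    ON b.[UUID] = f.[UUID]
   AND f.host_UUID_rank = 1
```

CREATE VIEW has to be the first statement in its batch, and the view can then be queried like a table, for example SELECT * FROM dbo.v_dsm_hardware_basic_dedup.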