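I have a table like the one shown below. I want to be able to identify all the distinct md5Hash values that have the same Custodians. The result should be the ArtifactID and the Custodian ID as new rows. For example: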
    1098647, 1098624
    1098648, 1098717
    1098648, 1098624
    1098647, 1098717
The table looks like this:

    ArtifactID  md5Hash                           Custodian
    1098647     e6ae2fbc906c42b55d25f6d660f4913a  1098624
    1098648     e6ae2fbc906c42b55d25f6d660f4913a  1098717
    1098649     9f0c88c40be3d01b6beed39b32dea3fb  1098624
    1098650     39446d6f0a5b29fef001c184797349b4  1098624
    1098651     35ec5012284256c97553b5342fd59530  1098624
    1098652     0914cd30b41460efaab7d6703444a5de  1098624
    1098653     929eefb170bc74ed3cfabae969a032ed  1098624
    1098654     d8986a76130fde673bbf5f1f9fb82857  1098624
    1098655     6399df1a2ca3fde7021da25e4aa9e722  1098624
    1098656     a19701c034af4094bc3da149d1e9b8d1  1098624
    1098657     8384d8e0562391ee02c731fc059b510c  1098624
    1098658     94800202b4473f8ce3dc08ddea4aff0c  1098624
    1098659     87388b9895c749147d5a19a8ccd9c865  1098624

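Answer 0 (score: 2)

First determine which hash values are repeated across different custodians, then retrieve those custodians.

EDIT: Your desired result seems to involve an implicit relationship stored in the table. I have tried to separate out that relationship in the CTEs below. This should get you what you need.
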
    IF OBJECT_ID('tempdb..#Data') IS NOT NULL
        DROP TABLE #Data

    CREATE TABLE #Data (
        ArtifactID INT,
        md5Hash VARCHAR(200),
        Custodian INT)

    INSERT INTO #Data (
        ArtifactID,
        md5Hash,
        Custodian)
    VALUES
        (1098647, 'e6ae2fbc906c42b55d25f6d660f4913a', 1098624),
        (1098648, 'e6ae2fbc906c42b55d25f6d660f4913a', 1098717),
        (1098649, '9f0c88c40be3d01b6beed39b32dea3fb', 1098624),
        (1098650, '39446d6f0a5b29fef001c184797349b4', 1098624),
        (1098651, '35ec5012284256c97553b5342fd59530', 1098624),
        (1098652, '0914cd30b41460efaab7d6703444a5de', 1098624),
        (1098653, '929eefb170bc74ed3cfabae969a032ed', 1098624),
        (1098654, 'd8986a76130fde673bbf5f1f9fb82857', 1098624),
        (1098655, '6399df1a2ca3fde7021da25e4aa9e722', 1098624),
        (1098656, 'a19701c034af4094bc3da149d1e9b8d1', 1098624),
        (1098657, '8384d8e0562391ee02c731fc059b510c', 1098624),
        (1098658, '94800202b4473f8ce3dc08ddea4aff0c', 1098624),
        (1098659, '87388b9895c749147d5a19a8ccd9c865', 1098624)

    -- Distinct (ArtifactID, md5Hash) pairs
    ;WITH Artifacts AS
    (
        SELECT DISTINCT
            D.ArtifactID,
            D.md5Hash
        FROM
            #Data AS D
    ),
    -- Distinct (Custodian, md5Hash) pairs
    Custodians AS
    (
        SELECT DISTINCT
            D.Custodian,
            D.md5Hash
        FROM
            #Data AS D
    ),
    -- Hashes that appear under more than one custodian
    RepeatedHash AS
    (
        SELECT
            T.md5Hash
        FROM
            Custodians AS T
        GROUP BY
            T.md5Hash
        HAVING
            COUNT(DISTINCT T.Custodian) > 1
    )
    -- Pair every artifact with every custodian that shares a repeated hash
    SELECT
        A.ArtifactID,
        C.Custodian
    FROM
        RepeatedHash AS R
        INNER JOIN Custodians AS C ON R.md5Hash = C.md5Hash
        INNER JOIN Artifacts AS A ON R.md5Hash = A.md5Hash

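To see the first step on its own, the repeated-hash check can be run directly against the temp table. This is only a minimal sketch, assuming #Data has been created and populated as above; the CustodianCount alias is added here for illustration and is not part of the original answer.

```sql
-- Sketch: list each hash that occurs under more than one custodian,
-- together with the number of distinct custodians (illustrative alias).
SELECT
    D.md5Hash,
    COUNT(DISTINCT D.Custodian) AS CustodianCount
FROM
    #Data AS D
GROUP BY
    D.md5Hash
HAVING
    COUNT(DISTINCT D.Custodian) > 1;
```

For the sample rows this should return only the hash e6ae2fbc906c42b55d25f6d660f4913a with a count of 2, which is the hash the final query then expands into ArtifactID/Custodian pairs.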
Answer 1 (score: 1)

You can self-join the table on the md5Hash field. The subquery groups the records by the md5Hash field and returns only the repeated ones:

    SELECT ArtifactID, Custodian
    FROM table1 t
    INNER JOIN (SELECT md5Hash
                FROM table1
                GROUP BY md5Hash
                HAVING COUNT(*) > 1
               ) tt ON t.md5Hash = tt.md5Hash

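Note that COUNT(*) > 1 counts rows per hash, so a hash stored twice for the same custodian would also qualify. If only hashes shared by different custodians should match, one possible variant counts distinct custodians instead. This is a sketch under the same assumption as the query above (a table named table1 with these three columns), not a change the answer itself makes.

```sql
-- Sketch: keep only hashes that occur under at least two different custodians.
SELECT t.ArtifactID, t.Custodian
FROM table1 AS t
INNER JOIN (
    SELECT md5Hash
    FROM table1
    GROUP BY md5Hash
    HAVING COUNT(DISTINCT Custodian) > 1
) AS tt ON t.md5Hash = tt.md5Hash;
```

On the sample data in the question both versions select the same two rows, because the only repeated hash belongs to two different custodians.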
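EDIT: Your update indicates that your table is not properly normalized, and I strongly suggest that you normalize it. To get the desired result with the current table design, you need two subqueries like the one above, one for ArtifactID with md5Hash and another for Custodian with md5Hash, and then you can join the two on the implicit relationship, md5Hash.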
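One way to read this description, offered as a sketch rather than as the answer's own query, is to build the two distinct sets and join them on md5Hash, restricted to the repeated hashes. The table name table1 comes from the query above; the aliases a, c and r are chosen here for illustration.

```sql
-- Sketch: pair every artifact with every custodian that shares a repeated hash.
SELECT a.ArtifactID, c.Custodian
FROM (
    -- distinct artifact/hash pairs
    SELECT DISTINCT ArtifactID, md5Hash
    FROM table1
) AS a
INNER JOIN (
    -- distinct custodian/hash pairs
    SELECT DISTINCT Custodian, md5Hash
    FROM table1
) AS c ON a.md5Hash = c.md5Hash
INNER JOIN (
    -- hashes that occur on more than one row
    SELECT md5Hash
    FROM table1
    GROUP BY md5Hash
    HAVING COUNT(*) > 1
) AS r ON a.md5Hash = r.md5Hash;
```

With the sample rows from the question, this produces the four ArtifactID/Custodian pairs listed there (row order is not guaranteed without an ORDER BY).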