To analyze my imported trace file, I would like to use Select Distinct TextData from myImportedTraceFile
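I tried using HASHBYTES, but I am not sure whether MD5 is the right tool for creating a unique identifier. Even if it is (please tell me if so), I still have the problem that
HASHBYTES('MD5', CAST(TextData AS varchar(7999))) As TextData_HashBytes
cuts off a few rows (see this reply). How can I create a unique identifier for each unique value of the column (TextData) in Select Distinct TextData from ..?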
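A note on the hashing attempt: the cast to varchar(7999) truncates longer statements, so two long queries that share their first 7999 characters would get the same hash. In SQL Server 2014 and earlier, HASHBYTES accepts at most 8000 bytes of input; from SQL Server 2016 on that limit no longer applies, and all algorithms other than SHA2_256 and SHA2_512 (MD5 included) are deprecated. The following is only a sketch, assuming SQL Server 2016 or later and the table name used in the question; the alias TextData_Hash is made up for illustration.

```sql
-- Sketch, assuming SQL Server 2016+ (no 8000-byte input limit on HASHBYTES).
-- One row per distinct TextData value, with a hash of the full text.
SELECT
    HASHBYTES('SHA2_256', d.TextData) AS TextData_Hash,  -- hypothetical alias
    d.TextData
FROM (
    SELECT DISTINCT CAST(TextData AS nvarchar(MAX)) AS TextData
    FROM myImportedTraceFile
) AS d;
```

A hash is a fingerprint rather than a guaranteed unique key: collisions are very unlikely but possible. Taking DISTINCT first and hashing afterwards keeps one hash per distinct value.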
Based on Dan's post I created this test case:
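IF OBJECT_ID('tempdb..#Temp') IS NOT NULL
    Drop Table #Temp;  -- drop only if it exists, so the script can be rerun
Create Table #Temp
(
    A int,
    B NText
);
-- sp B = spaces at the beginning, sp E = spaces at the end
Insert Into #Temp (A, B)
Select 1, 'some space' UNION ALL      -- 0sp
Select 2, ' some space' UNION ALL     -- 1sp B + 0sp E
Select 3, ' some space ' UNION ALL    -- 1sp B + 1sp E
Select 4, 'some space   ' UNION ALL   -- 0sp B + 3sp E
Select 5, '  some space ' UNION ALL   -- 2sp B + 1sp E
Select 6, '  some space  ';           -- 2sp B + 2sp E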
-- this returns 6 rows
Select
    HASHBYTES('MD5', CAST(B AS nvarchar(MAX))),
    CAST(B AS nvarchar(MAX)) as B
from #Temp;
-- this returns 3 rows
SELECT NEWID() AS UniqueID, B
FROM (
    Select DISTINCT CAST(B AS nvarchar(MAX)) AS B
    FROM #Temp
) sq;
These three rows are the result:
'  some space ' -- 2sp B + 1sp E --> row 5
' some space' -- 1sp B + 0sp E --> row 2
'some space   ' -- 0sp B + 3sp E --> row 4
It is unclear how rows 1 (0sp), 3 (1sp B + E) and 6 (2sp B + E) were handled, so it seems that some spaces were removed.
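The drop from six rows to three is consistent with how SQL Server compares strings: trailing spaces are ignored in comparisons, and therefore in DISTINCT, while leading spaces count. Under that rule, rows with the same number of leading spaces collapse into one group regardless of their trailing spaces. The sketch below is one way to check this; it assumes the #Temp table from the test case still exists, and the N'|' end marker and the column aliases are arbitrary choices for illustration.

```sql
-- DATALENGTH counts every byte, trailing spaces included;
-- LEN ignores trailing spaces, just as string comparison does.
SELECT
    A,
    DATALENGTH(B)                 AS ByteLength,
    LEN(CAST(B AS nvarchar(MAX))) AS LenWithoutTrailingSpaces
FROM #Temp
ORDER BY A;

-- Appending a non-space marker makes trailing spaces significant,
-- so values that differ only in trailing spaces stay separate.
SELECT DISTINCT CAST(B AS nvarchar(MAX)) + N'|' AS B_Marked
FROM #Temp;
```

Whether values that differ only in trailing spaces should count as the same query is a design decision; for trace analysis, treating them as equal is often acceptable.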
Answer 0 (score: 1)
You can use a derived table with SELECT DISTINCT:
SELECT NEWID() AS UniqueID, TextData
FROM (
SELECT DISTINCT CAST(TextData AS nvarchar(MAX)) AS TextData
FROM myImportedTraceFile
) AS UniqueQueries;
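Note that NEWID() produces a fresh GUID on every execution, so the identifiers from the query above change each time it runs. If the identifiers need to be reused, for example to tag every trace row with the ID of its query text, one option is to store the mapping once and join back to it. This is only a sketch: the temp table name #UniqueQueries is made up for illustration, and it assumes the trace table from the question.

```sql
-- Store one GUID per distinct TextData value (hypothetical table name).
SELECT NEWID() AS UniqueID, TextData
INTO #UniqueQueries
FROM (
    SELECT DISTINCT CAST(TextData AS nvarchar(MAX)) AS TextData
    FROM myImportedTraceFile
) AS d;

-- Attach the stored ID to every trace row.
-- LEFT JOIN keeps rows whose TextData is NULL (they get a NULL UniqueID).
SELECT u.UniqueID, CAST(t.TextData AS nvarchar(MAX)) AS TextData
FROM myImportedTraceFile AS t
LEFT JOIN #UniqueQueries AS u
    ON u.TextData = CAST(t.TextData AS nvarchar(MAX));
```

The join uses the same comparison rules as DISTINCT, so values that differ only in trailing spaces map to the same ID.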