What is the best way to create a narrow unique value for a large binary value in SQL Server?

Posted: 2014-08-30 03:11:22

Tags: sql sql-server wcf tsql

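I have quite a lot of records in a table that backs a WCF application. The application basically checks whether a record exists and, if it is not found, inserts that record.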

One of the key fields it checks for existence is a VARBINARY(256).

I am currently using HASHBYTES() with the SHA2_256 algorithm to reduce it to 32 bytes.

CREATE TABLE BlobTable ( BlobID INT, Blob VARBINARY(256), BlobHash VARBINARY(32)) 
DECLARE @Bin VARBINARY(256) = CRYPT_GEN_RANDOM(256) 
DECLARE @BinHash VARBINARY(32) = HASHBYTES('SHA2_256', @Bin)
DECLARE @_Bin VARBINARY(256) -- receives the existing blob, if any

SELECT @_Bin = Blob FROM dbo.BlobTable WITH (ROWLOCK, READPAST) WHERE BlobHash = @BinHash
IF (@_Bin IS NULL)
BEGIN
    INSERT INTO dbo.BlobTable (Blob, BlobHash) VALUES (@Bin, @BinHash)
END

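Is there a way to lower the query cost of the SELECT statement above? Is there a way to make the unique value for the VARBINARY(256) field shorter, such as VARBINARY(16) or less, while still avoiding duplicates?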

Thanks

2 answers:

Answer 0 (score: 0)

CREATE UNIQUE INDEX blobhash_index ON BlobTable ( BlobHash);

-- MD5 produces a 16-byte hash; DATALENGTH reports the byte count of binary data
SELECT DATALENGTH(HASHBYTES('MD5', Blob)) FROM BlobTable
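To put a rough number on the accidental collision risk of a 16-byte (b = 128 bit) hash such as MD5, the birthday bound can be applied. This assumes the hash outputs behave like uniformly random b-bit values, which does not hold for deliberately crafted inputs (practical MD5 collisions are known). The row count n = 3,000,000 is borrowed from the test in the next answer purely for illustration:

$$
p_{\text{collision}} \approx \frac{n(n-1)}{2 \cdot 2^{b}} \approx \frac{n^{2}}{2^{b+1}}
= \frac{(3 \times 10^{6})^{2}}{2^{129}} \approx \frac{9 \times 10^{12}}{6.8 \times 10^{38}} \approx 1.3 \times 10^{-26}
$$

An accidental collision is therefore extremely unlikely but not impossible. A unique index on the hash alone would reject a genuinely different blob in that case, which is why keeping the full Blob in the unique key still guarantees correctness.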

Answer 1 (score: 0)

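I don't think ROWLOCK / READPAST is needed here. The key to performance is indexing. I ran a performance test with the T-SQL below and observed a rate of roughly 5K inserts/sec using a 10-thread test harness. The composite unique constraint key allows the absence of a row to be determined quickly, and still allows a different blob to be inserted in the unlikely event of a hash collision. Note that without SERIALIZABLE isolation, different threads trying to insert the same blob at the same time can cause a duplicate key violation, so your code needs to handle it.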

CREATE TABLE dbo.BlobTable(
      BlobID INT IDENTITY
        CONSTRAINT PK_BlobTable PRIMARY KEY CLUSTERED
    , Blob VARBINARY(256)
    , BlobHash VARBINARY(32)
    );
GO

--load 3M rows
WITH 
     t4 AS (SELECT n FROM (VALUES(0),(0),(0),(0)) t(n))
    ,t256 AS (SELECT 0 AS n FROM t4 AS a CROSS JOIN t4 AS b CROSS JOIN t4 AS c CROSS JOIN t4 AS d)
    ,t16M AS (SELECT ROW_NUMBER() OVER (ORDER BY (a.n)) AS num FROM t256 AS a CROSS JOIN t256 AS b CROSS JOIN t256 AS c)
INSERT INTO dbo.BlobTable WITH(TABLOCKX) (Blob, BlobHash)
SELECT Blob, HASHBYTES('SHA2_256', Blob)
FROM (
    SELECT CRYPT_GEN_RANDOM(256) AS Blob
    FROM t16M
    WHERE num <= 3000000) AS Blobs;
UPDATE STATISTICS dbo.BlobTable WITH FULLSCAN;
ALTER TABLE dbo.BlobTable
    ADD CONSTRAINT UQ_BlobTable1_Blob_BlobHash UNIQUE NONCLUSTERED(BlobHash, Blob);
CHECKPOINT;
GO

CREATE PROC dbo.usp_insert_BlobTable
AS
SET NOCOUNT ON;

DECLARE @Bin VARBINARY(256) = CRYPT_GEN_RANDOM(256);
DECLARE @BinHash VARBINARY(32) = HASHBYTES('SHA2_256', @Bin);
DECLARE @_Bin VARBINARY(256);

INSERT INTO dbo.BlobTable (Blob, BlobHash)
SELECT @Bin, @BinHash
WHERE NOT EXISTS(
    SELECT *
    FROM dbo.BlobTable
    WHERE
        BlobHash = @BinHash
        AND Blob = @Bin
    );
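The duplicate key violation mentioned above can be handled by catching it and treating it as "the row already exists". The following is a minimal sketch under some assumptions: the procedure name dbo.usp_insert_BlobTable_safe and its @Bin parameter are hypothetical (the benchmark procedure above generates its own random blob), it relies on the UQ_BlobTable1_Blob_BlobHash constraint created above, and THROW requires SQL Server 2012 or later. Errors 2627 (UNIQUE/PRIMARY KEY constraint violation) and 2601 (unique index violation) are swallowed; anything else is re-raised.

```sql
-- Hypothetical variant of dbo.usp_insert_BlobTable that takes the blob as a
-- parameter and tolerates a concurrent insert of the same blob.
CREATE PROC dbo.usp_insert_BlobTable_safe
    @Bin VARBINARY(256)
AS
SET NOCOUNT ON;

DECLARE @BinHash VARBINARY(32) = HASHBYTES('SHA2_256', @Bin);

BEGIN TRY
    -- Insert only if the (BlobHash, Blob) pair is not already present.
    INSERT INTO dbo.BlobTable (Blob, BlobHash)
    SELECT @Bin, @BinHash
    WHERE NOT EXISTS(
        SELECT *
        FROM dbo.BlobTable
        WHERE
            BlobHash = @BinHash
            AND Blob = @Bin
        );
END TRY
BEGIN CATCH
    -- 2627 / 2601: another session inserted the same blob between the
    -- NOT EXISTS check and the INSERT, so the row now exists; nothing to do.
    IF ERROR_NUMBER() NOT IN (2627, 2601)
    BEGIN
        THROW; -- re-raise any other error unchanged
    END;
END CATCH;
GO
```

Swallowing the error fits the question's "insert if missing" semantics because the caller only needs the row to exist afterwards; if the caller also needs the BlobID, it would have to read the row back after the insert attempt.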