Ensuring uniqueness across multiple large URL fields in MS SQL

Time: 2011-09-15 13:55:08

Tags: indexing unique-constraint

I have a table with the following definition:

CREATE TABLE url_tracker (
    id int not null identity(1, 1),
    active bit not null,
    install_date int not null,
    partner_url nvarchar(512) not null,
    local_url nvarchar(512) not null,
    public_url nvarchar(512) not null,
    primary key(id)
);

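I need these three URLs to be unique as a combination: any single URL may appear many times, but the combination of all three must be unique (for a given day).

Initially I thought I could do this: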

CREATE UNIQUE INDEX uniques ON url_tracker 
(install_date, partner_url, local_url, public_url);

However, this gives me the warning:

Warning! The maximum key length is 900 bytes. The index 'uniques' has maximum
length of 3076 bytes. For some combination of large values, the insert/update
operation will fail.
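
For reference, the 3076 bytes in the warning can be reproduced from the declared column sizes. This is only a back-of-the-envelope check, assuming 4 bytes for the int column and up to 2 bytes per character for each nvarchar(512) column:

```sql
-- install_date (int) = 4 bytes; each nvarchar(512) = up to 512 * 2 = 1024 bytes
SELECT 4 + 3 * (512 * 2) AS max_key_bytes,   -- 3076
       900               AS key_limit_bytes; -- limit quoted in the warning
```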

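Searching around, I learned about the INCLUDE argument to CREATE INDEX, but according to this question, converting the command to use INCLUDE will not enforce uniqueness on the URLs.

How can I enforce uniqueness on several relatively large nvarchar fields?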


Resolution

So, from the comments and answers and some more research, I am concluding that I can do this:

CREATE UNIQUE INDEX uniques ON url_tracker (install_date)
INCLUDE (partner_url, local_url, public_url);

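Thoughts?

2 answers:

Answer 0: (score: 4)

I would create a computed column holding a hash of the URLs, then put a unique index/constraint on that. Consider making the hash a persisted computed column, since it should not have to be recalculated after the insert.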
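
A minimal sketch of this idea against the table from the question. The answer does not name a hash function, so HASHBYTES with SHA1 is an assumption here, and the column and index names are hypothetical. Each hash is 20 bytes, so the index key is 4 + 3 * 20 = 64 bytes, well under the 900-byte limit:

```sql
-- Hypothetical column names; one persisted SHA1 hash per URL column.
ALTER TABLE url_tracker ADD
    partner_url_hash AS CONVERT(binary(20), HASHBYTES('SHA1', partner_url)) PERSISTED,
    local_url_hash   AS CONVERT(binary(20), HASHBYTES('SHA1', local_url))   PERSISTED,
    public_url_hash  AS CONVERT(binary(20), HASHBYTES('SHA1', public_url))  PERSISTED;
GO

-- Uniqueness is enforced on the day plus the three hashes, all as key columns.
CREATE UNIQUE INDEX uniques_hashed ON url_tracker
    (install_date, partner_url_hash, local_url_hash, public_url_hash);
```

Note that the hashes sit in the key list rather than in INCLUDE: included columns are non-key columns and take no part in the uniqueness check. The hash is computed over the raw bytes, so two URLs that differ only in letter case count as different values even under a case-insensitive collation.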

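Answer 1: (score: 3)

Following up on the conversation in the comments. Assuming you can change the data type of the URLs to VARCHAR(900) (or NVARCHAR(450) if you really think you need Unicode URLs) and are happy with that limit on URL length, this solution could work. It also assumes SQL Server 2008 or later. Please always specify which version you are using; naming the product alone is not specific enough, because solutions can vary by version.

Setup: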

USE tempdb;
GO

CREATE TABLE dbo.urls
(
    id INT IDENTITY(1,1) PRIMARY KEY,
    url VARCHAR(900) NOT NULL UNIQUE
);

CREATE TABLE dbo.url_tracker 
(
    id INT IDENTITY(1,1) PRIMARY KEY,
    active BIT NOT NULL DEFAULT 1,
    install_date DATE NOT NULL DEFAULT CURRENT_TIMESTAMP,
    partner_url_id INT NOT NULL REFERENCES dbo.urls(id),
    local_url_id   INT NOT NULL REFERENCES dbo.urls(id),
    public_url_id  INT NOT NULL REFERENCES dbo.urls(id),
    CONSTRAINT unique_urls UNIQUE
    (
        install_date,partner_url_id, local_url_id, public_url_id
    )
);

Insert some URLs:

INSERT dbo.urls(url) VALUES
    ('http://msn.com/'),
    ('http://aol.com/'),
    ('http://yahoo.com/'),
    ('http://google.com/'),
    ('http://gmail.com/'),
    ('http://stackoverflow.com/');

Now let's insert some data:

-- succeeds:
INSERT dbo.url_tracker(partner_url_id, local_url_id, public_url_id)
VALUES (1,2,3), (2,3,4), (3,4,5), (4,5,6);

-- fails:
INSERT dbo.url_tracker(partner_url_id, local_url_id, public_url_id)
VALUES(1,2,3);
GO

/*
    Msg 2627, Level 14, State 1, Line 3
    Violation of UNIQUE KEY constraint 'unique_urls'. Cannot insert duplicate key 
    in object 'dbo.url_tracker'. The duplicate key value is (2011-09-15, 1, 2, 3).
    The statement has been terminated.
*/

-- succeeds, since it's for a different day:
INSERT dbo.url_tracker(install_date, partner_url_id, local_url_id, public_url_id)
VALUES('2011-09-01',1,2,3);
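
The inserts above use literal ids. As a rough sketch of how URL text might be resolved to ids before inserting, assuming the tables above still exist; the variable names and the date are hypothetical, and this version makes no attempt to handle concurrent inserts of the same URL:

```sql
-- Hypothetical input values; in practice these would be parameters.
DECLARE @partner_url VARCHAR(900) = 'http://msn.com/',
        @local_url   VARCHAR(900) = 'http://aol.com/',
        @public_url  VARCHAR(900) = 'http://yahoo.com/';

-- Add any URL that is not yet in the lookup table (UNION removes duplicates).
INSERT dbo.urls(url)
SELECT v.url
FROM (SELECT @partner_url UNION SELECT @local_url UNION SELECT @public_url) AS v(url)
WHERE NOT EXISTS (SELECT 1 FROM dbo.urls AS u WHERE u.url = v.url);

-- Resolve each URL to its id and insert the tracker row for a new day.
INSERT dbo.url_tracker(install_date, partner_url_id, local_url_id, public_url_id)
SELECT '2011-09-02',
       (SELECT id FROM dbo.urls WHERE url = @partner_url),
       (SELECT id FROM dbo.urls WHERE url = @local_url),
       (SELECT id FROM dbo.urls WHERE url = @public_url);
```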

Clean up:

DROP TABLE dbo.url_tracker, dbo.urls;

Now, if 900 bytes is not enough, you could change the URL table slightly:

CREATE TABLE dbo.urls
(
    id INT IDENTITY(1,1) PRIMARY KEY,
    url VARCHAR(2048) NOT NULL,
    url_hash AS CONVERT(VARBINARY(32), HASHBYTES('SHA1', url)) PERSISTED,
    CONSTRAINT unique_url UNIQUE(url_hash)
);

The rest doesn't have to change. And if you try to insert the same URL twice, you get a similar violation, e.g.

INSERT dbo.urls(url) SELECT 'http://www.google.com/';
GO
INSERT dbo.urls(url) SELECT 'http://www.google.com/';
GO

/*
    Msg 2627, Level 14, State 1, Line 1
    Violation of UNIQUE KEY constraint 'unique_url'. Cannot insert duplicate key 
    in object 'dbo.urls'. The duplicate key value is
    (0xd111175e022c19f447895ad6b72ff259552d1b38).
    The statement has been terminated.
*/
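
With the hashed table, a lookup by URL can go through the hash as well. A small sketch, assuming the dbo.urls definition with url_hash shown above; the variable name is hypothetical:

```sql
-- Hypothetical lookup value; it must be VARCHAR so it hashes the same bytes as the url column.
DECLARE @url VARCHAR(2048) = 'http://www.google.com/';

SELECT id
FROM dbo.urls
WHERE url_hash = CONVERT(VARBINARY(32), HASHBYTES('SHA1', @url)) -- can seek on the unique index
  AND url = @url;                                                -- guards against a hash collision
```

Comparing the full url in addition to the hash keeps the result correct in the unlikely event that two different URLs produce the same SHA1 value. In that case the unique constraint on url_hash would also reject the second URL on insert.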