Generating unique IDs from varchar columns in T-SQL

Asked: 2018-03-20 09:12:05

Tags: sql-server tsql

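I want to join two tables. Both tables contain varchar columns that are used to join them. However, a query that has to compare varchar columns to perform the join runs slowly. So I would like to convert these varchar columns to unique integer ids so that the comparison is faster.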

SELECT /*do calculations*/
FROM   [dbo].[messages]  m WITH (NOLOCK)
JOIN   [dbo].[jointable] j ON j.address = m.orig OR j.address = m.recip

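address, orig and recip are string columns, and integer ids would be preferable for better performance. I realize that the ON j.address = m.orig OR j.address = m.recip part also hurts performance.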

The tables I want to join have the following structure:

CREATE TABLE [dbo].[jointable](
    [displayname] [nvarchar](256) NULL,
    [alias] [nvarchar](129) NULL,
    [firstname] [nvarchar](129) NULL,
    [lastname] [nvarchar](129) NULL,
    [address] [nvarchar](256) NULL,
    [company] [nvarchar](129) NULL,
    [department] [nvarchar](129) NULL,
    [office] [nvarchar](129) NULL) 


CREATE TABLE [dbo].[messages](
    [messageid] [bigint] NOT NULL,
    [message] [varchar](150) NULL,
    [orig] [nvarchar](256) NULL,
    [recip] [nvarchar](256) NULL)

How can I do this? Is there any function that converts a varchar into an integer id? Thanks in advance.

4 answers:

Answer 0 (score: 3)

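  1. Normalize the data in the tables
  2. Add an int primary key to the first table
  3. Add an int foreign key to the second table. Set it to the corresponding values from the first table
  4. Join on the int keys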
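
For the tables in the question, those four steps might look roughly like the sketch below. The column names address_id, orig_id and recip_id and the constraint names are made up for illustration, and the sketch assumes each address appears at most once in jointable (otherwise the UPDATE picks an arbitrary match).

```sql
-- Step 2: surrogate int key on the lookup table (address_id is a hypothetical name)
ALTER TABLE dbo.jointable
    ADD address_id INT IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_jointable PRIMARY KEY;
GO

-- Step 3: int foreign keys on the messages table, one per address column
ALTER TABLE dbo.messages ADD orig_id INT NULL, recip_id INT NULL;
GO
ALTER TABLE dbo.messages ADD CONSTRAINT FK_messages_orig
    FOREIGN KEY (orig_id)  REFERENCES dbo.jointable (address_id);
ALTER TABLE dbo.messages ADD CONSTRAINT FK_messages_recip
    FOREIGN KEY (recip_id) REFERENCES dbo.jointable (address_id);
GO

-- Step 3 (continued): fill the new keys from the existing string values
UPDATE m SET orig_id = j.address_id
FROM   dbo.messages  AS m
JOIN   dbo.jointable AS j ON j.address = m.orig;

UPDATE m SET recip_id = j.address_id
FROM   dbo.messages  AS m
JOIN   dbo.jointable AS j ON j.address = m.recip;
GO

-- Step 4: join on the int keys instead of the strings
SELECT m.messageid, j.displayname
FROM   dbo.messages  AS m
JOIN   dbo.jointable AS j ON j.address_id IN (m.orig_id, m.recip_id);
```

The GO separators matter here: a batch that references a column added earlier in the same batch fails to compile.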

Answer 1 (score: 2)

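You can generate a GUID from a VARCHAR, but I doubt you will be happy with it (so you will need some kind of mapping table, as the other answers show). Just to show the principle: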

If your strings are short and unique within the first 16 bytes, this might work for you:

DECLARE @tbl TABLE(SomeString VARCHAR(100),TheGUID UNIQUEIDENTIFIER);

--a GUID is a 16-Byte(128 bit) sized type

INSERT INTO @tbl(SomeString) VALUES
 ('test1')
,('Some short text')
,('Some very very very very long text')
,('Some very very very very long text which is the same as the other one in the first 16 bytes');

UPDATE @tbl SET TheGUID=CAST(CAST(SomeString AS VARBINARY(16)) AS UNIQUEIDENTIFIER);

SELECT SomeString
      ,TheGUID
      ,CAST(CAST(TheGUID AS VARBINARY(16)) AS VARCHAR(16))
FROM @tbl;

Result (scroll to the side)

+---------------------------------------------------------------------------------------------+--------------------------------------+--------------------+
| SomeString                                                                                  | TheGUID                              | (No column name)   |
+---------------------------------------------------------------------------------------------+--------------------------------------+--------------------+
| test1                                                                                       | 74736574-0031-0000-0000-000000000000 | test1              |
+---------------------------------------------------------------------------------------------+--------------------------------------+--------------------+
| Some short text                                                                             | 656D6F53-7320-6F68-7274-207465787400 | Some short text    |
+---------------------------------------------------------------------------------------------+--------------------------------------+--------------------+
| Some very very very very long text                                                          | 656D6F53-7620-7265-7920-766572792076 | Some very very v   |
+---------------------------------------------------------------------------------------------+--------------------------------------+--------------------+
| Some very very very very long text which is the same as the other one in the first 16 bytes | 656D6F53-7620-7265-7920-766572792076 | Some very very v   |
+---------------------------------------------------------------------------------------------+--------------------------------------+--------------------+
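
Before relying on this cast, it may be worth checking whether the real data is unique within 16 bytes. The query below is only a sketch against the jointable from the question; the aliases prefix16 and distinct_addresses are made up. Note that address is NVARCHAR there, which stores 2 bytes per character, so 16 bytes cover only the first 8 characters.

```sql
-- Lists every 16-byte prefix that more than one distinct address maps to.
-- Any row returned means the GUID cast would collide for those addresses.
SELECT CAST(j.address AS VARBINARY(16)) AS prefix16,
       COUNT(DISTINCT j.address)        AS distinct_addresses
FROM   dbo.jointable AS j
WHERE  j.address IS NOT NULL
GROUP BY CAST(j.address AS VARBINARY(16))
HAVING COUNT(DISTINCT j.address) > 1;
```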

Answer 2 (score: 1)

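First try adding indexes on the columns (even though they are VARCHAR). If you are still struggling with performance, you can use the following to join on integer values.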

-- Create a table to link a varchar with an integer
CREATE TABLE WordIndex(
    WordID INT IDENTITY PRIMARY KEY,
    Word VARCHAR(500))

CREATE NONCLUSTERED INDEX NCI_WordIndex_Word ON WordIndex (Word)
GO

-- Load the table with all available words
INSERT INTO WordIndex (
    Word)
SELECT DISTINCT
    YourVarcharColumn
FROM
    YourTable
UNION
SELECT DISTINCT
    YourOtherVarcharColumn
FROM
    YourSecondTable

GO


-- Add the integer ID to your tables
ALTER TABLE YourTable ADD WordID INT
ALTER TABLE YourSecondTable ADD WordID INT

ALTER TABLE YourTable ADD FOREIGN KEY (WordID) REFERENCES WordIndex (WordID)
ALTER TABLE YourSecondTable ADD FOREIGN KEY (WordID) REFERENCES WordIndex (WordID)
GO

-- Optionally (but recommended) add indexes on the ID
CREATE NONCLUSTERED INDEX NCI_YourTable_WordID ON YourTable (WordID)
CREATE NONCLUSTERED INDEX NCI_YourSecondTable_WordID ON YourSecondTable (WordID)
GO


-- Update the integer ID
UPDATE T SET
    WordID = W.WordID
FROM
    YourTable AS T
    INNER JOIN WordIndex AS W ON T.YourVarcharColumn = W.Word

UPDATE T SET
    WordID = W.WordID
FROM
    YourSecondTable AS T
    INNER JOIN WordIndex AS W ON T.YourOtherVarcharColumn = W.Word
GO


-- Join by integer
SELECT
    1
FROM
    YourTable AS T
    INNER JOIN YourSecondTable AS N ON T.WordID = N.WordID

Using this approach means you have to maintain the word index table.
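
As a rough idea of what that maintenance could involve, the sketch below registers words that have appeared since the last load and then fills in the missing ids. It reuses the placeholder names from the script above and would have to be repeated for YourSecondTable; how and when to run it (a scheduled job, a trigger, part of the load process) is left open.

```sql
-- Register any words that are not in the index table yet
INSERT INTO WordIndex (Word)
SELECT DISTINCT
    T.YourVarcharColumn
FROM
    YourTable AS T
WHERE
    T.YourVarcharColumn IS NOT NULL
    AND NOT EXISTS (SELECT 1 FROM WordIndex AS W WHERE W.Word = T.YourVarcharColumn)

-- Assign the integer ID to rows that do not have one yet
UPDATE T SET
    WordID = W.WordID
FROM
    YourTable AS T
    INNER JOIN WordIndex AS W ON T.YourVarcharColumn = W.Word
WHERE
    T.WordID IS NULL
GO
```

Rows whose string value is changed later would also need their WordID refreshed, which this sketch does not cover.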

Answer 3 (score: 0)

I would need to see the table design to answer properly.

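If you just add an int for the relationship and assign the ints correctly, the problem is that if the data changes later, the int relationship will be wrong.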

If you index those columns, a varchar join is not much slower than an int join.
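
For the tables in the question, indexing the join columns could look like the sketch below. The index names are made up. The columns are NVARCHAR(256), so at most 512 bytes each, which stays within the 900-byte index key limit of older SQL Server versions.

```sql
-- One index per column that takes part in the join predicate
CREATE NONCLUSTERED INDEX IX_jointable_address ON dbo.jointable (address);
CREATE NONCLUSTERED INDEX IX_messages_orig     ON dbo.messages (orig);
CREATE NONCLUSTERED INDEX IX_messages_recip    ON dbo.messages (recip);
```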

The query plan will probably be the same, but you could try:

SELECT /*do calculations*/
FROM   [dbo].[messages]  m 
JOIN   [dbo].[jointable] j ON j.address in (m.orig, m.recip)
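
If the OR / IN predicate still keeps the optimizer from seeking on an index, another variation to compare is splitting the join into two branches, one per column. This is only a sketch, and the selected columns are placeholders for the real calculations. The WHERE clause in the second branch is there so that a message whose orig and recip are equal is not returned twice. Whether it is actually faster depends on the data and indexes, so compare the execution plans.

```sql
-- Branch 1: matches on the sender address
SELECT m.messageid, j.displayname
FROM   dbo.messages  AS m
JOIN   dbo.jointable AS j ON j.address = m.orig
UNION ALL
-- Branch 2: matches on the recipient address,
-- skipping rows that branch 1 already returned
SELECT m.messageid, j.displayname
FROM   dbo.messages  AS m
JOIN   dbo.jointable AS j ON j.address = m.recip
WHERE  m.orig IS NULL OR m.orig <> m.recip;
```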