Splitting an address column into multiple columns on a large table

Time: 2013-03-13 15:00:30

Tags: sql-server-2008 tsql

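I have a table with 500k rows where the address is stored in a single field, with the parts separated by CHAR(13) + CHAR(10). I have added 5 fields to the table and want to split the address into them.

I found this split function online, and it seems to perform well. I can't use PARSENAME, because I have 5 parts and the field may also contain a '.' character.

It is a table-valued function, so I would have to loop over the rows and update the records. Previously I would have used a cursor or a SQL loop, or possibly even C#, to do this, but I feel there must be a CTE or set-based answer.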

2 answers:

Answer 0 (score: 3)

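You have a few options:

You can create a temp table, parse the addresses into it, and then update the original table by joining it to the temp table.

You can write your own T-SQL functions and call them in the UPDATE statement, like this: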

UPDATE myTable
   SET address1 = dbo.myGetAddress1Function(address),
       address2 = dbo.myGetAddress2Function(address)....
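The temp table option could be sketched roughly as follows. This is only an illustration under assumptions: it borrows the dbo.Addresses sample table and the dbo.SplitAddressOrdered function from the other answer on this page, and #AddressParts is a hypothetical temp table name.

```sql
-- Sketch only. Assumes dbo.Addresses (AddressID, [Address], Address1..Address5)
-- and dbo.SplitAddressOrdered as defined in the other answer.
-- #AddressParts is a hypothetical name.

-- 1. Parse every address into a temp table, one row per address part.
SELECT s.AddressID, s.rn, s.AddressItem
INTO #AddressParts
FROM dbo.Addresses AS a
CROSS APPLY dbo.SplitAddressOrdered(a.AddressID, a.[Address],
  CHAR(13) + CHAR(10)) AS s;

-- 2. Pivot the parts back to one row per address and update through a join.
UPDATE a
SET Address1 = p.Address1,
    Address2 = p.Address2,
    Address3 = p.Address3,
    Address4 = p.Address4,
    Address5 = p.Address5
FROM dbo.Addresses AS a
INNER JOIN
(
  SELECT AddressID,
         Address1 = MAX(CASE WHEN rn = 1 THEN AddressItem END),
         Address2 = MAX(CASE WHEN rn = 2 THEN AddressItem END),
         Address3 = MAX(CASE WHEN rn = 3 THEN AddressItem END),
         Address4 = MAX(CASE WHEN rn = 4 THEN AddressItem END),
         Address5 = MAX(CASE WHEN rn = 5 THEN AddressItem END)
  FROM #AddressParts
  GROUP BY AddressID
) AS p ON p.AddressID = a.AddressID;

DROP TABLE #AddressParts;
```

Parts beyond the fifth are simply ignored by the pivot, and addresses with fewer than 5 parts leave the remaining columns NULL.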

Answer 1 (score: 3)

So, given some source data:

CREATE TABLE dbo.Addresses
(
  AddressID INT IDENTITY(1,1),
  [Address] VARCHAR(255),
  Address1  VARCHAR(255),
  Address2  VARCHAR(255),
  Address3  VARCHAR(255),
  Address4  VARCHAR(255),
  Address5  VARCHAR(255)
);

INSERT dbo.Addresses([Address])
SELECT 'foo
bar'
UNION ALL SELECT 'add1
add2
add3
add4
add5';

Let's create a function that returns the address parts in order:

CREATE FUNCTION dbo.SplitAddressOrdered
(
    @AddressID  INT,
    @List       VARCHAR(MAX),
    @Delimiter  VARCHAR(32)
)
RETURNS TABLE
AS
    RETURN 
    (
      SELECT  
          AddressID = @AddressID, 
          rn = ROW_NUMBER() OVER (ORDER BY Number), 
          AddressItem = Item 
        FROM (SELECT Number, Item = LTRIM(RTRIM(SUBSTRING(@List, Number, 
          CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)))
        FROM (SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
          FROM sys.all_objects) AS n(Number)
        WHERE Number <= CONVERT(INT, LEN(@List))
        AND SUBSTRING(@Delimiter + @List, Number, LEN(@Delimiter)) = @Delimiter
      ) AS y
    );
GO
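As a quick check of what the function returns, it can be called directly on a literal. The input string and the AddressID value of 1 below are made up for illustration:

```sql
-- Illustrative call: split a two-part string on CR+LF.
SELECT AddressID, rn, AddressItem
FROM dbo.SplitAddressOrdered(1, 'foo' + CHAR(13) + CHAR(10) + 'bar',
  CHAR(13) + CHAR(10))
ORDER BY rn;

-- AddressID  rn  AddressItem
-- ---------  --  -----------
-- 1          1   foo
-- 2nd row:
-- 1          2   bar
```

Note that the numbers come from sys.all_objects, so the function can only handle strings up to as many characters as that view has rows. That should comfortably cover a VARCHAR(255) column.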

Now you can do this (the query has to be run 5 times):

DECLARE 
  @i INT = 1, 
  @sql NVARCHAR(MAX),
  @src NVARCHAR(MAX) = N';WITH x AS 
    (
      SELECT a.*, Original = s.AddressID, s.rn, s.AddressItem
      FROM dbo.Addresses AS a
      CROSS APPLY dbo.SplitAddressOrdered(a.AddressID, a.Address, 
        CHAR(13) + CHAR(10)) AS s WHERE rn = @i
    )';
WHILE @i <= 5
BEGIN
   SET @sql = @src + N'UPDATE x SET Address' + RTRIM(@i)
     + ' = CASE WHEN AddressID = Original AND rn = ' 
     + RTRIM(@i) + ' THEN AddressItem END;';

   EXEC sp_executesql @sql, N'@i INT', @i;

   SET @i += 1;
END
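To make the loop concrete, the statement built for the first iteration should look roughly like this. It is reconstructed by hand from the string concatenation above and reformatted for readability. The DECLARE is an addition that stands in for the parameter sp_executesql passes.

```sql
-- Stand-in for the @i parameter supplied by sp_executesql.
DECLARE @i INT = 1;

WITH x AS
(
  SELECT a.*, Original = s.AddressID, s.rn, s.AddressItem
  FROM dbo.Addresses AS a
  CROSS APPLY dbo.SplitAddressOrdered(a.AddressID, a.Address,
    CHAR(13) + CHAR(10)) AS s WHERE rn = @i
)
UPDATE x SET Address1 = CASE WHEN AddressID = Original AND rn = 1
                             THEN AddressItem END;
```

Only the target column name and the literal in the CASE expression change between iterations. The WHERE rn = @i filter already restricts the CTE to the matching part, so the CASE acts mostly as a safeguard.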

Then you can drop the Address column:

ALTER TABLE dbo.Addresses DROP COLUMN [Address];

The table then contains:

AddressID  Address1  Address2  Address3  Address4  Address5
---------  --------  --------  --------  --------  --------
1          foo       bar       NULL      NULL      NULL
2          add1      add2      add3      add4      add5

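I'm sure someone cleverer than me will show how to use that function without having to loop.

I can also imagine a slight change to the function that would let you simply pull out a particular element... stand by...

EDIT

Here is a scalar function, which is more expensive on its own, but lets you make one pass instead of 5: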

CREATE FUNCTION dbo.ElementFromOrderedList
(
    @List       VARCHAR(MAX),
    @Delimiter  VARCHAR(32),
    @Index      SMALLINT
)
RETURNS VARCHAR(255)
AS
BEGIN
    RETURN 
    (
      SELECT Item 
        FROM (SELECT rn = ROW_NUMBER() OVER (ORDER BY Number),
          Item = LTRIM(RTRIM(SUBSTRING(@List, Number, 
          CHARINDEX(@Delimiter, @List + @Delimiter, Number) - Number)))
        FROM (SELECT ROW_NUMBER() OVER (ORDER BY [object_id])
          FROM sys.all_objects) AS n(Number)
        WHERE Number <= CONVERT(INT, LEN(@List))
        AND SUBSTRING(@Delimiter + @List, Number, LEN(@Delimiter)) = @Delimiter
      ) AS y WHERE rn = @Index
    );
END
GO
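A small illustrative call, using a made-up comma-separated string rather than the real address data, shows how the index parameter behaves:

```sql
-- Illustrative only: pick elements by position from a literal list.
SELECT
  SecondItem  = dbo.ElementFromOrderedList('add1,add2,add3', ',', 2),  -- add2
  MissingItem = dbo.ElementFromOrderedList('add1,add2,add3', ',', 5);  -- NULL
```

An index past the last element returns NULL, which is what leaves the trailing address columns empty for shorter addresses.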

Now, given the table above (before the update and before the drop), the update is simply:

UPDATE dbo.Addresses
  SET Address1 = dbo.ElementFromOrderedList([Address], CHAR(13) + CHAR(10), 1),
      Address2 = dbo.ElementFromOrderedList([Address], CHAR(13) + CHAR(10), 2),
      Address3 = dbo.ElementFromOrderedList([Address], CHAR(13) + CHAR(10), 3),
      Address4 = dbo.ElementFromOrderedList([Address], CHAR(13) + CHAR(10), 4),
      Address5 = dbo.ElementFromOrderedList([Address], CHAR(13) + CHAR(10), 5);