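SQL: splitting an address into multiple columns on spaces

Time: 2016-06-02 19:04:36

Tags: sql sql-server csv

I have more than 7 million rows; otherwise I would use Excel.

My address column contains a varying number of words. Some values are as short as '123 bay street', while others can be as long as '1234 west spring hill drive to 123'.

My goal is to put each word into its own column. I was able to get the first word with the query below, but I have not managed to write a query efficient enough to do the rest.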

update X
set X.Address_number = Y.[address]
from 
     (SELECT 
          unique_id,
          CASE 
             WHEN SUBSTRING(phy_addr1, 1, CHARINDEX(' ', phy_addr1)) = ''
                THEN phy_addr1 + ' '
             ELSE SUBSTRING(phy_addr1, 1, CHARINDEX(' ', phy_addr1))
          END 'address'
      FROM 
         [RD_GeoCode].[dbo].[PA_Stg_excel]) as Y
  inner join 
      [RD_GeoCode].[dbo].[rg_ApplicationData_AllForms_20160401_address] as X on X.unique_id = Y.unique_id
  where 
      X.Address_number is null

2 answers:

Answer 0: (score: 1)

You need a Numbers table and a string split function. Once you have both, it is simple:

-----string split function

select 
*
 from yourtable t
cross apply
dbo.SplitStrings_Numbers(t.address,' ') b
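
The split function itself is not shown in this answer. The following is a minimal sketch of what an inline function with this name could look like. It assumes a hypothetical dbo.Numbers table with an integer column Number holding 1, 2, 3, ... up to at least the longest address length, and a single-character delimiter. It is an illustration under those assumptions, not the function the answer originally pointed to.

```sql
-- Assumption: dbo.Numbers(Number) exists and holds 1, 2, 3, ... (hypothetical helper table).
-- Assumption: the delimiter is exactly one character, e.g. ' '.
CREATE FUNCTION dbo.SplitStrings_Numbers
(
    @List      VARCHAR(8000),
    @Delimiter CHAR(1)
)
RETURNS TABLE
AS
RETURN
(
    -- An item starts at every position whose preceding character
    -- (in the delimiter-prefixed string) is the delimiter, and runs
    -- up to the next delimiter.
    SELECT Item = SUBSTRING(@List, n.Number,
                  CHARINDEX(@Delimiter, @List + @Delimiter, n.Number) - n.Number)
    FROM dbo.Numbers AS n
    WHERE n.Number <= LEN(@List)
      AND SUBSTRING(@Delimiter + @List, n.Number, 1) = @Delimiter
);
```

The length of the comparison is fixed at 1 rather than computed with LEN(@Delimiter), because LEN ignores trailing spaces and would return 0 for a space delimiter. Consecutive spaces in an address would produce empty items with this sketch, so you may want to filter those out.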

You can then store the split values in a reference table like the following.

create table addressreferences
(
address varchar(300),
delimitedvalue varchar(100)
)

insert into addressreferences
 select 
    address,b.*
     from yourtable t
    cross apply
    dbo.SplitStrings_Numbers(t.address,' ') b

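Instead of updating the values in the same table, I recommend creating a separate table that links back to the table above. This requires some schema changes to the existing table.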

create trigger trg_test
on dbo.yourtable
after insert, update, delete
as
begin
    set nocount on;

    -- deletes and updates: remove the rows of the old address first,
    -- since the old and new address may not split into the same number of rows
    if exists (select 1 from deleted)
    begin
        delete a
        from addressreferences a
        join deleted d
            on a.address = d.address
    end

    -- inserts and updates: split the new address and store one row per word
    if exists (select 1 from inserted)
    begin
        insert into addressreferences
        select i.address, b.*
        from inserted i
        cross apply dbo.SplitStrings_Numbers(i.address, ' ') b
    end
end

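This is only pseudocode to convey the idea; you will still have to handle the references. Updating the same table will not work, because you do not know how many rows an address will split into.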

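Update:
I think a trigger suits your scenario better than references. You must first insert the existing values into the reference table, though. The trigger above is pseudocode for keeping that table in sync afterwards.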


Answer 1: (score: 1)

A sequence table is a good thing to have. You can create one as in Louis Davidson's "Pro Relational Database Design and Implementation":

CREATE SCHEMA tools
go
CREATE TABLE tools.sequence
(
i int CONSTRAINT PKtools_sequence PRIMARY KEY
)

-- Then I will load it, up to 99999:
;WITH DIGITS (i) as(--set up a set of numbers from 0-9
SELECT i
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) as digits (i))
--builds a table from 0 to 99999
,sequence (i) as (
SELECT D1.i + (10*D2.i) + (100*D3.i) + (1000*D4.i) + (10000*D5.i)
--+ (100000*D6.i)
FROM digits AS D1 CROSS JOIN digits AS D2 CROSS JOIN digits AS D3
CROSS JOIN digits AS D4 CROSS JOIN digits AS D5
/* CROSS JOIN digits AS D6 */)
INSERT INTO tools.sequence(i)
SELECT i
FROM sequence

Then split your input, again with code from L. Davidson's book:

DECLARE @delimitedList VARCHAR(100) = '1,2,3,4,5'
SELECT word = SUBSTRING(',' + @delimitedList + ',',i + 1,
CHARINDEX(',',',' + @delimitedList + ',',i + 1) - i - 1)
FROM tools.sequence
WHERE i >= 1
AND i < LEN(',' + @delimitedList + ',') - 1
AND SUBSTRING(',' + @delimitedList + ',', i, 1) = ','
ORDER BY i

Use a space instead of a comma.
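
One detail to watch when swapping in a space: in SQL Server, LEN ignores trailing spaces, so the upper bound computed from the padded string comes out one short and can drop a one-letter last word. The sketch below is an adaptation of the query above, not code from the book. It bounds on the unpadded value instead, and the variable names @address and @padded are only for illustration.

```sql
-- Assumption: tools.sequence has been created and loaded as shown earlier.
DECLARE @address VARCHAR(100) = '12 main st w';
DECLARE @padded  VARCHAR(102) = ' ' + @address + ' ';

SELECT word = SUBSTRING(@padded, i + 1,
              CHARINDEX(' ', @padded, i + 1) - i - 1)
FROM tools.sequence
WHERE i BETWEEN 1 AND LEN(@address)      -- bound on the unpadded value, see note above
  AND SUBSTRING(@padded, i, 1) = ' '     -- a word starts right after each space
ORDER BY i;
```

The sample address ends in a single-letter word on purpose, since that is the case the original bound would miss.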

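Finally, I would consider using the PIVOT operator to turn the rows into columns, but for that to work you need to specify the maximum number of words.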
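
As a rough sketch of that last step, the query below numbers each word within its address with ROW_NUMBER and then pivots the positions into columns. It assumes tools.sequence from above, a maximum of seven words, and a table variable @addresses standing in for the real table; the column names unique_id and phy_addr1 are borrowed from the question, while word1 to word7 are made up for the example.

```sql
-- Assumption: tools.sequence exists; @addresses is a stand-in for the real table.
DECLARE @addresses TABLE (unique_id INT PRIMARY KEY, phy_addr1 VARCHAR(100));
INSERT INTO @addresses (unique_id, phy_addr1)
VALUES (1, '123 bay street'),
       (2, '1234 west spring hill drive to 123');

WITH words AS (
    SELECT a.unique_id,
           -- position of the word within its address
           pos  = ROW_NUMBER() OVER (PARTITION BY a.unique_id ORDER BY s.i),
           word = SUBSTRING(' ' + a.phy_addr1 + ' ', s.i + 1,
                  CHARINDEX(' ', ' ' + a.phy_addr1 + ' ', s.i + 1) - s.i - 1)
    FROM @addresses AS a
    JOIN tools.sequence AS s
      ON s.i BETWEEN 1 AND LEN(a.phy_addr1)
     AND SUBSTRING(' ' + a.phy_addr1 + ' ', s.i, 1) = ' '
)
SELECT unique_id,
       [1] AS word1, [2] AS word2, [3] AS word3, [4] AS word4,
       [5] AS word5, [6] AS word6, [7] AS word7
FROM words
PIVOT (MAX(word) FOR pos IN ([1], [2], [3], [4], [5], [6], [7])) AS p
ORDER BY unique_id;
```

Addresses with fewer than seven words get NULL in the unused columns, and any words beyond the seventh are silently dropped, so the list of positions should be sized to the longest address you expect.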