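I have over 7 million rows, otherwise I would use Excel.
My address column contains a varying number of words. Some values are as short as '123 bay street', while others can be as long as '1234 west spring hill drive to 123'.
My goal is to put each word into its own column. I was able to get the first word with the query below, but I have not been able to write a query efficient enough to handle the rest.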
update X
set X.Address_number = Y.[address]
from
(SELECT
unique_id,
CASE
WHEN SUBSTRING(phy_addr1, 1, CHARINDEX(' ', phy_addr1)) = ''
THEN phy_addr1 + ' '
ELSE SUBSTRING(phy_addr1, 1, CHARINDEX(' ', phy_addr1))
END 'address'
FROM
[RD_GeoCode].[dbo].[PA_Stg_excel]) as Y
inner join
[RD_GeoCode].[dbo].[rg_ApplicationData_AllForms_20160401_address] as X on X.unique_id = Y.unique_id
where
X.Address_number is null
Answer 0 (score: 1)
You need a Numbers table and a string split function, such as the dbo.SplitStrings_Numbers used below. Once you have both, the rest is simple.
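For readers who do not already have these two objects, here is a minimal sketch of what they might look like. It is not the linked function from the original answer: the table name dbo.Numbers, its column Number, the 1000-row size, and the output column Item are all assumptions made for illustration. The delimiter is restricted to a single character, and DATALENGTH is used instead of LEN because LEN ignores trailing spaces, which matters when the delimiter is a space.

```sql
-- Assumed helper table: integers 1..1000 (enough for strings up to 1000 characters).
CREATE TABLE dbo.Numbers (Number INT NOT NULL PRIMARY KEY);

INSERT INTO dbo.Numbers (Number)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects;
GO

-- Hypothetical single-character splitter built on dbo.Numbers.
CREATE FUNCTION dbo.SplitStrings_Numbers
(
    @List      VARCHAR(8000),
    @Delimiter CHAR(1)
)
RETURNS TABLE
AS
RETURN
(
    -- Each Number that sits just after a delimiter marks the start of an item;
    -- the item runs up to the next delimiter (one is appended to close the last item).
    SELECT Item = SUBSTRING(@List, n.Number,
                  CHARINDEX(@Delimiter, @List + @Delimiter, n.Number) - n.Number)
    FROM dbo.Numbers AS n
    WHERE n.Number <= DATALENGTH(@List)
      AND SUBSTRING(@Delimiter + @List, n.Number, 1) = @Delimiter
);
GO

-- Usage: one row per word.
SELECT Item FROM dbo.SplitStrings_Numbers('123 bay street', ' ');
```

Consecutive delimiters produce empty items with this sketch, so doubled spaces in an address would need to be collapsed or filtered out separately.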
-- Calling the string split function
select
*
from yourtable t
cross apply
dbo.SplitStrings_Numbers(t.address,' ') b
You can use the function like this to load the split words into a separate table.
create table addressreferences
(
address varchar(300),
delimitedvalue varchar(100)
)
insert into addressreferences
select
address,b.*
from yourtable t
cross apply
dbo.SplitStrings_Numbers(t.address,' ') b
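Rather than updating the values into the same table, I suggest creating a separate table that links back to the table above. This requires some schema changes to the existing table.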
create trigger trg_test
on dbo.yourtable
after insert, update, delete
as
begin
    set nocount on;

    -- Deletes and updates: remove the split rows for the old address first,
    -- since the old and new address may not split into the same number of rows.
    -- Note: this matches on the address text, so rows sharing an address share split rows.
    if exists (select 1 from deleted)
    begin
        delete a
        from addressreferences a
        join deleted d
            on a.address = d.address;
    end

    -- Inserts and updates: split the new address and store one row per word.
    if exists (select 1 from inserted)
    begin
        insert into addressreferences
        select i.address, b.*
        from inserted i
        cross apply dbo.SplitStrings_Numbers(i.address, ' ') b;
    end
end
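This is only pseudocode to convey the idea, and you will have to handle the references yourself. Updating the same table will not work, because you do not know how many rows a single address value will split into.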
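Update:
I think a trigger suits your scenario better than references. You first have to insert the existing values into the reference table; the trigger pseudocode above then keeps it in sync.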
Answer 1 (score: 1)
A sequence table is a good thing to have. You can create one as shown in Louis Davidson's "Pro Relational Database Design and Implementation":
CREATE SCHEMA tools
go
CREATE TABLE tools.sequence
(
i int CONSTRAINT PKtools_sequence PRIMARY KEY
)
-- Then I will load it, up to 99999:
;WITH DIGITS (i) as(--set up a set of numbers from 0-9
SELECT i
FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) as digits (i))
--builds a table from 0 to 99999
,sequence (i) as (
SELECT D1.i + (10*D2.i) + (100*D3.i) + (1000*D4.i) + (10000*D5.i)
--+ (100000*D6.i)
FROM digits AS D1 CROSS JOIN digits AS D2 CROSS JOIN digits AS D3
CROSS JOIN digits AS D4 CROSS JOIN digits AS D5
/* CROSS JOIN digits AS D6 */)
INSERT INTO tools.sequence(i)
SELECT i
FROM sequence
Then split your input, again using code from L. Davidson's book:
DECLARE @delimitedList VARCHAR(100) = '1,2,3,4,5'
SELECT word = SUBSTRING(',' + @delimitedList + ',',i + 1,
CHARINDEX(',',',' + @delimitedList + ',',i + 1) - i - 1)
FROM tools.sequence
WHERE i >= 1
AND i < LEN(',' + @delimitedList + ',') - 1
AND SUBSTRING(',' + @delimitedList + ',', i, 1) = ','
ORDER BY i
Use a space instead of a comma.
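One detail to watch when swapping in a space: LEN ignores trailing spaces, so LEN(' ' + @list + ' ') comes out one short and the last word can be dropped. The sketch below is one way to adapt the query, using DATALENGTH on a VARCHAR value instead. The variable names, the word_position column, and the inline numbers CTE (a stand-in for tools.sequence so the snippet runs on its own) are assumptions for illustration.

```sql
DECLARE @address VARCHAR(100) = '1234 west spring hill drive';
-- Pad with a leading and trailing space so every word sits between two delimiters.
DECLARE @padded VARCHAR(102) = ' ' + @address + ' ';

WITH digits (i) AS (
    SELECT i FROM (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS d (i)
),
numbers (i) AS (   -- 0..999, stand-in for tools.sequence
    SELECT D1.i + (10 * D2.i) + (100 * D3.i)
    FROM digits AS D1 CROSS JOIN digits AS D2 CROSS JOIN digits AS D3
)
SELECT word_position = ROW_NUMBER() OVER (ORDER BY i),
       word = SUBSTRING(@padded, i + 1,
                        CHARINDEX(' ', @padded, i + 1) - i - 1)
FROM numbers
WHERE i >= 1
  AND i < DATALENGTH(@padded) - 1      -- DATALENGTH counts the trailing space, LEN does not
  AND SUBSTRING(@padded, i, 1) = ' '
ORDER BY i;
```

This should return one row per word, numbered in order. DATALENGTH counts bytes, so an NVARCHAR column would need the result divided by 2.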
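Finally, I would consider using the PIVOT operator to turn the rows into columns, but for that to work you need to specify the maximum number of words.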
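As a rough sketch of that last step, assuming the split has already produced one row per word with its position: the table variable @words, its column names, the sample rows, and the cap of five words are all hypothetical, chosen only to show the shape of the query.

```sql
-- Hypothetical output of the split step: one row per (unique_id, position, word).
DECLARE @words TABLE (unique_id INT, word_position INT, word VARCHAR(100));

INSERT INTO @words (unique_id, word_position, word) VALUES
    (1, 1, '123'),  (1, 2, 'bay'),  (1, 3, 'street'),
    (2, 1, '1234'), (2, 2, 'west'), (2, 3, 'spring'), (2, 4, 'hill'), (2, 5, 'drive');

-- The IN list fixes the maximum number of words; positions an address
-- does not have come back as NULL.
SELECT unique_id,
       [1] AS word1, [2] AS word2, [3] AS word3, [4] AS word4, [5] AS word5
FROM (SELECT unique_id, word_position, word FROM @words) AS src
PIVOT (MAX(word) FOR word_position IN ([1], [2], [3], [4], [5])) AS p
ORDER BY unique_id;
```

Because the column list in PIVOT is static, any words beyond the chosen maximum are simply not returned, so the cap should be picked from the longest address in the data.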