I've been sent data that I need to normalize. It sits in a SQL table, but each row has several multi-valued columns. An example:
ID fname lname projects projdates
1 John Doe projA;projB;projC 20150701;20150801;20150901
2 Jane Smith projD;;projC 20150701;;20150902
3 Lisa Anderson projB;projC 20150801;20150903
4 Nancy Johnson projB;projC;projE 20150601;20150822;20150904
5 Chris Edwards projA 20150905
It needs to look like this:
ID fname lname projects projdates
1 John Doe projA 20150701
1 John Doe projB 20150801
1 John Doe projC 20150901
2 Jane Smith projD 20150701
2 Jane Smith projC 20150902
3 Lisa Anderson projB 20150801
3 Lisa Anderson projC 20150903
4 Nancy Johnson projB 20150601
4 Nancy Johnson projC 20150822
4 Nancy Johnson projE 20150904
5 Chris Edwards projA 20150905
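I need to split this into rows of ID, fname, lname and one parsed project, so that each project becomes its own record. I've found plenty of posts with split functions, and I can get one to work for 1 column, but not for 2. When I split 2 columns, I get every combination of the split values: for John Doe, it gives me 3 records for projA, one for each of the projdates. I need each multi-valued project entry to be matched with its corresponding projdate and not with the others.
Any ideas?
Thanks!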
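Answer 0 (score: 1)
If you use Jeff Moden's "DelimitedSplit8K" splitter (which I've renamed "fDelimitedSplit8K" here; see Figure 21: The Final "New" Splitter Code, Ready for Testing) to do the heavy lifting of the splitting, the rest becomes fairly simple: use CROSS APPLY, with a WHERE clause to join the pieces up correctly.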
IF object_ID (N'tempdb..#tInputData') is not null
DROP TABLE #tInputData
CREATE TABLE #tInputData (
ID INT
PRIMARY KEY CLUSTERED -- Add IDENTITY if ID needs to be set at INSERT time
, FName VARCHAR (30)
, LName VARCHAR (30)
, Projects VARCHAR (4000)
, ProjDates VARCHAR (4000)
)
INSERT INTO #tInputData
( ID, FName, LName, Projects, ProjDates )
VALUES
( 1, 'John', 'Doe' , 'projA;projB;projC' , '20150701;20150801;20150901'),
( 2, 'Jane', 'Smith' , 'projD;;projC' , '20150701;;20150902'),
( 3, 'Lisa', 'Anderson' , 'projB;projC' , '20150801;20150903'),
( 4, 'Nancy', 'Johnson' , 'projB;projC;projE' , '20150601;20150822;20150904'),
( 5, 'Chris', 'Edwards' , 'projA' , '20150905')
SELECT * FROM #tInputData -- Take a look at the INSERT results
; WITH ResultSet AS
(
SELECT
InData.ID
, InData.FName
, InData.LName
, ProjectList.ItemNumber AS ProjectID
, ProjectList.Item AS Project
, DateList.ItemNumber AS DateID
, DateList.Item AS ProjDate
FROM #tInputData AS InData
CROSS APPLY dbo.fDelimitedSplit8K(InData.Projects,';') AS ProjectList
CROSS APPLY dbo.fDelimitedSplit8K(InData.ProjDates,';') AS DateList
WHERE DateList.ItemNumber = ProjectList.ItemNumber -- Links projects and dates in left-to-right order
AND (ProjectList.Item <> '' AND DateList.Item <> '') -- Skip positions where the project or the date is empty; note that these are empty strings, not NULLs.
)
SELECT
ID
, FName
, LName
, Project
, ProjDate
FROM ResultSet
ORDER BY ID, Project
Results
ID FName LName Project ProjDate
-- ----- -------- ------- --------
1 John Doe projA 20150701
1 John Doe projB 20150801
1 John Doe projC 20150901
2 Jane Smith projC 20150902
2 Jane Smith projD 20150701
3 Lisa Anderson projB 20150801
3 Lisa Anderson projC 20150903
4 Nancy Johnson projB 20150601
4 Nancy Johnson projC 20150822
4 Nancy Johnson projE 20150904
5 Chris Edwards projA 20150905
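This algorithm handles Project and Date lists of equal length. If, for a given row, one list is shorter than the other, special care is needed to apply NULLs in the appropriate places.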
-- Cleanup
DROP TABLE #tInputData
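The unequal-length case mentioned above could be handled with a FULL OUTER JOIN on the item position instead of the WHERE filter. The following is only a sketch under assumptions: it swaps in the built-in STRING_SPLIT with its ordinal output (which requires SQL Server 2022 or Azure SQL Database) in place of fDelimitedSplit8K, and the table variable @t and its sample rows are made up for illustration.

```sql
-- Assumption: SQL Server 2022+ or Azure SQL Database, where STRING_SPLIT
-- accepts a third argument (enable_ordinal = 1) and returns an ordinal column.
DECLARE @t TABLE (
    ID        INT PRIMARY KEY,
    Projects  VARCHAR(4000),
    ProjDates VARCHAR(4000)
);

-- Hypothetical sample rows with lists of different lengths
INSERT INTO @t (ID, Projects, ProjDates) VALUES
    (1, 'projA;projB;projC', '20150701;20150801'),  -- last date missing
    (2, 'projD',             '20150701;20150902');  -- last project missing

SELECT t.ID,
       x.Pos,
       NULLIF(x.Project, '')  AS Project,   -- empty items become NULL as well
       NULLIF(x.ProjDate, '') AS ProjDate
FROM @t AS t
CROSS APPLY (
    -- Pair items by position; the side with no item at that position is NULL
    SELECT COALESCE(p.ordinal, d.ordinal) AS Pos,
           p.value AS Project,
           d.value AS ProjDate
    FROM STRING_SPLIT(t.Projects, ';', 1) AS p
    FULL OUTER JOIN STRING_SPLIT(t.ProjDates, ';', 1) AS d
        ON d.ordinal = p.ordinal
) AS x
ORDER BY t.ID, x.Pos;
```

With these sample rows, projC for ID 1 should come back with a NULL ProjDate, and the second date for ID 2 with a NULL Project. Ordering by Pos rather than by Project keeps the items in their original left-to-right order.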
Answer 1 (score: 0)
You haven't said what results you expect, but this may be a good starting point:
declare @t table (ID int not null,fname varchar(17) not null,lname varchar(15) not null,
projects varchar(76) not null,projdates varchar(310) not null)
insert into @t(ID,fname,lname,projects,projdates) values
(1,'John', 'Doe', 'projA;projB;projC','20150701;20150801;20150901'),
(2,'Jane', 'Smith', 'projD;;projC', '20150701;;20150902' ),
(3,'Lisa', 'Anderson','projB;projC', '20150801;20150903' ),
(4,'Nancy','Johnson', 'projB;projC;projE','20150601;20150822;20150904'),
(5,'Chris','Edwards', 'projA', '20150905' )
;With Numbers as (
select ROW_NUMBER() OVER (ORDER BY Number) n
from master..spt_values
), ProjectPositions as (
select ID,n.n
from @t t
inner join
Numbers n
on SUBSTRING(t.projects,n.n,1) = ';'
union all
select ID,0 from @t
union all
select ID,LEN(projects)+1 from @t
), ProjectsNumbered as (
select *,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY n) rn
from ProjectPositions
), ProjectPartitions as (
select n1.ID,n1.n+1 as startat,n2.n as endat,n1.rn
from ProjectsNumbered n1
inner join
ProjectsNumbered n2
on
n1.id = n2.id and
n1.rn = n2.rn -1
), ProDatePositions as (
select ID,n.n
from @t t
inner join
Numbers n
on SUBSTRING(t.projdates,n.n,1) = ';'
union all
select ID,0 from @t
union all
select ID,LEN(projdates)+1 from @t
), ProDateNumbered as (
select *,ROW_NUMBER() OVER (PARTITION BY ID ORDER BY n) rn
from ProDatePositions
), ProDatePartitions as (
select n1.ID,n1.n+1 as startat,n2.n as endat,n1.rn
from ProDateNumbered n1
inner join
ProDateNumbered n2
on
n1.id = n2.id and
n1.rn = n2.rn -1
)
select
t.ID,t.fname,t.lname,
SUBSTRING(projects,pp.startat,pp.endat - pp.startat) as project,
SUBSTRING(projdates,pdp.startat,pdp.endat - pdp.startat) as projdate
from
@t t
inner join
ProjectPartitions pp
on
t.ID = pp.ID
inner join
ProDatePartitions pdp
on
t.ID = pdp.ID and
pp.rn = pdp.rn
Results:
ID fname lname project projdate
----------- ----------------- --------------- ----------- ----------
1 John Doe projA 20150701
1 John Doe projB 20150801
1 John Doe projC 20150901
2 Jane Smith projD 20150701
2 Jane Smith
2 Jane Smith projC 20150902
3 Lisa Anderson projB 20150801
3 Lisa Anderson projC 20150903
4 Nancy Johnson projB 20150601
4 Nancy Johnson projC 20150822
4 Nancy Johnson projE 20150904
5 Chris Edwards projA 20150905
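(It isn't clear what you want to do with the "empty" project for ID 2.)
How it works: we use the Numbers CTE to simulate a numbers table with ROW_NUMBER(). We query an undocumented table in master, but we don't use any of the actual values in that table; we only rely on it having a lot of rows. If you have a real numbers table, you can skip that CTE.
We then do the same thing twice. We join the numbers table to our data table and use it to find the positions of the ; characters in the string we want to split. We also create a pair of dummy results, one for position 0 (just before the start of the string) and one for the position 1 past the end of the string. This defines ProjectPositions and ProDatePositions.
We use further ROW_NUMBER() calls (ProjectsNumbered, ProDateNumbered) to number these positions, and then use that numbering to join consecutive rows together (ProjectPartitions, ProDatePartitions). The end result is that we have computed the positions at which to extract substrings from both strings.
Finally, we join these "partition" CTEs to the original data table, and we use the row numbers to make sure we line up the partition information from the two separate strings.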
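To make the position arithmetic concrete, here is a minimal sketch that applies the same steps to a single string. The variable @s and the CTE names Positions and Numbered are introduced only for this illustration; as in the answer, it relies on the undocumented master..spt_values table purely as a source of rows.

```sql
DECLARE @s VARCHAR(100) = 'projD;;projC';

WITH Numbers AS (
    -- Simulated numbers table: only the row count of spt_values matters
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM master..spt_values
), Positions AS (
    -- Position of every delimiter, plus one dummy position before the
    -- start of the string and one just past its end
    SELECT n FROM Numbers WHERE SUBSTRING(@s, n, 1) = ';'
    UNION ALL
    SELECT 0
    UNION ALL
    SELECT LEN(@s) + 1
), Numbered AS (
    SELECT n, ROW_NUMBER() OVER (ORDER BY n) AS rn
    FROM Positions
)
-- Each pair of consecutive positions brackets one item
SELECT a.rn,
       a.n + 1 AS startat,
       b.n     AS endat,
       SUBSTRING(@s, a.n + 1, b.n - a.n - 1) AS item
FROM Numbered AS a
INNER JOIN Numbered AS b
    ON a.rn = b.rn - 1
ORDER BY a.rn;
-- Expected rows (rn, startat, endat, item):
--   1, 1,  6, 'projD'
--   2, 7,  7, ''
--   3, 8, 13, 'projC'
```

The empty middle item is kept, so item numbers stay aligned when the same numbering is computed for the dates string.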
Answer 2 (score: 0)
Try the query below.
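SELECT A.ID, A.fname, A.lname, A.projects,
LTRIM(Split.a.value('.', 'VARCHAR(100)')) AS projdates
FROM (SELECT ID, fname, lname, projects,
CAST('<M>' + REPLACE([projdates], ';', '</M><M>') + '</M>' AS XML) AS String
FROM YourTable) AS A -- the table name is missing from the original answer; YourTable is a placeholder
CROSS APPLY String.nodes('/M') AS Split(a);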
Give this a try and you should get the expected output.
Thanks.