I have a very large XML dataset with the following structure:
<root>
<person>
<personid>HH3269732</personid>
<firstname>John</firstname>
<lastname>Smith</lastname>
<entertime>01/02/2008 10:15</entertime>
<leavetime>01/02/2008 11:45</leavetime>
<entertime>03/01/2008 08:00</entertime>
<leavetime>03/01/2008 10:00</leavetime>
...
// number of enter times and leave times vary from person to person
// there may not be a final leave time (ie, they haven't left yet)
</person>
...
</root>
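The structure of the data is not under my control. The data currently sits in a single xml column of a single row in MS SQL Server 2005. I am trying to construct a query that produces the following output: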
HH3269732 John Smith 01/02/2008 10:15 01/02/2008 11:45
HH3269732 John Smith 03/01/2008 08:00 03/01/2008 10:00
HH3269735 Mark Pines 02/01/2008 09:00 NULL
HH3263562 James Frank NULL NULL
HH3264237 Harold White 04/18/2008 03:00 04/18/2008 05:00
...
My query currently looks like this:
DECLARE @xml xml
SELECT @xml = XmlCol FROM Data
SELECT
[PersonId] = Persons.PersonCollection.value('(personid)[1]', 'NVARCHAR(50)')
,[First Name] = Persons.PersonCollection.value('(firstname)[1]', 'NVARCHAR(50)')
,[Last Name] = Persons.PersonCollection.value('(lastname)[1]', 'NVARCHAR(50)')
??????
FROM @xml.nodes('root/person') Persons(PersonCollection)
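The query may not be 100% correct, since I am writing it from memory. My problem is that I don't know how to include the entertime/leavetime sequence elements so that the result matches the desired rowset shown above.
Thanks.
Update: I should add that a given person record may have no entertime/leavetime sequence elements at all, but it still needs to be returned in the rowset. I have updated the sample of the desired output to reflect this.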
Answer 0: (score: 1)
with cte_entertime as (
SELECT
[PersonId] = t.c.value('(../personid)[1]', 'NVARCHAR(50)')
,[First Name] = t.c.value('(../firstname)[1]', 'NVARCHAR(50)')
,[Last Name] = t.c.value('(../lastname)[1]', 'NVARCHAR(50)')
,[Entertime] = t.c.value('.', 'NVARCHAR(50)')
,[entry_number] = ROW_NUMBER() OVER (ORDER BY t.c)
FROM @x.nodes('root/person/entertime') t(c))
, cte_leavetime as (
SELECT
[Leavetime] = t.c.value('.', 'NVARCHAR(50)')
,[entry_number] = ROW_NUMBER() OVER (ORDER BY t.c)
FROM @x.nodes('root/person/leavetime') t(c))
SELECT PersonID
, [First Name]
, [Last Name]
, [Entertime]
, [Leavetime]
FROM cte_entertime e
LEFT OUTER JOIN cte_leavetime l on e.entry_number = l.entry_number
Answer 1: (score: 0)
I accepted Remus's answer because it got me 95% of the way to a solution. For informational purposes, here is the final query structure:
with cte_maindata as (
SELECT
[PersonId] = t.c.value('(personid)[1]', 'NVARCHAR(50)')
,[First Name] = t.c.value('(firstname)[1]', 'NVARCHAR(50)')
,[Last Name] = t.c.value('(lastname)[1]', 'NVARCHAR(50)')
FROM @x.nodes('root/person') t(c))
, cte_entertime as (
SELECT
[PersonId] = t.c.value('(../personid)[1]', 'NVARCHAR(50)')
,[Entertime] = t.c.value('.', 'NVARCHAR(50)')
FROM @x.nodes('root/person/entertime') t(c))
, cte_leavetime as (
SELECT
[PersonId] = t.c.value('(../personid)[1]', 'NVARCHAR(50)')
,[Leavetime] = t.c.value('.', 'NVARCHAR(50)')
FROM @x.nodes('root/person/leavetime') t(c))
SELECT
m.PersonID
,[First Name]
,[Last Name]
,[Entertime]
,[Leavetime]
FROM cte_maindata m
LEFT OUTER JOIN cte_entertime e on m.PersonId = e.PersonId
LEFT OUTER JOIN cte_leavetime l on m.PersonId = l.PersonId
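Note that the query above joins the enter and leave times on PersonId alone, so for a person with several visits each entertime is matched with every leavetime of that person. The following is only a sketch of one way to pair them by position while still returning persons who have no times at all. It assumes that personid is unique per person, that the n-th leavetime of a person belongs to the n-th entertime, and that ROW_NUMBER() OVER (ORDER BY t.c) follows document order, as the first answer relies on (I am not aware of that being a documented guarantee). The sample data, the doc_order and visit_number columns, and the *_numbered CTE names are made up for illustration. SET is used instead of an inline initializer so the script also parses on SQL Server 2005.

```sql
-- Hypothetical sample data; @x stands in for the xml value read from the table.
DECLARE @x xml;
SET @x = N'<root>
<person>
<personid>HH3269732</personid><firstname>John</firstname><lastname>Smith</lastname>
<entertime>01/02/2008 10:15</entertime><leavetime>01/02/2008 11:45</leavetime>
<entertime>03/01/2008 08:00</entertime>
</person>
<person>
<personid>HH3263562</personid><firstname>James</firstname><lastname>Frank</lastname>
</person>
</root>';

WITH cte_maindata AS (
    SELECT
         [PersonId]   = t.c.value('(personid)[1]', 'NVARCHAR(50)')
        ,[First Name] = t.c.value('(firstname)[1]', 'NVARCHAR(50)')
        ,[Last Name]  = t.c.value('(lastname)[1]', 'NVARCHAR(50)')
    FROM @x.nodes('root/person') t(c))
, cte_entertime AS (
    SELECT
         [PersonId]  = t.c.value('(../personid)[1]', 'NVARCHAR(50)')
        ,[Entertime] = t.c.value('.', 'NVARCHAR(50)')
        -- position of this entertime among all entertime nodes (assumed document order)
        ,[doc_order] = ROW_NUMBER() OVER (ORDER BY t.c)
    FROM @x.nodes('root/person/entertime') t(c))
, cte_leavetime AS (
    SELECT
         [PersonId]  = t.c.value('(../personid)[1]', 'NVARCHAR(50)')
        ,[Leavetime] = t.c.value('.', 'NVARCHAR(50)')
        ,[doc_order] = ROW_NUMBER() OVER (ORDER BY t.c)
    FROM @x.nodes('root/person/leavetime') t(c))
, cte_enter_numbered AS (
    -- renumber per person: 1 = first visit, 2 = second visit, ...
    SELECT PersonId, Entertime
         , ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY doc_order) AS visit_number
    FROM cte_entertime)
, cte_leave_numbered AS (
    SELECT PersonId, Leavetime
         , ROW_NUMBER() OVER (PARTITION BY PersonId ORDER BY doc_order) AS visit_number
    FROM cte_leavetime)
SELECT
     m.PersonId
    ,m.[First Name]
    ,m.[Last Name]
    ,e.Entertime
    ,l.Leavetime
FROM cte_maindata m
-- LEFT JOIN from the person list keeps persons without any entertime
LEFT OUTER JOIN cte_enter_numbered e
    ON e.PersonId = m.PersonId
-- pair the n-th leavetime with the n-th entertime of the same person
LEFT OUTER JOIN cte_leave_numbered l
    ON l.PersonId = e.PersonId AND l.visit_number = e.visit_number
ORDER BY m.PersonId, e.visit_number;
```

Starting the final SELECT from cte_maindata rather than from the entertime CTE is what keeps persons with no visits in the result. A missing final leavetime falls out of the second LEFT OUTER JOIN as NULL.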
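Answer 2: (score: 0)
I didn't realize you could have multiple persons in the document. In that case my query is incorrect anyway. I think it may work better if you first break each person out into its own XML fragment and then extract the enter/leave times from that fragment. I haven't tried it on an XML document with 215k persons, but here is the idea: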
declare @x xml;
select @x = N'<root>
<person>
<personid>HH3269732</personid>
<firstname>John</firstname>
<lastname>Smith</lastname>
<entertime>01/02/2008 10:15</entertime>
<leavetime>01/02/2008 11:45</leavetime>
<entertime>03/01/2008 08:00</entertime>
<leavetime>03/01/2008 10:00</leavetime>
<entertime>04/01/2008 08:00</entertime>
</person>
<person>
<personid>HH3269733</personid>
<firstname>Jane</firstname>
<lastname>Doe</lastname>
<entertime>01/03/2008 10:15</entertime>
<leavetime>01/03/2008 11:45</leavetime>
<entertime>03/04/2008 08:00</entertime>
<leavetime>03/04/2008 10:00</leavetime>
<entertime>04/04/2008 08:00</entertime>
</person>
</root>';
with cte_person as (
select
t.c.value('(personid)[1]', 'NVARCHAR(50)') as personid
, t.c.value('(firstname)[1]', 'NVARCHAR(50)') as firstname
, t.c.value('(lastname)[1]', 'NVARCHAR(50)') as lastname
, t.c.query('entertime') as entertime
, t.c.query('leavetime') as leavetime
FROM @x.nodes('root/person') t(c))
, cte_cross_enter as (
select
p.personid
, p.firstname
, p.lastname
, x.c.value('.', 'datetime') as entertime
, row_number() over (partition by personid order by x.c) as row_enter
from cte_person p
cross apply p.entertime.nodes('/entertime') x(c))
, cte_cross_leave as (
select
p.personid
, x.c.value('.', 'datetime') as leavetime
, row_number() over (partition by personid order by x.c) as row_leave
from cte_person p
cross apply p.leavetime.nodes('/leavetime') x(c))
select e.personid
, e.firstname
, e.lastname
, e.entertime
, l.leavetime
from cte_cross_enter e
left outer join cte_cross_leave l
on e.personid = l.personid and
e.row_enter = l.row_leave
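The query above reads the times with value('.', 'datetime'). As far as I know, how a string such as 03/01/2008 is interpreted in that conversion can depend on the session's language and DATEFORMAT settings. The sample values in the question (for example 04/18/2008) suggest month/day/year, so one option is to read the text as NVARCHAR and convert it with an explicit style. This is only a sketch under that assumption; the single-element @t variable is made up for illustration.

```sql
-- Hypothetical one-element sample; in the real query the value would come from x.c
DECLARE @t xml;
SET @t = N'<entertime>04/18/2008 03:00</entertime>';

SELECT
    -- style 101 = mm/dd/yyyy, so the result does not depend on SET DATEFORMAT
    CONVERT(datetime, @t.value('(/entertime)[1]', 'NVARCHAR(50)'), 101) AS entertime;
```

In the CTEs above, the same expression would replace x.c.value('.', 'datetime'), with x.c.value('.', 'NVARCHAR(50)') as the inner argument.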