Populating a distinct list from duplicated data in SQL Server

Time: 2011-11-07 14:23:50

Tags: sql sql-server xml tsql

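I need to collect a list of distinct employees from an XML file that contains a sales log for each employee. Unfortunately, the data in the XML file is not entirely consistent. The file is structured like this: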

<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345" 
      CustomerName="Bob" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345" 
      CustomerName="Pat" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName=""     EmployeeManagerId="12345" 
      CustomerName="Sally" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName=""     EmployeeManagerId="12345" 
      CustomerName="Sue" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId=""      
      CustomerName="Jack" SaleNumber="..." />
<Sale EmployeeId="58203" EmployeeName="Fred" EmployeeManagerId=""      
      CustomerName="Bill" SaleNumber="..." />

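This XML file is uploaded to a web application, which passes its contents (as XML) to a stored procedure in SQL Server for processing. Because of the size of the file (up to 30,000 elements), I would like to do as little processing as possible in the web application.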
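
For illustration, the hand-off described above could look roughly like the following sketch. The procedure name dbo.ImportSales, the parameter name @Sales and the nvarchar(100) width are hypothetical and not part of the original post; only the attribute names come from the sample file.

```sql
-- Hypothetical procedure: receives the uploaded document as an xml parameter
-- and shreds every top-level <Sale> element into one relational row.
CREATE PROCEDURE dbo.ImportSales
    @Sales xml
AS
BEGIN
    SET NOCOUNT ON;

    SELECT  S.N.value('@EmployeeId',        'int')           AS EmployeeId,
            S.N.value('@EmployeeName',      'nvarchar(100)') AS EmployeeName,
            S.N.value('@EmployeeManagerId', 'int')           AS EmployeeManagerId
    FROM    @Sales.nodes('/Sale') AS S(N);
END
```

Once the rows are available in this shape, the de-duplication can be expressed as ordinary set-based SQL inside the procedure, which keeps the web application out of the processing.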

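The best solution I have come up with so far is to create a temporary table containing one row for each distinct EmployeeId and ManagerId value. Then, for each row in the table, I loop through the XML elements with a matching EmployeeId until I find an entry whose name is not empty (and then repeat for ManagerId).

So for each unique employee ID, I iterate over the results twice to see whether I can find their name and their manager's ID.

After the file has been processed, I expect the Employee table to look like this: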

+---------+------+------------+
| Id (PK) | Name | ManagerId  |
+---------+------+------------+
| 12345   | NULL | NULL       |
| 67890   | John | 12345      |
| 58203   | Fred | NULL       |
+---------+------+------------+

Is there a more efficient (and less procedural) solution to this?

2 answers:

Answer 0 (score: 3)

This produces the expected result, but it may need some cleanup if your real data differs from the sample.

DECLARE @T TABLE ( x XML )
INSERT  INTO @T
        ( x )
VALUES  ( '<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345"        CustomerName="Bob" SaleNumber="..." />' )
    ,   ( '<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345"        CustomerName="Pat" SaleNumber="..." />' ),
        ( '<Sale EmployeeId="67890" EmployeeName=""     EmployeeManagerId="12345"        CustomerName="Sally" SaleNumber="..." />' )
     ,  ( '<Sale EmployeeId="67890" EmployeeName=""     EmployeeManagerId="12345"        CustomerName="Sue" SaleNumber="..." />' ),
        ( '<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId=""             CustomerName="Jack" SaleNumber="..." />' ),
        ( '<Sale EmployeeId="58203" EmployeeName="Fred" EmployeeManagerId=""             CustomerName="Bill" SaleNumber="..." />' ) 

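;WITH c 
AS (
-- One row per distinct (ID, NAME, ManagerID) among the sales that carry a name.
-- An empty EmployeeManagerId attribute is read as 0 here.
-- Note: varchar(4) only fits the sample names; widen it for real data.
SELECT DISTINCT ID = x.value('(/Sale/@EmployeeId)[1]', 'int')
      , NAME = x.value('(/Sale/@EmployeeName)[1]', 'varchar(4)')
      , ManagerID = x.value('(/Sale/@EmployeeManagerId)[1]', 'int')
FROM    @T
WHERE  x.value('(/Sale/@EmployeeName)[1]', 'varchar(4)') <> ''
)

-- Employees: treat manager id 0 as missing and keep one manager per employee.
SELECT ID, NAME, ManagerID = MIN(NULLIF(ManagerID, 0))
FROM c 
GROUP BY ID, NAME
UNION 
-- Managers that never appear as employees get a row with no name and no manager.
SELECT ManagerID, NULL, NULL
FROM c
WHERE ManagerID NOT IN (SELECT DISTINCT ID FROM c)
    AND ManagerID <> 0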
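
One possible form of that cleanup, sketched under assumptions: instead of filtering out the rows whose name is empty, the empty values can be turned into NULL as they are read, so that aggregates such as MIN or MAX skip them. The single `<Sale>` element and the nvarchar(100) width below are illustrative choices, not part of the answer.

```sql
-- Sketch only: one hypothetical <Sale> element with both optional attributes empty.
DECLARE @x xml = '<Sale EmployeeId="67890" EmployeeName="" EmployeeManagerId="" />';

SELECT  Id        = @x.value('(/Sale/@EmployeeId)[1]', 'int'),
        -- An empty name becomes NULL (nvarchar(100) is an assumed width).
        Name      = NULLIF(@x.value('(/Sale/@EmployeeName)[1]', 'nvarchar(100)'), ''),
        -- An empty manager id is read as 0, as the query above relies on, so 0 becomes NULL.
        ManagerId = NULLIF(@x.value('(/Sale/@EmployeeManagerId)[1]', 'int'), 0);
```

With the values normalized this way, a row that has a manager id but no name still contributes its manager id instead of being discarded by the WHERE clause.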

Answer 1 (score: 2)

declare @xml xml = '
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345" 
      CustomerName="Bob" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345" 
      CustomerName="Pat" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName=""     EmployeeManagerId="12345" 
      CustomerName="Sally" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName=""     EmployeeManagerId="12345" 
      CustomerName="Sue" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId=""      
      CustomerName="Jack" SaleNumber="..." />
<Sale EmployeeId="58203" EmployeeName="Fred" EmployeeManagerId=""      
      CustomerName="Bill" SaleNumber="..." />'

-- "E1 is all employees"
;with E1 as      
(
  select T.N.value('@EmployeeId', 'int') as Id,
         T.N.value('@EmployeeName', 'nvarchar(100)') as Name,
         T.N.value('@EmployeeManagerId', 'int') as ManagerID
  from @xml.nodes('/Sale') as T(N)
),
-- E2 groups on id to get only one emp for each id
E2 as
(
  select Id, max(Name) as Name, nullif(max(ManagerID), 0) as ManagerID
  from E1 
  group by Id
),
-- "All manager id's"
M as
(
  select distinct T.N.value('@EmployeeManagerId', 'int') as Id
  from @xml.nodes('/Sale') as T(N)
  where T.N.value('@EmployeeManagerId', 'int') <> 0       
)
-- "All unique employees"
select Id, Name, ManagerID
from E2
union
-- "Add managers with a lookup against emp for name and manager id"
-- (union rather than union all, so a manager who is also an employee is not listed twice)
select M.Id, E2.Name, E2.ManagerID
from M
  left outer join E2 
    on M.Id = E2.Id
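
Neither answer shows the last step of filling the Employee table. A minimal sketch of one way to do it follows; it combines ideas from both answers and is not either author's method. The table variable @Employee stands in for the real Employee table, its column types are assumptions, and the XML is a shortened copy of the sample from the question.

```sql
-- Shortened copy of the sample data from the question.
DECLARE @xml xml = '
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345" CustomerName="Bob"  SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName=""     EmployeeManagerId=""      CustomerName="Sue"  SaleNumber="..." />
<Sale EmployeeId="58203" EmployeeName="Fred" EmployeeManagerId=""      CustomerName="Bill" SaleNumber="..." />';

-- Stand-in for the real Employee table; the column types are assumptions.
DECLARE @Employee TABLE
(
    Id        int           NOT NULL PRIMARY KEY,
    Name      nvarchar(100) NULL,
    ManagerId int           NULL
);

WITH Sales AS
(
    -- Shred the XML once; empty names and manager ids (read as 0) become NULL.
    SELECT  Id        = T.N.value('@EmployeeId', 'int'),
            Name      = NULLIF(T.N.value('@EmployeeName', 'nvarchar(100)'), ''),
            ManagerId = NULLIF(T.N.value('@EmployeeManagerId', 'int'), 0)
    FROM    @xml.nodes('/Sale') AS T(N)
),
People AS
(
    -- Every employee row, plus a bare row for every manager id that is referenced.
    SELECT Id, Name, ManagerId FROM Sales
    UNION ALL
    SELECT ManagerId, CAST(NULL AS nvarchar(100)), CAST(NULL AS int)
    FROM   Sales
    WHERE  ManagerId IS NOT NULL
)
-- MAX ignores NULLs, so each id collapses to one row with whatever values were found.
INSERT INTO @Employee (Id, Name, ManagerId)
SELECT Id, MAX(Name), MAX(ManagerId)
FROM   People
GROUP BY Id;

SELECT Id, Name, ManagerId FROM @Employee ORDER BY Id;
```

Grouping after the UNION ALL means a manager who also appears as an employee still ends up as a single row. If the file ever contains two different non-empty names for the same id, MAX simply picks the alphabetically later one, so real data may need an explicit rule for that case.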