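I need to collect a list of distinct employees from an XML file that contains a sales log for each employee. Unfortunately, the data in the XML file is not entirely "consistent". The file is structured like this: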
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345"
CustomerName="Bob" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345"
CustomerName="Pat" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="" EmployeeManagerId="12345"
CustomerName="Sally" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="" EmployeeManagerId="12345"
CustomerName="Sue" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId=""
CustomerName="Jack" SaleNumber="..." />
<Sale EmployeeId="58203" EmployeeName="Fred" EmployeeManagerId=""
CustomerName="Bill" SaleNumber="..." />
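This XML file is uploaded to a web application, which passes its contents (as XML) to a stored procedure in SQL Server for processing. Because of the file's size (up to 30,000 elements), I want to do as little processing as possible in the web application.
The best solution I have come up with so far is to create a temporary table containing one row for each distinct EmployeeId and ManagerId value. Then, for each row in the table, I loop through the XML elements with a matching EmployeeId until I find an entry whose name is not null (and then repeat for the ManagerId).
So for each unique employee ID, I iterate over the results twice to see whether I can find their name and their manager's ID.
After processing the file, I would like the Employee table to look like this: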
+---------+------+------------+
| Id (PK) | Name | ManagerId |
+---------+------+------------+
| 12345 | NULL | NULL |
| 67890 | John | 12345 |
| 58203 | Fred | NULL |
+---------+------+------------+
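Is there a more efficient (and less procedural) solution to this?
Answer 0 (score: 3)
This gets the result, but it may need some cleanup if the real data differs from the sample.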
DECLARE @T TABLE ( x XML )
INSERT INTO @T
( x )
VALUES ( '<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345" CustomerName="Bob" SaleNumber="..." />' )
, ( '<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345" CustomerName="Pat" SaleNumber="..." />' ),
( '<Sale EmployeeId="67890" EmployeeName="" EmployeeManagerId="12345" CustomerName="Sally" SaleNumber="..." />' )
, ( '<Sale EmployeeId="67890" EmployeeName="" EmployeeManagerId="12345" CustomerName="Sue" SaleNumber="..." />' ),
( '<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="" CustomerName="Jack" SaleNumber="..." />' ),
( '<Sale EmployeeId="58203" EmployeeName="Fred" EmployeeManagerId="" CustomerName="Bill" SaleNumber="..." />' )
;WITH c
AS (
SELECT DISTINCT ID = x.value('(/Sale/@EmployeeId)[1]', 'int')
, NAME = x.value('(/Sale/@EmployeeName)[1]', 'varchar(100)') -- widened from varchar(4), which would truncate longer names
, ManagerID = x.value('(/Sale/@EmployeeManagerId)[1]', 'int') -- an empty attribute is read as 0 here
FROM @T
WHERE x.value('(/Sale/@EmployeeName)[1]', 'varchar(100)') <> ''
)
SELECT ID, NAME, ManagerID =MIN( NULLIF(ManagerID, 0))
FROM c
GROUP BY ID, Name
UNION
SELECT ManagerID, NULL, NULL
FROM c
WHERE ManagerID NOT IN (SELECT DISTINCT ID FROM c)
AND ManagerID <> 0
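The NULLIF(ManagerID, 0) and the `<> 0` filter above depend on an empty EmployeeManagerId attribute being read as 0 when it is converted to int. If relying on that feels fragile, one option is to read the attribute as text, turn empty strings into NULL, and only then convert. The following is a minimal sketch of that idea, not part of the original answer; the variable name @sale is made up for illustration, and TRY_CONVERT assumes SQL Server 2012 or later.

```sql
-- Hypothetical one-row sample; @sale is an illustrative name, not from the question.
DECLARE @sale xml = N'<Sale EmployeeId="67890" EmployeeName="" EmployeeManagerId="" />';

SELECT
    -- Read as text first so that "" can be told apart from a real value.
    Name      = NULLIF(@sale.value('(/Sale/@EmployeeName)[1]', 'nvarchar(100)'), N''),
    -- TRY_CONVERT returns NULL instead of raising an error on non-numeric input.
    ManagerId = TRY_CONVERT(int, NULLIF(@sale.value('(/Sale/@EmployeeManagerId)[1]', 'varchar(20)'), ''));
```

With this row both columns come back as NULL, so the later steps can test for IS NULL instead of comparing against 0 or ''.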
Answer 1 (score: 2)
declare @xml xml = '
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345"
CustomerName="Bob" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345"
CustomerName="Pat" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="" EmployeeManagerId="12345"
CustomerName="Sally" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="" EmployeeManagerId="12345"
CustomerName="Sue" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId=""
CustomerName="Jack" SaleNumber="..." />
<Sale EmployeeId="58203" EmployeeName="Fred" EmployeeManagerId=""
CustomerName="Bill" SaleNumber="..." />'
-- E1 is all employees
;with E1 as
(
select T.N.value('@EmployeeId', 'int') as Id,
T.N.value('@EmployeeName', 'nvarchar(100)') as Name,
T.N.value('@EmployeeManagerId', 'int') as ManagerID
from @xml.nodes('/Sale') as T(N)
),
-- E2 groups on id to get only one emp for each id
E2 as
(
select Id, nullif(max(Name), '') as Name, nullif(max(ManagerID), 0) as ManagerID -- nullif turns "no value found" ('' or 0) into NULL
from E1
group by Id
),
-- All manager ids
M as
(
select distinct T.N.value('@EmployeeManagerId', 'int') as Id
from @xml.nodes('/Sale') as T(N)
where T.N.value('@EmployeeManagerId', 'int') <> 0
)
-- All unique employees
select Id, Name, ManagerID
from E2
union
-- Add managers, looking up name and manager id in E2; union (rather than union all) avoids listing a manager twice when they are also an employee
select M.Id, E2.Name, E2.ManagerID
from M
left outer join E2
on M.Id = E2.ID
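Since the file can hold up to 30,000 elements, it may also be worth shredding the XML only once and running the set-based logic against a temporary table, rather than calling nodes() in more than one CTE. The sketch below is one possible way to do that under the same assumptions as the answers above (an empty manager attribute is read as 0). The temp table #Sale, its column names, and the shortened sample data are illustrative and not from the original answer.

```sql
-- Shortened sample in the same shape as the question's data.
DECLARE @xml xml = N'
<Sale EmployeeId="67890" EmployeeName="John" EmployeeManagerId="12345" CustomerName="Bob" SaleNumber="..." />
<Sale EmployeeId="67890" EmployeeName="" EmployeeManagerId="" CustomerName="Sally" SaleNumber="..." />
<Sale EmployeeId="58203" EmployeeName="Fred" EmployeeManagerId="" CustomerName="Bill" SaleNumber="..." />';

-- Shred the XML once; empty names and empty manager ids become NULL.
SELECT
    Id        = T.N.value('@EmployeeId', 'int'),
    Name      = NULLIF(T.N.value('@EmployeeName', 'nvarchar(100)'), N''),
    ManagerId = NULLIF(T.N.value('@EmployeeManagerId', 'int'), 0)
INTO #Sale
FROM @xml.nodes('/Sale') AS T(N);

-- One row per employee; MAX ignores NULLs, so any non-empty value wins.
SELECT Id, Name = MAX(Name), ManagerId = MAX(ManagerId)
FROM #Sale
GROUP BY Id
UNION
-- Managers that never appear as an employee get a row with unknown name and manager.
SELECT m.ManagerId, NULL, NULL
FROM #Sale AS m
WHERE m.ManagerId IS NOT NULL
  AND NOT EXISTS (SELECT 1 FROM #Sale AS e WHERE e.Id = m.ManagerId);

DROP TABLE #Sale;
```

The final SELECT could feed an INSERT into the Employee table inside the stored procedure. If two rows for the same employee carry different non-empty names or manager ids, MAX picks one of them arbitrarily, so conflicting data would still need a rule of its own.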