I have more than 500,000 XML files stored in an MS SQL database, like the one below (edited to save space in the question).
<?xml version="1.0"?>
<PROJECTS xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<APPLICATION_ID>7000518</APPLICATION_ID>
<ACTIVITY>C06</ACTIVITY>
<ADMINISTERING_IC>RR</ADMINISTERING_IC>
<APPLICATION_TYPE>1</APPLICATION_TYPE>
<BUDGET_START>09/01/2009</BUDGET_START>
<BUDGET_END>09/30/2013</BUDGET_END>
<FULL_PROJECT_NUM>1C06RR020539-01A1</FULL_PROJECT_NUM>
<FY>2009</FY>
<ORG_STATE>CA</ORG_STATE>
<ORG_ZIPCODE>900952000</ORG_ZIPCODE>
<PIS>
<PI>
<PI_NAME>JONES,MARY</PI_NAME>
<PI_ID>9876543</PI_ID>
</PI>
<PI>
<PI_NAME>DOE, JOHN</PI_NAME>
<PI_ID>1234567</PI_ID>
</PI>
</PIS>
<PROJECT_TERMSX>
<TERM>Extramural Activities</TERM>
<TERM>Extramural Research Facilities Construction Project</TERM>
</PROJECT_TERMSX>
<PROJECT_TITLE>The Center for Oral/Research</PROJECT_TITLE>
<SUPPORT_YEAR>1</SUPPORT_YEAR>
</row>
</PROJECTS>
I can search on any single node with the following:
SELECT nref.value('(APPLICATION_ID)[1]', 'Int') APPLICATION_ID,
nref.value('(ACTIVITY)[1]', 'varchar(3)') ACTIVITY
FROM [XML_2010] cross apply XMLData.nodes('//PROJECTS/row') as R(nref)
WHERE nref.value('(CORE_PROJECT_NUM)[1]', 'varchar(25)') LIKE '%CA187342%'
But how do I find the data (for example APPLICATION_ID, BUDGET_START and so on) for all XML files that have DOE, JOHN as a PI child node of PIS? Thanks for your help.
Answer 0 (score: 0)
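XML is great for archiving and data exchange, but it is the wrong container for data you want to use, filter, or search efficiently. I would therefore strongly recommend transferring all of your data into classical, indexed tables, like this:
Note that I reduced the XML to a few sample elements per level; the rest follows the same approach and is up to you. The declared table variable is just a mock-up test scenario: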
DECLARE @YourTable TABLE(ID INT IDENTITY,YourXml XML);
INSERT INTO @YourTable VALUES
('<PROJECTS xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<row>
<APPLICATION_ID>7000518</APPLICATION_ID>
<ACTIVITY>C06</ACTIVITY>
<!-- more first level elements like above -->
<!-- Here there are multiple PIs -->
<PIS>
<PI>
<PI_NAME>JONES,MARY</PI_NAME>
<PI_ID>9876543</PI_ID>
</PI>
<PI>
<PI_NAME>DOE, JOHN</PI_NAME>
<PI_ID>1234567</PI_ID>
</PI>
</PIS>
<!-- Here there are multiple PROJECT_TERMS -->
<PROJECT_TERMSX>
<TERM>Extramural Activities</TERM>
<TERM>Extramural Research Facilities Construction Project</TERM>
</PROJECT_TERMSX>
<!-- These are normal first level elements again -->
<PROJECT_TITLE>The Center for Oral/Research</PROJECT_TITLE>
<SUPPORT_YEAR>1</SUPPORT_YEAR>
</row>
</PROJECTS>');
--This SELECT reads all first-level data, together with the nested XML fragments, into the temp table #Projects:
SELECT r.value('(APPLICATION_ID/text())[1]','bigint') AS APPLICATION_ID
,r.value('(ACTIVITY/text())[1]','nvarchar(max)') AS ACTIVITY
--more columns like above
,r.query('PIS') AS AllPis
,r.query('PROJECT_TERMSX') AS AllProjectTerms
--more first level columns
INTO #Projects
FROM @YourTable AS t
OUTER APPLY t.YourXml.nodes('/PROJECTS/row') AS A(r);
--This SELECT reads from #Projects and stores all related PI data in another temp table, #PIs
SELECT APPLICATION_ID
,p.value('(PI_ID/text())[1]','bigint') AS PI_ID
,p.value('(PI_NAME/text())[1]','nvarchar(max)') AS PI_NAME
INTO #PIs
FROM #Projects AS pr
OUTER APPLY pr.AllPis.nodes('PIS/PI') AS A(p);
--The same for #Terms
SELECT APPLICATION_ID
,t.value('(./text())[1]','nvarchar(max)') AS TERM
INTO #Terms
FROM #Projects AS p
OUTER APPLY p.AllProjectTerms.nodes('PROJECT_TERMSX/TERM') AS A(t);
--Now the content of the temp tables
SELECT * FROM #Projects;
SELECT * FROM #PIs;
SELECT * FROM #Terms;
--Clean up
GO
DROP TABLE #Projects;
DROP TABLE #PIs;
DROP TABLE #Terms;
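Before the clean-up step you would place code that writes the data from these temp tables into real tables. The IDs that define the relationships (APPLICATION_ID, PI_ID) are stored together with the data, so this should be easy. You need INSERT INTO or MERGE, depending on whether you have to deal with existing data.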
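As a rough sketch of that step, the following could be placed before the clean-up. The target tables dbo.Project and dbo.ProjectPI, their column sizes, and the assumption that APPLICATION_ID is unique across all files are illustrative guesses, not part of the original setup. Only the two columns extracted above are carried over; BUDGET_START and the other first-level elements would be added the same way.

```sql
-- Hypothetical target tables; names and column sizes are assumptions.
IF OBJECT_ID(N'dbo.Project', N'U') IS NULL
    CREATE TABLE dbo.Project
    (
        APPLICATION_ID BIGINT       NOT NULL PRIMARY KEY, -- assumed unique per project row
        ACTIVITY       NVARCHAR(10) NULL
        -- more first level columns
    );

IF OBJECT_ID(N'dbo.ProjectPI', N'U') IS NULL
    CREATE TABLE dbo.ProjectPI
    (
        APPLICATION_ID BIGINT        NOT NULL REFERENCES dbo.Project(APPLICATION_ID),
        PI_ID          BIGINT        NOT NULL,
        PI_NAME        NVARCHAR(200) NULL,
        PRIMARY KEY (APPLICATION_ID, PI_ID)
    );

-- Copy the shredded rows; OUTER APPLY can produce rows with NULL keys, so filter them out.
INSERT INTO dbo.Project (APPLICATION_ID, ACTIVITY)
SELECT APPLICATION_ID, ACTIVITY
FROM #Projects
WHERE APPLICATION_ID IS NOT NULL;

INSERT INTO dbo.ProjectPI (APPLICATION_ID, PI_ID, PI_NAME)
SELECT APPLICATION_ID, PI_ID, PI_NAME
FROM #PIs
WHERE APPLICATION_ID IS NOT NULL
  AND PI_ID IS NOT NULL;

-- The search from the question: all projects that list DOE, JOHN as a PI.
SELECT p.APPLICATION_ID, p.ACTIVITY
FROM dbo.Project AS p
WHERE EXISTS (SELECT 1
              FROM dbo.ProjectPI AS pp
              WHERE pp.APPLICATION_ID = p.APPLICATION_ID
                AND pp.PI_NAME = N'DOE, JOHN');
```

An index on dbo.ProjectPI(PI_NAME) would likely help if searches by name are frequent, and MERGE could replace the plain INSERT statements if rows may already exist.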
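You might think about m:n relationships between projects and PIs as well as between projects and terms. For this you would write into one separate PI table and one separate Term table, each connected to the projects through a mapping table (holding the application_id and the second id, both as foreign keys).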
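A minimal sketch of such an m:n layout might look like the following. All table names (dbo.Project, dbo.Investigator, dbo.Term, dbo.ProjectInvestigator, dbo.ProjectTerm), the surrogate TERM_ID, and the column sizes are assumptions made for illustration only.

```sql
-- Hypothetical normalized schema; every name and size here is an assumption.
IF OBJECT_ID(N'dbo.Project', N'U') IS NULL
    CREATE TABLE dbo.Project
    (
        APPLICATION_ID BIGINT       NOT NULL PRIMARY KEY,
        ACTIVITY       NVARCHAR(10) NULL
        -- more first level columns
    );

-- One row per person, no matter how many projects they appear in.
CREATE TABLE dbo.Investigator
(
    PI_ID   BIGINT        NOT NULL PRIMARY KEY,
    PI_NAME NVARCHAR(200) NOT NULL
);

-- One row per distinct term.
CREATE TABLE dbo.Term
(
    TERM_ID INT IDENTITY  NOT NULL PRIMARY KEY,
    TERM    NVARCHAR(400) NOT NULL UNIQUE
);

-- Mapping tables: each holds the application_id and the second id, both as foreign keys.
CREATE TABLE dbo.ProjectInvestigator
(
    APPLICATION_ID BIGINT NOT NULL REFERENCES dbo.Project(APPLICATION_ID),
    PI_ID          BIGINT NOT NULL REFERENCES dbo.Investigator(PI_ID),
    PRIMARY KEY (APPLICATION_ID, PI_ID)
);

CREATE TABLE dbo.ProjectTerm
(
    APPLICATION_ID BIGINT NOT NULL REFERENCES dbo.Project(APPLICATION_ID),
    TERM_ID        INT    NOT NULL REFERENCES dbo.Term(TERM_ID),
    PRIMARY KEY (APPLICATION_ID, TERM_ID)
);

-- Finding all projects of one PI then becomes a plain join.
SELECT p.APPLICATION_ID, p.ACTIVITY
FROM dbo.Investigator AS i
JOIN dbo.ProjectInvestigator AS m ON m.PI_ID = i.PI_ID
JOIN dbo.Project AS p ON p.APPLICATION_ID = m.APPLICATION_ID
WHERE i.PI_NAME = N'DOE, JOHN';
```

Compared with storing PI_NAME on every project row, this keeps each name in one place, at the cost of one extra join per lookup.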