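Data migration from multi-level XML to a single table using SSIS

Time: 2016-03-29 20:14:09

Tags: sql-server xml ssis database-migration

Goal

I am trying to migrate data from a multi-level XML file with nested elements into a single table.

System parameters

  • MS SQL Server Management Studio
  • Microsoft Visual Studio SSIS

XML file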

Here's the XSD for the XML file I have

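As you can see, it is not a simple layout. The whole thing is wrapped in a 'People' tag containing about 1000 people. Each 'Person' tag contains the following elements. The XML is laid out like this:

  • Name
  • Expertise
  • Image
  • Link
  • Books
    • Book
      • Details
    • Book
      • Details
    • ... (there can be many of these)
  • Articles
    • Article
      • Details
    • Article
      • Details
    • ... (there can be many of these)
  • Papers
  • Artwork
  • Websites

As a side note, there can be multiple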

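Question

Now, here is my question. How do I get all of this information into a single SQL table with SSIS? I know the topology of the XML file does not map directly onto the topology of a table, but I want to force it. I want one row for each 'Person'. I also want enough columns to capture the maximum number of 'Book' elements that any one person in my data set has. Maybe that means creating columns such as 'Book_1', 'Book_2', 'Book_3' in the final table. I don't want a series of tables with foreign and primary keys. I want a separate column for each element of every 'Book', that is, 'Year' and 'Details'. To make this clearer, let me show you an example of what I want.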

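Example XML file

If I have a 'Books' tag with 3 'Book' elements, I want a separate column for each book:

  • John
  • Steinbeck
  • ...
  • Books
    • East of Eden
      • 1952
      • A good book
    • Of Mice and Men
      • 1937
      • A book about mice
    • The Grapes of Wrath
      • 1939
      • A book about angry grapes
  • Articles
    • Article
  • ...

Example of the resulting table in the SQL database

I want the table to look like this, and likewise for all the nested elements of the XML file. Is it possible to use SSIS to do this kind of flattened import into the database?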
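
A rough sketch of the flat shape described above, for the Steinbeck example. The table name #PersonFlat and every column name are hypothetical and chosen only for illustration; they are not taken from the real schema.

```sql
-- Hypothetical wide table: one row per person, one group of columns per book.
-- All names below are illustrative assumptions.
CREATE TABLE #PersonFlat
(
    FirstName      NVARCHAR(100),
    LastName       NVARCHAR(100),
    Book_1         NVARCHAR(200),
    Book_1_Year    INT,
    Book_1_Details NVARCHAR(MAX),
    Book_2         NVARCHAR(200),
    Book_2_Year    INT,
    Book_2_Details NVARCHAR(MAX),
    Book_3         NVARCHAR(200),
    Book_3_Year    INT,
    Book_3_Details NVARCHAR(MAX)
);

-- The three books from the example end up side by side in a single row.
INSERT INTO #PersonFlat VALUES
(N'John', N'Steinbeck',
 N'East of Eden',        1952, N'A good book',
 N'Of Mice and Men',     1937, N'A book about mice',
 N'The Grapes of Wrath', 1939, N'A book about angry grapes');

SELECT * FROM #PersonFlat;

DROP TABLE #PersonFlat;
```

A person with fewer books would simply have NULL in the unused Book_n columns.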

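Thanks! I really appreciate it.

Additional notes

  • Some entries in the XML file contain up to 60,000 characters. Which data type should I use?

Snippet of the actual XML file

Below is a sample of the XML file. The actual XML has many <Person> elements.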

<?xml version="1.0" encoding="UTF-8" ?>
<People>
    <Person>
        <FirstName>Eliza</FirstName>
        <LastName>Ablovatski</LastName>
        <Biography>
            <![CDATA[<p>Eliza Ablovatski joined the Kenyon history department in 2003, after graduate work in East Central European history at Columbia University and research and fellowships in Munich and Berlin, Germany and Budapest, Hungary. She teaches classes on Europe from 1500 to the present, focusing on the nineteenth and twentieth centuries, Germany, Russia, the Habsburg Monarchy, film, nationalism and identity, gender, race, and the interwar period.</p>
<p>Her dissertation and first book,&nbsp;<em>Revolution and Political Violence in Central Europe: The Deluge of 1919</em> (forthcoming from Cambridge University Press), focus on the revolutionary upheavals in Munich and Budapest following the First World War, and their relationship to political violence and antisemitism. She is currently researching the occupation of Austria (1945-1955) at the end of the Second World War, and the nuclear idea in postwar Europe. She has also researched and written extensively on the history of Jews in the former Habsburg regional capital of Czernowitz (now Ukraine).</p>]]>
        </Biography>
        <Expertise>
            <![CDATA[<p>Modern Europe, especially Germany and Central/East Central Europe in the nineteenth and twentieth centuries; European Jewish and women's history, East European and German film and literature, socialism, war, and revolution.</p>]]>
        </Expertise>
        <Image>http://www.kenyon.edu/images/directory/ablovatski.jpg</Image>
        <Link>http://www.kenyon.edu/directories/campus-directory/biography/eliza-ablovatski/</Link>
        <Books>
            <Book>
                <Year></Year>
                <Details>
                    <![CDATA[<p><em>Zwischen Pruth und Jordan. Lebenserinnerungen Czernowitzer Juden</em><em>&nbsp;,&nbsp;</em>with Gaby Coldewey and others K&ouml;ln: B&ouml;hlau Verlag, 2003</p>]]>
                </Details>
            </Book>
            <Book>
                <Year></Year>
                <Details>
                    <![CDATA[<p><em>Czernowitz ist gewen an alt jiddische Stdt: &Uuml;berlebende berichten,</em>&nbsp;With Gaby Coldewey and others. First Edition: Czernowitz,Ukraine: distributed by the Heinrich-B&ouml;ll-Stiftung, 1998 Second Edition: Berlin, 1999 (Third edition: Potsdam, forthcoming 2009)</p>]]>
                </Details>
            </Book>
        </Books>
        <Articles>
            <Article>
                <Year></Year>
                <Details>
                    <![CDATA[<p>"The Central European Revolutions of 1919 and the Myth of Judeo-Bolshevism,"&nbsp;<em>European Review of History, Vol. 17/ Issue 3: Cosmopolitanism, Nationalism and the Jews of East Central Europe (2010), 473-489.</em></p>]]>
                </Details>
            </Article>
            <Article>
                <Year></Year>
                <Details>
                    <![CDATA[<p>"Between Red Army and White Guard: Women in Budapest, 1918-1919," in&nbsp;<em>Gender and War in Twentieth-Century Eastern Europe,</em>&nbsp;edited by Maria Bucur and Nancy Wingfield&nbsp;Bloomington: Indiana University Press 2006</p>]]>
                </Details>
            </Article>
            <Article>
                <Year></Year>
                <Details>
                    <![CDATA[<p>"The Girl with the Titus-head: Women in Revolution in Munich and Budapest, 1919"&nbsp;<em>Nationalities Papers&nbsp;</em>28/3 (September 2000), 541-550</p>]]>
                </Details>
            </Article>
        </Articles>
        <Papers>
        </Papers>
        <Artwork>
        </Artwork>
        <Websites>
        </Websites>
    </Person>
...This goes on to include many <Person> elements. (About 1000)
</People>

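1 answer:

Answer 0 (score: 1)

Thanks for the actual XML! The following query gets your values out of the XML. It generates IDs for them so that all the data can be stored in related tables.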
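
Before the full query, the pattern it relies on can be seen in isolation. This is only a minimal sketch: the variable @demo and its contents are made up, although the element names mirror the real file.

```sql
-- Minimal stand-in document; the values are invented for illustration.
DECLARE @demo XML =
N'<People>
    <Person>
        <FirstName>John</FirstName>
        <Books>
            <Book><Year>1952</Year><Details>East of Eden</Details></Book>
            <Book><Year>1937</Year><Details>Of Mice and Men</Details></Book>
        </Books>
    </Person>
</People>';

-- nodes() turns every matching element into one row;
-- value() reads a typed scalar relative to that row's element.
SELECT Prs.p.value('FirstName[1]', 'nvarchar(100)') AS FirstName
      ,Bk.b.value('Year[1]', 'int')                 AS BookYear
      ,Bk.b.value('Details[1]', 'nvarchar(max)')    AS BookDetails
FROM @demo.nodes('/People/Person') AS Prs(p)
CROSS APPLY Prs.p.nodes('Books/Book') AS Bk(b);
```

This returns one row per book, with the person's first name repeated on each row. The full query uses the same pattern, but keeps the Books and Articles fragments per person first and shreds them in separate CTEs.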

Note: I had to double the ' in women's, and I added a second person to show the approach:

DECLARE @x XML=
'<?xml version="1.0" encoding="UTF-8" ?>
<People>
    <Person>
        <FirstName>Eliza</FirstName>
        <LastName>Ablovatski</LastName>
        <Biography>
            <![CDATA[<p>Eliza Ablovatski joined the Kenyon history department in 2003, after graduate work in East Central European history at Columbia University and research and fellowships in Munich and Berlin, Germany and Budapest, Hungary. She teaches classes on Europe from 1500 to the present, focusing on the nineteenth and twentieth centuries, Germany, Russia, the Habsburg Monarchy, film, nationalism and identity, gender, race, and the interwar period.</p>
<p>Her dissertation and first book,&nbsp;<em>Revolution and Political Violence in Central Europe: The Deluge of 1919</em> (forthcoming from Cambridge University Press), focus on the revolutionary upheavals in Munich and Budapest following the First World War, and their relationship to political violence and antisemitism. She is currently researching the occupation of Austria (1945-1955) at the end of the Second World War, and the nuclear idea in postwar Europe. She has also researched and written extensively on the history of Jews in the former Habsburg regional capital of Czernowitz (now Ukraine).</p>]]>
        </Biography>
        <Expertise>
            <![CDATA[<p>Modern Europe, especially Germany and Central/East Central Europe in the nineteenth and twentieth centuries; European Jewish and women''s history, East European and German film and literature, socialism, war, and revolution.</p>]]>
        </Expertise>
        <Image>http://www.kenyon.edu/images/directory/ablovatski.jpg</Image>
        <Link>http://www.kenyon.edu/directories/campus-directory/biography/eliza-ablovatski/</Link>
        <Books>
            <Book>
                <Year></Year>
                <Details>
                    <![CDATA[<p><em>Zwischen Pruth und Jordan. Lebenserinnerungen Czernowitzer Juden</em><em>&nbsp;,&nbsp;</em>with Gaby Coldewey and others K&ouml;ln: B&ouml;hlau Verlag, 2003</p>]]>
                </Details>
            </Book>
            <Book>
                <Year></Year>
                <Details>
                    <![CDATA[<p><em>Czernowitz ist gewen an alt jiddische Stdt: &Uuml;berlebende berichten,</em>&nbsp;With Gaby Coldewey and others. First Edition: Czernowitz,Ukraine: distributed by the Heinrich-B&ouml;ll-Stiftung, 1998 Second Edition: Berlin, 1999 (Third edition: Potsdam, forthcoming 2009)</p>]]>
                </Details>
            </Book>
        </Books>
        <Articles>
            <Article>
                <Year></Year>
                <Details>
                    <![CDATA[<p>"The Central European Revolutions of 1919 and the Myth of Judeo-Bolshevism,"&nbsp;<em>European Review of History, Vol. 17/ Issue 3: Cosmopolitanism, Nationalism and the Jews of East Central Europe (2010), 473-489.</em></p>]]>
                </Details>
            </Article>
            <Article>
                <Year></Year>
                <Details>
                    <![CDATA[<p>"Between Red Army and White Guard: Women in Budapest, 1918-1919," in&nbsp;<em>Gender and War in Twentieth-Century Eastern Europe,</em>&nbsp;edited by Maria Bucur and Nancy Wingfield&nbsp;Bloomington: Indiana University Press 2006</p>]]>
                </Details>
            </Article>
            <Article>
                <Year></Year>
                <Details>
                    <![CDATA[<p>"The Girl with the Titus-head: Women in Revolution in Munich and Budapest, 1919"&nbsp;<em>Nationalities Papers&nbsp;</em>28/3 (September 2000), 541-550</p>]]>
                </Details>
            </Article>
        </Articles>
        <Papers>
        </Papers>
        <Artwork>
        </Artwork>
        <Websites>
        </Websites>
    </Person>
    <Person>
        <FirstName>One</FirstName>
        <LastName>More</LastName>
        <Biography>Biography: Some interesting facts...</Biography>
        <Expertise>Expertise: Some interesting facts...</Expertise>
        <Image>somepicture.jpg</Image>
        <Link>somelink.com</Link>
        <Books>
            <Book>
                <Year>2001</Year>
                <Details>Book1</Details>
            </Book>
            <Book>
                <Year>2002</Year>
                <Details>Book2</Details>
            </Book>
        </Books>
        <Articles>
            <Article>
                <Year>2001</Year>
                <Details>Article1</Details>
            </Article>
        </Articles>
        <Papers>
        </Papers>
        <Artwork>
        </Artwork>
        <Websites>
        </Websites>
    </Person>
</People>';

With MyPersonCTE AS
(
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS PersonID
          ,p.value('FirstName[1]','varchar(max)') AS FirstName
          ,p.value('LastName[1]','varchar(max)') AS LastName
          ,p.value('Biography[1]','varchar(max)') AS Biography
          ,p.value('Expertise[1]','varchar(max)') AS Expertise
          ,p.value('Image[1]','varchar(max)') AS Image
          ,p.value('Link[1]','varchar(max)') AS Link
          ,p.query('Books') AS BookNode
          ,p.query('Articles') AS ArticleNode
          --same for Papers, Artwork...
    FROM @x.nodes('/People/Person') AS A(p) 
)
,MyBooksCTE AS
(
    SELECT MyPersonCTE.*
          ,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS BookID
          ,x.value('Year[1]','int') AS BookYear
          ,x.value('Details[1]','varchar(max)') AS BookDetails
    FROM MyPersonCTE
    CROSS APPLY MyPersonCTE.BookNode.nodes('/Books/Book') A(x)  
)
,MyArticlesCTE AS
(
    SELECT MyPersonCTE.*
          ,ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS ArticleID
          ,x.value('Year[1]','int') AS ArticleYear
          ,x.value('Details[1]','varchar(max)') AS ArticleDetails
    FROM MyPersonCTE
    CROSS APPLY MyPersonCTE.ArticleNode.nodes('/Articles/Article') A(x)  
)
--same for Papers, Artwork...
SELECT p.*
      ,b.BookID
      ,b.BookYear
      ,b.BookDetails
      ,a.ArticleID
      ,a.ArticleYear
      ,a.ArticleDetails  
INTO #tempAllData
FROM MyPersonCTE AS p
LEFT JOIN MyBooksCTE AS b ON p.PersonID=b.PersonID
LEFT JOIN MyArticlesCTE AS a ON p.PersonID=a.PersonID ;

--#tempAllData now holds all the data, repeated in every combination: far too much
--but DISTINCT is your friend
--in this case you'd use the PersonID as FK in all related tables

SELECT DISTINCT PersonID,FirstName,LastName,Biography,Expertise --other fields
FROM #tempAllData;

SELECT DISTINCT PersonID,BookID,BookYear,BookDetails
FROM #tempAllData;

SELECT DISTINCT PersonID,ArticleID,ArticleYear,ArticleDetails
FROM #tempAllData;

DROP TABLE #tempAllData;

Results

Persons:

PersonID  FirstName  LastName    Biography
1         Eliza      Ablovatski  <p>Eliza Ablovatski joined ...
2         One        More        Biography: Some interesting facts...

Books:

PersonID  BookID  BookYear  BookDetails
1         1       0         <p><em>Zwischen Pruth und ...
1         2       0         <p><em>Czernowitz ist gewen ...
2         3       2001      Book1
2         4       2002      Book2

Articles:

PersonID  ArticleID  ArticleYear  ArticleDetails
1         1          0            <p>"The Central European ...
1         2          0            <p>"Between Red Army and White ...
1         3          0            <p>"The Girl with the Titus-head: ...
2         4          2001         Article1
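
If the related-table route is taken, the three result sets could be persisted roughly as follows. This is a sketch under assumptions: the table names dbo.Person and dbo.Book and the column sizes are hypothetical, and the load step only does something while #tempAllData still exists, so it would have to run before the DROP TABLE above. The MAX types store up to 2 GB per value, so entries of 60,000 characters fit.

```sql
-- Hypothetical target tables; names and sizes are assumptions.
CREATE TABLE dbo.Person
(
    PersonID  INT           NOT NULL PRIMARY KEY,
    FirstName NVARCHAR(100) NULL,
    LastName  NVARCHAR(100) NULL,
    Biography NVARCHAR(MAX) NULL,  -- MAX types hold far more than 60,000 characters
    Expertise NVARCHAR(MAX) NULL,
    Image     NVARCHAR(400) NULL,
    Link      NVARCHAR(400) NULL
);

CREATE TABLE dbo.Book
(
    PersonID    INT           NOT NULL REFERENCES dbo.Person(PersonID),  -- FK to the person
    BookID      INT           NOT NULL,
    BookYear    INT           NULL,
    BookDetails NVARCHAR(MAX) NULL,
    PRIMARY KEY (PersonID, BookID)
);

-- Load step: skipped unless #tempAllData from the query above is still there.
IF OBJECT_ID('tempdb..#tempAllData') IS NOT NULL
BEGIN
    INSERT INTO dbo.Person (PersonID, FirstName, LastName, Biography, Expertise, Image, Link)
    SELECT DISTINCT PersonID, FirstName, LastName, Biography, Expertise, Image, Link
    FROM #tempAllData;

    INSERT INTO dbo.Book (PersonID, BookID, BookYear, BookDetails)
    SELECT DISTINCT PersonID, BookID, BookYear, BookDetails
    FROM #tempAllData
    WHERE BookID IS NOT NULL;  -- the LEFT JOIN leaves NULLs for people without books
END;
```

An Article table would follow the same shape as dbo.Book.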

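But what you really want to achieve is one big table

This is only possible with dynamic SQL. Starting from the above, change the query to the following. It first finds the column names automatically, then uses UNION ALL to force all the data into the same structure, and finally applies one big dynamic PIVOT.

Note: I added PARTITION BY PersonID to the ROW_NUMBERs in the related CTEs, so that each person's IDs start at 1.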
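
The effect of the partition can be seen on its own with a small sketch. The table variable @books and its rows are made up for illustration, and an explicit ORDER BY is used here so the numbering is deterministic.

```sql
-- Made-up rows standing in for the shredded books.
DECLARE @books TABLE (PersonID INT, BookDetails VARCHAR(50));

INSERT INTO @books VALUES
(1, 'Book A'), (1, 'Book B'), (2, 'Book C'), (2, 'Book D');

SELECT PersonID
      ,BookDetails
       -- numbering runs across all rows: 1, 2, 3, 4
      ,ROW_NUMBER() OVER(ORDER BY PersonID, BookDetails)              AS GlobalID
       -- numbering restarts for every person: 1, 2, 1, 2
      ,ROW_NUMBER() OVER(PARTITION BY PersonID ORDER BY BookDetails) AS PerPersonID
FROM @books
ORDER BY PersonID, BookDetails;
```

With the per-person numbering, every person's first book maps to Book_1, the second to Book_2, and so on, which is what lets the pivoted column names line up across people in the full query that follows.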

With MyPersonCTE AS
(
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS PersonID
          ,p.value('FirstName[1]','varchar(max)') AS FirstName
          ,p.value('LastName[1]','varchar(max)') AS LastName
          ,p.value('Biography[1]','varchar(max)') AS Biography
          ,p.value('Expertise[1]','varchar(max)') AS Expertise
          ,p.value('Image[1]','varchar(max)') AS Image
          ,p.value('Link[1]','varchar(max)') AS Link
          ,p.query('Books') AS BookNode
          ,p.query('Articles') AS ArticleNode
          --same for Papers, Artwork...
    FROM @x.nodes('/People/Person') AS A(p) 
)
,MyBooksCTE AS
(
    SELECT MyPersonCTE.*
          ,ROW_NUMBER() OVER(PARTITION BY PersonID ORDER BY (SELECT NULL)) AS BookID
          ,x.value('Year[1]','int') AS BookYear
          ,x.value('Details[1]','varchar(max)') AS BookDetails
    FROM MyPersonCTE
    CROSS APPLY MyPersonCTE.BookNode.nodes('/Books/Book') A(x)  
)
,MyArticlesCTE AS
(
    SELECT MyPersonCTE.*
          ,ROW_NUMBER() OVER(PARTITION BY PersonID ORDER BY (SELECT NULL)) AS ArticleID
          ,x.value('Year[1]','int') AS ArticleYear
          ,x.value('Details[1]','varchar(max)') AS ArticleDetails
    FROM MyPersonCTE
    CROSS APPLY MyPersonCTE.ArticleNode.nodes('/Articles/Article') A(x)  
)
--same for Papers, Artwork...
SELECT p.*
      ,b.BookID
      ,b.BookYear
      ,b.BookDetails
      ,a.ArticleID
      ,a.ArticleYear
      ,a.ArticleDetails  
INTO #tempAllData
FROM MyPersonCTE AS p
LEFT JOIN MyBooksCTE AS b ON p.PersonID=b.PersonID
LEFT JOIN MyArticlesCTE AS a ON p.PersonID=a.PersonID ;

--#tempAllData now holds all the data, repeated in every combination: far too much
--but DISTINCT is your friend
--in this case you'd use the PersonID as FK in all related tables

SELECT DISTINCT PersonID,FirstName,LastName,Biography,Expertise --other fields
INTO #tempPerson
FROM #tempAllData;

SELECT DISTINCT PersonID,BookID,BookYear,BookDetails
INTO #tempBooks
FROM #tempAllData;

SELECT DISTINCT PersonID,ArticleID,ArticleYear,ArticleDetails
INTO #tempArticles
FROM #tempAllData;

DECLARE @columnNames VARCHAR(MAX)=
 STUFF((SELECT DISTINCT ',Book_'+CAST(BookID AS VARCHAR(10)) FROM #tempBooks FOR XML PATH('')),1,1,'')
+(SELECT DISTINCT ',Article_'+CAST(ArticleID AS VARCHAR(10)) FROM #tempArticles FOR XML PATH(''));

DECLARE @cmd VARCHAR(MAX)=
'SELECT p.*
FROM
(
    SELECT p.*
          ,''Book_''+CAST(BookID AS VARCHAR(10)) AS ColumnName
          ,ISNULL(CAST(BookYear AS VARCHAR(4)),'''') + '' '' + BookDetails AS Data
    FROM #tempPerson AS p
    INNER JOIN #tempBooks AS b ON p.PersonID=b.PersonID
    UNION ALL
    SELECT p.*
          ,''Article_''+CAST(ArticleID AS VARCHAR(10)) AS ColumnName
          ,ISNULL(CAST(ArticleYear AS VARCHAR(4)),'''') + '' '' + ArticleDetails AS Data
    FROM #tempPerson AS p
    INNER JOIN #tempArticles AS a ON p.PersonID=a.PersonID
) AS tbl
PIVOT
(
    MAX(Data) FOR ColumnName IN(' +  @columnNames + ')
) AS p;'


EXEC(@cmd);

DROP TABLE #tempArticles
DROP TABLE #tempBooks 
DROP TABLE #tempPerson
DROP TABLE #tempAllData;
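
The queries above work on an XML literal. To run them against the real file, the document could be read into the @x variable first. The following is a sketch, not a tested part of the solution: the path C:\data\people.xml is a placeholder, and the file has to be readable from the SQL Server machine by an account allowed to run bulk operations.

```sql
-- Placeholder path; replace it with the real location of the file.
DECLARE @x XML;

-- SINGLE_BLOB reads the file as binary, so the XML parser can honour
-- the encoding stated in the file's own XML declaration.
SELECT @x = CAST(src.BulkColumn AS XML)
FROM OPENROWSET(BULK 'C:\data\people.xml', SINGLE_BLOB) AS src;

-- Quick sanity check: how many <Person> elements were loaded.
SELECT @x.value('count(/People/Person)', 'int') AS PersonCount;
```

The CTE queries can then follow in the same batch, since they only depend on @x.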