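Bulk insert nested XML where the foreign key is the identity column of the first table

Date: 2014-09-30 15:36:32

Tags: sql-server xml tsql azure-sql-database sqlxml

I have an XML document like the following: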

<Records>
  <Record>
    <Name>Best of Pop</Name>
    <Studio>ABC studio</Studio>
    <Artists>
      <Artist>
        <ArtistName>John</ArtistName>
        <Age>36</Age>            
      </Artist> 
      <Artist>
        <ArtistName>Jessica</ArtistName>
        <Age>20</Age>            
      </Artist>
    </Artists>
  </Record>
  <Record>
    <Name>Nursery rhymes</Name>
    <Studio>XYZ studio</Studio>
    <Artists>
      <Artist>
        <ArtistName>Judy</ArtistName>
        <Age>10</Age>            
      </Artist> 
      <Artist>
        <ArtistName>Rachel</ArtistName>
        <Age>15</Age>            
      </Artist>
    </Artists>
  </Record>
</Records>

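This file may contain millions of records. My MS SQL database, running on Azure SQL Database, has the following 2 tables to store these records:

  1. Record (RecordId [PK, identity, auto-increment], Name, Studio)

  2. Artist (RecordId [foreign key referencing Record.RecordId], ArtistName, Age)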

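Is it possible to bulk insert the records into the Record table, get the generated RecordIds, and then bulk insert the artist information into the Artist table, all in a single pass over the XML using the xml nodes() method?

I have been searching for an efficient way to do this, but without success.

I have tried approaches similar to those described here and here, but I could not get to a solution.

Any pointers toward a solution would be a great help.

UPDATE: @srutzky: Thanks for your solution. It does exactly what I need. There is one problem, though: I have to use the nodes() method for this. I have changed the first part of the query, but I am stuck on the second half. Here is what I have so far: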

-- @ImportData is assumed to be an XML variable already holding the document shown above.
DECLARE @Record TABLE (RecordId INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
                       Name NVARCHAR(400) UNIQUE,
                       Studio NVARCHAR(400));
DECLARE @Artist TABLE (ArtistId INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
                       RecordId INT NOT NULL,
                       ArtistName NVARCHAR(400), Age INT);

-- First pass: one row per <Record>
INSERT INTO @Record (Name, Studio)
   SELECT  T.c.value(N'(Name/text())[1]', 'NVARCHAR(400)'),
           T.c.value(N'(Studio/text())[1]', 'NVARCHAR(400)')
   FROM @ImportData.nodes('/Records/Record') T(c);

SELECT * FROM @Record;
    

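Could you help me with the second part? I am new to this style of XML processing.

UPDATE 2: I got it... I racked my brain for several hours, tried a few things, and finally found a solution.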

-- @ImportData is assumed to be an XML variable already holding the document shown above.
DECLARE @Record TABLE (RecordId INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
                       Name NVARCHAR(400) UNIQUE,
                       Studio NVARCHAR(400));
DECLARE @Artist TABLE (ArtistId INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
                       RecordId INT NOT NULL,
                       ArtistName NVARCHAR(400),
                       Age INT);

-- First pass: one row per <Record>
INSERT INTO @Record (Name, Studio)
   SELECT  T.c.value(N'(Name/text())[1]', 'NVARCHAR(400)'),
           T.c.value(N'(Studio/text())[1]', 'NVARCHAR(400)')
   FROM @ImportData.nodes('/Records/Record') T(c);

-- Second pass: one row per <Artist>; the parent record's <Name> (two levels up)
-- is used to look up the RecordId generated in the first pass
INSERT INTO @Artist (RecordId, ArtistName, Age)
   SELECT  (SELECT RecordId FROM @Record WHERE Name = T.c.value(N'(../../Name/text())[1]', 'NVARCHAR(400)')),
           T.c.value(N'(ArtistName/text())[1]', 'NVARCHAR(400)'),
           T.c.value(N'(Age/text())[1]', 'INT')
   FROM @ImportData.nodes('/Records/Record/Artists/Artist') T(c);

SELECT * FROM @Record;
SELECT * FROM @Artist;
    

@srutzky: Thanks a lot for pointing me in the right direction. Any suggestions for improving this solution are welcome.

1 answer:

Answer 0 (score: 3)

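This cannot be done in a single pass in any case, because you cannot insert into two tables in the same DML statement (well, outside of Triggers and the OUTPUT clause, neither of which would help here). But it can be done efficiently in two passes. The fact that <Name> within the <Record> element is a unique key is what makes this work, because it allows us to use the Record table as a lookup table for the second pass (i.e. when getting the Artist rows).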

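First, you need to (well, should) create a UNIQUE INDEX on Record (Name ASC). In the example below I am using a UNIQUE CONSTRAINT, but only because I am using table variables instead of temporary tables to make the example code easier to re-run (no need for an explicit IF EXISTS DROP at the top). The index will help the performance of the second pass.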
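As a minimal sketch of that index on real (permanent) tables, assuming the two-table schema from the question: the dbo schema, the ArtistId column, and the constraint and index names (PK_Record, UX_Record_Name, PK_Artist, FK_Artist_Record) are hypothetical choices for illustration, not something the question specifies.

```sql
-- Hypothetical permanent versions of the question's tables; names are illustrative.
CREATE TABLE dbo.Record
(
    RecordId INT NOT NULL IDENTITY(1, 1) CONSTRAINT PK_Record PRIMARY KEY,
    Name     NVARCHAR(400) NOT NULL,
    Studio   NVARCHAR(400) NULL
);

-- The unique index that lets the second pass look up RecordId by Name.
CREATE UNIQUE NONCLUSTERED INDEX UX_Record_Name
    ON dbo.Record (Name ASC);

CREATE TABLE dbo.Artist
(
    ArtistId   INT NOT NULL IDENTITY(1, 1) CONSTRAINT PK_Artist PRIMARY KEY,
    -- Foreign key back to the identity column of dbo.Record
    RecordId   INT NOT NULL
               CONSTRAINT FK_Artist_Record REFERENCES dbo.Record (RecordId),
    ArtistName NVARCHAR(400) NULL,
    Age        INT NULL
);
```

An NVARCHAR(400) key takes at most 800 bytes, which stays within SQL Server's index key size limit.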

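The example uses OPENXML because it will be more efficient than the .nodes() function when the same document needs to be traversed twice. The last parameter of the OPENXML function, 2, specifies that the document is "element-centric", since the default parsing looks for "attribute-centric" values.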
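To see what that last argument changes, here is a small sketch (the one-record @Sample document is made up for illustration). With no column patterns in the WITH clause, flags 2 maps each column to a child element of the same name, while flags 1 looks for attributes of that name and so returns NULLs for this element-based XML.

```sql
DECLARE @DocumentID INT;
DECLARE @Sample XML = N'<Records><Record><Name>Best of Pop</Name><Studio>ABC studio</Studio></Record></Records>';

EXEC sp_xml_preparedocument @DocumentID OUTPUT, @Sample;

-- flags = 2 (element-centric): Name and Studio come from the child elements
SELECT Name, Studio
FROM   OPENXML (@DocumentID, N'/Records/Record', 2)
          WITH (Name NVARCHAR(400), Studio NVARCHAR(400));

-- flags = 1 (attribute-centric): looks for Name/Studio attributes, which
-- this document does not have, so both columns come back NULL
SELECT Name, Studio
FROM   OPENXML (@DocumentID, N'/Records/Record', 1)
          WITH (Name NVARCHAR(400), Studio NVARCHAR(400));

EXEC sp_xml_removedocument @DocumentID;
```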

DECLARE @DocumentID INT, @ImportData XML;

SET @ImportData = N'
<Records>
  <Record>
    <Name>Best of Pop</Name>
    <Studio>ABC studio</Studio>
    <Artists>
      <Artist>
        <ArtistName>John</ArtistName>
        <Age>36</Age>            
      </Artist> 
      <Artist>
        <ArtistName>Jessica</ArtistName>
        <Age>20</Age>            
      </Artist>
    </Artists>
  </Record>
  <Record>
    <Name>Nursery rhymes</Name>
    <Studio>XYZ studio</Studio>
    <Artists>
      <Artist>
        <ArtistName>Judy</ArtistName>
        <Age>10</Age>            
      </Artist> 
      <Artist>
        <ArtistName>Rachel</ArtistName>
        <Age>15</Age>            
      </Artist>
    </Artists>
  </Record>
</Records>';


DECLARE @Record TABLE (RecordId INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
                       Name NVARCHAR(400) UNIQUE,
                       Studio NVARCHAR(400));
DECLARE @Artist TABLE (ArtistId INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
                       RecordId INT NOT NULL,
                       ArtistName NVARCHAR(400), Age INT);

EXEC sp_xml_preparedocument @DocumentID OUTPUT, @ImportData;

-- First pass: extract "Record" rows
INSERT INTO @Record (Name, Studio)
   SELECT Name, Studio
   FROM   OPENXML (@DocumentID, N'/Records/Record', 2) 
             WITH (Name    NVARCHAR(400)  './Name/text()', 
                   Studio  NVARCHAR(400)  './Studio/text()');


-- Second pass: extract "Artist" rows
INSERT INTO @Artist (RecordId, ArtistName, Age)
   SELECT rec.RecordId, art.ArtistName, art.Age
   FROM   OPENXML (@DocumentID, N'/Records/Record/Artists/Artist', 2) 
             WITH (Name        NVARCHAR(400)  '../../Name/text()',
                   ArtistName  NVARCHAR(400)  './ArtistName/text()', 
                   Age         INT  './Age/text()') art
   INNER JOIN @Record rec
           ON rec.[Name] = art.[Name];


EXEC sp_xml_removedocument @DocumentID;
-------------------

SELECT * FROM @Record ORDER BY [RecordID];
SELECT * FROM @Artist ORDER BY [RecordID];
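One thing worth guarding in the OPENXML version: the handle from sp_xml_preparedocument stays allocated until sp_xml_removedocument is called (or the session ends), so an error in either INSERT would skip the cleanup. A minimal sketch of a TRY...CATCH wrapper is below, assuming a cut-down one-record document and showing only the first pass; it is a suggested pattern rather than a required one.

```sql
DECLARE @DocumentID INT;
DECLARE @ImportData XML = N'<Records><Record><Name>Best of Pop</Name><Studio>ABC studio</Studio></Record></Records>';
DECLARE @Record TABLE (RecordId INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
                       Name NVARCHAR(400) UNIQUE,
                       Studio NVARCHAR(400));

BEGIN TRY
    EXEC sp_xml_preparedocument @DocumentID OUTPUT, @ImportData;

    -- First pass: extract "Record" rows
    INSERT INTO @Record (Name, Studio)
       SELECT Name, Studio
       FROM   OPENXML (@DocumentID, N'/Records/Record', 2)
                 WITH (Name    NVARCHAR(400)  './Name/text()',
                       Studio  NVARCHAR(400)  './Studio/text()');

    -- The second pass (Artist rows) would go here.

    EXEC sp_xml_removedocument @DocumentID;
END TRY
BEGIN CATCH
    -- Free the parsed document before re-raising the original error
    IF (@DocumentID IS NOT NULL)
        EXEC sp_xml_removedocument @DocumentID;
    THROW;
END CATCH;
```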


EDIT
Given the new requirement to use the .nodes() function instead of OPENXML, the following will work:

DECLARE @ImportData XML;

SET @ImportData = N'
<Records>
  <Record>
    <Name>Best of Pop</Name>
    <Studio>ABC studio</Studio>
    <Artists>
      <Artist>
        <ArtistName>John</ArtistName>
        <Age>36</Age>            
      </Artist> 
      <Artist>
        <ArtistName>Jessica</ArtistName>
        <Age>20</Age>            
      </Artist>
    </Artists>
  </Record>
  <Record>
    <Name>Nursery rhymes</Name>
    <Studio>XYZ studio</Studio>
    <Artists>
      <Artist>
        <ArtistName>Judy</ArtistName>
        <Age>10</Age>            
      </Artist> 
      <Artist>
        <ArtistName>Rachel</ArtistName>
        <Age>15</Age>            
      </Artist>
    </Artists>
  </Record>
</Records>';

IF (OBJECT_ID('tempdb..#Record') IS NOT NULL)
BEGIN
   DROP TABLE #Record;
END;
IF (OBJECT_ID('tempdb..#Artist') IS NOT NULL)
BEGIN
   DROP TABLE #Artist;
END;

CREATE TABLE #Record (RecordId INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
                      Name NVARCHAR(400) UNIQUE,
                      Studio NVARCHAR(400));
CREATE TABLE #Artist (ArtistId INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
                      RecordId INT NOT NULL,
                      ArtistName NVARCHAR(400),
                      Age INT);


-- First pass: extract "Record" rows
INSERT INTO #Record (Name, Studio)
   SELECT col.value(N'(./Name/text())[1]', N'NVARCHAR(400)') AS [Name],
          col.value(N'(./Studio/text())[1]', N'NVARCHAR(400)') AS [Studio]
   FROM   @ImportData.nodes(N'/Records/Record') tab(col);


-- Second pass: extract "Artist" rows
;WITH artists AS
(
   SELECT col.value(N'(../../Name/text())[1]', N'NVARCHAR(400)') AS [RecordName],
          col.value(N'(./ArtistName/text())[1]', N'NVARCHAR(400)') AS [ArtistName],
          col.value(N'(./Age/text())[1]', N'INT') AS [Age]
   FROM   @ImportData.nodes(N'/Records/Record/Artists/Artist') tab(col)
)
INSERT INTO #Artist (RecordId, ArtistName, Age)
   SELECT rec.RecordId, art.ArtistName, art.Age
   FROM artists art
   INNER JOIN #Record rec
           ON rec.[Name] = art.RecordName;

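-- OR (alternative: the INSERT line is commented out so that running the whole
-- script does not insert the artists a second time; the SELECT only previews the rows) --
-- INSERT INTO #Artist (RecordId, ArtistName, Age)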
   SELECT rec.RecordId,
          col.value(N'(./ArtistName/text())[1]', N'NVARCHAR(400)') AS [ArtistName],
          col.value(N'(./Age/text())[1]', N'INT') AS [Age]
   FROM   @ImportData.nodes(N'/Records/Record/Artists/Artist') tab(col)
   INNER JOIN #Record rec
           ON rec.Name = col.value(N'(../../Name/text())[1]', N'NVARCHAR(400)');

-------------------

SELECT * FROM #Record ORDER BY [RecordID];
SELECT * FROM #Artist ORDER BY [RecordID];

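Two ways of inserting into #Artist are shown above. The first uses a CTE to abstract the XML extraction away from the INSERT / SELECT query. The other is a simplified version, similar to your query in UPDATE 2 of the question.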
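Both variants match on Name with an INNER JOIN, so an <Artist> whose parent <Record> has no <Name> (or a name that did not make it into the Record table) is skipped silently. A sketch of a check for such rows, assuming a made-up two-record document in which the second <Record> lacks <Name>:

```sql
-- Hypothetical input: the second <Record> has no <Name>
DECLARE @ImportData XML = N'
<Records>
  <Record>
    <Name>Best of Pop</Name>
    <Artists><Artist><ArtistName>John</ArtistName><Age>36</Age></Artist></Artists>
  </Record>
  <Record>
    <Studio>XYZ studio</Studio>
    <Artists><Artist><ArtistName>Judy</ArtistName><Age>10</Age></Artist></Artists>
  </Record>
</Records>';

DECLARE @Record TABLE (RecordId INT NOT NULL IDENTITY(1, 1) PRIMARY KEY,
                       Name NVARCHAR(400) UNIQUE,
                       Studio NVARCHAR(400));

-- First pass, as in the answer
INSERT INTO @Record (Name, Studio)
   SELECT col.value(N'(./Name/text())[1]', N'NVARCHAR(400)'),
          col.value(N'(./Studio/text())[1]', N'NVARCHAR(400)')
   FROM   @ImportData.nodes(N'/Records/Record') tab(col);

-- Artists whose parent <Name> matches no Record row; these are the rows
-- the INNER JOIN in the second pass would drop
DECLARE @Unmatched TABLE (ArtistName NVARCHAR(400), RecordName NVARCHAR(400));

INSERT INTO @Unmatched (ArtistName, RecordName)
   SELECT art.ArtistName, art.RecordName
   FROM  (SELECT col.value(N'(./ArtistName/text())[1]', N'NVARCHAR(400)') AS [ArtistName],
                 col.value(N'(../../Name/text())[1]', N'NVARCHAR(400)') AS [RecordName]
          FROM   @ImportData.nodes(N'/Records/Record/Artists/Artist') tab(col)) art
   LEFT JOIN @Record rec
          ON rec.[Name] = art.RecordName
   WHERE  rec.RecordId IS NULL;

SELECT * FROM @Unmatched;  -- Judy, NULL
```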