Importing a large (2.5 GB) XML file into SQL Server

Date: 2017-04-28 12:09:28

Tags: sql-server xml openxml

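Hi, I am trying to import a large XML file into a table on my SQL Server (2014).

I have used the code below for smaller files and assumed it would be fine here as well, since this is a one-off job. I started it yesterday, and when I came into work today the query was still running, so this is obviously the wrong route.

Here is the code.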

CREATE TABLE files_index_bulk
(
Id INT IDENTITY PRIMARY KEY,
XMLData XML,
LoadedDateTime DATETIME
)


INSERT INTO files_index_bulk(XMLData, LoadedDateTime)
SELECT CONVERT(XML, BulkColumn, 2) AS BulkColumn, GETDATE() 
FROM OPENROWSET(BULK 'c:\scripts\icecat\files.index.xml', SINGLE_BLOB) AS x;


SELECT * FROM files_index_bulk

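Can anyone point me to another way of doing this, please? Everything I find about importing large files keeps coming back to using bulk, which I already am.

Thanks in advance.

This is the table I am using, which I want to pull all of the data into.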

USE [ICECATtesting]
GO

/****** Object:  Table [dbo].[files_index]    Script Date: 28/04/2017 20:10:44 
******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

SET ANSI_PADDING ON
GO

CREATE TABLE [dbo].[files_index](
    [Product_ID] [int] NULL,
    [path] [varchar](100) NULL,
    [Updated] [varchar](50) NULL,
    [Quality] [varchar](50) NULL,
    [Supplier_id] [int] NULL,
    [Prod_ID] [varchar](1) NULL,
    [Catid] [int] NULL,
    [On_Market] [int] NULL,
    [Model_Name] [varchar](250) NULL,
    [Product_View] [int] NULL,
    [HighPic] [varchar](1) NULL,
    [HighPicSize] [int] NULL,
    [HighPicWidth] [int] NULL,
    [HighPicHeight] [int] NULL,
    [Date_Added] [varchar](150) NULL
) ON [PRIMARY]

GO

SET ANSI_PADDING OFF
GO

Here is a snippet of the XML file.

<ICECAT-interface xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://data.icecat.biz/xsd/files.index.xsd">
  <files.index Generated="20170427010009">
  <file path="export/level4/EN/11.xml" Product_ID="11" Updated="20170329110432" Quality="SUPPLIER" Supplier_id="2" Prod_ID="PS300E-03YNL-DU" Catid="151" On_Market="0" Model_Name="Satellite 3000-400" Product_View="587591" HighPic="" HighPicSize="0" HighPicWidth="0" HighPicHeight="0" Date_Added="20050627000000">
  </file>
  <file path="export/level4/EN/12.xml" Product_ID="12" Updated="20170329110432" Quality="ICECAT" Supplier_id="7" Prod_ID="91.42R01.32H" Catid="151" On_Market="0" Model_Name="TravelMate  740LF" Product_View="40042" HighPic="http://images.icecat.biz/img/norm/high/12-31699.jpg" HighPicSize="19384" HighPicWidth="170" HighPicHeight="192" Date_Added="20050627000000">
  </file>
  <file path="export/level4/EN/13.xml" Product_ID="13" Updated="20170329110432" Quality="SUPPLIER" Supplier_id="2" Prod_ID="PP722E-H390W-NL" Catid="151" On_Market="0" Model_Name="Portégé 7220CT / NW2" Product_View="37021" HighPic="http://images.icecat.biz/img/norm/high/13-31699.jpg" HighPicSize="27152" HighPicWidth="280" HighPicHeight="280" Date_Added="20050627000000">
  </file>

3 Answers:

Answer 0 (score: 2)

The maximum size of an XML column value in SQL Server is 2 GB. Importing a 2.5 GB file into a single XML column is not possible.

UPDATE

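Since your underlying objective is to turn the XML elements in the file into table rows, you do not need to dump the entire file contents into a single XML column. You can avoid the 2 GB limit, reduce memory requirements, and improve performance by shredding the XML in client code and inserting the rows in batches with a bulk insert technique.

The example PowerShell script below uses an XmlTextReader to avoid reading the entire XML into a DOM, and SqlBulkCopy to insert batches of many rows at once. The combination of these techniques should allow you to insert millions of rows in minutes rather than hours. The same techniques can be implemented in a custom application or an SSIS script task.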
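The same streaming-plus-batching idea can be sketched in Python. This is not the PowerShell script the answer refers to, only a minimal illustration under assumptions: the column list is copied from the attributes in the question's sample, `iter_batches` and `batch_size` are hypothetical names, and what happens to each batch (for example a bulk insert into SQL Server) is left to the caller.

```python
import io
import xml.etree.ElementTree as ET

# Attribute names taken from the <file> elements in the question's sample.
COLUMNS = ["Product_ID", "path", "Updated", "Quality", "Supplier_id",
           "Prod_ID", "Catid", "On_Market", "Model_Name", "Product_View",
           "HighPic", "HighPicSize", "HighPicWidth", "HighPicHeight",
           "Date_Added"]


def iter_batches(source, batch_size=10000):
    """Yield lists of row tuples, one tuple per <file> element."""
    batch = []
    stack = []  # currently open elements, so the parent is always known
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            stack.append(elem)
            continue
        stack.pop()  # elem itself; stack[-1] is now its parent (if any)
        if elem.tag == "file":
            batch.append(tuple(elem.get(col) for col in COLUMNS))
            if stack:
                stack[-1].remove(elem)  # detach so the tree does not grow
            if len(batch) >= batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # final, possibly short, batch


# Tiny in-memory stand-in for the real 2.5 GB file.
demo = io.BytesIO(
    b'<ICECAT-interface><files.index Generated="20170427010009">'
    b'<file path="export/level4/EN/11.xml" Product_ID="11"></file>'
    b'<file path="export/level4/EN/12.xml" Product_ID="12"></file>'
    b'<file path="export/level4/EN/13.xml" Product_ID="13"></file>'
    b'</files.index></ICECAT-interface>'
)
for rows in iter_batches(demo, batch_size=2):
    print(len(rows), rows[0][:2])
```

Each yielded batch is a list of tuples in `COLUMNS` order, so it can be handed to whatever bulk-insert mechanism your client library offers. All values arrive as strings (or `None` when an attribute is missing), so the integer columns still need converting before the insert.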

I noticed that a couple of the table columns are declared as varchar(1), yet the corresponding XML attribute values contain many characters. You will need to either widen those columns or transform the source values.
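One way to decide how wide the columns need to be is to scan the file once and record the longest value seen for each attribute. The sketch below is an illustration only: `max_attribute_lengths` is a hypothetical helper, and the sample data is a shortened version of the snippet in the question.

```python
import io
import xml.etree.ElementTree as ET


def max_attribute_lengths(source, tag="file"):
    """Return {attribute name: longest value length} over all <tag> elements."""
    longest = {}
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            for name, value in elem.attrib.items():
                longest[name] = max(longest.get(name, 0), len(value))
            # Frees the attribute data; the emptied element itself stays
            # attached to its parent, which is acceptable for a one-off scan.
            elem.clear()
    return longest


sample = io.BytesIO(b"""<ICECAT-interface>
  <files.index Generated="20170427010009">
    <file Product_ID="11" Prod_ID="PS300E-03YNL-DU" HighPic=""></file>
    <file Product_ID="12" Prod_ID="91.42R01.32H"
          HighPic="http://images.icecat.biz/img/norm/high/12-31699.jpg"></file>
  </files.index>
</ICECAT-interface>""")

lengths = max_attribute_lengths(sample)
for name, length in sorted(lengths.items()):
    print(name, length)
```

Any attribute whose reported length exceeds the declared column width (Prod_ID and HighPic in the question's table) is a candidate for widening.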

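Answer 1 (score: 1)

Try this. It is another approach that I have been using for a while. It is fairly fast (it could probably be faster). Every night I pull a huge XML database from a gaming company, and this is how I import it.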

 $xml  = new XMLReader();            
 $xml->open($xml_file); // file is your xml file you want to parse
 while($xml->read() && $xml->name != 'game') { ; } // get past the header to your first record (game in my case)

while($xml->name == 'game') { // now while we are in this record               
                $element        = new SimpleXMLElement($xml->readOuterXML());
                $gameRec        = $this->createGameRecord($element, $os); // this is my function to reduce some clutter - and I use it elsewhere too

                /* this looks confusing, but it is not. There are over 20 fields, and instead of typing them all out, I just made a string. */
                $sql = "INSERT INTO $table (";
                foreach($gameRec as $field=>$game){
                $sql .= " $field,";
                }
                $sql = rtrim($sql, ",");
                $sql .=") values (";

                foreach($gameRec as $field=>$game) {
                    $sql .= " :$field,";               
                }
                $sql = rtrim($sql,",");
                $sql .= ") ON DUPLICATE KEY UPDATE "; // online game doesn't have a gamerank - not my choice LOL, so I adjust that for here

                switch ($os) {
                    case 'pc' : $sql .= "gamerank = ".$gameRec['gamerank']        ; break;
                    case 'mac': $sql .= "gamerank = ".$gameRec['gamerank']        ; break;
                    case 'pl' : $sql .= "playercount = ".$gameRec['playercount']  ; break;
                    case 'og' :
                        $playercount = $this->getPlayerCount($gameRec['gameid']);
                        $sql .= "playercount = ".$playercount['playercount']  ;
                        break;

                }


                try {

                    $stmt = $this->connect()->prepare($sql);
                    $stmt->execute($gameRec);

                } catch (PDOException $e) {// Kludge

                    echo 'os: '.$os.'<br/>table: '.$table.'<br/>XML LINK: '.$comprehensive_xml.'<br/>Current Record:<br/><pre>'.print_r($gameRec, true).'</pre><br/>'.
                    'SQL: '.$sql.'<br/>';
                    die('Line:33<br/>Function: pullBFG()<BR/>Cannot add game record <br/>'.$e->getMessage());

                }

                /// VERY VERY VERY IMPORTANT do not forget these 2 lines, or it will go into an endless loop - I know, I've done it. locks up your system after a bit hahaah
                $xml->next('game');
                unset($element);
            }// while there are games

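This should get you started. Obviously, change 'game' to the name of your own XML record element, and trim out the fat I have in here.

Here is createGameRecord($element, $type = 'pc'). Basically, it converts the element into an array that can be used elsewhere and that is easier to add to the database. The line above, $stmt->execute($gameRec);, is where the $gameRec returned from this function is used. PDO knows that $gameRec is an array and binds its values when you run the insert. delHardReturns() is another function of mine that strips out hard returns (\r\n and so on), which seemed to mess up the SQL. I think SQL has a function for this, but I did not pursue it. Hope this is useful.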

private function createGameRecord($element, $type='pc') {
            if( ($type == 'pc') || ($type == 'og') ) { // player count is handled separately
                $game = array(
                    'gamename'                  => strval($element->gamename),
                    'gameid'                    => strval($element->gameid),                
                    'genreid'                   => strval($element->genreid),
                    'allgenreid'                => strval($element->allgenreid),
                    'shortdesc'                 => $this->delHardReturns(strval($element->shortdesc)),
                    'meddesc'                   => $this->delHardReturns(strval($element->meddesc)),
                    'bullet1'                   => $this->delHardReturns(strval($element->bullet1)),
                    'bullet2'                   => $this->delHardReturns(strval($element->bullet2)),
                    'bullet3'                   => $this->delHardReturns(strval($element->bullet3)),
                    'bullet4'                   => $this->delHardReturns(strval($element->bullet4)),
                    'bullet5'                   => $this->delHardReturns(strval($element->bullet5)),
                    'longdesc'                  => $this->delHardReturns(strval($element->longdesc)),
                    'foldername'                => strval($element->foldername),
                    'hasdownload'               => strval($element->hasdownload),
                    'hasdwfeature'              => strval($element->hasdwfeature),                             
                    'releasedate'               => strval($element->releasedate)

                );

                if($type === 'pc')  {

                    $game['hasvideo']           = strval($element->hasvideo);
                    $game['hasflash']           = strval($element->hasflash);
                    $game['price']              = strval($element->price); 
                    $game['gamerank']           = strval($element->gamerank);
                    $game['gamesize']           = strval($element->gamesize);
                    $game['macgameid']          = strval($element->macgameid);
                    $game['family']             = strval($element->family);
                    $game['familyid']           = strval($element->familyid);
                    $game['productid']          = strval($element->productid);
                    $game['pc_sysreqos']        = strval($element->systemreq->pc->sysreqos);
                    $game['pc_sysreqmhz']       = strval($element->systemreq->pc->sysreqmhz);
                    $game['pc_sysreqmem']       = strval($element->systemreq->pc->sysreqmem);
                    $game['pc_sysreqhd']        = strval($element->systemreq->pc->sysreqhd);

                    if(empty($game['gamerank'])) $game['gamerank'] = 99999;

                    $game['gamesize'] = $this->readableBytes((int)$game['gamesize']);  


                }// dealing with PC type

                if($type === 'og') {
                    $game['onlineiframeheight']              = strval($element->onlineiframeheight);
                    $game['onlineiframewidth']              = strval($element->onlineiframewidth); 

                }

                $game['releasedate']            = substr($game['releasedate'],0,10);

            } else {// not type = pl

                $game['playercount']            = strval($element->playercount);
                $game['gameid']                 = strval($element->gameid);
            }// no type = pl else


            return $game;
        }
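The pattern this answer relies on, building an INSERT with one named placeholder per key of the record and then binding the whole record at execute time, can be sketched in Python as well. This is only an illustration under assumptions: sqlite3 stands in for the real database, the `games` table and the `build_insert` helper are hypothetical, and the ON DUPLICATE KEY UPDATE part of the PHP code is not reproduced.

```python
import sqlite3


def build_insert(table, record):
    """Build an INSERT with one named placeholder per key of `record`.

    `table` and the keys end up in the SQL text, so they must come from
    trusted code, never from user input.
    """
    columns = ", ".join(record)
    placeholders = ", ".join(f":{name}" for name in record)
    return f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (gameid TEXT PRIMARY KEY, gamename TEXT, gamerank INTEGER)"
)

record = {"gameid": "42", "gamename": "Example Game", "gamerank": 7}
sql = build_insert("games", record)
conn.execute(sql, record)  # values are bound by name, not concatenated
conn.commit()
print(sql)
```

Only the column names are interpolated into the statement; the values travel separately as bound parameters, which also sidesteps the quoting problems that stripping hard returns was meant to work around.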

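Answer 2 (score: 0)

Updated: much faster. I did some research, and although the approach in my post above works (slowly), I was able to find a much faster one, at least for me. Since it is completely different from the previous post, I am adding it as a new answer.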

LOAD XML LOCAL INFILE 'path/to/file.xml' INTO TABLE tablename ROWS IDENTIFIED BY '<xml-identifier>'

Example

<students>
    <student>
       <name>john doe</name>
          <boringfields>bla bla bla......</boringfields>
    </student>
</students>

The MySQL command would then be:

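LOAD XML LOCAL INFILE 'path/to/students.xml' INTO TABLE tablename ROWS IDENTIFIED BY '<student>'

The value given to ROWS IDENTIFIED BY must include the single quotes and the angle brackets. When I switched to this method, I went from roughly 12 minutes to roughly 30 seconds!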

A tip that worked for me: run DELETE FROM tablename first. Otherwise, the load simply appends to the data already in your table.

Reference: https://dev.mysql.com/doc/refman/5.5/en/load-xml.html