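Hello, I am trying to import a large XML file into a table on my SQL Server (2014).
I have used the code below for smaller files and assumed it would be fine here too, since this is a one-off job. I started it yesterday, and the query was still running when I got to work today, so this is clearly the wrong approach.
Here is the code.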
CREATE TABLE files_index_bulk
(
    Id INT IDENTITY PRIMARY KEY,
    XMLData XML,
    LoadedDateTime DATETIME
);

INSERT INTO files_index_bulk (XMLData, LoadedDateTime)
SELECT CONVERT(XML, BulkColumn, 2) AS BulkColumn, GETDATE()
FROM OPENROWSET(BULK 'c:\scripts\icecat\files.index.xml', SINGLE_BLOB) AS x;

SELECT * FROM files_index_bulk;
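Can anyone point me to another way of doing this? Whenever I look up importing large files, everything keeps pointing back to bulk loading, which I am already using.
Thanks in advance.
This is the table I am using, which I want to pull all the data into.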
USE [ICECATtesting]
GO
/****** Object: Table [dbo].[files_index] Script Date: 28/04/2017 20:10:44
******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
SET ANSI_PADDING ON
GO
CREATE TABLE [dbo].[files_index](
    [Product_ID] [int] NULL,
    [path] [varchar](100) NULL,
    [Updated] [varchar](50) NULL,
    [Quality] [varchar](50) NULL,
    [Supplier_id] [int] NULL,
    [Prod_ID] [varchar](1) NULL,
    [Catid] [int] NULL,
    [On_Market] [int] NULL,
    [Model_Name] [varchar](250) NULL,
    [Product_View] [int] NULL,
    [HighPic] [varchar](1) NULL,
    [HighPicSize] [int] NULL,
    [HighPicWidth] [int] NULL,
    [HighPicHeight] [int] NULL,
    [Date_Added] [varchar](150) NULL
) ON [PRIMARY]
GO
SET ANSI_PADDING OFF
GO
Here is a snippet of the XML file.
<ICECAT-interface xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="http://data.icecat.biz/xsd/files.index.xsd">
<files.index Generated="20170427010009">
<file path="export/level4/EN/11.xml" Product_ID="11" Updated="20170329110432" Quality="SUPPLIER" Supplier_id="2" Prod_ID="PS300E-03YNL-DU" Catid="151" On_Market="0" Model_Name="Satellite 3000-400" Product_View="587591" HighPic="" HighPicSize="0" HighPicWidth="0" HighPicHeight="0" Date_Added="20050627000000">
</file>
<file path="export/level4/EN/12.xml" Product_ID="12" Updated="20170329110432" Quality="ICECAT" Supplier_id="7" Prod_ID="91.42R01.32H" Catid="151" On_Market="0" Model_Name="TravelMate 740LF" Product_View="40042" HighPic="http://images.icecat.biz/img/norm/high/12-31699.jpg" HighPicSize="19384" HighPicWidth="170" HighPicHeight="192" Date_Added="20050627000000">
</file>
<file path="export/level4/EN/13.xml" Product_ID="13" Updated="20170329110432" Quality="SUPPLIER" Supplier_id="2" Prod_ID="PP722E-H390W-NL" Catid="151" On_Market="0" Model_Name="Portégé 7220CT / NW2" Product_View="37021" HighPic="http://images.icecat.biz/img/norm/high/13-31699.jpg" HighPicSize="27152" HighPicWidth="280" HighPicHeight="280" Date_Added="20050627000000">
</file>
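Answer 0 (score: 2)
The maximum size of an XML column value in SQL Server is 2GB. Importing a 2.5GB file into a single XML column is not possible.
Update
Since your underlying goal is to convert the XML elements in the file into table rows, you do not need to dump the entire file contents into a single XML column. By shredding the XML in client code and inserting batches of rows with a bulk insert technique, you can avoid the 2GB limit, reduce memory requirements, and improve performance.
The example PowerShell script uses an XmlTextReader to avoid reading the entire XML into a DOM, and SqlBulkCopy to insert batches of many rows at a time. Together, these techniques should let you insert millions of rows in minutes rather than hours. The same techniques can be implemented in a custom application or an SSIS script task.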
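As an illustration of the same two ideas (stream the XML instead of loading a DOM, and hand rows to the database in batches), here is a minimal Python sketch. It is not the PowerShell script from this answer. The function name `iter_row_batches`, the `SAMPLE` data, and the batch size are assumptions made for the example, and the actual bulk-insert call is deliberately left out because it depends on the driver you use.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List


def iter_row_batches(source, tag: str = "file",
                     batch_size: int = 10000) -> Iterator[List[Dict[str, str]]]:
    """Stream `source` and yield lists of attribute dicts, one dict per <tag> element."""
    batch: List[Dict[str, str]] = []
    open_elements = []  # elements whose end tag has not been seen yet
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            open_elements.append(elem)
            continue
        open_elements.pop()  # elem was just closed
        if elem.tag != tag:
            continue
        batch.append(dict(elem.attrib))
        if open_elements:
            # Detach the processed element from its parent so the tree
            # does not keep every row of the file in memory.
            open_elements[-1].remove(elem)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # final, possibly smaller, batch


# Tiny stand-in for files.index.xml (attributes shortened for the example).
SAMPLE = b"""<ICECAT-interface>
<files.index Generated="20170427010009">
<file path="export/level4/EN/11.xml" Product_ID="11" Supplier_id="2"></file>
<file path="export/level4/EN/12.xml" Product_ID="12" Supplier_id="7"></file>
<file path="export/level4/EN/13.xml" Product_ID="13" Supplier_id="2"></file>
</files.index>
</ICECAT-interface>"""

if __name__ == "__main__":
    for rows in iter_row_batches(io.BytesIO(SAMPLE), batch_size=2):
        # A real loader would pass `rows` to a bulk-insert call here.
        print(len(rows), [r["Product_ID"] for r in rows])
```

Each yielded batch is a plain list of dictionaries, so it can be mapped onto whatever bulk API is available (for example a DB-API `executemany` call). Larger batches mean fewer round trips at the cost of more memory per batch.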
I noticed that several table columns are declared as varchar(1), but the corresponding XML attribute values contain many characters. You need to either widen those columns or transform the source values.
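One way to decide how wide the columns need to be is to scan the file once and record the longest value seen for each attribute. The following is a rough Python sketch under that assumption. `max_attribute_lengths` and `SAMPLE_ROWS` are names invented for the example, and the lengths are counted in characters, which only matches varchar storage when every character fits in one byte of the column's code page.

```python
import io
import xml.etree.ElementTree as ET
from typing import Dict


def max_attribute_lengths(source, tag: str = "file") -> Dict[str, int]:
    """Return the longest value length (in characters) seen per attribute of <tag>."""
    longest: Dict[str, int] = {}
    open_elements = []  # elements whose end tag has not been seen yet
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            open_elements.append(elem)
            continue
        open_elements.pop()  # elem was just closed
        if elem.tag != tag:
            continue
        for name, value in elem.attrib.items():
            longest[name] = max(longest.get(name, 0), len(value))
        if open_elements:
            # Drop the processed element so memory use stays flat.
            open_elements[-1].remove(elem)
    return longest


# Two rows taken from the question's snippet, reduced to two attributes.
SAMPLE_ROWS = b"""<files.index>
<file Prod_ID="PS300E-03YNL-DU" HighPic=""></file>
<file Prod_ID="91.42R01.32H" HighPic="http://images.icecat.biz/img/norm/high/12-31699.jpg"></file>
</files.index>"""

if __name__ == "__main__":
    for name, length in sorted(max_attribute_lengths(io.BytesIO(SAMPLE_ROWS)).items()):
        print(name, length)
```

On the sample rows, both Prod_ID and HighPic come out far longer than one character, which is why the varchar(1) columns in the target table cannot hold them.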
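Answer 1 (score: 1)
Try this. It is another approach I have used for a while. It is fairly fast (it could probably be faster). Every night I pull a huge XML database from a game company, and this is how I import it.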
$xml = new XMLReader();
$xml->open($xml_file); // file is your xml file you want to parse
while ($xml->read() && $xml->name != 'game') { ; } // get past the header to your first record (game in my case)
while ($xml->name == 'game') { // now while we are in this record
    $element = new SimpleXMLElement($xml->readOuterXML());
    $gameRec = $this->createGameRecord($element, $os); // this is my function to reduce some clutter - and I use it elsewhere too
    /* this looks confusing, but it is not. There are over 20 fields, and instead of typing them all out, I just made a string. */
    $sql = "INSERT INTO $table (";
    foreach ($gameRec as $field => $game) {
        $sql .= " $field,";
    }
    $sql = rtrim($sql, ",");
    $sql .= ") values (";
    foreach ($gameRec as $field => $game) {
        $sql .= " :$field,"; // named placeholders, bound by execute() below
    }
    $sql = rtrim($sql, ",");
    $sql .= ") ON DUPLICATE KEY UPDATE "; // online game doesn't have a gamerank - not my choice LOL, so I adjust that for here
    switch ($os) {
        case 'pc' : $sql .= "gamerank = ".$gameRec['gamerank']; break;
        case 'mac': $sql .= "gamerank = ".$gameRec['gamerank']; break;
        case 'pl' : $sql .= "playercount = ".$gameRec['playercount']; break;
        case 'og' :
            $playercount = $this->getPlayerCount($gameRec['gameid']);
            $sql .= "playercount = ".$playercount['playercount'];
            break;
    }
    try {
        $stmt = $this->connect()->prepare($sql);
        $stmt->execute($gameRec);
    } catch (PDOException $e) { // Kludge
        // print_r() needs true as its second argument to return the dump instead of printing it immediately
        echo 'os: '.$os.'<br/>table: '.$table.'<br/>XML LINK: '.$comprehensive_xml.'<br/>Current Record:<br/><pre>'.print_r($gameRec, true).'</pre><br/>'.
            'SQL: '.$sql.'<br/>';
        die('Line:33<br/>Function: pullBFG()<BR/>Cannot add game record <br/>'.$e->getMessage());
    }
    /// VERY VERY VERY IMPORTANT do not forget these 2 lines, or it will go into a endless loop - I know, I've done it. locks up your system after a bit hahaah
    $xml->next('game');
    unset($element);
} // while there are games
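That should get you started. Obviously, change 'game' to match your own XML record, and trim the fat I left in here.
Here is createGameRecord($element, $type='pc'). Basically, it converts the record into an array that can be used elsewhere and that is easier to add to the database. The line $stmt->execute($gameRec); above is where the $gameRec returned by this function is used. PDO knows that $gameRec is an array and binds its values when you insert it. delHardReturns() is another function of mine that strips hard returns (\r\n and so on), which seemed to mess up the SQL. I believe SQL has a function for this, but I did not pursue it. Hope this works for you.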
private function createGameRecord($element, $type='pc') {
    if ( ($type == 'pc') || ($type == 'og') ) { // player count is handled separately
        $game = array(
            'gamename' => strval($element->gamename),
            'gameid' => strval($element->gameid),
            'genreid' => strval($element->genreid),
            'allgenreid' => strval($element->allgenreid),
            'shortdesc' => $this->delHardReturns(strval($element->shortdesc)),
            'meddesc' => $this->delHardReturns(strval($element->meddesc)),
            'bullet1' => $this->delHardReturns(strval($element->bullet1)),
            'bullet2' => $this->delHardReturns(strval($element->bullet2)),
            'bullet3' => $this->delHardReturns(strval($element->bullet3)),
            'bullet4' => $this->delHardReturns(strval($element->bullet4)),
            'bullet5' => $this->delHardReturns(strval($element->bullet5)),
            'longdesc' => $this->delHardReturns(strval($element->longdesc)),
            'foldername' => strval($element->foldername),
            'hasdownload' => strval($element->hasdownload),
            'hasdwfeature' => strval($element->hasdwfeature),
            'releasedate' => strval($element->releasedate)
        );
        if ($type === 'pc') {
            $game['hasvideo'] = strval($element->hasvideo);
            $game['hasflash'] = strval($element->hasflash);
            $game['price'] = strval($element->price);
            $game['gamerank'] = strval($element->gamerank);
            $game['gamesize'] = strval($element->gamesize);
            $game['macgameid'] = strval($element->macgameid);
            $game['family'] = strval($element->family);
            $game['familyid'] = strval($element->familyid);
            $game['productid'] = strval($element->productid);
            $game['pc_sysreqos'] = strval($element->systemreq->pc->sysreqos);
            $game['pc_sysreqmhz'] = strval($element->systemreq->pc->sysreqmhz);
            $game['pc_sysreqmem'] = strval($element->systemreq->pc->sysreqmem);
            $game['pc_sysreqhd'] = strval($element->systemreq->pc->sysreqhd);
            if (empty($game['gamerank'])) $game['gamerank'] = 99999;
            $game['gamesize'] = $this->readableBytes((int)$game['gamesize']);
        } // dealing with PC type
        if ($type === 'og') {
            $game['onlineiframeheight'] = strval($element->onlineiframeheight);
            $game['onlineiframewidth'] = strval($element->onlineiframewidth);
        }
        $game['releasedate'] = substr($game['releasedate'], 0, 10);
    } else { // any other type (e.g. pl): only the player count is stored
        $game['playercount'] = strval($element->playercount);
        $game['gameid'] = strval($element->gameid);
    } // end of pc/og vs. other types
    return $game;
}
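The core trick in this answer, building the column list and the named placeholders from the keys of one record array and then binding the whole array at execute time, is not specific to PHP. Below is a small Python sketch of the same pattern using the standard sqlite3 module as a stand-in database. The table `games`, its columns, and the helper `build_insert` are made up for the illustration, and SQLite's syntax differs from MySQL's (there is no ON DUPLICATE KEY UPDATE here).

```python
import sqlite3
from typing import Dict


def build_insert(table: str, record: Dict[str, object]) -> str:
    """Build an INSERT with one named placeholder per key of `record`.

    Table and column names are interpolated into the SQL text, so they
    must come from trusted code, never from the XML being imported.
    """
    columns = ", ".join(record)
    placeholders = ", ".join(f":{name}" for name in record)
    return f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE games (gameid TEXT PRIMARY KEY, gamename TEXT, gamerank INTEGER)"
)

record = {"gameid": "42", "gamename": "Example", "gamerank": 99999}
sql = build_insert("games", record)
conn.execute(sql, record)  # the driver binds :gameid, :gamename, :gamerank from the dict
conn.commit()
```

Because the values travel as bound parameters rather than as part of the SQL string, characters such as quotes or line breaks in the data cannot break the statement.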
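Answer 2 (score: 0)
Updated: much faster. I did some research, and although I showed one (slow) approach in my post above, I was able to find a much faster one, at least for me. Because it is completely different from the previous post, I am posting it as a new answer.
LOAD XML LOCAL INFILE 'path/to/file.xml' INTO TABLE tablename ROWS IDENTIFIED BY '<xml-identifier>'
Example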
<students>
<student>
<name>john doe</name>
<boringfields>bla bla bla......</boringfields>
</student>
</students>
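The MySQL command is then:
LOAD XML LOCAL INFILE 'path/to/students.xml' INTO TABLE tablename ROWS IDENTIFIED BY '<student>'
The row identifier must be wrapped in single quotes and angle brackets. When I switched to this approach, I went from roughly 12 minutes to roughly 30 seconds!
A tip that worked for me: run DELETE FROM tablename first; otherwise the load just appends to the data already in the table.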