Question

How do I get a 50MB zip file containing a 600MB XML file (more than 300,000 <abc:ABCRecord> elements) into a MySQL table? The XML file itself has the following structure:

<?xml version='1.0' encoding='UTF-8'?>
<abc:ABCData xmlns:abc="http://www.abc-example.com" xmlns:xyz="http:/www.xyz-example.com">
<abc:ABCHeader>
<abc:ContentDate>2015-08-15T09:03:29.379055+00:00</abc:ContentDate>
<abc:FileContent>PUBLISHED</abc:FileContent>
<abc:RecordCount>310598</abc:RecordCount>
<abc:Extension>
  <xyz:Sources>
    <xyz:Source>
      <xyz:ABC>5967007LIEEXZX4LPK21</xyz:ABC>
      <xyz:Name>Bornheim Register Centre</xyz:Name>
      <xyz:ROCSponsorCountry>NO</xyz:ROCSponsorCountry>
      <xyz:RecordCount>398</xyz:RecordCount>
      <xyz:ContentDate>2015-08-15T05:00:02.952+02:00</xyz:ContentDate>
      <xyz:LastAttemptedDownloadDate>2015-08-15T09:00:01.885686+00:00</xyz:LastAttemptedDownloadDate>
      <xyz:LastSuccessfulDownloadDate>2015-08-15T09:00:02.555222+00:00</xyz:LastSuccessfulDownloadDate>
      <xyz:LastValidDownloadDate>2015-08-15T09:00:02.555222+00:00</xyz:LastValidDownloadDate>
     </xyz:Source>
    </xyz:Sources>
   </abc:Extension>
 </abc:ABCHeader>
<abc:ABCRecords>
 <abc:ABCRecord>
 <abc:ABC>5967007LIEEXZX4LPK21</abc:ABC>
  <abc:Entity>
    <abc:LegalName>REGISTERENHETEN I Bornheim</abc:LegalName>
    <abc:LegalAddress>
      <abc:Line1>Havnegata 48</abc:Line1>
      <abc:City>Bornheim</abc:City>
      <abc:Country>NO</abc:Country>
      <abc:PostalCode>8900</abc:PostalCode>
    </abc:LegalAddress>
    <abc:HeadquartersAddress>
      <abc:Line1>Havnegata 48</abc:Line1>
      <abc:City>Bornheim</abc:City>
      <abc:Country>NO</abc:Country>
      <abc:PostalCode>8900</abc:PostalCode>
    </abc:HeadquartersAddress>
    <abc:BusinessRegisterEntityID register="Enhetsregisteret">974757873</abc:BusinessRegisterEntityID>
    <abc:LegalForm>Organisasjonsledd</abc:LegalForm>
    <abc:EntityStatus>Active</abc:EntityStatus>
  </abc:Entity>
  <abc:Registration>
    <abc:InitialRegistrationDate>2014-06-15T12:03:33.000+02:00</abc:InitialRegistrationDate>
    <abc:LastUpdateDate>2015-06-15T20:45:32.000+02:00</abc:LastUpdateDate>
    <abc:RegistrationStatus>ISSUED</abc:RegistrationStatus>
    <abc:NextRenewalDate>2016-06-15T12:03:33.000+02:00</abc:NextRenewalDate>
    <abc:ManagingLOU>59670054IEEXZX44PK21</abc:ManagingLOU>
  </abc:Registration>
</abc:ABCRecord>
<abc:ABCRecord>
  <abc:ABC>5967007LIE45ZX4MHC90</abc:ABC>
  <abc:Entity>
    <abc:LegalName>SUNNDAL HOSTBANK</abc:LegalName>
    <abc:LegalAddress>
      <abc:Line1>Sunfsalsvegen 15</abc:Line1>
      <abc:City>SUNNDALSPRA</abc:City>
      <abc:Country>NO</abc:Country>
      <abc:PostalCode>6600</abc:PostalCode>
    </abc:LegalAddress>
    <abc:HeadquartersAddress>
      <abc:Line1>Sunndalsvegen 15</abc:Line1>
      <abc:City>SUNNDALSPRA</abc:City>
      <abc:Country>NO</abc:Country>
      <abc:PostalCode>6600</abc:PostalCode>
    </abc:HeadquartersAddress>
    <abc:BusinessRegisterEntityID register="Foretaksregisteret">9373245963</abc:BusinessRegisterEntityID>
    <abc:LegalForm>Hostbank</abc:LegalForm>
    <abc:EntityStatus>Active</abc:EntityStatus>
  </abc:Entity>
  <abc:Registration>
    <abc:InitialRegistrationDate>2014-06-26T15:01:02.000+02:00</abc:InitialRegistrationDate>
    <abc:LastUpdateDate>2015-06-27T15:02:39.000+02:00</abc:LastUpdateDate>
    <abc:RegistrationStatus>ISSUED</abc:RegistrationStatus>
    <abc:NextRenewalDate>2016-06-26T15:01:02.000+02:00</abc:NextRenewalDate>
    <abc:ManagingLOU>5967007LIEEXZX4LPK21</abc:ManagingLOU>
  </abc:Registration>
</abc:ABCRecord>
</abc:ABCRecords>
</abc:ABCData>

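What would the MySQL table look like, and how could I implement this? The goal is to have the contents of all abc tags in the table. In addition, a new zip file is made available every day through a download link, and it should update the table daily. The zip files are named following this pattern: "20150815-XYZ-concatenated-file.zip". Step-by-step hints would be great. So far I have tried "Importing XML file with special tags & namespaces <abc:xyz> in mysql", but it has not done the job yet!

Based on the explanation below, I have now done the following: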

<?php

// open input
$reader = new XMLReader();
$reader->open('./xmlreader.xml');

// open output
$output = fopen('./xmlreader.csv', 'w');
fputcsv($output, ['id', 'name']);

$xmlns = [
  'a' => 'http://www.abc-example.com'
];

// prepare DOM
$dom   = new DOMDocument;
$xpath = new DOMXpath($dom);
foreach ($xmlns as $prefix => $namespaceURI) {
  $xpath->registerNamespace($prefix, $namespaceURI);
}

// look for the first record element
while (
  $reader->read() && 
  (
    $reader->localName !== 'ABCRecord' || 
    $reader->namespaceURI !== $xmlns['a']
  )
) {
  continue;
}

// while you have a record element
while ($reader->localName === 'ABCRecord') {
  if ($reader->namespaceURI === 'http://www.abc-example.com') {
    // expand record element node
    $node = $reader->expand($dom);
    // fetch data and write it to output
    fputcsv(
      $output, 
      [
        $xpath->evaluate('string(a:ABC)', $node),
        $xpath->evaluate('string(a:Entity/a:LegalName)', $node)
      ]
    );
  }

  // move to the next record sibling
  $reader->next('ABCRecord');
}

Is this correct?! Where can I find the output?! And how do I get the output into MySQL? Sorry for my newbie questions, this is my first time doing this...

$dbHost = "localhost";
$dbUser = "root";
$dbPass = "password";
$dbName = "new_xml_extract";

$dbConn = mysqli_connect($dbHost, $dbUser, $dbPass, $dbName);

$delete = $dbConn->query("TRUNCATE TABLE `test_xml`");

....

$sql = "INSERT INTO `test_xml` (`.....`, `.....`)" . "VALUES ('". $dbConn->real_escape_string($.....) ."', '".$dbConn->real_escape_string($.....)."')";

$result = $dbConn->query($sql);
}

Answer 1

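MySQL does not know your XML structure. It can import simple, well-formed XML structures directly, but you need to convert more complex structures yourself. You can generate CSV, SQL, or a (supported) XML format.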

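For large files like this, XMLReader is the best API because it reads the document node by node instead of loading it all into memory. First create an instance and open the file:

$reader = new XMLReader();
$reader->open('php://stdin');

You are using namespaces, so I suggest defining a mapping array for them:

$xmlns = [
  'a' => 'http://www.abc-example.com'
];

You can use the same prefixes/aliases as in the XML file, but you can also use your own.

Next, traverse the XML nodes until you find the first record element node:

while (
  $reader->read() && 
  ($reader->localName !== 'ABCRecord' ||  $reader->namespaceURI !== $xmlns['a'])
) {
  continue;
}

You need to compare the local name (the tag name without the namespace prefix) and the namespace URI. That way your program does not depend on the actual prefixes used in the XML file.

After you have found the first node, you can move to the next sibling with the same local name.

while ($reader->localName === 'ABCRecord') {
  if ($reader->namespaceURI === 'http://www.abc-example.com') {
    // read data for the record ...
  }
  // move to the next record sibling
  $reader->next('ABCRecord');
}

You could use XMLReader to read the record data, but it is easier with DOM and XPath expressions. XMLReader can expand the current node into a DOM node. So prepare a DOM document, create an XPath object for it, and register the namespaces. Expanding a node loads the node and all of its descendants into memory, but not its parent nodes or siblings.

$dom   = new DOMDocument;
$xpath = new DOMXpath($dom);
foreach ($xmlns as $prefix => $namespaceURI) {
  $xpath->registerNamespace($prefix, $namespaceURI);
}

while ($reader->localName === 'ABCRecord') {
  if ($reader->namespaceURI === 'http://www.abc-example.com') {
    $node = $reader->expand($dom);
    var_dump(
      $xpath->evaluate('string(a:ABC)', $node),
      $xpath->evaluate('string(a:Entity/a:LegalName)', $node)
    );
  }
  $reader->next('ABCRecord');
}

DOMXPath::evaluate() lets you use XPath expressions to fetch scalar values or node lists from a DOM.

fputcsv() makes it easy to write the data to a CSV file.

Put together:

<?php

// open input (the XML is read from standard input)
$reader = new XMLReader();
$reader->open('php://stdin');

// open output (the CSV is written to standard output)
$output = fopen('php://stdout', 'w');
fputcsv($output, ['id', 'name']);

$xmlns = [
  'a' => 'http://www.abc-example.com'
];

// prepare DOM
$dom   = new DOMDocument;
$xpath = new DOMXpath($dom);
foreach ($xmlns as $prefix => $namespaceURI) {
  $xpath->registerNamespace($prefix, $namespaceURI);
}

// look for the first record element
while (
  $reader->read() && 
  (
    $reader->localName !== 'ABCRecord' || 
    $reader->namespaceURI !== $xmlns['a']
  )
) {
  continue;
}

// while you have a record element
while ($reader->localName === 'ABCRecord') {
  if ($reader->namespaceURI === 'http://www.abc-example.com') {
    // expand record element node
    $node = $reader->expand($dom);
    // fetch data and write it to output
    fputcsv(
      $output, 
      [
        $xpath->evaluate('string(a:ABC)', $node),
        $xpath->evaluate('string(a:Entity/a:LegalName)', $node)
      ]
    );
  }

  // move to the next record sibling
  $reader->next('ABCRecord');
}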

Output:

id,name
5967007LIEEXZX4LPK21,"REGISTERENHETEN I Bornheim"
5967007LIE45ZX4MHC90,"SUNNDAL HOSTBANK"
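
To get the rows into MySQL, one option is to load the generated CSV with LOAD DATA INFILE; another is to write each record to the table directly from PHP. The following is a minimal sketch of the second option, not a tested import for your data. It assumes a table named test_xml (the name used in the question) with two hypothetical columns, id (primary key) and name, and it uses PDO prepared statements instead of the mysqli calls from the question. importRecords() is a hypothetical helper name. Because REPLACE INTO overwrites a row that has the same primary key, running the import again with the next day's file updates existing records instead of duplicating them.

```php
<?php
// Sketch only: the table and column names are assumptions for illustration.
// Assumed MySQL table:
//   CREATE TABLE test_xml (id CHAR(20) PRIMARY KEY, name VARCHAR(255));

/**
 * Write [id, name] pairs to the table and return the number of rows processed.
 * REPLACE INTO removes an existing row with the same primary key before inserting.
 */
function importRecords(PDO $pdo, iterable $rows): int
{
    // Throw exceptions on SQL errors so a failed row aborts the import.
    $pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
    $statement = $pdo->prepare('REPLACE INTO test_xml (id, name) VALUES (?, ?)');

    $count = 0;
    $pdo->beginTransaction();
    try {
        foreach ($rows as [$id, $name]) {
            // Bound parameters make manual escaping unnecessary.
            $statement->execute([$id, $name]);
            $count++;
        }
        $pdo->commit();
    } catch (Throwable $e) {
        $pdo->rollBack();
        throw $e;
    }
    return $count;
}

// Usage (connection details taken from the question, adjust as needed):
// $pdo = new PDO('mysql:host=localhost;dbname=new_xml_extract;charset=utf8mb4', 'root', 'password');
// importRecords($pdo, [['5967007LIEEXZX4LPK21', 'REGISTERENHETEN I Bornheim']]);
```

Since the function accepts any iterable, the record loop above could be wrapped in a generator that yields the two XPath values per record, so the 600MB file never has to be held in memory as a whole.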

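
Because the daily download is a zip archive, it may be possible to skip the manual extraction step: PHP's zip extension registers a zip:// stream wrapper, and XMLReader::open() accepts such a URI. The sketch below assumes that the zip extension is installed and that the archive contains a single XML file. openXmlFromZip() is a hypothetical helper name, and the archive name in the usage comment follows the pattern from the question.

```php
<?php
/**
 * Open the first entry of a zip archive with XMLReader, without unpacking
 * the archive to disk first. Assumes the archive holds a single XML file.
 */
function openXmlFromZip(string $zipPath): XMLReader
{
    $zip = new ZipArchive();
    if ($zip->open($zipPath) !== true) {
        throw new RuntimeException("Cannot open archive: $zipPath");
    }
    // Look up the name of the first entry instead of hard-coding it.
    $entryName = $zip->getNameIndex(0);
    $zip->close();
    if ($entryName === false) {
        throw new RuntimeException("Archive is empty: $zipPath");
    }

    $reader = new XMLReader();
    // zip://<archive>#<entry> is the stream wrapper provided by the zip extension.
    if (!$reader->open('zip://' . $zipPath . '#' . $entryName)) {
        throw new RuntimeException("Cannot read $entryName from $zipPath");
    }
    return $reader;
}

// Usage (file name pattern from the question):
// $reader = openXmlFromZip(__DIR__ . '/20150815-XYZ-concatenated-file.zip');
```

The returned reader can be used in place of the one opened on php://stdin; the rest of the record loop stays the same.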