Parsing a complex XML file in PHP using DOM

Date: 2014-10-30 19:41:36

Tags: php xml parsing

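I want to parse an XML file and extract information from it. For example, I want to extract the following:

  • from the header: uio, batchId, creationDate

  • from the body: all of accountToken, Id, setId, Amount, and so on

  • from the footer: batchCount, totalAmount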

Here is my XML file:

<?xml version="1.0" encoding="UTF-8"?>
<c:Instructions xmlns:c="http://www.localhost.com/platform">
  <c:Header uio="a881-aa05-1231391408a2" batchId="c7-8ef6-eb81b345e736" creationDate="2014-08-10T00:00:00.000Z" />
  <c:Instructions accountToken="0001578066518896635248066746078163233357907196" Id="4178- a6dd-d1459cda71c3" setId="132530196846" Amount="27.00" Description="GoulSalons and Spas" Timestamp="2014-08-10T05:37:56.000Z" TransactionId="1324300196883" TransactionTimestamp="2014-08-07T18:32:30.000Z" merchant="1307" consumer="1_4f13eb-4efb-b450- ca747763fbc4" store="363" campaign="Partner, Parnd Spas, Partner, Pilot, 5/30/14" />
  <c:Instructions accountToken="000227229359641325887385737985006" Id="-08eb-43dd-884b-ccae980372f8" setId="2271109667569" Amount="12.24" Description="Pyro's Pi" Timestamp="2014-08-10T03:00:05.000Z" TransactionId="291153267592" TransactionTimestamp="2014-08-07T00:00:00.000Z" merchant="13" consumer="0d3-4ef3-8922-932f0d860012" store="31" campaign=" Challenge Pyro&amp;#39;s Partner, Pilot, 4/4/14" />
  <c:Instructions accountToken="0002108430726669005078952425" Id="bf48-4f86-84f6-df69432ef65b" setId="1211100232621" Amount="26.95" Description="Blue" Timestamp="2014-08-10T05:37:20.000Z" TransactionId="121030232642" TransactionTimestamp="2014-08-07T17:48:29.000Z" merchant="104880" consumer="2-4d32-a2b4-f0b54a8e50b5" store="39" campaign="Partner Challenge Blue Fin, Pilot, 5/30/14" />
  <c:Instructions accountToken="000341863769868297728447318744937673" Id="bf48-4f86-84f6-df69432ef65b" setId="1260320211819" Amount="52.00" Description="Fin" Timestamp="2014-08-10T05:37:41.000Z" TransactionId="1259211836" TransactionTimestamp="2014-08-08T02:41:47.000Z" merchant="180" consumer="6be4-46cd-95b8-244ab78c50ce" store="52" campaign="Partner Challenge Blue Fin, Partner, Pilot, 5/30/14" />
  <c:Instructions accountToken="000521692104031759552776822005" Id="42f0-4850-9e33-54e7d79927d9" setId="29126329667269" Amount="17.00" Description=" Bear" Timestamp="2014-08-10T03:00:05.000Z" TransactionId="291259667289" TransactionTimestamp="2014-08-08T00:00:00.000Z" merchant="137" consumer="71bb-46d2-8e42-c9798d7dd0d7" store="39" campaign="Partner Challenge Blind Bear, Partner, Pilot, 5/22/14" />
  <c:Instructions accountToken="0005216177101271759552776822005" Id="42f0-4850-9e33-54e7d79927d9" setId="29134327117182" Amount="9.00" Description="Bear" Timestamp="2014-08-10T03:00:05.000Z" TransactionId="29124667297" TransactionTimestamp="2014-08-08T00:00:00.000Z" merchant="132" consumer="71bb-46d2-8e42-c9798d7dd0d7" store="398" campaign="   Bear, Partner, Pilot, 5" />
  <c:Footer batchCount="6" totalAmount="144" />
</c:Instructions>

So I wrote this code to retrieve at least accountToken and Id, but I get a blank page:

<?php
$doc = new DOMDocument;
$doc->load("sample.xml");
$rows = $doc->getElementsByTagNameNS('http://www.localhost.com/platform', 'Instruction');
foreach ($rows as $row) {
$AToken = $row->getAttribute('accountToken');
$Id    = $row->getAttribute('Id');
var_dump($AToken, $Id);
}
?>

But I have not been able to get anything out of the XML file.

2 answers:

Answer 0: (score: 1)

Shouldn't that be Instructions rather than Instruction in the getElementsByTagNameNS call?
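A minimal sketch of that fix, under some assumptions: the XML below is a shortened, made-up stand-in for the posted file, and it is loaded from a string with loadXML() rather than from sample.xml. Because the root element is also named Instructions, getElementsByTagNameNS called on the document returns it along with its children, so the sketch skips it.

```php
<?php
// Shortened stand-in for the posted file; the attribute values are made up.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<c:Instructions xmlns:c="http://www.localhost.com/platform">
  <c:Header uio="u-1" batchId="b-1" creationDate="2014-08-10T00:00:00.000Z" />
  <c:Instructions accountToken="0001" Id="id-1" Amount="27.00" />
  <c:Instructions accountToken="0002" Id="id-2" Amount="12.24" />
  <c:Footer batchCount="2" totalAmount="39.24" />
</c:Instructions>
XML;

$doc = new DOMDocument;
$doc->loadXML($xml);

$ns   = 'http://www.localhost.com/platform';
$rows = array();

// Local name is "Instructions" (plural), matching the document.
foreach ($doc->getElementsByTagNameNS($ns, 'Instructions') as $row) {
    // The root element has the same name as its children; skip it.
    if ($row->isSameNode($doc->documentElement)) {
        continue;
    }
    $rows[] = array(
        'accountToken' => $row->getAttribute('accountToken'),
        'Id'           => $row->getAttribute('Id'),
    );
}

print_r($rows);
```

Without the isSameNode() check, the loop would also yield one entry with empty strings for the root element, since it has no accountToken or Id attribute.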

Answer 1: (score: 0)

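The XML document you have is a little odd, because the top element has the same name as its direct children. For that reason, I would use DOMXPath to retrieve the elements you want from the document (OK, and XPath is great!). To use DOMXPath, you need to create a new DOMXPath object and register the namespace http://www.localhost.com/platform so that it can search for these elements.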

Note: your script doesn't work because there are no Instruction elements in the document - they are all Instructions :)

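Here is a simple, easily extended script that pulls the data out of the document you posted. It only prints the data, but you will probably want to do something more interesting with it.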

$doc = new DOMDocument;
$doc->load( $your_xml_here ); # load() expects the path to the XML file

# create the DOMXPath object
$xp = new DOMXPath($doc);
# registers the namespace; to search for nodes in this namespace, prefix them with "c"
$xp->registerNamespace("c", 'http://www.localhost.com/platform');

# search for all c:Header nodes under the top node, c:Instructions
foreach ($xp->query("/c:Instructions/c:Header") as $h) {
    # array of attributes to retrieve...
    foreach (array('uio', 'batchId', 'creationDate') as $ha) {
        print "header attribute $ha: " . $h->getAttribute($ha) . PHP_EOL;
    }
}

# retrieves c:Instructions nodes that are under the c:Instructions node
foreach ($xp->query("/c:Instructions/c:Instructions") as $i) {
    # you can expand the list of attributes here
    foreach (array("Id", "accountToken") as $ia) {
        print "Instruction attrib $ia: " . $i->getAttribute($ia) . PHP_EOL;
    }
}

# footer information    
foreach ($xp->query("/c:Instructions/c:Footer") as $f) {
    foreach (array("batchCount", "totalAmount") as $fa) {
        print "footer attribute $fa: " . $f->getAttribute($fa) . PHP_EOL;
    }
}

Output for the XML you posted:

header attribute uio: a881-aa05-1231391408a2
header attribute batchId: c7-8ef6-eb81b345e736
header attribute creationDate: 2014-08-10T00:00:00.000Z
Instruction attrib Id: 4178- a6dd-d1459cda71c3
Instruction attrib accountToken: 0001578066518896635248066746078163233357907196
Instruction attrib Id: -08eb-43dd-884b-ccae980372f8
Instruction attrib accountToken: 000227229359641325887385737985006
Instruction attrib Id: bf48-4f86-84f6-df69432ef65b
Instruction attrib accountToken: 0002108430726669005078952425
Instruction attrib Id: bf48-4f86-84f6-df69432ef65b
Instruction attrib accountToken: 000341863769868297728447318744937673
Instruction attrib Id: 42f0-4850-9e33-54e7d79927d9
Instruction attrib accountToken: 000521692104031759552776822005
Instruction attrib Id: 42f0-4850-9e33-54e7d79927d9
Instruction attrib accountToken: 0005216177101271759552776822005
footer attribute batchCount: 6
footer attribute totalAmount: 144
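If printing is not enough, one possible variation is to collect the attributes into arrays instead. The sketch below assumes a shortened, made-up stand-in for the posted XML; attributesOf() and the $batch array layout are hypothetical names introduced only for illustration.

```php
<?php
// Shortened stand-in for the posted file; the attribute values are made up.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<c:Instructions xmlns:c="http://www.localhost.com/platform">
  <c:Header uio="u-1" batchId="b-1" creationDate="2014-08-10T00:00:00.000Z" />
  <c:Instructions accountToken="0001" Id="id-1" Amount="27.00" />
  <c:Instructions accountToken="0002" Id="id-2" Amount="12.24" />
  <c:Footer batchCount="2" totalAmount="39.24" />
</c:Instructions>
XML;

// Hypothetical helper: all attributes of one element as a name => value array.
function attributesOf(DOMElement $el)
{
    $out = array();
    foreach ($el->attributes as $attr) {
        $out[$attr->nodeName] = $attr->nodeValue;
    }
    return $out;
}

$doc = new DOMDocument;
$doc->loadXML($xml);

$xp = new DOMXPath($doc);
$xp->registerNamespace('c', 'http://www.localhost.com/platform');

// Hypothetical result layout: one header, a list of instructions, one footer.
$batch = array('header' => array(), 'instructions' => array(), 'footer' => array());

foreach ($xp->query('/c:Instructions/c:Header') as $h) {
    $batch['header'] = attributesOf($h);
}
foreach ($xp->query('/c:Instructions/c:Instructions') as $i) {
    $batch['instructions'][] = attributesOf($i);
}
foreach ($xp->query('/c:Instructions/c:Footer') as $f) {
    $batch['footer'] = attributesOf($f);
}

print_r($batch);
```

Iterating over $el->attributes picks up every attribute without listing names in advance, so new attributes in the file need no code change. All values come back as strings, so amounts and counts would still need casting before any arithmetic.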

Addendum: if you are fetching all of the attributes, it may be quicker to run the equivalent code with SimpleXMLElement:

$sxe = new SimpleXMLElement( $xml_source ); # $xml_source is a string containing the XML
$sxe->registerXPathNamespace("c", 'http://www.localhost.com/platform');

# e.g. get the header data
foreach ($sxe->xpath("/c:Instructions/c:Header") as $i) {
    # iterate through all the element attributes
    foreach ($i->attributes() as $name => $value) {
        print "header attribute $name is $value" . PHP_EOL;
    }
}

Output:

header attribute uio is a881-aa05-1231391408a2
header attribute batchId is c7-8ef6-eb81b345e736
header attribute creationDate is 2014-08-10T00:00:00.000Z