XML attributes as array indexes in PHP

Asked: 2012-07-05 16:42:41

Tags: php simplexml

I have some XML that keeps a lot of its information in attributes. Here is a small sample:

<?xml version="1.0" encoding="UTF-8"?>
 <collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <leader>04170npc a22003613u 4500</leader>
    <controlfield tag="001">vtls003932502</controlfield>
    <controlfield tag="003">WlAbNL</controlfield>
    <datafield tag="035" ind1=" " ind2=" ">
        <subfield code="a">(WlAbNL)1002</subfield>
    </datafield>
    <datafield tag="040" ind1=" " ind2=" ">
        <subfield code="a">WlAbNL</subfield>
        <subfield code="b">eng</subfield>
        <subfield code="c">WlAbNL</subfield>
    </datafield>
    <datafield tag="245" ind1="0" ind2="0">
        <subfield code="a">Scott Blair Collection,</subfield>
        <subfield code="f">1910 -</subfield>
    </datafield>
    <datafield tag="653" ind1=" " ind2=" ">
        <subfield code="a">rheology</subfield>
    </datafield>
  </record>
  <record>
    <leader>04229npc a22005893u 4500</leader>
    <controlfield tag="001">vtls003932503</controlfield>
    <datafield tag="035" ind1=" " ind2=" ">
        <subfield code="a">(WlAbNL)1004</subfield>
    </datafield>
    <datafield tag="040" ind1=" " ind2=" ">
       <subfield code="a">WlAbNL</subfield>
       <subfield code="b">eng</subfield>
       <subfield code="c">WlAbNL</subfield>
    </datafield>
    <datafield tag="245" ind1="0" ind2="0">
       <subfield code="a">Celtic Collection,</subfield>
       <subfield code="f">17th century -</subfield>
    </datafield>
    <datafield tag="653" ind1=" " ind2=" ">
        <subfield code="a">Scottish Gaelic language</subfield>
    </datafield>
 </record>
</collection>

At the moment I have a PHP script that simply loads the whole document:

$xml = simplexml_load_file("Mapping_coll_wales.xml");
$records = $xml->record;

This produces a list of records that looks like this (I have cut it down to a single record):

  SimpleXMLElement Object
(
[leader] => 04170npc a22003613u 4500
[controlfield] => Array
    (
        [0] => vtls003932502
        [1] => WlAbNL
    )
 [datafield] => Array
    (
        [0] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [tag] => 035
                        [ind1] =>  
                        [ind2] =>  
                    )

                [subfield] => (WlAbNL)1002
            )
        [1] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [tag] => 040
                        [ind1] =>  
                        [ind2] =>  
                    )

                [subfield] => Array
                    (
                        [0] => WlAbNL
                        [1] => eng
                        [2] => WlAbNL
                    )

            )

        [2] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [tag] => 245
                        [ind1] => 0
                        [ind2] => 0
                    )

                [subfield] => Array
                    (
                        [0] => Scott Blair Collection,
                        [1] => 1910 -
                    )
            )
        [3] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [tag] => 653
                        [ind1] =>  
                        [ind2] =>  
                    )

                [subfield] => rheology
            )
    )

)

Currently I pull out the fields I need by assuming their position in the array, looping over the records (there are about 500 of them):

for ($i =0; $i <5; $i++) {

echo '<strong>Title</strong> = : ' . $records[$i]->datafield[2]->subfield . '<br />';
echo '<strong>tag</strong>  = :' . $records[$i]->datafield[3]->subfield . '<br />';


echo '<br />------------------------------------------------------------------------<br />';
}

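However, the XML may contain other tags, so I don't want to rely on the title being the datafield at index 2, and so on. Ideally I would like to be able to access it with something like:
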
echo '<strong>Title</strong> = : ' . $records[$i]->datafield['245']['a'] . '<br />';

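I'm sure this is fairly straightforward and I'm just missing something, but it would be good to be able either to load the tag as the array index, or to fetch a datafield directly by its tag and a subfield directly by its code, since those won't change.

Hopefully that makes sense.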

1 answer:

Answer 0: (score: 1)

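You can use XPath to match elements that meet specific criteria.

However, because your nodes are in a namespace, you must register that namespace on every node you call xpath() on, and use the registered prefix in the path expression.

See the example below, which works in a loop.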

$nsp = 'marc';
$nsuri = 'http://www.loc.gov/MARC21/slim';


$records = $xml->record;


foreach($records as $record) {
    $record->registerXPathNamespace($nsp, $nsuri);
    $datafields = $record->xpath('marc:datafield[@tag=245]');
    foreach ($datafields as $datafield) {
        $datafield->registerXPathNamespace($nsp, $nsuri);
        $subfields = $datafield->xpath('marc:subfield[@code="a"]');
        var_dump($subfields);
    }
}
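
To illustrate why the prefix matters, here is a minimal sketch, assuming a cut-down inline copy of the question's XML (the variable names `$xmlString`, `$unprefixed`, and `$prefixed` are made up for this illustration). An unprefixed path looks for elements in no namespace and so matches nothing, while the same path with the registered prefix finds the title.

```php
<?php
// Cut-down copy of the question's XML so the sketch is self-contained.
$xmlString = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">Scott Blair Collection,</subfield>
      <subfield code="f">1910 -</subfield>
    </datafield>
  </record>
</collection>
XML;

$xml = simplexml_load_string($xmlString);
$record = $xml->record[0];

// No prefix: XPath searches for <datafield> in no namespace, so nothing matches.
$unprefixed = $record->xpath('datafield[@tag="245"]');

// "marc" is an arbitrary prefix; only the URI has to match the document.
$record->registerXPathNamespace('marc', 'http://www.loc.gov/MARC21/slim');
$prefixed = $record->xpath('marc:datafield[@tag="245"]/marc:subfield[@code="a"]');

$unprefixedCount = count($unprefixed);
$title = (string) $prefixed[0];

echo $unprefixedCount, "\n"; // 0
echo $title, "\n";           // Scott Blair Collection,
```

Attributes such as `tag` and `code` carry no namespace here, which is why `@tag` and `@code` need no prefix.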

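Alternatively, you can descend using XPath alone rather than SimpleXML property access. The two methods below retrieve the same subfield values (Method 1 also stores an empty string for tag/code combinations that do not occur):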

$records = $xml->record;
$records->registerXPathNamespace($nsp, $nsuri);

$tags = array('245', '653');
$codes = array('a', 'f');

// METHOD 1: run an xpath for each tag/code combination
$desiredfields = array();
foreach ($tags as $tag) {
    $desiredsubfields = array();
    foreach($codes as $code) {
        $subfields = $records->xpath("marc:datafield[@tag='$tag']/marc:subfield[@code='$code']");
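        // Not every tag/code combination exists (e.g. 653/f), so guard the lookup
        $desiredsubfields[$code] = isset($subfields[0]) ? (string) $subfields[0] : '';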
    }
    $desiredfields[$tag] = $desiredsubfields;
}

var_export($desiredfields);

// METHOD 2: create a single xpath expression that matches every subfield you want
// Then visit each subfield retrieving tag from parent
$tagexpr = implode(' or ', array_map(function($t){return "@tag='{$t}'";}, $tags));
$codeexpr = implode(' or ', array_map(function($c){return "@code='{$c}'";}, $codes));
$xpath = "marc:datafield[{$tagexpr}]/marc:subfield[{$codeexpr}]";

$desiredfields = array();
$subfields = $records->xpath($xpath);

foreach ($subfields as $subfield) {
    $datafield = $subfield->xpath('..');
    $datafieldcode = (string) $datafield[0]['tag'];
    $desiredfields[$datafieldcode][(string) $subfield['code']] = (string) $subfield;
}

var_export($desiredfields);
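
If what you want is literally the `[tag][code]` lookup from the question, one possible sketch is to build that index per record with plain SimpleXML iteration. It assumes PHP 7 or later and a shortened inline copy of the sample XML. `marcIndex()` is a hypothetical helper written for this illustration, not part of SimpleXML. It stores a list of values per code because tags and codes can repeat within a record. It also loops over every record explicitly, since calling xpath() on `$xml->record` as a whole appears to evaluate against the first record only.

```php
<?php
/**
 * Hypothetical helper (not part of SimpleXML): index one <record> as
 * $index[tag][code] = list of subfield values.
 */
function marcIndex(SimpleXMLElement $record): array
{
    $index = array();
    foreach ($record->datafield as $datafield) {
        $tag = (string) $datafield['tag'];
        foreach ($datafield->subfield as $subfield) {
            $code = (string) $subfield['code'];
            // Tags and codes can repeat, so keep every value in order.
            $index[$tag][$code][] = (string) $subfield;
        }
    }
    return $index;
}

// Shortened copy of the question's XML so the sketch is self-contained.
$xmlString = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
  <record>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">Scott Blair Collection,</subfield>
      <subfield code="f">1910 -</subfield>
    </datafield>
    <datafield tag="653" ind1=" " ind2=" ">
      <subfield code="a">rheology</subfield>
    </datafield>
  </record>
  <record>
    <datafield tag="245" ind1="0" ind2="0">
      <subfield code="a">Celtic Collection,</subfield>
      <subfield code="f">17th century -</subfield>
    </datafield>
  </record>
</collection>
XML;

$xml = simplexml_load_string($xmlString);

$titles = array();
foreach ($xml->record as $record) {
    $index = marcIndex($record);
    // First 245$a value, or an empty string if the record has none.
    $titles[] = isset($index['245']['a'][0]) ? $index['245']['a'][0] : '';
}

print_r($titles);
```

The trade-off is that this walks every datafield of a record instead of letting XPath select only the ones you need, which should be fine for a few hundred records but is worth measuring on larger files.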