Retrieving data from an unusually formatted XML file

Asked: 2015-03-18 15:36:54

Tags: php xml

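I'm new to XML files and I want to extract information from one. To do that, I followed an example that shows how to retrieve the values of the tags in an XML file.

The XML file: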

<?xml version="1.0"?>
<moldb>
  <molecule>
      <name>Alanine</name>
      <symbol>ala</symbol>
      <code>A</code>
      <type>hydrophobic</type>
  </molecule>

  <molecule>
      <name>Lysine</name>
      <symbol>lys</symbol>
      <code>K</code>
      <type>charged</type>
  </molecule>
</moldb> 

And the code that extracts the data:

<?php

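class AminoAcid {
    public $name;   // amino acid name
    public $symbol; // three-letter symbol
    public $code;   // one-letter code
    public $type;   // hydrophobic, charged or neutral

    // A method named after the class is no longer treated as a
    // constructor in PHP 8, so __construct() is used instead.
    function __construct($aa)
    {
        foreach ($aa as $k => $v)
            $this->$k = $v;
    }
}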

function readDatabase($filename)
{
    // read the XML database of amino acids
    $data = file_get_contents($filename);
    $tdb = [];  // result list; stays empty if no <molecule> is found
    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_parser_set_option($parser, XML_OPTION_SKIP_WHITE, 1);
    xml_parse_into_struct($parser, $data, $values, $tags);
    xml_parser_free($parser);

    // loop through the structures
    foreach ($tags as $key=>$val) {
        if ($key == "molecule") {
            $molranges = $val;
            // each contiguous pair of array entries are the
            // lower and upper range for each molecule definition
            for ($i=0; $i < count($molranges); $i+=2) {
                $offset = $molranges[$i] + 1;
                $len = $molranges[$i + 1] - $offset;
                $tdb[] = parseMol(array_slice($values, $offset, $len));
            }
        } else {
            continue;
        }
    }
    return $tdb;
}

function parseMol($mvalues)
{
    for ($i=0; $i < count($mvalues); $i++) {
        $mol[$mvalues[$i]["tag"]] = $mvalues[$i]["value"];
    }
    return new AminoAcid($mol);
}

$db = readDatabase("moldb.xml");
echo "** Database of AminoAcid objects:\n";
print("<pre>".print_r($db,true)."</pre>");
?>

The problem is that my XML file looks like this (sorry, I can't indent it like the first one):

-<section label="Personal Information" id="f589cbc028c64fdaa783da01647e5e3c">


-<section label="Identification" id="2687e70e5d45487c93a8a02626543f64" recordId="760243c6296146c9a380a52600262dd1">


-<field label="Title" id="ee8beaea41f049d8bcfadfbfa89ac09e">

<lov id="00000000000000000000000000000321">Professor</lov>

</field>


-<field label="Family Name" id="5c6f17e8a67241e19667815a9e95d9d0">

<value type="String">Kpodjedo</value>

</field>


+<field label="First Name" id="98ad36fee26a4d6b8953ea764f4fed04">


-<field label="Middle Name" id="4ca83c1aaa6a42a78eac0290368e70f3">

<value type="String">Jean-Luc</value>

</field>


-<field label="Previous Family Name" id="84e9fa08f7334db79ed5310e5f7a961b">

<value type="String"/>

</field>


-<field label="Previous First Name" id="0fb359a7d809457d9392bb1ca577f1b3">

<value type="String"/>

</field>


-<field label="Date of Birth" id="41ed5ea3ae974428b3fcb592161b6423">

<value type="MonthDay" format="MM/dd">10/16</value>

</field>


+<field label="Sex" id="3d258d8ceb174d3eb2ae1258a780d91b">


-<field label="Designated Group" id="d51292265505483b9424e0e3dc9878a0">

<lov id="00000000000000000000000000000080">Visible Minority</lov>

</field>


-<field label="Correspondence language" id="2b72a344523c467da0c896656b5290c0">

<lov id="00000000000000000000000000000054">English</lov>

</field>


+<field label="Canadian Residency Status" id="3432b76fe25844b4a9cb993256c6b61b">

<field label="Applied for Permanent Residency?" id="d61c6d30955b46af84b73cdc43f3ecdf"/>


-<field label="Permanent Residency Start Date" id="f5d3c8bea17f45af976e596a6a6e78c8">

<value type="Date" format="yyyy-MM-dd">2008-03-09</value>

</field>


-<section label="Country of Citizenship" id="5e7109632d874dc285c0f562d6c2e7d6" recordId="fd125512efad4e2dbbf23c93383741c4">


+<field label="Country of Citizenship" id="29d628e7bc6941bca2deb936f120fe88">

</section>

</section>

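So the format is different. The first file was easy because each child tag has its own name, but in the second file the tags are all "field". What I want is a solution that lets me access an element by the value of its label attribute. I'm still trying to understand this, so if anything is unclear I will add more detail.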
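
To make the goal concrete, here is a minimal sketch of the kind of lookup I have in mind, using SimpleXML and an XPath query on the label attribute. It is not taken from the example I followed: the helper name fieldByLabel is hypothetical, the embedded XML is a trimmed, well-formed excerpt of my file, and I am assuming the leading "-" and "+" in the listing above are only collapse markers from the browser's tree view and not part of the file itself.

```php
<?php
// Hypothetical helper (not part of the original example): returns the text
// of the <value> or <lov> child of the first <field> whose label attribute
// equals $label, or null when no such field exists.
function fieldByLabel(SimpleXMLElement $root, string $label): ?string
{
    // Assumption: $label contains no double quote, because it is embedded
    // directly in the XPath string literal.
    $fields = $root->xpath('//field[@label="' . $label . '"]');
    if (!$fields) {
        return null; // no match, or the XPath query failed
    }
    $field = $fields[0];
    // In the sample, the text lives in either a <value> or a <lov> child.
    foreach (['value', 'lov'] as $child) {
        if (isset($field->$child)) {
            return (string) $field->$child;
        }
    }
    return null;
}

// Trimmed, well-formed excerpt of the second file.
$xml = <<<'XML'
<section label="Personal Information" id="f589cbc028c64fdaa783da01647e5e3c">
  <section label="Identification" id="2687e70e5d45487c93a8a02626543f64" recordId="760243c6296146c9a380a52600262dd1">
    <field label="Title" id="ee8beaea41f049d8bcfadfbfa89ac09e">
      <lov id="00000000000000000000000000000321">Professor</lov>
    </field>
    <field label="Family Name" id="5c6f17e8a67241e19667815a9e95d9d0">
      <value type="String">Kpodjedo</value>
    </field>
    <field label="Previous Family Name" id="84e9fa08f7334db79ed5310e5f7a961b">
      <value type="String"/>
    </field>
  </section>
</section>
XML;

$root = simplexml_load_string($xml);
echo fieldByLabel($root, 'Family Name'), "\n"; // Kpodjedo
echo fieldByLabel($root, 'Title'), "\n";       // Professor
```

With this approach an empty element such as "Previous Family Name" yields an empty string, while a label that does not occur in the file yields null, so the two cases can be told apart.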

0 answers:

No answers yet