Parsing unknown XML

Time: 2014-03-10 11:50:38

Tags: php xml simplexml

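I have built a simple tool that lets you fill in an input field with the URL of an XML file. It is supposed to show all nodes so that the user can match them to database fields. I have this working for XML files with 2 "main" nodes. Example of such an XML file: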

<foods>
    <food>
        <name>ravioli</name>
        <recipe>food.com/ravioli</recipe>
        <time>10 minutes</time>
    </food>
    <food>
        <name>ravioli</name>
        <recipe>food.com/ravioli</recipe>
        <time>10 minutes</time>
    </food>
</foods>

This returns a list showing:

name recipe time

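The problem arises when someone wants to use an XML file that does not have 2 "main" nodes, for example one that lacks the <food> node. In that case no results are shown, because my PHP code expects 2 main nodes rather than 1.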

My code is as follows:

// Fetch the XML from the URL
if (!$xml = simplexml_load_file($_GET['url'])) {
    // The XML file could not be reached
    echo 'Error loading XML. Please check the URL.';
} else {
    // Parse through the XML and fetch the nodes
    $child = $xml->children();
    foreach($child->children() as $key => $value) {
        echo $key."<br>";
    }
}

Is there a way to get the nodes I want from any XML file, regardless of how many parent nodes there are?

1 answer:

Answer 0: (score: 2)

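You can use XPath to query data from an XML DOM. In PHP it is available through the DOMXpath::evaluate() method. The second argument is the context node, so an expression can be relative to another node. Converting the XML into a list of records (for a database, CSV, ...) takes several steps. Start with some bootstrap: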

$xml = <<<'XML'
<foods>
    <food>
        <name>ravioli 1</name>
        <recipe>food.com/ravioli-1</recipe>
        <time unit="minutes">10</time>
    </food>
    <food>
        <name>ravioli 2</name>
        <recipe>food.com/ravioli-2</recipe>
        <time unit="minutes">11</time>
    </food>
</foods>
XML;

$dom = new DOMDocument();
$dom->loadXml($xml);
$xpath = new DOMXpath($dom);
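
As a minimal sketch of the context argument mentioned above: a relative expression such as `string(name)` is resolved against whichever node is passed as the second argument. The inline document and the `$demo...` variable names are illustrative assumptions, not part of the original example.

```php
<?php
// Illustrative document only; kept separate from $dom/$xpath above.
$demoDom = new DOMDocument();
$demoDom->loadXML('<foods><food><name>a</name></food><food><name>b</name></food></foods>');
$demoXpath = new DOMXPath($demoDom);

$names = [];
// The absolute expression selects every <food> element ...
foreach ($demoXpath->evaluate('/foods/food') as $food) {
    // ... and the relative expression is evaluated with that element as context.
    $names[] = $demoXpath->evaluate('string(name)', $food);
}
var_dump($names);
```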

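First we need to determine which XML element defines a record, and then which elements define its fields.

So let's build a list of possible record paths and field paths: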

$paths = [];
$leafs = [];
foreach ($xpath->evaluate('//*|//@*') as $node) {
  $isPath = $xpath->evaluate('count(@*|*) > 0', $node);
  $isLeaf = !($xpath->evaluate('count(*) > 0', $node));
  $path = '';
  foreach ($xpath->evaluate('ancestor::*', $node) as $parent) {
    $path .= '/'.$parent->nodeName;
  }
  $path .= '/'.($node instanceOf DOMAttr ? '@' : '').$node->nodeName;
  if ($isLeaf) {
    $leafs[$path] = TRUE;
  }
  if ($isPath) {
    $paths[$path] = TRUE;
  }
}
$paths = array_keys($paths);
$leafs = array_keys($leafs);
var_dump($paths, $leafs);

Output:

array(3) {
  [0] =>
  string(6) "/foods"
  [1] =>
  string(11) "/foods/food"
  [2] =>
  string(16) "/foods/food/time"
}
array(4) {
  [0] =>
  string(16) "/foods/food/name"
  [1] =>
  string(18) "/foods/food/recipe"
  [2] =>
  string(16) "/foods/food/time"
  [3] =>
  string(22) "/foods/food/time/@unit"
}

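Next, show the possible record paths to the user. The user needs to select one. Knowing the record path, build a list of possible field paths from the leafs array: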

$path = '/foods/food';

$fieldLeafs = [];
$pathLength = strlen($path) + 1;
foreach ($leafs as $leaf) {
  if (0 === strpos($leaf, $path.'/')) {
    $fieldLeafs[] = substr($leaf, $pathLength);
  }
}
var_dump($fieldLeafs);

Output:

array(4) {
  [0] =>
  string(4) "name"
  [1] =>
  string(6) "recipe"
  [2] =>
  string(4) "time"
  [3] =>
  string(10) "time/@unit"
}

Set up some dialog that allows the user to select a path for each field. The result is a mapping from field name to path:

$fieldDefinition = [
  'title' => 'name',
  'url' => 'recipe',
  'needed_time' => 'time',
  'time_unit' => 'time/@unit'
];
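
Such a dialog could, for instance, be a plain HTML form with one select box per database field. The following is only a sketch under assumptions: the `renderFieldSelect` helper, the `map[...]` input names and the hard-coded lists are hypothetical and would normally come from your own schema and from the field paths built earlier.

```php
<?php
/**
 * Hypothetical helper: renders one <select> that offers every field path
 * as an option for the given database field.
 */
function renderFieldSelect(string $field, array $fieldLeafs): string
{
    $name = htmlspecialchars($field);
    $html = '<label>'.$name.' <select name="map['.$name.']">';
    $html .= '<option value="">(skip)</option>';
    foreach ($fieldLeafs as $leaf) {
        $value = htmlspecialchars($leaf);
        $html .= '<option value="'.$value.'">'.$value.'</option>';
    }
    return $html.'</select></label>';
}

// Assumed database fields and field paths, for illustration only.
$dbFields = ['title', 'url', 'needed_time', 'time_unit'];
$leafOptions = ['name', 'recipe', 'time', 'time/@unit'];
foreach ($dbFields as $dbField) {
    echo renderFieldSelect($dbField, $leafOptions), "\n";
}
```

When the form is submitted, it is worth checking each chosen value against the list of known field paths (for example with `in_array`) before using it in an XPath expression.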

Now use the path and the mapping to build the records array:

$result = [];
foreach ($xpath->evaluate($path) as $node) {
  $record = [];
  foreach ($fieldDefinition as $field => $expression) {
    $record[$field] = $xpath->evaluate(
      'string('.$expression.')',
      $node
    );
  }
  $result[] = $record;
}
var_dump($result);

Output:

array(2) {
  [0] =>
  array(4) {
    'title' =>
    string(9) "ravioli 1"
    'url' =>
    string(18) "food.com/ravioli-1"
    'needed_time' =>
    string(2) "10"
    'time_unit' =>
    string(7) "minutes"
  }
  [1] =>
  array(4) {
    'title' =>
    string(9) "ravioli 2"
    'url' =>
    string(18) "food.com/ravioli-2"
    'needed_time' =>
    string(2) "11"
    'time_unit' =>
    string(7) "minutes"
  }
}
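
The same loop also covers the case from the question where the XML has only one "main" node: only the record path changes. A minimal sketch, assuming a hypothetical helper named `extractRecords` and a made-up document without the <foods> wrapper:

```php
<?php
/**
 * Hypothetical helper (not part of the original answer): evaluates the
 * record path and reads each mapped field relative to the matched node.
 */
function extractRecords(DOMXPath $xpath, string $recordPath, array $fieldDefinition): array
{
    $records = [];
    foreach ($xpath->evaluate($recordPath) as $node) {
        $record = [];
        foreach ($fieldDefinition as $field => $expression) {
            $record[$field] = $xpath->evaluate('string('.$expression.')', $node);
        }
        $records[] = $record;
    }
    return $records;
}

// Made-up document: the root element itself is the record.
$flatDom = new DOMDocument();
$flatDom->loadXML(
    '<food><name>ravioli</name><recipe>food.com/ravioli</recipe><time>10 minutes</time></food>'
);

$flatRecords = extractRecords(
    new DOMXPath($flatDom),
    '/food', // record path the user would pick for this document
    ['title' => 'name', 'url' => 'recipe', 'needed_time' => 'time']
);
var_dump($flatRecords);
```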

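The full example can be found at: https://eval.in/118012

The XML in the example is never converted into a generic array. Doing so would mean losing information and storing the data twice. So don't. Extract the structure information from the XML and let the user define the mapping. Use XPath to extract the data and store it directly in the result format.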
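
To illustrate storing the data directly in the result format, here is a minimal sketch that writes each record as a CSV row as soon as it is read. The inline XML, the `php://memory` stream and the variable names are assumptions for illustration; a real tool would more likely write to a file or a database.

```php
<?php
// Illustrative document, shortened from the example above.
$csvDom = new DOMDocument();
$csvDom->loadXML(
    '<foods>'
    .'<food><name>ravioli 1</name><time unit="minutes">10</time></food>'
    .'<food><name>ravioli 2</name><time unit="minutes">11</time></food>'
    .'</foods>'
);
$csvXpath = new DOMXPath($csvDom);

// Column name => XPath expression relative to the record node.
$mapping = ['title' => 'name', 'needed_time' => 'time', 'time_unit' => 'time/@unit'];

$stream = fopen('php://memory', 'w+');
// Header row first, then one row per record node.
fputcsv($stream, array_keys($mapping), ',', '"', '\\');
foreach ($csvXpath->evaluate('/foods/food') as $node) {
    $row = [];
    foreach ($mapping as $expression) {
        $row[] = $csvXpath->evaluate('string('.$expression.')', $node);
    }
    fputcsv($stream, $row, ',', '"', '\\');
}

// Read the generated CSV back so it can be inspected.
rewind($stream);
$csv = stream_get_contents($stream);
fclose($stream);
echo $csv;
```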