Filtering an XML document with XPath and PHP

Time: 2015-12-05 14:46:31

Tags: php xml xpath

I am trying to extract XML data using PHP and XPath. Consider the following XML document:

<?xml version="1.0" encoding="UTF-8"?>
<root>
    <channel>
        <item>
            <title>My Second Great Title</title>
            <link>http://server.com/content/my-second-great-title</link>
            <tag>vuluptate</tag>
            <tag>id</tag>
            <tag>cras</tag>
            <tag>pretium</tag>
            <tag>conubia</tag>
            <tag>libero</tag>
            <description>This is a second great description</description>
            <publishedAt>Sat, 08 Nov 2015 10:00:52 +0000</publishedAt>
            <isVisible>true</isVisible>
            <content>Ut luctus auctor varius. Donec vitae erat felis. Nam ac erat vulputate, consequat elit id, dictum urna. Vestibulum dignissim eget felis vitae tempor. Suspendisse molestie lectus at est accumsan, et porta sapien elementum. Vivamus pretium imperdiet nisl id consequat. Sed gravida bibendum odio, et vehicula nibh hendrerit eget. Cras sit amet semper sem. Vivamus non lorem sed ex fringilla malesuada consequat non arcu. Etiam nec sodales tortor. In scelerisque massa vitae purus suscipit consectetur. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Cras ultrices eros tortor, eu sollicitudin eros pellentesque sit amet. Integer rutrum velit eget libero efficitur, non auctor lorem rutrum. Vivamus porta dolor ut enim dapibus, nec rutrum nisi sagittis.</content>
        </item>
        <item>
            <title>My Great Title</title>
            <link>http://server.com/content/my-great-title</link>
            <tag>lorem</tag>
            <tag>ipsum</tag>
            <tag>arcu</tag>
            <tag>sic</tag>
            <description>This is a great description</description>
            <publishedAt>Sat, 08 Nov 2015 10:00:52 +0000</publishedAt>
            <isVisible>true</isVisible>
            <content>Praesent consectetur, dolor non vehicula ultrices, nisl libero feugiat ligula, ut faucibus metus arcu et dui. Curabitur eleifend feugiat posuere. Cum sociis natoque penatibus et magnis dis parturient montes, nascetur ridiculus mus. Donec cursus blandit lorem, ullamcorper vestibulum massa molestie non. Maecenas erat enim, pretium eget velit dapibus, consequat placerat eros. Nam vulputate nisi at urna gravida accumsan. Fusce id ultrices nunc. Aenean varius quam in tincidunt cursus. Quisque sed arcu est. Etiam dignissim, neque at maximus feugiat, turpis nunc sollicitudin eros, et lobortis enim dui sed felis. Nulla rhoncus diam porttitor ullamcorper imperdiet.</content>
        </item>
        <item>
            <title>My Title</title>
            <link>http://server.com/content/my-title</link>
            <tag>auctor</tag>
            <tag>felis</tag>
            <description>This is a simple description</description>
            <publishedAt>Sat, 05 Nov 2015 16:07:23 +0000</publishedAt>
            <isVisible>true</isVisible>
            <content>Ut luctus auctor varius. Donec vitae erat felis. Nam ac erat vulputate, consequat elit id, dictum urna. Vestibulum dignissim eget felis vitae tempor. Suspendisse molestie lectus at est accumsan, et porta sapien elementum. Vivamus pretium imperdiet nisl id consequat. Sed gravida bibendum odio, et vehicula nibh hendrerit eget. Cras sit amet semper sem. Vivamus non lorem sed ex fringilla malesuada consequat non arcu. Etiam nec sodales tortor. In scelerisque massa vitae purus suscipit consectetur. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Cras ultrices eros tortor, eu sollicitudin eros pellentesque sit amet. Integer rutrum velit eget libero efficitur, non auctor lorem rutrum. Vivamus porta dolor ut enim dapibus, nec rutrum nisi sagittis.</content>
        </item>
    </channel>
</root>

So far, I have been trying the following expression:

//root/channel/item/title|//root/channel/item/link|//root/channel/item/tag

Unfortunately, the <item> tags are lost after applying the expression. Is there a way to filter the data while keeping the item tags?

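2 answers:

Answer 0 (score: 1)

Your XPath expression is correct. It produces the right output, meaning exactly what you asked for: you globally (//) select the title, link, and tag element nodes, and that is what the expression returns. You did not select any item element nodes.

To filter each item node down to those three elements, you have to iterate over all item nodes and filter their children (and possibly rebuild the item element), rather than filtering all three elements globally (// ... | // ... | // ...).

Since you have not given a PHP snippet, I will illustrate this in XSLT: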

What you did:

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
 <xsl:output method="xml" version="1.0" encoding="UTF-8"/>
  <xsl:template match="/">
   <xsl:copy-of select="//root/channel/item/title|//root/channel/item/link|//root/channel/item/tag" />
  </xsl:template>
 </xsl:stylesheet>

What you should (probably) do: visit each item element in turn and, inside it, copy only its title, link, and tag children, so that the item wrapper is preserved around them.
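
Since the question asks about PHP, here is a minimal sketch of the per-item approach using DOMDocument and DOMXPath. The inline sample XML is a trimmed stand-in for the question's document, and the variable names are only illustrative; the output document is rebuilt by hand with root, channel, and item elements, which is an assumption about the structure you want to keep.

```php
<?php
// Trimmed sample input (assumption: same shape as the document in the question).
$xml = <<<XML
<root>
    <channel>
        <item>
            <title>My Title</title>
            <link>http://server.com/content/my-title</link>
            <tag>auctor</tag>
            <tag>felis</tag>
            <description>This is a simple description</description>
            <isVisible>true</isVisible>
        </item>
    </channel>
</root>
XML;

$source = new DOMDocument();
$source->loadXML($xml);
$xpath = new DOMXPath($source);

// Build a new document that keeps the root/channel/item nesting.
$result = new DOMDocument('1.0', 'UTF-8');
$result->formatOutput = true;
$root = $result->appendChild($result->createElement('root'));
$channel = $root->appendChild($result->createElement('channel'));

// Visit each item, then run a relative XPath with that item as context.
foreach ($xpath->query('/root/channel/item') as $item) {
    $newItem = $channel->appendChild($result->createElement('item'));
    foreach ($xpath->query('title|link|tag', $item) as $child) {
        // importNode($child, true) deep-copies the node into the new document.
        $newItem->appendChild($result->importNode($child, true));
    }
}

echo $result->saveXML();
```

The key difference from the original expression is that the union `title|link|tag` is evaluated relative to each item instead of globally, so every match can be attached to its own item wrapper.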

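Answer 1 (score: 1)

Consider an XSLT solution whenever you need to restructure an entire XML document. Like other general-purpose languages, PHP provides an XSLT processor (the XSLTProcessor class from its XSL extension). Essentially, you need to leave out the unwanted nodes. The script below runs the identity transform to copy the whole document as is, then adds an empty template that matches the unwanted nodes so they are dropped. I include two equivalent solutions.

XSLT script (save as a .xsl or .xslt file)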

<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output version="1.0" encoding="UTF-8" indent="yes" />
<xsl:strip-space elements="*"/>

  <!-- Identity Transform -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- SOLUTION 1-->
  <!-- <xsl:template match="description|publishedAt|isVisible|content"/> -->

  <!-- SOLUTION 2-->
  <xsl:template match="item/*[not(name()='title' or name()='link' or name()='tag')]"/>

</xsl:transform>

PHP script

<?php

// Load the XML source and XSLT file
$doc = new DOMDocument();    
$doc->load('Input.xml');

$xsl = new DOMDocument;
$xsl->load('XSLTScript.xsl');

// Configure the transformer
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl); 

// Transform XML source
$newXml = $proc->transformToXML($doc);

// Save output to file
$xmlfile = 'Output.xml';
file_put_contents($xmlfile, $newXml);

?>

Output

<?xml version="1.0" encoding="UTF-8"?>
<root>
  <channel>
    <item>
      <title>My Second Great Title</title>
      <link>http://server.com/content/my-second-great-title</link>
      <tag>vuluptate</tag>
      <tag>id</tag>
      <tag>cras</tag>
      <tag>pretium</tag>
      <tag>conubia</tag>
      <tag>libero</tag>
    </item>
    <item>
      <title>My Great Title</title>
      <link>http://server.com/content/my-great-title</link>
      <tag>lorem</tag>
      <tag>ipsum</tag>
      <tag>arcu</tag>
      <tag>sic</tag>
    </item>
    <item>
      <title>My Title</title>
      <link>http://server.com/content/my-title</link>
      <tag>auctor</tag>
      <tag>felis</tag>
    </item>
  </channel>
</root>
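
For comparison, the same filtering can be sketched without XSLT by deleting the unwanted nodes in place with DOMXPath. This is only an illustrative alternative, not part of the answer above: it reuses the predicate from SOLUTION 2, and the inline sample XML is a trimmed stand-in for the question's document.

```php
<?php
// Trimmed sample input (assumption: same shape as the document in the question).
$xml = <<<XML
<root>
  <channel>
    <item>
      <title>My Title</title>
      <link>http://server.com/content/my-title</link>
      <tag>auctor</tag>
      <description>This is a simple description</description>
      <publishedAt>Sat, 05 Nov 2015 16:07:23 +0000</publishedAt>
    </item>
  </channel>
</root>
XML;

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false; // drop indentation text nodes so output can be re-indented
$doc->formatOutput = true;
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);

// Same predicate as SOLUTION 2: every child of item that is not title, link, or tag.
$unwanted = $xpath->query("//item/*[not(name()='title' or name()='link' or name()='tag')]");

// Copy the matches into an array before modifying the tree.
foreach (iterator_to_array($unwanted) as $node) {
    $node->parentNode->removeChild($node);
}

echo $doc->saveXML();
```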