Parsing XML: based on IDREF / ID

Time: 2015-11-07 20:43:18

Tags: php xml xml-parsing simplexml dtd

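I've been struggling with this all day, and it's probably really simple... but I'm completely new to the world of PHP and XML, so I could really do with some help.

I'm using SimpleXML to parse my data and have two second-level groups: <yearlist> and <eplist>. I have <year> nested inside <yearlist>, and it has an attribute "yid", which is set as ID in my DTD. It also has <yearname> nested inside <year>, which contains the more detailed description to be displayed as output. I have <ep> nested inside <eplist>, with an attribute "yearid" (which relates directly to "yid"), set as IDREF in my DTD.

Basically, when I parse the data for <eplist>, I want to use <yearname> as the group header, following yearid = yid > yearname as the path.

I've created a sample of my data that may help explain my problem better.

Here is my DTD: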

<?xml encoding="UTF-8"?>

<!ELEMENT besteplist (yearlist,eplist)>

<!ELEMENT yearlist (year)+>
<!ELEMENT year (yearname)>
<!ATTLIST year
            yid ID #REQUIRED>
<!ELEMENT yearname (#PCDATA)>

<!ELEMENT eplist (ep)+>
<!ELEMENT ep (eptitle,eptnumber)>
<!ATTLIST ep
            eid ID #REQUIRED
            yearid IDREF #IMPLIED>
<!ELEMENT eptitle (#PCDATA)>
<!ELEMENT eptnumber (#PCDATA)>

Here is my XML:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE besteplist SYSTEM "example.dtd">
<besteplist>
    <yearlist>
        <year yid="y1">
            <yearname>1995, Season 1</yearname>
        </year>
        <year yid="y2">
            <yearname>1996, Season 2</yearname>
        </year>
        <year yid="y3">
            <yearname>1997, Season 3</yearname>
        </year>
    </yearlist>
    <eplist>
        <ep yearid="y1" eid="e1">
            <eptitle>The First Episode</eptitle>
            <eptnumber>1</eptnumber>
        </ep>
        <ep yearid="y2" eid="e2">
            <eptitle>Bla bla bla</eptitle>
            <eptnumber>21</eptnumber>
        </ep>
        <ep yearid="y2" eid="e3">
            <eptitle>Rar rar rar</eptitle>
            <eptnumber>39</eptnumber>
        </ep>
        <ep yearid="y2" eid="e4">
            <eptitle>Tra la la</eptitle>
            <eptnumber>45</eptnumber>
        </ep>
        <ep yearid="y3" eid="e5">
            <eptitle>Donkey</eptitle>
            <eptnumber>126</eptnumber>
        </ep>
    </eplist>
</besteplist>

Here is an example of how I'd like the output to appear:

SEASON: 1995, Season 1

    EPISODE TITLE: The First Episode
    EPISODE NUMBER: 1

SEASON: 1996, Season 2

    EPISODE TITLE: Bla bla bla
    EPISODE NUMBER: 21

    EPISODE TITLE: Rar rar rar
    EPISODE NUMBER: 39

    EPISODE TITLE: Tra la la
    EPISODE NUMBER: 45

SEASON: 1997, Season 3

    EPISODE TITLE: Donkey
    EPISODE NUMBER: 126

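I don't think it's very useful to post the code I've already tried, since it's probably fairly useless... what I have managed to do is very basic. Once I've solved this problem I can move on to the next stage... formatting...

I'm not tied to SimpleXML in any way, so if anyone can suggest a more efficient way of doing things, I'm all ears.

Many thanks to anyone who takes the time to help me. :)

Sam

In response to @michi: I've been sitting here trying to figure out xpath and reading various syntax references and tutorials online, and I can't seem to get my head around it. This is what I have so far... but I've commented out the xpath, because it's obviously wrong.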

<?php
$xml=simplexml_load_file("example.xml") or die("Error: Cannot create object");

foreach($xml->yearlist->children() as $years) { 
    $xyid=$years[yid];
    echo "_____________________________________________<br>";
    echo "(yid= " . $xyid . " )<br>";
    echo "SEASON: " . $years->yearname . "<br>"; 
    echo "_____________________________________________<br>";
    foreach($xml->eplist->children() as $episodes) { 
    echo "EPISODE TITLE: " . $episodes->eptitle . "<br>"; 
    echo "EPISODE NUMBER: " . $episodes->eptnumber . "<br>"; 
    $xyearid=$episodes[yearid];
    echo "(yearid= " . $xyearid . " )<br>";
    // echo $xml->xpath('//year[@yid="$episodes[yearid]"]/yearname');
    echo "</p>"; 
    } 
}

?>

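I hope you can steer me in the right direction!

Thanks, Sam

Thanks for the help, michi. This is definitely a step in the right direction!

I've been trying to think of a way to display the season name only once... I came across iteration and arrays, but they all look too complicated to me. Is it possible to include xpath in the foreach statement? I thought that if I nested a foreach over the episodes inside a foreach over the seasons and used xpath to match the IDs, it might work, but I can't seem to get it to display the elements. Am I on the right track?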

<?php
$xml=simplexml_load_file("example.xml") or die("Error: Cannot create object");

foreach ($xml->yearlist->year as $season) {
    echo "SEASON: " . $season->yearname . PHP_EOL;
    foreach ($xml->xpath("//ep[@yearid='$season[yid]']")[0] as $episode) { 
        echo "EPISODE TITLE: " . $episode->eptitle . PHP_EOL;
        echo "EPISODE NUMBER: " . $episode->eptnumber . PHP_EOL; 
        echo PHP_EOL;
    }
}

?>

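Thanks again!

2 Answers:

Answer 0: (score: 1)

You can use XSLT to restructure the XML into the format you need. As background, XSLT is a special-purpose declarative programming language for restructuring, restyling, and reformatting XML documents for various end uses. Nearly all general-purpose languages maintain XSLT processors: Java, C#, Python, Perl, VB, even PHP.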

XSLT script (save as an .xsl file, to be used below)

<?xml version="1.0" ?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">  
<xsl:output method="xml" indent="yes"/>

<xsl:template match="besteplist">
  <besteplist>

     <xsl:for-each select="yearlist/year">
        <xsl:variable name="yearvar" select="@yid"/>
        SEASON: <xsl:value-of select="yearname"/>
        <xsl:for-each select="../../eplist/ep[@yearid=$yearvar]">      
            EPISODE TITLE: <xsl:value-of select="eptitle"/>
            EPISODE NUMBER: <xsl:value-of select="eptnumber"/>
            <xsl:text>&#xa;</xsl:text>
        </xsl:for-each>
      </xsl:for-each>

  </besteplist>
</xsl:template>

</xsl:stylesheet>

PHP script

<?php   

// Set current directory
$cd = dirname(__FILE__);

// Load the XML source and XSLT file
$xml = new DOMDocument('1.0', 'UTF-8');
$xml->formatOutput = true;
$xml->preserveWhiteSpace = false;
$xml->load($cd.'/SeasonEpisodes.xml');

$xsl = new DOMDocument;
$xsl->load($cd.'/SeasonEpisodes.xsl');

// Configure transformer
$proc = new XSLTProcessor;
$proc->importStyleSheet($xsl);

// Transform XML source (transformToXML returns the result as a string)
$newXML = $proc->transformToXML($xml);

// Save output to file
$xmlfile = $cd.'/NewSeasonEpisodes.xml';
file_put_contents($xmlfile, $newXML);

?>

New XML output (now just parse the root node's data)

<?xml version="1.0"?>
<besteplist>
        SEASON: 1995, Season 1      
            EPISODE TITLE: The First Episode
            EPISODE NUMBER: 1

        SEASON: 1996, Season 2      
            EPISODE TITLE: Bla bla bla
            EPISODE NUMBER: 21

            EPISODE TITLE: Rar rar rar
            EPISODE NUMBER: 39

            EPISODE TITLE: Tra la la
            EPISODE NUMBER: 45

        SEASON: 1997, Season 3      
            EPISODE TITLE: Donkey
            EPISODE NUMBER: 126
</besteplist>
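
If plain text is wanted rather than a new XML file, the same idea can be sketched with a text-mode stylesheet. This is only an illustration under my own assumptions, not part of the answer above: the XML and XSLT are inlined as strings (the sample data is shortened), the key name eps-by-year is made up, and xsl:key replaces the nested predicate lookup. Like the script above, it needs PHP's xsl extension.

```php
<?php
// Shortened sample data based on the question's XML.
$xmlSource = <<<'XML'
<besteplist>
    <yearlist>
        <year yid="y1"><yearname>1995, Season 1</yearname></year>
        <year yid="y2"><yearname>1996, Season 2</yearname></year>
    </yearlist>
    <eplist>
        <ep yearid="y1" eid="e1"><eptitle>The First Episode</eptitle><eptnumber>1</eptnumber></ep>
        <ep yearid="y2" eid="e2"><eptitle>Bla bla bla</eptitle><eptnumber>21</eptnumber></ep>
        <ep yearid="y2" eid="e3"><eptitle>Rar rar rar</eptitle><eptnumber>39</eptnumber></ep>
    </eplist>
</besteplist>
XML;

// Text-mode stylesheet; "eps-by-year" is a hypothetical key name.
$xslSource = <<<'XSL'
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output method="text" encoding="UTF-8"/>
  <!-- Index every <ep> by its yearid attribute -->
  <xsl:key name="eps-by-year" match="ep" use="@yearid"/>
  <xsl:template match="/besteplist">
    <xsl:for-each select="yearlist/year">
      <xsl:text>SEASON: </xsl:text>
      <xsl:value-of select="yearname"/>
      <xsl:text>&#10;</xsl:text>
      <!-- Look up the episodes whose yearid equals this year's yid -->
      <xsl:for-each select="key('eps-by-year', @yid)">
        <xsl:text>  EPISODE TITLE: </xsl:text>
        <xsl:value-of select="eptitle"/>
        <xsl:text>&#10;  EPISODE NUMBER: </xsl:text>
        <xsl:value-of select="eptnumber"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
XSL;

$xml = new DOMDocument();
$xml->loadXML($xmlSource);

$xsl = new DOMDocument();
$xsl->loadXML($xslSource);

$proc = new XSLTProcessor();
$proc->importStylesheet($xsl);

// With method="text" the result is a plain string, not an XML document.
$text = $proc->transformToXml($xml);
echo $text;
```

Because every piece of literal text sits inside xsl:text, the indentation of the stylesheet itself does not leak into the result.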

Answer 1: (score: 0)

You've got the basic SimpleXml techniques down, well done. Now let's get it working:

  1. I suggest iterating over <eplist> and just echoing all <ep>:

    $xml = simplexml_load_string($x); // assume XML in $x
    
    foreach ($xml->eplist->ep as $episode) { 
        echo $episode['yearid'] . PHP_EOL;
        echo "EPISODE TITLE: " . $episode->eptitle . PHP_EOL;
        echo "EPISODE NUMBER: " . $episode->eptnumber . PHP_EOL; 
        echo PHP_EOL;
    }
    

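    PHP_EOL produces a newline on different platforms, see: When do I use the PHP constant "PHP_EOL"?

    See it in action: https://eval.in/464970

    This looks similar to what you want, doesn't it?

  2. Use the yearid attribute of <ep> as a key to access and echo the corresponding <yearname>, using xpath():

    Your xpath expression is basically correct, but it needs a few changes: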

    // old:
    echo $xml->xpath('//year[@yid="$episode[yearid]"]/yearname');
    
    // new:
    echo $xml->xpath("//year[@yid='$episode[yearid]']/yearname")[0];
    

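    Swap " and ' so that $episode gets evaluated. Note that I changed its name from $episodes to $episode in my code. See: What is the difference between single-quoted and double-quoted strings in PHP?

    xpath() returns an array of SimpleXml elements; to access the 1st value we need to dereference the array with [0].

    Of course, this code isn't error-proof: it doesn't check whether the array is empty, and so on. You'd need to add that for production, but it would complicate the point of these examples.

    Replace echo $episode['yearid'] (...) with the correct xpath.

    See it working: https://eval.in/464992

  3. Up next: group episodes with the same SEASON = echo SEASON only for the 1st episode belonging to that season. (Your job)

    UPDATE

    You posted code that is almost perfect, see my comment.

    Basically, you have two tables linked by yearid. 1 episode is associated with 1 year, and 1 year is associated with many episodes. You can implement this either by iterating over the years and selecting the linked episodes (= your last code sample) or by iterating over the episodes and selecting the linked year (= my code sample).

    Here's how to do the grouping in my previous example: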

    $xml = simplexml_load_string($x); // assume XML in $x
    $yid = "";
    
    foreach ($xml->eplist->ep as $episode) { 
    
        // check if last yearid is different from current yearid
        // only if yes, echo the yearname 
        if ($yid != (string)$episode['yearid']) {
            echo "SEASON: " . $xml->xpath("//year[@yid='$episode[yearid]']/yearname")[0] . PHP_EOL . PHP_EOL;
        }
        echo "  EPISODE TITLE: " . $episode->eptitle . PHP_EOL;
        echo "  EPISODE NUMBER: " . $episode->eptnumber . PHP_EOL . PHP_EOL; 
    
        // store current yearid in $yid for next iteration
        $yid = (string)$episode['yearid'];
    }
    

    Note: the (string) cast makes sure the comparison is done on a string and not on a SimpleXml object.

    Output:

    SEASON: 1995, Season 1
    
      EPISODE TITLE: The First Episode
      EPISODE NUMBER: 1
    
    SEASON: 1996, Season 2
    
      EPISODE TITLE: Bla bla bla
      EPISODE NUMBER: 21
    
      EPISODE TITLE: Rar rar rar
      EPISODE NUMBER: 39
    
      EPISODE TITLE: Tra la la
      EPISODE NUMBER: 45
    
    SEASON: 1997, Season 3
    
      EPISODE TITLE: Donkey
      EPISODE NUMBER: 126
    

    See it working: https://eval.in/465044

    Further discussion: the code takes for granted that the <ep> nodes are already grouped in the XML. If you have an <ep> for y1 after y3 ...
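
One possible way to stop depending on the order of the <ep> nodes is the other direction described above: iterate over the years and select the linked episodes with xpath(). The following is a minimal sketch under my own assumptions, not the answerer's code. The helper name renderSeasons and the out-of-order episode e6 ("Late arrival") are hypothetical, and the sample data is shortened. Compared with the asker's last attempt, the main change is dropping the [0] after xpath(), since every matching <ep> is wanted, not just the first.

```php
<?php
// Shortened sample data based on the question. The <ep eid="e6"> entry is a
// hypothetical addition: a y1 episode that appears after a y3 episode.
$x = <<<'XML'
<besteplist>
    <yearlist>
        <year yid="y1"><yearname>1995, Season 1</yearname></year>
        <year yid="y3"><yearname>1997, Season 3</yearname></year>
    </yearlist>
    <eplist>
        <ep yearid="y1" eid="e1"><eptitle>The First Episode</eptitle><eptnumber>1</eptnumber></ep>
        <ep yearid="y3" eid="e5"><eptitle>Donkey</eptitle><eptnumber>126</eptnumber></ep>
        <ep yearid="y1" eid="e6"><eptitle>Late arrival</eptitle><eptnumber>2</eptnumber></ep>
    </eplist>
</besteplist>
XML;

// Hypothetical helper: builds the grouped listing as one string.
function renderSeasons(SimpleXMLElement $xml): string
{
    $out = '';
    foreach ($xml->yearlist->year as $season) {
        // Cast to string so the value can be interpolated into the query.
        $yid = (string)$season['yid'];

        // No [0] here: keep the whole array of matching <ep> elements.
        $episodes = $xml->xpath("//ep[@yearid='$yid']");

        // Skip seasons without episodes (also covers a failed query).
        if (empty($episodes)) {
            continue;
        }

        $out .= "SEASON: " . $season->yearname . PHP_EOL . PHP_EOL;
        foreach ($episodes as $episode) {
            $out .= "  EPISODE TITLE: " . $episode->eptitle . PHP_EOL;
            $out .= "  EPISODE NUMBER: " . $episode->eptnumber . PHP_EOL . PHP_EOL;
        }
    }
    return $out;
}

echo renderSeasons(simplexml_load_string($x));
```

Interpolating $yid straight into the expression assumes the value contains no quote characters, which holds for valid ID values in the DTD; untrusted input would need more care. This version runs one xpath query per season, which should be fine for a list of this size.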