How to cut out part of the HTML in a <div> using XPath and DOMDocument and store it as an HTML string?

Asked: 2012-01-27 21:55:34

Tags: php xpath domdocument

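I want to cut out certain parts of an HTML document. I can do that with XPath and DOMDocument, but the problem is that I need the result as a string of HTML code. Normally I would use a regular expression for this, but I don't want to write a complicated search pattern that matches the start and end of the tag.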

Here is a sample input:

some html code before
<div>this <b>is</b> what I want</div>
some html after

And the output:

<div>this <b>is</b> what I want</div>

I tried something like this:

$subject = 'some html code before
<div>this <b>is</b> what I want</div>
some html after';

$doc = new DOMDocument();                   
$doc->loadHTML($subject);
$xpath = new DOMXpath($doc);
$result = $xpath->query("//div/*");
echo $result->saveHTML();

But all I get is the error: Call to undefined method DOMNodeList::saveHTML()

Does anyone know how to get the result as an HTML string using DOMDocument and XPath?

4 Answers:

Answer 0: (score: 2)

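Thanks to the gentlemen who pointed out my misunderstanding about calling a method that is not available on the child object. However, the line: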

echo $doc->saveHTML($result->item(0));

only produces a warning (without the HTML string I wanted). Fortunately, I found another solution, which is now:

<?php
$subject = '<html>
    <head>
        <title>A very short ebook</title>
        <meta name="charset" value="utf-8" />
    </head>
    <body>
        <h1 class="bookTitle">A very short ebook</h1>
        <p style="text-align:right">Written by Kovid Goyal</p>
        <div class="introduction">
            <p>A very short ebook to demonstrate the use of XPath.</p>
        </div>

        <h2 class="chapter">Chapter One</h2>
        <p>This is a truly fascinating chapter.</p>

        <h2 class="chapter">Chapter Two</h2>
        <p>A worthy continuation of a fine tradition.</p>
    </body>
</html>';


$doc = new DOMDocument();                   
$doc->loadHTML($subject);

$xpath = new DOMXpath($doc);
$result = $xpath->query("//div");

//echo $doc->saveHTML($result->item(0));

echo domNodeList_to_string($result);

function domNodeList_to_string($DomNodeList) {
    $doc = new DOMDocument;
    // start at the first node; the index was previously left undefined
    $i = 0;
    while ( $node = $DomNodeList->item($i) ) {
        // import node (deep copy) into the new document
        $domNode = $doc->importNode($node, true);
        // append node
        $doc->appendChild($domNode);
        $i++;
    }
    $output = $doc->saveHTML();
    // I added this because xml output and ajax do not like each others
    //$output = htmlspecialchars($output);
    return $output;
}
?>

So, with a query like this:

$result = $xpath->query("//div");

you get the raw HTML string as output:

<div class="introduction">
        <p>A very short ebook to demonstrate the use of XPath.</p>
    </div>

If the query is:

$result = $xpath->query("//p");

then the output will be:

<p style="text-align:right">Written by Kovid Goyal</p><p>A very short ebook to demonstrate the use of XPath.</p><p>This is a truly fascinating chapter.</p><p>A worthy continuation of a fine tradition.</p>

Does anyone know a simpler (built into PHP) way to get the same result?

Answer 1: (score: 1)

Try this:

$subject = 'some html code before
<div>this <b>is</b> what I want</div>
some html after';

$doc = new DOMDocument();                   
$doc->loadHTML($subject);
$xpath = new DOMXpath($doc);
$result = $xpath->query("//div");
echo $doc->saveHTML($result->item(0)); //echoes what you want :)

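The saveHTML function belongs to the DOMDocument object. You can't call it directly on a node (let alone on a NodeList, which is what query returns), but what you can do is pass a node to it as an argument.

Also, your query is wrong: what you want is the div element itself (i.e. //div), not its children (//div/*).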
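
To make the difference between the two queries concrete, here is a minimal sketch that runs both against the sample input from the question. The variable names $children, $divs, $childHtml and $divHtml are only illustrative, and passing a node to saveHTML() assumes PHP 5.3.6 or later.

```php
<?php
$subject = 'some html code before
<div>this <b>is</b> what I want</div>
some html after';

$doc = new DOMDocument();
$doc->loadHTML($subject);
$xpath = new DOMXPath($doc);

// "//div/*" matches only the element children of each div (here the <b>),
// so the surrounding text inside the div is not part of the result.
$children = $xpath->query('//div/*');
$childHtml = $doc->saveHTML($children->item(0));
echo $childHtml, "\n";

// "//div" matches the div element itself, with its text and child elements.
$divs = $xpath->query('//div');
$divHtml = $doc->saveHTML($divs->item(0));
echo $divHtml, "\n";
```

The first echo should show only the b element, while the second shows the whole div, which is what the question asks for.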

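Answer 2: (score: 1)

According to the PHP manual documentation for DOMXPath::query, the function:

  Returns a DOMNodeList containing all nodes matching the given XPath expression. Any expression which does not return nodes will return an empty DOMNodeList.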

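This means that $result in the following code will be a DOMNodeList object. So, if you want to get the HTML of an individual node from it, you need to use the methods available on the DOMNodeList object. In this case, the item method: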

$result = $xpath->query("//div");
echo $doc->saveHTML($result->item(0));

$result->item(0) returns the first DOMNode in the DOMNodeList created by the XPath query.
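
When the query matches more than one node, the same idea can be applied to every item in the list. The following is a rough sketch rather than a definitive implementation: nodeListToHtml is a hypothetical helper name (it is not part of the DOM extension), the sample markup is made up for illustration, the return type declaration assumes PHP 7 or later, and the node argument to saveHTML() assumes PHP 5.3.6 or later.

```php
<?php
// Hypothetical helper (not part of the DOM extension): serializes each node
// in a DOMNodeList via the document that owns it and joins the results.
function nodeListToHtml(DOMNodeList $nodes): string
{
    $html = '';
    foreach ($nodes as $node) {
        // Pass the node to saveHTML() so only that subtree is serialized.
        $html .= $node->ownerDocument->saveHTML($node);
    }
    return $html;
}

// Made-up sample markup with two matching div elements.
$doc = new DOMDocument();
$doc->loadHTML('<div>first <b>one</b></div><p>between</p><div>second</div>');
$xpath = new DOMXPath($doc);

$html = nodeListToHtml($xpath->query('//div'));
echo $html;
```

This avoids building a second document and importing nodes into it, at the cost of requiring a PHP version whose saveHTML() accepts a node.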

Answer 3: (score: 1)

Try this:

$subject = 'some html code before<div>this <b>is</b> what I want</div>some html after';
$doc = new DOMDocument('1.0');                   
$doc->loadHTML($subject);
$xpath = new DOMXpath($doc);
$result = $xpath->query("//div");
$docSave = new DOMDocument('1.0');
foreach ( $result as $node ) {
    $domNode = $docSave->importNode($node, true);
    $docSave->appendChild($domNode);
}
echo $docSave->saveHTML();