Extracting a portion of HTML from a string that contains HTML code

Time: 2011-11-06 12:07:40

Tags: php regex joomla

I would like to extract part of the HTML code from a string. The string is returned by a PHP Joomla page and contains code like this:

<!-- JoomlaWorks "Disqus Comment System for Joomla!" Plugin (v2.2) starts here -->

<div class="itp-fshare-floating" id="itp-fshare" style="position:fixed; top:30px !important; left:50px !important;">
</div>
<p>
    <span class="easy_img_caption"  style="display:inline-block;line-height:0.5;vertical-align:top;background-color:#F2F2F2;text-align:left;width:180px;float:left;margin:0px 10px;">

        <a href="/joomla/index.php?option=com_content&view=article&id=13:11111&catid=1:guide-sui-serivzi-cloud-computing&Itemid=3">
            <img src="/joomla/plugins/content/imagesresizecache/441a27b2a4d64b487a8e213a94f6466d.jpeg" border="0" alt="1" title="1"  style="width:180px; height:150px; ;margin:0;" />
        </a>
        <span class="easy_img_caption_inner" style="display:inline-block;line-height:normal;color:#000000;font-size:8pt;font-weight:normal;font-style:normal;padding:4px 8px;margin:0px;">1

        </span>
    </span>

    11111111111111111111111111111111111
</p>

<!-- Disqus comments counter and anchor link -->

<a class="jwDisqusListingCounterLink" href="http://clouderize.it/joomla/index.php?option=com_content&view=article&id=13:11111&catid=1:guide-sui-serivzi-cloud-computing&Itemid=3#disqus_thread" title="Add a comment">
    Add a comment
</a>

<!-- JoomlaWorks "Disqus Comment System for Joomla!" Plugin (v2.2) ends here -->

I would like to extract this part:

    <span class="easy_img_caption"  style="display:inline-block;line-height:0.5;vertical-align:top;background-color:#F2F2F2;text-align:left;width:180px;float:left;margin:0px 10px;">

        <a href="/joomla/index.php?option=com_content&view=article&id=13:11111&catid=1:guide-sui-serivzi-cloud-computing&Itemid=3">
            <img src="/joomla/plugins/content/imagesresizecache/441a27b2a4d64b487a8e213a94f6466d.jpeg" border="0" alt="1" title="1"  style="width:180px; height:150px; ;margin:0;" />
        </a>
        <span class="easy_img_caption_inner" style="display:inline-block;line-height:normal;color:#000000;font-size:8pt;font-weight:normal;font-style:normal;padding:4px 8px;margin:0px;">1

        </span>
    </span>

How can I do this? Thank you very much.

EDIT 1:

$content="<html><head></head><body>".($this->item->text)."</body></html>";
        $dom = new DOMDocument();
        $dom->loadHTML($content);

        $xpath = new DOMXPath($dom);

        // Note: the trailing "/" makes this XPath expression invalid, so query() returns false
        $tags = $xpath->query('//span[@class="easy_img_caption"]/');
        print_r($tags);

EDIT 2: Using this code:

$content="<html><head></head><body>".($this->item->text)."</body></html>";
        $content=($this->item->text);
        $dom = new DOMDocument();
        $dom->loadHTML($content);

        $xpath = new DOMXPath($dom);

        $tags = $xpath->query('//span[@class="easy_img_caption"]');
        //echo "<textarea>".print_r($dom->saveXml($tags->item(0)))."</textarea>";
        foreach ($tags as $tag) {
            $innerHTML = '';
            $children = $tag->childNodes;
            foreach ($children as $child) {
                $tmp_doc = new DOMDocument();
                $tmp_doc->appendChild($tmp_doc->importNode($child,true));       
                $innerHTML .= $tmp_doc->saveHTML();
            }

            echo $innerHTML;
        }

This is what I get back:

<a href="/joomla/index.php?option=com_content&view=article&id=13:11111&catid=1:guide-sui-serivzi-cloud-computing&Itemid=3">
    <img src="/joomla/plugins/content/imagesresizecache/441a27b2a4d64b487a8e213a94f6466d.jpeg" border="0" alt="1" title="1" style="width:180px; height:150px; ;margin:0;">
</a>
<span class="easy_img_caption_inner" style="display:inline-block;line-height:normal;color:#000000;font-size:8pt;font-weight:normal;font-style:normal;padding:4px 8px;margin:0px;">1</span>

The problem is that I also want the enclosing span:

<span class="easy_img_caption"  style="display:inline-block;line-height:0.5;vertical-align:top;background-color:#F2F2F2;text-align:left;width:180px;float:left;margin:0px 10px;">

What do I need to change in the XPath query?

Thanks again.

1 answer:

Answer 0: (score: 2)

Here is a solution using the phpQuery library (http://code.google.com/p/phpquery/), tested against your string:

<?php

require('phpQuery/phpQuery.php');

$testString =

'<!-- JoomlaWorks "Disqus Comment System for Joomla!" Plugin (v2.2) starts here -->

<div class="itp-fshare-floating" id="itp-fshare" style="position:fixed; top:30px !important; left:50px !important;">
</div>
<p>
<span class="easy_img_caption"  style="display:inline-block;line-height:0.5;vertical-align:top;background-color:#F2F2F2;text-align:left;width:180px;float:left;margin:0px 10px;">

    <a href="/joomla/index.php?option=com_content&view=article&id=13:11111&catid=1:guide-sui-serivzi-cloud-computing&Itemid=3">
        <img src="/joomla/plugins/content/imagesresizecache/441a27b2a4d64b487a8e213a94f6466d.jpeg" border="0" alt="1" title="1"  style="width:180px; height:150px; ;margin:0;" />
    </a>
    <span class="easy_img_caption_inner" style="display:inline-block;line-height:normal;color:#000000;font-size:8pt;font-weight:normal;font-style:normal;padding:4px 8px;margin:0px;">1

    </span>
</span>

11111111111111111111111111111111111
</p>

<!-- Disqus comments counter and anchor link -->

<a class="jwDisqusListingCounterLink" href="http://clouderize.it/joomla/index.php?option=com_content&view=article&id=13:11111&catid=1:guide-sui-serivzi-cloud-computing&Itemid=3#disqus_thread" title="Add a comment">
Add a comment
</a>

<!-- JoomlaWorks "Disqus Comment System for Joomla!" Plugin (v2.2) ends here -->';

$doc = phpQuery::newDocument($testString);

$extraction=pq('.easy_img_caption:eq(0)')->htmlOuter();

echo  $extraction;

/* outputs
<span class="easy_img_caption" style="display:inline-block;line-height:0.5;vertical-align:top;background-color:#F2F2F2;text-align:left;width:180px;float:left;margin:0px 10px;">

    <a href="/joomla/index.php?option=com_content&amp;view=article&amp;id=13:11111&amp;catid=1:guide-sui-serivzi-cloud-computing&amp;Itemid=3">
        <img src="/joomla/plugins/content/imagesresizecache/441a27b2a4d64b487a8e213a94f6466d.jpeg" border="0" alt="1" title="1" style="width:180px; height:150px; ;margin:0;"></a>
    <span class="easy_img_caption_inner" style="display:inline-block;line-height:normal;color:#000000;font-size:8pt;font-weight:normal;font-style:normal;padding:4px 8px;margin:0px;">1

    </span>
</span>
*/

?>
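
If you would rather stay with DOMDocument and not pull in an extra library, the XPath query from EDIT 2 probably does not need to change at all. The loop there serializes only the child nodes, which is why the outer span is missing. A minimal sketch, assuming PHP 5.3.6 or later (where saveHTML() accepts a node argument); the shortened $html string is a made-up stand-in for $this->item->text:

```php
<?php
// Hypothetical, shortened sample input standing in for $this->item->text.
$html = '<p><span class="easy_img_caption" style="float:left;">'
      . '<a href="/joomla/index.php?id=13"><img src="a.jpeg" alt="1"></a>'
      . '<span class="easy_img_caption_inner">1</span>'
      . '</span>11111</p>';

$dom = new DOMDocument();
// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);
// Same query as in EDIT 2: it already selects the outer span element.
$tags = $xpath->query('//span[@class="easy_img_caption"]');

$outer = '';
if ($tags->length > 0) {
    // Serialize the matched node itself (opening tag, children, closing tag),
    // rather than looping over its child nodes.
    $outer = $dom->saveHTML($tags->item(0));
}

echo $outer;
```

Note that the class test in the query is an exact string match, so it would miss a span whose class attribute lists more than one class.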