How do I extract a div element and its innerHTML from a valid HTML document using a class name?

Time: 2018-09-01 05:42:17

Tags: php regex html-parsing domdocument preg-match-all

HTML markup:

    <div class="entry-content entry-excerpt clearfix">
        <div class="simplesocialbuttons simplesocial-round-icon simplesocialbuttons_inline simplesocialbuttons-align-centered post-1 post  simplesocialbuttons-inline-no-animation simplesocialbuttons-inline-in">
<button class="simplesocial-fb-share" target="_blank" data-href="https://www.facebook.com/sharer/sharer.php?u=http://localhost/wp/hello-world/" onclick="javascript:window.open(this.dataset.href, '', 'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600');return false;"><span class="simplesocialtxt">Facebook </span> </button>
<button class="simplesocial-msng-share" onclick="javascript:window.open( this.dataset.href, '_blank',  'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600' );return false;" data-href="http://www.facebook.com/dialog/send?app_id=891268654262273&amp;redirect_uri=http%3A%2F%2Flocalhost%2Fwp%2Fhello-world%2F&amp;link=http%3A%2F%2Flocalhost%2Fwp%2Fhello-world%2F&amp;display=popup"><span class="simplesocialtxt">Messenger</span></button> 
<button onclick="javascript:window.open(this.dataset.href, '_blank' );return false;" class="simplesocial-whatsapp-share" data-href="https://api.whatsapp.com/send?text=http://localhost/wp/hello-world/"><span class="simplesocialtxt">WhatsApp</span></button>
<button class="simplesocial-tumblr-share" data-href="http://tumblr.com/widgets/share/tool?canonicalUrl=http%3A%2F%2Flocalhost%2Fwp%2Fhello-world%2F" onclick="javascript:window.open(this.dataset.href, '', 'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600');return false;"><span class="simplesocialtxt">Tumblr</span> </button>
<button class="simplesocial-twt-share" data-href="https://twitter.com/share?text=Hello+world%21&amp;url=http://localhost/wp/hello-world/" rel="nofollow" onclick="javascript:window.open(this.dataset.href, '', 'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600');return false;"><span class="simplesocialtxt">Twitter</span> </button>
<button onclick="javascript:window.location.href = this.dataset.href;return false;" class="simplesocial-email-share" data-href="mailto:?subject=Hello+world%21&amp;body=http://localhost/wp/hello-world/"><span class="simplesocialtxt">Email</span></button>
<button class="simplesocial-gplus-share" data-href="https://plus.google.com/share?url=http://localhost/wp/hello-world/" onclick="javascript:window.open(this.dataset.href, '', 'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600');return false;"><span class="simplesocialtxt">Google+</span></button>
<button target="popup" class="simplesocial-linkedin-share" data-href="https://www.linkedin.com/cws/share?url=http://localhost/wp/hello-world/" onclick="javascript:window.open(this.dataset.href, '', 'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600');return false;"><span class="simplesocialtxt">LinkedIn</span></button>
<button rel="nofollow" class="simplesocial-pinterest-share" onclick="var e=document.createElement('script');e.setAttribute('type','text/javascript');e.setAttribute('charset','UTF-8');e.setAttribute('src','//assets.pinterest.com/js/pinmarklet.js?r='+Math.random()*99999999);document.body.appendChild(e);return false;"><span class="simplesocialtxt">Pinterest</span></button>
<button class="simplesocial-reddit-share" data-href="https://reddit.com/submit?url=http://localhost/wp/hello-world/&amp;title=Hello+world%21" onclick="javascript:window.open(this.dataset.href, '', 'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600');return false;"><span class="simplesocialtxt">Reddit</span> </button>
</div>

<p>Wel&shy;come to Word&shy;Press.<br>
<img class="alignnone size-medium wp-image-8" src="http://localhost/wp/wp-content/uploads/2018/07/DZV_1UkX4AEUlTZ-300x202.jpg" alt="" width="300" height="202" srcset="http://localhost/wp/wp-content/uploads/2018/07/DZV_1UkX4AEUlTZ-300x202.jpg 300w, http://localhost/wp/wp-content/uploads/2018/07/DZV_1UkX4AEUlTZ-768x517.jpg 768w, http://localhost/wp/wp-content/uploads/2018/07/DZV_1UkX4AEUlTZ.jpg 789w" sizes="(max-width: 300px) 100vw, 300px"><br>
This is your first post. Edit or delete it, then start writ&shy;ing!</p>

            <a href="http://localhost/wp/hello-world/" class="more-link">Read more</a>

            </div>

This is my PHP:

preg_match_all( '/<[^>]*class="[^"]*\bsimplesocialbuttons\b[^"]*"[^>]*>/', $original_text, $matches );

My current result:

<div class="simplesocialbuttons simplesocial-round-icon  simplesocialbuttons_inline simplesocialbuttons-align-centered post-1 post   simplesocialbuttons-inline-no-animation simplesocialbuttons-inline-in">

My desired result:

<div class="simplesocialbuttons simplesocial-round-icon  simplesocialbuttons_inline simplesocialbuttons-align-centered post-1 post   simplesocialbuttons-inline-no-animation simplesocialbuttons-inline-in">
all content inside div
</div>

I also searched for a solution, but could not find one that works.

1 answer:

Answer 0 (score: 0)

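Using DOMDocument and XPath makes the process stable and accurate. Compared with a regular expression, it also gives you the best chance of keeping the desired output if the HTML structure changes slightly in the future.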
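To illustrate why a regex is fragile here, consider a case where the target div contains another div. The following is only a minimal sketch: the shortened markup, the class names `target`, `box`, `inner`, and the variable names are all made up for the illustration.

```php
<?php
// Hypothetical markup: the target div contains a nested div.
$html = '<div class="wrap"><div class="target box"><div class="inner">a</div><p>b</p></div><p>after</p></div>';

// A lazy regex stops at the first closing </div>, so the element is cut short.
preg_match('~<div class="target box">.*?</div>~s', $html, $m);
$regexResult = $m[0];

// DOMDocument builds a tree, so the element is serialized with all of its descendants.
$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$node = $xpath->query("//div[contains(@class, 'target')]")->item(0);
$domResult = $dom->saveHTML($node);

echo $regexResult, "\n"; // ends after the inner div, the <p>b</p> part is lost
echo $domResult, "\n";   // the complete target div
```

A greedy pattern has the opposite problem and runs on to the last `</div>` in the document, which is why a parser tends to be the safer tool when elements can nest.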

Query breakdown:

//                                            # from any level in the document
div                                           # match any div element
[contains(@class, 'simplesocialbuttons')]     # whose class attribute contains the substring "simplesocialbuttons"
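Because `contains()` is a substring test, it would also match an element whose only class is, say, `simplesocialbuttons_inline`. If that matters, a common XPath 1.0 idiom is to pad the class list with spaces and search for the padded name. A small sketch, using shortened hypothetical markup:

```php
<?php
// Hypothetical markup: only the first inner div carries the exact class "simplesocialbuttons".
$html = '<div><div class="simplesocialbuttons round">A</div><div class="simplesocialbuttons_inline">B</div></div>';

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Substring test: matches both inner divs.
$loose = $xpath->query("//div[contains(@class, 'simplesocialbuttons')]");

// Token test: normalize whitespace, pad with spaces, then look for " name ".
$strict = $xpath->query(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' simplesocialbuttons ')]"
);

echo $loose->length, "\n";  // 2
echo $strict->length, "\n"; // 1
```

In the markup from the question both queries select the same single div, so the simpler form is enough there.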

Once the target node is isolated, $dom->saveHTML($node) returns the div together with the "innerHTML" you are looking for.
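If only the content is wanted, without the surrounding div tags, one option is to serialize each child node and concatenate the results. The helper name `innerHTML()` and the sample markup below are assumptions for this sketch, not part of DOMDocument:

```php
<?php
// Hypothetical helper: serialize only the children of a node (its "innerHTML").
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument;
$dom->loadHTML('<div class="simplesocialbuttons"><button>Share</button> text</div>');
$xpath = new DOMXPath($dom);
$div = $xpath->query("//div[contains(@class, 'simplesocialbuttons')]")->item(0);

$outer = $dom->saveHTML($div); // the div plus its content
$inner = innerHTML($div);      // the content only

echo $outer, "\n";
echo $inner, "\n";
```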

Code:

$html = <<<HTML
<div class="entry-content entry-excerpt clearfix">
        <div class="simplesocialbuttons simplesocial-round-icon simplesocialbuttons_inline simplesocialbuttons-align-centered post-1 post  simplesocialbuttons-inline-no-animation simplesocialbuttons-inline-in">
<button class="simplesocial-fb-share" target="_blank" data-href="https://www.facebook.com/sharer/sharer.php?u=http://localhost/wp/hello-world/" onclick="javascript:window.open(this.dataset.href, '', 'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600');return false;"><span class="simplesocialtxt">Facebook </span> </button>
<button class="simplesocial-msng-share" onclick="javascript:window.open( this.dataset.href, '_blank',  'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600' );return false;" data-href="http://www.facebook.com/dialog/send?app_id=891268654262273&amp;redirect_uri=http%3A%2F%2Flocalhost%2Fwp%2Fhello-world%2F&amp;link=http%3A%2F%2Flocalhost%2Fwp%2Fhello-world%2F&amp;display=popup"><span class="simplesocialtxt">Messenger</span></button> 
<button onclick="javascript:window.open(this.dataset.href, '_blank' );return false;" class="simplesocial-whatsapp-share" data-href="https://api.whatsapp.com/send?text=http://localhost/wp/hello-world/"><span class="simplesocialtxt">WhatsApp</span></button>
<button class="simplesocial-tumblr-share" data-href="http://tumblr.com/widgets/share/tool?canonicalUrl=http%3A%2F%2Flocalhost%2Fwp%2Fhello-world%2F" onclick="javascript:window.open(this.dataset.href, '', 'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600');return false;"><span class="simplesocialtxt">Tumblr</span> </button>
<button class="simplesocial-twt-share" data-href="https://twitter.com/share?text=Hello+world%21&amp;url=http://localhost/wp/hello-world/" rel="nofollow" onclick="javascript:window.open(this.dataset.href, '', 'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600');return false;"><span class="simplesocialtxt">Twitter</span> </button>
<button onclick="javascript:window.location.href = this.dataset.href;return false;" class="simplesocial-email-share" data-href="mailto:?subject=Hello+world%21&amp;body=http://localhost/wp/hello-world/"><span class="simplesocialtxt">Email</span></button>
<button class="simplesocial-gplus-share" data-href="https://plus.google.com/share?url=http://localhost/wp/hello-world/" onclick="javascript:window.open(this.dataset.href, '', 'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600');return false;"><span class="simplesocialtxt">Google+</span></button>
<button target="popup" class="simplesocial-linkedin-share" data-href="https://www.linkedin.com/cws/share?url=http://localhost/wp/hello-world/" onclick="javascript:window.open(this.dataset.href, '', 'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600');return false;"><span class="simplesocialtxt">LinkedIn</span></button>
<button rel="nofollow" class="simplesocial-pinterest-share" onclick="var e=document.createElement('script');e.setAttribute('type','text/javascript');e.setAttribute('charset','UTF-8');e.setAttribute('src','//assets.pinterest.com/js/pinmarklet.js?r='+Math.random()*99999999);document.body.appendChild(e);return false;"><span class="simplesocialtxt">Pinterest</span></button>
<button class="simplesocial-reddit-share" data-href="https://reddit.com/submit?url=http://localhost/wp/hello-world/&amp;title=Hello+world%21" onclick="javascript:window.open(this.dataset.href, '', 'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600');return false;"><span class="simplesocialtxt">Reddit</span> </button>
</div>

<p>Wel&shy;come to Word&shy;Press.<br>
<img class="alignnone size-medium wp-image-8" src="http://localhost/wp/wp-content/uploads/2018/07/DZV_1UkX4AEUlTZ-300x202.jpg" alt="" width="300" height="202" srcset="http://localhost/wp/wp-content/uploads/2018/07/DZV_1UkX4AEUlTZ-300x202.jpg 300w, http://localhost/wp/wp-content/uploads/2018/07/DZV_1UkX4AEUlTZ-768x517.jpg 768w, http://localhost/wp/wp-content/uploads/2018/07/DZV_1UkX4AEUlTZ.jpg 789w" sizes="(max-width: 300px) 100vw, 300px"><br>
This is your first post. Edit or delete it, then start writ&shy;ing!</p>

            <a href="http://localhost/wp/hello-world/" class="more-link">Read more</a>

            </div>
HTML;

$dom = new DOMDocument;
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$result = [];  // initialize so var_export() still works when nothing matches
// using a loop in case there are multiple occurrences
foreach ($xpath->query("//div[contains(@class, 'simplesocialbuttons')]") as $node) {
  $result[] = $dom->saveHTML($node);  // serialize the div together with its content
}
var_export($result);

Output:

array (
  0 => '<div class="simplesocialbuttons simplesocial-round-icon simplesocialbuttons_inline simplesocialbuttons-align-centered post-1 post  simplesocialbuttons-inline-no-animation simplesocialbuttons-inline-in">
<button class="simplesocial-fb-share" target="_blank" data-href="https://www.facebook.com/sharer/sharer.php?u=http://localhost/wp/hello-world/" onclick="javascript:window.open(this.dataset.href, \'\', \'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600\');return false;"><span class="simplesocialtxt">Facebook </span> </button>
<button class="simplesocial-msng-share" onclick="javascript:window.open( this.dataset.href, \'_blank\',  \'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600\' );return false;" data-href="http://www.facebook.com/dialog/send?app_id=891268654262273&amp;redirect_uri=http%3A%2F%2Flocalhost%2Fwp%2Fhello-world%2F&amp;link=http%3A%2F%2Flocalhost%2Fwp%2Fhello-world%2F&amp;display=popup"><span class="simplesocialtxt">Messenger</span></button> 
<button onclick="javascript:window.open(this.dataset.href, \'_blank\' );return false;" class="simplesocial-whatsapp-share" data-href="https://api.whatsapp.com/send?text=http://localhost/wp/hello-world/"><span class="simplesocialtxt">WhatsApp</span></button>
<button class="simplesocial-tumblr-share" data-href="http://tumblr.com/widgets/share/tool?canonicalUrl=http%3A%2F%2Flocalhost%2Fwp%2Fhello-world%2F" onclick="javascript:window.open(this.dataset.href, \'\', \'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600\');return false;"><span class="simplesocialtxt">Tumblr</span> </button>
<button class="simplesocial-twt-share" data-href="https://twitter.com/share?text=Hello+world%21&amp;url=http://localhost/wp/hello-world/" rel="nofollow" onclick="javascript:window.open(this.dataset.href, \'\', \'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600\');return false;"><span class="simplesocialtxt">Twitter</span> </button>
<button onclick="javascript:window.location.href = this.dataset.href;return false;" class="simplesocial-email-share" data-href="mailto:?subject=Hello+world%21&amp;body=http://localhost/wp/hello-world/"><span class="simplesocialtxt">Email</span></button>
<button class="simplesocial-gplus-share" data-href="https://plus.google.com/share?url=http://localhost/wp/hello-world/" onclick="javascript:window.open(this.dataset.href, \'\', \'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600\');return false;"><span class="simplesocialtxt">Google+</span></button>
<button target="popup" class="simplesocial-linkedin-share" data-href="https://www.linkedin.com/cws/share?url=http://localhost/wp/hello-world/" onclick="javascript:window.open(this.dataset.href, \'\', \'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600\');return false;"><span class="simplesocialtxt">LinkedIn</span></button>
<button rel="nofollow" class="simplesocial-pinterest-share" onclick="var e=document.createElement(\'script\');e.setAttribute(\'type\',\'text/javascript\');e.setAttribute(\'charset\',\'UTF-8\');e.setAttribute(\'src\',\'//assets.pinterest.com/js/pinmarklet.js?r=\'+Math.random()*99999999);document.body.appendChild(e);return false;"><span class="simplesocialtxt">Pinterest</span></button>
<button class="simplesocial-reddit-share" data-href="https://reddit.com/submit?url=http://localhost/wp/hello-world/&amp;title=Hello+world%21" onclick="javascript:window.open(this.dataset.href, \'\', \'menubar=no,toolbar=no,resizable=yes,scrollbars=yes,height=600,width=600\');return false;"><span class="simplesocialtxt">Reddit</span> </button>
</div>',
)
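On real pages the parser may report problems, for example about HTML5 elements it does not recognize, depending on the libxml2 version in use. A hedged sketch of collecting those errors instead of letting them surface as PHP warnings follows; the `<section>` markup is a made-up example:

```php
<?php
// Hypothetical input using an HTML5 element that older libxml2 HTML parsers may not know.
$html = '<section><div class="simplesocialbuttons">x</div></section>';

// Collect parser errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTML($html);
$errors = libxml_get_errors();          // inspect or log these if needed
libxml_clear_errors();
libxml_use_internal_errors($previous);  // restore the earlier setting

$xpath = new DOMXPath($dom);
$result = [];
foreach ($xpath->query("//div[contains(@class, 'simplesocialbuttons')]") as $node) {
    $result[] = $dom->saveHTML($node);
}
var_export($result);
```

The query itself is unchanged; only the loading step is wrapped.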

Edit: To remove the target node and its innerHTML instead...

Code:

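// XPath query results are a static list, so removing nodes inside the loop is safe
foreach ($xpath->query("//div[contains(@class, 'simplesocialbuttons')]") as $node) {
    $node->parentNode->removeChild($node);
}
// With no argument, saveHTML() serializes the whole document; after a plain loadHTML($html)
// that includes the DOCTYPE and <html>/<body> wrappers that libxml adds
echo $dom->saveHTML();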

Output:

<div class="entry-content entry-excerpt clearfix">


<p>Wel&shy;come to Word&shy;Press.<br>
<img class="alignnone size-medium wp-image-8" src="http://localhost/wp/wp-content/uploads/2018/07/DZV_1UkX4AEUlTZ-300x202.jpg" alt="" width="300" height="202" srcset="http://localhost/wp/wp-content/uploads/2018/07/DZV_1UkX4AEUlTZ-300x202.jpg 300w, http://localhost/wp/wp-content/uploads/2018/07/DZV_1UkX4AEUlTZ-768x517.jpg 768w, http://localhost/wp/wp-content/uploads/2018/07/DZV_1UkX4AEUlTZ.jpg 789w" sizes="(max-width: 300px) 100vw, 300px"><br>
This is your first post. Edit or delete it, then start writ&shy;ing!</p>

            <a href="http://localhost/wp/hello-world/" class="more-link">Read more</a>

            </div>
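To get output like the above, with no DOCTYPE or `<html>`/`<body>` wrappers, one possibility is to pass the `LIBXML_HTML_NOIMPLIED` and `LIBXML_HTML_NODEFDTD` options to loadHTML(). This is a sketch under the assumption that the input has a single root element, as the markup in the question does; the shortened markup here is hypothetical:

```php
<?php
// Hypothetical, shortened markup with a single root element.
$html = '<div class="entry-content"><div class="simplesocialbuttons">buttons</div><p>Welcome</p></div>';

$dom = new DOMDocument;
// NOIMPLIED: do not add implied <html>/<body> elements; NODEFDTD: do not add a default DOCTYPE.
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);

// Remove every matching div from its parent.
foreach ($xpath->query("//div[contains(@class, 'simplesocialbuttons')]") as $node) {
    $node->parentNode->removeChild($node);
}

$clean = $dom->saveHTML();
echo $clean;
```

These options are commonly reported to rearrange the tree when the input has several top-level elements, so wrapping such input in one container element first may be the safer route.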