Get the content between HTML tags that carry an identifier

Time: 2016-09-17 10:17:20

Tags: php html text preg-match-all

I have these span tags:

<div>
<span style="background: url('/wp-content/themes/minimum-child/img/address.png') 0px 2px no-repeat; padding-left: 20px;">CONTENT 1</span>
<span style="background: url('/wp-content/themes/minimum-child/img/email.png') 0px 2px no-repeat; padding-left: 20px;"><a href="mailto:post@post.com">CONTENT 2</a></span>
<span style="background: url('/wp-content/themes/minimum-child/img/tel.png') 0px 2px no-repeat; padding-left: 20px;">CONTENT 3</span>
</div>

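I need to get the content between the span tags, but I need it split into individual variables: $address, $email, $phone, $web and so on. I can use the name of the background image as the pattern, because the image names always stay the same (address.png, email.png, etc.).

So far I think I need the preg_match_all function. I have tried it, but without success.

This is what I tried (to get the address into the $address variable):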

$url="'/wp-content/themes/minimum-child/img/address.png'";
$tag='span style="background: url('.$url.')';
$matches=array();
$pattern = "/<$tag ?.*>(.*)<\/span>/";
preg_match($pattern, $htmlcontent, $matches);
$address=$matches[1];

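Unfortunately, it doesn't work. Do you know how to achieve this?

1 Answer:

Answer 0: (score: 0)

It is often said that parsing HTML with regular expressions is problematic, so I chose the simpler approach of using DOMDocument to process the HTML fragment. If needed, you can then use a regular expression to refine some of the results further.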

$html='
<div>
    <span style="background: url(\'/wp-content/themes/minimum-child/img/address.png\') 0px 2px no-repeat; padding-left: 20px;">CONTENT 1</span>
    <span style="background: url(\'/wp-content/themes/minimum-child/img/email.png\') 0px 2px no-repeat; padding-left: 20px;"><a href="mailto:post@post.com">CONTENT 2</a></span>
    <span style="background: url(\'/wp-content/themes/minimum-child/img/tel.png\') 0px 2px no-repeat; padding-left: 20px;">CONTENT 3</span>
</div>';


$dom=new DOMDocument;
$dom->loadHTML( $html );

$col=$dom->getElementsByTagName('span');
$keep=array(
    'style'=>array(),
    'data' =>array(),
    'email'=>array()
);

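foreach( $col as $node ){
    // Style attribute with the single quotes around the url() value removed
    $keep['style'][]=str_replace( "'", "", $node->getAttribute('style') );
    // Text content of the span, including the text of nested elements
    $keep['data'][]=$node->nodeValue;
    if( $node->hasChildNodes() ){
        foreach( $node->childNodes as $child ){
            if( $child->nodeType==XML_ELEMENT_NODE && $child->hasAttribute('href') ) {
                $href=$child->getAttribute('href');
                // Only mailto: links carry an email address after the scheme
                if( strpos( $href, 'mailto:' )===0 ){
                    $keep['email'][]=substr( $href, strlen('mailto:') );
                }
            }
        }
    }
}
echo '<pre>',print_r($keep,true),'</pre>';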


/* output
   ------

    Array
    (
        [style] => Array
            (
                [0] => background: url(/wp-content/themes/minimum-child/img/address.png) 0px 2px no-repeat; padding-left: 20px;
                [1] => background: url(/wp-content/themes/minimum-child/img/email.png) 0px 2px no-repeat; padding-left: 20px;
                [2] => background: url(/wp-content/themes/minimum-child/img/tel.png) 0px 2px no-repeat; padding-left: 20px;
            )

        [data] => Array
            (
                [0] => CONTENT 1
                [1] => CONTENT 2
                [2] => CONTENT 3
            )

        [email] => Array
            (
                [0] => post@post.com
            )

    )
*/
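
To get from the arrays above to the separate variables the question asks for, one option is to key each span's text by the file name of its background image. The following is only a minimal sketch under a few assumptions: spansByImageName is a hypothetical helper name (it is not part of the answer above), every span of interest has a url(...) in its style attribute, and the image file name (address, email, tel) is acceptable as the array key.

```php
<?php
// Hypothetical helper (an assumption, not from the original answer):
// returns the text of each <span>, keyed by the base name of the
// background image referenced in its style attribute.
function spansByImageName(string $html): array
{
    $dom = new DOMDocument;
    // Collect parser warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = array();
    foreach ($dom->getElementsByTagName('span') as $span) {
        $style = $span->getAttribute('style');
        // Capture whatever sits inside url(...), with or without quotes
        if (preg_match('~url\(\s*[\'"]?([^\'")]+)~', $style, $m)) {
            // ".../img/address.png" becomes "address"
            $key = pathinfo(trim($m[1]), PATHINFO_FILENAME);
            $result[$key] = trim($span->textContent);
        }
    }
    return $result;
}

$html = '
<div>
    <span style="background: url(\'/wp-content/themes/minimum-child/img/address.png\') 0px 2px no-repeat; padding-left: 20px;">CONTENT 1</span>
    <span style="background: url(\'/wp-content/themes/minimum-child/img/email.png\') 0px 2px no-repeat; padding-left: 20px;"><a href="mailto:post@post.com">CONTENT 2</a></span>
    <span style="background: url(\'/wp-content/themes/minimum-child/img/tel.png\') 0px 2px no-repeat; padding-left: 20px;">CONTENT 3</span>
</div>';

$fields  = spansByImageName($html);
// Missing keys fall back to null instead of raising a notice
$address = $fields['address'] ?? null;
$email   = $fields['email'] ?? null;
$phone   = $fields['tel'] ?? null;
```

Keying by the image name means the order of the spans no longer matters, and a span that is missing from the page simply leaves its variable as null.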