Regular expression to split some HTML content based on a custom tag

Time: 2013-05-08 12:47:31

Tags: php regex

I need to split my HTML based on a custom HTML tag.

This is what my HTML looks like:

<div>
    <div id="header">
        <h1>Document Title</h1>
    </div>

    <div id="content">
        <p>Lorem ipsum dolar sit</p>
        <magicheader type="2" class="someClass">Header</magicheader>
        <p>Lorem ipsum dolar sit</p>
        <span><magicheader type="3" class="someClass">Header</magicheader></span>
    </div>

    <div id="footer">

    </div>
</div>

This is what I need:

Array
(
    [0] => <div>
    <div id="header">
        <h1>Document Title</h1>
    </div>

    <div id="content">
        <p>Lorem ipsum dolar sit</p>
    [1] => <magicheader type="2" class="someClass">Header</magicheader>
    [2] => <p>Lorem ipsum dolar sit</p>
        <span>
    [3] => <magicheader type="3" class="someClass">Header</magicheader>
    [4] => </span>
    </div>

    <div id="footer">

    </div>
</div>
)

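Can someone help me with the pattern?

1 answer:

Answer 0: (score: 1)

You need to use preg_split together with the PREG_SPLIT_DELIM_CAPTURE flag. With this flag, the text matched by the parenthesized group in the delimiter pattern is returned in the result array as well, instead of being discarded:
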
$text=<<<EOD
<div>
    <div id="header">
        <h1>Document Title</h1>
    </div>

    <div id="content">
        <p>Lorem ipsum dolar sit</p>
        <magicheader type="2" class="someClass">Header</magicheader>
        <p>Lorem ipsum dolar sit</p>
        <span><magicheader type="3" class="someClass">Header</magicheader></span>
    </div>

    <div id="footer">

    </div>
</div>
EOD;

$regexp = '%(<magicheader [^>]*>Header</magicheader>)%';
$value = preg_split($regexp, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

Then print_r($value) outputs:

Array
(
    [0] => <div>
    <div id="header">
        <h1>Document Title</h1>
    </div>

    <div id="content">
        <p>Lorem ipsum dolar sit</p>

    [1] => <magicheader type="2" class="someClass">Header</magicheader>
    [2] => 
        <p>Lorem ipsum dolar sit</p>
        <span>
    [3] => <magicheader type="3" class="someClass">Header</magicheader>
    [4] => </span>
    </div>

    <div id="footer">

    </div>
</div>
)
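
The pattern above matches only elements whose text is exactly "Header", and the pieces around each match keep their leading and trailing newlines. The following is a minimal sketch of one way to generalize it, assuming the magicheader elements are never nested inside each other. The helper name split_on_magicheader and the sample input are made up for illustration and are not part of the original answer.

```php
<?php
// Sketch: split HTML around every <magicheader> element, whatever its text.
// split_on_magicheader() is a hypothetical helper name, not from the answer.
function split_on_magicheader(string $html): array
{
    // \b keeps the pattern from matching longer tag names such as <magicheaderfoo>.
    // .*? is lazy, so each match stops at the nearest closing tag.
    // The s modifier lets . match newlines inside the element.
    $pattern = '%(<magicheader\b[^>]*>.*?</magicheader>)%s';

    $parts = preg_split($pattern, $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    if ($parts === false) {
        throw new RuntimeException('preg_split failed');
    }

    // Trim the whitespace left at the edges of each piece.
    return array_map('trim', $parts);
}

// Made-up sample input with two different header texts.
$html = '<p>One</p>
<magicheader type="2" class="someClass">First</magicheader>
<p>Two</p>
<span><magicheader type="3">Second header</magicheader></span>';

$parts = split_on_magicheader($html);
print_r($parts);
```

Because the trimming step changes whitespace, joining the pieces back together no longer reproduces the input byte for byte; drop the array_map call if that matters. For nested or otherwise irregular markup, a regular expression is likely to be fragile, and an HTML parser may be a safer choice.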