Matching basic HTML using regex (or any other way)

Time: 2013-07-15 22:26:59

Tags: php html regex preg-match preg-match-all

I have some HTML like this:

    <b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
    <b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
    ...
    <b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)

Now, using PHP, I want to split it up and build two arrays, like so:

Array 1 - (this will contain everything inside the <b> tags)

    [0] -> <b>This is a title: </b>
    [1] -> <b>Some more text: </b>
    ...
    [n] -> <b>Hello world!: </b>

Array 2 - (this will contain everything outside the <b> tags)

    [0] -> 0091 + Two + 423 + Four + (Five, Six, Seven)
    [1] -> Abc + Hi + Random + Text + (Hello, 522, Four)
    ...
    [n] -> Test + Foo + 1122 + (120, 122, Four)

I tried using regular expressions, but I can't seem to figure them out. Any help would be highly appreciated.

Thanks!

2 Answers:

Answer 0: (score: 1)

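<?php
$string = '    <b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
    <b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
    ...
    <b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)';
// Group 1: the <b>...</b> element. Group 2: everything after it up to the next "<".
// [^<]+ also matches line breaks, so group 2 keeps the newline and the "..." line.
preg_match_all('#(<b>[^<]+</b>)([^<]+)#', $string, $matches);
// $matches[0] holds the full matches, $matches[1] and $matches[2] the two groups.
print_r($matches);
?>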

Output:

Array
(
    [0] => Array
        (
            [0] => <b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)

            [1] => <b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
    ...

            [2] => <b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)
        )

    [1] => Array
        (
            [0] => <b>This is a title: </b>
            [1] => <b>Some more text: </b>
            [2] => <b>Hello world!: </b>
        )

    [2] => Array
        (
            [0] =>  0091 + Two + 423 + Four + (Five, Six, Seven)

            [1] =>  Abc + Hi + Random + Text + (Hello, 522, Four)
    ...

            [2] =>  Test + Foo + 1122 + (120, 122, Four)
        )

)
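As the output shows, group 2 keeps its leading space, and because `[^<]+` also matches line breaks it picks up the trailing newline (and the `...` placeholder line) as well. Here is a minimal sketch of one way to tidy that up, assuming each entry sits on its own line: the second character class also excludes `\r` and `\n`, and the values are trimmed afterwards. The variable names `$html`, `$titles` and `$values` are just illustrative.

```php
<?php
// Sample input from the question (nowdoc, so nothing is interpolated).
$html = <<<'HTML'
    <b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
    <b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
    <b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)
HTML;

// Group 1: the <b>...</b> element.
// Group 2: the rest of the same line; [^<\r\n] stops at a "<" or a line break.
preg_match_all('#(<b>[^<]+</b>)([^<\r\n]+)#', $html, $matches);

$titles = $matches[1];                     // contents of "array 1"
$values = array_map('trim', $matches[2]);  // contents of "array 2", whitespace removed

print_r($titles);
print_r($values);
```

`$titles` then holds the three `<b>...</b>` elements and `$values` the matching text with no surrounding whitespace.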

Answer 1: (score: 1)

You can try this:

<pre>
<?php

$subject =<<<LOD
<b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
<b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
<b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)
LOD;

// Group 1: a <b>...</b> element (lazy match).
// Group 2: everything up to the next "<b"; a "<" is allowed if "b" does not follow it.
$pattern = '~(<b>.*?</b>)((?>[^<]+|<(?!b))*)~';
preg_match_all($pattern, $subject, $matches);

// Drop the full matches so only the two capture groups remain.
array_shift($matches);
// Trim every captured string in place.
array_walk_recursive($matches, function (&$val) { $val = trim($val); });
list($array1, $array2) = $matches;

print_r($array1);
print_r($array2);
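
Since the question also allows "any other way", here is a rough sketch of a non-regex alternative using PHP's `DOMDocument`. It assumes the DOM extension is enabled and that the text for each entry is the text node directly after its `<b>` element. The `<div>` wrapper and the variable names are illustrative choices, not something from the question.

```php
<?php
// Sample input from the question (nowdoc, so nothing is interpolated).
$html = <<<'HTML'
<b>This is a title: </b> 0091 + Two + 423 + Four + (Five, Six, Seven)
<b>Some more text: </b> Abc + Hi + Random + Text + (Hello, 522, Four)
<b>Hello world!: </b> Test + Foo + 1122 + (120, 122, Four)
HTML;

$doc = new DOMDocument();
// Wrap the fragment in a single container element before parsing (assumption:
// the markup is simple enough for libxml to parse without warnings).
$doc->loadHTML('<div>' . $html . '</div>');

$array1 = [];
$array2 = [];
foreach ($doc->getElementsByTagName('b') as $b) {
    // Serialize the <b> element itself, tags included.
    $array1[] = $doc->saveHTML($b);

    // The text belonging to this entry is the text node right after the element.
    $next = $b->nextSibling;
    $array2[] = ($next !== null && $next->nodeType === XML_TEXT_NODE)
        ? trim($next->nodeValue)
        : '';
}

print_r($array1);
print_r($array2);
```

This avoids hand-written patterns, at the cost of depending on the parser. If the real markup nests other tags inside or after `<b>`, the sibling lookup would need adjusting.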