Question

My regular expression:

<([a-zA-Z0-9]+)>[\na-zA-Z0-9]*<\/\1+>

My string:

<div>
<f>
</f>
</div>

The result is:

array(2
  0 => array(1
    0 => <f>
</f>
  )
  1 => array(1
    0 => f
  )
)

Why does it grab <f></f> and ignore the outer <div>?

Answer 1

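The answer is to use A PARSER INSTEAD (sorry for shouting). Grabbing an ID or a URL string with a regular expression is sometimes quicker, but making sense of HTML markup with regular expressions is error-prone. Consider the code below: isn't it prettier than druidic characters with special meanings?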

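<?php
// Sample markup; SimpleXML only loads it if it is well-formed XML.
$str = "
<container>
    <div class='someclass' data='somedata'>
        <f>some content here</f>
    </div>
</container>";
$xml = simplexml_load_string($str);
if ($xml === false) {
    exit("Could not parse the markup\n");
}

// Child elements are reachable as properties of the parent.
echo $xml->div->f; // some content here
$attributes = $xml->div->attributes();
print_r($attributes); // class and data as keys
?>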
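For comparison, here is a minimal sketch of the same approach applied to the exact string from the question. It assumes the input is well-formed XML, which this small sample is; real-world HTML often is not, and simplexml_load_string() would then return false. The variable name $names is only illustrative.

```php
<?php
// The string from the question, written with explicit newlines.
$str = "<div>\n<f>\n</f>\n</div>";

$xml = simplexml_load_string($str);
if ($xml === false) {
    exit("Could not parse the markup\n");
}

// Collect the root tag name, then the names of its direct children.
$names = [$xml->getName()];
foreach ($xml->children() as $child) {
    $names[] = $child->getName();
}

print_r($names); // div, then f
```

Here the outer <div> is found first and <f> shows up as its child, which is the nesting the regex in the question could not see.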

Answer 2

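I think it is because your second character class, [\na-zA-Z0-9]*, tries to match zero or more characters before the closing tag arrives, and that cannot match the contents of the <div>...</div> block. The class only allows newlines, letters, and digits, so after <div> it consumes the newline and then stops at the < of <f>. The pattern then needs </div> at that position, finds <f> instead, and the attempt fails. The inner <f>...</f> contains nothing but a newline, so it matches.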
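The behaviour is easy to reproduce. The sketch below runs the pattern from the question, then a hypothetical widened variant (not part of the original question) whose character class also accepts <, > and /. The variant is only there to show that the character class is what blocks the <div> match; it is not a recommended way to parse HTML and would still break on attributes, text containing spaces, or nested tags with the same name.

```php
<?php
// The string from the question, written with explicit newlines.
$html = "<div>\n<f>\n</f>\n</div>";

// Pattern from the question: the class allows only newlines and alphanumerics.
$original = '/<([a-zA-Z0-9]+)>[\na-zA-Z0-9]*<\/\1+>/';
preg_match_all($original, $html, $matches);
print_r($matches[1]); // captured tag names: only f

// Hypothetical variant: the class also allows <, > and /,
// so the nested <f>...</f> no longer stops the match.
$widened = '/<([a-zA-Z0-9]+)>[\na-zA-Z0-9<>\/]*<\/\1>/';
preg_match_all($widened, $html, $widenedMatches);
print_r($widenedMatches[1]); // captured tag names: only div
```

With the widened class the whole <div>...</div> block is consumed as a single match, so <f> is no longer reported separately. Getting both levels at once is where a parser becomes the simpler tool.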

PHP regular expression for parsing HTML
