Question

What is the fastest way to remove empty HTML tags from a string?

I wrote something like this to detect empty anchor tags:

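    // Keep only <blockquote> and <a>, then remove anchors that have no content.
    $temp = strip_tags($string, '<blockquote><a>');
    $cmatch = array();
    // [^>]* keeps the match inside one opening tag; an ungreedy .* could run
    // from a non-empty <a ...> through to a later empty </a>.
    if (preg_match_all('~<a\b[^>]*></a>~i', $temp, $cmatch, PREG_SET_ORDER))
    {
        foreach ($cmatch as $cm)
        {
            foreach ($cm as $t)
            {
                $temp = str_replace($t, '', $temp);
            }
        }
    }
    $temp = trim($temp);

    // Do not output anything if only empty tags were left (problem with the div margin).
    if ($temp !== '')
    {
        echo '<div class="c" style="margin-top:20px;">';
        echo $temp;
        echo '</div>';
    }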

There must be a more efficient way to do this. Would it be faster to convert the string into an HTML DOM and do the check there?

Answer 1

Regular expressions are not the right tool for parsing HTML.

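As a non-specific answer, I strongly recommend using a DOM parsing library for this task. To name a few pitfalls that make regular expressions a nightmare here: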

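You can catch <a></a>, but will you catch <a />?
Is the following p tag empty? <p><a></a></p> If it is, will your code catch it? If not, how many passes over the string do you need before you are confident you have caught them all?
Will you catch tags that are not closed properly?
Will you catch overlapping tags?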
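
As a rough illustration of the DOM approach, here is a minimal sketch using PHP's built-in DOMDocument and DOMXPath. The function name removeEmptyTags, the wrapper id, and the list of void elements to keep are my own assumptions, not something from the question. It treats an element as empty when it has no child elements and no non-whitespace text, and it repeats until nothing is left to remove, so nested cases like <p><a></a></p> are handled without guessing the number of passes.

```php
<?php
// Hypothetical helper (name and behaviour are assumptions for illustration):
// removes elements that contain no child elements and no non-whitespace text.
function removeEmptyTags(string $html): string
{
    if (trim($html) === '') {
        return '';
    }

    $doc = new DOMDocument();
    // Real-world fragments are rarely valid; collect parser warnings quietly.
    $previous = libxml_use_internal_errors(true);
    // The wrapper div gives us a known container to serialize from.
    // The XML encoding declaration makes libxml read the input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="empty-tag-wrapper">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $wrapper = $xpath->query('//div[@id="empty-tag-wrapper"]')->item(0);
    if ($wrapper === null) {
        return $html; // parsing failed in an unexpected way; leave input alone
    }

    // Leaf elements with only whitespace, excluding void elements such as <br>
    // that are legitimately childless (this list is an assumption; extend as needed).
    $query = './/*[not(*) and normalize-space(.) = ""'
           . ' and not(self::br or self::hr or self::img or self::input)]';

    // Removing a leaf can make its parent empty, so repeat until stable.
    do {
        $empties = $xpath->query($query, $wrapper);
        foreach ($empties as $node) {
            $node->parentNode->removeChild($node);
        }
    } while ($empties->length > 0);

    // Serialize only the wrapper's children, not the wrapper itself.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo removeEmptyTags('<p><a></a></p><p>Hello</p>'); // prints <p>Hello</p>
```

Because the parser builds a tree first, unclosed tags and <a /> are normalized before the check runs. Whether this is faster than a regex depends on the input size, so measure it on your own data.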
