Question

Possible duplicate:
Cleaning HTML by removing extra/redundant formatting tags

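I have been trying to remove the redundant tags generated by an HTML editor. The code below apparently does not remove all of the empty ones. I have been staring at it and cannot figure out why. I may be missing something.

Here is the code. Many thanks, everyone.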

// Check for redundant tags
function removeRedundantTags($pathname) {
    $dom = new DOMDocument();
    $dom->loadHTMLFile($pathname);
    $allTags = $dom->getElementsByTagName('*');
    for($i = 0; $i < $allTags->length; $i++) {
        $currentTag = $allTags->item($i);
        echo "Accessed Tags: ".$currentTag->nodeName.'<br>';
        if($currentTag->hasChildNodes()) continue;
        if($currentTag->nodeName == 'br' || $currentTag->nodeName == 'img' || $currentTag->nodeName == 'meta') continue;
        if($currentTag->nodeValue == NULL) {
            $parentNode = $currentTag->parentNode;
            $oldChild = $parentNode->removeChild($currentTag);
            echo "Removed Tags----: ".$oldChild->nodeName.'<br>';
        }
    }
    echo "Redundant Removed<br>";
    $dom->saveHTMLFile($pathname);
}

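Edit (output added): Say I am trying to clean up span tags (sorry, I can't post the HTML code). It removes only half of them: if there are two empty span tags, it removes just one, and the same goes for all empty tags.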

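I am using the DOM approach because it happens to be very fast, and I run this code over hundreds of HTML files. So the answers that rely on regular expressions are not useful to me.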
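The "only half are removed" symptom described above is consistent with removing nodes from a live DOMNodeList while walking it with an increasing index: each removal shifts the remaining items down by one, so the loop steps over the next element. A minimal sketch of one possible workaround, staying with DOM, is to walk the list backward. The function name removeEmptyElements() and the $keep list of tags to preserve are hypothetical and not part of the original post; the test markup is made up for illustration.

```php
<?php
// Sketch only: removeEmptyElements() and $keep are hypothetical names,
// not taken from the original question.
function removeEmptyElements(DOMDocument $dom, array $keep = ['br', 'img', 'meta', 'hr', 'input', 'link']): int
{
    $removed = 0;
    // getElementsByTagName() returns a live list: it shrinks as nodes are removed.
    $allTags = $dom->getElementsByTagName('*');
    // Walk backward so a removal never shifts an index that is still to be visited.
    for ($i = $allTags->length - 1; $i >= 0; $i--) {
        $tag = $allTags->item($i);
        if ($tag->hasChildNodes() || in_array($tag->nodeName, $keep, true)) {
            continue;
        }
        $tag->parentNode->removeChild($tag);
        $removed++;
    }
    return $removed;
}

// Made-up sample markup with adjacent empty spans and a div that only wraps an empty span.
$dom = new DOMDocument();
$dom->loadHTML('<html><body><p>text<span></span><span></span></p><div><span></span></div><br></body></html>');
$count = removeEmptyElements($dom);
echo $count, "\n";
```

Walking backward also visits children before their parents, so in this sketch a wrapper that becomes empty once its only child is removed gets removed in the same pass. Whether that is desirable depends on the documents being cleaned.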

Answer 1

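// Normalize <br> tags in an HTML string.
function clean($txt)
{
    // Collapse runs of two or more <br> tags (any spacing or case) into exactly two.
    $txt=preg_replace("{(<br[\\s]*(>|\/>)\s*){2,}}i", "<br /><br />", $txt);
    // Rewrite every remaining <br> variant, plus trailing whitespace, as "<br />".
    $txt=preg_replace("{(<br[\\s]*(>|\/>)\s*)}i", "<br />", $txt);
    return $txt;
}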

From H9kDroid's answer to "How to remove redundant <br /> tags from HTML code using PHP?"
