Question

I want to add HTML tags to an HTML string in PHP, for example:

<h2><b>Hello World</b></h2>
<p>First</p>
Second
<p>Third</p>

"Second" is not wrapped in any HTML element, so the system should wrap it in a p tag. This is the expected result:

<h2><b>Hello World</b></h2>
<p>First</p>
<p>Second</p>
<p>Third</p>

I tried PHP Simple HTML DOM Parser but could not work out how to handle this. Here is a sketch of my idea:

function htmlParser($html)
{
    foreach ($html->childNodes() as $node) {
        if ($node->childNodes()) {
            htmlParser($node);
        }
        // Ideally: wrap the node's innertext in a p tag if it is not wrapped in any tag
    }

    return $html;
}

However, childNodes() never visits "Second" because it is bare text with no element around it, and using regex to process HTML tags is not recommended. Any ideas?

Thanks a lot.

Answer 1

I solved this with a crude approach. Here is my code:

function addPTag($html)
{
    $contents = preg_split("/(<\/.*?>)/", $html, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
    foreach ($contents as &$content) {
        if (substr($content, 0, 1) != '<') {
            $chars = preg_split("/(<)/", $content, -1, PREG_SPLIT_DELIM_CAPTURE | PREG_SPLIT_NO_EMPTY);
            $chars[0] = '<p>' . $chars[0] . '</p>';
            $content = implode($chars);
        }
    }

    return implode($contents);
}

Hopefully there is a more elegant way. Thanks.

Answer 2

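This is a cool question because it prompts some thinking about the DOM.

I asked a question, "How do HTML Parsers process untagged text", which received a generous comment from @sideshowbarker. That got me thinking and improved my understanding of the DOM, especially text nodes.

Below is a DOM-based approach that finds candidate text nodes and wraps them in "p" tags. Many text nodes should be left alone, such as the spaces, carriage returns and line feeds used for formatting (an "uglifier" might strip these out).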

<?php

$html = file_get_contents("nodeTest.html"); // read the test file
$dom = new DOMDocument();          // a new DOM object
$dom->loadHTML($html);             // build the DOM
$bodyNodes = $dom->getElementsByTagName('body');  // returns a DOMNodeList object
foreach ($bodyNodes->item(0)->childNodes as $child)  // assuming one <body> node
{
    $text = "";
    // test for an untagged text node that contains more than formatting whitespace
    if (($child->nodeType == XML_TEXT_NODE) && (strlen($text = trim($child->nodeValue)) > 0))
    {   // it's a candidate for adding tags
        $newText = "<p>" . $text . "</p>";
        // replace only the trimmed text so the surrounding whitespace is kept
        echo str_replace($text, $newText, $child->nodeValue);
    }
    else
    {   // not a candidate for adding tags
        echo $dom->saveHTML($child);
    }
}

nodeTest.html contains this:

<!DOCTYPE HTML> 
<html>
<body>
    <h2><b>Hello World</b></h2>
    <p>First</p>
    Second
    <p>Third</p>
    fourth
    <p>Third</p>
    <!-- comment -->
</body>
</html>

The result is shown below. Note that the comment and the formatting are handled correctly.

<h2><b>Hello World</b></h2>
<p>First</p>
<p>Second</p>
<p>Third</p>
<p>fourth</p>
<p>Third</p>
<!-- comment -->

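Obviously, if you want to make this more general, you need to traverse the DOM and repeat the search/replace at each element node. In this example we stop at the body node and process only its direct children.

I'm not 100% sure this code is the most efficient. I may give it more thought and will update this answer if I find a better way.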
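The traversal described above could be sketched roughly as follows. This is only a sketch under assumptions, not the answer's original code: the function names `wrapLooseText` and `addParagraphs` and the `WRAP_CONTAINERS` whitelist are hypothetical, and it edits the DOM in place (via `createElement` and `replaceChild`) instead of echoing strings. It descends only into whitelisted containers, so text inside elements such as `<h2>` or `<b>` is left alone.

```php
<?php
declare(strict_types=1);

// Hypothetical whitelist: elements whose bare text children get wrapped.
const WRAP_CONTAINERS = ['body', 'div', 'section', 'article', 'blockquote'];

function wrapLooseText(DOMNode $node): void
{
    // Snapshot the live child list, because nodes are replaced while iterating.
    foreach (iterator_to_array($node->childNodes) as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $text = trim($child->nodeValue);
            if ($text !== '') {
                // Build <p>text</p>; the DOM takes care of escaping.
                $p = $node->ownerDocument->createElement('p');
                $p->appendChild($node->ownerDocument->createTextNode($text));
                $node->replaceChild($p, $child);
            }
        } elseif ($child->nodeType === XML_ELEMENT_NODE
            && in_array(strtolower($child->nodeName), WRAP_CONTAINERS, true)) {
            wrapLooseText($child); // descend only into whitelisted containers
        }
    }
}

function addParagraphs(string $html): string
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // silence parser warnings on fragments
    // Supply our own <body> so bare text stays a text node under it;
    // the XML prolog is a common trick to hint UTF-8 to libxml.
    $dom->loadHTML('<?xml encoding="UTF-8"><body>' . $html . '</body>');
    libxml_clear_errors();

    $body = $dom->getElementsByTagName('body')->item(0);
    wrapLooseText($body);

    // Serialize only the children of <body>, not the whole document.
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}
```

Calling `addParagraphs($html)` on the string from the question should return the same markup with "Second" wrapped in a p tag. Two limitations of this sketch: the whitespace around a wrapped text node is dropped rather than preserved, and text that sits next to inline elements (for example `Second <b>bold</b> more`) is wrapped piece by piece rather than as one paragraph.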

Answer 3

You can try Simple HTML DOM Parser:

$stringHtml = 'Your received html';

$html = str_get_html($stringHtml);

// Find the necessary element and edit it
$exampleText = $html->find('Your selector here', 0)->last_child()->innertext;
