Question

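I'm writing an application that gives users a simple HTML editor. The problem is that although I keep asking my users to format headings with the "Heading 2" (h2) style, they either use h1 (which I can deal with!) or they start a new paragraph and then bold the whole paragraph.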

For example:

<p><strong>This is a header</strong></p>
<p>Content content blah blah blah.</p>

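What I would like to do is find all instances of <p><strong> that contain fewer than eight words and replace them with an h2.

What is the best way to do this?

Update: Thanks to Jack's code, I have developed a simple module that does everything I describe here. The code is here on GitHub.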

Answer 1

You can use DOMDocument. Find each <strong> tag that is a child of a <p>, count the words, and replace the node and its parent with an <h2>.
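
The steps above can be sketched roughly as follows. This is not the answer's original code; the function name boldParagraphsToHeadings and its $maxWords parameter are made up for illustration, and the default of 7 assumes "fewer than eight words" from the question.

```php
<?php
// Hypothetical helper, written for illustration only.
function boldParagraphsToHeadings(string $html, int $maxWords = 7): string
{
    libxml_use_internal_errors(true);   // tolerate sloppy editor markup
    $doc = new DOMDocument();
    $doc->loadHTML($html);              // wraps the fragment in <html><body>
    $xpath = new DOMXPath($doc);

    foreach ($xpath->query('//p/strong') as $strong) {
        $p = $strong->parentNode;
        $text = trim($strong->textContent);
        // Only convert when the <strong> holds the paragraph's entire text.
        if ($text === '' || trim($p->textContent) !== $text) {
            continue;
        }
        if (str_word_count($text) > $maxWords) {
            continue;
        }
        $h2 = $doc->createElement('h2');
        $h2->appendChild($doc->createTextNode($text));
        $p->parentNode->replaceChild($h2, $p);
    }

    // Serialize only the children of <body>, skipping the wrapper tags.
    $out = '';
    foreach ($doc->getElementsByTagName('body')->item(0)->childNodes as $child) {
        $out .= $doc->saveXML($child);
    }
    return $out;
}

echo boldParagraphsToHeadings(
    '<p><strong>This is a header</strong></p><p>Content content blah blah blah.</p>'
), "\n";
```

Wrapping the logic in a function makes it easy to run over each saved document and to adjust the word limit in one place.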

Answer 2

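Since you seem comfortable with PHP, you may find PHP Simple HTML DOM Parser quite intuitive for this task. Here is a snippet from its documentation that shows a very simple way to read or change the tag name once you have found the element you are after: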

// Assumes the PHP Simple HTML DOM Parser file (simple_html_dom.php) is available.
include 'simple_html_dom.php';

$html = str_get_html("<div>foo <b>bar</b></div>");
$e = $html->find("div", 0);

echo $e->tag; // Returns: "div"
echo $e->outertext; // Returns: "<div>foo <b>bar</b></div>"
echo $e->innertext; // Returns: "foo <b>bar</b>"
echo $e->plaintext; // Returns: "foo bar"

Attribute Name    Usage
$e->tag           Read or write the tag name of the element.
$e->outertext     Read or write the outer HTML text of the element.
$e->innertext     Read or write the inner HTML text of the element.
$e->plaintext     Read or write the plain text of the element.

Answer 3

Here is the code I ended up with.

<?php

$content_old = <<<'EOM'
<p>&nbsp; </p>
<p>lol<strong>test</strong></p>
<p><strong>This is a header</strong></p>
<p>Content content blah blah blah.</p>
EOM;

// Remove paragraphs that contain only whitespace and/or &nbsp; entities.
$content = preg_replace('/<p[^>]*>(?:\s|&nbsp;)*<\/p>/', '', $content_old);

$doc = new DOMDocument;
$doc->loadHTML($content);
$xp = new DOMXPath($doc);

foreach ($xp->query('//p/strong') as $node) {
    $parent = $node->parentNode;
    if ($parent->textContent == $node->textContent && 
            str_word_count($node->textContent) <= 8) {
        $header = $doc->createElement('h2');
        $parent->parentNode->replaceChild($header, $parent);
        $header->appendChild($doc->createTextNode( $node->textContent ));
    }
}

// Calling saveXML() on the whole document is not good enough, because
// loadHTML() wraps the fragment in a doctype, <html> and <body>.
// Serialize only the elements directly inside <body> instead.
$everything = $xp->query("body/*");
$output = '';
foreach ($everything as $thing) {
    $output .= $doc->saveXML($thing) . "\n";
}

echo "--- ORIGINAL --\n\n";
echo $content_old;
echo "\n\n--- UPDATED ---\n\n";
echo $output;

When I run the script, this is the output I get:

--- ORIGINAL --

<p>&nbsp; </p>
<p>lol<strong>test</strong></p>
<p><strong>This is a header</strong></p>
<p>Content content blah blah blah.</p>

--- UPDATED ---

<p>lol<strong>test</strong></p>
<h2>This is a header</h2>
<p>Content content blah blah blah.</p>
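
One caveat about the word count in the script above: str_word_count() only counts runs of alphabetic characters (locale dependent), so numbers are skipped and non-ASCII letters may be miscounted. If that matters for your headings, a whitespace-based count is one possible alternative. The helper name countTokens below is hypothetical, just a sketch:

```php
<?php
// Hypothetical helper: counts whitespace-separated tokens instead of
// relying on str_word_count().
function countTokens(string $text): int
{
    $tokens = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY);
    // preg_split() returns false on invalid UTF-8 when the /u flag is used.
    return $tokens === false ? 0 : count($tokens);
}

$title = 'Top 10 tips for 2024';
echo str_word_count($title), "\n"; // 3: "10" and "2024" are not counted
echo countTokens($title), "\n";    // 5
```

Which definition of "word" is right depends on the kind of headings your users write; for plain English titles without numbers the two counts agree.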

Update #1

Note that if there are other tags inside the <strong> tag (for example, <a>), the whole <p> is still replaced, which is not what I intended.

This is easily fixed by changing the if to:

        if ($parent->textContent == $node->textContent &&
                str_word_count($node->textContent) <= 8 &&
                $node->childNodes->item(0)->nodeType == XML_TEXT_NODE) {
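
Keep in mind that this condition only looks at the first child of the <strong>. A heading such as "Read <a>the FAQ</a>" starts with a text node, so it would still pass. A stricter variant could check every child; the helper name hasOnlyTextChildren is hypothetical and this is only a sketch of the idea:

```php
<?php
// Hypothetical helper: true only if $node has children and all of them
// are text nodes.
function hasOnlyTextChildren(DOMNode $node): bool
{
    foreach ($node->childNodes as $child) {
        if ($child->nodeType !== XML_TEXT_NODE) {
            return false;
        }
    }
    return $node->childNodes->length > 0;
}

$doc = new DOMDocument();
$doc->loadHTML(
    '<p><strong>Plain header</strong></p>' .
    '<p><strong>Read <a href="/faq">the FAQ</a></strong></p>'
);
$xp = new DOMXPath($doc);

foreach ($xp->query('//p/strong') as $node) {
    $firstIsText = $node->childNodes->item(0)->nodeType == XML_TEXT_NODE;
    printf(
        "%s: first child is text = %s, only text children = %s\n",
        $node->textContent,
        var_export($firstIsText, true),
        var_export(hasOnlyTextChildren($node), true)
    );
}
// Plain header: first child is text = true, only text children = true
// Read the FAQ: first child is text = true, only text children = false
```

Swapping the third condition of the if for hasOnlyTextChildren($node) would leave the second paragraph untouched.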

Update #2

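It is also worth noting that the original createElement call causes problems if the content of the <strong> contains HTML characters that need to be escaped (for example, &).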

The old code was:

        $header = $doc->createElement('h2', $node->textContent);
        $parent->parentNode->replaceChild($header, $parent);

The new code (which works correctly) is:

        $header = $doc->createElement('h2');
        $parent->parentNode->replaceChild($header, $parent);
        $header->appendChild($doc->createTextNode( $node->textContent ));
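
To see the difference in isolation, here is a small standalone sketch. The sample text "Fish & Chips" is made up for the demonstration:

```php
<?php
$doc = new DOMDocument();
$text = 'Fish & Chips';

// Old approach: the value is passed as the second argument. The bare "&"
// is treated as the start of an entity reference, so PHP raises an
// "unterminated entity reference" warning ("@" silences it here).
$old = @$doc->createElement('h2', $text);
echo $doc->saveXML($old), "\n"; // the heading text does not survive intact

// New approach: a text node holds the raw string and is escaped on output.
$new = $doc->createElement('h2');
$new->appendChild($doc->createTextNode($text));
echo $doc->saveXML($new), "\n"; // <h2>Fish &amp; Chips</h2>
```

The text node keeps the original string, so $new->textContent still reads "Fish & Chips" while the serialized markup is valid.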

Changing tags with PHP depending on content length
