Question

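I have some text in $string, like this one.

I need to remove every <empty-line>...</empty-line> element, including the text between the tags. I tried to do this with preg_replace(), but I don't know how to write the regexp pattern.

Edit (code added):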

<span id="chapters">
   <div id="title">
    <p style="font-family: icons; font-size: 20px; padding: 5px 7px 10px 12px;">:</p>
   </div>

   <p style="display: none; width: 108px;">NOTE TO THE READER</p>

   <p style="display: none; width: 108px;">Part 1</p>
    <empty-line>
    <p>PROVENER</p>
   </empty-line>

    <p style="display: none; width: 108px;">Part 2</p>
    <empty-line>
    <p>APERT</p>
   </empty-line>

    <p style="display: none; width: 108px;">Part 3</p>
    <empty-line>
    <p>ELIGER</p>
   </empty-line>

    <p style="display: none; width: 108px;">GLOSSARY</p>

    <p style="display: none; width: 108px;">CALCA 1: Cutting the Cake</p>

    <p style="display: none; width: 108px;">CALCA 2: Hemn (Configuration) Space</p>

    <p style="display: none; width: 108px;">CALCA 3: Complex Versus Simple Protism</p>
</span>

Answer 1

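HTML is not a regular language, so it cannot be parsed reliably with regular expressions. You should use an HTML parser to parse it properly. You can use PHP's DOMDocument and DOMXPath to load the HTML and remove each tag together with everything inside it: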

$dom = new DOMDocument();

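// libxml may emit warnings for the non-standard <empty-line> tag:
// collect them internally, load the HTML, then clear the error buffer
libxml_use_internal_errors(true);
$dom->loadHTML($html); // $html holds the markup ($string in the question)
libxml_clear_errors();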

$xpath = new DOMXPath($dom);
foreach ($xpath->query('//empty-line') as $node) {
    $node->parentNode->removeChild($node);
}

echo $dom->saveHTML();
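
A self-contained sketch of the same approach, in case it helps to try it out. The sample markup in $html is a shortened, hypothetical stand-in for the question's input. One thing to be aware of: loadHTML() wraps the input in a doctype plus <html> and <body> elements, so saveHTML() with no argument prints the whole document. Passing a node (here the span with id "chapters", which I am assuming is the wrapper you want back) serializes only that element.

```php
<?php
// Hypothetical sample input, shortened from the markup in the question.
$html = <<<HTML
<span id="chapters">
    <p>Part 1</p>
    <empty-line>
    <p>PROVENER</p>
    </empty-line>
    <p>GLOSSARY</p>
</span>
HTML;

$dom = new DOMDocument();

// Keep parser warnings about the non-standard tag out of the output.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Remove every <empty-line> element together with its children.
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//empty-line') as $node) {
    $node->parentNode->removeChild($node);
}

// Serialize only the wrapper element, not the implied <html>/<body>.
$span = $xpath->query('//span[@id="chapters"]')->item(0);
$result = $dom->saveHTML($span);

echo $result;
```

If your real input has no single wrapper element, you would need to pick a different node (or loop over the children of <body>) when calling saveHTML().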


How to remove a tag, including the text it contains, from a string
