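Add missing punctuation before closing HTML tags

Time: 2017-11-21 16:36:15

Tags: php html regex preg-replace regex-group

My string is an HTML document. I want to add a dot before every closing HTML tag that is not already preceded by a punctuation mark. The punctuation marks are .,?!: and I want to use preg_replace.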

<p>Today, not only we have so many breeds that are trained this and that.</p>

<h4><strong>We must add a dot after the closing strong</strong></h4>

<p>Hunting with your dog is a blah blah with each other.</p>

<h2>No need to change this one!</h2>

<p>Hunting with your dog is a blah blah with each other.</p>

My function:

$source = 'the above html';
$source = addMissingPunctuation( $source );

echo $source;

function addMissingPunctuation( $input ) {

    $tags = [ 'h1', 'h2', 'h3', 'h4', 'h5', 'h6' ];

    foreach ($tags as $tag) {

        $input = preg_replace(
            "/[^,.;!?](<\/".$tag.">)/mi",
            ".${0}",
            $input
        );

    }

    return $input;
}

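I tried .${0}, .$0, .${1}, .$1, .\\0 and .\\1, but none of them worked. At best, it swallowed the match without replacing it with anything. The matching part of my pattern seems to work on regex101 and other sites.

The desired result is: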

<p>Today, not only we have so many breeds that are trained this and that.</p>

<h4><strong>We must add a dot after the closing strong</strong>.</h4>

<p>Hunting with your dog is a blah blah with each other.</p>

<h2>No need to change this one!</h2>

<p>Hunting with your dog is a blah blah with each other.</p>

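1 answer:

Answer 0: (score: 2)

You don't need to iterate over $tags like that. I would either implode the tags with | into a single alternation or, as in this case, write one rule (h[1-6]) that covers all the possible elements.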

$source = '<p>Today, not only we have so many breeds that are trained this and that.</p>

<h4><strong>We must add a dot after the closing strong</strong></h4>

<p>Hunting with your dog is a blah blah with each other.</p>

<h2>No need to change this one!</h2>

<p>Hunting with your dog is a blah blah with each other.</p>';
$source = addMissingPunctuation( $source );
echo $source;
function addMissingPunctuation( $input ) {
    return preg_replace("/[^,.;!?]\K<\/h[1-6]>/mi", ".$0", $input);
}

Demo: https://3v4l.org/6dNV7
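The code above uses the h[1-6] rule. The implode alternative is only mentioned, so here is a minimal sketch of how it might look. The function name addMissingPunctuationFor and its $tags parameter are hypothetical and not part of the original answer; the pattern uses ~ as the delimiter so that / needs no escaping.

```php
<?php
// Hypothetical helper: builds one pattern from a tag list instead of
// calling preg_replace once per tag.
function addMissingPunctuationFor(string $input, array $tags): string
{
    // Quote each tag name so it is safe inside a pattern delimited by "~".
    $quoted = array_map(function ($tag) {
        return preg_quote($tag, '~');
    }, $tags);

    // Join the names into a single alternation, e.g. h1|h2|h3.
    $alternation = implode('|', $quoted);

    // \K drops the preceding character from the reported match,
    // so only the closing tag itself is replaced.
    $pattern = '~[^,.;!?]\K</(?:' . $alternation . ')>~i';

    // Single quotes keep PHP from touching $0 before preg_replace sees it.
    return preg_replace($pattern, '.$0', $input);
}

$html = '<h4><strong>Add a dot</strong></h4><h2>Keep this one!</h2>';
echo addMissingPunctuationFor($html, ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']);
```

This variant is mainly useful when the tag list is not a neat range such as h1 to h6.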

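You also need to leave the character before the element out of the match; \K does this. ${} is PHP variable syntax inside a double-quoted string, whereas $0 refers to the whole match. Writing it as \0 may be clearer in the future.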
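To illustrate the point about replacement references, here is a small sketch assuming the same pattern as above, written with a ~ delimiter. The variable names are made up for the example. With single-quoted replacement strings PHP performs no interpolation, so preg_replace receives $0, ${0} or \0 unchanged and treats each of them as the whole match.

```php
<?php
// Same idea as the answer's pattern, using "~" as the delimiter.
$pattern = '~[^,.;!?]\K</h[1-6]>~i';
$input   = '<h2>Title</h2>';

// Single-quoted replacements reach preg_replace exactly as written.
$withDollar    = preg_replace($pattern, '.$0', $input);
$withBraces    = preg_replace($pattern, '.${0}', $input);
$withBackslash = preg_replace($pattern, '.\0', $input);

// All three forms refer to the whole match, so the results are identical.
var_dump($withDollar === $withBraces && $withBraces === $withBackslash);
echo $withDollar, PHP_EOL;
```

Inside double quotes, ${0} is handled by PHP's own string interpolation before the regex engine runs, which is consistent with the match being "swallowed" in the question.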

Regex demo: https://regex101.com/r/xUvvuf/1/

(Example using \0: https://3v4l.org/jGZal)

Another approach you could take is to skip every element that already ends with punctuation, which saves the regex engine some steps.

https://regex101.com/r/xUvvuf/2/

[,.;!?]<\/h[1-6]>(*SKIP)(*FAIL)|<\/h[1-6]>
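The pattern above is shown on its own. A minimal sketch of how it might be plugged into preg_replace follows; the function name addMissingPunctuationSkip is hypothetical, and the ~ delimiter is used so that / does not need escaping.

```php
<?php
// Hypothetical function name, used only for this sketch.
function addMissingPunctuationSkip(string $input): string
{
    // First alternative: a closing heading tag that already follows a
    // punctuation mark is consumed and then rejected by (*SKIP)(*FAIL).
    // Second alternative: any remaining closing heading tag is the match.
    $pattern = '~[,.;!?]</h[1-6]>(*SKIP)(*FAIL)|</h[1-6]>~i';

    // The match is only the closing tag, so prefix it with a dot.
    return preg_replace($pattern, '.$0', $input);
}

echo addMissingPunctuationSkip('<h2>Done!</h2><h3>Todo</h3>');
```

Unlike the \K version, this pattern does not require a character before the closing tag, so a closing tag at the very start of the input would also get a dot.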

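You can also change the delimiter; this is a matter of personal preference. If you don't mind escaping the / you can keep doing that; if you do, just swap the leading and closing / for ~.

Demo: https://regex101.com/r/xUvvuf/3/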

preg_replace("~[^,.;!?]\K</h[1-6]>~mi", ".$0", $input);