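Replace newlines with BR tags, but only inside PRE tags

Time: 2009-10-04 18:51:37

Tags: php regex html-parsing

In PHP5, what is a good preg_replace expression for doing this transformation:

Replace newlines with <br />, but only within <pre> blocks

(Feel free to make simplifying assumptions and ignore corner cases. For example, we can assume that each tag sits on a single line rather than being something pathological.)

Input text: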

<div><pre class='some class'>1
2
3
</pre>
<pre>line 1
line 2
line 3
</pre>
</div>

Output:

<div><pre>1<br />2<br />3<br /></pre>
<pre>line 1<br />line 2<br />line 3<br /></pre>
</div>

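(Motivating context: I am trying to fix bug 20760 in the wiki syntax highlighting extension, and finding that my PHP skills (I mostly do Python) are not up to the task.)

I'm open to solutions other than regexes, but smaller is preferred (for example, building HTML parsing machinery would be overkill).

2 Answers:

Answer 0: (score: 6)

Something like this?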

<?php

$content = "<div><pre class='some class'>1
2
3
</pre>
<pre>line 1
line 2
line 3
</pre>
</div>
";

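// Returns the markup of the first element inside <body>. loadHTML() wraps the
// fragment in <html><body>, so this strips that wrapper (and the doctype) again.
function getInnerHTML($Node)
{
    // <html> -> <body> -> first child (the outer <div> in this example)
    $Body = $Node->ownerDocument->documentElement->firstChild->firstChild;
    $Document = new DOMDocument();
    $Document->appendChild($Document->importNode($Body, true));
    return $Document->saveHTML();
}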

$dom = new DOMDocument();
$dom->loadHTML( $content );
$preElements = $dom->getElementsByTagName('pre');

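// Use ->length: in PHP 5, count() on a DOMNodeList does not return the number of nodes
if ( $preElements->length ) {
    foreach ( $preElements as $pre ) {
        // Match \r\n before \n so a Windows line ending yields a single <br />
        $value = preg_replace( '/\r\n|\n/', '<br />', $pre->nodeValue );
        // nodeValue is stored as text, so the inserted tags are escaped on output...
        $pre->nodeValue = $value;
    }

    // ...and html_entity_decode() turns &lt;br /&gt; back into real tags
    echo html_entity_decode( getInnerHTML( $dom->documentElement ) );
}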
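
The code above escapes the inserted tags and then decodes every entity in the output, which would also decode entities that were legitimately escaped in the original markup. One possible way around that, sketched here under the assumption of PHP 5.3.6 or later (for saveHTML() with a node argument), is to split each text node under <pre> and insert real br elements. The function name pre_newlines_to_br_dom is hypothetical and not part of the answer, and note that the HTML serializer writes <br> rather than <br />.

```php
<?php
// Hypothetical helper (not from the original answer): replaces newlines in
// text nodes under <pre> with real <br> elements instead of escaped text.
function pre_newlines_to_br_dom($html)
{
    $dom = new DOMDocument();
    // loadHTML() wraps a fragment in <html><body>; @ silences parser warnings.
    @$dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    // Collect the text nodes first so the tree is not modified while iterating.
    $textNodes = array();
    foreach ($xpath->query('//pre//text()') as $node) {
        $textNodes[] = $node;
    }

    foreach ($textNodes as $text) {
        $lines = preg_split('/\r\n|\n/', $text->nodeValue);
        if (count($lines) < 2) {
            continue; // no line break in this text node
        }
        $fragment = $dom->createDocumentFragment();
        foreach ($lines as $i => $line) {
            if ($i > 0) {
                $fragment->appendChild($dom->createElement('br'));
            }
            if ($line !== '') {
                $fragment->appendChild($dom->createTextNode($line));
            }
        }
        $text->parentNode->replaceChild($fragment, $text);
    }

    // Serialize only the children of <body>, dropping the added wrapper.
    $out = '';
    $body = $dom->getElementsByTagName('body')->item(0);
    foreach ($body->childNodes as $child) {
        $out .= $dom->saveHTML($child);
    }
    return $out;
}

echo pre_newlines_to_br_dom("<div><pre>line 1\nline 2\n</pre></div>");
```

Newlines outside <pre> are left alone because the XPath query only selects text nodes that have a pre ancestor.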

Answer 1: (score: 0)

Based on what SilentGhost said (which for some reason isn't showing up here):

<?php
$str = "<div><pre class='some class' >1
2
3
< / pre>
<pre>line 1
line 2
line 3
</pre>
</div>";

$out = "<div><pre class='some class' >1<br />2<br />3<br />< / pre>
<pre>line 1<br />line 2<br />line 3<br /></pre>
</div>";

function protect_newlines($str) {
    // \n -> <br />, but only if it's in a pre block
    // protects newlines from Parser::doBlockLevels()
    /* split on <pre ... /pre>, basically.  probably good enough */
    $str = " ".$str;  // pad so $parts[0] is plain text; captured <pre> blocks land at odd indices
    //$parts = preg_split('/(<pre .*  pre>)/Umsxu',$str,-1,PREG_SPLIT_DELIM_CAPTURE);
    $parts = preg_split("/(< \s* pre .* \/ \s* pre \s* >)/Umsxu",$str,-1,PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $idx=>$part) {
        if ($idx % 2) {
            $parts[$idx] = preg_replace("/\n/", "<br />", $part);
        }
    }
    $str = implode('',$parts);
    /* chop off the first space, that we had added */
    return substr($str,1);
}

assert(protect_newlines($str) === $out);
?>
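
The split-and-rejoin idea above can probably be condensed with preg_replace_callback. This is only a sketch, assuming PHP 5.3 or later for the anonymous function and assuming well-formed <pre ...> and </pre> tags (it does not tolerate spacing such as < / pre>, which the regex above accepts). The helper name pre_newlines_to_br is hypothetical.

```php
<?php
// Hypothetical helper (not from the original answer).
function pre_newlines_to_br($html)
{
    return preg_replace_callback(
        // One <pre ...>...</pre> block per match: /s lets "." span newlines,
        // ".*?" stops at the first closing tag, /i ignores tag case.
        '/<pre\b[^>]*>.*?<\/pre\s*>/si',
        function ($m) {
            // Try \r\n first so a Windows line ending becomes a single <br />.
            return preg_replace('/\r\n|\n/', '<br />', $m[0]);
        },
        $html
    );
}

$input = "<div><pre class='some class'>1\n2\n3\n</pre>\n<pre>line 1\nline 2\nline 3\n</pre>\n</div>";
echo pre_newlines_to_br($input);
```

Attributes on the opening tag are kept, so the first block of the question's input comes out as <pre class='some class'>1<br />2<br />3<br /></pre>, while the newlines between and after the blocks stay untouched.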