Replace line breaks except in <pre> tags, with brackets (<>) inside <pre> tags

Time: 2018-07-25 04:20:44

Tags: regex regex-negation regex-greedy

I replaced all the line breaks outside pre tags using the answer available in the question.

\n(?![^<]*<\/pre>)

It worked fine until the content of a pre tag contained < or > brackets.

For example, with input of:

<p>Test contennt for regex
with line breaks</p>
<pre>code block 
with multi line content
working fine</pre>
<pre class="brush:C#">
test line break before 
open paranthesis < is not working fine
line breaks after paranthesis
is accepted
</pre>

Output is

<p>Test contennt for regexwith line breaks</p><pre>code block 
with multi line content
working fine</pre><pre class="brush:C#">test line break before open paranthesis < is not working fine
line breaks after paranthesis
is accepted
</pre>

which is not correct: the line breaks that come before the < inside the second pre block are removed as well, while the ones after it are kept.

See this regex101.
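The failure can be reproduced outside regex101. The following is a minimal Python sketch, assuming Python's `re` module behaves like the flavor used in the question for this pattern; the shortened `sample` string and the variable names are made up for illustration.

```python
import re

# Regex from the question: a newline that is NOT followed by
# "any run of non-< characters, then </pre>".
NAIVE = re.compile(r"\n(?![^<]*</pre>)")

# Hypothetical, shortened version of the input in the question.
sample = (
    "<p>one\ntwo</p>\n"
    "<pre>a\nb</pre>\n"
    "<pre>\nx\ny < z\nw\n</pre>"
)

result = NAIVE.sub("", sample)
print(repr(result))
# Inside the second <pre>, [^<]* stops at the stray "<", so the lookahead
# never reaches </pre> and the newlines before that "<" are removed too.
```

The newlines after the stray `<` survive because from there on `[^<]*` can run straight to `</pre>` again.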

2 Answers:

Answer 0 (score: 1)

Try this:

/\n(?=((?!<\/pre).)*?(<pre|$))/sg

The idea is to have one big lookahead. The

((?!<\/pre).)*?

lazily matches any character (including newlines, thanks to the s flag), where the inner negative lookahead requires that each matched character is not the < that starts </pre. It is followed by

(<pre|$)

which then matches either <pre (indicating that the original newline was not inside a <pre> block) or the end of the input.

https://regex101.com/r/cjZQO9/2

With input of

<p>Test contennt for regex
with line breaks</p>
<pre>code block 
with multi line content
working fine</pre>
text
more text
<pre class="brush:C#">
test line break before 
open paranthesis < is not working fine
line breaks after paranthesis
is accepted
</pre>
text

output is

<p>Test contennt for regexwith line breaks</p><pre>code block 
with multi line content
working fine</pre>textmore text<pre class="brush:C#">
test line break before 
open paranthesis < is not working fine
line breaks after paranthesis
is accepted
</pre>text
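
For readers who want to try this outside regex101, here is a minimal sketch of the same pattern in Python, assuming `re.DOTALL` as the counterpart of the `s` flag and `re.sub` replacing every match like the `g` flag. The function name and the shortened sample are hypothetical.

```python
import re

# Same lookahead as above: a newline followed by characters that never
# start "</pre", and then either "<pre" or the end of the input.
OUTSIDE_PRE_NEWLINE = re.compile(r"\n(?=((?!</pre).)*?(<pre|$))", re.DOTALL)


def strip_newlines_outside_pre(html: str) -> str:
    """Remove newlines that are not inside a <pre>...</pre> block."""
    return OUTSIDE_PRE_NEWLINE.sub("", html)


# Hypothetical, shortened version of the input above.
sample = (
    "<p>one\ntwo</p>\n"
    "<pre>a\nb</pre>\n"
    "text\nmore\n"
    '<pre class="x">\ny < z\nw\n</pre>\n'
    "tail"
)

result = strip_newlines_outside_pre(sample)
print(repr(result))
```

Because the lookahead may scan far ahead for every newline, this can get slow on very large inputs.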

Answer 1 (score: 0)

If you are using PCRE, you can also use the (*SKIP) verb:

/<pre.*?<\/pre>(*SKIP)(*F)|\n/s

See a demo at regex101
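
Python's built-in `re` module does not support the `(*SKIP)(*F)` verbs, so the sketch below swaps in a different technique with the same effect: match whole `<pre>` blocks in a capturing group and put them back unchanged, and delete any newline matched by the other alternative. It is not the PCRE approach itself, and the function name and sample are hypothetical.

```python
import re

# Alternative 1 captures a whole <pre>...</pre> block (to be kept as is);
# alternative 2 matches a newline outside such a block (to be dropped).
PRE_OR_NEWLINE = re.compile(r"(<pre.*?</pre>)|\n", re.DOTALL)


def strip_newlines_outside_pre(html: str) -> str:
    # group(1) is None when only the newline alternative matched.
    return PRE_OR_NEWLINE.sub(lambda m: m.group(1) or "", html)


# Hypothetical, shortened sample input.
sample = (
    "<p>one\ntwo</p>\n"
    "<pre>a\nb</pre>\n"
    "text\nmore\n"
    '<pre class="x">\ny < z\nw\n</pre>\n'
    "tail"
)

result = strip_newlines_outside_pre(sample)
print(repr(result))
```

Since each `<pre>` block is consumed in one step, the newlines inside it are never tested on their own, which is the same effect `(*SKIP)(*F)` achieves in PCRE.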